
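Author: 程毓軒
Cheng, Yu-Hsuan
Thesis title: 運用社群資訊於個人化之微網誌推薦
Personalized Microblog Recommendation System Based on Social Information
Advisor: 王惠嘉
Wang, Huei-Chia
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 50
Keywords (Chinese): 微網誌 (microblog), 推薦系統 (recommendation system), 信賴關係 (trust relationship)
Keywords (English): Microblog, Recommendation System, Trust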
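  • With the development of Web 2.0, publishing information online has become
    very easy. Many platforms, such as blogs and forums, let users post their own
    ideas and experiences. In recent years a new kind of information-sharing
    platform has emerged: the microblog. Because these platforms (such as Plurk,
    Twitter, and Facebook) do not require users to organize information into a
    complete article before posting, publishing online is effortless for the
    general public. Because the messages on these platforms are numerous and cover
    a very wide range of topics, the platforms have also become a channel through
    which the public obtains information.
    However, the low barrier to posting has also caused information overload.
    When searching online, users often have to filter duplicated or useless
    information out of search results that easily run to dozens of pages before
    they find what they need, which is a heavy burden.
    In view of this, many earlier studies applied clustering techniques to
    documents to reduce the burden of filtering similar material. When comparing
    documents, however, earlier document clustering methods did not take the
    semantic similarity between words or sentences into account. They assessed the
    importance of each word only by its number of occurrences in a single document
    and in the whole collection, and used that as the basis for computing
    similarity. Studies have pointed out that this approach is flawed for short
    texts such as microblog posts, and that the flaw makes short-text similarity
    scores less meaningful. This study therefore designs a similarity measure that
    uses Wikipedia as a semantic reference, in order to compute the similarity
    between microblog posts more accurately.
    Beyond the similarity between microblog texts, this study proposes two
    clustering methods for short texts, Min-Path RMCut and Max-Depth RMCut, to
    cluster microblog posts. After clustering, it uses the target user's social
    information and the clustering results, through transitive trust relationships
    and the computation of reputation scores, to find other microblog users who
    share the target user's preferences, or trustworthy information providers, and
    recommends them to the target user for reference.
    The experimental results show that using Wikipedia as a machine-readable
    dictionary is feasible, and that on the experimental data it performs
    significantly better than computation that considers only word frequency.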
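
The abstract's point about occurrence-count weighting on short texts can be illustrated with a small sketch. It is not the thesis's implementation: the TF-IDF weighting with smoothed IDF, the function names `tfidf_vectors` and `cosine`, and the three toy posts are all assumptions made for illustration. It shows that two posts with the same meaning but no shared words receive a similarity of exactly zero when only word counts are used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-tokenized microblog posts.
posts = [
    ["great", "film", "tonight"],
    ["awesome", "movie", "yesterday"],  # same topic as post 0, no shared words
    ["great", "film", "review"],        # shares two words with post 0
]
vecs = tfidf_vectors(posts)
sim_paraphrase = cosine(vecs[0], vecs[1])
sim_overlap = cosine(vecs[0], vecs[2])
print(sim_paraphrase, sim_overlap)
```

Because the first two posts share no terms, their dot product is zero regardless of the weighting scheme, which is the gap a semantic reference such as Wikipedia is meant to close.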

    With the development of Web 2.0, it is much easier to share information on the
    web than before. Many platforms, such as blogs and forums, allow people to
    share information, and a new type of information-sharing platform has emerged
    in recent years: the microblog. Users of these platforms do not need to
    organize their information into a complete article, so posting on the web is
    easy for them. These platforms contain a wide range of information, so people
    tend to use them as a source of information. However, the low threshold for
    posting also leads to information overload, which forces people to filter
    search-engine results by themselves. This is a burden on users.
    Hence, much research has been done on document clustering in order to
    reduce the effort users spend filtering search results. Past clustering
    methods evaluate the importance of each term by its frequency of occurrence,
    without considering semantic similarity. It has been shown that this way of
    evaluating similarity is not suitable for short texts such as microblog posts.
    Therefore, we propose a new method that evaluates the similarity between words
    based on Wikipedia, so that the similarity between microblog posts can be
    calculated more precisely.
    In addition to the similarity between microblog posts, we propose two
    clustering methods for short texts, Min-path RMCut and Max-depth RMCut. After
    clustering, we evaluate transitive trust relationships and reputation using
    the target user's social information, and recommend other interesting users to
    the target user.
    The experimental results show that it is feasible to use Wikipedia as a
    machine-readable dictionary, and that the results that take semantics into
    consideration are significantly better than those that do not.
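
The abstract mentions transitive trust and reputation but gives no formulas, so the following is only a sketch under stated assumptions: trust along a path is taken as the product of direct trust values, the best path within a hop limit wins, and reputation is the mean of incoming direct trust. The function names, the hop limit, and the example graph are hypothetical and are not taken from the thesis.

```python
def propagated_trust(graph, source, max_hops=3):
    """Best multiplicative trust from `source` to each reachable user.

    `graph` maps user -> {neighbor: direct trust in [0, 1]}.
    Assumed model: trust decays by multiplication along a path.
    """
    best = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for user, trust in frontier.items():
            for neighbor, weight in graph.get(user, {}).items():
                candidate = trust * weight
                # Keep only the strongest path found so far.
                if candidate > best.get(neighbor, 0.0):
                    best[neighbor] = candidate
                    next_frontier[neighbor] = candidate
        frontier = next_frontier
    best.pop(source)
    return best


def reputation(graph, user):
    """Mean direct trust that other users place in `user` (0.0 if none)."""
    incoming = [edges[user] for edges in graph.values() if user in edges]
    return sum(incoming) / len(incoming) if incoming else 0.0


# Hypothetical social graph with direct trust weights.
graph = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 0.9},
    "dave": {},
}
trust = propagated_trust(graph, "alice")
# Rank candidate users for alice by propagated trust, highest first.
ranking = sorted(trust, key=trust.get, reverse=True)
print(ranking, reputation(graph, "dave"))
```

In this toy graph alice has no direct link to dave, yet dave can still be recommended through bob, which is the kind of indirect relationship a transitive trust model is meant to capture.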

    Table of Contents
    1. Introduction
      1.1. Research Background and Motivation
      1.2. Research Objectives
      1.3. Research Scope and Limitations
      1.4. Research Process
      1.5. Thesis Outline
    2. Literature Review
      2.1. Microblogs
        2.1.1. Plurk
        2.1.2. Twitter
        2.1.3. Related Research on Microblog Applications
      2.2. Natural Language Processing
        2.2.1. Chinese Word Segmentation
        2.2.2. Vector Space Model
      2.3. Text Similarity Computation
        2.3.1. Word Similarity Computation
        2.3.2. Short-Text Similarity Computation
      2.4. Document Clustering Methods
      2.5. Personalized Recommendation Systems
      2.6. Wikipedia
      2.7. Summary
    3. Research Method
      3.1. Research Framework
      3.2. Data Collection and Preprocessing
        3.2.1. Word Analysis and Vector Conversion
      3.3. Microblog Clustering
        3.3.1. Similarity Computation for Microblog Clustering
        3.3.2. Clustering of Microblogs
      3.4. Recommendation of Microblog Publishers
        3.4.1. Determining Recommendation Weights
        3.4.2. Scoring of Microblog Publishers
        3.4.3. Personalized Recommendation of Microblogs
      3.5. Summary
    4. System Implementation and Validation
      4.1. System Implementation Environment
      4.2. Experimental Design
        4.2.1. Data Sources
        4.2.2. Evaluation Metrics
        4.2.3. Clustering Stopping Condition
        4.2.4. Definition of Effective Core Terms
        4.2.5. Definition of Successful Recommendation
        4.2.6. Experimental Design and Results
    5. Conclusions and Future Directions
      5.1. Research Results
      5.2. Future Research Directions
    References

    List of Tables
    Table 2-1 Definitions of selected CKIP part-of-speech tags
    Table 3-1 Example of microblog filtering
    Table 4-1 System implementation environment
    Table 4-2 Top 30 core terms extracted by RMCut, Max-depth RMCut, and Min-path RMCut
    Table 4-3 Average MRR of lists generated by different methods
    Table 4-4 Paired t-test results for core-term effectiveness, ListRep
    Table 4-5 Paired t-test results for core-term effectiveness, ListCi
    Table 4-6 Paired t-test results for core-term effectiveness, ListTi
    Table 4-7 Average MRR of core-term effectiveness for lists generated before and after the Max-depth modification
    Table 4-8 Paired t-test of core-term effectiveness for lists generated before and after the Max-depth modification
    Table 4-9 Paired t-test of ListRep and KListRep at τ=16
    Table 4-10 Paired t-test of ListCi and KListCi at τ=3
    Table 4-11 Paired t-test of ListTi and KListTi at τ=36

    List of Figures
    Figure 1-1 Research flowchart
    Figure 2-1 Example Plurk page
    Figure 2-2 Example Twitter page
    Figure 2-3 Categories of the Plurk entry on Wikipedia
    Figure 3-1 Research framework diagram
    Figure 3-3 Microblog similarity computation process
    Figure 4-2 Comparison of recommendation performance of ListCi and KListCi at different values of τ
    Figure 4-3 Comparison of recommendation performance of ListTi and KListTi at different values of τ
    Figure 4-4 Overall comparison of the reputation recommendation list and the transitive-interest recommendation list
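
Several of the tables listed above report an average MRR. Assuming MRR here denotes the standard mean reciprocal rank, it can be computed as in the sketch below. The function name, the example lists, and the notion of a "relevant" item are illustrative assumptions; the thesis's own criterion for a successful recommendation is not given in this record.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item in each ranked list.

    A list with no relevant item contributes 0 to the average.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(ranked_lists)


# Hypothetical recommendation lists and the items judged relevant for each.
ranked_lists = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
relevant_sets = [{"b"}, {"x"}, {"r"}]
mrr = mean_reciprocal_rank(ranked_lists, relevant_sets)
print(mrr)
```

Here the first relevant items appear at ranks 2 and 1, and the third list has none, so the reciprocal ranks 0.5, 1, and 0 average to 0.5.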


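    Availability: on campus, open from 2022-12-31; off campus, not open.
    The electronic thesis has not yet been authorized for public release; for the
    print copy, consult the library catalog.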