| Graduate student: | 程毓軒 Cheng, Yu-Hsuan |
|---|---|
| Thesis title: | 運用社群資訊於個人化之微網誌推薦 Personalized Microblog Recommendation System Based on Social Information |
| Advisor: | 王惠嘉 Wang, Huei-Chia |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 50 |
| Keywords (Chinese): | 微網誌, 推薦系統, 信賴關係 |
| Keywords (English): | Microblog, Recommendation System, Trust |
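With the development of Web 2.0, publishing information online has become very easy. Many platforms, such as blogs and forums, let users post their own thoughts and experiences. In recent years a new kind of information-sharing platform has emerged: the microblog. Because these platforms (such as Plurk, Twitter, and Facebook) do not require information to be organized into a complete article before it is posted, publishing messages online has become effortless for the general public. Since the messages on these platforms are numerous and cover a very wide range of topics, such platforms have also become a channel through which the public obtains information.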
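However, the low threshold for posting online also causes information overload. When searching for information, users often have to filter duplicated or useless items out of search results that can easily run to dozens of pages before they find what they need, which is a heavy burden.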
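In view of this, many earlier studies have used clustering techniques to group documents and so reduce the burden of filtering similar material. However, when comparing documents, earlier document clustering methods did not take the semantic similarity between words or sentences into account. They assessed the importance of each word only by its number of occurrences in a single document and in the whole collection, and based the similarity calculation on that. Prior research has pointed out that this approach is flawed for short texts such as microblog posts, and that the flaw makes the resulting short-text similarity scores less meaningful. This study therefore designs a similarity measure that uses Wikipedia as a semantic reference, in order to compute the similarity between microblog posts more accurately.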
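In addition to the textual similarity between microblog posts, this study proposes two clustering methods for short texts, Min-Path RMCut and Max-Depth RMCut, to cluster microblog posts. After clustering, the target user's social information and the clustering results are used, through the computation of transitive trust relationships and reputation scores, to find other microblog users who share the target user's preferences, or trustworthy information providers, and recommend them to the target user for reference.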
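The experimental results show that using Wikipedia as a machine-readable dictionary is feasible, and that on the experimental data it performs significantly better than the calculation that considers only word frequency.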
With the development of Web 2.0, sharing information on the web has become much easier than before. Many platforms, such as blogs and forums, allow people to share information, and a new type of information-sharing platform has emerged in recent years: the microblog. Users of these platforms do not need to organize their information into a complete article, so posting on the web is easy for them. Because these platforms contain a wide range of information, people tend to treat them as a source of information. However, the low threshold for posting also leads to information overload, which forces people to filter search engine results by themselves. This is a burden on users.
Hence, much research has been done on document clustering in order to reduce users' work in filtering search results. Past clustering methods evaluate the importance of each term by its frequency of occurrence and do not consider semantic similarity. It has been shown that this way of evaluating similarity is not suitable for short texts such as microblog posts. Therefore, we propose a new method that evaluates the similarity between words based on Wikipedia, so that the similarity between microblog posts can be calculated more precisely.
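To make the limitation of frequency-only similarity concrete, the following is a minimal sketch, not the thesis's actual method. It computes cosine similarity over raw term counts, then repeats the comparison after mapping words to shared concepts. The `CONCEPT` table is a hypothetical, hand-made stand-in for a semantic resource such as Wikipedia, and the example posts are invented for illustration.

```python
import math
from collections import Counter


def tf_cosine(a: str, b: str) -> float:
    """Cosine similarity over raw term-frequency vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical concept map (an assumption for illustration only); a real
# system would derive relatedness from a semantic resource instead.
CONCEPT = {"movie": "film", "awesome": "great", "evening": "tonight"}


def normalize(text: str) -> str:
    """Replace each word with its concept label when one is known."""
    return " ".join(CONCEPT.get(t, t) for t in text.lower().split())


post_a = "great movie tonight"
post_b = "awesome film this evening"   # same meaning, no shared words
post_c = "great movie yesterday"       # shares two words with post_a

print(tf_cosine(post_a, post_b))                        # 0.0: no overlapping terms
print(tf_cosine(post_a, post_c))                        # higher, despite a different day
print(tf_cosine(normalize(post_a), normalize(post_b)))  # overlap appears after mapping
```

With so few words per post, a single synonym substitution removes all overlap, which is why a purely count-based score can rank paraphrases below unrelated posts that happen to share vocabulary.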
In addition to the similarity between microblog posts, we propose two clustering methods for short texts, Min-path RMCut and Max-depth RMCut. After clustering, we evaluate transitive trust relationships and reputation using the target user's social information, and recommend other users of interest to the target user.
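The abstract does not give the trust or reputation formulas, so the sketch below only illustrates the general idea under stated assumptions: direct trust is a weight in [0, 1] on a directed edge, trust along a path is the product of its edge weights, transitive trust is the best such path within a hop limit, and reputation is the mean of incoming direct trust. The function names, the hop limit, and the sample graph are all hypothetical.

```python
from typing import Dict

# graph[u][v] = direct trust that user u places in user v, in [0, 1]
Graph = Dict[str, Dict[str, float]]


def transitive_trust(graph: Graph, source: str, target: str, max_hops: int = 3) -> float:
    """Best product of direct-trust weights over simple paths of at most max_hops edges."""
    best = 0.0
    stack = [(source, 1.0, frozenset([source]))]
    while stack:
        node, score, seen = stack.pop()
        if len(seen) - 1 >= max_hops:  # edges used so far on this path
            continue
        for nxt, weight in graph.get(node, {}).items():
            if nxt in seen:  # keep paths simple (no cycles)
                continue
            path_score = score * weight
            if nxt == target:
                best = max(best, path_score)
            else:
                stack.append((nxt, path_score, seen | {nxt}))
    return best


def reputation(graph: Graph, user: str) -> float:
    """Mean of the direct trust values that other users assign to `user`."""
    incoming = [edges[user] for edges in graph.values() if user in edges]
    return sum(incoming) / len(incoming) if incoming else 0.0


# Invented sample data for illustration.
trust_graph: Graph = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 0.9},
}

print(transitive_trust(trust_graph, "alice", "dave"))  # best of the two 2-hop paths
print(reputation(trust_graph, "dave"))                 # average of bob's and carol's trust
```

Multiplying weights makes trust decay with distance, so a user reached through several weakly trusted intermediaries scores lower than one reached through a single strongly trusted friend. Other aggregation rules are equally possible.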
The experimental results show that it is feasible to use Wikipedia as a machine-readable dictionary, and that the results that take semantics into consideration are significantly better than those that do not.
On-campus access: available from 2022-12-31