| Graduate student: | 王京盛 Wang, Ching Sheng |
|---|---|
| Thesis title: | 考量語意及引用分析之研究主題趨勢分析方法 A Research Trend Analyzing Method Based on Semantics and Citation Count |
| Advisor: | 王惠嘉 Wang, Hei-Chia |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 資訊管理研究所 Institute of Information Management |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 |
| Language: | Chinese |
| Number of pages: | 62 |
| Keywords (Chinese): | 主題偵測與追蹤、趨勢分析、特徵選取、分群 |
| Keywords (English): | Topic Detection and Tracking, Trend Analysis, Feature Selection, Clustering |
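With the arrival of the era of data digitization, documents and publications are increasingly stored in electronic form for easier circulation. However, because the volume of electronic publications has grown explosively, researchers can easily collect large amounts of material yet cannot extract the important information from it. To address this problem, electronic databases today usually provide search engines that match keywords, but the search results are still mixed with much unnecessary information.
To help researchers find relevant material more efficiently, topic detection and tracking techniques can be used to organize the representative topics in a research data set and to track topic trends. Previous topic detection techniques, however, considered only a single data set and did not analyze the direction of research trends. In addition, the semantics of the words in the data set, such as synonyms, spelling variants, and authors' differing word choices, also affect topic detection results. Furthermore, past studies neither organized and analyzed the detected topics and the rise and fall of their trends nor implemented them as a system, so the topic detection results were not presented directly to researchers.
This study therefore exploits the precedence relationship between conference papers and journal papers, using both as its data set and extracting each paper's title, abstract, and keywords. During feature selection it considers, in addition to term frequency, semantics and each paper's citation count to improve the effectiveness of feature selection for topic detection and tracking, and it verifies the positive effect that semantics and citation counts have on topic detection results. The study also implements a system interface based on the topic detection and tracking results, which presents popular research topics and trend analyses to researchers more directly, in the hope of giving them a faster reference when looking for research directions.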
With the digitization of knowledge, documents of all kinds are gradually being converted to electronic form so that they can be distributed easily. However, because the amount of data has grown so rapidly, researchers cannot extract the important information even though they can collect research data easily. Most electronic databases offer a keyword-based search engine as a solution, but its results still do not meet researchers' needs.
To find research materials more efficiently, researchers use topic detection and tracking technology to summarize the topics of research papers and the trends of those topics. Nevertheless, earlier methods used only one data set and usually did not focus on analyzing research trends. Moreover, semantic variation, such as synonyms, spelling variants, and each author's choice of wording, makes topic detection and tracking harder. Earlier work also did not put its results to further use, for example by building a user interface to present the trends.
Therefore, this thesis takes advantage of the relationship between conference papers and journal papers to track topics. In addition, semantics and citation counts are taken into account during feature selection to improve the efficiency of clustering, and the results are used to build a topic tracking system. By presenting hot topics and a trend analysis of each topic through a friendly user interface, this study can help researchers spend less time choosing a research field.
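The abstract does not give the weighting formula, so the following is only a minimal sketch of how term frequency, semantics, and citation count might be combined during feature selection. The synonym table, the `1 + log(1 + citations)` weight, and the function names are all illustrative assumptions, not the method used in the thesis.

```python
import math
from collections import defaultdict

# Hypothetical synonym table standing in for a semantic resource;
# the entries are illustrative only.
SYNONYMS = {"grouping": "clustering", "clusters": "clustering", "trends": "trend"}


def normalize(term):
    """Lowercase a term and map it to an assumed canonical synonym."""
    term = term.lower()
    return SYNONYMS.get(term, term)


def score_features(papers):
    """Sum term occurrences, weighting each paper by its citation count.

    Assumed weight per paper: 1 + log(1 + citations), so an uncited
    paper still contributes plain term frequency.
    """
    scores = defaultdict(float)
    for paper in papers:
        weight = 1.0 + math.log1p(paper["citations"])
        for term in paper["text"].split():
            scores[normalize(term)] += weight
    return dict(scores)


def top_features(papers, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = score_features(papers)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


papers = [
    {"text": "clustering trend", "citations": 0},
    {"text": "grouping topic", "citations": 9},
]
selected = top_features(papers, 2)
```

In this sketch "grouping" is merged into "clustering" before scoring, and terms from the more frequently cited paper receive a larger weight, so `selected` favors terms that are both frequent and well cited.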