National Cheng Kung University Theses and Dissertations System


Graduate student:	王京盛 Wang, Ching Sheng
Thesis title:	考量語意及引用分析之研究主題趨勢分析方法 A Research Trend Analyzing Method Based on Semantics and Citation Count
Advisor:	王惠嘉 Wang, Hei-Chia
Degree:	Master
Department:	College of Management - Institute of Information Management
Year of publication:	2012
Academic year of graduation:	100 (ROC calendar)
Language:	Chinese
Number of pages:	62
Chinese keywords:	主題偵測與追蹤、趨勢分析、特徵選取、分群
English keywords:	Topic Detection and Tracking, Trend Analysis, Feature Selection, Clustering

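With the arrival of the era of data digitization, documents and publications are increasingly stored in electronic form so that they can circulate more easily. However, because the volume of electronic publications has grown explosively, researchers can easily collect large amounts of data but cannot extract the important information from it. To address this problem, electronic databases usually provide a search engine that matches keywords, but the search results are still mixed with much unnecessary information.
To help researchers find relevant research material more efficiently, topic detection and tracking techniques can be used to organize the representative topics of a research data set and to track topic trends. However, earlier topic detection techniques considered only a single data set and did not analyze the direction of research trends. Furthermore, the semantics of the terms in the data set, such as synonyms, spelling variations, and each author's way of expressing ideas, also affect the results of topic detection. In addition, earlier studies did not further organize and analyze the detected topics and the rise and fall of their trends, nor did they implement them as a system, so the topic detection results were not presented directly to researchers for reference.
Therefore, this study exploits the sequential influence between conference papers and journal papers and uses both as its data set, extracting the title, abstract, and keywords of each paper. During feature selection, in addition to term frequency, it also considers semantics and the number of times each paper has been cited in order to improve the efficiency of feature selection for topic detection and tracking, and it verifies the positive effect of semantic considerations and citation counts on the topic detection results. Beyond this, the results of topic detection and tracking are used to implement a system interface that presents hot research topics and an analysis of their trends more directly to researchers, with the aim of giving researchers a faster reference when they look for a research direction.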

With the digitization of knowledge, documents of all kinds are increasingly stored in electronic form so that they can be circulated easily. However, because the volume of electronic publications is growing rapidly, researchers can collect large amounts of data with little effort yet still struggle to extract the important information from it. Most electronic databases address this by providing a search engine based on keyword matching, but the results still contain much irrelevant information.
To help researchers find relevant material more efficiently, topic detection and tracking techniques can be used to summarize the representative topics of a collection of research papers and to track the trends of those topics. Nevertheless, earlier methods considered only a single data set and usually did not analyze research trends. Moreover, semantic factors such as synonyms, spelling variations, and each author's choice of wording affect the results of topic detection and tracking. Earlier studies also did not put the detected topics and trends to further use, for example by implementing a user interface that presents the trends to researchers.
Therefore, this thesis exploits the sequential relationship between conference papers and journal papers to perform topic tracking. In addition, semantics and citation counts are taken into account during feature selection to improve the effectiveness of clustering, and the results are used to build a topic tracking system. By presenting hot topics and a trend analysis of each topic through a friendly user interface, the study aims to reduce the time researchers spend choosing a research direction.
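
The abstract does not give the thesis's actual feature-selection formula, so the following is only a minimal sketch of the general idea: synonyms are merged into one feature, and each paper's TF-IDF contribution is scaled by its citation count. The synonym table, the log-based citation weight, the smoothed IDF, and all function names are illustrative assumptions, not the author's method (the thesis cites WordNet for semantics, which is not reproduced here).

```python
import math
from collections import Counter

# Hypothetical synonym table; a real system might derive this from WordNet.
SYNONYMS = {"cluster": "clustering", "grouping": "clustering"}


def normalize(tokens):
    """Map each token to a canonical form so synonyms count as one feature."""
    return [SYNONYMS.get(token, token) for token in tokens]


def citation_weighted_tfidf(docs, citations):
    """Score terms by TF-IDF, scaling each paper by 1 + log(1 + citations).

    The weighting scheme is an assumption made for illustration only.
    """
    docs = [normalize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of papers containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for tokens, cites in zip(docs, citations):
        weight = 1.0 + math.log1p(cites)
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term]) + 1.0  # smoothed IDF
            scores[term] += weight * tf * idf
    return scores


def select_features(docs, citations, k):
    """Return the k highest-scoring terms as the selected features."""
    ranked = citation_weighted_tfidf(docs, citations).most_common(k)
    return [term for term, _ in ranked]


# Toy data: tokenized titles and the citation count of each paper.
papers = [["topic", "cluster", "trend"], ["grouping", "topic"], ["trend", "noise"]]
cited = [10, 0, 0]
print(select_features(papers, cited, 3))
```

In this sketch a heavily cited paper pushes its terms up the ranking, while "cluster" and "grouping" are counted together as "clustering"; any real weighting would need to be tuned and validated as the thesis does in its experiments.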

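Chapter 1	Introduction
1 Research Background
2 Research Motivation and Objectives
3 Research Scope and Limitations
4 Research Process
5 Thesis Outline
Chapter 2	Literature Review
1 Topic Detection and Tracking
1.1 Definition of a Topic
1.2 Research Issues
1.3 Evolution of Topic Detection and Tracking Methods
2 Information Retrieval
2.1 Vector Space Model
2.2 Document Similarity Computation
3 Natural Language Processing
3.1 Part-of-Speech Tagging
3.2 Stemming
4 Feature Selection
4.1 Document Frequency (DF)
4.2 Mutual Information (MI)
4.3 Chi-square Statistic Measure (CHI)
4.4 Extended Applications of TF-IDF
5 Document Clustering
5.1 Partitional Clustering
5.2 Cluster Validity Evaluation
6 Academic Papers
6.1 Conference Papers
6.2 Journal Papers
6.3 Analysis of Relationships between Papers
7 Summary
Chapter 3	Research Method
1 Research Framework
2 Data Collection and Processing Module
2.1 Data Collection
2.2 Sentence Segmentation
2.3 Part-of-Speech Tagging
2.4 Stemming
3 Topic Detection Module
3.1 Feature Selection
3.2 Clustering
3.3 Topic Detection
4 Trend Analysis Module
Chapter 4	System Implementation and Validation
1 System Implementation
1.1 Data Collection and Preprocessing
1.2 Feature Selection and Clustering
1.3 Topic Detection and Trend Analysis
2 Experimental Method
2.1 Data Sources
2.2 Evaluation Metrics
2.3 Experimental Design
3 Experimental Results and Analysis
3.1 Experiment 1: Choosing the Clustering Threshold λ
3.2 Experiment 2: Effect on the Results of the VSM Transformation with Semantic Considerations
3.3 Experiment 3: Difference between Merging and Not Merging Synonyms during Feature Selection
3.4 Experiment 4: Effect of Considering Citation Counts on Feature Selection
3.5 Experiment 5: Discussion of the Trend Analysis Results
3.6 Summary of Experimental Results
4 System Demonstration
Chapter 5	Conclusions and Future Research Directions
1 Research Results
2 Future Research Directions
References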
                                    


On campus: publicly available from 2022-12-31
Off campus: publicly available from 2022-12-31
