
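Graduate student: 吳偉銘 (Wu, Wei-Ming)
Thesis title: 基於語意及時間因素之主題偵測法 (A Topic Detection Method Based on Semantic and Time Decade)
Advisor: 王惠嘉 (Wang, Hei-Chia)
Degree: Master
Department: College of Management, Institute of Information Management
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar, i.e., 2007-2008)
Language: Chinese
Number of pages: 52
Chinese keywords: semantic analysis, topic detection, time series, aging theory
English keywords: Aging Theory, Semantic Analysis, Topic Detection, Timeline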
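    With the rapid development of the Internet, many print publications have been digitized, and the information available to users keeps growing. Much of this material consists of journals that are published periodically and cover related fields, and these are among the sources researchers read most often. Online databases such as SDOS and PubMed collect these journals and provide search engines so that users can easily find what they need, reducing the time and effort spent searching. However, none of these databases analyzes the subject domains covered by each journal in any detail. A researcher who wants to submit to a journal, or to follow a particular journal, cannot grasp its content trends from the journal title alone; they must read several or even dozens of its articles to learn those trends before deciding whether the journal is suitable for submission or reading. How to help researchers identify a journal's content trends conveniently and quickly is therefore the problem this study investigates.
    As time passes and technology develops, a journal's content trends and discussion topics also change. Traditional document-matching methods do not consider the time factor, yet a journal may emphasize different areas in different periods. Adding time as a key feature of the analysis, and analyzing the research topics of a journal's articles within a given time span to extract its content trends, can help researchers with journal reading, data collection, and paper submission. This study therefore applies topic detection and semantic web techniques to periodically published electronic journals, classifies articles by their titles, abstracts, and keywords, and uses the concept of aging theory to account for changes over time when analyzing a journal's research trends. It further extracts how the journal's content trends evolve, giving researchers more complete information, and demonstrates experimentally that the proposed method outperforms traditional methods.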
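The aging-theory idea in this abstract, where a topic gains strength while new articles support it and fades when they stop, can be illustrated with a minimal sketch. It assumes articles have already been grouped into topics and that each period's "nutrition" for a topic is a single number, such as the summed similarity of that period's articles to the topic. The saturating transfer function, the constant decay `BETA`, the parameter values, and all function names are hypothetical choices for illustration, not the thesis's own equations, which this record does not reproduce.

```python
from typing import Dict, List

# Hypothetical parameters chosen for illustration only.
BETA = 0.3   # constant energy a topic loses every time period (decay)
ALPHA = 1.0  # saturation constant of the transfer function


def transfer(nutrition: float) -> float:
    """Map a period's nutrition to an energy gain that stays below 1.

    This saturating shape is an assumption; aging-theory variants differ.
    """
    return nutrition / (nutrition + ALPHA)


def topic_lifecycles(periods: List[Dict[str, float]]) -> Dict[str, List[float]]:
    """Track each topic's energy across consecutive time periods.

    periods[t] maps a topic label to its nutrition in period t. Energy rises
    with nutrition, falls by BETA every period, and never drops below zero,
    so topics that stop receiving articles fade out over time.
    """
    topics = sorted({topic for period in periods for topic in period})
    energy = {topic: 0.0 for topic in topics}
    curves: Dict[str, List[float]] = {topic: [] for topic in topics}
    for period in periods:
        for topic in topics:
            gained = transfer(period.get(topic, 0.0))
            energy[topic] = max(0.0, energy[topic] + gained - BETA)
            curves[topic].append(round(energy[topic], 3))
    return curves


if __name__ == "__main__":
    # Toy nutrition values for four consecutive periods (made up).
    periods = [
        {"ontology": 3.0, "stemming": 1.0},
        {"ontology": 3.0},
        {"ontology": 1.0, "topic detection": 4.0},
        {"topic detection": 4.0},
    ]
    for topic, curve in topic_lifecycles(periods).items():
        print(topic, curve)
```

On this toy input, "ontology" yields [0.45, 0.9, 1.1, 0.8] (it grows while supported, then decays), "stemming" yields [0.2, 0.0, 0.0, 0.0], and "topic detection" emerges late with [0.0, 0.0, 0.5, 1.0]. Reading such curves side by side is one way to see how a journal's emphasis shifts between periods.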

    With the rapid development of the Internet, electronic journals have become mainstream, and users can obtain more information, faster, than before. Among these resources, research journal databases are the most important to researchers. Online academic databases such as SDOS and PubMed collect large numbers of electronic journal articles and provide their own search engines, which reduces the time users spend searching. However, these databases do not analyze the publication domains covered by each journal. Researchers who want to read a journal or submit a paper to it must read many of its articles to figure out its focus. How to help researchers identify that focus by analyzing a journal's publication trends is therefore the issue this research addresses. Because current online databases only provide search functions and do not analyze journal topics, a way to quickly identify a journal's research topics would make reading and submission more efficient.
    Moreover, research trends and topics change over time as techniques develop, yet traditional document analysis methods do not consider the time factor. Taking time into consideration makes it possible to analyze the trends in a journal's articles. This research uses topic detection and natural language processing to analyze a journal's research trends, and further retrieves how those trends vary over time, giving researchers more information for reading and submitting papers. We also show that our method performs better than the traditional approach.
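Topic detection over article text can be sketched with single-pass clustering, a common baseline in the topic detection literature, swapped in here purely for illustration. Each article's title, abstract, and keywords are assumed to be concatenated into one string, turned into a term-frequency vector, and compared by cosine similarity with existing topic centroids. The threshold, the stopword list, and the function names are hypothetical, and the sketch ignores the semantic analysis that the thesis adds on top of plain term matching.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical threshold and stopword list for illustration only.
THRESHOLD = 0.2
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "using"}


def vectorize(text: str) -> Counter:
    """Term-frequency vector over lowercase alphabetic tokens, minus stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def single_pass_topics(articles: List[str]) -> List[int]:
    """Assign each article to a topic id in one pass over the collection.

    An article joins the most similar existing topic if the cosine similarity
    reaches THRESHOLD; otherwise it starts a new topic. Centroids are kept as
    summed term counts, which has the same direction as the mean vector.
    """
    centroids: List[Counter] = []
    labels: List[int] = []
    for text in articles:
        vec = vectorize(text)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=-1)
        if best >= 0 and scores[best] >= THRESHOLD:
            centroids[best].update(vec)
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    articles = [
        "Lexical chains for topic tracking in news",
        "Topic tracking with lexical chains and WordNet",
        "Gene expression clustering validation",
        "Validating clustering of gene expression data",
    ]
    print(single_pass_topics(articles))  # [0, 0, 1, 1]
```

The two lexical-chain titles share four of their five content terms (cosine 0.8) and land in one topic, while the two gene-expression titles form a second one. Because the result depends on processing order and on `THRESHOLD`, a real system would tune the threshold and typically add term weighting such as TF-IDF.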

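    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation and Objectives
        1.3 Research Scope and Limitations
        1.4 Research Procedure
        1.5 Thesis Outline
    Chapter 2 Literature Review
        2.1 Information Extraction
        2.2 Natural Language Processing
            2.2.1 Part-of-Speech Tagging
            2.2.2 Syntactic Parsing
            2.2.3 Stemming
            2.2.4 Machine Readable Dictionary
        2.3 Topic Detection
            2.3.1 Lexical Chains
            2.3.2 Naive Bayes Classifier
            2.3.3 Hierarchical Clustering Algorithms
            2.3.4 Text Relationship Map
            2.3.5 Comparison of Topic Detection Methods
        2.4 Sliding Window
        2.5 Aging Theory
            2.5.1 Applying Aging Theory to Topic Detection
            2.5.2 Topic Detection Equations of Aging Theory
    Chapter 3 Research Methodology
        3.1 Research Framework
        3.2 DATA COLLECTION MODEL
        3.3 TOPIC LIFECYCLE CONSTRUCTING MODEL
            3.3.1 Topic Retrieval
            3.3.2 Topic Merging
            3.3.3 Temporal-based Topic Analysis
    Chapter 4 System Implementation and Validation
        4.1 System Implementation
            4.1.1 Data Pre-processing
            4.1.2 Topic LifeCycle Construction
        4.2 Experimental Method
            4.2.1 Data Sources
            4.2.2 Baselines for Comparison
            4.2.3 Choice of Evaluation Metrics
        4.3 Experimental Analysis and Discussion
    Chapter 5 Conclusions and Future Research Directions
        5.1 Research Results
        5.2 Future Research Directions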


    Full text: available on campus from 2011-08-20; available off campus from 2011-08-20.