
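Student: Huang, Hsun-Hui (黃薰慧)
Thesis title: Fuzzy-Rough Hybrid Approaches to Applications of Text Mining (用於文本探勘應用之模糊-粗糙混合式解決方法)
Advisor: Kuo, Yau-Hwang (郭耀煌)
Degree: Doctor
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2010
Academic year of graduation: 99 (ROC calendar)
Language: English
Number of pages: 125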
Keywords: text mining, fuzzy-rough hybrid, cross-lingual, document representation, similarity measure, sense association network, query-oriented, multi-document extractive summarization, sentence relevance, sentence feature

    Text mining is a relatively new, interdisciplinary research area of computer science that seeks to extract interesting and non-trivial patterns or knowledge from unstructured textual data. Typical text mining applications include social network analysis, biomedical information extraction, sentiment analysis, spam identification, marketing analysis, document summarization, and event tracing. Because of this wide range of applications, and because most (over 80%) of the continuously growing body of information on the Internet and in repositories is textual, text mining has attracted much attention from the research and industry communities in recent years and is widely regarded as having high commercial potential. Several soft computing techniques, such as fuzzy set theory, rough set theory, genetic algorithms, and neural networks, have accordingly been investigated in the literature for text mining. This thesis explores hybrid approaches that combine fuzzy set theory and rough set theory and demonstrates them on text mining applications. Two types of application, cross-lingual document similarity measurement and query-oriented multi-document extractive summarization, are addressed with the proposed approaches. All of these approaches tackle the problems at the concept level, with concepts obtained through word sense disambiguation, which is also briefly discussed in the thesis.
    For the first application, a language-independent, sense-level document representation based on the fuzzy set model is proposed to lower the barrier between languages. A fuzzy-rough hybrid approach is then used to build an integrated sense association network from the document collection and to partition it into macrosenses, which yields a more robust macrosense-level document representation. Finally, Tversky's notion of similarity and the F1 measure from information retrieval are adopted to formulate, respectively, two document similarity measures that apply fuzzy set operations to the two proposed document representations.
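The two fuzzy similarity measures can be illustrated with a short Python sketch. Several assumptions here are not taken from the thesis. A document is modeled as a fuzzy set that maps hypothetical language-independent sense identifiers to membership degrees. Intersection is the standard min operator, the fuzzy difference A − B is min(μ_A, 1 − μ_B), and set size is the sigma-count (sum of memberships). Under these assumptions the F1 form reduces to 2|A ∩ B| / (|A| + |B|). The macrosense step is reduced to a hypothetical max-aggregation over an already given partition; how the thesis actually builds and partitions the sense association network is not reproduced here.

```python
from typing import Dict, List

# A fuzzy set maps an element (a sense or macrosense identifier) to a membership degree in [0, 1].
# Elements absent from the dict have membership 0.
FuzzySet = Dict[str, float]


def sigma_count(a: FuzzySet) -> float:
    """Scalar cardinality of a fuzzy set: the sum of its membership degrees."""
    return sum(a.values())


def intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Standard fuzzy intersection using the min operator."""
    return {x: min(a[x], b[x]) for x in a.keys() & b.keys()}


def difference(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Fuzzy difference A - B, taken here as min(mu_A, 1 - mu_B)."""
    return {x: min(mu, 1.0 - b.get(x, 0.0)) for x, mu in a.items()}


def tversky_similarity(a: FuzzySet, b: FuzzySet, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model: |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|)."""
    common = sigma_count(intersection(a, b))
    denom = common + alpha * sigma_count(difference(a, b)) + beta * sigma_count(difference(b, a))
    return common / denom if denom > 0 else 0.0


def f1_similarity(a: FuzzySet, b: FuzzySet) -> float:
    """F1 form: treat A as 'retrieved' and B as 'relevant', then combine precision and recall."""
    common = sigma_count(intersection(a, b))
    if common == 0:
        return 0.0
    precision = common / sigma_count(a)
    recall = common / sigma_count(b)
    return 2 * precision * recall / (precision + recall)


def to_macrosenses(doc: FuzzySet, partition: Dict[str, List[str]]) -> FuzzySet:
    """Hypothetical aggregation: each macrosense takes the max membership of its member senses."""
    out: FuzzySet = {}
    for macro, senses in partition.items():
        mu = max((doc.get(s, 0.0) for s in senses), default=0.0)
        if mu > 0:
            out[macro] = mu
    return out


if __name__ == "__main__":
    # Hypothetical sense-level representations of a Chinese document and an English document.
    zh_doc = {"bank.n.01": 0.9, "money.n.01": 0.6}
    en_doc = {"bank.n.01": 0.8, "loan.n.01": 0.5}
    print(tversky_similarity(zh_doc, en_doc), f1_similarity(zh_doc, en_doc))
```

For the example pair, the Tversky form (α = β = 0.5) gives about 0.53 and the F1 form gives about 0.57. The two forms behave differently when memberships are not crisp. Because min(μ, 1 − μ) > 0, the Tversky form does not return 1 when a set is compared with itself, whereas the F1 form does.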
    For the second application, a fuzzy-rough hybrid approach to measuring sentence relevance for query-oriented multi-document extractive summarization is proposed. Several sentence features are first investigated from a concept-level perspective to construct a feature space in which each sentence is represented as an object (a point). The relationships between concepts expressed in natural language are inherently fuzzy, and a query-oriented summary is an approximation of the query-related content of the target text. For these reasons, fuzzy set and rough set theories are used to construct a fuzzy approximation space from this feature space, and the sentence relevance measure is defined on that space. The summary is generated by measuring the membership degree of each sentence in the approximations of the query-related content of the documents. This membership degree is, in effect, the sentence relevance measure used for query-oriented multi-document extractive summarization.
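The fuzzy-rough step can be sketched in Python under several assumptions, none of them taken from the thesis. Sentence features are already normalized to [0, 1]. The fuzzy similarity relation R between two sentences is 1 minus their mean absolute feature difference. The query-related content is given as a fuzzy set Q over sentences. The lower and upper approximations use the Dubois-Prade style definitions μ_{R↓Q}(x) = inf_y max(1 − R(x, y), μ_Q(y)) and μ_{R↑Q}(x) = sup_y min(R(x, y), μ_Q(y)). The weighted average of the two memberships and the top-k selection are hypothetical choices and do not reproduce the thesis's actual relevance formula.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def similarity_relation(features: Sequence[Sequence[float]]) -> Matrix:
    """Fuzzy similarity R(x, y) = 1 - mean absolute feature difference.

    Assumes every feature is already normalized to [0, 1], so R is reflexive and symmetric.
    """
    n = len(features)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            diffs = [abs(u - v) for u, v in zip(features[i], features[j])]
            R[i][j] = 1.0 - sum(diffs) / len(diffs)
    return R


def fuzzy_rough_approximations(R: Matrix, q: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Lower/upper approximation memberships of each sentence with respect to the fuzzy set q.

    lower(x) = min_y max(1 - R(x, y), q(y))   (Kleene-Dienes implicator)
    upper(x) = max_y min(R(x, y), q(y))       (min t-norm)
    """
    n = len(q)
    lower = [min(max(1.0 - R[i][j], q[j]) for j in range(n)) for i in range(n)]
    upper = [max(min(R[i][j], q[j]) for j in range(n)) for i in range(n)]
    return lower, upper


def rank_sentences(features: Sequence[Sequence[float]], q: Sequence[float],
                   weight: float = 0.5, k: int = 2) -> List[int]:
    """Return indices of the top-k sentences by a hypothetical weighted lower/upper score."""
    lower, upper = fuzzy_rough_approximations(similarity_relation(features), q)
    scores = [weight * lo + (1.0 - weight) * up for lo, up in zip(lower, upper)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Three hypothetical sentences with two normalized features each, and a
    # hypothetical query-relatedness membership for each sentence.
    feats = [[0.9, 0.8], [0.85, 0.75], [0.1, 0.2]]
    q = [0.9, 0.7, 0.1]
    print(rank_sentences(feats, q, k=2))
```

R is reflexive, so a sentence's lower membership never exceeds μ_Q and its upper membership is never below it. The two approximations therefore bracket the query-related content, and a sentence scores highly only when similar sentences are also query-related.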
    Experimental results are presented for the proposed approaches to these two applications. The first approach is shown to be effective by its success rate in identifying the English translations of the corresponding Chinese documents in a collection of Chinese-English parallel documents. The approach can also be extended easily to documents in other languages. The proposed representations, together with the similarity measures, are expected to improve other text mining processes that rely on document similarity. The second approach is evaluated on the DUC 2006 and DUC 2007 query-oriented multi-document summarization tasks. The results are promising and show the effectiveness of fuzzy set and rough set theories for automatic text summarization.
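The following Python sketch shows how a translation-identification success rate of this kind can be computed. It assumes that each Chinese document is paired with the English document of highest similarity and that the success rate is the fraction of correct pairings. The fuzzy Jaccard similarity (sum of min over sum of max memberships) is only a stand-in. The thesis's own measures are the Tversky-based and F1-based ones described earlier, and the document representations in the example are hypothetical.

```python
from typing import Callable, Dict

Doc = Dict[str, float]  # hypothetical fuzzy representation: sense id -> membership degree


def fuzzy_jaccard(a: Doc, b: Doc) -> float:
    """Stand-in similarity: sum of min memberships over sum of max memberships."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def matching_success_rate(source: Dict[str, Doc], target: Dict[str, Doc],
                          gold: Dict[str, str],
                          similarity: Callable[[Doc, Doc], float] = fuzzy_jaccard) -> float:
    """Fraction of source documents whose most similar target document is their gold translation."""
    hits = 0
    for sid, sdoc in source.items():
        # Pick the target document with the highest similarity to this source document.
        best = max(target, key=lambda tid: similarity(sdoc, target[tid]))
        if best == gold[sid]:
            hits += 1
    return hits / len(source)


if __name__ == "__main__":
    zh = {"z1": {"s1": 0.9, "s2": 0.4}, "z2": {"s3": 0.8}}
    en = {"e1": {"s1": 0.7, "s2": 0.5}, "e2": {"s3": 0.6, "s4": 0.2}}
    print(matching_success_rate(zh, en, {"z1": "e1", "z2": "e2"}))
```

Any of the fuzzy similarity functions can be passed in as `similarity`, so one evaluation loop can compare several representations and measures.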

Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
  1.1. Background Knowledge of Text Mining
    1.1.1. What is Text Mining?
    1.1.2. Applications of Text Mining
    1.1.3. Approaches to Text Mining
  1.2. Motivation and Contribution
  1.3. Organization of This Thesis
Chapter 2. The Use of Fuzzy and Rough Set Theories in Text Mining
  2.1. Theoretical Background
    2.1.1. Fuzzy Set Theory
    2.1.2. Rough Set Theory
    2.1.3. Combination of Fuzzy Set and Rough Set
  2.2. The Use of Fuzzy Set Theory in Text Mining
    2.2.1. Fuzzy Sets in Text Classification/Categorization
    2.2.2. Fuzzy Sets in Text Clustering
    2.2.3. Fuzzy Sets in Information Extraction
  2.3. The Use of Rough Set Theory in Text Mining
    2.3.1. Rough Sets in Text Classification/Categorization
    2.3.2. Rough Sets in Text Clustering
    2.3.3. Rough Sets in Information Extraction
  2.4. Combining Fuzzy Set and Rough Set in Text Mining
  2.5. Summary
Chapter 3. Word Sense Disambiguation for Concept Retrieval
  3.1. Overview
  3.2. Identification of Word Senses
    3.2.1. Text Preprocessing
    3.2.2. Disambiguation Approaches
  3.3. Knowledge-based Disambiguation
    3.3.1. Knowledge Resources
    3.3.2. Measure of Semantic Relatedness
    3.3.3. ROUGE for Relatedness Measure
  3.4. Proposed ACO-based Method
    3.4.1. Sense Coherence through Combinatorial Optimization
    3.4.2. Generalized Traveling Salesman Problem
    3.4.3. Ant Colony Optimization
    3.4.4. The Method
  3.5. Experimental Evaluation
  3.6. Summary
Chapter 4. Cross-lingual Document Representation and Semantic Similarity Measure
  4.1. Overview
  4.2. Related Work
    4.2.1. Document Representation
    4.2.2. Similarity Measure between Documents
  4.3. Proposed Document Representations
    4.3.1. Sense-level Document Representation
    4.3.2. Macrosense-level Document Representation
  4.4. Similarity Measures
    4.4.1. Semantic Document Similarity Measure
    4.4.2. Definition of Similarity Measure
    4.4.3. Note on the Similarity Measures
  4.5. Experimental Evaluation
    4.5.1. Dataset
    4.5.2. Experiment
    4.5.3. Results and Discussion
  4.6. Summary
Chapter 5. Sentence Relevance Measure for Query-oriented Multi-document Extractive Summarization
  5.1. Overview
  5.2. Related Work
  5.3. System Overview
  5.4. Sentence Feature Vector Representation
    5.4.1. Linguistic Processing
    5.4.2. Word Sense Disambiguation
    5.4.3. Sense-based Fuzzy Sets for Sentences and the Topic Statement
    5.4.4. Generation of Sentence Feature Vector Representation
  5.5. Proposed Sentence Relevance Measure
    5.5.1. Fuzzy Approximation Space Construction
    5.5.2. Fuzzy-Rough Hybrid based Sentence Relevance Measure
  5.6. Query-oriented Multi-document Summary Generation
  5.7. Experimental Evaluation
    5.7.1. Data Set and Evaluation Metric
    5.7.2. Results and Discussion
  5.8. Summary
Chapter 6. Conclusions and Future Work
Bibliography
Vita

    [AE2006] E. Agirre and P. Edmonds, Eds., Word Sense Disambiguation Algorithms and Applications. the Netherlands, Springer, 2006.
    [ADW1998] C. Apte, F. Damerau and S. M. Weiss, “Text Mining with Decision Trees and Decision Rules,” in Proc. Conf. on Automated Learning and Discovery, CMU, Jun. 1998.
    [AM2001] E. Agirre and D. Martinez, “Learning class-to-class selectional preferences,” in Proc. 5th ACL/EACL Workshop on Computational Natural Language Learning (CoNLL ’01), Toulouse, France, 2001, pp. 6–7.
    [AR1996] E. Agirre and G. Rigau, “Word sense disambiguation using conceptual density,” in Proc. 16th Int’l. Conf. on Computational Linguistics (COLING ’96), Copenhagen, Denmark, 1996, pp.16–22.
    [AWASK2002] G. Akrivas, M. Wallace, G. Andreou, G. Stamou and S. Kollias, "Context-Sensitive Semantic Query Expansion," in Proc. IEEE Int’l. Conference on Art. Intelligence Systems (ICAIS ’02), 2002, pp. 109–114.
    [BC2008] B. W. Bader and P. A. Chew, “Enhancing multilingual latent semantic analysis with term alignment information,” in Proc. 22nd Int’l. Conf. on Computational Linguistics (COLING ’08), Manchester, UK: Coling 2008 Organizing Committee, Aug. 2008, pp. 49–56. [Online]. Available: http://www.aclweb.org/anthology-new/C/C08/C08-1007.pdf
    [BE1997] R. Barzilay and M. Elhadad, “Using lexical chains for text summarization,” in Proc. ACL Workshop on Intelligent Scalable Text Summarization (ACL ’97/EACL ’97), Madrid, Spain, 1997, pp. 10–17.
    [Bez1981] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Norwell, MA, USA: Kluwer Academic Publishers, 1981.
    [BM1998] L. D. Baker and A. K. McCallum, “Distributional clustering of words for text classification,” in Proc. 21st Annual Int’l. ACM SIGIR Conf. on Research and Development in Inform. Retrieval (SIGIR ’98), New York, NY, USA: ACM, 1998, pp. 96–103.
    [BM2002] A. Behzad and M. Modarres, “A new efficient transformation of the generalized traveling salesman problem into traveling salesman problem,” in Proc. 15th Int’l. Conf. of Systems Engineering (ICSE ’02), Las Vegas, Nevada, Aug. 2002, pp. 6–8.
    [BM2006] S. Blair-Goldensohn and K. McKeown, “Integrating rhetorical-semantic relation models for query-focused summarization,” in Proc. 6th Document Understanding Conf. (DUC ’06), New York City, NY, USA, 2006.
    [BMA2006] Y. Bi, S. McClean and T. Anderson, “Combining rough decisions for intelligent text mining using dempster’s rule,” Artificial Intelligence Review, vol. 26, no. 3, pp.191–209, 2006.
    [BMP1997] R. Basili, D. Michelangelo and M. Pazienza, “Towards a bootstrapping framework for corpus semantic tagging,” in Proc. of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What and How?, Washington, D.C. USA, Apr. 1997, pp.66–73.
    [BP2003] S. Banerjee, T. Pedersen, “Extended gloss overlaps as a measure of semantic relatedness,” in Proc. 18th Int’l. Joint Conf. on Art. Intelligence, Acapilco, 2003, pp. 805–810
    [BSA1993] C. Buckley, G. Salton and J. Allan, “The smart information retrieval project,” in Proc. Workshop on Human Language Technology (HLT ’93), Morristown, NJ, USA: Association for Computational Linguistics, 1993, pp. 392–392.
    [Bue1982] D. A. Buell, “An analysis of some fuzzy subset applications to information retrieval systems,” Fuzzy Sets Syst., vol. 7, no. 1, pp. 35–42, 1982.
    [CD2008] H. Chim and X. Deng, “Efficient phrase-based document similarity for clustering,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 9, pp. 1217–1229, 2008.
    [CED1993] Chinese Electronic Dictionary, Tech. Rep. 93-05, Academia Sinica, Taiwan, 1993.
    [CGG1992] J. Cowie, J. Guthrie and L. Guthrie, “Lexical disambiguation using simulated annealing,” in Proc. 15th Int’l. Conf. on Computational Linguistics (COLING ’92), 1992, pp. 359–365.
    [CGI2005] J. Cardeńosa, C. Gallardo and L. Iraola, “Using an interlingua for document knowledge representation,” in Proc. 4th Conf. of the European Society for Fuzzy Logic and Technology, Barcelona, 2005, pp. 1231–1236.
    [Cho2008] J. Y. Choi, “Clustering analysis of collaborative tagging systems by using the graph model,” Indiana University, Bloomington, IN, Tech. Rep., 2008. [Online]. Available: http://grids.ucs.indiana.edu/ptliupages/publications/CA.pdf
    [Cilin1983] J. Mei, Y. Zhu, Y. Gao and H. Yin, Eds., Tongyici Cilin. Shanghai: Shangwu Press and Shanghai Dictionaries, 1983.
    [CKIP1999] CKIP. 1999.Ver. 1.0 (Autotag), Academia Sinica, Taiwan. [Online].Available: http://godel.iis.sinica.edu.tw/CKIP/engversion/index.htm.
    [CL2006] J. Clarke and M. Lapata, “Models for sentence compression: a comparison across domains, training requirements and evaluation measures,” in Proc. 21st Int’l. Conf. on Computational Linguistics and the 44th Annual Meeting of the ACL (ACL ’06), Morristown, NJ, USA: Association for Computational Linguistics, 2006, pp. 377–384.
    [CLL2002] H. H. Chen, C. C. Lin and W. C. Lin, “Building a chinese-english wordnet for translingual applications,” ACM Trans. Asian Lang. Inf. Process., vol. 1, no. 2, pp. 103–122, Jun. 2002.
    [Cro2006] V. Cross, “Tversky’s parameterized similarity ratio model: A basis for semantic relatedness,” in Proc. 25th North American Fuzzy Inform. Processing Society Annual Conf. (NAFIPS ’06), Jun. 2006, pp. 541–546.
    [CWN2002] “The academia sinica bilingual ontological wordnet (Sinica BOW).” [Online]. Available: http://BOW.sinica.edu.tw/wn
    [CYH1995] S. M. Chen, M. S. Yeh and P. Y. Hsiao, “A comparison of similarity measures of fuzzy values,” Fuzzy Sets Syst., vol. 72, no. 1, pp. 79–89, 1995.
    [DDFLH1990] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, “Indexing by latent semantic analysis,” J. of the American Society for Inform. Sci., vol. 41, no. 6, pp. 391–407, 1990.
    [DH1995] J. W. Davenport and R. J. Hathaway, “Possibilistic c-means clustering for relational data,” in Proc. 1st Int’l. Conf. on Neural, Parallel & Scientific Computations, vol. 1. Dynamic Publishers Inc., USA, 1995, pp. 139–142.
    [DP1992] D. Dubois and H. Prade, “Putting rough sets and fuzzy sets together,” in Intelligent Decision Support: Handbook of Applications and Advances in Rough Sets Theory, R. Slowinski, Ed., Boston: Kluwer Academic Publishers, 1992, pp. 203–232.
    [DS2002] R. N. Dave and S. Sen, “Robust fuzzy clustering of relational data,” IEEE Trans. Fuzzy Syst., vol. 10, no. 6, pp. 713–727, 2002.
    [DS2007] M. Dorigo and K. Socha, “An introduction to ant colony optimization,” IRIDIA–Tech. Rep. Series, Université Libre de Bruxelles, Belgium, Apr. 2007.
    [EHN2007] K. J. Chen, S. L. Huang, Y. Y. Shih and Y. J. Chen, “E-HowNet.” [Online]. Available: http://ckip.iis.sinica.edu.tw/taxonomy/taxonomyedoc.htm
    [EMR2000] G. Escudero, L. Màrquez and G. Rigau, “Naïve bayes and exemplar-based approaches to word sense disambiguation revisited,” in Proc. 14th European Conf. on Art. Intelligence (ECAI ’00), Berlin, Germany, 2000, pp. 421–425.
    [ER2004] G., Erkan and D. Radev, “The university of michigan at DUC 2004,” in Proc. 4th Document Understanding Conf. (DUC ’04), Boston, MA, USA, 2004, pp. 120–127
    [Fel1998] C. Fellbaum, Ed., WordNet – An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
    [FS2006] R. Feldman and J Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. New York, Cambridge University Press, 2006.
    [Gru1995] T. R. Gruber, “Toward principles for the design of ontologies used for knowledge sharing,” Int’l. J. Human-Comput. Studies, vol. 43, no. 5-6, p.907-928, Nov. 1995.
    [GM2003] M. Galley and K. Mckeown, “Improving word sense disambiguation in lexical chaining,” in Proc. 18th Int’l. Joint Conf. on Art. Intelligence (IJCAI ’03), Acapulco, Mexico, 2003, pp.1486–1488.
    [GSH2003] A. Gelburkh, G. Sidorov and S, Han, “Evolutionary approach to natural language word sense disambiguation through global coherence optimization,” WSEAS Trans. on Communications, vol. 1, no. 2, pp. 11–19, 2003.
    [HB1994] R. J. Hathaway and J. C. Bezdek, “NERF c-means: Non-euclidean relational fuzzy clustering,” Pattern Recognit., vol. 27, no. 3, pp. 429–437, 1994.
    [Hea1999] M. A. Hearst “Untangling Text Data Mining,” in Proc. 37th Annual Meeting of the Association for Computational Linguistics (ACL ‘99), University of Maryland, Jun. 20-26, 1999, pp. 3–10.
    [HK2004] K. M. Hammouda and M. S. Kamel, “Efficient phrase-based document indexing for web document clustering,” IEEE Trans. Knowl. Data Eng., vol. 16, no. 10, pp. 1279–1296, 2004.
    [HKY2006] H. H. Huang, Y. H. Kuo and H. C. Yang, “Fuzzy-rough set aided sentence extraction summarization,” in Proc. 1st Int’l. Conf. on Innovative Computing, Inform. and Control (ICICIC ’06), Beijing, China, 2006. pp. 450–453.
    [HM2000] U. Hahn and I. Mani, “The challenges of automatic summarization,” IEEE Comput., vol. 33, no. 11, pp. 29–36, 2000.
    [HN1999] Z. Dong and Q. Dong, “HowNet.” [Online]. Available: http://www.keenage.com/
    [HN2000] P. Hawkins and D. Nettleton, “Large scale WSD using learning applied to senseval,” Comput. Human., vol. 34, no. 1-2, pp.135–140, 2000.
    [HS1998] G. Hirst and D. St-onge, “Lexical chains as representations of context for the detection and correction of malapropisms,” in WordNet: An Electronic Lexical Database, C. Fellbaum, Ed., Cambridge, MA: MIT Press, 1998, pp. 305–332.
    [HSCL2002] C. Haruechaiyasak, M. Shyu, S. Chen and X. Li, “Web document classification based on fuzzy association,” in Proc. 26th Int’l. Comput. Software and Applications Conf. on Prolonging Software Life: Development and Redevelopment, Washington, DC, 2002, pp. 487-492.
    [IV1993] N. Ide and J. Véronis, “Knowledge extraction from machine readable dictionaries: An evaluation,” in Proc. 3rd Int’l. EAMT Workshop on Machine Translation and the Lexicon, Heidelberg, Germany, Apr. 1993, pp.19–34.
    [IV1998] N. Ide and J. Véronis, “Word sense disambiguation: The state of the art,” Computational Linguistics, vol. 24, no. 1, pp 1–40, 1998.
    [JC1997] J. J. Jiang and D. W. Conrath, “Semantic similarity based on corpus statistics and lexical taxonomy,” in Proc. 10th Int’l. Conf. on Research Computational Linguistics, Taiwan: Academia Sinica, 1997, pp.19–33.
    [JL1998] J. Johnson and M. Liu, “Rough sets for informative question answering,” in Proc. Int’l. Conf. on Comput. and Inform. (ICCI ’98), Winnipeg, Canada, Jun. 17-20, 1998, pp. 53–60.
    [JK2005] Z. C. Johanyák and S. Kovács, “Distance based similarity measures of fuzzy sets,” in Proc. 3rd Slovakian-Hungarian Joint Symp. on Applied Machine Intelligence (SAMI ’05), Herľany, Slovakia, Jan. 2005, pp. 265–276.
    [Joa1998] T. Joachims, “Text categorization with support vector machines: Learning with many relevant features,” in Proc. 10th European Conf. on Machine Learning (ECML ’98), 1998, pp. 137–142.
    [Jon2007] K. S. Jones, “Automatic summarising: The state of the art,” Inf. Process. Manage., vol. 43, pp. 1449–1481, 2007.
    [JS2004a] R. Jensen and Q. Shen, "Fuzzy-rough attribute reduction with application to web categorization," Fuzzy Sets Syst., vol. 141, no. 3, pp. 469-485, 2004.
    [JS2004] R. Jensen and Q. Shen, “Semantic-preserving dimensionality reduction: Rough and fuzzy-rough-based approaches,” IEEE Trans. Knowl. Data Eng., vol. 16, no. 12, pp. 1457–1471, 2004.
    [JS2007] W. Jin and R. K. Srihari, “Graph-based text representation and knowledge discovery,” in Proc. 22nd ACM Symp. on Applied Computing (SAC ’07), New York, NY, USA: ACM, 2007, pp. 807–811.
    [KK1993] R. Krishnapuram and J. M. Keller, “A possibilistic approach to clustering,” IEEE Trans. Fuzzy Syst., vol. 1, no. 2, pp. 98–110, 1993.
    [KM2002] K. Knight and D. Marcu, “Summarization beyond sentence extraction: A probabilistic approach to sentence compression,” Art. Intelligence, vol. 139, no. 1, pp. 91–107, 2002.
    [KNH2000] S. Kawasaki, N. B. Nguyen and T. B. Ho, “Hierarchical document clustering based on tolerance rough set model,” in Proc. 4th European Conf. on Principles of Data Mining and Knowledge Discovery (PKDD ’00), London, UK, 2000, pp. 458–463.
    [KPC1995] J. Kupiec, J. Pedersen and F. Chen, “A trainable document summarizer,” in Proc. 18th Annual Int’l. ACM SIGIR Conf. on Research and Development in Inform. Retrieval (SIGIR ’95), Seattle, WA, USA, 1995, pp. 68–73.
    [LC1998] C. Leacoak and M. Chodorow, “Combining local context and wordnet similarity for word sense identification,” in C. Fellbaum, Ed., WordNet, An Electronic Lexical Database, MIT Press, 1998, pp. 305-332.
    [LDL1998] M. L. Littman, S. T. Dumais and T. K. Landauer, “Automatic cross-language information retrieval using latent semantic indexing,” in Cross-Language Inform. Retrieval, chapter 5, Kluwer Academic Publishers, 1998, pp. 51–62.
    [Les1986] M. Lesk, “Automated sense disambiguation using machine-readable dictionaries: How to tell a pine cone from an ice cream cone,” in Proc. 5th Int’l. Conf. on Systems Documentation (SIGDOC ’86), Toronto, Canada, 1986, pp.24–26.
    [LH2003] C. Lin and E. Hovy, “Automatic evaluation of summaries using n-gram co-occurrence statistics,” in Proc. the Human Language Technology Conference 2003 (HLT-NAACL ’03), Edmonton, Canada, 2003, pp. 71–78.
    [Lin1992] T. Y. Lin, “Topological and fuzzy rough sets,” in Intelligent Decision Support: Handbook of Applications and Advances in Rough Sets Theory, R. Slowinski, Ed., Boston: Kluwer Academic Publishers, 1992, pp. 287–304.
    [Lin2004] C. Lin, “Rouge: A package for automatic evaluation of summaries,” in Proc. workshop on Text Summarization Branches Out (WAS ’ 04), Barcelona, Spain, 2004, pp. 74–81.
    [LJ2007] P. Lingras and R. Jensen, “Survey of rough and fuzzy hybridization,” in Proc. 16th Int’l. Conf. on Fuzzy Syst. (FUZZ-IEEE ’07), 2007, pp. 125–130.
    [LJH2005] C. S. Lee, Z. W. Jian and L. K. Huang, “A fuzzy ontology and its application to news summarization,” IEEE Trans. Syst., Man, Cybern. B, vol. 35, no. 5, pp. 859–880, Oct. 2005.
    [LK1997] D. H. Lee and M. H. Kim, “Database summarization using fuzzy isa hierarchies,” IEEE Trans. Syst., Man, Cybern. B, vol. 27, no. 4, pp. 671–680, Aug. 1997.
    [LL1991] T. K. Landauer and M. L. Littman, “A statistical method for language-independent representation of the topical content of text segments,” in Proc. 11th Int’l. Conf. Expert Systems and Their Applications, vol. 8, Avignon, France, May 1991, pp. 77–85.
    [LMG2005] J. Leskovec, N. Milic-Frayling and M. Grobelnik, “Impact of linguistic analysis on the semantic graph coverage and learning of document extracts,” in Proc. National Conf. on Art. Intelligence (AAAI ’05), Pittsburgh, USA, Jul. 2005, pp. 1069–1074.
    [LSL1994] H. Lee-Kwang, Y. S. Song and K. M. Lee, “Similarity measure between fuzzy sets and between elements,” Fuzzy Sets Syst., vol. 62, no. 3, pp. 291–293, 1994.
    [LSM1995] X. Li, S. Szpakowicz and S. Matwin, “A WordNet-based algorithm for word sense disambiguation,” in Proc. 14th Int’l. Joint Conf. on Art. Intelligence (IJCAI ’95), Montréal, Canada, 1995, pp. 1368–1374.
    [LSPL2006] Y. Li, S. C. K. Shiu, S. K. Pal and J. N. K. Liu, “A rough set-based case-based reasoner for text categorization,” Int’l. J. Approx. Reason., vol. 41, no. 2, pp. 229–255, 2006.
    [Luh1958] H. P. Luhn, “The automatic creation of literature abstracts,” IBM J. of Research and Development, vol. 2, no. 2, pp. 159–165, 1958.
    [Luk1995] A. Luk, “Statistical sense disambiguation with relatively small corpora using dictionary definitions,” in Proc. 33rd Meeting of the Association for Computational Linguistics (ACL ’95), Cambridge, MA, 1995, pp. 181–188.
    [MBFGM1990] G. Miller, R. Beckwith, C. Fellbaum, D. Gross and K. Miller, “WordNet: An on-line lexical database,” Int’l. J. of Lexicography, vol. 3, no. 4, pp. 235–244, 1990.
    [MC2003] D. McCarthy and J. Carroll, “Disambiguating nouns, verbs and adjectives using automatically acquired selectional preferences,” Computat. Ling., vol. 29, no. 4, pp. 639–654, 2003.
    [Mih2002] R. Mihalcea, “Bootstrapping large sense tagged corpora,” in Proc. 3rd Int’l. Conf. on Language Resources and Evaluation (LREC ’02), Las Palmas, Spain, May 2002, pp. 1407–1411.
    [Miy1989] S. Miyamoto, “Two approaches for information retrieval through fuzzy associations,” IEEE Trans. Syst. Man Cybernet., vol. 19, no. 1, pp. 123–130, 1989.
    [Miy1990] S. Miyamoto, Fuzzy sets in information retrieval and cluster analysis. Dordrecht, Boston, USA: Kluwer Academic Publishers, 1990.
    [MLG2000] M. Montes-y-Gómez, A. López-López and A. F. Gelbukh, “Information retrieval with conceptual graph matching,” in Proc. 11th Int’l. Conf. on Database and Expert Systems Applications (DEXA ’00), London, UK: Springer-Verlag, 2000, pp. 312–321.
    [MLW1992] B. Masand, G. Linoff and D. Waltz, “Classifying news stories using memory based reasoning,” in Proc. Int’l. ACM SIGIR Conf., Jun. 1992, pp. 59–65.
    [MM1999] R. Mihalcea and D. Moldovan, “A method for word sense disambiguation of unrestricted text,” in Proc. 37th Annual Meeting of the Association for Computational Linguistics (ACL ’99), College Park, MD, USA, Jun. 1999, pp. 152–158.
    [MM2007] M. A. Murad and T. Martin, “Similarity-based estimation for document summarization using fuzzy sets,” Int’l. J. Comput. Sci. and Security, vol. 1, no. 4, pp. 1–12, 2007.
    [MMCF2006] S. Montalvo, R. Martínez, A. Casillas and V. Fresno, “Multilingual document clustering: An heuristic approach based on cognate named entities,” in Proc. 21st Int’l. Conf. on Computational Linguistics and the 44th Annual Meeting of the ACL (ACL ’06), Morristown, NJ, USA: Association for Computational Linguistics, 2006, pp. 1145–1152.
    [MMCF2007] S. Montalvo, R. Martínez, A. Casillas and V. Fresno, “Bilingual news clustering using named entities and fuzzy similarity,” in Text, Speech and Dialogue, ser. Lecture Notes in Comput. Sci., Springer Berlin / Heidelberg, 2007, vol. 4629/2007, pp. 107–114.
    [MN1995] A. T. McCray and S. J. Nelson, “The representation of meaning in the UMLS,” Meth. Inform. Med., vol. 34, pp. 193–201, 1995.
    [MS1999] C. D. Manning and H. Schütze, Foundations of statistical natural language processing. Cambridge, MA: MIT Press, 1999, pp. 230–261.
    [Nav2009] R. Navigli, “Word sense disambiguation: A survey,” ACM Comput. Surv., vol. 41, no. 2, pp. 1–69, 2009.
    [NV2005] R. Navigli and P. Velardi, “Structural semantic interconnections: A knowledge-based approach to word sense disambiguation,” IEEE Trans. Patt. Anal. Mach. Intell., vol. 27, no. 7, pp. 1075–1088, 2005.
    [Ng1997] H. T. Ng, “Getting serious about word sense disambiguation,” in Proc. of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What and How?, Washington, D.C. USA, Apr. 1997, pp. 1–7.
    [OMK1991] Y. Ogawa, T. Morita and K. Kobayashi, “A fuzzy document retrieval system using the keyword connection matrix and a learning method,” Fuzzy Sets Syst., vol. 39, no. 2, pp. 163–179, 1991.
    [Pat2003] S. Patwardhan, “Incorporating dictionary and corpus information into a context vector measure of semantic relatedness,” Master’s thesis, University of Minnesota, Duluth, 2003.
    [Paw1982] Z. Pawlak, “Rough sets,” Int’l. J. Inform. and Comput. Sci., vol. 11, no. 5, pp. 341–356, 1982.
    [Paw1991] Z. Pawlak, Rough sets - Theoretical aspects of reasoning about data. Dordrecht, Boston, USA: Kluwer Academic Publishers, 1991.
    [PBP2005] T. Pedersen, S. Banerjee and S. Patwardhan, “Maximizing semantic relatedness to perform word sense disambiguation,” Supercomputing Institute, University of Minnesota, Tech. Rep. UMSI 2005/25, Mar. 2005.
    [PHP2005] A. Philpot, E. Hovy and P. Pantel, “The Omega Ontology,” in Proc. IJCNLP Workshop on Ontologies and Lexical Resources, OntoLex, Jeju Island, South Korea, 2005, pp. 59–66.
    [PNL2002] A. Pease, I. Niles and J. Li, “The suggested upper merged ontology: A large ontology for the semantic web and its applications,” in Proc. AAAI-2002 Workshop on Ontologies and the Semantic Web, Edmonton, Alta., Canada, 2002.
    [PRV2007] P. Pingali, K. Rahul and V. Varma, “IIIT Hyderabad at DUC 2007,” in Proc. 7th Document Understanding Conf. (DUC ’07), Rochester, NY, USA: NIST, 2007.
    [PSI2003] B. Pouliquen, R. Steinberger and C. Ignat, “Automatic identification of document translations in large multilingual document collections,” in Proc. 4th Int’l. Conf. on Recent Advances in Natural Language Processing (RANLP ’03), 2003, pp. 401–408.
    [Puc2006] M. Pucher, “Performance evaluation of WordNet-based semantic relatedness measures for word prediction in conversational speech,” in Proc. 6th Int’l. Workshop on Computational Semantics (IWCS ’06), Tilburg, Netherlands, Jan. 2006, pp. 332–342.
    [Qui1986] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
    [Res1997] P. Resnik, “Selectional preference and sense disambiguation,” in Proc. ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How?, Washington, D.C., 1997, pp. 52–57.
    [Res1999] P. Resnik, “Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language,” J. of Art. Intelligence Research, vol. 11, pp. 95–130, 1999.
    [RJ1998] S. E. Robertson and K. S. Jones, Relevance weighting of search terms. London, UK: Taylor Graham Publishing, 1988, pp. 143–160.
    [RK2002] A. M. Radzikowska and E. E. Kerre, “A comparative study of fuzzy rough sets,” Fuzzy Sets Syst., vol. 126, no. 2, pp. 137–155, 2002.
    [RMBB1989] R. Rada, H. Mili, E. Bicknell and M. Blettner, “Development and application of a metric on semantic nets,” IEEE Trans. Syst. Man Cybernet., vol. 19, no. 1, pp. 17–30, 1989.
    [Rog1911] P. M. Roget, Roget’s International Thesaurus. 1st ed. Cromwell, New York, NY, 1911.
    [Sch1998] H. Schütze, “Automatic word sense discrimination,” Computational Linguistics, vol. 24, no. 1, pp. 97–123, 1998.
    [Seb2002] F. Sebastiani, “Machine learning in automated text categorization,” ACM Comput. Surv., vol. 34, no. 1, pp. 1–47, 2002.
    [SEKA2006] Y. Seki, K. Eguchi, N. Kando and M. Aono, “Opinion-focused summarization and its analysis at DUC 2006,” in Proc. 6th Document Understanding Conf. (DUC ’06), New York City, NY, USA, 2006, pp. 122–130.
    [SLBK2003] A. Schenker, M. Last, H. Bunke and A. Kandel, “Classification of web documents using a graph model,” in Proc. 7th Int’l. Conf. on Document Analysis and Recognition (ICDAR ’03), Washington, DC, USA: IEEE Comput. Society, 2003, pp. 240–244.
    [SPH2002] R. Steinberger, B. Pouliquen and J. Hagman, “Cross-lingual document similarity calculation using the multilingual thesaurus EUROVOC,” in Proc. 3rd Int’l. Conf. on Computational Linguistics and Intelligent Text Processing (CICLing ’02), London, UK: Springer-Verlag, 2002, pp. 415–424.
    [SRJ2009] S. Singh, S. K. Ray and B. P. Joshi, “Rough set based concept extraction paradigm for document ranking,” in Proc. 6th Atlantic Web Intelligence Conf. (AWIC ’09), V. Snášel et al., Eds., ser. Advances in Intelligent and Soft Computing, vol. 67, Prague, Czech Republic, Berlin/Heidelberg: Springer, 2009.
    [SRKC2001] P. Srinivasan, M. E. Ruiz, D. H. Kraft and J. Chen, “Vocabulary mining for information retrieval: Rough sets and fuzzy sets,” Inf. Process. Manage., vol. 37, no. 1, pp. 15–38, 2001.
    [STA2008] R. Saraçoğlu, K. Tütüncü and N. Allahverdi, “A new approach on search for similar documents with multiple categories using fuzzy clustering,” Expert Syst. Appl., vol. 34, no. 4, pp. 2545–2554, 2008.
    [Sto2005] C. Stokoe, “Differentiating homonymy and polysemy in information retrieval,” in Proc. joint Conf. on Human Language Technology and Empirical methods in Natural Language Processing (HLT/EMNLP ’05), Vancouver, Canada, 2005, pp. 403–410.
    [SW2001] M. Stevenson and Y. Wilks, “The interaction of knowledge sources in word sense disambiguation,” Computational Linguistics, vol. 27, no. 3, pp. 321–349, 2001.
    [SWY1975] G. Salton, A. Wong and C. S. Yang, “A vector space model for automatic indexing,” Commun. ACM, vol. 18, no. 11, pp. 613–620, 1975.
    [TBGJSV2007] K. Toutanova, C. Brockett, M. Gamon, J. Jagarlamudi, H. Suzuki and L. Vanderwende, “The PYTHY summarization system: Microsoft Research at DUC 2007,” in Proc. 7th Document Understanding Conf. (DUC ’07), Rochester, NY, USA: NIST, 2007.
    [TH1993] K. Tzeras and S. Hartmann, “Automatic indexing based on Bayesian inference networks,” in Proc. 16th Annual ACM/SIGIR Conf. on Research and Development in Inform. Retrieval, Pittsburgh, USA, Jun. 27–Jul. 1, 1993, pp. 22–34.
    [TK2007] V. S. Tseng and C. P. Kao, “A novel similarity-based fuzzy clustering algorithm by integrating PCM and mountain method,” IEEE Trans. Fuzzy Syst., vol. 15, no. 6, pp. 1188–1196, Dec. 2007.
    [Tve1977] A. Tversky, “Features of similarity,” Psychological Review, vol. 84, no. 4, pp. 327–352, 1977.
    [UMY2004] H. Uchida, A. Mano and T. Yukawa, “Patent map generation using concept-based vector space model,” in Working Notes, 4th NTCIR Workshop Meeting, Tokyo, Japan: National Institute of Informatics, 2004.
    [Wan1992] G. J. Wang, “Theory of topological molecular lattices,” Fuzzy Sets Syst., vol. 47, no. 3, pp. 351–376, 1992.
    [Wan2007] X. Wan, “A novel document similarity measure based on earth mover’s distance,” Inf. Sci., vol. 177, no. 18, pp. 3718–3730, Sept. 2007.
    [WB2003] R. Witte and S. Bergler, “Fuzzy coreference resolution for summarization,” in Proc. 2003 Int’l. Symposium on Reference Resolution and Its Applications to Question Answering and Summarization (ARQAS ’03), Venice, Italy, Jun. 23–24, 2003, pp. 43–50.
    [WB2007] R. Witte and S. Bergler, “Fuzzy clustering for topic analysis and summarization of document collections,” in Proc. 20th Canadian Conf. on Art. Intelligence (Canadian A.I. ’07), LNAI 4509, Z. Kobti and D. Wu, Eds., Montréal, Québec, Canada: Springer, 2007, pp. 476–488.
    [WC2007] T. Wang and H. Chiang, “Fuzzy support vector machine for multi-class text categorization,” Inf. Process. Manage., vol. 43, no. 4, pp. 914–929, 2007.
    [Wea1949] W. Weaver, “Translation,” in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., Cambridge, MA: MIT Press, 1949, pp. 15–23.
    [WFGMPS1990] Y. Wilks, D. Fass, C. Guo, J. McDonald, T. Plate and B. Slator, “Providing machine tractable dictionary tools,” Machine Translation, vol. 5, no. 2, pp. 99–154, 1990.
    [Wit2005] I. H. Witten, “Text mining,” in Practical handbook of internet computing, M. P. Singh, Ed., Boca Raton, Florida: Chapman & Hall/CRC Press, 2005, pp. 14-1–14-22.
    [WPW1995] E. Wiener, J. Pedersen and A. Weigend, “A neural network approach to topic spotting,” in Proc. 4th Annual Symposium on Document Analysis and Information Retrieval (SDAIR ’95), 1995, pp. 317–332.
    [WS1996] Y. Wilks and M. Stevenson, “The grammar of sense: Is word sense tagging much more than part-of-speech tagging?,” Tech. Rep. CS-96-05, University of Sheffield, Sheffield, United Kingdom, 1996.
    [WY2000] D. H. Widyantoro and J. Yen, “A fuzzy similarity approach in text classification task,” in Proc. 9th IEEE Int’l. Conf. Fuzzy Syst., 2000, vol. 2, pp. 653–658.
    [WYX2008] X. Wan, J. Yang and J. Xiao, “Towards a unified approach to document similarity search using manifold-ranking of blocks,” Inf. Process. Manage., vol. 44, no. 3, pp. 1032–1048, 2008.
    [Yao1997] Y. Y. Yao, “Combination of rough and fuzzy sets based on α-level sets,” in Rough Sets and Data Mining: Analysis of Imprecise Data, T. Lin and N. Cercone, Eds., Boston: Kluwer Academic Publishers, 1997, pp. 301–321.
    [Yar1992] D. Yarowsky, “Word sense disambiguation using statistical models of Roget’s categories trained on large corpora,” in Proc. 14th Int’l. Conf. on Computational Linguistics (COLING ’92), Nantes, France, 1992, pp. 454–460.
    [Zad1965] L. Zadeh, “Fuzzy sets,” Inform. and Control, vol. 8, no. 3, pp. 338–353, 1965.
    [ZCB1987] R. Zwick, E. Carlstein and D. Budescu, “Measures of similarity among fuzzy sets: a comparative analysis,” Int’l. J. Approx. Reason., vol. 1, pp. 221–242, 1987.

    Full-text release: on campus from 2013-01-21; off campus from 2013-01-21.