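| Graduate student: | 塗宜昆 Tu, Yi-Kun |
|---|---|
| Thesis title: | 以單類支持向量機為基礎之階層式文件分類 Hierarchical Text Categorization Using One-Class SVM |
| Advisor: | 蔣榮先 Chiang, Jung-Hsing |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2003 |
| Academic year of graduation: | 91 (ROC calendar) |
| Language: | English |
| Number of pages: | 68 |
| Chinese keywords: | 支持向量機 (support vector machine), 單類支持向量機 (one-class support vector machine), 文件分類 (text categorization) |
| Foreign-language keywords: | Text Categorization, SV Clustering, SVM, One-Class SVM |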
With the rapid growth of online information, automatic text categorization has become a key technique for processing and organizing text data. Experience to date shows that performance measures such as precision and recall both decrease as the number of categories increases. A hierarchical category structure makes it possible to handle such large-scale problems.

In this study, we use one-class SVM to perform support vector clustering of documents, and then use the clustering results to construct a category hierarchy that describes the relationships among the categories. Two-class and multi-class SVMs are then trained for supervised classification.

We explore the characteristics of the system built on one-class SVM through three experiments and compare its performance with that of other approaches. The experimental results show that the proposed hierarchical categorization system achieves better performance.
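
The precision and recall measures named in the abstract can be computed per category as in the minimal sketch below. It assumes single-label documents; the function name, the category names, and the toy labels are illustrative assumptions and are not taken from the thesis or its experiments.

```python
from collections import Counter


def per_category_precision_recall(true_labels, predicted_labels):
    """Return {category: (precision, recall)} for single-label predictions."""
    predicted_count = Counter(predicted_labels)  # documents assigned to each category
    actual_count = Counter(true_labels)          # documents that belong to each category
    true_positive = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            true_positive[truth] += 1

    scores = {}
    for category in set(true_labels) | set(predicted_labels):
        tp = true_positive[category]
        # precision: share of documents assigned to the category that belong to it
        precision = tp / predicted_count[category] if predicted_count[category] else 0.0
        # recall: share of documents belonging to the category that were found
        recall = tp / actual_count[category] if actual_count[category] else 0.0
        scores[category] = (precision, recall)
    return scores


# Hypothetical toy data: six documents, three categories.
true_labels = ["grain", "grain", "trade", "trade", "crude", "crude"]
predicted_labels = ["grain", "trade", "trade", "trade", "crude", "grain"]
scores = per_category_precision_recall(true_labels, predicted_labels)
for category in sorted(scores):
    precision, recall = scores[category]
    print(f"{category}: precision={precision:.2f} recall={recall:.2f}")
```

Averaging these per-category values over all categories gives macro-averaged precision and recall, one common way to summarize performance as the number of categories grows.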