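Graduate student: | 陳麒偉 Chen, Chi-Wei |
---|---|
Thesis title: | 考慮個人偏好因素之多重文件分類方法 A Multilabel Text Classification Method with Personal Preference |
Advisor: | 王泰裕 Wang, Tai-Yue |
Degree: | Master |
Department: | College of Management - Institute of Information Management |
Year of publication: | 2008 |
Academic year of graduation: | 96 (ROC calendar) |
Language: | Chinese |
Pages: | 65 |
Keywords (Chinese): | 多標籤 (multi-label), 多類別 (multi-class), 倒傳遞類神經網路 (back-propagation neural network), 個人偏好 (personal preference), 文件分類 (text categorization) |
Keywords (English): | back-propagation neural network (BPN), personal preference, multi-class, text categorization, multi-label |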
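Document classification is growing in importance across many fields, and automated classification is the best solution when large volumes of documents must be processed. Many classification techniques have already been applied to document classification. These studies, however, usually evaluate against a single benchmark and do not consider that the personal preferences of different human classifiers lead to different classification results. Personal preference means that different classifiers tend to categorize the same document differently, a point that general document classification models often overlook. This study therefore investigates whether a classification model that incorporates personal preference can produce different results for different classifiers while still reaching a reasonable level of overall classification effectiveness. Using a back-propagation neural network, the study builds a multilabel document classification model that accounts for personal preference and validates it on the Reuters-21578 news collection. The results show that, once personal preference is taken into account, the model gives the same document different personalized predictions according to each classifier's preference, and that classification effectiveness is acceptable for every classifier.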
Automated text categorization is widely used in many fields and is the best solution for managing large volumes of documents. Many classification techniques have been applied to text categorization. However, these studies do not include personal preference in their classification methods. Classification results depend on personal preference, so different users may not assign the same class to the same document. This has usually been ignored in text categorization so far. The purpose of this study is to determine whether personal preference affects classification results, and to improve classification effectiveness. We use a back-propagation neural network (BPN) to build a preference-based text categorization model. The well-known Reuters-21578 collection is used for the experiments. Experimental results show that the preference-based model is superior to the original one.
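The abstract does not say how personal preference enters the network, so the following is only a minimal sketch under stated assumptions: a one-hidden-layer back-propagation network with one sigmoid output per label, where the user is encoded as a one-hot vector appended to the document's term features. The class name `PreferenceBPN`, the encoding, the hyperparameters, and the toy data are all hypothetical and are not taken from the thesis.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class PreferenceBPN:
    """One-hidden-layer network trained by back-propagation.

    Assumption: input = document term features + one-hot user ID.
    """

    def __init__(self, n_terms, n_users, n_labels, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_users = n_users
        n_in = n_terms + n_users
        # One row of weights per unit; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_labels)]

    def _encode(self, terms, user):
        one_hot = [1.0 if u == user else 0.0 for u in range(self.n_users)]
        return [float(t) for t in terms] + one_hot

    def _forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
               for row in self.w_out]
        return hidden, out

    def train(self, samples, epochs=5000, lr=0.5):
        for _ in range(epochs):
            for terms, user, labels in samples:
                x = self._encode(terms, user)
                hidden, out = self._forward(x)
                # Output error for sigmoid units under cross-entropy loss.
                d_out = [o - t for o, t in zip(out, labels)]
                # Propagate the error back through the hidden layer.
                d_hidden = [
                    h * (1.0 - h) * sum(d_out[k] * self.w_out[k][j]
                                        for k in range(len(d_out)))
                    for j, h in enumerate(hidden)
                ]
                # Gradient-descent weight updates, output layer then hidden.
                for k, row in enumerate(self.w_out):
                    for j, v in enumerate(hidden + [1.0]):
                        row[j] -= lr * d_out[k] * v
                for j, row in enumerate(self.w_hidden):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * d_hidden[j] * v

    def predict(self, terms, user, threshold=0.5):
        # Multi-label decision: each label is thresholded independently.
        _, out = self._forward(self._encode(terms, user))
        return [1 if o >= threshold else 0 for o in out]


# Toy data (invented): (term features, user ID, label vector).
# Both users see the same two documents; user 1 also assigns the
# second label to the first document, user 0 does not.
samples = [
    ([1, 0], 0, [1, 0]),
    ([1, 0], 1, [1, 1]),
    ([0, 1], 0, [0, 1]),
    ([0, 1], 1, [0, 1]),
]

model = PreferenceBPN(n_terms=2, n_users=2, n_labels=2)
model.train(samples)
print("user 0:", model.predict([1, 0], user=0))
print("user 1:", model.predict([1, 0], user=1))
```

Because the user ID is part of the input, the same document can receive a different label set for each user, which is the behaviour the abstract describes. A real experiment on Reuters-21578 would replace the two-term toy vectors with features extracted from the news text.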