| Graduate student: | 郭俊良 Guo, Jiunn-Liang |
|---|---|
| Dissertation title: | Contextual Information Exploration and Decision Model of Digital Document Resources |
| Advisor: | 王惠嘉 Wang, Hei-Chia |
| Degree: | Doctoral |
| Department: | College of Management, Institute of Information Management |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 (ROC calendar, i.e., 2012-2013) |
| Language: | English |
| Number of pages: | 71 |
| Keywords (Chinese): | contextual information, semantic analysis, discourse analysis, weighted PageRank |
| Keywords (English): | Contextual information, semantics, discourse analysis, weighted PageRank |
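As we enter the information age, the digitization of documents has gradually changed how information is retrieved and has made knowledge easier to acquire. However, as online resources continue to accumulate, digital data has grown to a massive scale, which in turn creates many resource-management problems: key information becomes hard to find, automated document processing becomes more difficult, and resource management becomes less efficient. In recent years, therefore, many researchers have entered related fields, applying Natural Language Processing, Text Mining, and Information Retrieval techniques to analyze digital document resources from different perspectives, with the aim of proposing more efficient methods for using and managing document resources.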
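Research on digital document resources covers a broad scope; within it, document content analysis and the assessment of document resource importance have been major research topics in recent years. For document content analysis, most researchers perform statistical analysis on the frequency and characteristics of the words that appear in a document. However, both the textual content of any document resource and the behavior of selecting resources have many facets. Judging importance from word choice alone not only fails to capture the meaning of a document but also overlooks important characteristics implied by the coherence of the text structure and the contextual relations between adjacent passages, which limits how the analysis results can be applied later. On the other hand, assessing the importance of document resources has also received wide attention. In decisions about selecting local academic resources (such as electronic journals), the hyperlinks provided by web-based online systems indirectly guide researchers toward important information when they consult related material. This property, together with the decision process behind the references cited in research papers, implies many important contextual relations that deserve attention.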
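Research on contextual information in the field of information retrieval can broadly be divided into two directions. The first focuses on the contextual features within the text of a document, while the second examines the behavior of using digital resources. This dissertation therefore proposes an exploratory research design for each direction. In the first part, the proposed method applies semantically aware discourse analysis to examine the coherence and semantic shifts of the text, and uses them to determine the discourse segments of the full document. An improved feature selection method then extracts the important implicit features, namely discourse subtopics, from these segments to form a feature set, and the effectiveness of the method is verified through automated document classification experiments. The second part examines usage patterns of academic document resources and constructs a decision model for evaluating core journals. Using the proposed weighted PageRank method, it examines the contextual relations in the access behavior of digital document resources (such as electronic journals) and, by combining this with researchers' citation information, constructs a Local Impact Factor (LIF) for electronic journals. The LIF is intended to support resource users when citing academic material and library and information managers when making future decisions about purchasing electronic journal databases.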
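The segmentation step above is described only at a high level, so the sketch below is a rough, hypothetical illustration rather than the dissertation's semantically aware method. It swaps in a simplified TextTiling-style segmenter (lexical cohesion between adjacent blocks of sentences, after Hearst, 1997): it compares word overlap across each sentence gap, scores how deep each similarity "valley" is, and places segment boundaries at the deepest valleys. The tokenizer, the `block` size, the `mean - std/2` cutoff, and the `min_gap` spacing rule are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(sentence):
    # Assumed preprocessing: lowercase alphabetic tokens, no stop-word removal.
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, block=2):
    # sims[k] compares up to `block` sentences before sentence k+1 with up to `block` from it onward.
    bags = [Counter(tokenize(s)) for s in sentences]
    sims = []
    for i in range(1, len(bags)):
        left = sum(bags[max(0, i - block):i], Counter())
        right = sum(bags[i:i + block], Counter())
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    # Depth of a valley: climb left and right while similarity keeps rising,
    # then add the two drops from those peaks down to the current value.
    depths = []
    for k, s in enumerate(sims):
        left, j = s, k - 1
        while j >= 0 and sims[j] >= left:
            left, j = sims[j], j - 1
        right, j = s, k + 1
        while j < len(sims) and sims[j] >= right:
            right, j = sims[j], j + 1
        depths.append((left - s) + (right - s))
    return depths


def segment(sentences, block=2, min_gap=2):
    # Returns sorted indices i such that a new discourse segment starts at sentence i.
    sims = gap_similarities(sentences, block)
    if not sims:
        return []
    depths = depth_scores(sims)
    cutoff = mean(depths) - pstdev(depths) / 2
    chosen = []
    # Accept the deepest valleys first; skip gaps too close to an accepted boundary.
    for k in sorted(range(len(depths)), key=lambda k: depths[k], reverse=True):
        if depths[k] > max(cutoff, 0.0) and all(abs(k - c) >= min_gap for c in chosen):
            chosen.append(k)
    return sorted(k + 1 for k in chosen)


sentences = [
    "The cat chased the mouse.",
    "The mouse escaped the cat.",
    "The cat caught the mouse.",
    "Stock markets fell today.",
    "Markets and stock prices fell.",
    "Stock markets fell again.",
]
print(segment(sentences))  # [3]: a new segment starts at the first stock-market sentence
```

Pure word overlap cannot see synonyms or semantic shifts, which is exactly the gap the dissertation's semantic discourse analysis is meant to address; the sketch only shows the block-comparison and valley-depth mechanics.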
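The experimental results of this study show that both document content and the behavior of selecting document resources do contain important contextual information. The proposed methods demonstrate that this contextual information can be extracted and applied to improve automated document classification and to support decisions when evaluating important digital resources.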
Digital document resources possess implicit contextual information, which raises many research challenges in the information retrieval discipline. Whether this information resides in the discourse context of a document or in the access of web-based resources, it calls for a deeper investigation of the value of contextual attributes across the wide range of information processing applications. Within document content, contextual information is believed to exist in the discourse segments of the text, which has long been treated as a difficult problem because document structures are so diverse. The contextual information arising from the access of web resources is even more difficult to explore because it involves unpredictable human behavior and varied background knowledge. In addition, because resource usage is untraceable, monitoring the user's decision-making process becomes even more complicated. Nevertheless, contextual information has long been regarded as an important pattern and is believed to be a critical factor in improving the performance of information processing.
Regarding the analysis of contextual information, this work proposes two novel approaches for exploring the contextual information that exists at the textual level and in web access, by adopting discourse structure analysis and by designing a core decision model, respectively. For textual resources, this study designs a framework that detects context by analyzing discourse structure, addressing not only the shifts and continuity of coherent subtopics but also syntactic attributes, both of which can enhance the performance of text classification. For validation, the first model is applied to an e-book classification task to test the contribution of the explored contextual information. For the web access aspect, the second study focuses on local access to a digital library and proposes a novel system, the Local Impact Factor (LIF), to evaluate and rank the importance of digital resources. The system addresses the requirements of the local user community by incorporating both the access rate of adopted journals and a weighted impact factor technique, capturing the contextual information that exists between resource usage and citations in theses. By measuring the citation information in local users' articles, it helps reveal the relationship between resource downloads and actual citation decisions.
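The abstract does not give the LIF formula, so the sketch below only illustrates the general idea behind a weighted PageRank: each node passes its score to its neighbours in proportion to link weights instead of splitting it evenly. This is a generic edge-weighted PageRank computed by power iteration, not the dissertation's LIF; the journal names, the edge weights (imagined here as hypothetical counts of access or citation transitions between journals), and the damping, tolerance, and iteration settings are all assumptions for illustration.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Generic edge-weighted PageRank by power iteration (illustrative sketch).

    edges maps (source, target) -> positive weight. Each node distributes
    its score over its out-links in proportion to those weights.
    """
    if any(w <= 0 for w in edges.values()):
        raise ValueError("edge weights must be positive")
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    if n == 0:
        return {}
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Nodes with no out-links ("dangling") spread their score uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical journals and link weights, for illustration only.
ranks = weighted_pagerank({
    ("J1", "J2"): 3,
    ("J1", "J3"): 1,
    ("J2", "J3"): 1,
    ("J3", "J1"): 1,
})
for journal, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(journal, round(score, 3))
```

With these made-up weights, J3 ranks first because it collects score from both J1 and J2, even though J1 sends most of its weight to J2. Replacing every weight with 1 reduces the sketch to ordinary PageRank, which is a quick way to check that the weighting is the only change.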
Both studies are fully implemented and tested on two real-world datasets through a series of integrated experiments. The evaluations demonstrate the vital role of contextual information in both textual and web resources and reveal a significant improvement in performance. The proposed methods are also shown to be feasible and beneficial for future information processing applications.
On-campus access: full text publicly available from 2018-05-06