
Graduate student: Lin, Yi-Han (林意涵)
Thesis title: An e-Journal Recommendation Method by Considering Time Weighting (考量時間權重的期刊推薦方法)
Advisor: Wang, Hei-Chia (王惠嘉)
Degree: Master
Department: College of Management, Department of Industrial and Information Management (on the job class)
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 55
Keywords: N-gram, Term Frequency, Wikipedia, Impact Factor, Vector Space
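  • With today's well-developed Internet, most academic journals are stored in digital databases, so electronic journals have become an indispensable reference resource for academic research. Among them, the roughly 8,000 journals listed in the Journal Citation Reports (JCR) database built by the Institute for Scientific Information (ISI) are the ones most often consulted by scholars in Taiwan. However, such a large volume of digital journals easily places a heavier burden on users and causes disorientation. At present, most teachers search for journal papers by choosing a category, entering keywords, and filtering by year, yet the results are still voluminous and cluttered. In addition, teachers often search by the journals they are targeting for publication without considering that research directions change over time, which makes it hard for them to find journals suited to their current research.
    To address these problems, this study proposes a journal recommendation method that considers time weighting, reducing the confusion and burden teachers face when choosing reference journals. First, N-gram term selection and term frequency (TF) are used to select feature terms, and the Java Wikimedia API for Wikipedia is then used to look up proper nouns in order to select research topics. Because the research topics of both teachers and journals change over time, more recently published papers are generally more important and their topics more novel. This study therefore sets a time weighting: the earlier a paper was published, the lower its weight, and the more recent, the higher, so that the extracted research topics better reflect the current state of teachers and journals. Next, a reference vector is built from the database of teacher research topics, and this reference vector is used to construct a binary vector of research topics for each teacher and each journal. The binary vector space is then used to compare the similarity between teacher and journal research topics, which reduces the complexity of journal recommendation.
    The proposed method can find and recommend journals that better match a teacher's current research. It keeps teachers from relying only on the journals they habitually publish in, or on the impact factor alone, and thereby overlooking other journals suitable for reference and publication, so that it meets their real needs when looking for journals.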
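
The feature-selection and time-weighting steps described in the abstract above can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract only states that older papers receive lower weights, so the exponential half-life decay in `time_weight`, the `half_life` parameter, the whitespace tokenization, and the sample titles are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def time_weight(year, current_year, half_life=5.0):
    """Hypothetical decay (an assumption): the weight halves every `half_life` years."""
    age = max(0, current_year - year)
    return 0.5 ** (age / half_life)


def weighted_term_scores(papers, current_year, max_n=2):
    """Sum time-weighted n-gram counts over (year, title) pairs."""
    scores = Counter()
    for year, title in papers:
        tokens = title.lower().split()
        weight = time_weight(year, current_year)
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                # Each occurrence contributes the weight of its paper,
                # so recent papers count more than old ones.
                scores[gram] += weight
    return scores


# Made-up sample data: (publication year, title)
papers = [
    (2013, "time weighting for journal recommendation"),
    (2008, "journal recommendation with term frequency"),
]
scores = weighted_term_scores(papers, current_year=2013)
print(scores.most_common(5))
```

With this decay, a phrase that appears only in the 2008 title scores half as much as one that appears only in the 2013 title. In the thesis, candidate phrases are additionally checked against Wikipedia before being accepted as research topics; that lookup is omitted here.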

    With the rapid growth of the Internet, research papers are increasingly published online. The Institute for Scientific Information (ISI, now Thomson Reuters) maintains the most widely used online database of leading journals, the Journal Citation Reports (JCR). The JCR Science Edition covers over 8,000 journals in science and technology. However, the mass of content available online creates problems of information overload and disorientation. Currently, teachers usually look for research papers by entering keywords into an academic search engine, but the number of papers returned is still very large. Many teachers also habitually search within the journals they are targeting for publication. Because they do not account for how research topics change over time, it becomes difficult to find journals that match their current research.
    To overcome these problems, we propose a journal recommendation method that incorporates a time-weighting parameter. First, we use N-grams and term frequency (TF) to select feature terms. Second, we identify keywords by looking them up through the Java Wikimedia API. We generally regard newer papers as more important than older ones. Therefore, to extract suitable research topics, our method assigns each paper a time weight according to its publication date. Finally, we build a reference vector (RV) from the set of teachers' research topics and use the RV to construct binary vectors of research topics for teachers and journals. To reduce the complexity of the method, similarity matching is performed in the binary vector space.
    In this thesis, we propose a time-aware journal recommendation method that finds appropriate journals for teachers. Experimental results show that the proposed approach can improve the accuracy of the recommendations.
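
The reference-vector and binary-vector matching step can be sketched as follows. This is a minimal illustration, not the thesis implementation: the abstract does not name the similarity measure, so cosine similarity is an assumption here, and the topic lists and journal names are made up.

```python
import math


def binary_vector(topics, reference):
    """Return 1 where a reference topic is present in `topics`, else 0."""
    topic_set = set(topics)
    return [1 if topic in topic_set else 0 for topic in reference]


def cosine(u, v):
    """Cosine similarity (an assumed measure) between two 0/1 vectors."""
    dot = sum(a * b for a, b in zip(u, v))  # number of shared topics
    # For 0/1 vectors the squared norm equals the number of ones.
    norm = math.sqrt(sum(u)) * math.sqrt(sum(v))
    return dot / norm if norm else 0.0


# Hypothetical reference vector built from the pool of teacher research topics
reference = ["information retrieval", "text mining", "recommendation", "scheduling"]

teacher = binary_vector(["text mining", "recommendation"], reference)
journals = {
    "Journal A": binary_vector(
        ["text mining", "recommendation", "information retrieval"], reference
    ),
    "Journal B": binary_vector(["scheduling"], reference),
}

# Rank journals by similarity to the teacher's topic vector, best first
ranked = sorted(journals, key=lambda name: cosine(teacher, journals[name]), reverse=True)
print(ranked)
```

Because every vector has the same fixed length and contains only 0s and 1s, each comparison reduces to counting shared topics, which is one way to read the abstract's point about reduced complexity.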

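    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation and Objectives
      1.3 Research Process
      1.4 Research Scope and Limitations
      1.5 Thesis Organization
    Chapter 2 Literature Review
      2.1 Information Retrieval
        2.1.1 Relevance between Documents and Index Terms
        2.1.2 Relevance between Documents
      2.2 Data Preprocessing
        2.2.1 Stop Word Removal
        2.2.2 Stemming
        2.2.3 Term Selection
      2.3 Academic Papers
        2.3.1 Master's Theses
        2.3.2 Conference Papers
        2.3.3 Journal Papers
        2.3.4 Temporal Relevance Analysis of Papers
      2.4 Wikipedia
    Chapter 3 Research Method
      3.1 Research Framework
      3.2 Data Collection Module
        3.2.1 Data Collection
        3.2.2 Stop Words
        3.2.3 Stemming
        3.2.4 Term Selection
        3.2.5 Matching Proper Nouns against Wikipedia
      3.3 Topic Similarity Matching Module
        3.3.1 Teacher Research Topic Selection
        3.3.2 Journal Topic Selection
        3.3.3 Similarity Matching
      3.4 Journal Recommendation Module
    Chapter 4 Implementation and Verification
      4.1 System Architecture and Implementation Configuration Diagram
      4.2 Design and Implementation of the Data Collection Module
        4.2.1 Building the Teacher Research Data
        4.2.2 Building the Journal Paper Data
      4.3 Design and Implementation of the Topic Similarity Matching Module
        4.3.1 Implementation of Teacher Research Topic Extraction
        4.3.2 Implementation of Journal Research Topic Extraction
      4.5 System Verification
        4.5.1 Evaluation Metrics
        4.5.2 Experimental Design and Procedure
        4.5.3 Experimental System Flow
      4.6 Experimental Results
    Chapter 5 Conclusions and Future Research Directions
      5.1 Conclusions
      5.2 Future Research Directions
    References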


    Full text availability: on campus from 2017-08-27; off campus from 2018-08-27