
Graduate student: Hsu, Tzu-Ting (許子庭)
Thesis title: Research ideas recommendation using research trend and hierarchical topic model (運用研究趨勢與階層式主題架構於研究題目之產生與推薦)
Advisor: Wang, Hei-Chia (王惠嘉)
Degree: Master
Department: College of Management - Department of Industrial and Information Management
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 70
Keywords (Chinese): 研究方向發想, 階層式主題架構, 個人化推薦系統, 標題自動生成
Keywords (English): Research idea development, hierarchical topic model, personalized recommendation system, automatic title generation
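  • As global competition intensifies, countries increasingly value the technological development that academic research brings and hope to strengthen themselves through innovative research and technology. In an era when technology and knowledge advance ever faster, developing new research directions has become an important task that must keep pace with current trends. Most publications are now stored electronically online, where the public can easily access them. In academia in particular, the number of research papers is growing rapidly. How to combine fast computation with a suitably designed analysis method, so that this large body of electronic data helps users develop new research directions, is a question worth exploring.
    To address this problem, many earlier studies used topic detection and tracking methods to help researchers quickly analyze current research trends. Most of these methods, however, do not consider the user's background knowledge and preferences, and they express the recommended topics as keywords. As a result, researchers may be neither familiar with nor interested in the recommended topics, and keywords alone cannot clearly convey which research directions could be pursued.
    Previous research notes that the research title is the best way to express a research method. This study therefore proposes a new automatic title generation method that integrates and improves existing personalized recommendation and topic trend analysis methods. It first applies Hierarchical Latent Tree Analysis to uncover the topic structure latent in past research, together with its representative keywords. It then uses a hybrid recommendation method that jointly considers topic trend, popularity, and the user's background knowledge and preferences to recommend suitable topics. Finally, it designs a natural language generation algorithm suited to academic paper titles, which combines the original keywords into fluent title sentences for the user. The aim is to let researchers quickly obtain clear research directions that match their own interests.
    Experiments show that both the added Google Trends indicator and the personalization factors improve topic recommendation performance. The title generation method, designed from templates and statistical information, performs well in both grammatical correctness and semantic expression. For users, titles are indeed better than plain keywords at inspiring ideas for research directions.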

    In an era of rapid technological advancement, keeping up with trends is an important task for all researchers. How to efficiently identify new research directions among the massive number of published papers is worth exploring. Some studies have tried to analyze trends with topic detection and tracking methods. However, these methods do not consider the user's background knowledge and preferences, and they express a topic with general keywords, which does not effectively help researchers develop new research ideas.
    Some past studies noted that the title is the best way to express a piece of research. Therefore, this study proposes a new automatic title generation method and combines it with personalized recommendation and topic trend analysis to achieve this task. First, it uses Hierarchical Latent Tree Analysis to find the topic structure and its representative keywords hidden in existing research. Second, it considers topic trend, popularity, and user preferences in a hybrid recommendation method. Finally, it applies a natural language generation algorithm designed for academic paper titles to convert the recommended keywords into fluent title sentences for the user.
    Experiments show that adding Google Trends indicators and personal factors improves the performance of topic recommendation. The title generation method, which uses templates and statistical information, performs well in both grammar and semantic expression. For users, titles are indeed more inspiring than keywords for developing new research ideas.
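
    The last two steps described in the abstract (hybrid topic ranking, then template-based title generation) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Topic` fields, the linear weighting in `hybrid_score`, the weight values, and the single title template are all hypothetical choices made for illustration, and the inputs are assumed to be already normalized to the range 0 to 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Topic:
    keywords: List[str]   # representative keywords of the topic
    trend: float          # hypothetical: normalized trend signal, 0..1
    popularity: float     # hypothetical: normalized popularity, 0..1
    user_match: float     # hypothetical: fit to the user's background, 0..1


def hybrid_score(topic: Topic, w_trend: float = 0.3,
                 w_pop: float = 0.3, w_user: float = 0.4) -> float:
    """Weighted sum of the three signals (illustrative weights only)."""
    return (w_trend * topic.trend
            + w_pop * topic.popularity
            + w_user * topic.user_match)


def recommend(topics: List[Topic], top_n: int = 1) -> List[Topic]:
    """Return the top_n topics ordered by descending hybrid score."""
    return sorted(topics, key=hybrid_score, reverse=True)[:top_n]


def generate_title(keywords: List[str],
                   template: str = "{0} for {1} using {2}") -> str:
    """Fill a fixed (hypothetical) template with the first three keywords."""
    if len(keywords) < 3:
        raise ValueError("this template needs at least three keywords")
    title = template.format(*keywords[:3])
    return title[0].upper() + title[1:]


if __name__ == "__main__":
    topics = [
        Topic(["topic detection", "research trends", "latent tree models"],
              trend=0.9, popularity=0.5, user_match=0.8),
        Topic(["stock prediction", "financial news", "n-grams"],
              trend=0.4, popularity=0.9, user_match=0.1),
    ]
    best = recommend(topics)[0]
    print(generate_title(best.keywords))
```

    A fixed template keeps the output grammatical by construction, at the cost of variety. The abstract indicates that the actual method also draws on statistical information, which this sketch omits.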

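    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Scope and Limitations
      1.4 Research Process
      1.5 Thesis Outline
    Chapter 2 Literature Review
      2.1 Topic Detection and Tracking
        2.1.1 Document Clustering
        2.1.2 Topic Models
      2.2 Document Feature Processing
        2.2.1 Feature Extraction
        2.2.2 Feature Selection
        2.2.3 Feature Processing in Topic Models
      2.3 Personalized Recommendation Systems
      2.4 Automatic Title Generation
      2.5 Summary
    Chapter 3 Research Method
      3.1 Research Framework
      3.2 Data Collection and Preprocessing Module
        3.2.1 Data Collection
        3.2.2 Data Preprocessing
      3.3 Feature Processing Module
      3.4 Hierarchical Topic Detection Module
        3.4.1 Building the Hierarchical Topic Structure
        3.4.2 Document Topic Detection
      3.5 Research Opportunity Ranking Module
        3.5.1 Novelty Analysis
        3.5.2 Collaborative Filtering
        3.5.3 Content Analysis
        3.5.4 Hybrid Recommendation
      3.6 Automatic Title Generation Module
        3.6.1 Noun Phrase Recognition
        3.6.2 Representative Term Expansion
        3.6.3 Automatic Generation
      3.7 Summary
    Chapter 4 System Implementation and Validation
      4.1 System Environment Setup
      4.2 Experimental Method
        4.2.1 Data Sources
        4.2.2 Experimental Design
        4.2.3 Evaluation Metrics
      4.3 Parameter Settings
      4.4 Experimental Results and Discussion
        4.4.1 Experiment 1
        4.4.2 Experiment 2
        4.4.3 Experiment 3
        4.4.4 Experiment 4
        4.4.5 Experiment 5
        4.4.6 Experiment 6
    Chapter 5 Conclusions and Future Directions
      5.1 Research Results
      5.2 Future Research Directions
    References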


    Full text available on campus from 2023-05-25
    Full text available off campus from 2023-05-25