| Graduate student: | 黃文錡 Huang, Wun-Ci |
|---|---|
| Thesis title: | 共詞網路之資料管道:搜尋、分析和視覺化 A Data Pipeline for Co-word Networks: Searching, Analyzing, and Visualizing |
| Advisor: | 鄧維光 Teng, Wei-Guang |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2024 |
| Graduation academic year: | 112 |
| Language: | English |
| Number of pages: | 57 |
| Keywords (Chinese): | 共詞分析、資料管道、資料品質、混合資料庫、網路視覺化 |
| Keywords (English): | co-word analysis, data pipeline, data quality, hybrid databases, network visualization |
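In academic research, sifting through and analyzing large volumes of text to discover significant patterns is a crucial task. Co-word analysis is a sophisticated and powerful technique that reveals the intricate connections between terms in academic publications, helping researchers identify key themes and trends. To make full use of co-word analysis, we developed an efficient data pipeline for data search, network analysis, and visualization. The pipeline aims to improve the efficiency of processing multi-dimensional network data flows and combines the strengths of relational and non-relational databases. Specifically, the relational database stores structured data, including the metadata required for co-word analysis, and provides fast queries for dimensional search, while the non-relational graph database processes and analyzes network relationships. In addition, we use word embedding techniques to evaluate the results of text processing, thereby improving the accuracy and reliability of the network analysis. We also implemented a web interface that presents the co-word network analysis together with statistics on its source data, and that offers interactive features so users can explore the network in greater depth. Experimental results show that our data pipeline enhances the interpretability and usability of the results and helps users discover potential research topics and trends more intuitively.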
In the realm of academic research, the process of sifting through and analyzing vast amounts of written material to uncover significant patterns is crucial. Co-word analysis stands out as a sophisticated and effective technique that unveils the complex network of connections between words in academic publications, aiding researchers in pinpointing key themes and trends. To maximize the utility of co-word analysis, we propose to develop a comprehensive data pipeline that incorporates capabilities of data search, network analysis, and visualization. This pipeline is designed to enhance the efficiency of processing multi-dimensional network data streams and merges the benefits of both SQL and NoSQL databases. Specifically, the relational database within the pipeline manages structured data, including essential metadata for co-word analysis, offering rapid query capabilities for precise dimensional searches. Concurrently, the graph database handles the textual data of the co-word network, overseeing the processing and analysis of network relationships. To further refine our methodology, word embedding technology is employed to assess text processing outcomes, thereby boosting the accuracy and reliability of analyses. Our pipeline also includes advanced visualization tools that not only display the results of the co-word network analysis on a web interface but also allow users to interactively explore the network. These tools improve the interpretability and usability of the results, facilitating a deeper understanding and discovery of potential research topics and trends.
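To make the division of labor described in the abstract more concrete, the following is a minimal sketch, not the thesis implementation. It assumes a hypothetical relational schema (tables `paper` and `paper_keyword`) held in SQLite, uses a metadata filter (publication year) as the "dimensional search", and then counts keyword co-occurrences to form weighted co-word edges. The thesis stores and analyzes the network in a graph database; here the edges are simply kept in an in-memory `Counter`, and all table names, function names, and sample rows are invented for illustration.

```python
import sqlite3
from collections import Counter
from itertools import combinations


def demo_connection():
    """Create an in-memory database with made-up papers (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE paper (id INTEGER PRIMARY KEY, year INTEGER NOT NULL);
        CREATE TABLE paper_keyword (paper_id INTEGER NOT NULL, keyword TEXT NOT NULL);
        """
    )
    conn.executemany(
        "INSERT INTO paper VALUES (?, ?)",
        [(1, 2020), (2, 2021), (3, 2015)],
    )
    conn.executemany(
        "INSERT INTO paper_keyword VALUES (?, ?)",
        [
            (1, "Data Pipeline"),
            (1, "co-word analysis"),
            (2, "data pipeline"),
            (2, "co-word analysis"),
            (2, "graph database"),
            (3, "data pipeline"),
            (3, "graph database"),
        ],
    )
    return conn


def build_coword_edges(conn, year_from, year_to):
    """Count keyword co-occurrences over papers published in [year_from, year_to]."""
    # Relational step: select keywords of papers matching the metadata filter.
    rows = conn.execute(
        "SELECT k.paper_id, k.keyword "
        "FROM paper_keyword AS k JOIN paper AS p ON p.id = k.paper_id "
        "WHERE p.year BETWEEN ? AND ?",
        (year_from, year_to),
    ).fetchall()

    # Group normalized keywords by paper; a set drops duplicates within a paper.
    by_paper = {}
    for paper_id, keyword in rows:
        by_paper.setdefault(paper_id, set()).add(keyword.strip().lower())

    # Network step: each unordered keyword pair in a paper adds 1 to that edge.
    edges = Counter()
    for keywords in by_paper.values():
        for a, b in combinations(sorted(keywords), 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    for (a, b), weight in build_coword_edges(demo_connection(), 2019, 2022).most_common():
        print(f"{a} -- {b}: {weight}")
```

In a hybrid setup like the one the abstract describes, the resulting weighted pairs would be written to the graph database as nodes and relationships rather than held in memory; that loading step is omitted here because it depends on the specific graph database and driver.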
On-campus access: public from 2028-09-02