| Student | 劉家豪 (Liu, Chia-Hao) |
|---|---|
| Title | 基於長文本嵌入模型之心理健康資源檢索助理 (A Mental Health Resource Retrieval Agent Based on a Long Text Embedding Model) |
| Advisors | 侯廷偉 (Hou, Ting-Wei); 鄧維光 (Teng, Wei-Guang) |
| Degree | Master |
| Department | Department of Engineering Science, College of Engineering |
| Year of Publication | 2026 |
| Academic Year | 114 (ROC calendar) |
| Language | Chinese |
| Pages | 58 |
| Keywords (Chinese) | 混合式搜尋、異構資料整合、語意搜尋、全文向量化、地理資訊系統、自動語音辨識 |
| Keywords (English) | hybrid search, heterogeneous data integration, semantic search, whole document embedding, geospatial information system, automatic speech recognition |
This study designs and implements a hybrid information retrieval system named the "Happiness Assistant" (心快活小幫手) to address two core challenges facing current information retrieval: the difficulty of integrating multi-source heterogeneous data, and the vocabulary gap between users' colloquial queries and professional medical terminology.
The system integrates automatic speech recognition (ASR) and natural language processing (NLP) techniques to extract knowledge from 1,964 unstructured health education videos and articles. Architecturally, OpenAI Whisper converts expert lecture recordings into structured text, and Jina Embeddings models are used to build a high-dimensional vector space supporting an 8,192-token long-context window. A whole document embedding strategy preserves the emotional context and causal logic of mental health education resources, overcoming the semantic fragmentation caused by traditional text chunking.
The retrieval core adopts a dual-path mechanism, integrating rule-based keyword matching and vector-based semantic search through a weighted linear fusion algorithm. In addition, the system combines a geospatial information system with the Haversine formula to provide location-based services for 1,718 physical counseling centers across Taiwan, seamlessly bridging online resources and offline services.
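The Haversine formula computes the great-circle distance between two latitude/longitude points on a sphere. A minimal Python sketch (the coordinates below are illustrative landmarks, not actual counseling-center locations from the thesis):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Example: Taipei Main Station to Kaohsiung Station, roughly 300 km apart
d = haversine_km(25.0478, 121.5170, 22.6394, 120.3022)
print(round(d, 1))
```

In a location-based service, this distance would be computed from the user's position to each candidate center, and the nearest centers returned first.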
Experimental results show that the Jina v4 model (2,048 dimensions) performs best on metaphorical mental health queries. Notably, about 35% of the highly relevant search results came from video content transcribed by Whisper, validating the effectiveness of the multi-modal data pipeline. Under an asynchronous architecture, the system maintains an average response time of 1.6 to 3.4 seconds, confirming the feasibility of real-time recommendation.
This study designs and implements a hybrid information retrieval system named Happiness Assistant to address two core challenges in current information retrieval: the difficulty of integrating multi-source heterogeneous data and the vocabulary gap existing between user queries and professional medical terminology.
By integrating automatic speech recognition (ASR) and natural language processing (NLP) techniques, the system extracts knowledge from 1,964 unstructured resources, including health education articles and expert lecture videos. In the proposed framework, OpenAI Whisper (medium model weights) is employed to transcribe lecture audio into structured text. A whole document embedding strategy is then adopted, utilizing Jina Embeddings models with an 8,192-token long-context window. This approach preserves the emotional continuity and causal logic of mental health resources, overcoming the semantic fragmentation caused by traditional chunking.
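Under a whole document embedding strategy, each article or transcript is represented by a single vector, so semantic retrieval reduces to ranking documents by cosine similarity to the query vector. A minimal sketch, with tiny 4-dimensional placeholder vectors standing in for real 2,048-dimensional Jina embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy corpus: document id -> one whole-document embedding (placeholder values)
docs = {
    "anxiety_article": [0.9, 0.1, 0.2, 0.1],
    "sleep_lecture":   [0.1, 0.8, 0.3, 0.2],
}
query = [0.85, 0.15, 0.25, 0.1]  # embedding of the user's query

# Rank the whole corpus by similarity to the query
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])
```

Because the full document is one vector, the ranking reflects the document's overall context rather than that of an isolated chunk.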
The system employs a dual-path retrieval mechanism that combines rule-based keyword matching and vector-based semantic search. The final ranking is determined by a Weighted Linear Fusion algorithm, ensuring a balance between precise keyword matching and semantic recall. Furthermore, a geospatial information system module utilizes the Haversine formula to provide location-based services for 1,718 physical counseling centers, seamlessly connecting online resources with offline assistance.
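The weighted linear fusion step can be sketched as min-max normalizing each path's scores and blending them with a tunable weight; the alpha value and toy scores below are illustrative, not the thesis's tuned parameters:

```python
def min_max(scores):
    """Min-max normalize a {doc: score} map into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {d: (s - lo) / span for d, s in scores.items()}

def linear_fusion(keyword_scores, vector_scores, alpha=0.4):
    """Blend the two paths: alpha * keyword + (1 - alpha) * vector."""
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    docs = set(kw) | set(vec)
    return {d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0) for d in docs}

# Toy raw scores from the keyword path and the semantic path
keyword_scores = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 7.0}
vector_scores  = {"doc_a": 0.55, "doc_b": 0.90, "doc_c": 0.40}

fused = linear_fusion(keyword_scores, vector_scores)
best = max(fused, key=fused.get)
print(best)
```

Normalizing first matters: raw keyword scores (e.g., term-frequency based) and cosine similarities live on different scales, and blending them unnormalized would let one path dominate regardless of the weight.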
Experimental results demonstrate that the Jina v4 model (2048-dim) achieves the highest precision in matching metaphorical mental health queries compared to previous versions. Notably, approximately 35% of the highly relevant search results were derived from video transcripts processed by Whisper, validating the effectiveness of the multi-modal pipeline. The system maintains an average response time of 1.6 to 3.4 seconds via an asynchronous architecture, confirming its feasibility for real-time recommendation.
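The asynchronous design can be illustrated with Python's asyncio: running the keyword and vector paths concurrently keeps end-to-end latency close to the slower path rather than the sum of both. The function bodies below are placeholder stand-ins for the real search calls:

```python
import asyncio

async def keyword_search(query):
    await asyncio.sleep(0.1)  # stand-in for a rule-based keyword lookup
    return ["doc_a", "doc_c"]

async def vector_search(query):
    await asyncio.sleep(0.2)  # stand-in for embedding the query + vector search
    return ["doc_b", "doc_a"]

async def hybrid_search(query):
    # Launch both retrieval paths concurrently and await them together,
    # so total wait is ~max(0.1, 0.2) s instead of 0.1 + 0.2 s
    kw, vec = await asyncio.gather(keyword_search(query), vector_search(query))
    return kw, vec

kw, vec = asyncio.run(hybrid_search("trouble sleeping"))
merged = sorted(set(kw) | set(vec))  # candidate pool handed to the fusion step
print(merged)
```

The same pattern extends naturally to an async web framework, where each incoming request awaits both paths without blocking other requests.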