
Author: Liu, Chia-Hao (劉家豪)
Thesis title: A Mental Health Resource Retrieval Agent Based on Long Text Embedding Model (基於長文本嵌入模型之心理健康資源檢索助理)
Advisors: Hou, Ting-Wei (侯廷偉); Teng, Wei-Guang (鄧維光)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2026
Academic year of graduation: 114
Language: Chinese
Number of pages: 58
Keywords: hybrid search, heterogeneous data integration, semantic search, whole document embedding, geospatial information system, automatic speech recognition
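  • This study designs and implements a hybrid information retrieval system named 「心快活小幫手」 (Happiness Assistant) to address two core challenges in information retrieval: the difficulty of integrating multi-source heterogeneous data, and the vocabulary gap between users' colloquial queries and professional medical terminology.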

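    The system integrates automatic speech recognition (ASR) and natural language processing (NLP) to extract knowledge from 1,964 unstructured health education videos and articles. Architecturally, OpenAI Whisper converts expert lecture recordings into structured text, and Jina Embeddings models build a high-dimensional vector space that supports an 8,192-token long-context window. The study adopts a whole document embedding strategy, which preserves the emotional context and causal logic of mental health education resources and avoids the semantic fragmentation caused by traditional text chunking.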

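    The retrieval core uses a dual-path mechanism and applies a Weighted Linear Fusion algorithm to combine rule-based keyword matching with vector semantic search. In addition, the system combines a geographic information system with the Haversine formula to provide location-based services for 1,718 physical counseling sites across Taiwan, linking online resources with offline services.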
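As an illustration of how a weighted linear fusion of two retrieval paths can work, the following is a minimal sketch rather than the thesis implementation. The abstract does not state the normalization method or the weight values, so the min-max normalization, the `alpha` parameter, and all function names here are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so the two paths are comparable (assumed step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_linear_fusion(
    keyword_scores: Dict[str, float],
    semantic_scores: Dict[str, float],
    alpha: float = 0.3,  # hypothetical keyword weight; the semantic weight is 1 - alpha
) -> List[Tuple[str, float]]:
    """Fuse keyword and semantic scores as alpha * keyword + (1 - alpha) * semantic."""
    kw = min_max_normalize(keyword_scores)
    sem = min_max_normalize(semantic_scores)
    fused = {
        # A document missing from one path contributes 0 from that path.
        doc_id: alpha * kw.get(doc_id, 0.0) + (1.0 - alpha) * sem.get(doc_id, 0.0)
        for doc_id in set(kw) | set(sem)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)
```

With a small `alpha`, a document that matches the query only semantically can still outrank one that matches only by keyword, which is the kind of balance between exact matching and semantic recall that the abstract describes.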

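    Experimental results show that the Jina v4 model (2048 dimensions) performs best on metaphorical mental health queries. Notably, about 35% of highly relevant search results come from video content transcribed by Whisper, supporting the effectiveness of the multimodal data pipeline. Under an asynchronous architecture, the system maintains an average response time of 1.6 to 3.4 seconds, confirming the feasibility of real-time recommendation.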

    This study designs and implements a hybrid information retrieval system named Happiness Assistant (心快活小幫手) to address two core challenges in information retrieval: integrating multi-source heterogeneous data, and bridging the vocabulary gap between users' colloquial queries and professional medical terminology.

    By integrating computer vision, automatic speech recognition (ASR), and natural language processing (NLP) techniques, the system extracts knowledge from 1,964 unstructured health education items, comprising articles and expert lecture videos. OpenAI Whisper (medium weights) transcribes the lecture audio into structured text, and Jina Embeddings models with an 8,192-token long-context window embed each document in full. This whole document embedding strategy preserves the emotional continuity and causal logic of mental health resources and avoids the semantic fragmentation caused by traditional chunking.

    The system employs a dual-path retrieval mechanism that combines rule-based keyword matching with vector-based semantic search. A Weighted Linear Fusion algorithm determines the final ranking, balancing precise keyword matching against semantic recall. In addition, a geospatial information system module uses the Haversine formula to provide location-based services for 1,718 physical counseling centers across Taiwan, connecting online resources with offline assistance.
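To make the location-based step concrete, here is a short sketch of the Haversine great-circle distance and a nearest-site filter. It is an illustration under stated assumptions, not the thesis code: the mean Earth radius constant, the `max_km` and `limit` parameters, the record layout with `name`, `lat`, and `lon` keys, and the function names are all hypothetical.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_centers(
    user_lat: float,
    user_lon: float,
    centers: List[Dict],
    max_km: float = 10.0,  # hypothetical search radius
    limit: int = 5,        # hypothetical result cap
) -> List[Dict]:
    """Return centers within max_km of the user, nearest first."""
    with_distance = [
        {**c, "distance_km": haversine_km(user_lat, user_lon, c["lat"], c["lon"])}
        for c in centers
    ]
    in_range = [c for c in with_distance if c["distance_km"] <= max_km]
    return sorted(in_range, key=lambda c: c["distance_km"])[:limit]
```

A linear scan like this is likely adequate for a collection on the order of the 1,718 centers mentioned above; a spatial index would only become relevant for much larger datasets.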

    Experimental results show that the Jina v4 model (2048-dim) achieves the highest precision on metaphorical mental health queries among the model versions compared. Notably, approximately 35% of the highly relevant search results came from video transcripts produced by Whisper, supporting the effectiveness of the multimodal pipeline. With an asynchronous architecture, the system maintains an average response time of 1.6 to 3.4 seconds, confirming its feasibility for real-time recommendation.

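    Abstract (Chinese)
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Expected Contributions
      1.4 Thesis Organization
    Chapter 2 Literature Review
      2.1 Consumer Health Information Retrieval and the Vocabulary Gap
      2.2 Evolution of Natural Language Processing and Vector Embedding Techniques
        2.2.1 From Static Word Embeddings to Dynamic Contextual Understanding
        2.2.2 Long-Text Models and ALiBi
      2.3 Text Representation Strategies: Chunking versus Whole Document Embedding
        2.3.1 Limitations of the Chunking Strategy
        2.3.2 Advantages of Whole Document Embedding
      2.4 Hybrid Search
        2.4.1 Linear Combination Theory (CombSUM)
        2.4.2 Weighting Coefficients and Semantic Bias Theory
      2.5 Supporting Technologies: Automatic Speech Recognition and Geographic Information
        2.5.1 Automatic Speech Recognition: OpenAI Whisper
        2.5.2 Location-Based Services and Matching to Physical Sites
      2.6 System Architecture Technology: FastAPI
        2.6.1 Asynchronous I/O Processing Theory
        2.6.2 Implementation Advantages
      2.7 Chapter Summary
    Chapter 3 System Design and Research Methods
      3.1 System Architecture Overview
      3.2 Dataset Construction and Preprocessing
        3.2.1 Data Source Categories and Attributes
        3.2.2 Multimodal Automated Processing Pipeline (Data Pipeline)
        3.2.3 Data Insertion, Deletion, and Synchronization Logic (CRUD & Sync)
        3.2.4 Geospatial Data Normalization
        3.2.5 Final Data Structure Definition
      3.3 Semantic Vector Space Model Construction
        3.3.1 Embedding Model Selection and Encoding Architecture
        3.3.2 Whole Document Embedding Strategy and Long-Text Processing (Long-Context Strategy)
        3.3.3 High-Dimensional Vector Representation
        3.3.4 Model Registry and Version Evolution Experiments
        3.3.5 Computational Performance Optimization
      3.4 Hybrid Retrieval and Service Implementation
        3.4.1 Query Intent Analysis and Multilingual Processing
        3.4.2 Dual-Path Retrieval
        3.4.3 Weighted Score Fusion Algorithm (Weighted Fusion)
        3.4.4 Backend Service Architecture and Performance Optimization
      3.5 Geographic Information Retrieval Module
        3.5.1 Geographic Information Database Construction
        3.5.2 Address Normalization and Geocoding
        3.5.3 Distance Computation: the Haversine Formula
        3.5.4 Spatial Retrieval Logic and Filtering
        3.5.5 Integrating Location-Based Services with Semantic Retrieval Results
      3.6 Front-End Interactive Interface and Presentation Layer Design
        3.6.1 Design Principle: User-Centric Design
        3.6.2 Functional Modules and Interface Layout
        3.6.3 Client-Server Communication
        3.6.4 Geographic Information Service Integration and Navigation Bridging
      3.7 Chapter Summary
    Chapter 4 Experimental Results and Analysis
      4.1 Development Environment and Tools
        4.1.1 Experimental Dataset and Evaluation Metrics
        4.1.2 Hardware and Software Environment
      4.2 System Functionality Demonstration
        4.2.1 Front-End Interactive Interface and RWD Design
        4.2.2 Backend Core Logic and Hybrid Retrieval
        4.2.3 Geographic Information Service Integration and Navigation Bridging
      4.3 Comparative Analysis of Embedding Model Versions
        4.3.1 Discussion of Experimental Results
      4.4 Retrieval Case Studies: Semantic Understanding and Cross-Lingual Capability
        4.4.1 Metaphorical Expressions and Whole-Document Context
        4.4.2 Cross-Lingual Retrieval and Generalization Tests
      4.5 Multimodal Data Integration and Geographic Matching
        4.5.1 Evaluation of Whisper Transcription
        4.5.2 Physical Site Navigation and Location-Based Services
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    References
    Appendix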

