
Author: 陳建安 (CHEN, CHIEN-AN)
Thesis title: Utilizing AI multimodal environment to facilitate an interactive workflow in Architectural design. (應用人工智慧多模態環境促進建築設計中的互動工作流程)
Advisors: 鄭泰昇 (Jeng, Tay-Sheng); 黃聖鈞 (Hwang, Cheng-Chun)
Degree: Master
Department: College of Planning and Design, Department of Architecture
Publication year: 2025
Graduation academic year: 113 (ROC academic year)
Language: Chinese
Pages: 188
Keywords: Generative AI, CAD (computer-aided architectural design), Multimodal, Design process
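    In recent years, design practices have often had to rebuild their workflows in order to adopt new technologies, giving rise to the "growing pains of digital transformation." With the emergence of generative AI, the evolution of digitized design workflows has once again become a focus of the architecture industry.
    This study proposes ChatCanvas, a generative AI multimodal design-assistance platform that can be applied to architectural design discussions and that rapidly iterates generated images. It emphasizes using AI technology to support the thread of design thinking through intuitive design operations, and it responds to practical needs through a multimodal design mode. Based on interviews about the design management process of a local construction company in Kaohsiung, the study ultimately develops customized design tools and an interactive workflow.
    To realize the multimodal design mode, this study surveys open-source material on the characteristics of recent generative AI developments, analyzes the bottlenecks in the current use of digital tools, and complements this with interviews that analyze the challenges of digital transformation in the architecture industry. Together these form the basis for a generative AI-assisted architectural design workflow platform. The study explores: 1. the analysis of various generative AI tools, 2. the characteristics of the multimodal design mode, 3. practical application in the architecture industry, and 4. the potential of developing customized AI-assisted digital tools.
    Based on interviews that imagined future modes of human-machine communication, this study built the ChatCanvas multimodal design-assistance platform, integrated generative AI tools into the architectural design workflow, and lists the difficulties and bottlenecks encountered during the research as a reference for future studies. The conclusions show that a generative AI multimodal environment does have the capacity to facilitate interaction in architectural design and can form a workflow based on the primitive operations of architectural design, offering a broader reference for future research on and applications of generative AI in architectural design.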

    Over the past few years, companies have often needed to rebuild their workflows to apply new technologies, resulting in the growing pains of "digital transformation." With the emergence of generative AI, the evolution of the design process has again become a central concern in the AEC industry.
    This study proposes ChatCanvas, a multimodal-aided design platform for architectural design that emphasizes using generative AI technology to assist design thinking through intuitive design operations. Drawing on interviews with local construction companies in Kaohsiung about the needs of their design management processes, the study ultimately develops customized design tools and interactive workflows.
    To realize a multimodal-aided design process, this study surveys open-source material on recent generative AI developments, analyzes the limitations of current digital tools, and uses interviews to diagnose the challenges of digital transformation in the architecture industry. This serves as the foundation for building a generative AI-assisted architectural design workflow platform. The exploration covers:
    1. Analyzing various generative AI tools
    2. Investigating the characteristics of multimodal design modes
    3. Implementing practical applications in the architecture industry
    4. Developing the potential of customizable AI-assisted digital tools
    Drawing on interviews that imagined future modes of human-machine communication, the study architects the ChatCanvas multimodal-aided design platform. It integrates generative AI tools into the architectural design workflow and lists the difficulties and barriers encountered during the research as a reference for future studies. The results show that an AI multimodal environment can promote interaction in architectural design and can form a workflow grounded in the basic operations of architectural design, offering a broader reference for future research on and applications of generative AI in architectural design.

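    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Questions and Motivation
        1.3 Research Objectives
        1.4 Research Methods
        1.5 Research Framework and Process
    Chapter 2 Literature Review and Framework
        2.1 The Evolution of Artificial Intelligence
        2.2 The Evolution of Architectural Design Media
        2.3 GPT (Generative pre-trained transformer) Models
        2.4 Open-Source Platforms and Frameworks Applicable to Generative AI
    Chapter 3 A Preliminary Exploration of Generative AI
        3.1 The Core Execution Needs of Architectural Design Practice amid New Technology Development
        3.2 Basic Usage
        3.3 Available Augmentation Resources
        3.4 Conclusions of the Tool Exploration
    Chapter 4 Industry Interviews and Workflow Analysis
        4.1 Construction Company Information and Business
        4.2 Internal Company Meetings
        4.3 Interview Summary
        4.4 Usage Scenario Simulation
        4.5 Usage Scenario Exploration and User Interface
    Chapter 5 AI Tool Development and Interface Design
        5.1 System Architecture
        5.2 System Operation Flow Design
        5.3 Technical Description of the System
    Chapter 6 Conclusions and Future Development
        6.1 Research Outcomes of ChatCanvas
        6.2 Raising Data Dimensionality through "Communication"
        6.3 Return Visit to the Practice
        6.4 Limitations of Co-creating with AI
        6.5 Research Bottlenecks
        6.6 Continuation with Diverse Technologies
    References
        Chinese (ordered by surname stroke count)
        English (ordered A-Z)
    Appendix 1: Detailed Usage Scenario Descriptions
        Design side
        Company side
    Appendix 2: Process of Generating DXF Drawings through Dialogue with GPT-3.5-Turbo
    Appendix 3: Interview Information
        Interview outline
        Verbatim transcript of interview recordings
        Record of interview key points
    Appendix 4: Record of Code Used in This Study
        Building a FAISS vector resource with the LangChain package and opening a readable format (PDF)
        Writing transfer data to the host in UTF-8; once triggered, the host computes and returns the result
        Speech-based design simulation; hand-sketch design simulation
        Interactive design interface data
        Retrieval interface operation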


    Full-text access: on campus, available immediately; off campus, available immediately.