| Graduate student: | 陳建安 CHEN, CHIEN-AN |
|---|---|
| Thesis title: | 應用人工智慧多模態環境促進建築設計中的互動工作流程 Utilizing AI multimodal environment to facilitate an interactive workflow in Architectural design. |
| Advisors: | 鄭泰昇 Jeng, Tay-Sheng; 黃聖鈞 Hwang, Cheng-Chun |
| Degree: | 碩士 Master |
| Department: | 規劃與設計學院 - 建築學系 College of Planning and Design - Department of Architecture |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 (ROC calendar) |
| Language: | Chinese |
| Pages: | 188 |
| Keywords (Chinese): | 生成式人工智慧、電腦輔助建築設計、多模態、設計流程 |
| Keywords (English): | Generative AI, CAD, Multimodal, Design process |
近年來,設計實務單位為了應用新科技,時常需要重建工作流程而產生「數位轉型陣痛期」。生成式人工智慧的出現,數位化的設計作業流程演進再度成為建築產業的關注點。
本研究提出可應用於建築設計討論並快速迭代產生圖像之生成式AI多模態輔助設計平台-ChatCanvas,強調基於直覺的設計操作運用人工智慧科技輔助設計思考脈絡,透過多模態設計模式對實務需求的回應,以高雄在地建設公司設計管理流程的訪談為基礎,最終研發客製化的設計工具與互動工作流程。
為實現多模態設計模式,本研究透過開源資料調查近期生成式人工智慧發展特性、分析現階段應用數位工具的瓶頸,輔以訪談分析建築產業數位轉型的挑戰,作為建立生成式人工智慧輔助建築設計工作流程平台基礎。並探索:1.分析各式生成式人工智慧工具,2.多模態設計模式的特性,3.落實建築產業的務實應用,4.發展客製化人工智慧輔助之數位工具潛力。
本研究基於訪談想像未來人與機器之溝通模式架設之ChatCanvas多模態輔助設計平台,整合生成式人工智慧工具於建築設計工作流程,並列出研究過程困難瓶頸,提供未來研究參考。研究結論顯示,生成式人工智慧多模態環境確實具備促進建築設計互動的能力,且可基於建築設計的原始操作形成工作流程,可提供未來生成式人工智慧於建築設計相關研究與應用更多面向參考。
In recent years, design practices have often had to rebuild their workflows in order to adopt new technologies, resulting in a period of "digital transformation growing pains." With the emergence of generative AI, the evolution of digital design workflows has once again become a focus of the AEC industry.
This study proposes ChatCanvas, a generative AI multimodal-aided design platform that supports architectural design discussion and rapid, iterative image generation. The platform emphasizes intuitive design operations and uses AI technology to assist the design thinking process, with multimodal design modes that respond to practical needs. Based on interviews about the design management processes of local construction companies in Kaohsiung, the study ultimately develops customized design tools and interactive workflows.
To realize a multimodal design mode, this study surveys open-source material on recent developments in generative AI, analyzes the bottlenecks of current digital tools, and draws on interviews to analyze the challenges of digital transformation in the architecture industry. Together these serve as the foundation for building a generative AI-assisted architectural design workflow platform. The exploration covers:
1. Analyzing various generative AI tools
2. Investigating the characteristics of multimodal design modes
3. Implementing practical applications in the architecture industry
4. Developing the potential of customized AI-assisted digital tools
Based on interviews that envision future modes of human-machine communication, this study builds the ChatCanvas multimodal-aided design platform, integrates generative AI tools into the architectural design workflow, and lists the difficulties and bottlenecks encountered during the research as a reference for future studies. The results indicate that a generative AI multimodal environment can facilitate interaction in architectural design and that a workflow can be built on the original operations of architectural design, offering a broader reference for future research on and applications of generative AI in architectural design.