| Graduate Student: | 曾昱崴 Zeng, Yu-Wei |
|---|---|
| Thesis Title: | 室內設計風格圖片自動生成之深度學習網路模型 (A Deep Learning Model for Automatic Generation of Interior Design Style Pictures) |
| Advisor: | 王宗一 Wang, Tzone-I |
| Degree: | Master |
| Department: | Department of Engineering Science, College of Engineering |
| Publication Year: | 2023 |
| Graduation Academic Year: | 111 |
| Language: | Chinese |
| Pages: | 59 |
| Keywords: | Interior Design, Prompt, Text-to-Image, Image-to-Image, Style Transfer, Diffusion Model |
As houses gradually age, their interior and exterior structures and spatial facilities deteriorate as well, which not only raises safety concerns but also causes many inconveniences in daily life. Reconstruction and renovation have therefore become a popular and necessary demand. Although the internet offers many example pictures and videos of different designs and requirements for reference, those examples are not built on one's own living space, so rebuilding or renovating a house usually requires extensive communication with contractors or designers to ensure that the new structure and interior meet the owner's requirements and expectations. This study focuses on interior spaces. The goal is to train a deep learning model that, through text-to-image and image-to-image generation, helps people with renovation needs quickly understand different interior design styles, identify a prototype that matches their expectations, and thereby speed up communication with interior designers.
This study proposes a method for automatically generating images of different interior design styles for a specific indoor space. Six mainstream interior design styles were selected, and images in the corresponding style categories were collected from three interior design websites to form the dataset; the data selection and annotation procedures differ depending on the model they serve. The Stable Diffusion architecture is used as the text-to-image model, combined with ControlNet to constrain the room's spatial contours and object framing and with LoRA to strengthen control over style, so that images in the requested interior design style are generated from the input prompts and pictures.
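The thesis's implementation is not reproduced here, but the pipeline described above maps naturally onto the Hugging Face diffusers API. The following is a minimal illustrative sketch, not the study's actual code: the public lllyasviel/sd-controlnet-mlsd ControlNet (straight-line detection, well suited to room contours) is a reasonable stand-in for the contour control, while the base checkpoint, the LoRA weight path, the input photo, and the prompt are all placeholder assumptions.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from controlnet_aux import MLSDdetector

# ControlNet trained on M-LSD straight-line maps, a plausible choice for
# preserving room contours and furniture framing (assumption, not the
# thesis's published configuration).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-mlsd", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder base checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical LoRA weights fine-tuned on one of the six style datasets.
pipe.load_lora_weights("path/to/style_lora")

# Extract a straight-line map from a photo of the user's own room; this
# conditions generation on the existing spatial layout.
mlsd = MLSDdetector.from_pretrained("lllyasviel/Annotators")
condition = mlsd(load_image("my_living_room.jpg"))

# The prompt selects the style; ControlNet keeps the room's structure.
result = pipe(
    prompt="scandinavian style living room, bright, wooden furniture",
    image=condition,
    num_inference_steps=30,
).images[0]
result.save("restyled_living_room.png")
```

In a setup like this, swapping the LoRA weights and the style keywords in the prompt is what would switch among the six styles, while the MLSD condition image keeps the output anchored to the user's own room.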
For evaluation, three interior designers were enlisted to rate the generated images of the six style categories on three criteria: plausibility of the spatial contours, conformity to the style category, and conformity to the prompt, each on a scale of 1 to 5. Ratings of 4 and 5 together accounted for approximately 68% of all ratings, indicating that the model is reasonably effective at generating interior design style images that match the requested conditions.
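As a concrete reading of that statistic, the sketch below tallies ratings keyed the same way (designer, style, criterion) and computes the share of 4s and 5s; the scores are invented for illustration, since the raw evaluations are not reproduced in the abstract.

```python
from collections import Counter

# Hypothetical scores on the 1-5 scale, keyed by (designer, style, criterion);
# the actual ratings from the three designers are not published in the abstract.
ratings = {
    ("designer_1", "scandinavian", "contour"): 5,
    ("designer_1", "scandinavian", "style"): 4,
    ("designer_1", "scandinavian", "prompt"): 3,
    ("designer_2", "industrial", "contour"): 4,
    ("designer_2", "industrial", "style"): 5,
    ("designer_2", "industrial", "prompt"): 2,
}

counts = Counter(ratings.values())
high = counts[4] + counts[5]         # ratings of 4 or 5
share = high / sum(counts.values())
print(f"4-or-5 share: {share:.0%}")  # 67% for this toy sample
```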