
Graduate student: 邱建智 (Qiu, Jian-Zhi)
Thesis title: Self-Training Based Domain Adaptation Algorithm Using Simulated Annealing for Point Cloud Semantic Segmentation
Advisor: 謝明得 (Shieh, Ming-Der)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 52
Keywords: point cloud semantic segmentation, domain shift, domain adaptation, self-training, simulated annealing

    Point cloud semantic segmentation remains an actively studied topic and is one of the important tasks in computer vision. It is widely used in vision-related applications such as autonomous driving, drones, and machine navigation. However, annotated point cloud data is difficult to obtain, and different point cloud scenes have different data distributions, which leads to the problem of domain shift in real-world deployment. Domain adaptation aims to solve this problem, but domain adaptation for 3D data is still at an early stage of development. This thesis adopts self-training, a 2D domain adaptation method, and applies it to the point cloud semantic segmentation task.
    Self-training is a form of semi-supervised learning. Inspired by simulated annealing, this thesis adjusts the mechanism by which the original self-training selects candidate pseudo-labels, replacing the parts that previously lacked an explanation with rules more in line with natural laws. Whereas traditional self-training methods have limited effect in point cloud semantic segmentation, the proposed method allows the model to generate more stable pseudo-labels during self-training, thus improving the model's cross-domain performance.
    The experiments and performance evaluation in this thesis are conducted on indoor-scene point cloud datasets, with S3DIS as the source domain dataset and ScanNet as the target domain dataset. The results show that the proposed method achieves better performance in domain transfer, with a 6.9% accuracy improvement in the S3DIS-to-ScanNet domain adaptation experiment.
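
    This record does not state the selection rule itself, so the following is only a rough sketch of what annealing-style pseudo-label selection could look like. It combines a softmax with temperature and a probabilistic acceptance step whose sharpness is controlled by a temperature. The function names, the Fermi-Dirac-shaped acceptance formula, and the `threshold` and `temperature` parameters are all assumptions made for illustration and are not taken from the thesis.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over raw class scores; a higher temperature flattens the output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def acceptance_probability(confidence, threshold, temperature):
    """Assumed acceptance rule: 1 / (1 + exp((threshold - confidence) / T)).

    Hypothetical formula chosen for illustration. At high T the rule is soft
    (close to 0.5 everywhere); as T falls it approaches a hard threshold.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    x = (threshold - confidence) / temperature
    if x > 700:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def select_pseudo_labels(point_logits, threshold, temperature, rng):
    """Return (point_index, class_index) pairs accepted as pseudo-labels."""
    selected = []
    for index, logits in enumerate(point_logits):
        probs = softmax_with_temperature(logits)
        confidence = max(probs)
        label = probs.index(confidence)
        # Accept each prediction at random, weighted by its confidence.
        if rng.random() < acceptance_probability(confidence, threshold, temperature):
            selected.append((index, label))
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [[5.0, 0.0], [0.1, 0.0], [0.0, 2.0]]
    # Hypothetical geometric cooling schedule over self-training rounds.
    for round_index in range(4):
        temperature = 0.5 * (0.3 ** round_index)
        kept = select_pseudo_labels(logits, 0.8, temperature, rng)
        print(round_index, round(temperature, 4), kept)
```

    In this sketch, early rounds with a high temperature accept low-confidence predictions fairly often, while later rounds behave almost like a fixed confidence threshold. That mirrors the general way simulated annealing moves from exploration to exploitation.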

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Thesis Overview
      1.3 Thesis Organization
    Chapter 2 Background
      2.1 Point Cloud Semantic Segmentation
        2.1.1 Deep Learning-Based Point Cloud Semantic Segmentation Method
        2.1.2 Representative Work Review
      2.2 Domain Adaptation
        2.2.1 Domain Adaptation Method
        2.2.2 Domain Adaptation on Cloud Semantic Segmentation
      2.3 Self-Training
        2.3.1 Traditional Self-Training
        2.3.2 Class-Balanced Self-Training
        2.3.3 Confidence Regularized Self-Training
      2.4 Concept of Simulated Annealing
        2.4.1 Simulated Annealing
    Chapter 3 Proposed Algorithm based on CRST-LR and Simulated Annealing
      3.1 Flow Overview
        3.1.1 System Flow
        3.1.2 Loss Function
      3.2 Temperature of Soft Pseudo-Label
        3.2.1 Softmax with Temperature
      3.3 Criteria of Pseudo-Label Selection
        3.3.1 Growth Criteria of k_c
        3.3.2 Sampling with Probability by Reverse Fermi-Dirac Function
        3.3.3 Treated as Weighted Loss
    Chapter 4 Experimental Results and Evaluation
      4.1 Experimental Setup
        4.1.1 S3DIS Dataset & ScanNet Dataset
        4.1.2 Experiment Label Mapping
        4.1.3 Other Experiment Settings
      4.2 Performance of Domain Adaptation
        4.2.1 Hyperparameter Selection
        4.2.2 Domain Adaptation Results
        4.2.3 Ablation Study
      4.3 Implementation on 2D Dataset
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    References


    Public release date: 2026-02-06