| Author: | 吳振宇 WU, CHEN-YU |
|---|---|
| Thesis Title: | A Semi-Supervised Temporal Attention Deep Learning Model for Cloud Removal in Multi-Temporal Optical Satellite Imagery (多時期光學衛星影像去雲方法使用半監督式時間注意力深度學習模型) |
| Advisor: | 林昭宏 Lin, Chao-Hung |
| Degree: | Master |
| Department: | College of Engineering, Department of Geomatics |
| Year of Publication: | 2026 |
| Academic Year of Graduation: | 114 |
| Language: | Chinese |
| Pages: | 72 |
| Chinese Keywords: | 影像去雲, SEN12MS-CR-TS, U-TAE, 半監督學習, Sentinel-2 |
| English Keywords: | cloud removal, SEN12MS-CR-TS, U-TAE, semi-supervised learning, Sentinel-2 |
Cloud occlusion has long been one of the principal limitations in applications of optical remote sensing imagery, covering on average about 67% of the Earth's surface and obscuring more than half of the land area. The resulting loss of surface information degrades applications such as land-use change monitoring, disaster assessment, and agricultural surveys. Traditional cloud removal methods such as image compositing and statistical filtering can reconstruct imagery by selecting cloud-free pixels from multi-temporal observations, but they lack the capacity for deep modeling of land-cover semantics and spatial structure; under thick cloud cover, insufficient observation frequency, or rapid surface change, they often fail to restore fine details and may even introduce artifacts and structural distortion.

With the rapid development of deep learning in computer vision and remote sensing, models that combine temporal and spatial features have become an important direction in cloud removal research. Such models can automatically identify and reinforce high-quality observations within multi-temporal imagery, showing notable potential for reconstructing regions obscured by thick clouds. This study adopts the SEN12MS-CR-TS multimodal, multi-temporal dataset, with Sentinel-2 imagery as the primary data source, and exploits the complementary relationships among multi-temporal observations to improve surface-information reconstruction under high cloud coverage and incomplete observations.

This study takes the U-TAE (U-Net with Temporal Attention Encoder) model as its core architecture. A temporal attention mechanism (L-TAE) automatically assigns temporal weights across multi-temporal inputs, strengthening the use of high-quality time-series observations while suppressing interference from low-quality ones, and U-Net skip connections are incorporated to preserve spatial details and improve the coherence and structural plausibility of reconstructed images. To address the data-loading bottleneck common when training on large-scale remote sensing data, the imagery is pre-packaged into npy files and a random sampling mechanism is designed, effectively reducing data-access cost and raising training speed from about 2 epochs per day to about 6. In addition, a loss-monitoring strategy further improves convergence stability over long training runs.

Experimental results show that the improved U-TAE model effectively preserves object outlines, road structures, and shadow distributions in low-cloud scenes. Under thick cloud cover, although local details remain limited, the overall cloud-free images exhibit good structural coherence and terrain plausibility, and achieve solid scores on objective metrics such as PSNR and SSIM. The results confirm that, by combining supervised learning mechanisms with optimized training procedures and efficiency improvements, high accuracy and practical feasibility for multi-temporal remote sensing cloud removal can be achieved without modifying the core model architecture.

Beyond proposing a cloud removal method that balances image quality and computational efficiency, this study also uses semi-supervised learning and mask-guided training strategies to explore application potential when ground-truth data are limited, providing a feasible direction for extending remote sensing applications to data-scarce regions. Future work may incorporate more advanced temporal attention models or teacher–student semi-supervised learning architectures to further improve the robustness of cloud removal models under complex cloud types and extreme observation conditions.
Cloud cover has long been one of the most critical limitations in optical remote sensing applications, obscuring approximately 67% of the Earth’s surface on average and affecting more than half of terrestrial areas. This results in substantial loss of surface information and consequently degrades the reliability of land-use change monitoring, disaster assessment, and agricultural surveys. Conventional cloud removal approaches, such as image compositing and statistical filtering, reconstruct cloud-free imagery by selecting clear-sky pixels from multi-temporal observations. However, due to their limited capability to model semantic and spatial structures, these methods often fail to recover fine details under conditions of thick cloud cover, insufficient observation frequency, or rapid land surface changes, and may introduce artifacts or structural distortions.
With the rapid advancement of deep learning in computer vision and remote sensing, models that jointly exploit temporal and spatial features have become a prominent direction for cloud removal research. Such models are capable of automatically identifying and emphasizing high-quality observations within multi-temporal imagery, offering substantial potential for reconstruction in heavily cloud-covered regions. In this study, the SEN12MS-CR-TS multimodal, multi-temporal dataset is adopted, with Sentinel-2 optical imagery serving as the primary data source. By exploiting the complementary information among multi-temporal observations, the proposed approach aims to enhance surface information reconstruction under conditions of high cloud coverage and incomplete observations.
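As an illustration of the idea of emphasizing high-quality observations, the following numpy sketch composites a time series by softmax-weighting frames with a per-frame quality score. This is only a toy stand-in for a learned temporal attention module such as L-TAE, which derives its weights from image features rather than a hand-supplied score; all names and values here are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def temporal_composite(frames, quality_scores):
    """Weighted temporal aggregation: frames with higher quality scores
    (e.g., lower cloud fraction) receive larger softmax weights.

    frames: (T, H, W, C) array of co-registered observations.
    quality_scores: (T,) array, higher = clearer observation.
    """
    w = softmax(np.asarray(quality_scores, dtype=float))  # (T,)
    return np.tensordot(w, frames, axes=(0, 0))           # (H, W, C)

# Toy example: three 2x2 single-band frames with constant values.
frames = np.stack([np.full((2, 2, 1), v) for v in (0.2, 0.9, 0.5)])
composite = temporal_composite(frames, quality_scores=[-2.0, 3.0, 0.0])
print(composite.mean())  # dominated by the clearest (second) frame
```

A learned attention encoder replaces the hand-crafted `quality_scores` with scores predicted from the pixel features themselves, but the weighted aggregation at the end is the same operation.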
This research employs the U-TAE (U-Net with Temporal Attention Encoder) architecture as the core model. A temporal attention mechanism (L-TAE) is used to automatically assign temporal weights across multi-temporal inputs, reinforcing the contribution of high-quality observations while suppressing the influence of low-quality or cloud-contaminated imagery. In addition, U-Net skip connections are incorporated to preserve spatial details and improve the structural coherence of reconstructed images. To address data-loading bottlenecks commonly encountered during large-scale remote sensing training, the dataset is pre-packaged into npy files with a random sampling strategy, significantly reducing data access overhead. As a result, the training speed is improved from approximately 2 epochs per day to about 6 epochs per day. Furthermore, a loss monitoring strategy is introduced to enhance convergence stability during long-term training.
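The data-pipeline optimization described above can be sketched as follows. The file layout, class, and function names are hypothetical illustrations of the technique (pre-packaging many small patches into one contiguous npy file and memory-mapping it for random sampling), not the thesis's actual loader.

```python
import numpy as np

def pack_to_npy(patches, out_path):
    """Pre-package a list of (T, C, H, W) image-time-series patches into a
    single .npy file, so training reads one contiguous array instead of
    thousands of small files."""
    np.save(out_path, np.stack(patches).astype(np.float32))

class RandomPatchSampler:
    """Memory-maps the packed array and yields random patches, keeping
    per-sample data-access cost low during long training runs."""
    def __init__(self, npy_path, seed=0):
        self.data = np.load(npy_path, mmap_mode="r")  # no full load into RAM
        self.rng = np.random.default_rng(seed)

    def sample(self):
        idx = self.rng.integers(len(self.data))
        return np.asarray(self.data[idx])  # copy the patch out of the memmap

# Toy usage: 4 random patches of shape (T=3, C=2, H=8, W=8).
patches = [np.random.rand(3, 2, 8, 8) for _ in range(4)]
pack_to_npy(patches, "patches.npy")
sampler = RandomPatchSampler("patches.npy")
batch = sampler.sample()
print(batch.shape)  # (3, 2, 8, 8)
```

Memory-mapping trades a one-time packing cost for cheap random reads, which is what turns the reported 2 epochs per day into roughly 6 when disk access, rather than compute, is the bottleneck.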
Experimental results demonstrate that the improved U-TAE model effectively preserves object boundaries, road networks, and shadow patterns in low-cloud scenarios. Under thick cloud conditions, although fine local details remain challenging to recover, the reconstructed cloud-free images exhibit strong structural consistency and reasonable terrain continuity, and quantitative evaluations using PSNR and SSIM confirm robust performance. These findings show that, through optimized training procedures and computational efficiency improvements alone, without modifying the core model architecture, high-precision and practical cloud removal for multi-temporal remote sensing imagery can be achieved.
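The two evaluation metrics can be computed as below. The SSIM here is a simplified single-window version of the index of Wang et al. (2004), computed over the whole image; practical evaluations (including, presumably, the one in this thesis) usually apply a sliding Gaussian window and average the local values.

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def ssim_global(ref, test, data_range=1.0):
    """Simplified single-window SSIM over the whole image (no sliding window)."""
    c1 = (0.01 * data_range) ** 2   # stabilizing constants from Wang et al.
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov = ((ref - mu_x) * (test - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

# Toy check: a reference image versus a lightly noised copy of it.
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
noisy = np.clip(ref + 0.05 * rng.standard_normal((32, 32)), 0.0, 1.0)
print(round(psnr(ref, noisy), 1), round(float(ssim_global(ref, noisy)), 3))
```

Both metrics are reference-based: they require the true cloud-free image, which is why the semi-supervised setting (limited ground truth) discussed next matters for evaluation as well as training.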
Beyond proposing an efficient and high-quality cloud removal framework, this study further explores the application potential of semi-supervised learning and mask-guided training strategies under limited ground-truth availability. The results provide valuable insights for extending cloud removal models to data-scarce regions. Future work may incorporate more advanced temporal attention architectures or teacher–student-based semi-supervised learning frameworks to further enhance model robustness under complex cloud patterns and extreme observation conditions.
Berthelot, D., Carlini, N., Cubuk, E. D., Kurakin, A., Sohn, K., Zhang, H., & Raffel, C. (2019). MixMatch: A holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems.
Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B., Isola, C., Laberinti, P., Martimort, P., Meygret, A., Spoto, F., Sy, O., Marchese, F., & Bargellini, P. (2012). Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sensing of Environment, 120, 25–36.
Ebel, P., Xu, Y., Schmitt, M., & Zhu, X. X. (2022). SEN12MS-CR-TS: A remote-sensing dataset for multimodal multitemporal cloud removal. IEEE Transactions on Geoscience and Remote Sensing.
Ebel, P., Sainte Fare Garnot, V. S., Schmitt, M., Wegner, J. D., & Zhu, X. X. (2023). UnCRtainTS: Uncertainty quantification for cloud removal in optical satellite time series. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 2086–2096).
Enomoto, K., Sakurada, K., & Wang, W. (2017). Filmy cloud removal on satellite imagery with multispectral conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
Garnot, V. S. F., & Landrieu, L. (2021). Panoptic segmentation of satellite image time series with convolutional temporal attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision.
Guo, J., Yang, J.-Y., Yue, H., Liu, X., & Li, K. (2022). Semi-supervised cloud detection in satellite images by considering the domain shift problem. Remote Sensing, 14(11), 2641.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International Conference on Learning Representations.
Miyato, T., Maeda, S.-i., Koyama, M., & Ishii, S. (2018). Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Pan, H. (2020). Cloud removal for remote sensing imagery via spatial attention generative adversarial network. arXiv preprint arXiv:2009.13015.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention.
Sintarasirikulchai, W., Kasetkasem, T., Isshiki, T., Chanwimaluang, T., & Rakwatin, P. (2018). A multi-temporal convolutional autoencoder neural network for cloud removal in remote sensing images. In Proceedings of the 15th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (pp. 360–363).
Sohn, K., Berthelot, D., Li, C.-L., Zhang, Z., Carlini, N., Cubuk, E. D., Kurakin, A., Zhang, H., & Raffel, C. (2020). FixMatch: Simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems.
Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning. Advances in Neural Information Processing Systems.
Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2018). Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
Xie, Q., Dai, Z., Hovy, E., Luong, M.-T., & Le, Q. V. (2020). Unsupervised data augmentation for consistency training. Advances in Neural Information Processing Systems.
Xu, M., Deng, F., Jia, S., Jia, X., & Plaza, A. J. (2022). Attention mechanism-based generative adversarial networks for cloud removal in Landsat images (AMGAN-CR). Remote Sensing of Environment, 271, 112902.
Xu, Z., Wu, K., Wang, W., Lyu, X., & Ren, P. (2022). Semi-supervised thin cloud removal with mutually beneficial guides. ISPRS Journal of Photogrammetry and Remote Sensing, 192, 327–343.
Zhang, B., Wang, Y., Hou, W., Wu, H., Wang, J., Okumura, M., & Shinozaki, T. (2021). FlexMatch: Boosting semi-supervised learning with curriculum pseudo labeling. Advances in Neural Information Processing Systems.
Zhu, X. X., Tuia, D., Mou, L., Xia, G.-S., Zhang, L., Xu, F., & Fraundorfer, F. (2017). Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine.
Zupanc, A., et al. (2019). Sentinel-2 cloudless: Global cloud and cloud-shadow detection with Sentinel-2.