| Author | 楊景倫 (Yang, Jing-Lune) |
|---|---|
| Thesis Title | 基於深度遮罩式區域卷積神經網路之自動化蘭花瓶苗影像表徵萃取與計算 (Automatic Orchid Bottle Seedling Image Feature Extraction and Measurement based on Deep Mask Regions Convolutional Neural Networks) |
| Advisor | 王振興 (Wang, Jeen-Shing) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2019 |
| Graduation Academic Year | 107 (2018-2019) |
| Language | Chinese |
| Pages | 69 |
| Chinese Keywords | 蘭花瓶苗、深度學習、遮罩式區域卷積神經網路、表徵萃取與計算、影像辨識、精準培育 |
| English Keywords | Orchid Bottle Seedling, Deep Learning, Mask Regions Convolutional Neural Network (Mask R-CNN), Feature Extraction and Measurement, Image Detection, Precise Cultivation |
| Access Count | Views: 213, Downloads: 3 |
This thesis aims to develop a Mask R-CNN-based image recognition algorithm and an automatic feature measurement algorithm for orchid bottle seedlings, using artificial intelligence to extract features of the seedlings' growth process and thereby work toward the goal of precise cultivation. The study first collected orchid bottle seedling images taken from different shooting angles at an orchid plantation to serve as the training and testing data sets for the image recognition model; the images were then augmented by distortion, and finally annotated to produce label images of the seedling-part features, which served as the gold standard for training. The proposed Mask R-CNN-based orchid bottle seedling image recognition algorithm uses, as its main feature extraction tools, ten Mask R-CNN models formed by pairing residual networks (ResNet) of different depths (ResNet-26, ResNet-41, ResNet-50, ResNet-101, and ResNet-152) with either a fully convolutional network (FCN) or a U-Network (UNet), and recognizes the seedling part features (leaves, roots, the white parts of root tips, the green parts of root tips, the yellow parts of withered leaves, and the green parts of withered leaves) automatically and effectively. The experimental results show that ResNet-101-UNet outperforms the other models, reaching an average overall AP of 77.89% with a training time of 199 ms per image. In addition, the proposed feature measurement algorithm computes feature values such as length, width, count, and area for the different seedling parts recognized by these models. The experimental results show that the leaf area, which is easily affected by the shooting angle, has the higher average percentage error of 16.47±6.41%, whereas the root length has the lower average percentage error of 7.28±3.01%; the average errors of the overall feature values are all effectively suppressed. These results verify the feasibility of applying the developed feature measurement algorithm to the computation of orchid bottle seedling part features. We hope that in the future the algorithms can be implemented on a sensing system and deployed in actual plantation settings to achieve the goal of precise cultivation of orchid bottle seedlings.
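The abstract above describes pairing ResNet backbones of different depths with FCN or UNet mask branches inside Mask R-CNN, but it does not name the software framework used in the thesis. As a hedged illustration only, the sketch below fine-tunes a standard torchvision Mask R-CNN (ResNet-50 + FPN backbone, FCN-style mask head) for the six seedling part classes; the class count, layer choices, and dummy input are assumptions for demonstration, not the thesis configuration.

```python
# Illustrative sketch only (assumes torchvision >= 0.13); not the thesis implementation.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Background + the six seedling parts named in the abstract
# (leaf, root, white root tip, green root tip, yellow withered leaf, green withered leaf).
NUM_CLASSES = 1 + 6

# Standard Mask R-CNN with a ResNet-50 + FPN backbone, pre-trained on COCO.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box classification head for the seedling classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Replace the mask prediction head (an FCN-style head in torchvision).
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, NUM_CLASSES)

# After fine-tuning on the labeled seedling images, inference yields per-instance
# masks, labels, and scores that can feed the measurement step described below.
model.eval()
with torch.no_grad():
    dummy_image = torch.rand(3, 800, 800)  # stand-in for a bottle-seedling photo
    prediction = model([dummy_image])[0]
    masks = prediction["masks"]            # (N, 1, H, W) soft masks in [0, 1]
```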
This thesis aims to develop automatic orchid bottle seedling image feature extraction and measurement algorithms based on mask regions convolutional neural networks (Mask R-CNN) for extracting the important growth features of orchid bottle seedlings, with the goal of precise cultivation. In this study, to train and test the Mask R-CNN models, orchid bottle seedling images taken from different viewing angles were collected from an orchid plantation factory in southern Taiwan. The original images were first labeled with the contours of visible parts such as leaves and roots; these contours are referred to as masks. The labeled images were then distorted to increase the diversity of the training and testing data. Finally, these images with their corresponding masks served as the gold standard for network training. A Mask R-CNN-based image detection algorithm has been developed to extract the features of orchid bottle seedlings, including leaves, roots, white root tips, green root tips, and the yellow and green parts of withered leaves, effectively and automatically. Ten different Mask R-CNN models were constructed for performance comparison; they pair residual networks (ResNet) of different depths, namely ResNet-26, ResNet-41, ResNet-50, ResNet-101, and ResNet-152, with either a fully convolutional network (FCN) or a U-Network (UNet). The experimental results show that ResNet-101-UNet outperforms the other models, with an average precision (AP) of feature extraction of 77.89% and a training time of 199 ms/image. In addition to feature extraction, a feature measurement algorithm has been developed to measure the features, such as the number of leaves and the length, width, and area of each leaf, from the seedling parts detected by the Mask R-CNN models. The experimental results show that the average percentage error of the leaf area measurement is 16.47±6.41%, mainly due to shading or blocking by other leaves and to curled leaves, while the average percentage error of the root length measurement is 7.28±3.01%. The overall average errors of the feature measurements were satisfactory, which validates the effectiveness of the proposed methods for the feature extraction and measurement of orchid bottle seedlings. In the future, we hope these algorithms can be applied to the orchid plantation industry to reach the goal of precise cultivation of orchid bottle seedlings.
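The measurement algorithm is summarized only at the level of the quantities it produces (count, length, width, and area per part); the exact procedure is not given in the abstracts. The minimal sketch below shows one common way to derive such values from a binary instance mask, using the mask's pixel count for area and a skeleton for length; the `mm_per_pixel` calibration factor, the mean-width heuristic (area divided by skeleton length), and all function names are illustrative assumptions, not the thesis method.

```python
# Minimal sketch, assuming masks come from a Mask R-CNN-style detector and a
# known pixel-to-millimetre scale; not the measurement procedure used in the thesis.
import numpy as np
from skimage.morphology import skeletonize


def measure_mask(mask, mm_per_pixel):
    """Estimate area, length, and mean width of one seedling part from its binary mask."""
    mask = mask.astype(bool)
    area_px = int(mask.sum())                  # area = number of mask pixels
    length_px = int(skeletonize(mask).sum())   # length ~ pixels on the 1-px-wide centreline
    width_px = area_px / length_px if length_px else 0.0  # assumed mean-width heuristic
    return {
        "area_mm2": area_px * mm_per_pixel ** 2,
        "length_mm": length_px * mm_per_pixel,
        "width_mm": width_px * mm_per_pixel,
    }


def summarize(masks, labels, mm_per_pixel):
    """Count instances per class and total their measured areas."""
    summary = {}
    for mask, label in zip(masks, labels):
        stats = summary.setdefault(label, {"count": 0, "area_mm2": 0.0})
        stats["count"] += 1
        stats["area_mm2"] += measure_mask(mask, mm_per_pixel)["area_mm2"]
    return summary


# Example with a synthetic rectangular "leaf" mask and an assumed 0.1 mm/pixel scale.
leaf = np.zeros((100, 100), dtype=bool)
leaf[40:60, 10:90] = True
print(measure_mask(leaf, mm_per_pixel=0.1))
print(summarize([leaf], ["leaf"], mm_per_pixel=0.1))
```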