| Field | Value |
| --- | --- |
| Graduate Student | 沈昇廷 Shen, Sheng-Ting |
| Thesis Title | Fish Tracking by Two-Staged Convolutional Neural Network Training and Length Measurement by PCA in Real Fishery Catch Event Stereo Video |
| Advisor | 詹寶珠 Chung, Pau-Choo |
| Degree | Master |
| Department | Institute of Computer & Communication Engineering, College of Electrical Engineering & Computer Science |
| Publication Year | 2017 |
| Graduation Academic Year | 105 |
| Language | English |
| Pages | 52 |
| Chinese Keywords (translated) | convolutional neural network, RGB-D camera, Kalman filter, two-stage training, 3D object tracking |
| English Keywords | convolutional neural network, electronic monitoring, object tracking, stereo video, stereo segmentation, plane detection, fishery |
This thesis develops a convolutional-neural-network-based system for counting caught fish and measuring fish length at sea. A CNN detects fish-catch events; a stereo (3D) camera computes the continuous change of the fish body's orientation in 3D space; and a Kalman filter tracks each fish's direction of motion to compensate for frames the CNN fails to detect, enabling real-time length measurement during at-sea catch counting. First, because the limited training set cannot cover all scenarios (weather, lighting, and the differing camera mounting angles and backgrounds across vessels), we design a two-stage training scheme that adapts the network to each vessel and reduces false alarms; this method makes efficient use of the limited data set and manual labels to deploy an adaptive object-detection system across multiple vessels. Second, using the RGB-D camera footage provided by NOAA, we compute each fish's position in 3D space during the catch; principal component analysis (PCA) on the distribution of the fish's 3D pixel points yields the principal axis, from which the maximum head-to-tail extent estimates the body length. This automates a tedious measurement task in high-volume catch operations and reduces the crew's workload through real-time, automated measurement. Third, the CNN detector occasionally misses frames, which makes the output discontinuous; we use a Kalman filter to predict the positions in the missed frames, producing the 3D bounding-box output as a weighted sum of the CNN detection and the Kalman prediction. We define three parameters, Pose, Size, and Orientation, to describe the swimming fish's state in 3D space; together they give a more appropriate description of the 3D bounding-box output. Experimental results show that in a 10-minute catch sequence containing 62 catch events, the proposed method captures 58 events with 2 false alarms, for an accuracy of 93.54%, and achieves an average error of only 13% in 3D tracking and fish length measurement, validating the effectiveness of the proposed method. We expect the proposed system to serve NOAA and fishery crews as an electronic-monitoring aid for catch operations, automating tedious work and easing the measurement burden during fishing.
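The PCA-based length estimate described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the fish has already been segmented into a set of 3D points, takes the first principal axis of the point cloud as the head-to-tail direction, and reports the extent of the cloud along that axis as the body length.

```python
import numpy as np

def fish_length_pca(points_3d):
    """Estimate body length as the extent of a 3D point cloud
    along its first principal axis (assumed head-to-tail)."""
    pts = np.asarray(points_3d, dtype=float)
    centered = pts - pts.mean(axis=0)
    # Principal axes via eigen-decomposition of the 3x3 covariance.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    major_axis = eigvecs[:, np.argmax(eigvals)]
    # Project every point onto the major axis; length = max spread.
    proj = centered @ major_axis
    return proj.max() - proj.min()

# Synthetic elongated cloud: a ~0.4 m "fish" lying along a tilted axis.
rng = np.random.default_rng(0)
axis = np.array([0.8, 0.1, 0.6])
axis /= np.linalg.norm(axis)
t = rng.uniform(-0.2, 0.2, size=(500, 1))
cloud = t * axis + rng.normal(scale=0.005, size=(500, 3))
length = fish_length_pca(cloud)
print(length)
```

Because PCA is orientation-free, the estimate does not depend on how the fish is tilted relative to the camera, which is what makes it usable on a deforming body in stereo footage.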
The purpose of this thesis is to use a state-of-the-art real-time object detector to detect rail catch events and, combined with an RGB-D camera, to track fish and estimate their lengths in 3D space. Electronic monitoring (EM) of fishery activities has drawn increasing attention. The wild sea surface, however, contains a dynamic background, noise from the sea water, and deformable objects, which make conventional tracking and segmentation methods unreliable. In this thesis, we take advantage of deep learning and present a tracking and segmentation system for stereo video that monitors fish catching on the wild sea surface, built on a state-of-the-art pre-trained real-time convolutional neural network (CNN) object detector. Since the CNN detector operates frame by frame, it does not track each object continuously; in some cases it misses frames because the detection confidence falls below the threshold. To deal with this problem and make the system more reliable, a Kalman-filter-based tracking system rescores the multiple object proposals and tracks the objects, filling the frames missed by the CNN and making the length measurement more robust. Then, to segment the objects, we first apply a sampling-and-scoring strategy to classify the background plane based on background subtraction and the disparity map, and then refine the object segmentation using color and geometric features. With the segmentation results, we can measure the 3D lengths of the objects and support the tracking system as well. Experimental results show that reliable tracking and measurement performance is achieved under a noisy and dynamic environment.
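The Kalman-filter gap filling can be sketched as below. This is a generic constant-velocity filter over a bounding-box centre, not the thesis's full 3D Pose/Size/Orientation tracker: when the detector misses a frame, the prediction step alone supplies the position; when a detection arrives, the update blends prediction and measurement through the Kalman gain, which realizes the weighted combination of the two sources.

```python
import numpy as np

class BoxTracker:
    """Constant-velocity Kalman filter over a 2D bounding-box centre.
    State: [x, y, vx, vy]; measurement: [x, y]."""
    def __init__(self, x, y):
        self.s = np.array([x, y, 0.0, 0.0])                     # state
        self.P = np.eye(4) * 10.0                               # covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0   # motion model
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * 0.01                               # process noise
        self.R = np.eye(2)                                      # measurement noise

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.s       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.s = self.s + K @ y                          # weighted blend
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s[:2]

# A fish moving right at 2 px/frame; the detector misses frame 3 (None).
trk = BoxTracker(0.0, 0.0)
track = []
for det in [(2, 0), (4, 0), None, (8, 0)]:
    pos = trk.predict()          # always predict
    if det is not None:
        pos = trk.update(det)    # blend in the detection when available
    track.append(tuple(pos))
print(track)
```

On the missed frame the track continues from the predicted position, which is what removes the visual discontinuity described above; the full system would carry the same idea into 3D with the segmentation-derived box parameters.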