| Graduate student: | 邱碁森 Chiou, Chi-Sen |
|---|---|
| Thesis title: | 結合深度影像梯度特徵與流形學習於動作辨識之研究 A Study on Action Recognition Using Manifold Learning and Gradient Feature of Depth Image |
| Advisor: | 楊竹星 Yang, Chu-Sing |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 (ROC calendar) |
| Language: | Chinese |
| Pages: | 50 |
| Keywords (Chinese): | 深度影像、方向梯度直方圖、流形學習、動作辨識 |
| Keywords (English): | Depth Image, Histogram of Oriented Gradient, Manifold Learning, Action Recognition |
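Human action recognition has long been an important research topic in computer vision. Capturing motion video with a camera is a basic, low-cost way to acquire data, but when a conventional camera is used to recognize human poses, foreground extraction and recognition are badly affected by lighting and shadows, backgrounds of the same color as the subject, and overlapping limbs. Depth information can help achieve better results.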
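This study uses the Microsoft Kinect camera to acquire depth images and proposes a method that obtains the target object by connected-component labeling in three-dimensional space. Gradients are then computed from the depth image, and the histogram of oriented gradients (HOG) is extracted as the feature of each action pose. Finally, a manifold learning method maps each continuous action onto a low-dimensional graph. After the actions in the database have been trained in this way, each action has its own manifold distribution and projection matrix. When an unknown action is input, its similarity to each distribution is computed to identify the action it belongs to.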
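The connected-component step described above can be illustrated with a minimal sketch. The abstract does not specify how connectivity is defined, so this example assumes that two 4-adjacent pixels belong to the same component when their depth values differ by no more than a threshold. The function name `label_depth_components`, the `max_gap` threshold, and the use of `0` as an invalid reading are all hypothetical choices for illustration, not the thesis's implementation.

```python
from collections import deque


def label_depth_components(depth, max_gap=50, invalid=0):
    """Label 4-connected pixels whose depths differ by at most max_gap.

    depth   : list of rows of depth values (e.g. millimetres)
    max_gap : assumed threshold on the depth step between neighbours
    invalid : assumed value marking "no reading"; such pixels keep label 0
    Returns (label image, number of components); labels start at 1.
    """
    rows, cols = len(depth), len(depth[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if depth[r0][c0] == invalid or labels[r0][c0]:
                continue
            next_label += 1
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill of one component
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if labels[nr][nc] or depth[nr][nc] == invalid:
                        continue
                    # Image-plane neighbours are joined only if they are
                    # also close along the depth axis.
                    if abs(depth[nr][nc] - depth[r][c]) <= max_gap:
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels, next_label


# Toy frame: a near object (about 1000 mm) in front of a far wall (3000 mm).
frame = [
    [3000, 3000, 3000, 3000],
    [3000, 1000, 1010, 3000],
    [3000, 1005, 1020, 3000],
    [3000, 3000,    0, 3000],
]
labels, count = label_depth_components(frame)
```

In this toy frame the wall and the near object end up in separate components even though they touch in the image plane, which is the property that makes depth useful when foreground and background share the same color.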
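The experiments try different foreground extraction and feature extraction methods, together with the distributions produced by several manifold learning methods, and compare their execution speed and recognition rate. The results show that the proposed method helps obtain the correct target and improves accuracy.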
Human action recognition is an important area of computer vision. Motion video is conventionally captured with a CCD camera, which is an economical and basic way to acquire body poses, but color information is easily affected by brightness and occlusion. Using a depth sensor should lead to better action recognition.
We obtain depth information from Kinect, a motion-sensing input device by Microsoft. A 3D connected-component labeling method is presented to extract the object from the scene. We use histograms of oriented gradients as feature descriptors of body pose and map each pose to a Euclidean space by manifold learning. After the existing actions in the database have been trained, the projection matrices and manifold distributions are stored as action models. The similarity between new data and the existing models can then be computed to recognize a new action.
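As a rough illustration of the feature step, the sketch below computes a histogram of oriented gradients for a single cell of a depth image. It is a simplified variant under stated assumptions: central-difference gradients, unsigned orientations over 0 to 180 degrees, 9 bins, hard bin assignment, and L2 normalization of the one cell. It omits the overlapping-block normalization and bin interpolation of a full HOG pipeline, and the function name `hog_cell` and its parameters are hypothetical.

```python
import math


def hog_cell(depth, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    depth : list of rows of depth values for one cell (at least 3x3)
    bins  : number of orientation bins over [0, 180) degrees (assumed)
    Border pixels are skipped so central differences stay in bounds.
    """
    rows, cols = len(depth), len(depth[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = depth[r][c + 1] - depth[r][c - 1]
            gy = depth[r + 1][c] - depth[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region contributes nothing
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra "% bins" guards against an angle that rounds to 180.
            hist[int(angle // bin_width) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A cell whose depth increases left to right: all gradient energy
# falls in the first orientation bin (around 0 degrees).
horizontal_ramp = [[10 * c for c in range(4)] for _ in range(4)]
descriptor = hog_cell(horizontal_ramp)
```

A pose descriptor would concatenate such histograms over a grid of cells covering the extracted foreground region; the resulting vector is what a manifold learning method would then project to a low-dimensional space.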
In the experiments, we test different foreground and feature extraction methods with manifold learning and compare their recognition rates and execution times. The results show the effectiveness of the proposed method.