Graduate student: | 陳佑銘 Chen, You-Ming |
---|---|
Thesis title: | 應用權重三維骨架關節於動作辨識之研究 On the Study of Using Weighted 3D Skeleton Joints on Action Recognition |
Advisor: | 楊竹星 Yang, Chu-Sing |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
Year of publication: | 2012 |
Academic year of graduation: | 101 |
Language: | Chinese |
Number of pages: | 63 |
Keywords (Chinese): | 深度影像, 動態時間扭曲法, 動作辨識 |
Keywords (English): | Depth Image, Dynamic Time Warping, Action Recognition |
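In computer vision, human action recognition has long been an important research topic. When a conventional camera is used to recognize human poses, foreground extraction and recognition are severely affected by lighting, shadows, and overlapping limbs; using depth information as an aid yields better results.
This study uses a Microsoft Kinect camera to acquire depth images and applies a skeleton tracking algorithm to obtain the skeletal joint structure of the human body. The 3D coordinates of each joint are converted into joint orientations, which serve as the features of an action pose. These features are then combined with Dynamic Time Warping (DTW) to address the recognition errors that arise when the same action is performed at different speeds. When an unknown action is input, it is compared against every action in the database, and the action with the highest similarity is taken as the recognition result. A drawback of DTW, however, is its high computational cost. This study therefore analyzes the share of each joint's accumulated rotational change within each action, assigns each joint a weight according to that share, and identifies the joints mainly used in each action. The actions are then grouped accordingly, with the aim of improving both matching speed and recognition rate. The experiments target whole-body actions with depth variation and compare the recognition rate against other action recognition methods. They show that the proposed method not only handles inconsistent movement speed but also achieves a high recognition rate.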
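The matching step described above can be sketched as follows. This is a minimal illustration of textbook DTW with nearest-template classification, not the thesis's implementation: the function names, the Euclidean frame distance, and the toy one-dimensional "orientation" features are all assumptions made for the example.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Textbook DTW cost between two frame sequences, O(len(a) * len(b))."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning the first i and first j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of seq_b
                                 cost[i][j - 1],      # repeat a frame of seq_a
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


def classify(query, templates):
    """Return the label of the stored action with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical toy database: one feature value per frame.
templates = {
    "raise_arm": [[0.0], [0.5], [1.0]],
    "lower_arm": [[1.0], [0.5], [0.0]],
}
# The same "raise" motion performed at half speed (each frame repeated).
slow_raise = [[0.0], [0.0], [0.5], [0.5], [1.0], [1.0]]
print(classify(slow_raise, templates))  # prints "raise_arm"
```

Because DTW may repeat frames on either side, the half-speed query aligns with the `raise_arm` template at zero cost, which is the speed invariance the abstract relies on. The nested loops also show where the quadratic cost comes from.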
Human action recognition is an important area of computer vision. Motion video has traditionally been captured with a CCD camera, which is an economical and basic way to input body poses, but color information is easily affected by brightness and occlusion. Using a depth sensor should lead to better action recognition.
We obtain depth information from Kinect, a motion-sensing input device by Microsoft. Instead of using the 3D coordinates of the body skeleton joints, we build our feature vector from joint orientations over the time series, which are invariant to human body size. Dynamic Time Warping is then applied to the resulting feature vector. However, the drawback of Dynamic Time Warping is its high computational cost. Therefore, we use the accumulated angular variation of the skeleton joints to analyze actions, and then divide the actions into several classes according to this variation.
In the experiments, we try different feature extraction methods and compare their recognition rates and execution times. The results show the effectiveness of the proposed method and its robustness against variation in speed.
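The joint weighting idea in the abstract, where each joint is weighted by its share of the accumulated angular variation, might look like the sketch below. The abstract does not say how orientations are represented, so this example assumes unit bone-direction vectors and measures the angle between consecutive frames. The joint names, helper functions, and toy coordinates are hypothetical.

```python
import math


def unit_direction(parent, child):
    """Unit vector from a parent joint to a child joint (3D coordinates)."""
    d = [c - p for p, c in zip(parent, child)]
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]


def angle_between(u, v):
    """Angle in radians between two unit vectors, clamped for numeric safety."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def joint_weights(directions_per_joint):
    """Map each joint to its share of the total accumulated angular change."""
    accumulated = {
        joint: sum(angle_between(a, b) for a, b in zip(frames, frames[1:]))
        for joint, frames in directions_per_joint.items()
    }
    total = sum(accumulated.values())
    if total == 0.0:
        # No joint moved at all: fall back to uniform weights (an assumption).
        return {joint: 1.0 / len(accumulated) for joint in accumulated}
    return {joint: value / total for joint, value in accumulated.items()}


# Hypothetical toy action: the forearm sweeps 90 degrees while the leg stays still.
shoulder = (0.0, 0.0, 0.0)
elbow_frames = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
hip = (0.0, -1.0, 0.0)
knee_frames = [(0.0, -2.0, 0.0)] * 3

directions = {
    "right_elbow": [unit_direction(shoulder, e) for e in elbow_frames],
    "right_knee": [unit_direction(hip, k) for k in knee_frames],
}
weights = joint_weights(directions)
main_joint = max(weights, key=weights.get)
print(main_joint)  # prints "right_elbow"
```

Normalizing each bone vector removes limb length from the feature, which mirrors the body-size invariance the abstract claims for orientations. The joint with the largest weight is a natural candidate for the "main joint" used to group actions before the more expensive DTW comparison.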