| Graduate Student: | 方競賢 Fang, Chin-Hsieng |
|---|---|
| Thesis Title: | 時空混合空間的行為辨識系統 Spatio-temporal Space Learning for Human Action Recognition |
| Advisor: | 連震杰 Lien, Jenn-Jier James |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2010 |
| Graduation Academic Year: | 98 (ROC calendar, i.e., 2009-2010) |
| Language: | English |
| Pages: | 46 |
| Chinese Keywords: | 行為辨識 (Human Action Recognition) |
| Foreign Keywords: | Human Action Recognition |
In this thesis we propose a new action recognition system. The central idea is that, in addition to the spatial information of single frames, we also incorporate the temporal information of the video. Starting from this idea, we propose five data representations that combine temporal and spatial information. The system takes a binarized action video as input and determines what action the person in the video is performing; it can also recognize a person's behavior in real time, so its range of applications is broad. Moreover, the recognition time of our system is short: compared with related action recognition techniques, our system balances speed and accuracy. In addition, we propose a locality-preserving dimensionality reduction technique, "ALPP", to produce a more meaningful subspace.
Experiments show that our system not only achieves high accuracy but also keeps the time cost low: the testing time is short enough for real-time processing. We also test the system's tolerance to noise, and the results show that it maintains good accuracy under the influence of noise.
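The five spatio-temporal representations are defined in Chapter 2.2 of the thesis and are not spelled out in this abstract. As a minimal sketch of the underlying idea only, the following assumes the simplest such representation: stacking a sliding window of consecutive binary silhouette frames into one feature vector, so that each sample carries both the spatial shape and its short-term temporal evolution. The function name and window length here are illustrative assumptions, not the thesis's actual definitions.

```python
import numpy as np

def stack_silhouettes(frames, T=5):
    """Illustrative spatio-temporal feature (assumption, not the
    thesis's representation): stack sliding windows of T consecutive
    binary silhouette frames into single feature vectors.

    frames : (N, H, W) array of binarized silhouette images.
    Returns: (N - T + 1, T * H * W) array; row i is the window
    [frames[i], ..., frames[i + T - 1]] flattened into one vector.
    """
    N, H, W = frames.shape
    windows = []
    for i in range(N - T + 1):
        # Concatenating T frames lets one sample encode both the
        # spatial pose and its short-term motion.
        windows.append(frames[i:i + T].reshape(-1))
    return np.asarray(windows, dtype=np.float64)

# Toy usage: a 20-frame video of 64x48 binary silhouettes.
video = (np.random.rand(20, 64, 48) > 0.5).astype(np.uint8)
X = stack_silhouettes(video, T=5)   # shape (16, 15360)
```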
In this thesis we propose a novel framework for human action recognition. The central idea of our framework is to add temporal information to the recognition process rather than relying on the spatial information of single frames alone. To this end, we propose five kinds of spatio-temporal representations in Chapter 2.2, and adding this temporal information yields a clear improvement. Moreover, we introduce a new dimensionality reduction method, Adaptive Locality Preserving Projections (ALPP), which learns a more meaningful spatial subspace. The experimental results demonstrate that our method recognizes actions well; in particular, the DTM and DTMWB variants of our framework achieve high accuracy rates. We also evaluate the framework on noisy data to assess its robustness: accuracy remains high in the presence of noise, which shows that our framework has good noise tolerance.
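ALPP's adaptive weighting is not detailed in the abstract, so the sketch below can only show the method it extends: standard Locality Preserving Projections (He and Niyogi, 2003). LPP builds a heat-kernel affinity on a k-nearest-neighbor graph and solves the generalized eigenproblem X^T L X a = λ X^T D X a, keeping the eigenvectors with the smallest eigenvalues as projection axes. All parameter choices here are illustrative; in practice a PCA step usually precedes this when the input dimensionality is large.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def lpp(X, k=5, t=1.0, d=2):
    """Standard Locality Preserving Projections (He & Niyogi, 2003);
    a baseline sketch, not the thesis's ALPP.

    X : (n, D) data matrix, one sample per row.
    k : number of nearest neighbours in the affinity graph.
    t : heat-kernel bandwidth.
    d : target dimensionality.
    Returns a (D, d) projection matrix A; embed with X @ A.
    """
    n = X.shape[0]
    D2 = cdist(X, X, 'sqeuclidean')

    # Symmetric k-NN graph with heat-kernel weights.
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D2[i])[1:k + 1]          # skip self
        W[i, idx] = np.exp(-D2[i, idx] / t)
    W = np.maximum(W, W.T)

    Dg = np.diag(W.sum(axis=1))                    # degree matrix
    L = Dg - W                                     # graph Laplacian

    # Generalized eigenproblem X^T L X a = lam X^T Dg X a; the
    # smallest eigenvalues give the locality-preserving axes.
    M1 = X.T @ L @ X
    M2 = X.T @ Dg @ X + 1e-6 * np.eye(X.shape[1])  # regularize
    vals, vecs = eigh(M1, M2)
    return vecs[:, :d]

# Toy usage: project 100 random 50-D samples down to 2-D.
X = np.random.rand(100, 50)
A = lpp(X, k=5, t=1.0, d=2)
Y = X @ A   # (100, 2) embedding
```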