| 研究生 Author: | 陳敬堯 Chen, Jing-Yao |
|---|---|
| 論文名稱 Title: | 利用深層特徵編碼進行手部姿態追蹤 Hand Pose Tracking using Deep Feature Encodings |
| 指導教授 Advisor: | 吳馬丁 Nordling, Torbjörn |
| 學位類別 Degree: | 碩士 Master |
| 系所名稱 Department: | 工學院機械工程學系 College of Engineering, Department of Mechanical Engineering |
| 論文出版年 Year of publication: | 2022 |
| 畢業學年度 Academic year: | 110 |
| 語文別 Language: | 英文 English |
| 論文頁數 Pages: | 77 |
| 中文關鍵詞 Chinese keywords: | 姿勢估計、特徵追蹤、自動編碼器、無監督式學習法、深度學習 |
| 外文關鍵詞 English keywords: | Pose estimation, feature tracking, autoencoder, unsupervised learning, deep learning |
背景:帕金森氏症是一種漸進性的神經退化性疾病,除了導致無意識的運動(即震顫),也會使患者有執行動作上的困難,如僵硬、遲緩,以及步態、平衡等協調性問題。正確的治療能明顯改善運動能力與生活品質,因此症狀的量化至關重要。本論文的重點在於以攝影機對統一帕金森氏症評定量表 (UPDRS) 的姿勢性顫抖測試中的微小非自主動作進行量化。先前已有研究成功地將姿勢估計應用於其他領域的運動分析,然而 OpenPose(OpenCV 中姿勢估計的標準方法)在我們的案例中產生了不理想的結果。
目標:本論文的目標是將深度特徵編碼 (DFE) 納入姿勢估計,用於追蹤人類手部運動,並探討其精確度。更精確的震顫追蹤將有助於我們未來輔助帕金森氏症的診斷。
方法:DFE 是由 Chang 與 Nordling 於 2021 年提出的技術,他們以人臉資料訓練自動編碼器,使其能以亞像素精確度追蹤皮膚特徵。此處我們以人類手部資料重新訓練 DFE,並加入膚色過濾與時間資訊兩項機制以提高追蹤精確度。姿勢估計是透過追蹤手部的 21 個關鍵點來完成。為了評估姿勢估計的精確度與影像品質的關係,我們對高品質的影像添加高斯模糊或高斯雜訊,或將解析度減半,以破壞影像品質。
結果與結論:在 8 個影片中的 40 個手動標記影格上,重新訓練的 DFE 的平均絕對誤差 (MAE) 為 1.26 像素,而 OpenPose 為 12.47 像素。部分關鍵點沒有明顯可辨識的特徵,使得它們比具有明顯特徵的關鍵點更難以用 DFE 正確追蹤。因此,姿勢估計中使用固定關鍵點的好處,是以比挑選最突出特徵更高的 MAE 為代價。
Background: Parkinson's disease is a progressive neurodegenerative disorder that causes both unintentional movements, i.e. tremors, and difficulty carrying out movements, such as stiffness, slowness, and gait, balance, and coordination problems. Correct treatment significantly improves the ability to move as intended and the quality of life, thus symptom quantification is essential. This thesis focuses on camera-based quantification of small involuntary movements, so-called micromovements, during the postural tremor test in the Unified Parkinson's Disease Rating Scale (UPDRS). Previously, pose estimation has been applied successfully to movement analysis in other applications, but OpenPose (the standard method for pose estimation in OpenCV) yielded unsatisfactory results in our case.
Aim: The goal of this thesis is to incorporate Deep Feature Encodings (DFE) in pose estimation for tracking of human hand movement and to explore the attainable precision. More accurate tremor tracking will allow us to better assist in the diagnosis of Parkinson's disease in the future.
Method: DFE, introduced by Chang and Nordling (2021), is an autoencoder trained on human faces that enables tracking of skin features with sub-pixel accuracy. Here it is retrained on human hands, and both skin filtering and temporal information are added to improve the tracking accuracy. The pose estimation is done by tracking 21 keypoints on the hand. To assess how the precision of the pose estimation depends on video quality, high-quality videos are degraded by adding Gaussian blur or noise, or by halving the resolution.
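The abstract names the three video degradations but gives no parameters. Below is a minimal NumPy sketch of what they could look like on a single greyscale frame; the default `sigma` values and the nearest-neighbour downsampling are assumptions for illustration, and the thesis may well use different settings or OpenCV equivalents such as `cv2.GaussianBlur` and `cv2.resize`.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Unit-sum 1-D Gaussian kernel, truncated at 3 sigma.
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(frame, sigma=2.0):
    # Separable blur: convolve every row, then every column.
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, frame, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def gaussian_noise(frame, sigma=5.0, rng=None):
    # Additive zero-mean Gaussian noise, clipped to the 8-bit range.
    if rng is None:
        rng = np.random.default_rng(0)
    return np.clip(frame + rng.normal(0.0, sigma, frame.shape), 0.0, 255.0)

def halve_resolution(frame):
    # Nearest-neighbour downsampling: keep every second row and column.
    return frame[::2, ::2]
```

Applying each degradation separately to the same high-quality video, rather than stacking them, keeps the effect of each factor on the tracking precision distinguishable.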
Results and conclusion: On a set of 40 manually labelled frames in 8 videos, the mean absolute error (MAE) of the retrained DFE is 1.26 px, while that of OpenPose is 12.47 px. Some of the keypoints have no clear distinctive feature, which makes them harder to track using DFE than keypoints with distinctive features. Thus, the benefit of having fixed keypoints in pose estimation comes at the expense of an increased MAE compared to picking the most salient features.
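The abstract does not spell out how the MAE over the 21 keypoints is aggregated; a minimal sketch, assuming it is the Euclidean pixel distance between predicted and hand-labelled keypoints averaged over all keypoints and frames, could read:

```python
import numpy as np

def keypoint_mae(pred, gt):
    """Mean absolute error between predicted and ground-truth keypoints.

    pred, gt: arrays of shape (frames, 21, 2) holding (x, y) pixel
    coordinates of the 21 hand keypoints in each labelled frame.
    Returns the Euclidean distance in pixels, averaged over all
    keypoints and frames.
    """
    distances = np.linalg.norm(pred - gt, axis=-1)  # (frames, 21)
    return float(distances.mean())
```

Under this definition, a tracker that is consistently off by (3, 4) px on every keypoint would score an MAE of 5 px.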
Afifi, M. (2019). 11k hands: Gender recognition and biometric identification using a large dataset of hand images. Multimedia Tools and Applications, 78(15):20835–20854.
Amin, S., Andriluka, M., Rohrbach, M., and Schiele, B. (2013). Multi-view pictorial structures for 3d human pose estimation. In BMVC, volume 1.
Andriluka, M., Roth, S., and Schiele, B. (2010). Monocular 3d pose estimation and tracking by detection. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 623–630. IEEE.
Archdeacon, T. J. (1994). Correlation and regression analysis: a historian’s guide. Univ of Wisconsin Press.
Arnab, A., Doersch, C., and Zisserman, A. (2019). Exploiting temporal context for 3d human pose estimation in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3395–3404.
Ashyani, A., Lin, C.-L., Roman, E., Yeh, T., Kuo, T., Tsai, W.-F., Lin, Y., Tu, R., Su, A., Wang, C.-C., Tan, C.-H., and Nordling, T. E. M. (2022). Digitization of updrs upper limb motor examinations towards automated quantification of symptoms of parkinson’s disease [unpublished manuscript]. Manuscript in preparation.
Baker, S. and Matthews, I. (2004). Lucas-kanade 20 years on: A unifying framework. International journal of computer vision, 56(3):221–255.
Bay, H., Tuytelaars, T., and Gool, L. V. (2006). Surf: Speeded up robust features. In European conference on computer vision, pages 404–417. Springer.
Belagiannis, V., Amin, S., Andriluka, M., Schiele, B., Navab, N., and Ilic, S. (2014). 3d pictorial structures for multiple human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1669–1676.
Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., and Torr, P. H. (2016). Fully-convolutional siamese networks for object tracking. In European conference on computer vision, pages 850–865. Springer.
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., and Black, M. J. (2016). Keep it smpl: Automatic estimation of 3d human pose and shape from a single image. In European conference on computer vision, pages 561–578. Springer.
Bulat, A. and Tzimiropoulos, G. (2016). Human pose estimation via convolutional part heatmap regression. In European Conference on Computer Vision, pages 717–732. Springer.
Cai, Y., Ge, L., Liu, J., Cai, J., Cham, T.-J., Yuan, J., and Thalmann, N. M. (2019). Exploiting spatial-temporal relationships for 3d pose estimation via graph convolutional networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2272–2281.
Cao, Z., Simon, T., Wei, S.-E., and Sheikh, Y. (2017). Realtime multi-person 2d pose estimation using part affinity fields. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7291–7299.
Chang, J. R. and Nordling, T. E. M. (2021). Skin feature point tracking using deep feature encodings. arXiv preprint.
Chen, C.-H. and Ramanan, D. (2017). 3d human pose estimation = 2d pose estimation + matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7035–7043.
Chen, Y., Ma, H., Wang, J., Wu, J., Wu, X., and Xie, X. (2021). Pd-net: Quantitative motor function evaluation for parkinson’s disease via automated hand gesture analysis. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 2683–2691.
Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., and Sun, J. (2018). Cascaded pyramid network for multi-person pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7103–7112.
Cheng, Y., Yang, B., Wang, B., and Tan, R. T. (2020). 3d human pose estimation using spatio-temporal networks with explicit occlusion training. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 10631–10638.
Ci, H., Wang, C., Ma, X., and Wang, Y. (2019). Optimizing network structure for 3d human pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2262–2271.
Colantoni, A., Grigoriadis, E., Sateriano, A., Venanzoni, G., and Salvati, L. (2016). Cities as selective land predators? a lesson on urban growth, deregulated planning and sprawl containment. Science of the Total Environment, 545:329–339.
Einfalt, M., Zecha, D., and Lienhart, R. (2018). Activity-conditioned continuous human pose estimation for performance analysis of athletes using the example of swimming. In 2018 IEEE winter conference on applications of computer vision (WACV), pages 446–455. IEEE.
Fang, H.-S., Xie, S., Tai, Y.-W., and Lu, C. (2017). Rmpe: Regional multi-person pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2334–2343.
Felzenszwalb, P. F. and Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International journal of computer vision, 61(1):55–79.
Gasparini, F. and Schettini, R. (2006). Skin segmentation using multiple thresholding. In Internet Imaging VII, volume 6061, page 60610F. International Society for Optics and Photonics.
Graving, J. M., Chae, D., Naik, H., Li, L., Koger, B., Costelloe, B. R., and Couzin, I. D. (2019). DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife, 8:e47994. Publisher: eLife Sciences Publications, Ltd.
Güler, R. A., Neverova, N., and Kokkinos, I. (2018). Densepose: Dense human pose estimation in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7297–7306.
He, Y., Yan, R., Fragkiadaki, K., and Yu, S.-I. (2020). Epipolar transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7779–7788.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
Hong, C., Yu, J., Wan, J., Tao, D., and Wang, M. (2015). Multimodal deep autoencoder for human pose recovery. IEEE Transactions on Image Processing, 24(12):5659–5670.
Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., and Schiele, B. (2016). Deepercut: A deeper, stronger, and faster multi-person pose estimation model. In European Conference on Computer Vision, pages 34–50. Springer.
Iqbal, U. and Gall, J. (2016). Multi-person pose estimation with local joint-to-person associations. In European Conference on Computer Vision, pages 627–642. Springer.
Iqbal, U., Molchanov, P., and Kautz, J. (2020). Weakly-supervised 3d human pose learning via multi-view images in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5243–5252.
Iskakov, K., Burkov, E., Lempitsky, V., and Malkov, Y. (2019). Learnable triangulation of human pose. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7718–7727.
Jiao, L., Wang, D., Bai, Y., Chen, P., and Liu, F. (2021). Deep learning in visual tracking: A review. IEEE Transactions on Neural Networks and Learning Systems.
Kanazawa, A., Black, M. J., Jacobs, D. W., and Malik, J. (2018). End-to-end recovery of human shape and pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7122–7131.
Kocabas, M., Athanasiou, N., and Black, M. J. (2020). Vibe: Video inference for human body pose and shape estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5253–5263.
Kocabas, M., Karagoz, S., and Akbas, E. (2018). Multiposenet: Fast multi-person pose estimation using pose residual network. In Proceedings of the European conference on computer vision (ECCV), pages 417–433.
Kostrikov, I. and Gall, J. (2014). Depth sweep regression forests for estimating 3d human pose from images. In BMVC, volume 1, page 5.
Kreiss, S., Bertoni, L., and Alahi, A. (2019). Pifpaf: Composite fields for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11977–11986.
Li, C. and Lee, G. H. (2019). Generating multiple hypotheses for 3d human pose estimation with mixture density network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9887–9895.
Li, S., Ke, L., Pratama, K., Tai, Y.-W., Tang, C.-K., and Cheng, K.-T. (2020). Cascaded deep monocular 3d human pose estimation with evolutionary training data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6173–6183.
Li, S., Zhang, W., and Chan, A. B. (2015). Maximum-margin structured learning with deep networks for 3d human pose estimation. In Proceedings of the IEEE international conference on computer vision, pages 2848–2856.
Li, X., Cui, R., Sun, L., Aifantis, K. E., Fan, Y., Feng, Q., Cui, F., and Watari, F. (2014). 3D-printed biopolymers for tissue engineering application. International Journal of Polymer Science, 2014.
Liu, R., Shen, J., Wang, H., Chen, C., Cheung, S.-c., and Asari, V. (2020). Attention mechanism exploits temporal contexts: Real-time 3d human pose reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5064–5073.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110.
Martinez, J., Hossain, R., Romero, J., and Little, J. J. (2017). A simple yet effective baseline for 3d human pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2640–2649.
Mathis, A., Mamidanna, P., Cury, K. M., Abe, T., Murthy, V. N., Mathis, M. W., and Bethge, M. (2018). DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience, 21(9):1281–1289. Number: 9 Publisher: Nature Publishing Group.
Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Elgharib, M., Fua, P., Seidel, H.-P., Rhodin, H., Pons-Moll, G., and Theobalt, C. (2019). Xnect: Real-time multi-person 3d human pose estimation with a single rgb camera. arXiv preprint arXiv:1907.00837.
Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Elgharib, M., Fua, P., Seidel, H.-P., Rhodin, H., Pons-Moll, G., and Theobalt, C. (2020). Xnect: Real-time multi-person 3d motion capture with a single rgb camera. ACM Transactions on Graphics (TOG), 39(4):82–1.
Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Sridhar, S., Pons-Moll, G., and Theobalt, C. (2018). Single-shot multi-person 3d pose estimation from monocular rgb. In 2018 International Conference on 3D Vision (3DV), pages 120–130. IEEE.
Mehta, D., Sridhar, S., Sotnychenko, O., Rhodin, H., Shafiei, M., Seidel, H.-P., Xu, W., Casas, D., and Theobalt, C. (2017). Vnect: Real-time 3d human pose estimation with a single rgb camera. ACM Transactions on Graphics (TOG), 36(4):1–14.
Mittal, A., Saad, M. A., and Bovik, A. C. (2015). A completely blind video integrity oracle. IEEE Transactions on Image Processing, 25(1):289–300.
Moon, G., Chang, J. Y., and Lee, K. M. (2019). Posefix: Model-agnostic general human pose refinement network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7773–7781.
Moreno-Noguer, F. (2017). 3d human pose estimation from a single image via distance matrix regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2823–2832.
Nagarkoti, A., Teotia, R., Mahale, A. K., and Das, P. K. (2019). Realtime indoor workout analysis using machine learning & computer vision. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1440–1443. IEEE.
Newell, A., Huang, Z., and Deng, J. (2016). Associative embedding: End-to-end learning for joint detection and grouping. arXiv preprint arXiv:1611.05424.
Nie, B. X., Wei, P., and Zhu, S.-C. (2017). Monocular 3d human pose estimation by predicting depth on joints. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 3467–3475. IEEE.
Nie, X., Feng, J., Zuo, Y., and Yan, S. (2018). Human pose estimation with parsing induced learner. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2100–2108.
Omran, M., Lassner, C., Pons-Moll, G., Gehler, P., and Schiele, B. (2018). Neural body fitting: Unifying deep learning and model based human pose and shape estimation. In 2018 international conference on 3D vision (3DV), pages 484–494. IEEE.
Papandreou, G., Zhu, T., Chen, L.-C., Gidaris, S., Tompson, J., and Murphy, K. (2018). Personlab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In Proceedings of the European Conference on Computer Vision (ECCV), pages 269–286.
Papandreou, G., Zhu, T., Kanazawa, N., Toshev, A., Tompson, J., Bregler, C., and Murphy, K. (2017). Towards accurate multi-person pose estimation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4903–4911.
Pavlakos, G., Zhou, X., and Daniilidis, K. (2018). Ordinal depth supervision for 3d human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7307–7316.
Pavlakos, G., Zhou, X., Derpanis, K. G., and Daniilidis, K. (2017). Coarse-to-fine volumetric prediction for single-image 3d human pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7025–7034.
Pavllo, D., Feichtenhofer, C., Grangier, D., and Auli, M. (2019). 3d human pose estimation in video with temporal convolutions and semi-supervised training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7753–7762.
Pereira, T. D., Aldarondo, D. E., Willmore, L., Kislin, M., Wang, S. S.-H., Murthy, M., and Shaevitz, J. W. (2019). Fast animal pose estimation using deep neural networks. Nature Methods, 16(1):117–125. Number: 1 Publisher: Nature Publishing Group.
Pereira, T. D., Tabris, N., Li, J., Ravindranath, S., Papadoyannis, E. S., Wang, Z. Y., Turner, D. M., McKenzie-Smith, G., Kocher, S. D., Falkner, A. L., et al. (2020). Sleap: multi-animal pose tracking. bioRxiv.
Qiu, H., Wang, C., Wang, J., Wang, N., and Zeng, W. (2019). Cross view fusion for 3d human pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4342–4351.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788.
Rhodin, H., Salzmann, M., and Fua, P. (2018). Unsupervised geometry-aware representation for 3d human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 750–767.
Rogez, G., Weinzaepfel, P., and Schmid, C. (2017). Lcr-net: Localization-classification-regression for human pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3433–3441.
Sárándi, I., Linder, T., Arras, K. O., and Leibe, B. (2020). Metrabs: Metric-scale truncation-robust heatmaps for absolute 3d human pose estimation. IEEE Transactions on Biometrics, Behavior, and Identity Science.
Shere, M., Kim, H., and Hilton, A. (2019). 3d human pose estimation from multi person stereo 360 scenes. In CVPR Workshops, pages 1–8.
Sun, K., Xiao, B., Liu, D., and Wang, J. (2019). Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5693–5703.
Sun, X., Shang, J., Liang, S., and Wei, Y. (2017). Compositional human pose regression. In Proceedings of the IEEE International Conference on Computer Vision, pages 2602–2611.
Tan, J. K. V., Budvytis, I., and Cipolla, R. (2017). Indirect deep structured learning for 3d human body shape and pose prediction.
Tekin, B., Katircioglu, I., Salzmann, M., Lepetit, V., and Fua, P. (2016). Structured prediction of 3d human pose with deep neural networks. arXiv preprint arXiv:1605.05180.
Tekin, B., Márquez-Neila, P., Salzmann, M., and Fua, P. (2017). Learning to fuse 2d and 3d image cues for monocular body pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 3941–3950.
Tome, D., Russell, C., and Agapito, L. (2017). Lifting from the deep: Convolutional 3d pose estimation from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2500–2509.
Toshev, A. and Szegedy, C. (2014). Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1653–1660.
Tu, H., Wang, C., and Zeng, W. (2020). Voxelpose: Towards multi-camera 3d human pose estimation in wild environment. arXiv preprint arXiv:2004.06239.
Varol, G., Ceylan, D., Russell, B., Yang, J., Yumer, E., Laptev, I., and Schmid, C. (2018). Bodynet: Volumetric inference of 3d human body shapes. In Proceedings of the European Conference on Computer Vision (ECCV), pages 20–36.
Wandt, B., Ackermann, H., and Rosenhahn, B. (2016). 3d reconstruction of human motion from monocular image sequences. IEEE transactions on pattern analysis and machine intelligence, 38(8):1505–1516.
Wandt, B. and Rosenhahn, B. (2019). Repnet: Weakly supervised training of an adversarial reprojection network for 3d human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7782–7791.
Wang, J., Yan, S., Xiong, Y., and Lin, D. (2020). Motion guided 3d pose estimation from videos. In European Conference on Computer Vision, pages 764–780. Springer.
Wang, Z., Chen, B., Wang, J., Kim, J., and Begovic, M. M. (2014). Robust optimization based optimal dg placement in microgrids. IEEE Transactions on Smart Grid, 5(5):2173–2182.
Xiao, B., Wu, H., and Wei, Y. (2018). Simple baselines for human pose estimation and tracking. In Proceedings of the European conference on computer vision (ECCV), pages 466–481.
Xie, R., Wang, C., and Wang, Y. (2020). Metafuse: A pre-trained fusion model for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13686–13695.
Xu, Z., Li, J., Yin, J., and Wu, Y. (2018). Localization of human 3d joints based on binocular vision. In International Conference on Cognitive Systems and Signal Processing, pages 65–75. Springer.
Yang, W., Ouyang, W., Wang, X., Ren, J., Li, H., and Wang, X. (2018). 3d human pose estimation in the wild by adversarial learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5255–5264.
Yasin, H., Iqbal, U., Kruger, B., Weber, A., and Gall, J. (2016). A dual-source approach for 3d pose estimation from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4948–4956.
Zanfir, A., Marinoiu, E., and Sminchisescu, C. (2018). Monocular 3d pose and shape estimation of multiple people in natural scenes-the importance of multiple scene constraints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2148–2157.
Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.-L., and Grundmann, M. (2020). Mediapipe hands: On-device real-time hand tracking. arXiv preprint arXiv:2006.10214.
Zhang, Z., Song, Y., and Qi, H. (2017). Age progression/regression by conditional adversarial autoencoder. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
Zhao, L., Peng, X., Tian, Y., Kapadia, M., and Metaxas, D. N. (2019). Semantic graph convolutional networks for 3d human pose regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3425–3435.
Zhou, X., Huang, Q., Sun, X., Xue, X., and Wei, Y. (2017). Towards 3d human pose estimation in the wild: a weakly-supervised approach. In Proceedings of the IEEE International Conference on Computer Vision, pages 398–407.
Zhou, X., Sun, X., Zhang, W., Liang, S., and Wei, Y. (2016a). Deep kinematic pose regression. In European Conference on Computer Vision, pages 186–201. Springer.
Zhou, X., Zhu, M., Leonardos, S., Derpanis, K. G., and Daniilidis, K. (2016b). Sparseness meets deepness: 3d human pose estimation from monocular video. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4966–4975.
Zhou, Y., Jiang, G., and Lin, Y. (2016c). A novel finger and hand pose estimation technique for real-time hand gesture recognition. Pattern Recognition, 49:102–114.