| Graduate Student: | 張議隆 Chang, I-Lung |
|---|---|
| Thesis Title: | Wide Baseline Stereo for 3D Volleyball Trajectory and Baseball Pitcher Skeleton Estimation |
| Advisors: | 連震杰 Lien, Jenn-Jier; 郭淑美 Guo, Shu-Mei |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2025 |
| Graduation Academic Year: | 114 |
| Language: | English |
| Number of Pages: | 108 |
| Keywords: | Stereo Vision, Wide-Baseline Stereo Application, 3D Reconstruction, Camera Calibration |
This thesis advances 3D reconstruction for sports analytics using wide-baseline stereo vision systems with non-horizontal camera configurations, supporting coaching and officiating. Both volleyball trajectory estimation and baseball pitcher skeletal pose estimation employ a two-step calibration to address challenges of wide-baseline camera setups and non-standard alignments. Intrinsic parameters are calibrated using a chessboard, while extrinsic parameters use scenario-specific references: the volleyball court's standardized dimensions for volleyball and objects measured with a laser rangefinder for the bullpen. Retrained computer vision models enhance robustness, enabling precise 3D reconstructions for biomechanical analysis and performance evaluation in dynamic sports environments.
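The extrinsic step of the two-step calibration amounts to camera resection from scene references with known world coordinates (e.g., court corners and net-post tops). A minimal NumPy sketch using the Direct Linear Transform is below; the landmark coordinates, camera pose, and intrinsics are invented for the example, and the thesis's actual pipeline may rely on a solver such as OpenCV's instead:

```python
import numpy as np

def dlt_projection_matrix(world_pts, img_pts):
    """Estimate a 3x4 camera projection matrix P (up to scale) from
    >= 6 non-coplanar world<->image correspondences via the Direct
    Linear Transform (DLT)."""
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, img_pts):
        # Each correspondence contributes two linear constraints on P.
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The null vector of A (right singular vector of the smallest
    # singular value) holds the 12 entries of P.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def project(P, X):
    """Project a 3D world point with P; returns pixel (u, v)."""
    x = P @ np.append(np.asarray(X, dtype=float), 1.0)
    return x[:2] / x[2]
```

With exact correspondences the recovered matrix equals the true one up to scale, so reprojecting a held-out landmark reproduces its observed pixel.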
For volleyball, inward-facing cameras positioned 30–40 meters from the court capture 2D ball trajectories, triangulated to estimate 3D trajectories for analyzing serves and first attacks. TrackNetv3 detects the ball in 2D, and ARTrackv2 tracks it within a reduced region around prior detections, optimizing pipeline efficiency. The volleyball court serves as the extrinsic calibration reference. Evaluation via triangulation of court points against known dimensions yields a mean error of 11 cm, demonstrating robustness despite long-range imaging and wide baselines.
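Lifting a pair of 2D ball detections into a 3D point can be sketched with standard linear (DLT) triangulation; the projection matrices and pixel coordinates below are synthetic stand-ins, not the thesis's calibrated cameras:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from its pixel
    observations uv1, uv2 in two views with projection matrices P1, P2."""
    u1, v1 = uv1
    u2, v2 = uv2
    # Each view yields two linear constraints on the homogeneous point X.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Homogeneous solution = right singular vector of the smallest
    # singular value; dehomogenize to get the Euclidean 3D point.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Applied per frame to the matched 2D detections, this yields the 3D trajectory samples that the serve and first-attack analysis consumes.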
In the baseball bullpen, side- and front-view cameras reconstruct the pitcher's 3D skeletal pose to provide biomechanical feedback for pitching optimization. Challenges include occlusions (e.g., intra-camera arm-torso occlusion and inter-camera single-view limb visibility) and variable lighting. YOLOv7 detects bounding boxes, BoT-SORT tracks the pitcher, and ViTPose estimates 2D skeletons, all retrained for the application. Extrinsic calibration uses bullpen objects measured with a laser rangefinder. Because no ground truth is available, evaluation estimates limb and torso lengths and validates them against common-sense anthropometric expectations, confirming reliable pose reconstruction.
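The limb-length sanity check can be sketched as below. The COCO-style joint indices and the "plausible" ranges are illustrative assumptions standing in for the thesis's common-sense expectations, not values taken from it:

```python
import numpy as np

# Hypothetical joint layout; indices follow a COCO-style 17-keypoint ordering.
SEGMENTS = {
    "upper_arm": (5, 7),    # shoulder -> elbow
    "forearm":   (7, 9),    # elbow -> wrist
    "thigh":     (11, 13),  # hip -> knee
    "shank":     (13, 15),  # knee -> ankle
}
# Loose common-sense adult ranges in meters (illustrative only).
PLAUSIBLE = {
    "upper_arm": (0.22, 0.40),
    "forearm":   (0.20, 0.35),
    "thigh":     (0.35, 0.55),
    "shank":     (0.30, 0.50),
}

def check_segment_lengths(joints_3d):
    """Return {segment: (length_m, is_plausible)} for a (17, 3) array
    of triangulated joint positions in meters."""
    report = {}
    for name, (a, b) in SEGMENTS.items():
        length = float(np.linalg.norm(joints_3d[a] - joints_3d[b]))
        lo, hi = PLAUSIBLE[name]
        report[name] = (length, lo <= length <= hi)
    return report
```

A reconstruction whose segment lengths fall outside such ranges signals a triangulation or 2D-pose failure (e.g., a mismatched limb across views) rather than true anatomy.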
[1] N. Aharon, R. Orfaig, and B.-Z. Bobrovsky. “BoT-SORT: Robust Associations Multi-Pedestrian Tracking,” arXiv preprint arXiv:2206.14651, 2022.
[2] Y. Bai, Z. Zhao, Y. Gong, and X. Wei. “ARTrackv2: Prompting Autoregressive Tracker Where to Look and How to Describe,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19048-19057, 2024.
[3] H. Bay, T. Tuytelaars, and L. Van Gool. “SURF: Speeded Up Robust Features,” in Computer Vision – ECCV 2006: 9th European Conference on Computer Vision, Part I, pp. 404-417, 2006.
[4] J.-Y. Bouguet. “Camera Calibration Toolbox for Matlab,” California Institute of Technology, 1999. https://robots.stanford.edu/cs223b04/JeanYvesCalib/index.html
[5] G. Bradski. “The OpenCV Library,” Dr. Dobb's Journal: Software Tools for the Professional Programmer, 25(11), pp. 120-123, 2000.
[6] J.-R. Chang and Y.-S. Chen. “Pyramid Stereo Matching Network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5410-5418, 2018.
[7] K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, and J. Xu. “MMDetection: Open MMLab Detection Toolbox and Benchmark,” arXiv preprint arXiv:1906.07155, 2019.
[8] Y.-J. Chen and Y.-S. Wang. “TrackNetv3: Enhancing Shuttlecock Tracking with Augmentations and Trajectory Rectification,” in Proceedings of the 5th ACM International Conference on Multimedia in Asia, pp. 1-7, 2023.
[9] “Configuring Synchronized Capture with Multiple Cameras,” Teledyne FLIR, 2025. https://www.teledynevisionsolutions.com/support/support-center/application-note/iis/configuring-synchronized-capture-with-multiple-cameras/
[10] R. Drillis, R. Contini, and M. Bluestein. “Body Segment Parameters,” Artificial Limbs, 8(1), pp. 44-66, 1964.
[11] H.-S. Fang, J. Li, H. Tang, C. Xu, H. Zhu, Y. Xiu, Y.-L. Li, and C. Lu. “AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6), pp. 7157-7173, 2022.
[12] P. Furgale, H. Sommer, J. Maye, J. Rehder, T. Schneider, and L. Oth. “Kalibr Visual-Inertial Calibration Toolbox,” ETH Zurich, Autonomous Systems Lab, 2013. https://github.com/ethz-asl/kalibr
[13] G. Gallego and A. Yezzi. “A Compact Formula for the Derivative of a 3-D Rotation in Exponential Coordinates,” Journal of Mathematical Imaging and Vision, 51, pp. 378-384, 2015.
[14] A. Geiger, P. Lenz, and R. Urtasun. “Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354-3361, 2012.
[15] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Pearson Education India, 2009.
[16] C. Harris and M. Stephens. “A Combined Corner and Edge Detector,” in Proceedings of the Alvey Vision Conference, pp. 147-151, 1988.
[17] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[18] H. Hirschmuller. “Stereo Processing by Semiglobal Matching and Mutual Information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), pp. 328-341, 2008.
[19] R. E. Kalman. “A New Approach to Linear Filtering and Prediction Problems,” Journal of Basic Engineering, 82(1), pp. 35-45, 1960.
[20] H. W. Kuhn. “The Hungarian Method for the Assignment Problem,” Naval Research Logistics Quarterly, 2(1-2), pp. 83-97, 1955.
[21] K. Levenberg. “A Method for the Solution of Certain Non-Linear Problems in Least Squares,” Quarterly of Applied Mathematics, 2(2), pp. 164-168, 1944.
[22] D. G. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, 60, pp. 91-110, 2004.
[23] D. W. Marquardt. “An Algorithm for Least-Squares Estimation of Nonlinear Parameters,” Journal of the Society for Industrial and Applied Mathematics, 11(2), pp. 431-441, 1963.
[24] M. Muja and D. G. Lowe. “Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration,” in Proceedings of VISAPP, pp. 331-340, 2009.
[25] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. “ORB: An Efficient Alternative to SIFT or SURF,” in Proceedings of the 2011 International Conference on Computer Vision, pp. 2564-2571, 2011.
[26] P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich. “SuperGlue: Learning Feature Matching with Graph Neural Networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4938-4947, 2020.
[27] D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nešić, X. Wang, and P. Westling. “High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth,” in Pattern Recognition: 36th German Conference, GCPR 2014, pp. 31-42, 2014.
[28] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao. “YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464-7475, 2023.
[29] G. Xu, X. Wang, Z. Zhang, J. Cheng, C. Liao, and X. Yang. “IGEV++: Iterative Multi-Range Geometry Encoding Volumes for Stereo Matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
[30] Y. Xu, J. Zhang, Q. Zhang, and D. Tao. “ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation,” Advances in Neural Information Processing Systems, 35, pp. 38571-38584, 2022.
[31] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, and X. Wang. “ByteTrack: Multi-Object Tracking by Associating Every Detection Box,” in Proceedings of the European Conference on Computer Vision, pp. 1-21, 2022.
[32] Z. Zhang. “A Flexible New Technique for Camera Calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), pp. 1330-1334, 2000. https://doi.org/10.1109/34.888718