
Graduate Student: Li, Yuan-Yu (李沅諭)
Thesis Title: Robust Binocular Tracking with the Application of an Auditory Perception System (結合聲音感知系統之雙眼機械頭強健追蹤控制)
Advisor: Tsay, Tsing-Iuan (蔡清元)
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of Publication: 2005
Graduation Academic Year: 93
Language: English
Number of Pages: 65
Chinese Keywords: sound, binocular robotic head, sound source tracking
Foreign Keywords: visual tracking, sound source tracking, binocular tracking, auditory
In typical target-tracking applications of a binocular robotic head, a binocular image-tracking algorithm determines the target's position in the image or in real space, and servo control of the head keeps the target within the cameras' field of view or at the image center. However, if the target leaves the head's field of view or is occluded by an obstacle, an image-based tracking algorithm fails. This thesis therefore introduces the concept of sound source tracking: for a target that emits sound, such as an aircraft or another vehicle, the emitted sound signal serves as the basis for tracking. The resulting sound source tracking system acts as a guide and aid for the visual tracking system, increasing the speed and robustness of the overall tracking system.

For sound source direction determination, an equilateral triangular array of three microphones is used, and the two-dimensional space of the array plane is divided into several decision regions. Spectrum analysis by the discrete-time Fourier transform, filter design, and cepstrum-based source recognition extract the correct source signal; cross-correlation then yields the relative delays among the signals received by the microphones, from which the region containing the source is computed, achieving sound source tracking.
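The delay-estimation step can be illustrated with a plain cross-correlation peak search. This is a minimal sketch, not the thesis's implementation; the sampling rate and the noise-burst signals are invented for the example:

```python
import numpy as np

def estimate_delay(sig_a, sig_b, fs):
    """Estimate how much sig_b lags sig_a (in seconds) by locating
    the peak of their cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index (len(sig_a) - 1) in `corr` corresponds to zero lag.
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag / fs

# Example: microphone B receives the same noise burst 5 samples after A.
fs = 8000  # Hz, assumed sampling rate for this sketch
rng = np.random.default_rng(0)
sig_a = rng.standard_normal(512)
sig_b = np.concatenate([np.zeros(5), sig_a])[:512]

delay = estimate_delay(sig_a, sig_b, fs)  # → 5 / 8000 s
```

With three microphones, the signs and magnitudes of the pairwise delays computed this way determine which decision region of the array plane contains the source.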

For visual tracking, to improve the robustness of ordinary template matching against non-trivial backgrounds, an edge-blob analysis method is proposed and combined with the SAD algorithm. Edge-blob analysis first filters out blocks that differ greatly from the target, reducing the number of searches and the misjudgment rate; SAD template matching on the remaining blocks then completes the visual tracking.
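The prefilter-then-SAD idea can be sketched as follows. The edge-blob analysis itself is not detailed in the abstract, so this sketch substitutes a simple edge-density comparison as the prefilter; the images, template, and `tol` threshold are invented for the example:

```python
import numpy as np

def sad(patch, template):
    """Sum of absolute differences between a candidate patch and the template."""
    return int(np.abs(patch.astype(np.int32) - template.astype(np.int32)).sum())

def edge_density(img):
    """Crude edge measure: mean absolute gradient along both axes."""
    g = img.astype(np.int32)
    return np.abs(np.diff(g, axis=0)).mean() + np.abs(np.diff(g, axis=1)).mean()

def match(image, template, tol=100.0):
    """Prefilter candidate windows by edge content, then score survivors with SAD."""
    h, w = template.shape
    t_edges = edge_density(template)
    best_score, best_pos = None, None
    for y in range(image.shape[0] - h + 1):
        for x in range(image.shape[1] - w + 1):
            patch = image[y:y + h, x:x + w]
            if abs(edge_density(patch) - t_edges) > tol:
                continue  # edge content too different from the target; skip the SAD
            score = sad(patch, template)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (y, x)
    return best_pos

# Example: a 5x5 checkerboard target embedded at (7, 4) in a flat background.
template = ((np.indices((5, 5)).sum(axis=0) % 2) * 255).astype(np.uint8)
image = np.zeros((20, 20), dtype=np.uint8)
image[7:12, 4:9] = template

pos = match(image, template)  # → (7, 4)
```

Flat background windows are rejected by the cheap edge test before the more expensive SAD is ever computed, which is the source of the speed-up the thesis claims.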
In summary, this thesis constructs a real-time tracking system for robotic head control that combines visual tracking with sound source direction determination, increasing the robustness of target tracking. In the future, this architecture can be applied not only to day-and-night security systems but also to home robots that converse and interact with humans, giving robots more human-like capabilities.

Generally, in visual tracking by a robotic binocular head, an image tracking algorithm is first used to seek the target and determine the error between the target's coordinates and the coordinates of the center of the image. A servo control algorithm is then used to drive the robotic head and keep the target at the center of the image planes. If the target is not in the image planes of the CCD cameras, or is occluded by objects in the workspace, much time is required to find it, so visual tracking fails easily when only an image-based tracking algorithm is used. This study uses an auditory perception system to assist visual tracking: tracking the sound emitted by the target and turning the robotic head to face it helps the CCD cameras locate the target's image, ensuring that the visual tracking works. Combining the visual tracking system with an auditory perception system increases the overall efficiency of the visual tracking system.

In auditory perception, an equilateral triangular array of microphones receives the sound signals, and the direction to the sound source is determined using the cross-spectrum method. Digital filters and a cepstrum lifter are used to recognize the signal and decide whether the sound matches the target's sound model; the target's sound source is then recognized and its direction determined. In image tracking, the Three-Step Hierarchical Search method is replaced with the Edge-Blobs pattern matching method proposed herein to increase the tracking speed. It is combined with Edge-Blobs contour matching and YCbCr-space image recognition to improve recognition of the target image and to filter out background noise. The target can then be tracked regardless of whether it is deformed or passes through a workspace with a complex background.
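The cepstrum step exploits the fact that a periodic source produces a peak at the quefrency of its period. A minimal real-cepstrum sketch follows; the thesis's lifter design is not reproduced, and the impulse-train test signal is invented for the example:

```python
import numpy as np

def real_cepstrum(signal):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum.
    A periodic source shows up as a peak at the quefrency of its period."""
    spectrum = np.fft.fft(signal)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.fft.ifft(log_mag).real

# Example: an impulse train with a 50-sample period.
signal = np.zeros(500)
signal[::50] = 1.0

cep = real_cepstrum(signal)
# The dominant peak (excluding quefrency 0) falls at a multiple of the period.
peak_q = 1 + int(np.argmax(cep[1:250]))
```

Matching the quefrency of such a peak against a stored sound model of the target is one way to decide whether the received sound belongs to the target before its direction is computed.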

A robust binocular tracking system with auditory perception is developed to enhance target tracking. This control architecture can be used not only in security systems but also in home robots, improving the way in which they interact with humans.

Contents

Abstract i
Content ii
List of Figures v
1 Introduction
1.1 Preface 1
1.2 Motivation and Objective 1
1.3 Literature Survey 2
1.4 Contribution 3
1.5 Thesis Organization 3
2 Image Tracking
2.1 Fundamental Image Processing 5
2.1.1 Edge Detection 5
2.1.2 Morphological Processing 6
2.2 Pattern Matching 7
2.2.1 SSD and SAD 8
2.2.2 Three-Step Hierarchical Search 8
2.2.3 Edge-Blobs Pattern Matching 9
2.2.4 Refreshing the Pattern Model 10
2.3 Contour Matching 11
2.3.1 Initial Contour Model 11
2.3.2 Edge-Blobs Contour Match and Refreshing the Contour 12
2.4 Color Space Conversion 13
2.4.1 HSI Color Space 13
2.4.2 YCbCr Color Space 14
2.4.3 Filtering Background Noises 14
2.5 Color Space Image Tracking 15
2.5.1 Match Error Normalization 15
2.5.2 YCbCr Space Image Tracking 16
3 Sound Source Tracking
3.1 Fundamental Digital Signal Processing 24
3.1.1 Digital Signal Sampling 25
3.1.2 Digital Signal Detection by Determining Sound Intensity 26
3.1.3 Digital Signal Spectrum Analysis 26
3.1.4 Filtering Digital Signals 27
3.1.5 Determining Digital Signal Phase Lead-Lag 27
3.2 Sound Source Tracking and Recognizing 28
3.2.1 Equilateral Triangle Array of Microphones 29
3.2.2 Locating the Sound Source 29
3.2.3 Recognizing the Sound Source 30
3.2.4 The Steps of Tracking the Sound Source 32
4 Hardware and Control Architecture
4.1 Hardware 37
4.1.1 Five-Axis Robotic Binocular Head 37
4.1.2 Array of Microphones 38
4.2 Target Tracking Control Architecture 38
4.2.1 Sound Source Tracking Control Architecture 38
4.2.2 Visual Tracking Control Architecture 38
4.3 Binocular Tracking with Auditory Perception 39
4.3.1 System Communication 39
4.3.2 Overall Tracking System 40
5 Experiment
5.1 Experiment Setup 48
5.1.1 Path of the Target 48
5.1.2 Coordinate Transformation 49
5.2 Sound Source Tracking Experiment 50
5.3 Experiment on Overall Binocular Tracking System 50
6 Conclusions and Future Work
6.1 Conclusions 61
6.2 Future Work 62
Reference 63

Reference:
[1] J. L. Barron, D. J. Fleet, S. S. Beauchemin, and T. A. Burkitt, “Performance of Optical Flow Techniques,” in Proc. of the 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 236-242, 1992.
[2] E. Oran Brigham, The Fast Fourier Transform and Its Applications, Prentice Hall, Englewood Cliffs, 1988.
[3] P. Y. Chen, “A Robust Visual Servo System for Tracking an Arbitrary-Shaped Object by a New Active Contour Method,” Master Thesis, Department of Electrical Engineering, National Taiwan University, 2003.
[4] J. L. Chen, “Development of a Sound Direction Detection System,” Master Thesis, Department of Electrical and Control Engineering, National Chiao-Tung University, 2002.
[5] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd Edition, Chapter 3, Prentice Hall, 2002.
[6] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd Edition, Chapter 6, Prentice Hall, 2002.
[7] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd Edition, Chapter 9, Prentice Hall, 2002.
[8] C. Z. Chen, “Motion Detection and Estimation of a Real-Time Visual Servo Tracking System,” Master Thesis, Department of Mechanical Engineering, National Cheng-Kung University, Taiwan, 2003.
[9] D. Liu and L. C. Fu, “Target Tracking in an Environment of Nearly Stationary and Biased Clutter,” IEEE Int. Conf. on Intelligent Robots and Systems, Vol. 3, pp. 1358-1363, 2001.
[10] Christophe Garcia and Georgios Tziritas, “Face Detection Using Quantized Skin Color Regions Merging and Wavelet Packet Analysis,” IEEE Trans. on Multimedia, Vol. 1, No. 3, Sept. 1999.
[11] S. Hutchinson, G. D. Hager, and P. I. Corke, “A Tutorial on Visual Servo Control,” IEEE Trans. on Robotics and Automation, Vol. 12, pp. 651-670, 1996.
[12] C. M. Huang, S. C. Wang, L. C. Fu, P. Y. Chen, and Y. S. Cheng, “A Robust Visual Tracking of an Arbitrary-Shaped Object by a New Active Contour Method for a Virtual Reality Application,” IEEE Proc. Conf. on Networking, Sensing and Control, Mar. 2004.
[13] B. K. P. Horn and B. G. Schunck, “Determining Optical Flow,” Artificial Intelligence, Vol. 17, pp. 185-203, 1981.
[14] J. Hu, T. M. Su, C. C. Cheng, W. H. Lio, and T. I. Wu, “A Self-Calibrated Speaker Tracking System Using Both Audio and Video Data,” Proceedings of the 2002 IEEE International Conference on Control Applications, 2002.
[15] Lilian Ji and Hong Yan, “Attractable Snakes Based on the Greedy Algorithm for Contour Extraction,” Pattern Recognition, Vol. 35, pp. 791-806, Apr. 2002.
[16] H. M. Jong, “Parallel Architectures for 3-Step Hierarchical Search Block-Matching Algorithm,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, No. 4, pp. 407-416, 1994.
[17] R. Kelly, “Robust Asymptotically Stable Visual Servoing of Planar Robots,” IEEE Trans. on Robotics and Automation, Vol. 12, pp. 449-459, 1994.
[18] R. Kelly, R. Carelli, O. Nasisi, B. Kuchen, and F. Reyes, “Stable Visual Servoing of Camera-in-Hand Robotic Systems,” IEEE/ASME Trans. on Mechatronics, Vol. 5, pp. 39-48, 2000.
[19] Won Kim and Ju-Jang Lee, “Visual Tracking Using Snake Based on Target's Contour Information,” in Proc. of the IEEE International Symposium on Industrial Electronics (ISIE 2001), Vol. 1, June 2001.
[20] B. Lucas and T. Kanade, “An Iterative Image Registration Technique with an Application to Stereo Vision,” in Proc. of DARPA Image Understanding Workshop, pp. 121-130, 1981.
[21] Y. C. Li, Class Notes of Digital Signal Processing.
[22] A. J. Lipton, H. Fujiyoshi, and R. S. Patil, “Moving Target Classification and Tracking from Real-Time Video,” Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision, pp. 8-14, 1998.
[23] R. Brunelli and T. Poggio, “Template Matching: Matched Spatial Filters and Beyond,” MIT AI Memo 1549, July 1995.
[24] G. K. Wang, “Design and Implementation of a Multi-Purpose Real-Time Visual Tracking System Based on Modified Adaptive Background Subtraction and Multi-Cue Template Matching,” Master Thesis, Department of Electrical Engineering, National Cheng-Kung University, 2004.
[25] Koichi Yanagisawa, Akihisa Ohya, and Shin'ichi Yuta, “An Operator Interface for an Autonomous Mobile Robot Using Whistle Sound and a Source Direction Detection System,” Proceedings of the 21st IEEE IECON, pp. 1118-1123, 1995.
[26] 王小川, Speech Signal Processing (語音訊號處理), 1st Edition, 全華科技圖書股份有限公司, 2004.
[27] 林宸生, Digital Signals: Image and Speech Processing (數位信號-影像與語音處理), 2nd Edition, 全華科技圖書股份有限公司, 2003.
    [28] http://www.incx.nec.co.jp/robot/R100/english/index.html.

Full text released on campus: 2008-09-06
Full text released off campus: 2008-09-06