| Graduate Student: | 廖檍菖 Liao, Yi-Chang |
|---|---|
| Thesis Title: | 以注視點輔助手勢操作之非接觸式人機介面 Using Gaze Pointing to Assist Gesture Operating in Non-Contact Human Computer Interface |
| Advisors: | 戴顯權 Tai, Shen-Chuan; 陳敬 Chen, Jing |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2024 |
| Academic Year of Graduation: | 112 |
| Language: | Chinese |
| Number of Pages: | 101 |
| Keywords: | Gaze Tracking, Gaze Pointing, Gesture Recognition, Hand Tracking, Virtual Camera, Human-Computer Interaction (HCI) |
The COVID-19 pandemic of recent years has heightened the importance of public health and safety, which has in turn driven the rapid development of non-contact technologies. Current non-contact designs for operating computer interfaces typically rely on a single technology, either hand-gesture tracking or gaze tracking; however, both have operational limitations that degrade the user experience. In addition, the external hardware cost of large-scale deployment must be considered when setting up such systems. Under the constraint of reducing external hardware cost, this study therefore examines the differences and complementarity between gesture-based and gaze-based operation and develops an integrated system to improve the user experience of non-contact interfaces.
This study proposes realizing both gesture tracking and gaze tracking with a single physical RGB webcam, and designs a human-computer interface system in which gaze pointing assists gesture operation, allowing users to perform common window-interface functions on a non-contact computer interface. The main points of this study include: (1) using virtual camera technology so that the gesture tracking and gaze tracking modules run simultaneously on a single physical RGB webcam, thereby reducing hardware cost; (2) recognizing multiple gesture classes from hand landmark coordinates to realize different interface functions, so that users can perform basic computer operations without physical contact; (3) analyzing the latency of the interface gestures designed in this study to ensure that each function operates smoothly; (4) measuring, in user tests, the time required to complete pointer-movement tasks with gaze-assisted gesture pointing and comparing it against gesture-only movement to verify the effectiveness of the gaze-assisted pointing function developed in this study; and (5) testing a public-information web interface and simulating user operation scenarios to verify the usability of the system.
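Point (1) above depends on sharing one physical RGB webcam between the gesture and gaze tracking modules through a virtual camera. The following is a minimal illustrative sketch of that idea, assuming a Python environment with the opencv-python and pyvirtualcam packages and an installed virtual camera driver such as OBS; the device index, resolution, and frame rate are placeholder values, and the code is not the thesis's actual implementation.

```python
# Minimal sketch: relay frames from one physical webcam to a virtual camera
# so that the gesture and gaze tracking modules can both open the same feed.
# Library choices (opencv-python, pyvirtualcam), device index, resolution,
# and frame rate are illustrative assumptions, not the thesis's actual code.
import cv2
import pyvirtualcam


def relay_physical_to_virtual(device_index: int = 0,
                              width: int = 1280,
                              height: int = 720,
                              fps: int = 30) -> None:
    """Read frames from the physical RGB webcam and republish them on a
    virtual camera device that downstream trackers open like a normal camera."""
    capture = cv2.VideoCapture(device_index)

    # Requires a virtual camera driver (e.g. the OBS virtual camera) to be installed.
    with pyvirtualcam.Camera(width=width, height=height, fps=fps) as virtual_cam:
        while True:
            ok, frame_bgr = capture.read()
            if not ok:
                break
            # Match the virtual camera's resolution and color order
            # (pyvirtualcam expects RGB, OpenCV delivers BGR).
            frame_bgr = cv2.resize(frame_bgr, (width, height))
            frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
            virtual_cam.send(frame_rgb)
            virtual_cam.sleep_until_next_frame()
    capture.release()


if __name__ == "__main__":
    relay_physical_to_virtual()
```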
The main contribution of this study is the design and implementation of a non-contact computer interface system in which gaze pointing assists gesture operation, allowing system installers to deploy a non-contact operating interface on public information kiosks with only an ordinary RGB webcam. By using the webcam to detect the user's on-screen gaze point and simple gestures, users can operate public information interfaces quickly and conveniently.
The growing concern over public health and safety caused by the COVID-19 pandemic in recent years has spurred the adoption of non-contact input methods for easy and safe interaction with human-computer interfaces (HCI). While gesture recognition and gaze tracking are common approaches to realizing non-contact user input, each has operational limitations that degrade the user experience. For large-scale deployment of public computer devices, the cost of specialized hardware also cannot be overlooked. This study develops a non-contact input method for operating computer applications; the objective is to reduce hardware cost by combining gesture recognition and gaze tracking while delivering a better user experience.
The architecture of the non-contact user input interface includes the following components: (1) the Image Processor, which implements virtual cameras to share the images of one RGB video camera and sends the captured frames to the Gesture Tracker and the Gaze Tracker; (2) the Gaze Tracker, which detects and tracks the user's gaze point; (3) the Gesture Tracker, which recognizes a set of simple and intuitive gestures designed in this study; and (4) the Function Displayer, which processes the results of the recognized non-contact operations. The gesture design and the gesture recognition process ensure that exactly one gesture is recognized at a time. Furthermore, the implementation synchronizes the processing so that the next image is handled only after the previous one has been completed; the results are therefore coherent and consistent. A set of test runs demonstrates the system's effectiveness, real-time performance, and usability.
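To make the interplay between the Gesture Tracker, the Gaze Tracker, and the Function Displayer concrete, the following is a minimal sketch assuming the MediaPipe Hands and PyAutoGUI Python packages. The pinch gesture, the distance threshold, and the get_gaze_point() helper are hypothetical placeholders rather than the thesis's actual gesture set or gaze module; the sketch only illustrates the strictly sequential frame processing and the division of labor in which the gaze point positions the cursor and the gesture confirms the action.

```python
# Minimal sketch of the Gesture Tracker / Function Displayer interplay:
# frames are processed strictly one at a time, a single hypothetical "pinch"
# gesture is recognized from MediaPipe hand landmarks, and a click is issued
# at the gaze point supplied by the Gaze Tracker. The gesture, the threshold,
# and get_gaze_point() are placeholders, not the thesis's actual design.
import math

import cv2
import mediapipe as mp
import pyautogui

PINCH_THRESHOLD = 0.05  # normalized landmark distance; tuning value assumed


def is_pinch(hand_landmarks) -> bool:
    """Return True when the thumb tip and index finger tip are close together."""
    thumb = hand_landmarks.landmark[mp.solutions.hands.HandLandmark.THUMB_TIP]
    index = hand_landmarks.landmark[mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP]
    return math.hypot(thumb.x - index.x, thumb.y - index.y) < PINCH_THRESHOLD


def get_gaze_point():
    """Placeholder for the Gaze Tracker: returns the (x, y) screen coordinate
    of the current gaze point, or None when no gaze is detected."""
    return None


def run(camera_index: int = 1) -> None:
    # The index is assumed to refer to the shared virtual camera feed.
    capture = cv2.VideoCapture(camera_index)
    hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
    while True:
        ok, frame_bgr = capture.read()
        if not ok:
            break
        # Sequential processing: the next frame is read only after this one
        # has been fully recognized and dispatched.
        results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        gaze = get_gaze_point()
        if results.multi_hand_landmarks and gaze is not None:
            if is_pinch(results.multi_hand_landmarks[0]):
                pyautogui.moveTo(*gaze)  # the gaze point positions the cursor
                pyautogui.click()        # the recognized gesture confirms the action
    capture.release()
```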
The main contribution of this study is the combination of gaze pointing and gesture control using a single ordinary RGB video camera. This simplifies the deployment of public computer devices such as information kiosks and avoids the hardware cost of specialized non-contact input devices. In addition, a better user experience is achieved through simple, easy-to-use gestures and efficient cursor movement.