
Author: Lee, Yuan-Yi (李原毅)
Thesis title: Research and development of immersive human-computer interaction system (沉浸式人機互動模型開發之研究)
Advisor: Hsiao, Shih-Wen (蕭世文)
Degree: Master
Department: Department of Industrial Design, College of Planning and Design
Year of publication: 2020
Academic year of graduation: 108
Language: English
Number of pages: 81
Keywords (Chinese): 人機介面、沉浸式互動、手勢辨識、手勢追蹤、卷積神經網路
Keywords (English): human-machine interface, immersive interaction, gesture recognition, gesture tracking, convolutional neural network
    In an era of rapidly advancing technology, human-computer interaction has remained one of the key focuses of research and development. Interaction paradigms have evolved from the graphical user interface (GUI) to today's natural user interface (NUI), and control methods have progressed from keyboard commands and touch input to natural-language control. If natural gesture language can replace keyboard-and-mouse commands, more natural human-computer interaction and a truly immersive experience can be achieved. Taking this as its starting point, this study builds an immersive human-computer interaction model that uses an ordinary webcam for data collection and system control. Captured images are preprocessed through skin-color extraction, denoising, morphological processing, and contour extraction. Six common natural hand gestures are defined (no gesture, fist, open palm, index finger extended, index and middle fingers extended, thumb and little finger extended) and fed into a 12-layer convolutional neural network for training; the trained network recognizes gestures with an accuracy above 98%. Gesture tracking built from the frame difference method and the Kernelized Correlation Filter algorithm further improves the overall feel of operation and completes the system. Finally, the system was validated with users on a self-written 3D model-manipulation program: users can import their own models and use different natural gestures to zoom in, zoom out, move, and rotate them. The results show that users experience a high degree of immersion when operating the system.
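    The preprocessing steps named in the abstract (skin-color extraction, denoising, and morphological processing) can be sketched in plain NumPy. This is a minimal illustration, not the thesis's implementation: the YCrCb conversion constants and the Cr/Cb skin thresholds below are common textbook values, and the 3×3 structuring element is an assumption.

```python
import numpy as np

def rgb_to_ycrcb(img):
    """Convert an RGB image (H, W, 3), values in [0, 255], to YCrCb."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128.0
    cb = (b - y) * 0.564 + 128.0
    return np.stack([y, cr, cb], axis=-1)

def skin_mask(img, cr_range=(133, 173), cb_range=(77, 127)):
    """Binary skin mask using commonly cited Cr/Cb thresholds."""
    ycrcb = rgb_to_ycrcb(img.astype(np.float64))
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return ((cr >= cr_range[0]) & (cr <= cr_range[1]) &
            (cb >= cb_range[0]) & (cb <= cb_range[1]))

def erode(mask, k=3):
    """Binary erosion with a k x k structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def dilate(mask, k=3):
    """Binary dilation with a k x k structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def open_close(mask, k=3):
    """Opening (erode then dilate) removes speckle noise; closing
    (dilate then erode) fills small holes in the hand region."""
    opened = dilate(erode(mask, k), k)
    return erode(dilate(opened, k), k)
```

    In the full system, the cleaned mask would then feed the contour-extraction step that crops the hand region for the CNN.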

    With the development of science and technology, human-computer interaction has been one of the most important research topics. Interaction modes have evolved from the graphical user interface (GUI) to the current natural user interface (NUI), and operation modes have developed from keyboard control to natural gesture control. If natural gesture language can replace traditional commands, a more immersive experience can be achieved.
    On this basis, this study establishes an immersive human-computer interaction model. A common webcam serves as the data-collection and system-control instrument, and the captured images are preprocessed through skin-color acquisition, denoising, morphological processing, and contour extraction. Six common natural gesture languages are defined (none, fist, one, two, five, swing) and fed into a 12-layer convolutional neural network for training; the trained network achieves a high gesture-recognition accuracy of over 98%. Gesture tracking composed of the temporal difference method and Kernelized Correlation Filters then improves the overall sense of operation. Finally, a 3D manipulation program lets users import their own models and use different natural gestures to zoom in, zoom out, move, and rotate them, giving users a stronger sense of immersion than traditional operation.
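    As a minimal illustration of the temporal (frame) difference step mentioned above, the following NumPy sketch marks pixels whose gray-level change between consecutive frames exceeds a threshold and returns the centroid of the moving region. The threshold value and the centroid-seeding idea are illustrative assumptions, not the thesis's implementation; the full system combines this step with a Kernelized Correlation Filter tracker.

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=25):
    """Temporal difference: mark pixels whose absolute gray-level
    change between two consecutive frames exceeds the threshold."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def motion_centroid(motion_mask):
    """Centroid of the detected motion region; in a tracker this
    could seed (or re-seed) the search window for the hand."""
    ys, xs = np.nonzero(motion_mask)
    if ys.size == 0:
        return None  # no motion detected
    return float(ys.mean()), float(xs.mean())
```

    Frame differencing alone loses the target when the hand pauses, which is one motivation for pairing it with a correlation-filter tracker as the abstract describes.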

    Abstract (Chinese)
    SUMMARY
    ACKNOWLEDGEMENTS
    Table of Contents
    List of Tables
    List of Figures
    CHAPTER 1 INTRODUCTION
        1.1 Background
        1.2 Motivation
        1.3 Purpose
        1.4 Research framework
        1.5 Research framework diagram
        1.6 Limitations of the Study
    CHAPTER 2 LITERATURE REVIEW
        2.1 Human-computer interaction
        2.2 Immersive experience and immersive technology
        2.3 Gesture capture
            2.3.1 Marked point recognition
            2.3.2 Single camera
            2.3.3 Multiple cameras
            2.3.4 Depth sensor
            2.3.5 Glove sensor
            2.3.6 Wrist sensor
            2.3.7 Non-wearable sensors
        2.4 Machine learning
            2.4.1 Segmentation based on image gestures
            2.4.2 Based on image feature extraction
            2.4.3 Identification based on image classification
            2.4.4 Gesture recognition based on deep learning
        2.5 Gesture tracking theory
    CHAPTER 3 THEORIES OF THE RESEARCH
        3.1 Skin color extraction
            3.1.1 Denoising
            3.1.2 Morphology
            3.1.3 Contour extraction
        3.2 Convolutional Neural Network
            3.2.1 Convolutional layer
            3.2.2 Pooling layer
            3.2.3 Fully connected layer
            3.2.4 Dropout
            3.2.5 Activation function
        3.3 Temporal Difference method
        3.4 Kernelized Correlation Filters
            3.4.1 Linear regression
            3.4.2 Learning
            3.4.3 Testing
            3.4.4 Updating
    CHAPTER 4 RESEARCH METHODS
        4.1 Experimental environment and instruments
        4.2 Hand feature extraction
            4.2.1 Image reading
            4.2.2 Skin color detection
            4.2.3 Denoising
            4.2.4 Threshold
            4.2.5 Contour extraction
            4.2.6 Convolutional Neural Network Training
        4.3 Gesture tracking
        4.4 Immersive operation
    CHAPTER 5 RESULT
    CHAPTER 6 CONCLUSION
        6.1 Research results and contributions
        6.2 Follow-up research and development
    REFERENCES
        Chinese References


    On-campus access: available from 2025-08-29
    Off-campus access: not available
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.