| Graduate student: | 李政輝 Li, Cheng-Hui |
|---|---|
| Thesis title: | 整合隨機森林法、卷積神經網路與門閘遞迴神經網路之物品學習系統設計及其於服務型機器人之應用 (Design of Object Learning System by Using Random Forest, Convolutional Neural Network and Gated Recurrent Neural Network for Service Robot) |
| Advisor: | 李祖聖 Li, Tzuu-Hseng S. |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2017 |
| Graduation academic year: | 105 |
| Language: | English |
| Number of pages: | 75 |
| Keywords (Chinese): | 卷積神經網路、門閘遞迴單元、物品學習、隨機森林 |
| Keywords (English): | Convolutional Neural Network, Gated Recurrent Unit, Object Learning, Random Forest |
| Access count: | Views: 128; Downloads: 5 |
Inspired by the way children learn through self-exploration, this thesis proposes an object learning system in which a service robot interacts with objects to obtain their features and construct object concepts. The system defines three kinds of features: interaction features, visual features, and intrinsic features. When the robot interacts with an object, for example by pushing or stacking it, it observes the changes in the object before and after the action to obtain the interaction features. At the same time, the robot acquires the visual features of the object, which include color image information and depth information. The intrinsic features are the inherent properties of an object, such as volume, weight, and shape. The relationship models among these three kinds of features are constructed with a random forest (RF) and a convolutional neural network (CNN); the resulting models help the robot predict the properties of a new object and make decisions. The relationship model between the interaction features and the intrinsic features is built by the RF, in which an artificial bee colony (ABC) algorithm serves as the splitting function of each node, and the fitness values it finds are used to judge whether a feature is good or not. The relationship model between the visual features and the intrinsic features is learned by the CNN, which allows the robot to decide how to interact with an object from its visual features. Two experiments are conducted in this thesis: a service-providing task and a stacking task. In the former, the robot uses the object concept models to select the appropriate object for an appointed task, for example choosing a suitable container for pouring water, or choosing a suitable motion to hand an object to the user according to its shape. In the latter, the robot, combined with a gated recurrent neural network, learns the stacking sequence of various objects through repeated stacking attempts. All the real-world experimental results demonstrate that the robot can build the object concept models by interacting with objects and utilize these models to accomplish several tasks.
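To make the RF-with-ABC idea concrete, here is a minimal sketch, assuming a standard Gini-impurity criterion, of how a simplified artificial-bee-colony search (onlooker phase omitted for brevity) could propose the split threshold at a single tree node, with the best fitness value doubling as a score for how good the feature is. Function names and parameters are illustrative, not the thesis implementation.

```python
# Minimal sketch: ABC-style threshold search for one decision-tree node.
# Fitness = negative weighted Gini impurity of the two children, so a
# higher fitness means a cleaner split (and a more useful feature).
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_fitness(X, y, feat, thr):
    """Fitness of splitting feature `feat` at threshold `thr`."""
    left, right = y[X[:, feat] <= thr], y[X[:, feat] > thr]
    n = len(y)
    return -(len(left) / n * gini(left) + len(right) / n * gini(right))

def abc_split(X, y, feat, n_bees=10, n_iter=30, limit=5, rng=None):
    """Search a split threshold with a simplified ABC loop
    (employed-bee and scout-bee phases only)."""
    rng = rng or np.random.default_rng()
    lo, hi = X[:, feat].min(), X[:, feat].max()
    food = rng.uniform(lo, hi, n_bees)          # candidate thresholds
    fit = np.array([split_fitness(X, y, feat, t) for t in food])
    trials = np.zeros(n_bees, dtype=int)
    for _ in range(n_iter):
        for i in range(n_bees):                 # employed-bee phase
            j = rng.integers(n_bees)            # random partner source
            cand = np.clip(food[i] + rng.uniform(-1, 1) * (food[i] - food[j]),
                           lo, hi)
            f = split_fitness(X, y, feat, cand)
            if f > fit[i]:
                food[i], fit[i], trials[i] = cand, f, 0
            else:
                trials[i] += 1
        for i in np.where(trials > limit)[0]:   # scout-bee phase:
            food[i] = rng.uniform(lo, hi)       # abandon stagnant sources
            fit[i] = split_fitness(X, y, feat, food[i])
            trials[i] = 0
    best = int(np.argmax(fit))
    return food[best], fit[best]                # threshold and its fitness

# Hypothetical usage: pick the threshold for feature 2 of a toy dataset.
# X = np.random.rand(100, 4); y = (X[:, 2] > 0.5).astype(int)
# thr, fitness = abc_split(X, y, feat=2)
```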
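For the visual-to-intrinsic mapping, the following sketch, assuming PyTorch rather than the framework used in the thesis, shows one way a small CNN could take a 4-channel RGB-D crop and predict a shape class together with scalar properties such as volume and weight. The architecture and head layout are illustrative assumptions.

```python
# Minimal sketch: CNN from RGB-D input to intrinsic-property outputs.
import torch
import torch.nn as nn

class IntrinsicCNN(nn.Module):
    """Map a 4-channel RGB-D crop to intrinsic-property predictions."""
    def __init__(self, n_shape_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.shape_head = nn.Linear(32 * 4 * 4, n_shape_classes)  # shape class
        self.scalar_head = nn.Linear(32 * 4 * 4, 2)               # volume, weight

    def forward(self, rgbd):
        # rgbd: (batch, 4, H, W) = color channels stacked with depth
        z = self.features(rgbd).flatten(1)
        return self.shape_head(z), self.scalar_head(z)

net = IntrinsicCNN()
shape_logits, scalars = net(torch.randn(1, 4, 64, 64))  # hypothetical crop
```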
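Finally, a minimal sketch, again assuming PyTorch, of how a gated recurrent network could learn a stacking order from repeated trials: the network reads the intrinsic features of the objects stacked so far and scores which object to place next. All names, dimensions, and the training signal here are hypothetical.

```python
# Minimal sketch: GRU that scores the next object to stack.
import torch
import torch.nn as nn

class StackingGRU(nn.Module):
    def __init__(self, feat_dim=8, hidden_dim=32, n_objects=5):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_objects)

    def forward(self, seq):
        # seq: (batch, steps, feat_dim) features of already-stacked objects
        _, h = self.gru(seq)           # h: (num_layers, batch, hidden_dim)
        return self.head(h[-1])        # logits over which object comes next

model = StackingGRU()
seq = torch.randn(1, 2, 8)             # hypothetical trial: two objects stacked
target = torch.tensor([3])             # object whose placement succeeded
loss = nn.CrossEntropyLoss()(model(seq), target)
loss.backward()                         # learn from repeated stacking attempts
```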