
Author: Chen, Jou-An (陳柔安)
Title: Deep Representation on One-shot Learning Model for Cross-view Localization of UAV (結合單樣本學習與深度特徵擷取技術於多旋翼無人機之自主式交叉視角定位系統)
Advisor: Huang, Yueh-Min (黃悅民)
Degree: Master
Department: Department of Engineering Science, College of Engineering
Year of publication: 2018
Graduating academic year: 106 (2017–2018)
Language: Chinese
Pages: 63
Keywords: Unmanned Aerial Vehicles (UAV), Convolutional Neural Networks (CNN), One-shot Learning, Vision-guided Navigation, Machine Learning
Hits: 148; Downloads: 12
    In recent years, rapid advances in autonomous vehicle technology have brought notable breakthroughs in on-device computing for machine autonomy, particularly in the autonomous control and real-time processing of multi-rotor UAVs. Following this trend, this study examines the application and performance of convolutional neural networks, which remain computationally demanding, on multi-rotor UAVs. Targeting cross-view localization, it addresses the long-standing dependence of UAV self-localization on GPS signals, which in practice are often unstable or entirely unavailable. How a UAV in such a scenario can identify its position from the surrounding view, and further derive a path plan in global space, is the issue this study investigates.
    This study proposes a one-shot learning model based on CNN feature extraction for cross-view localization of UAVs. The system first trains an offline model on scene features of the same and different locations, so that the model can judge whether two given scene images were taken at the same place. Video frames streamed from the UAV in real time are then matched by nearest-distance search against feature vectors precomputed on the ground, and the result is mapped to a position on the orthophoto to support subsequent global path planning.
    This study combines methods previously used for cross-view image retrieval with the recently advanced one-shot learning paradigm, applies them to UAV situational awareness and visual localization, and validates the method's real-world feasibility on UAV-view images and test videos captured with actual equipment. The work consists of two parts: the first presents the training of the offline model as an image-retrieval problem, showing how to train and improve the model with existing map imagery to reach usable location-discrimination performance on the target dataset; the second presents adjustments to the trained model that reduce the cross-view localization error to achieve precise localization.
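The offline step trains the model to decide whether two scene images show the same place, typically by pulling same-place embedding pairs together and pushing different-place pairs apart. As a minimal sketch of that objective, here is the contrastive loss of Hadsell et al. applied to two hypothetical feature vectors; the toy inputs stand in for the thesis's actual CNN embeddings, and the margin value is an assumption:

```python
import numpy as np

def contrastive_loss(f1, f2, same_place, margin=1.0):
    """Contrastive loss: same-place pairs are penalized by their squared
    distance; different-place pairs are penalized only when they fall
    within `margin` of each other."""
    d = float(np.linalg.norm(np.asarray(f1) - np.asarray(f2)))
    if same_place:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Identical embeddings of the same place incur no loss,
# while a different-place pair closer than the margin does.
loss_same = contrastive_loss([0.0, 0.0], [0.0, 0.0], same_place=True)
loss_diff = contrastive_loss([0.0, 0.0], [0.5, 0.0], same_place=False)
```

With a margin of 1.0, a different-place pair at distance 0.5 yields a loss of 0.25, and any different-place pair farther than the margin contributes nothing, which is what lets the margin setting shape the embedding space (a question the thesis studies in Section 4-3-2).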

    This study targets cross-view localization of UAVs, aiming to remove the dependence on GPS signals in GPS-denied scenarios. How a UAV can localize itself from the surrounding scene in such an environment, and even derive a global path for path planning, are the issues discussed.
    In this study, a one-shot learning model based on CNN feature representation for cross-view localization of UAVs is proposed. The system first learns scene features from the same and different viewpoints at the same and different spots, so as to recognize whether two given images depict the same spot. The video captured by the UAV is then streamed in real time and matched against the precomputed feature vectors of the support-set images, and the nearest feature vector, indicating the location, is output.
    This study transforms the previous cross-view image retrieval problem into an actual localization problem on multi-rotor UAVs, applied to a real campus environment for problem structuring and testing. The work is divided into two parts. In Part I, the training process of the proposed model on self-collected campus images is presented, with results and observations for further improvement. In Part II, the training model is revised based on these observations to address the localization error. Finally, the improved model is validated on UAV-captured videos, showing that the localization error can be bounded to 25–30 m and confirming the feasibility of the proposed method.
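The online matching stage described above reduces to a nearest-neighbor search: each incoming frame's embedding is compared against the precomputed support-set vectors, and the location of the closest one is returned. A minimal NumPy sketch of that lookup follows; the 2-D embeddings and orthophoto coordinates are hypothetical placeholders for vectors the thesis would compute with its trained CNN:

```python
import numpy as np

def localize(query_vec, support_vecs, support_coords):
    """Return the map coordinate of the support image whose precomputed
    embedding is nearest (Euclidean) to the query frame's embedding,
    along with that distance."""
    dists = np.linalg.norm(support_vecs - query_vec, axis=1)
    i = int(np.argmin(dists))
    return support_coords[i], float(dists[i])

# Hypothetical embeddings for three support images and their
# orthophoto coordinates.
support = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])
coords = [(120.2, 23.0), (120.3, 23.1), (120.4, 23.2)]
spot, dist = localize(np.array([0.2, -0.1]), support, coords)
# spot is the coordinate of the nearest support vector: (120.2, 23.0)
```

Because the support-set embeddings are computed once on the ground, the per-frame cost at flight time is a single distance computation over the support set, which is what makes the scheme feasible on onboard hardware.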

    Abstract (Chinese)
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1  Introduction
      1-1  Motivation
      1-2  Objectives
      1-3  Chapter Overview
    Chapter 2  Background and Literature Review
      2-1  Cross-view Localization
      2-2  One-shot Learning
      2-3  Deep Feature Representation
        2-3-1  ResNet
        2-3-2  RMSProp
      2-4  Siamese Network
      2-5  Filtering Component
        2-5-1  t-SNE Feature Embedding
        2-5-2  Unsupervised K-Nearest Neighbors
      2-6  Image Retrieval Evaluation Metrics
    Chapter 3  Methodology and System Design
      3-1  System Architecture
      3-2  Training Data Samples
      3-3  Porting to the Hardware Platform
    Chapter 4  Experiment Design and Result Analysis
      4-1  Equipment
      4-2  Training Model Architecture
        4-2-1  Knowledge Transfer via Model Parameters
        4-2-2  Knowledge Transfer via Shared Features
      4-3  Initial Strengthening of the Training Model
        4-3-1  Adding Random Noise to Training Samples
        4-3-2  Effect of the Margin Value on the Training Model
      4-4  Preliminary Experimental Results
      4-5  Directions for Improving the Preliminary Results
      4-6  Improvement Methods and Experimental Results
        4-6-1  Mitigating Overfitting with Regularization
        4-6-2  Retaining Samples from Different Regions
        4-6-3  Adding Geolocation Information to Training Pairs
      4-7  Final Mapping Results
    Chapter 5  Conclusion and Future Work
      5-1  Conclusion
      5-2  Future Work
    References


    Full text: on campus, immediately public; off campus, immediately public.