
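Student: Wang, Yong-Xiang (王勇翔)
Thesis title: On-premise Signs Detection and Recognition Using Fully Convolutional Networks (Chinese title: 全捲積神經網路於招牌偵測與辨識之應用)
Advisor: Hu, Min-Chun (胡敏君)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar, i.e. 2015–2016)
Language: English
Number of pages: 34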
Keywords: OPS recognition, fully convolutional networks, data augmentation, deep learning
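    Abstract (translated from the Chinese version): Convolutional neural networks have been widely studied and applied to many image recognition problems. This thesis extends their architecture and uses fully convolutional networks to detect and recognize on-premise signs (OPS) in street view images. This technology can be combined with mobile devices that have image capture hardware (e.g., smartphones and augmented reality glasses) to build commercially valuable applications. We use fully convolutional networks to train a model that recognizes whether an image contains a specific sign and locates that sign. To improve the performance of this sign recognition model, we study two issues: (1) designing a training data augmentation method to address the problem of insufficient training samples; (2) neural network models of different depths learn different features, and we found that some sign classes are better recognized with lower-level features while others are better recognized with higher-level features, so we propose an architecture that combines the two. Experiments confirm that the proposed methods effectively improve sign recognition accuracy.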
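    The abstract reports that some sign classes are recognized better from low-level features and others from high-level ones. It does not say how the two models are combined. A minimal sketch, assuming a simple per-class late fusion: each class takes its image-level score from whichever model had the higher validation accuracy for that class. The function names, the "low"/"high" labels and all numbers are hypothetical.

```python
from typing import List, Sequence

# Hypothetical late fusion of two FCN models (an assumption, not the thesis's method):
#   "low"  - a model whose predictions rely on lower-level (shallower) features
#   "high" - a model whose predictions rely on higher-level (deeper) features
# Each class is assigned to whichever model scored better on validation data.


def choose_model_per_class(val_acc_low: Sequence[float],
                           val_acc_high: Sequence[float]) -> List[str]:
    """Return 'low' or 'high' for each class; ties go to 'high'."""
    return ["low" if lo > hi else "high"
            for lo, hi in zip(val_acc_low, val_acc_high)]


def fuse_scores(scores_low: Sequence[float], scores_high: Sequence[float],
                choice: Sequence[str]) -> List[float]:
    """Take each class's image-level score from the model chosen for that class."""
    return [lo if ch == "low" else hi
            for lo, hi, ch in zip(scores_low, scores_high, choice)]


def predict(scores_low: Sequence[float], scores_high: Sequence[float],
            choice: Sequence[str]) -> int:
    """Index of the class with the highest fused score."""
    fused = fuse_scores(scores_low, scores_high, choice)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy numbers for three sign classes (not results from the thesis).
choice = choose_model_per_class([0.90, 0.60, 0.70], [0.80, 0.85, 0.70])
print(choice)  # ['low', 'high', 'high']
print(predict([0.7, 0.2, 0.1], [0.3, 0.6, 0.1], choice))  # 0
```

    Hard per-class selection ignores differences in how the two models' scores are calibrated. A per-class weighted average of the two scores is a softer alternative. The thesis's own combination (Section 4.3.1, "Combination of two models") may work differently, for example by fusing features inside the network.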

    Convolutional neural networks (CNNs) have recently been studied and used in many object recognition tasks. In this work, fully convolutional networks (FCNs) are employed to recognize On-Premise Signs (OPS) in real scenes. This technology can be used in many camera-enabled devices, such as smartphones, to develop practical commercial applications. FCNs are used to train a model that infers whether a street view image contains a specific OPS and where that OPS is located in the input image. Furthermore, to improve recognition performance, we investigate two issues: (1) designing a data augmentation scheme to address the problem of insufficient training samples, and (2) proposing a deep neural network model that fuses two FCN models to capture both low-level and high-level features. The OPS-62 dataset is used to evaluate the proposed approaches, and the results show that our OPS recognition model outperforms the state-of-the-art method.
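    According to the abstract, the trained FCN decides both whether a specific OPS is present and where it is. An FCN outputs a spatial score map per class rather than a single label per image. One common way to get both answers is to take the maximum over each class map (global max pooling) and read off the peak position. The sketch below assumes that approach. The 0.5 threshold, the stride of 32 (typical of VGG-based FCNs), and the names `detect_sign` and `to_pixel` are illustrative and are not taken from the thesis.

```python
from typing import List, Optional, Tuple

# One class's score map: H x W per-location scores in [0, 1]
# (assumed output of a hypothetical FCN after a per-class sigmoid/softmax).
ScoreMap = List[List[float]]


def detect_sign(score_maps: List[ScoreMap], threshold: float = 0.5
                ) -> Tuple[Optional[int], Optional[Tuple[int, int]]]:
    """Global max pooling over every class map.

    Returns (class index, (row, col) of the peak cell), or (None, None)
    when no score exceeds `threshold` (treated as "no sign in the image").
    """
    best_cls: Optional[int] = None
    best_pos: Optional[Tuple[int, int]] = None
    best_score = threshold
    for cls, fmap in enumerate(score_maps):
        for r, row in enumerate(fmap):
            for c, score in enumerate(row):
                if score > best_score:
                    best_cls, best_score, best_pos = cls, score, (r, c)
    return best_cls, best_pos


def to_pixel(cell: Tuple[int, int], stride: int) -> Tuple[int, int]:
    """Map a score-map cell to the approximate center of the image region it
    covers, assuming the network downsamples the input by `stride`."""
    r, c = cell
    return r * stride + stride // 2, c * stride + stride // 2


# Toy 2x2 score maps for two hypothetical sign classes.
maps = [
    [[0.1, 0.2], [0.3, 0.1]],  # class 0
    [[0.0, 0.9], [0.2, 0.1]],  # class 1
]
cls, cell = detect_sign(maps)
print(cls, cell, to_pixel(cell, stride=32))  # 1 (0, 1) (16, 48)
```

    Max pooling favors a single confident peak. Averaging the map instead would reward signs that cover many cells. This record does not state which aggregation the thesis uses.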

    Cover
    Oral presentation document
        Chinese version
        English version
    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Related Work
    Chapter 3. Fully Convolutional Network
        3.1 Neural Network (NN)
        3.2 Convolutional Neural Network
        3.3 Fully Convolutional Network
    Chapter 4. OPS recognition based on FCN
        4.1 OPS recognition training details of FCN model
        4.2 FCN model with augmented data
        4.3 FCN model with modified architecture
            4.3.1 Combination of two models
    Chapter 5. Experimental Result
        5.1 Effect of augmented data
        5.2 Performance comparison on different architectures
        5.3 Evaluation of FCN model
        5.4 Multi-task learning issue
    Chapter 6. Conclusions & Future Work
    References


    Full-text availability: on campus from 2021-06-30; off campus from 2021-06-30.