National Cheng Kung University Theses and Dissertations System

Graduate student:	王勇翔 Wang, Yong-Xiang
Thesis title:	全捲積神經網路於招牌偵測與辨識之應用 On-premise Signs Detection and Recognition Using Fully Convolutional Networks
Advisor:	胡敏君 Hu, Min-Chun
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2016
Academic year of graduation:	104 (ROC academic year)
Language:	English
Number of pages:	34
Chinese keywords:	招牌辨識、全捲積網路、資料擴充、深度學習
English keywords:	OPS recognition, fully convolutional networks, data augmentation, deep learning

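Convolutional neural networks have been widely studied and applied to many image recognition problems. This thesis extends that architecture and uses fully convolutional networks to detect and recognize on-premise signs in street view images. The technique can be combined with mobile devices that have image capture hardware (for example, smartphones and augmented reality glasses) to build commercially valuable applications. We use a fully convolutional network to train a model that determines whether an image contains a specific sign and localizes it. To improve the performance of this sign recognition model, we study two issues: (1) Designing a method for augmenting the training data to address the shortage of training samples. (2) Neural network models of different depths learn different features; we find that some sign classes are better recognized with lower-level features while others are better recognized with higher-level features, so we propose an architecture that combines the two. Experiments confirm that the proposed methods effectively improve sign recognition accuracy.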

Convolutional neural networks (CNNs) have recently been studied and used in many object recognition tasks. In this work, fully convolutional networks (FCNs) are employed to recognize On-Premise Signs (OPS) in real scenes. This technology can be used in camera-enabled devices such as smartphones to develop practical commercial applications. FCNs are used to train a model that infers whether a street view image contains a specific OPS and where the OPS is located in the input image. Furthermore, to improve recognition performance, we investigate two issues: (1) Designing a data augmentation scheme to solve the problem of insufficient training samples. (2) Proposing a deep neural network model that fuses two FCN models to capture both low-level and high-level features. The OPS-62 dataset is used to evaluate the proposed approaches, and the results show that our OPS recognition model outperforms the state-of-the-art method.
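
The two issues named in the abstract can be illustrated with a minimal NumPy sketch. This record does not describe the thesis's actual augmentation scheme or fusion architecture, so everything here is an assumption made for illustration: the function names (`augment`, `fuse_score_maps`), the choice of random shifts plus brightness scaling, and the fusion of the two models by a weighted average of their per-pixel class probabilities are all hypothetical.

```python
import numpy as np


def augment(image, rng, max_shift=8, brightness=0.2):
    """Return a randomly shifted, brightness-scaled copy of an HxWxC image in [0, 1].

    Hypothetical augmentation; not the scheme used in the thesis.
    """
    h, w = image.shape[:2]
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Pad with edge pixels, then crop a window offset by (dy, dx).
    padded = np.pad(
        image,
        ((max_shift, max_shift), (max_shift, max_shift), (0, 0)),
        mode="edge",
    )
    top, left = max_shift + dy, max_shift + dx
    shifted = padded[top:top + h, left:left + w]
    scale = 1.0 + rng.uniform(-brightness, brightness)
    return np.clip(shifted * scale, 0.0, 1.0)


def softmax(scores, axis=0):
    """Numerically stable softmax along the class axis."""
    z = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_score_maps(low_scores, high_scores, weight=0.5):
    """Blend per-pixel class probabilities from two models.

    Both inputs have shape (classes, H, W). Averaging probabilities is an
    assumed fusion rule, standing in for the combined architecture.
    """
    probs = weight * softmax(low_scores) + (1.0 - weight) * softmax(high_scores)
    return probs.argmax(axis=0), probs
```

With `weight=0.5` both models contribute equally; a class that only one model scores confidently can still win the per-pixel vote, which is the intuition behind combining low-level and high-level features.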

Cover
Oral presentation document
Chinese version
English version
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
Chapter 2. Related Work
Chapter 3. Fully Convolutional Network
3.1 Neural Network (NN)
3.2 Convolutional Neural Network
3.3 Fully Convolutional Network
Chapter 4. OPS recognition based on FCN
4.1 OPS recognition training details of FCN model
4.2 FCN model with augmented data
4.3 FCN model with modified architecture
4.3.1 Combination of two models
Chapter 5. Experimental Result
5.1 Effect of augmented data
5.2 Performance comparison on different architectures
5.3 Evaluation of FCN model
5.4 Multi-task learning issue
Chapter 6. Conclusions & Future Work
References


On campus: publicly available from 2021-06-30
Off campus: publicly available from 2021-06-30
