| Field | Value |
|---|---|
| Graduate Student | 王勇翔 Wang, Yong-Xiang |
| Thesis Title | On-premise Signs Detection and Recognition Using Fully Convolutional Networks (全捲積神經網路於招牌偵測與辨識之應用) |
| Advisor | 胡敏君 Hu, Min-Chun |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of Publication | 2016 |
| Academic Year of Graduation | 104 |
| Language | English |
| Pages | 34 |
| Keywords (Chinese) | 招牌辨識, 全捲積網路, 資料擴充, 深度學習 |
| Keywords (English) | OPS recognition, fully convolutional networks, data augmentation, deep learning |
Abstract (Chinese, translated): Convolutional neural networks have been widely studied and applied to many image recognition problems. This thesis extends that architecture and uses fully convolutional networks to detect and recognize on-premise signs in street-view images. The technique can be combined with mobile devices equipped with image-capture hardware (e.g., smartphones or augmented-reality glasses) to develop commercially valuable applications. We use fully convolutional networks to train a model that recognizes whether an image contains a specific sign and localizes its position. To improve the performance of this sign recognition model, we investigate two issues: (1) designing a data augmentation scheme to address the shortage of training samples, and (2) exploiting the fact that networks of different depths learn different features. In this study we found that some sign classes are better recognized with lower-level features while others are better recognized with higher-level features, so we propose an architecture that combines the two. Experiments confirm that the proposed methods effectively improve sign recognition accuracy.
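The data augmentation idea described above can be illustrated with a minimal numpy sketch. This is not the thesis's actual pipeline; the transforms (random flip, random crop, brightness jitter) and the function name `augment` are generic illustrative assumptions.

```python
import numpy as np

def augment(img, rng):
    """Generate one augmented view of an H x W x 3 street-view patch.

    Applies a random horizontal flip, a random crop back to 90% of each
    side, and a brightness jitter. These are generic transforms; the
    thesis's actual augmentation scheme may differ.
    """
    out = img.copy()
    if rng.random() < 0.5:                      # random horizontal flip
        out = out[:, ::-1, :]
    h, w, _ = out.shape
    ch, cw = int(h * 0.9), int(w * 0.9)         # crop size: 90% per side
    y = rng.integers(0, h - ch + 1)             # random crop offset
    x = rng.integers(0, w - cw + 1)
    out = out[y:y + ch, x:x + cw, :]
    scale = rng.uniform(0.8, 1.2)               # brightness jitter
    out = np.clip(out.astype(np.float32) * scale, 0, 255).astype(np.uint8)
    return out

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
views = [augment(patch, rng) for _ in range(8)]  # 8 extra training samples
```

Each call yields a slightly different view of the same sign, which is one common way to multiply a small labeled set into many training samples.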
Convolutional neural networks (CNNs) have recently been studied and used in many object recognition tasks. In this work, fully convolutional networks (FCNs) are employed to recognize on-premise signs (OPS) in real scenes. This technology can be deployed on camera-enabled devices such as smartphones to build practical commercial applications. FCNs are used to train a model that infers whether a street-view image contains a specific OPS and where the OPS is located in the input image. Furthermore, to improve recognition performance, we investigate two issues: (1) designing a data augmentation scheme to solve the problem of insufficient training samples, and (2) proposing a deep neural network model that fuses two FCN models to capture both low-level and high-level features. The OPS-62 dataset is used to evaluate the proposed approaches, and the results show that our OPS recognition model outperforms the state-of-the-art method.
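The fusion of two FCN branches can be sketched as follows. This is a minimal numpy illustration, not the thesis's actual architecture: the function name `fuse_scores`, the nearest-neighbour upsampling, and the simple averaging are all assumptions made for illustration.

```python
import numpy as np

def fuse_scores(low, high):
    """Fuse per-class score maps from a shallow and a deep FCN branch.

    `low` has finer spatial resolution (low-level features); `high` is
    coarser (high-level features). The coarse map is upsampled by
    nearest-neighbour repetition and the two maps are averaged, so sign
    classes that respond better to either feature level still contribute
    to the fused prediction.
    """
    fh, fw, c = low.shape
    ch, cw, c2 = high.shape
    assert c == c2 and fh % ch == 0 and fw % cw == 0
    up = high.repeat(fh // ch, axis=0).repeat(fw // cw, axis=1)  # upsample
    return (low + up) / 2.0

# Toy score maps: 8x8 fine branch, 4x4 coarse branch, 3 sign classes.
low = np.zeros((8, 8, 3))
high = np.ones((4, 4, 3))
fused = fuse_scores(low, high)
```

Averaging is only one possible fusion rule; learned 1x1 convolutions or max-pooling over branches are common alternatives in FCN-style architectures.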