
Author: Yu, Wei-Fu (余威甫)
Title: Using Shallow-Deep CNN on ILBP for Character Recognition (使用淺深卷積網路於增強式局部二元特徵之文字辨識)
Advisor: Yang, Chu-Sing (楊竹星)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year: 107
Language: Chinese
Pages: 90
Keywords: Improved Local Binary Pattern, Shallow-Deep CNN, ILBPCNN
Hits: 64; Downloads: 0
    In recent years, as GPU performance has continued to improve, convolutional neural network (CNN) architectures have kept evolving, pushing image recognition to unprecedented heights. Architectural depth has become an indispensable factor in CNN design: architectures such as VGGNet and GoogLeNet stack large numbers of convolutional layers to obtain high-level features with strong descriptive power, gradually replacing traditional handcrafted high-dimensional feature descriptors. However, in the race for accuracy, these networks often use excessive numbers of parameters and amounts of memory, and very deep architectures also make the network weights difficult to update, degrading performance. How to achieve low computational cost and high performance under limited resources has therefore become an important issue in today's applications.

    Traditional character recognition methods, constrained by the limited hardware of their time, could describe edge distributions with only a small number of feature dimensions. This thesis combines the low computational cost, high efficiency, and robustness to environmental factors of traditional image features with the powerful learning ability of CNNs, so that character recognition can be applied to resource-constrained tasks. It adopts the Improved Local Binary Pattern (ILBP), which introduces scale space to reduce sensitivity to noise. ILBP yields feature maps with two characteristics, called the maximum-selection feature map and the first-selection feature map. The former selects the edge with the strongest intensity, mitigating the influence of noise; the latter selects the first effective edge of the local binary feature found during scale detection. For the network, a Shallow-Deep CNN architecture is adopted: depending on the characteristics of the input features, networks of different depths are used for learning, and the features learned by the two networks are combined for classification. Experimental results show that the model combining ILBP with a Shallow-Deep CNN achieves solid recognition performance on character datasets, slightly reduces the overall number of parameters and the computational cost, and maintains performance comparable to other network architectures.

    In recent years, as GPU performance has improved, convolutional neural networks (CNNs) have achieved significant progress in computer vision tasks such as image classification, and building deeper and larger networks has become the primary trend. Architectures such as VGGNet and GoogLeNet stack large numbers of convolutional layers to obtain high-level features with excellent descriptive capability, replacing handcrafted features. However, in pursuit of the best accuracy, excessive network parameters and memory are often used, and deeper architectures also make the network weights difficult to update, resulting in poor performance. Therefore, in real-world applications, achieving low computational cost and high performance under limited resources has become an important issue.

    This thesis proposes the Improved Local Binary Pattern CNN (ILBPCNN), which combines the low computational complexity and high performance of handcrafted features with the supervised high-level features of CNNs to learn character representations. ILBP introduces scale space to reduce sensitivity to noise. It yields two kinds of feature maps, the maximum-selection feature map (MLBP) and the first-selection feature map (FLBP). The former selects the edge with the strongest intensity, mitigating the influence of noise; the latter selects the first effective edge of the local binary feature found during scale detection. For the network architecture, a Shallow-Deep CNN is used to learn comprehensive representations according to the characteristics of the inputs. Experimental results show that ILBPCNN achieves solid recognition performance on several character datasets while slightly reducing the number of network parameters and the computational cost, and its performance remains comparable to other state-of-the-art networks.
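    To make the two selection rules concrete, below is a minimal numpy sketch of multi-scale LBP with a maximum-selection (MLBP) and a first-selection (FLBP) map. The box filter standing in for the Gaussian scale space, the gradient magnitude used as the edge-strength measure, and the `thr` threshold are illustrative assumptions, not the thesis's exact definitions.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour LBP code for each interior pixel (clockwise bit order)."""
    h, w = img.shape
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(np.uint8) << bit
    return codes

def box_smooth(img, r):
    """Crude box filter standing in for one Gaussian scale."""
    img = img.astype(float)
    if r == 0:
        return img
    pad = np.pad(img, r, mode="edge")
    out = np.zeros_like(img)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (2 * r + 1) ** 2

def ilbp_maps(img, radii=(0, 1, 2), thr=1.0):
    """Return (MLBP, FLBP) maps over a small scale space."""
    codes, strengths = [], []
    for r in radii:
        s = box_smooth(img, r)
        gy, gx = np.gradient(s)
        strengths.append(np.hypot(gx, gy)[1:-1, 1:-1])  # edge-strength proxy
        codes.append(lbp_codes(s))
    codes, strengths = np.stack(codes), np.stack(strengths)
    # MLBP: at each pixel, keep the code from the scale with the
    # strongest edge response.
    best = np.argmax(strengths, axis=0)
    mlbp = np.take_along_axis(codes, best[None], axis=0)[0]
    # FLBP: keep the code from the first (finest) scale whose response
    # exceeds the threshold, i.e. the first "effective" edge.
    first = np.argmax(strengths > thr, axis=0)
    flbp = np.take_along_axis(codes, first[None], axis=0)[0]
    return mlbp, flbp
```

    Both maps have the same spatial size as a plain LBP map, so they can be stacked as input channels for the two CNN branches.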

    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    1. Introduction
    1.1 Background
    1.2 Motivation and Objectives
    1.3 Thesis Organization
    2. Related Work
    2.1 Character Recognition
    2.2 Introduction to Convolutional Neural Networks
    2.3 CNN Architectures
    2.4 Lightweight CNNs
    2.5 LBP
    2.6 Haar
    2.7 HOG
    2.8 Applications Combining Image Features with Deep Learning
    3. System Architecture
    3.1 ILBP
    3.1.1 Scale space LBP
    3.1.2 Reducing number of bins in LBP
    3.1.3 Selection method over scale space
    3.2 ILBPCNN Architecture
    3.3 Shallow Network Architecture
    3.3.1 Original SqueezeNet modules and architecture
    3.3.2 Weighted depthwise separable fire module and shallow network architecture
    3.3.3 Shallow network architecture
    3.4 Deep Network Architecture
    3.4.1 Deep network modules
    3.4.2 Channel quantization
    3.4.3 Deep network architecture
    3.5 Global Average Pooling
    3.6 Classifier
    4. Experimental Results and Analysis
    4.1 Experimental Environment
    4.2 Test Datasets and Network Architectures
    4.3 Training Details
    4.4 Experiment 1: ILBPCNN Performance
    4.5 Experiment 2: Number of Groups in Group Convolution
    4.6 Experiment 3: Channel Ratio
    5. Conclusions and Future Work
    References
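    The final stages of the outline (global average pooling, classifier) fuse what the two branch networks learn. Below is a shape-level numpy sketch of that fusion step; the branch networks themselves are omitted, and fusing the pooled branch outputs by simple concatenation, as well as the toy channel counts and the 62-class output, are assumptions for illustration.

```python
import numpy as np

def global_average_pool(feat):
    """Reduce an (H, W, C) feature map to a (C,) descriptor."""
    return feat.mean(axis=(0, 1))

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def shallow_deep_classify(shallow_feat, deep_feat, W, b):
    """Pool each branch, concatenate, and apply a linear classifier."""
    v = np.concatenate([global_average_pool(shallow_feat),
                        global_average_pool(deep_feat)])
    return softmax(W @ v + b)

# toy example: 64-channel shallow branch, 128-channel deep branch, 62 classes
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 8, 64))
deep = rng.standard_normal((4, 4, 128))
W = rng.standard_normal((62, 64 + 128)) * 0.01
b = np.zeros(62)
probs = shallow_deep_classify(shallow, deep, W, b)
```

    Because each branch is collapsed by global average pooling before fusion, the classifier size depends only on the branch channel counts, not on the spatial resolution at which each branch operates.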


    Full text available on campus from 2024-02-01; off campus from 2024-02-01.