
Graduate student: Hsiao, Hsiang-Chih (蕭翔之)
Thesis title: Low-complexity Viola-Jones Object Detector using Multi-bank On-chip Memories (使用多分塊晶片上記憶體降低複雜度之物件偵測器)
Advisor: Shieh, Ming-Der (謝明得)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 55
Keywords: Computer vision, sparse matrix-vector multiplication, VLSI architecture, field-programmable gate array
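  • Advances in computer vision have made a variety of smart applications possible. Some products, such as wireless sensor networks or smart toys, require a low-complexity object detector, which poses a challenge for hardware design engineers. In this thesis, we propose a base design built on the Viola-Jones algorithm. To reduce area while maintaining sufficient computational throughput, the on-chip memory (for example, SRAM) is partitioned into multiple banks. However, because a memory typically has only a few access ports, accesses to the same bank may conflict. Because the outcome of machine learning for the object detection algorithm is not known in advance, the resulting memory accesses are irregular, so the memory cannot be partitioned into banks in the conventional way. We therefore propose a method that schedules the memory access sequence produced after machine learning, thereby avoiding memory access conflicts. We verify the approach with an FPGA implementation; the results show that the proposed method greatly reduces flip-flop utilization. In addition, overall area efficiency is improved.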

    Advances in computer vision have enabled various smart applications in the consumer market. Some products, such as wireless sensor networks or smart toys, may require a low-complexity visual object detection unit in which area and energy efficiency are of concern. This poses a challenge to designers. In this work, a flexible yet area-efficient object detector based on the Viola-Jones algorithm is presented as a base design. To reduce the area while maintaining sufficient throughput, on-chip memories (such as SRAM) are partitioned into several banks. However, accesses to the same bank may conflict, since each bank has only a limited number of access ports. Resolving conflicts is non-trivial because the memory access pattern of the detection task depends on the result of machine learning, which is often unpredictable before training. Therefore, we propose an approach that explicitly schedules the access sequence as a post-processing step performed after training the object model. As a consequence, memory access conflicts can be avoided. An FPGA implementation, presented at the end of this study, is used to verify the idea. The results show that the proposed methodology drastically reduces flip-flop utilization. Moreover, the overall area efficiency is also improved.
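The idea of scheduling an access sequence after training can be illustrated with a minimal sketch. This is not the scheduler described in the thesis, whose details are not given in this record. It assumes a simple modulo (low-order interleaved) address-to-bank mapping, a fixed number of ports per bank, and that the accesses belonging to one group may be issued in any order. The names `bank_of` and `schedule_accesses` are hypothetical.

```python
from collections import defaultdict


def bank_of(address, num_banks):
    """Assumed mapping: consecutive addresses fall into consecutive banks."""
    return address % num_banks


def schedule_accesses(addresses, num_banks, ports_per_bank=1):
    """First-fit packing of accesses into cycles.

    Each access goes into the earliest cycle whose target bank still has
    a free port, so no cycle ever exceeds the port limit of any bank.
    Returns a list of cycles, each a list of addresses issued together.
    """
    cycles = []  # cycles[i] = addresses issued in cycle i
    loads = []   # loads[i][bank] = accesses to `bank` in cycle i
    for addr in addresses:
        bank = bank_of(addr, num_banks)
        for i, cycle in enumerate(cycles):
            if loads[i][bank] < ports_per_bank:
                cycle.append(addr)
                loads[i][bank] += 1
                break
        else:
            # No existing cycle has a free port on this bank: open a new one.
            cycles.append([addr])
            new_load = defaultdict(int)
            new_load[bank] = 1
            loads.append(new_load)
    return cycles


if __name__ == "__main__":
    # Addresses 0, 4 and 8 all map to bank 0 when there are 4 banks,
    # so they must be spread over three different cycles.
    print(schedule_accesses([0, 4, 8, 1, 5, 2], num_banks=4))
    # -> [[0, 1, 2], [4, 5], [8]]
```

Because the access pattern is fixed once the object model has been trained, a schedule of this kind can be computed offline and stored alongside the model, rather than resolved by arbitration hardware at run time.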

    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Object Detection
      Application scenario
      Viola-Jones algorithm
      Assembling weak-classifiers
        Integral image technique
        Haar-like features
    Chapter 3. Architecture of the Base Design
      Image down-sampling unit
      Line buffer
      Integral image and Variance computation unit
      Cascaded classifier
      Window cache
    Chapter 4. The Proposed Memory Banking Methodology
      Access sequence scheduler
        Modeling the scheduling problem
        Architecture-aware scheduler
        Scheduler algorithm
      Necessity of the optimization constraint
      The number of the processing elements
    Chapter 5. Implementation Results
      Exploration of the hardware parameters
      An end-to-end comparison
    Chapter 6. Conclusion and Future Works
    References
    Appendix A. Alternative Approach: Approximation in the Special Case
      A.1 LBP features
      A.2 Approximated feature extraction
      A.3 Comparison of the two approaches


    Full-text availability: on campus, open immediately; off campus, open immediately