| Graduate student: | 蕭翔之 Hsiao, Hsiang-Chih |
|---|---|
| Thesis title: | 使用多分塊晶片上記憶體降低複雜度之物件偵測器 Low-complexity Viola-Jones Object Detector using Multi-bank On-chip Memories |
| Advisor: | 謝明得 Shieh, Ming-Der |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | English |
| Number of pages: | 55 |
| Keywords (Chinese): | 電腦視覺、稀疏矩陣向量乘法、超大型積體電路架構、現場可編程邏輯閘陣列 |
| Keywords (English): | Computer vision, sparse matrix-vector multiplication, VLSI architecture, field-programmable gate array |
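Advances in computer vision have made a variety of smart applications possible. Some products, such as wireless sensor networks or smart toys, require a low-complexity object detector, which poses a challenge for hardware design engineers. In this thesis, we propose a base design built on the Viola-Jones algorithm. To reduce area while maintaining sufficient computational throughput, the on-chip memory (for example, SRAM) is partitioned into multiple banks. However, because a memory usually has only a few access ports, accesses to the same bank may conflict. The outcome of machine learning in an object detection algorithm is not known in advance, so the resulting memory accesses are irregular, and the memory cannot be partitioned into banks in the conventional way. We therefore propose a method that schedules the memory access sequence produced after machine learning, thereby avoiding memory access conflicts. We verify the approach with an FPGA implementation; the results show that the proposed method greatly reduces flip-flop utilization. In addition, the overall area efficiency is improved.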
Advances in computer vision have enabled various smart applications in the consumer market. Some products, such as wireless sensor networks or smart toys, may require a low-complexity visual object detection unit in which area and energy efficiency are of concern. This poses a challenge to designers. In this work, a flexible yet area-efficient object detector based on the Viola-Jones algorithm is presented as a base design. To reduce area while maintaining sufficient throughput, on-chip memories (such as SRAM) are partitioned into several banks. However, accesses to the same bank may conflict, since each bank has a limited number of access ports. Resolving these conflicts is non-trivial because the memory access pattern of the detection task depends on the result of machine learning, which is often unpredictable before training. Therefore, we propose an approach that explicitly schedules the access sequence as a post-processing step performed after training the object model, so that memory access conflicts can be avoided. An FPGA implementation, presented at the end of this study, is used to verify the idea. The results show that the proposed methodology drastically reduces flip-flop utilization and also improves the overall area efficiency.
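The scheduling idea in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: the modulo bank mapping, the first-fit packing policy, and the names `bank_of` and `schedule_accesses` are all assumptions made for illustration. The sketch also assumes the accesses are independent reads, so they may be reordered freely.

```python
from collections import Counter


def bank_of(addr, num_banks):
    """Hypothetical low-order interleaving: bank index = address mod bank count."""
    return addr % num_banks


def schedule_accesses(addrs, num_banks, ports_per_bank=1):
    """Greedily pack read accesses into cycles without bank conflicts.

    Assumption: the accesses are independent reads, so reordering is allowed.
    Returns a list of cycles; each cycle lists the addresses issued together.
    """
    cycles = []  # addresses issued in each cycle
    load = []    # per-cycle Counter: bank -> ports already used
    for addr in addrs:
        bank = bank_of(addr, num_banks)
        # First-fit: use the earliest cycle in which this bank has a free port.
        for i, used in enumerate(load):
            if used[bank] < ports_per_bank:
                cycles[i].append(addr)
                used[bank] += 1
                break
        else:
            # No existing cycle can take this access; open a new cycle.
            cycles.append([addr])
            load.append(Counter({bank: 1}))
    return cycles


def is_conflict_free(cycles, num_banks, ports_per_bank=1):
    """Check that no bank is accessed more often per cycle than it has ports."""
    for cycle in cycles:
        per_bank = Counter(bank_of(a, num_banks) for a in cycle)
        if any(n > ports_per_bank for n in per_bank.values()):
            return False
    return True


if __name__ == "__main__":
    # Seven reads over four single-port banks; three of them hit bank 0.
    plan = schedule_accesses([0, 4, 8, 1, 2, 3, 5], num_banks=4)
    print(plan)
    print(is_conflict_free(plan, num_banks=4))
```

With single-port banks, the number of cycles can never be lower than the largest number of accesses that fall into one bank, so an offline scheduler of this kind is most useful when the trained model's accesses are spread fairly evenly across the banks.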