
Graduate student: Huang, Hsu-Yao (黃煦堯)
Thesis title: Tile-Based GPU Optimizations through the ESL Full System Simulation (透過電子系統層級之全系統模擬優化砌塊式繪圖處理器)
Advisor: Chen, Chung-Ho (陳中和)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, 2010–2011)
Language: Chinese
Number of pages: 102
Keywords: electronic system level design, ESL, ETC, full system simulation, software/hardware partition, texture compression, tile-based GPU
  • <BLOCKQUOTE>       QEMU-SystemC is a popular full system simulation platform, and we previously used it to build a prototype 3D rendering system. For a tile-based GPU design, that work analyzed alternative architectures and algorithms, performed a preliminary software/hardware partition, and implemented the design as synthesizable hardware. Overall performance, however, still left room for tuning. In addition, although the approximate instruction cycle counts obtained from QEMU let us measure software and hardware execution time separately, the software load was not truly reflected in the system performance.

           In this thesis, we build an abstract-level SystemC model of the tile-based GPU, including the texture mapping unit, and estimate its cycle counts and synthesized area so that the model can be used to assess performance. We then establish a synchronization mechanism in the full system simulation platform, working with a QEMU synchronization profiler, to obtain a near-accurate application execution time and feed it back into the system, which improves the accuracy of the performance evaluation. Inside the GPU we adopt Ericsson Texture Compression (ETC) and extend it to support alpha compression. This reduces external memory usage and bus bandwidth to about one sixth of the original and speeds up the Rasterization Engine (RE) by 35%. Finally, we optimize the hardware/software data flow and the architecture, which improves Geometry Engine (GE) performance by 96%, RE performance by 89%, and overall system performance by 70%. At a 200 MHz clock, the GE reaches a maximum throughput of 7.407 Mtriangles/sec and the RE 200 Mpixels/sec.</BLOCKQUOTE>
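    The figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the thesis. It assumes the standard ETC1 layout (a 4x4 texel block stored in 64 bits) and uncompressed 24-bit RGB texels, and the function names are hypothetical. Under those assumptions the compression ratio is 6:1, which is consistent with the "about one sixth" figure, and the quoted throughputs correspond to roughly 27 cycles per triangle in the GE and 1 cycle per pixel in the RE at 200 MHz. The alpha extension described in the thesis may change the exact ratio.

```python
# Back-of-the-envelope check of the abstract's figures.
# Assumptions (not from the thesis): standard ETC1 stores a 4x4 texel
# block in 64 bits, and the uncompressed source is 24-bit RGB.

CLOCK_HZ = 200e6                 # clock frequency quoted in the abstract
GE_TRIANGLES_PER_SEC = 7.407e6   # GE peak throughput quoted in the abstract
RE_PIXELS_PER_SEC = 200e6        # RE peak throughput quoted in the abstract


def cycles_per_item(clock_hz, items_per_sec):
    """Average clock cycles spent per output item at peak throughput."""
    return clock_hz / items_per_sec


def block_compression_ratio(block_w=4, block_h=4,
                            bits_per_texel=24, block_bits=64):
    """Uncompressed bits in one block divided by compressed bits."""
    return (block_w * block_h * bits_per_texel) / block_bits


if __name__ == "__main__":
    ge = cycles_per_item(CLOCK_HZ, GE_TRIANGLES_PER_SEC)
    re_ = cycles_per_item(CLOCK_HZ, RE_PIXELS_PER_SEC)
    print(f"GE: about {ge:.2f} cycles per triangle")
    print(f"RE: about {re_:.2f} cycles per pixel")
    print(f"ETC1 RGB ratio: {block_compression_ratio():.1f}:1")
```

    Read this way, the RE figure means one pixel per clock at peak, and the GE figure means a new triangle roughly every 27 clocks.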

    <pre>
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Organization
Chapter 2 Background and Related Work
  2.1 3D computer graphics
  2.2 Graphics API
    2.2.1 OpenGL
    2.2.2 OpenGL ES
  2.3 Conventional OpenGL pipeline
    2.3.1 Geometry engine
    2.3.2 Rasterization engine
  2.4 Tile-based OpenGL ES pipeline
  2.5 Related work
    2.5.1 Electronic System Level (ESL)
    2.5.2 Full system simulation platform
Chapter 3 Software and Hardware Optimizations
  3.1 Texture mapping unit
    3.1.1 Texture cache
    3.1.2 Texture decompression
  3.2 Timing model
    3.2.1 Geometry engine (GE) pipeline
    3.2.2 Rasterization engine (RE) pipeline
    3.2.3 SystemC model verification
    3.2.4 Synchronization
    3.2.5 ARM bridge module
    3.2.6 Interrupt mechanism
  3.3 Data flow optimizations
    3.3.1 Redundant initialization
    3.3.2 Redundant data flow
    3.3.3 Vertex Buffer
    3.3.4 Display List offloading
    3.3.5 RE controller offloading
  3.4 Geometry Engine optimizations
    3.4.1 Non-blocking GE pipeline
    3.4.2 Non-blocking GE write back
  3.5 Rasterization Engine optimizations
    3.5.1 Non-blocking RE pipeline
    3.5.2 Non-blocking RE write back
  3.6 Bus architecture
    3.6.1 Multi-layer AHB
    3.6.2 Implementation
Chapter 4 Verification Environment and Simulation Results
  4.1 Verification environment
  4.2 Simulation results
    4.2.1 Texture compression
    4.2.2 Optimizations of data flow and architecture
    4.2.3 Comparison with other GPUs
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future work
References
</pre>


    Full-text availability: on campus, public from 2012-08-31
    Off campus, public from 2012-08-31