
Graduate student: 陳立偉 (Chen, Li-Wei)
Thesis title: Verification of High Performance SoC-Mercurius on Embedded MPSoC Rapid Prototyping Platform-Concord II and Implementation of Network-on-Chip Router-AnyNoC
Original title: 高速系統單晶片-Mercurius於嵌入式系統單晶片雛型驗證平台Concord II驗證及網路式路由器-AnyNoC之實現
Advisor: 卿文龍 (Chin, Wen-Long)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Number of pages: 103
Keywords (Chinese): 網路式晶片, 晶片上通訊, 系統單晶片, 流量控制, 封包式交換, 輸出佇列
Keywords (English): Network on Chip, interconnection networks, System-on-Chip, flow control, packet switching, Output Queue Buffering
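  • Today's consumer electronics market clearly shows that communication, data, audio/video, and many other functions have been integrated into single devices. In a smartphone, for example, a variety of chips are integrated to handle multimedia applications such as video recording, photography, pico projection, and augmented reality (AR). With the rapid development of 4G/WIFI/LTE wireless broadband technologies, the demand for data-transfer bandwidth is far stronger than before.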

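    Meanwhile, in telecommunications, entertainment multimedia, and handheld mobile devices, the continued growth of semiconductor technology has led many silicon intellectual property (SIP) vendors to develop multifunctional, high-performance, power-efficient SIP blocks such as DSP, GPU, and WIFI. Because these modules are reusable and programmable, they greatly shorten the design time of a System-on-Chip (SoC). SIP vendors nevertheless face a difficulty: whether a given SIP block is compatible across different environments. When system designers want to integrate various SIP blocks quickly, defining a standard bus protocol is the most efficient approach, because it makes the vendors' SIP portable.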

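    An embedded system integrates software, hardware, and mobile applications, and at the lowest level of the system the bus carries the communication among the SIP blocks. An efficient bus is therefore the key to designing an SoC that delivers a smooth user experience. In summary, the important issues for today's buses are how to make SIP blocks compatible with one another, how to transfer packets quickly, and how to partition the system into regions sensibly.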

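    As SoCs integrate more and more SIP blocks, bus scalability and the high-performance demands of particular SIP blocks have become challenges for traditional buses. The Mercurius high-speed bus architecture, proposed earlier by a senior member of our laboratory, achieves a transaction performance three times that of a traditional bus. This work verifies Mercurius in a complete embedded system and further improves the function and performance of the earlier design, raising performance under ordinary traffic to five times that of a traditional bus. It also designs a router core for Network on Chip, called AnyNoC, which retains the low latency, high throughput, and scalability of the earlier design. Using virtual point-to-point packet channels, the router core receives data from all ingress ports simultaneously and immediately forwards all packets to the egress ports. This avoids bus contention when multiple devices access the bus and greatly increases the bandwidth between SIP blocks. The router core uses shared-memory output-queue packet switching to make full use of the packet buffer. The output-queue architecture eliminates head-of-line (HOL) blocking and supports multiple outstanding transactions, which maximizes system performance. The flow control module prevents packet-buffer overflow while allocating system resources fairly, uses the packet buffer more effectively, and improves quality of service (QoS) on the SoC.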
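The head-of-line blocking that an output-queue architecture removes can be illustrated with a toy cycle count. The sketch below is not the AnyNoC design: it assumes each output port accepts one packet per cycle, fixed-priority arbitration among inputs, and unbounded output queues, and the function names are hypothetical.

```python
from collections import deque

def input_queued_cycles(inputs):
    """Cycles needed to drain FIFO input queues (subject to HOL blocking)."""
    queues = [deque(q) for q in inputs]
    cycles = 0
    while any(queues):
        cycles += 1
        busy = set()  # output ports already claimed in this cycle
        for q in queues:  # fixed priority: lower-numbered input wins
            if q and q[0] not in busy:
                busy.add(q.popleft())
            # otherwise the head is blocked, and so is everything behind it
    return cycles

def output_queued_cycles(inputs):
    """Cycles needed when every input can write to any output queue each cycle."""
    queues = [deque(q) for q in inputs]
    pending = {}  # output port -> number of packets waiting in its queue
    cycles = 0
    while any(queues) or any(pending.values()):
        cycles += 1
        for q in queues:  # all ingress ports are served in the same cycle
            if q:
                dest = q.popleft()
                pending[dest] = pending.get(dest, 0) + 1
        for dest in pending:  # each egress port sends one packet per cycle
            if pending[dest]:
                pending[dest] -= 1
    return cycles

# Input 0 holds packets for ports A then B; input 1 holds packets for A then C.
traffic = [["A", "B"], ["A", "C"]]
print(input_queued_cycles(traffic), output_queued_cycles(traffic))
```

In the input-queued case, the packet for port C waits behind a blocked packet for port A even though port C is idle; with output queues it is delivered without waiting.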

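    In the AnyNoC interconnect, adjacent router cores communicate by packet-based transfer, and suitable network interfaces (NI) support SIP blocks that use various communication protocols (AHB, AXI, OCP). AnyNoC's multi-network-segment architecture can be scaled flexibly to a large number of SIP blocks or buses and establishes a globally asynchronous locally synchronous (GALS) clocking system on a single chip. Each router core and each network interface has its own independent clock, and cores and devices communicate by handshaking, which greatly reduces the complexity of the network clock tree and lowers the overall clock-tree power. The handshake-based asynchronous circuits also prevent clock skew in large SoCs, and asynchronous data buffers prevent the metastable outputs that can arise when different clock domains communicate.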
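The abstract does not say how the asynchronous data buffers are built. One common technique for passing FIFO pointers between clock domains, assumed here purely for illustration, is Gray coding: consecutive pointer values differ in exactly one bit, so a pointer sampled during a transition resolves to either the old or the new value. The helper names below are hypothetical.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary pointer value to its Gray-code representation."""
    return n ^ (n >> 1)

def gray_to_bin(g: int) -> int:
    """Convert a Gray-coded value back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

DEPTH = 16  # assumed FIFO depth; must be a power of two for wrap-around to work

for ptr in range(DEPTH):
    nxt = (ptr + 1) % DEPTH
    changed_bits = bin(bin_to_gray(ptr) ^ bin_to_gray(nxt)).count("1")
    print(f"{ptr:2d} -> {bin_to_gray(ptr):04b}  bits changed to next: {changed_bits}")
```

Only one bit changes per increment, including the wrap from the last pointer value back to zero.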

    In this thesis, the performance of a packet-switched router is improved by means of a shared-memory output-queue technique. The enhanced router not only offers good scalability and high resource utilization, but also delivers superior communication performance owing to its virtual point-to-point characteristic. In the proposed router, a virtual point-to-point connection is set up at each dedicated port, yielding higher throughput without interrupting slower devices or causing congestion at any of the other ports. In addition, the head-of-line (HOL) blocking problem is resolved by using a flit buffer to construct an output-queued switch. In the flit buffer, buffer addresses are maintained by a dynamic linked list, and flow control prevents packet loss. The packet-switched router is implemented in two different environments. In the first, the router is implemented in a real-time embedded system in place of the traditional shared bus (e.g., the AHB bus). In the second, the proposed router is used to construct a network-on-chip (NoC). The performance of the proposed router, designated AnyNoC, is evaluated by comparing the overall system throughput of the constructed NoC with that obtained using a traditional packet-switched NoC. The results show that AnyNoC significantly reduces network latency compared with a traditional router architecture.
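A flit buffer whose addresses are managed by a dynamic linked list can be sketched as one shared memory with a free list and one logical queue per output port. This is a minimal behavioral model under assumptions, not the thesis's hardware: the class name, method names, and the choice to signal a full buffer by returning `False` are all illustrative.

```python
class SharedFlitBuffer:
    """Shared memory holding one linked-list queue per output port."""

    def __init__(self, num_slots, num_ports):
        self.data = [None] * num_slots
        # next_slot chains the slots; initially every slot is on the free list
        self.next_slot = list(range(1, num_slots)) + [None]
        self.free_head = 0 if num_slots else None
        self.head = [None] * num_ports  # first slot of each output queue
        self.tail = [None] * num_ports  # last slot of each output queue

    def enqueue(self, port, flit):
        """Store a flit for an output port; False means the buffer is full."""
        slot = self.free_head
        if slot is None:
            return False  # caller must hold the flit (back-pressure)
        self.free_head = self.next_slot[slot]
        self.data[slot] = flit
        self.next_slot[slot] = None
        if self.head[port] is None:
            self.head[port] = slot
        else:
            self.next_slot[self.tail[port]] = slot
        self.tail[port] = slot
        return True

    def dequeue(self, port):
        """Remove the oldest flit queued for a port, or None if it is empty."""
        slot = self.head[port]
        if slot is None:
            return None
        flit = self.data[slot]
        self.head[port] = self.next_slot[slot]
        if self.head[port] is None:
            self.tail[port] = None
        self.data[slot] = None
        self.next_slot[slot] = self.free_head  # return the slot to the free list
        self.free_head = slot
        return flit

buf = SharedFlitBuffer(num_slots=3, num_ports=2)
buf.enqueue(0, "a")
buf.enqueue(1, "b")
buf.enqueue(0, "c")
print(buf.enqueue(1, "d"))  # buffer full, flit refused
print(buf.dequeue(0))       # frees a slot that any port may reuse
print(buf.enqueue(1, "d"))
```

Because slots are drawn from a common free list, a busy port can use memory that an idle port does not need, which is the resource-utilization benefit the abstract attributes to the shared-memory approach.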

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Thesis Direction
      1.3 Research Contributions
      1.4 Thesis Organization
    Chapter 2 Background
      2.1 System-on-Chip
      2.2 Network-on-Chip Architecture
        2.2.1 Message Switching Techniques
        2.2.2 Topology
        2.2.3 Routing Algorithms
        2.2.4 Switching Methods
        2.2.5 Virtual Channels
        2.2.6 Wormhole-Switching Router Architecture
      2.3 AMBA Bus Protocol
        2.3.1 Signal Definitions
        2.3.2 Pin Signals
        2.3.3 Bus Architecture
        2.3.4 Basic AHB Transactions
        2.3.5 Control Signals
        2.3.6 Address Decoder
        2.3.7 Default Slave
        2.3.8 Data Multiplexer
    Chapter 3 Network Router Architecture
      3.1 Operating Principle and Architecture
        3.1.1 Dynamic Linked List
        3.1.2 Router Core Architecture and Operating Steps
        3.1.3 Flow Control
      3.2 Router Core Improvements
        3.2.1 Scalability
        3.2.2 Linked-List Operation Management
      3.3 Network Interface Improvements
        3.3.1 Master-to-Master Communication
        3.3.2 Asynchronous FIFO
        3.3.3 Mercurius Read Transactions
        3.3.4 Device Access
        3.3.5 Hardware Parameters and Project Scripts
    Chapter 4 Concord System Architecture
      4.1 Hardware Implementation
      4.2 Introduction to Concord
      4.3 Experimental Environment
      4.4 Motherboard Architecture
        4.4.1 Digital Clock Manager
        4.4.2 Memory
      4.5 Concord System
      4.6 AHB Signal Timing on Concord
    Chapter 5 Implementation Environment and Experimental Results
      5.1 Network-on-Chip Environment
        5.1.1 Packet Format
        5.1.2 Network Topology
        5.1.3 Routing Algorithm and Handshake Signals
        5.1.4 AnyNoC Simulation Results
      5.2 Mercurius Simulation Environment
      5.3 Critical Path Modification
      5.4 Results of Porting Mercurius to Concord and Running It
    Chapter 6 Conclusion and Future Work

    Full text availability: on campus, public from 2019-08-28; off campus, public from 2019-08-28