| Graduate student: | 陳冠宇 Chen, Guan-Yu |
|---|---|
| Thesis title: | 多階儲存單元相變記憶體之段與列感知記憶體分配器之設計與實作 Design and Implementation of Segment- and Row-aware Memory Allocator for MLC PCM |
| Advisor: | 張大緯 Chang, Da-Wei |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 38 |
| Keywords (Chinese): | 多階儲存單元, 非揮發性記憶體, 記憶體分配, 作業系統 |
| Keywords (English): | Multi-level cell, non-volatile memory, memory allocation, operating system |
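As the memory footprint of applications keeps growing, hardware must provide a main memory with large capacity and fast access. Because of limits on density and scalability, traditional dynamic random access memory (DRAM) will not be able to meet this demand. Non-volatile memory (NVM) is an emerging memory with a good chance of replacing DRAM, and multi-level cell (MLC) techniques can be used to raise its density, but doing so lowers the access speed of NVM. Prior work has modified the hardware to reduce access latency, but as a result pages at different locations within the same memory device have different access speeds. To make effective use of the faster pages, we propose a new memory allocation method called SegRowMA. We observe that in applications with large memory demands, the heap segment is usually the segment that accounts for the largest share of memory usage, and that data in this segment is often initialized group by group, which makes the access behavior of the data hard to predict. SegRowMA observes the access behavior of data, keeps frequently accessed data in fast-access pages, and converts rarely accessed fast pages into slow pages, thereby obtaining more usable pages while avoiding heavy migration cost. According to the experimental results, compared with the conventional allocation method, SegRowMA improves memory throughput by 26.6% and instructions per cycle (IPC) by 30.0% on average.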
A main memory with large capacity and low access latency is now required because the memory usage of applications keeps growing. Traditional dynamic random access memory (DRAM) is no longer appropriate as the main memory due to limitations in capacity and scalability. Non-volatile memory (NVM) is a promising emerging memory that can be extended with multi-level cell (MLC) techniques to enhance density. However, the drawback is longer access latency. Prior research has made efforts to reduce the access latency of MLC NVM, but this leads to non-uniform access latency among pages in MLC PCM. To exploit fast-access pages and improve performance, a memory allocation method called SegRowMA is proposed in this work. It is observed that the heap segment accounts for most of the memory usage in large-footprint applications, and that data in heap segments are initialized group by group. During initialization, it is difficult to predict the access behavior of data. By tracking the access behavior of pages, SegRowMA keeps hot pages in fast-access pages. In addition, cold pages can be switched to slow-access pages to create more free memory space without migration. Hence, SegRowMA can avoid heavy migration overhead. The evaluation results indicate that SegRowMA improves memory throughput by 26.6% and instructions per cycle (IPC) by 30.0% compared to the conventional allocation method.
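The hot/cold policy described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the `Page` class, the `rebalance` function, the per-epoch access counter, and the `hot_threshold` parameter are all hypothetical names chosen for illustration, and the abstract does not specify how access behavior is actually tracked or how much space a mode switch frees.

```python
from dataclasses import dataclass

FAST, SLOW = "fast", "slow"


@dataclass
class Page:
    """Hypothetical page descriptor (not taken from the thesis)."""
    page_id: int
    mode: str = FAST
    accesses: int = 0  # accesses observed in the current epoch


def rebalance(pages, hot_threshold):
    """Switch cold fast-mode pages to slow mode in place.

    Assumption: a page is "cold" when its access count in the current
    epoch is below hot_threshold. Returns the ids of the pages that
    were switched, then resets all counters for the next epoch.
    """
    demoted = []
    for page in pages:
        if page.mode == FAST and page.accesses < hot_threshold:
            page.mode = SLOW  # mode switch only; no data is copied elsewhere
            demoted.append(page.page_id)
        page.accesses = 0  # start a new observation epoch
    return demoted


pages = [Page(0, accesses=10), Page(1, accesses=1), Page(2, mode=SLOW)]
demoted = rebalance(pages, hot_threshold=4)
print(demoted)
```

In this sketch, hot pages stay in fast mode and only the access mode of cold pages changes, which mirrors the idea of gaining space without migration. How many free pages each switch yields is deliberately left unmodeled.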
On campus: publicly available from 2025-07-27