| Graduate student: | 陳亮錡 Chen, Liang-Chi |
|---|---|
| Thesis title: | 利用近記憶體運算系統加速核糖核酸序列表現量量化之效能觀察以及軟體方法設計 (Accelerating RNA Sequence Quantification on Real Processing-Near-Memory System: Observation and Design) |
| Advisor: | 何建忠 Ho, Chien-Chung |
| Degree: | Master (碩士) |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2023 |
| Academic year: | 111 |
| Language: | English |
| Pages: | 48 |
| Keywords (Chinese): | 記憶體內運算, 近記憶體運算, 核糖核酸序列表現量量化, UPMEM DPU |
| Keywords (English): | processing-in-memory, processing-near-memory, RNA sequence quantification, UPMEM DPU |
Among recent developments in emerging computer architectures, the acceleration technique known as Processing in Memory (PIM) has drawn growing attention, because PIM shows great potential for accelerating data-intensive applications: by reducing off-chip data movement between the processor (CPU) and memory, it alleviates the performance bottleneck of the traditional computer architecture (the von Neumann bottleneck). In 2019, UPMEM released a commercial accelerator built on the PIM concept, the DRAM Processing Unit (DPU) [14]. In recent years, many researchers and teams have invested in DPU-related studies, and numerous successful use cases have demonstrated that this emerging architecture helps accelerate many data-intensive applications. However, owing to its hardware design, the UPMEM DPU also carries certain usage limitations, so this work focuses on resolving these hardware-imposed limitations through software design.
Among data-intensive applications, RNA sequence quantification is an important analysis method in bioinformatics, used to measure the abundance of RNA sequences. Its main performance bottleneck is sequence alignment, i.e., comparing a set of RNA reads against the long sequences in a database (the transcriptome). We aim to use DPUs to accelerate and resolve this bottleneck, but the DPU's hardware-imposed limitations make the design challenging. To better understand how software design can overcome these difficulties on the DPU, we chose the most popular and most widely used RNA sequence quantification tool, kallisto [7], as a case study for acceleration and behavioral analysis, and used our observations to identify the software design considerations that must be addressed. To this end, we implemented a DPU version of kallisto and built a series of experiments to evaluate its performance and behavior. The presented analysis and comparison show that the hash table used in sequence alignment requires a large memory capacity, which is exactly what makes RNA sequence quantification difficult to implement on DPUs: each DPU has a fixed, limited memory capacity and cannot directly share data with other DPUs, so storing the hash table becomes extremely difficult. At the software level, we therefore propose a new DPU management concept based on pipeline flow management; this design accounts for the DPU's hardware limitations and allows DPUs to complete sequence alignment efficiently even under limited memory capacity. The final experimental results not only demonstrate the feasibility of the proposed design and show that our DPU management approach can work around the DPU's limited hardware resources, but also substantially improve the overall performance of RNA sequence quantification without losing alignment accuracy.
Recently, in the development of emerging computer architectures, an acceleration technique called Processing in Memory (PIM) has garnered attention due to its potential to address the performance bottleneck known as the von Neumann bottleneck by reducing off-chip data movement between the processing unit (such as the CPU) and the memory component (such as DRAM or NVM). In 2019, UPMEM introduced a commercially available processing-in-memory accelerator, the DRAM Processing Unit (DPU) [14]. Since then, numerous researchers and teams have devoted their efforts to DPU-related studies, and there have been impressive use cases demonstrating the benefits of the UPMEM DPU in accelerating data-intensive applications. However, due to its hardware design, the UPMEM DPU also imposes certain usage limitations. This research focuses on addressing these hardware-related usage constraints through software design.
Among data-intensive applications, RNA sequence quantification is a critical analysis method in the field of bioinformatics, used to measure the abundance of RNA sequences. The main performance bottleneck in RNA sequence quantification is sequence alignment, which involves comparing a set of RNA reads with a database of longer sequences called the transcriptome. We aim to leverage DPUs to remove this performance bottleneck. However, given the usage limitations imposed by DPU hardware, understanding how to design software for the DPU becomes crucial. To better understand the design considerations, we selected the most widely used RNA sequence quantification tool, kallisto [7], as a case study to explore acceleration possibilities and investigate its behavior. Through our observations, we identified the hash table used in sequence alignment as the major challenge in implementing RNA sequence quantification on DPUs: the hash table requires substantial memory, yet each DPU has a fixed and limited memory capacity and cannot directly share data with other DPUs, making it difficult to store the hash table's large memory footprint. To overcome this limitation, we propose a novel DPU management scheme built on a pipeline concept at the software level. The design takes the DPUs' hardware limitations into account, enabling efficient sequence alignment even within their constrained memory capacity. The experimental results demonstrate the feasibility of the proposed design and show that our DPU management approach effectively works around the DPUs' limited hardware resources. Moreover, it significantly improves the overall performance of RNA sequence quantification without compromising alignment accuracy.
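The core idea above, namely that a k-mer hash table too large for any single DPU can be split into shards that each fit in one DPU's fixed-size memory, with reads then streamed through the shards in pipeline stages, can be sketched at a high level. The following is a minimal illustrative sketch, not the thesis implementation: all names (`DPU_MRAM_BYTES`, `ENTRY_BYTES`, `build_shards`, `pipeline_lookup`) are hypothetical, the 64 MB figure reflects the MRAM capacity of a UPMEM DPU, and real DPU code would use the UPMEM SDK rather than plain Python.

```python
# Hypothetical host-side sketch: shard a transcriptome k-mer hash table
# across fixed-capacity PIM units and stream reads through the shards.
from collections import defaultdict

K = 31                      # typical k-mer length for kallisto-style tools
DPU_MRAM_BYTES = 64 << 20   # a UPMEM DPU exposes 64 MB of MRAM
ENTRY_BYTES = 16            # assumed bytes per table entry (key + class id)

def kmers(read):
    """Yield every k-mer of a read."""
    for i in range(len(read) - K + 1):
        yield read[i:i + K]

def build_shards(kmer_table):
    """Split one big k-mer -> equivalence-class table into shards that
    each fit within a single DPU's MRAM budget."""
    max_entries = DPU_MRAM_BYTES // ENTRY_BYTES
    shards, current = [], {}
    for kmer, eq_class in kmer_table.items():
        if len(current) >= max_entries:
            shards.append(current)
            current = {}
        current[kmer] = eq_class
    if current:
        shards.append(current)
    return shards

def pipeline_lookup(reads, shards):
    """Stream every read through every shard (conceptually, one pipeline
    stage per DPU group); collect the equivalence classes its k-mers hit."""
    hits = defaultdict(set)
    for shard in shards:            # each stage would run on its own DPUs
        for rid, read in enumerate(reads):
            for km in kmers(read):
                if km in shard:
                    hits[rid].add(shard[km])
    return hits
```

The point of the sketch is the data-placement decision: because DPUs cannot exchange data directly, no DPU ever needs the whole table, and reads (not table shards) are what moves between pipeline stages.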
[1] Amogh Agrawal, Akhilesh Jaiswal, Chankyu Lee, and Kaushik Roy. X-sram: Enabling in-memory boolean computations in cmos static random access memories. IEEE Transactions on Circuits and Systems I: Regular Papers, 65(12):4219–4232, 2018.
[2] Berkin Akin, Franz Franchetti, and James C Hoe. Data reorganization in memory using 3d-stacked dram. ACM SIGARCH Computer Architecture News, 43(3S):131–143, 2015.
[3] Mustafa Ali, Akhilesh Jaiswal, Sangamesh Kodge, Amogh Agrawal, Indranil Chakraborty, and Kaushik Roy. Imac: In-memory multi-bit multiplication and accumulation in 6t sram array. IEEE Transactions on Circuits and Systems I: Regular Papers, 67(8):2521–2531, 2020.
[4] Mustafa F Ali, Akhilesh Jaiswal, and Kaushik Roy. In-memory low-cost bit-serial addition using commodity dram technology. IEEE Transactions on Circuits and Systems I: Regular Papers, 67(1):155–165, 2019.
[5] Alexander Baumstark, Muhammad Attahir Jibril, and Kai-Uwe Sattler. Accelerating large table scan using processing-in-memory technology. BTW 2023, 2023.
[6] James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, and Robert M Davies. HTSlib: C library for reading/writing high-throughput sequencing data. GigaScience, 10(2), 02 2021. giab007.
[7] Nicolas L Bray, Harold Pimentel, Páll Melsted, and Lior Pachter. Near-optimal probabilistic rna-seq quantification. Nature biotechnology, 34(5):525–527, 2016.
[8] Liang-Chi Chen, Shu-Qi Yu, Chien-Chung Ho, Yuan-Hao Chang, Da-Wei Chang, WeiChen Wang, and Yu-Ming Chang. Rna-seq quantification on processing in memory architecture: Observation and characterization. In 2022 IEEE 11th Non-Volatile Memory Systems and Applications Symposium (NVMSA), pages 26–32. IEEE, 2022.
[9] Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory. ACM SIGARCH Computer Architecture News, 44(3):27–39, 2016.
[10] chi-0828. UpPipe. https://github.com/chi-0828/UpPipe.
[11] Zamshed I Chowdhury, S Karen Khatamifard, Salonik Resch, Hüsrev Cılasun, Zhengyang Zhao, Masoud Zabihi, Meisam Razaviyayn, Jian-Ping Wang, Sachin S Sapatnekar, and Ulya R Karpuzcu. Cram-seq: Accelerating rna-seq abundance quantification using computational ram. IEEE Transactions on Emerging Topics in Computing, 10(4):2055–2071, 2022.
[12] DaehwanKimLab. tophat. https://github.com/DaehwanKimLab/tophat.
[13] Prangon Das, Purab Ranjan Sutradhar, Mark Indovina, Sai Manoj Pudukotai Dinakarrao, and Amlan Ganguly. Implementation and evaluation of deep neural networks in commercially available processing in memory hardware. In 2022 IEEE 35th International System-on-Chip Conference (SOCC), pages 1–6. IEEE, 2022.
[14] Fabrice Devaux. The true processing in memory accelerator. In 2019 IEEE Hot Chips 31 Symposium (HCS), pages 1–24. IEEE Computer Society, 2019.
[15] Safaa Diab, Amir Nassereldine, Mohammed Alser, Juan Gómez Luna, Onur Mutlu, and Izzat El Hajj. A framework for high-throughput sequence alignment using real processing-in-memory systems. Bioinformatics, 39(5):btad155, 2023.
[16] Qing Dong, Mahmut E Sinangil, Burak Erbagci, Dar Sun, Win-San Khwa, Hung-Jen Liao, Yih Wang, and Jonathan Chang. 15.3 a 351tops/w and 372.4 gops compute-in-memory sram macro in 7nm finfet cmos for machine-learning applications. In 2020 IEEE International Solid-State Circuits Conference (ISSCC), pages 242–244. IEEE, 2020.
[17] Amin Farmahini-Farahani, Jung Ho Ahn, Katherine Morrow, and Nam Sung Kim. Drama: An architecture for accelerated processing near memory. IEEE Computer Architecture Letters, 14(1):26–29, 2014.
[18] Saugata Ghose, Amirali Boroumand, Jeremie S Kim, Juan Gómez-Luna, and Onur Mutlu. Processing-in-memory: A workload-driven perspective. IBM Journal of Research and Development, 63(6):3–1, 2019.
[19] Christina Giannoula, Ivan Fernandez, Juan Gómez Luna, Nectarios Koziris, Georgios Goumas, and Onur Mutlu. Sparsep: Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6(1):1–49, 2022.
[20] Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F Oliveira, and Onur Mutlu. Benchmarking memory-centric computing systems: Analysis of real processing-in-memory hardware. In 2021 12th International Green and Sustainable Computing Conference (IGSC), pages 1–7. IEEE, 2021.
[21] Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F Oliveira, Gagandeep Singh, and Onur Mutlu. An experimental evaluation of machine learning training on a real processing-in-memory system. arXiv preprint arXiv:2207.07886, 2022.
[22] SAFARI Research Group. PrIM benchmark suite. https://github.com/CMU-SAFARI/prim-benchmarks.
[23] Zvika Guz, Manu Awasthi, Vijay Balakrishnan, Mrinmoy Ghosh, Anahita Shayesteh, Tameesh Suri, and Samsung Semiconductor. Real-time analytics as the killer application for processing-in-memory. Near Data Processing (WoNDP), pages 10–2, 2014.
[24] Han-Wen Hu, Wei-Chen Wang, Yuan-Hao Chang, Yung-Chun Lee, Bo-Rong Lin, Huai-Mu Wang, Yen-Po Lin, Yu-Ming Huang, Chong-Ying Lee, Tzu-Hsiang Su, et al. Ice: An intelligent cognition engine with 3d nand-based in-memory computing for vector similarity search acceleration. In 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 763–783. IEEE, 2022.
[25] Han-Wen Hu, Wei-Chen Wang, Chung-Kuang Chen, Yung-Chun Lee, Bo-Rong Lin, Huai-Mu Wang, Yen-Po Lin, Yu-Chao Lin, Chih-Chang Hsieh, Chia-Ming Hu, et al. A 512gb in-memory-computing 3d-nand flash supporting similar-vector-matching operations on edge-ai devices. In 2022 IEEE International Solid-State Circuits Conference (ISSCC), volume 65, pages 138–140. IEEE, 2022.
[26] Intel. Intel developer cloud, 2019.
[27] Intel. Intel advisor, 2021.
[28] Alberto Jaspe Villanueva. Scalable exploration of 3D massive models. PhD thesis, 2018.
[29] Chuan-Jia Jhang, Cheng-Xin Xue, Je-Min Hung, Fu-Chun Chang, and Meng-Fan Chang. Challenges and trends of sram-based computing-in-memory for ai edge devices. IEEE Transactions on Circuits and Systems I: Regular Papers, 68(5):1773–1786, 2021.
[30] Hongbo Kang, Yiwei Zhao, Guy E Blelloch, Laxman Dhulipala, Yan Gu, Charles McGuffey, and Phillip B Gibbons. Pim-tree: A skew-resistant index for processing-in-memory. arXiv preprint arXiv:2211.10516, 2022.
[31] Liu Ke, Udit Gupta, Benjamin Youngjae Cho, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S Lee, et al. Recnmp: Accelerating personalized recommendation with near-memory processing. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pages 790–803. IEEE, 2020.
[32] Daehwan Kim, Geo Pertea, Cole Trapnell, Harold Pimentel, Ryan Kelley, and Steven L Salzberg. Tophat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome biology, 14(4):1–13, 2013.
[33] Jin Hyun Kim, Shin-Haeng Kang, Sukhan Lee, Hyeonsu Kim, Yuhwan Ro, Seungwon Lee, David Wang, Jihyun Choi, Jinin So, YeonGon Cho, et al. Aquabolt-xl hbm2-pim, lpddr5-pim with in-memory processing, and axdimm with acceleration buffer. IEEE Micro, 42(3):20–30, 2022.
[34] Jin Hyun Kim, Shin-haeng Kang, Sukhan Lee, Hyeonsu Kim, Woongjae Song, Yuhwan Ro, Seungwon Lee, David Wang, Hyunsung Shin, Bengseng Phuah, et al. Aquabolt-xl: Samsung hbm2-pim with in-memory processing for ml accelerators and beyond. In 2021 IEEE Hot Chips 33 Symposium (HCS), pages 1–26. IEEE, 2021.
[35] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. Flipping bits in memory without accessing them: An experimental study of dram disturbance errors. ACM SIGARCH Computer Architecture News, 42(3):361–372, 2014.
[36] Gunjae Koo, Kiran Kumar Matam, Te I, HV Krishna Giri Narra, Jing Li, Hung-Wei Tseng, Steven Swanson, and Murali Annavaram. Summarizer: trading communication with computing near storage. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pages 219–231, 2017.
[37] Young-Cheon Kwon, Suk Han Lee, Jaehoon Lee, Sang-Hyuk Kwon, Je Min Ryu, JongPil Son, O Seongil, Hak-Soo Yu, Haesuk Lee, Soo Young Kim, et al. 25.4 a 20nm 6gb function-in-memory dram, based on hbm2 with a 1.2 tflops programmable computing unit using bank-level parallelism, for machine learning applications. In 2021 IEEE International Solid-State Circuits Conference (ISSCC), volume 64, pages 350–352. IEEE, 2021.
[38] Youngeun Kwon, Yunjae Lee, and Minsoo Rhu. Tensordimm: A practical near-memory processing architecture for embeddings and tensor operations in deep learning. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 740–753, 2019.
[39] Dominique Lavenier, Remy Cimadomo, and Romaric Jodin. Variant calling parallelization on processor-in-memory architecture. In 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 204–207. IEEE, 2020.
[40] Dominique Lavenier, Charles Deltel, David Furodet, and Jean-François Roy. BLAST on UPMEM. PhD thesis, INRIA Rennes-Bretagne Atlantique, 2016.
[41] Dominique Lavenier, Jean-Francois Roy, and David Furodet. Dna mapping using processor-in-memory architecture. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 1429–1435. IEEE, 2016.
[42] Joo Hwan Lee, Hui Zhang, Veronica Lagrange, Praveen Krishnamoorthy, Xiaodong Zhao, and Yang Seok Ki. Smartssd: Fpga accelerated near-storage data analytics on ssd. IEEE Computer architecture letters, 19(2):110–113, 2020.
[43] Shengwen Liang, Ying Wang, Cheng Liu, Huawei Li, and Xiaowei Li. Ins-dla: An in-ssd deep learning accelerator for near-data processing. In 2019 29th International Conference on Field Programmable Logic and Applications (FPL), pages 173–179. IEEE, 2019.
[44] Hang-Ting Lue, Po-Kai Hsu, Ming-Liang Wei, Teng-Hao Yeh, Pei-Ying Du, Wei-Chen Chen, Keh-Chung Wang, and Chih-Yuan Lu. Optimal design methods to transform 3d nand flash into a high-density, high-bandwidth and low-power nonvolatile computing in memory (nvcim) accelerator for deep-learning neural networks (dnn). In 2019 IEEE International Electron Devices Meeting (IEDM), pages 38–1. IEEE, 2019.
[45] Sparsh Mittal. A survey of reram-based architectures for processing-in-memory and neural networks. Machine learning and knowledge extraction, 1(1):75–114, 2018.
[46] Samuel K Moore. Ai computing comes to memory chips: Samsung will double performance of neural nets with processing-in-memory. IEEE Spectrum, 59(1):40–41, 2022.
[47] NCBI. Sequence read archive (sra). https://www.ncbi.nlm.nih.gov/sra.
[48] Leibin Ni, Zichuan Liu, Hao Yu, and Rajiv V Joshi. An energy-efficient digital reramcrossbar-based cnn with bitwise parallelism. IEEE Journal on Exploratory solid-state computational devices and circuits, 3:37–46, 2017.
[49] Joel Nider, Craig Mustard, Andrada Zoltan, John Ramsden, Larry Liu, Jacob Grossbard, Mohammad Dashti, Romaric Jodin, Alexandre Ghiti, Jordi Chauzi, et al. A case study of processing-in-memory in off-the-shelf systems. In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 117–130, 2021.
[50] Geraldo F Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, and Onur Mutlu. Accelerating neural network inference with processing-in-dram: From the edge to the cloud. IEEE Micro, 42(6):25–38, 2022.
[51] pachterlab. kallisto. https://github.com/pachterlab/kallisto.
[52] pachterlab. kallisto-transcriptome-indices. https://github.com/pachterlab/kallisto-transcriptome-indices.
[53] Jun-Seok Park, Heonsoo Lee, Dongwoo Lee, Jewoo Moon, Suknam Kwon, SangHyuck Ha, MinSeong Kim, Junghun Park, Jihoon Bang, and Sukhwan Lim Inyup Kang. Samsung neural processing unit: An ai accelerator and sdk for flagship mobile ap. In 2021 IEEE Hot Chips 33 Symposium (HCS), pages 1–21. IEEE Computer Society, 2021.
[54] Rob Patro, Stephen M Mount, and Carl Kingsford. Sailfish enables alignment-free isoform quantification from rna-seq reads using lightweight algorithms. Nature biotechnology, 32(5):462–464, 2014.
[55] Michael Roberts, Wayne Hayes, Brian R Hunt, Stephen M Mount, and James A Yorke. Reducing storage requirements for biological sequence comparison. Bioinformatics, 20(18):3363–3369, 2004.
[56] Sahand Salamat, Armin Haj Aboutalebi, Behnam Khaleghi, Joo Hwan Lee, Yang Seok Ki, and Tajana Rosing. Nascent: Near-storage acceleration of database sort on smartssd. In The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA ’21, page 262–272, New York, NY, USA, 2021. Association for Computing Machinery.
[57] samtools. htslib. https://github.com/samtools/htslib.
[58] Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A Kozuch, Onur Mutlu, Phillip B Gibbons, and Todd C Mowry. Ambit: In-memory accelerator for bulk bitwise operations using commodity dram technology. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pages 273–287, 2017.
[59] Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R Stanley Williams, and Vivek Srikumar. Isaac: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Computer Architecture News, 44(3):14–26, 2016.
[60] Wonbo Shim and Shimeng Yu. Gp3d: 3d nand based in-memory graph processing accelerator. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 12(2):500–507, 2022.
[61] Xin Si, Yung-Ning Tu, Wei-Hsing Huang, Jian-Wei Su, Pei-Jung Lu, Jing-Hong Wang, Ta-Wei Liu, Ssu-Yen Wu, Ruhui Liu, Yen-Chi Chou, et al. 15.5 a 28nm 64kb 6t sram computing-in-memory macro with 8b mac operation for ai edge chips. In 2020 IEEE International Solid-State Circuits Conference-(ISSCC), pages 246–248. IEEE, 2020.
[62] Mahmut E Sinangil, Burak Erbagci, Rawan Naous, Kerem Akarvardar, Dar Sun, WinSan Khwa, Hung-Jen Liao, Yih Wang, and Jonathan Chang. A 7-nm compute-inmemory sram macro supporting multi-bit input, weight and output and achieving 351 tops/w and 372.4 gops. IEEE Journal of Solid-State Circuits, 56(1):188–198, 2020.
[63] Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, and Albert-Jan Boonstra. A review of near-memory computing architectures: Opportunities and challenges. In 2018 21st Euromicro Conference on Digital System Design (DSD), pages 608–617. IEEE, 2018.
[64] Xuan Sun, Hu Wan, Qiao Li, Chia-Lin Yang, Tei-Wei Kuo, and Chun Jason Xue. Rmssd: In-storage computing for large-scale recommendation inference. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 1056–1070, 2022.
[65] Po-Hao Tseng, Feng-Ming Lee, Yu-Hsuan Lin, Liang-Yu Chen, Yung-Chun Li, HanWen Hu, Yun-Yuan Wang, Chih-Chang Hsieh, Ming-Hsiu Lee, Hsiang-Lan Lung, et al. In-memory-searching architecture based on 3d-nand technology with ultra-high parallelism. In 2020 IEEE International Electron Devices Meeting (IEDM), pages 36–1. IEEE, 2020.
[66] Po-Hao Tseng, Feng-Ming Lee, Yu-Hsuan Lin, Yun-Yuan Wang, Ming-Hsiu Lee, Kuang-Yeu Hsieh, Keh-Chung Wang, and Chih-Yuan Lu. A hybrid in-memorysearching and in-memory-computing architecture for nvm based ai accelerator. In 2021 Symposium on VLSI Technology, pages 1–2. IEEE, 2021.
[67] UPMEM. Upmem sdk. https://sdk.upmem.com/.
[68] Xin Xin, Youtao Zhang, and Jun Yang. Elp2im: Efficient and low power bitwise operation processing in dram. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 303–314. IEEE, 2020.
[69] Xunzhao Yin, Yu Qian, Mohsen Imani, Kai Ni, Chao Li, Grace Li Zhang, Bing Li, Ulf Schlichtmann, and Cheng Zhuo. Ferroelectric ternary content addressable memories for energy efficient associative search. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022.
[70] Salessawi Ferede Yitbarek, Tao Yang, Reetuparna Das, and Todd Austin. Exploring specialized near-memory processing for data intensive operations. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1449–1452. IEEE, 2016.
[71] Fan Zhang, Shaahin Angizi, Naima Ahmed Fahmi, Wei Zhang, and Deliang Fan. Pimquantifier: A processing-in-memory platform for mrna quantification. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 43–48. IEEE, 2021.
[72] Jintao Zhang, Zhuo Wang, and Naveen Verma. In-memory computation of a machinelearning classifier in a standard 6t sram array. IEEE Journal of Solid-State Circuits, 52(4):915–924, 2017.
[73] Vasileios Zois, Divya Gupta, Vassilis J Tsotras, Walid A Najjar, and Jean-Francois Roy. Massively parallel skyline computation for processing-in-memory architectures. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pages 1–12, 2018.