
Author: Sun, Qi-Hui (孫啟慧)
Title: Augmenting Resource-Constrained Devices with Convolutional Neural Networks for Environmental Sensing Applications (利用卷積神經網路擴增資源受限設備於環境感測之應用)
Advisor: Tu, Chia-Heng (涂嘉恒)
Degree: Doctor
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 112 (2023–2024)
Language: English
Pages: 86
Keywords: Convolutional neural network design, Internet-of-Things, distributed system design, embedded devices, TinyML, environmental monitoring applications
Abstract (translated from the Chinese): As deep learning continues to advance, it is being widely deployed on Internet-of-Things (IoT) devices. While this strengthens the capabilities of IoT devices, it has also caused a surge in demand for running machine learning on resource-constrained hardware. The limited resources of IoT devices conflict with the heavy compute and memory requirements of deep learning, so deploying computation-intensive deep learning models on resource-constrained embedded devices is a major challenge. In this thesis, we propose a software framework named NERD that customizes convolutional neural networks (CNNs) for embedded devices of different resource scales to meet the timing requirements of specific applications. NERD is built on the mature neural network framework Chainer and can therefore exploit its features to design convolutional models flexibly. Through the proposed pluggable design, NERD can integrate layers from existing deep learning frameworks, including regular layers defined in high-level languages as well as customized layers of lightweight neural networks implemented in C/C++. This approach designs a model for an embedded device and generates the corresponding executable code. NERD also measures the model's performance on the embedded device, simplifying the entire design flow of deploying a model to an embedded end device. We divide resource-constrained embedded devices into three scales by memory size and propose a computation scheme for each: distributed inference for MB-scale devices, binary classifiers for KB-scale devices, and multiple binary classifiers that solve multiclass problems on sub-KB-scale devices. Note that a method suited to one memory scale can also be applied to devices with larger memory; for example, a method for devices with KB-scale memory can be applied to devices with MB-scale memory. We implemented the NERD framework and designed experiments with image recognition and sound recognition applications, validating that the framework suits environmental sensing applications. The experimental results show that NERD can produce a model deployed on a 16-bit TI microcontroller platform with 8 KB of memory that needs only 0.51 KB of memory to buffer runtime data for a three-class classification problem, with accuracy above 80%. We hope that this research on NERD can inspire other work in the same field.

Abstract (English): With the widespread adoption of deep learning, IoT applications are becoming more intelligent. While this enhances the functionality of IoT devices, it has also driven a surge in demand for running machine learning models on resource-constrained hardware. The conflict between the limited resources of embedded devices and the heavy computational and memory demands of deep learning poses a significant challenge when deploying such models. In this thesis, we introduce a software framework called NERD, designed to facilitate the creation of CNN models tailored to different scales of embedded devices under the memory and timing constraints of specific applications. NERD is built upon the Chainer neural network framework for designing CNN models. Through its pluggable design, NERD can integrate layers from existing frameworks, including layers implemented in high-level languages and custom lightweight layers written in C/C++. This approach enables both the design of the model and the generation of executable code for the embedded device. NERD also provides measurements of both model performance and on-device performance, streamlining the model design process. We categorize target resource-constrained devices into three scales based on memory size and provide a design approach for each: distributed inference, binary classification, and multi-binary classification that approximates multiclass classification. Importantly, a method that works at one memory scale can also be applied to devices with more memory; for example, methods applicable to devices with KB-level memory also work on those with MB-level memory. We have prototyped the NERD framework and built environmental sensing applications for image recognition and sound classification. Our results demonstrate that the generated model can use as little as 0.51 KB of SRAM to buffer the required runtime data while achieving acceptable accuracy (above 80%) within soft real-time constraints on a TI 16-bit microcontroller platform. We hope this work serves as inspiration for others in the same research field.
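
To make the multi-binary scheme described in the abstracts concrete, the sketch below shows one way an early-exit one-vs-all (OVA) decomposition can approximate a multiclass model: binary classifiers run one at a time, and inference stops as soon as one of them is sufficiently confident. This is a minimal illustration only, not the NERD implementation; the classifier interface, the 0.9 threshold, and the class names are illustrative assumptions.

```python
# A minimal sketch (not the NERD implementation) of approximating multiclass
# classification with several binary one-vs-all (OVA) classifiers and an
# early-exit rule. All names and thresholds here are hypothetical.
from typing import Callable, List, Tuple

# Each binary classifier maps a feature vector to the probability that the
# input belongs to its own class (one-vs-all).
BinaryClassifier = Callable[[List[float]], float]

def early_exit_ova(
    features: List[float],
    classifiers: List[Tuple[str, BinaryClassifier]],
    exit_threshold: float = 0.9,
) -> str:
    """Run binary classifiers in sequence; stop at the first confident hit.

    Running the classifiers one at a time keeps only a single model's
    buffers live at once, which is what lets the scheme fit sub-KB devices.
    """
    best_label, best_score = "unknown", 0.0
    for label, clf in classifiers:
        score = clf(features)
        if score >= exit_threshold:   # early exit: confident enough, stop here
            return label
        if score > best_score:        # otherwise remember the best so far
            best_label, best_score = label, score
    return best_label                 # fall back to the highest score seen

# Toy stand-ins for trained binary models (hypothetical decision rules).
def is_cicada(x):  return 0.95 if x[0] > 0.5 else 0.1
def is_frog(x):    return 0.92 if x[1] > 0.5 else 0.2
def is_silence(x): return 0.90 if x[2] > 0.5 else 0.3

if __name__ == "__main__":
    classifiers = [("cicada", is_cicada), ("frog", is_frog), ("silence", is_silence)]
    print(early_exit_ova([0.8, 0.1, 0.1], classifiers))  # -> cicada (exits after one model)
    print(early_exit_ova([0.1, 0.2, 0.9], classifiers))  # -> silence
```

A usage note on the design choice sketched here: ordering the classifiers by expected class frequency would let the common case exit after a single model, which is consistent with the thesis's goal of meeting soft real-time constraints on small devices.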

Table of Contents:
Abstract (Chinese) i
Abstract (English) iii
Acknowledgements v
Table of Contents vii
List of Tables ix
List of Figures x
Chapter 1  Introduction 1
    1.1  Background and Motivation 3
    1.2  Review of Literature 6
        1.2.1  Existing Optimizations for CNN Design 6
        1.2.2  Developing CNNs for Embedded Hardware 9
        1.2.3  Limitations of Existing Works 11
    1.3  Contributions of the Thesis 14
    1.4  Organization of the Thesis 15
Chapter 2  NERD 17
    2.1  The Key Concept of NERD 17
    2.2  The Architecture of NERD 18
    2.3  NERD Training and Inference 20
    2.4  Overhead of Porting a New Network onto Resource-Limited Platforms 23
    2.5  The Results of Models Designed by NERD 24
Chapter 3  NERD for Distributed Inference 26
    3.1  Distributed Computing Scheme 26
    3.2  System Architecture of NERD for Distributed Inference 28
        3.2.1  Server Module 28
        3.2.2  Device Module 30
        3.2.3  Communication Scheme 31
    3.3  Experiments on Distributed Inference 32
        3.3.1  Model Design for the Surveillance System and Lessons Learned 34
        3.3.2  The Impact of Different Entropy Thresholds 35
Chapter 4  NERD for Binary Classification 38
    4.1  System Architecture of NERD for the Binary Classifier 39
    4.2  Optimizations of Model Design for Embedded Devices 40
    4.3  Experiments and Results of Binary Classification 41
        4.3.1  Analysis of Classification Problem Complexity versus Resource Utilization 45
        4.3.2  Validation of Our Proposed Framework 46
        4.3.3  Case Study 1: Poacher Detection 48
        4.3.4  Case Study 2: Cicada Detection 51
Chapter 5  NERD for Multiclass Classification 54
    5.1  Decomposing Multiclass Classification with Binary Classifiers 55
    5.2  Early-Exit OVA Scheme 56
    5.3  Transfer Learning 57
    5.4  System Architecture of NERD for Multiclass Classification 58
    5.5  Experiments on Multi-binary Classification 60
        5.5.1  Experimental Setup 60
        5.5.2  Case Study 1: Poacher Detection 61
        5.5.3  Case Study 2: Acoustic Monitoring 65
Chapter 6  Conclusion 67
References 69

