Fused Depthwise Tiling for Memory Optimization in TinyML Deep Neural Network Inference

AI-generated keywords: Memory Optimization

AI-generated Key Points

- Memory optimization for deep neural network (DNN) inference is crucial in TinyML.
- TinyML involves deploying DNN inference tasks on tiny, low-power microcontrollers with limited memory.
- A Fused Depthwise Tiling (FDT) method has been proposed for the memory optimization of DNNs.
- FDT reduces memory usage without inducing any run-time overhead and applies to a larger variety of network layers than existing tiling methods, which focus on convolutions.
- An end-to-end flow with a new path discovery method has been proposed to identify the best tiling configuration for a given model.
- FDT reduced memory by 76.2% and 18.1% for two models where existing tiling methods could not be applied.
- Five models from a wide range benefit from fused tiling: Keyword Spotting (KWS), Text Sentiment Analysis (TXT), Magic Wand (MW), PoseNet (POS), and MobileNet V2 SSDLite.
- The proposed FDT method provides an effective solution to optimize memory usage in DNNs for TinyML applications.


Authors: Rafael Stahl, Daniel Mueller-Gritschneder, Ulf Schlichtmann

arXiv: 2303.17878v1 (cs.LG)

Accepted as a full paper by the TinyML Research Symposium 2023

License: CC BY 4.0

Abstract: Memory optimization for deep neural network (DNN) inference gains high relevance with the emergence of TinyML, which refers to the deployment of DNN inference tasks on tiny, low-power microcontrollers. Applications such as audio keyword detection or radar-based gesture recognition are heavily constrained by the limited memory on such tiny devices because DNN inference requires large intermediate run-time buffers to store activations and other intermediate data, which leads to high memory usage. In this paper, we propose a new Fused Depthwise Tiling (FDT) method for the memory optimization of DNNs, which, compared to existing tiling methods, reduces memory usage without inducing any run time overhead. FDT applies to a larger variety of network layers than existing tiling methods that focus on convolutions. It improves TinyML memory optimization significantly by reducing memory of models where this was not possible before and additionally providing alternative design points for models that show high run time overhead with existing methods. In order to identify the best tiling configuration, an end-to-end flow with a new path discovery method is proposed, which applies FDT and existing tiling methods in a fully automated way, including the scheduling of the operations and planning of the layout of buffers in memory. Out of seven evaluated models, FDT achieved significant memory reduction for two models by 76.2% and 18.1% where existing tiling methods could not be applied. Two other models showed a significant run time overhead with existing methods and FDT provided alternative design points with no overhead but reduced memory savings.

Submitted to arXiv on 31 Mar. 2023


Memory optimization for deep neural network (DNN) inference is crucial in the emerging field of TinyML, which involves deploying DNN inference tasks on tiny, low-power microcontrollers. These devices have limited memory, and DNN inference requires large intermediate run-time buffers to store activations and other intermediate data. The resulting memory usage makes it challenging to deploy applications such as audio keyword detection or radar-based gesture recognition on such devices.

To address this challenge, a new Fused Depthwise Tiling (FDT) method has been proposed for the memory optimization of DNNs. Compared to existing tiling methods, which focus on convolutions, FDT reduces memory usage without inducing any run-time overhead and applies to a larger variety of network layers. It reduces the memory of models where this was not possible before and provides alternative design points for models that show high run-time overhead with existing methods.

To identify the best tiling configuration for a given model, an end-to-end flow with a new path discovery method has been proposed. This flow applies FDT and existing tiling methods in a fully automated way, including scheduling the operations and planning the layout of buffers in memory.

Out of seven evaluated models, FDT reduced memory by 76.2% and 18.1% for two models where existing tiling methods could not be applied. Two other models showed significant run-time overhead with existing methods; for these, FDT provided alternative design points with no overhead, though with smaller memory savings. The study identified five models from a wide range that benefit from fused tiling: Keyword Spotting (KWS), Text Sentiment Analysis (TXT), Magic Wand (MW), PoseNet (POS), and MobileNet V2 SSDLite. All models were quantized to 8 bits, and performance was comparable across different architectures.

In conclusion, the proposed FDT method provides an effective solution to optimize memory usage in DNNs for TinyML applications. The end-to-end flow with a new path discovery method enables fully automated tiling configuration and buffer layout planning, making it easier to deploy DNN inference tasks on tiny, low-power microcontrollers.
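
The link between tiling and peak memory can be made concrete with a small calculation. The sketch below is a toy model and not the paper's scheduler or memory planner. It assumes a linear chain of layers, 8-bit activations (one byte per element), and that in a fused tiled path the path input and path output stay fully live while only the intermediate buffers are split. The function names and the buffer sizes are hypothetical.

```python
from math import ceil


def untiled_peak(sizes):
    """Peak bytes for a linear chain: each layer keeps its input and output live together."""
    return max(a + b for a, b in zip(sizes, sizes[1:]))


def fused_tiled_peak(sizes, num_tiles):
    """Peak bytes when every intermediate buffer of the chain is split into num_tiles parts.

    sizes[0] is the path input and sizes[-1] the path output; both are assumed to
    stay whole and live for the entire tiled path. Needs at least one intermediate.
    """
    path_in, *inner, path_out = sizes
    tiles = [ceil(s / num_tiles) for s in inner]
    # Tile buffers live at each step: (first tile), (tile i + tile i+1), ..., (last tile).
    live_tiles = max(a + b for a, b in zip([0] + tiles, tiles + [0]))
    return path_in + path_out + live_tiles


def reduction_percent(before, after):
    """Relative memory saving in percent."""
    return 100.0 * (before - after) / before


# Hypothetical buffer sizes in bytes: input, two large intermediates, output.
sizes = [1_000, 16_000, 16_000, 500]
before = untiled_peak(sizes)
after = fused_tiled_peak(sizes, num_tiles=4)
print(before, after, reduction_percent(before, after))
```

In this toy model the saving is bounded by the path input and output, which are not tiled. That is one reason a real flow has to search over which layers to fuse and how many tiles to use.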

For a six-year-old kid

- People want to make computers that are very small and use very little power.
- These computers need to be able to do really complicated things, like recognizing speech or images.
- A new way of organizing the information in these tasks has been invented that makes them take up less space in the computer's memory.
- This new way doesn't make the computer slower, and it works for lots of different kinds of tasks.
- The people who made this new way also figured out how to choose the best way to organize each task so that it takes up as little space as possible.

Definitions

- Memory optimization: making sure a computer program uses as little memory (space) as possible
- Deep neural network (DNN): a type of computer program that can recognize patterns and make decisions based on them
- Inference: using a DNN to make predictions or decisions based on input data
- TinyML: using DNNs on very small and low-power devices
- Microcontroller: a type of small computer used in many electronic devices

Memory Optimization for Deep Neural Network Inference in TinyML Applications

The emerging field of TinyML involves deploying deep neural network (DNN) inference tasks on tiny, low-power microcontrollers. However, these devices have limited memory and are heavily constrained by the large intermediate run-time buffers required to store activations and other intermediate data during DNN inference. This leads to high memory usage, making it challenging to deploy applications such as audio keyword detection or radar-based gesture recognition on such devices. To address this challenge, a new Fused Depthwise Tiling (FDT) method has been proposed for the memory optimization of DNNs.

Background

Memory optimization is crucial for TinyML because DNN inference needs large intermediate run-time buffers for activations and other intermediate data, which drives up memory usage on devices that have very little memory to begin with. Applications such as audio keyword detection or radar-based gesture recognition are constrained by exactly this limit.

Fused Depthwise Tiling Method

To address this challenge, a new Fused Depthwise Tiling (FDT) method has been proposed for the memory optimization of DNNs. Compared to existing tiling methods, which focus on convolutions, FDT reduces memory usage without inducing any run-time overhead and applies to a larger variety of network layers. It reduces the memory of models where this was not possible before and provides alternative design points for models that show high run-time overhead with existing methods. To identify the best tiling configuration for a given model, an end-to-end flow with a new path discovery method has been proposed. This flow applies FDT and existing tiling methods in a fully automated way, including scheduling the operations and planning the layout of buffers in memory. Out of seven evaluated models, FDT reduced memory by 76.2% and 18.1% for two models where existing tiling methods could not be applied.
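
The text above does not spell out how FDT works, so the following is only a minimal sketch of one plausible reading: "depthwise" is taken to mean splitting the intermediate buffer between two fused layers along its channel dimension. Under that assumption, the first layer produces one slice of channels at a time and the second layer accumulates partial sums over those slices, so no value is computed twice. The two-layer network, the ReLU in between, and all function names are hypothetical and are not taken from the paper's implementation.

```python
def relu(v):
    """Elementwise ReLU; elementwise ops can be applied per channel slice."""
    return [max(0, x) for x in v]


def matvec(w, x):
    """Dense layer without bias: one output per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def two_layer_untiled(w1, w2, x):
    """Reference: the whole hidden buffer is materialized at once."""
    hidden = relu(matvec(w1, x))
    return matvec(w2, hidden), len(hidden)


def two_layer_depthwise_tiled(w1, w2, x, num_tiles):
    """Same result, but only one channel slice of the hidden buffer is live at a time."""
    n_hidden = len(w1)
    tile = -(-n_hidden // num_tiles)  # ceiling division
    out = [0] * len(w2)               # accumulator for partial sums
    peak_hidden = 0
    for start in range(0, n_hidden, tile):
        stop = min(start + tile, n_hidden)
        # First layer: compute only hidden channels [start, stop).
        h_tile = relu(matvec(w1[start:stop], x))
        peak_hidden = max(peak_hidden, len(h_tile))
        # Second layer: consume this slice and add its contribution to every output.
        for j, row in enumerate(w2):
            out[j] += sum(w * h for w, h in zip(row[start:stop], h_tile))
    return out, peak_hidden


x = [1, 2, 3]
w1 = [[1, 0, 2], [-1, 1, 0], [0, 2, 1], [3, 1, -1]]
w2 = [[1, 2, 3, 4], [0, -1, 1, 2]]
print(two_layer_untiled(w1, w2, x))
print(two_layer_depthwise_tiled(w1, w2, x, num_tiles=2))
```

Both calls return the same output vector, while the tiled version holds only half of the hidden buffer at any time. Each hidden channel is computed exactly once, so the multiply-accumulate count is unchanged; the cost is an output accumulator that stays live across tiles.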

Created on 14 Jun. 2023
