Moccasin: Efficient Tensor Rematerialization for Neural Networks

AI-generated keywords: Tensor Rematerialization

AI-generated Key Points

- Deploying and training neural networks on edge computing devices is challenging because these devices have little memory
- Tensor rematerialization (recompute) is used to reduce the memory required for neural network training and inference
- MOCCASIN is a new constraint programming formulation that minimizes the execution time of compute graphs subject to a memory budget
- MOCCASIN has only O(n) integer variables, where n is the number of nodes in the compute graph, a significant improvement over recent literature that proposes formulations with O(n^2) Boolean variables
- The retention interval formulation for rematerialization greatly simplifies the problem by defining output retention intervals for each node in the computation graph
- Parameter Cv defines the maximum number of times a node v can be computed in the final sequence; this simple complexity reduction retains solution quality even for very small values of Cv
- MOCCASIN is up to an order of magnitude faster than recent work, especially for large-scale graphs
- Empirical results demonstrate MOCCASIN's effectiveness compared to other recent works and its scalability to larger graphs

Authors: Burak Bartan, Haoming Li, Harris Teague, Christopher Lott, Bistra Dilkina

arXiv: 2304.14463v1 (cs.LG)

License: CC BY 4.0

Abstract: The deployment and training of neural networks on edge computing devices pose many challenges. The low memory nature of edge devices is often one of the biggest limiting factors encountered in the deployment of large neural network models. Tensor rematerialization or recompute is a way to address high memory requirements for neural network training and inference. In this paper we consider the problem of execution time minimization of compute graphs subject to a memory budget. In particular, we develop a new constraint programming formulation called \textsc{Moccasin} with only $O(n)$ integer variables, where $n$ is the number of nodes in the compute graph. This is a significant improvement over the works in the recent literature that propose formulations with $O(n^2)$ Boolean variables. We present numerical studies that show that our approach is up to an order of magnitude faster than recent work especially for large-scale graphs.

Submitted to arXiv on 27 Apr. 2023

The deployment and training of neural networks on edge computing devices present various challenges, and the limited memory of these devices is one of the most significant limiting factors. Tensor rematerialization, or recompute, is often used to reduce the memory required for neural network training and inference. In the paper "MOCCASIN: Efficient Tensor Rematerialization for Neural Networks," the authors consider the problem of minimizing the execution time of compute graphs subject to a memory budget.

The authors introduce MOCCASIN, a new constraint programming formulation for this problem. MOCCASIN has only O(n) integer variables, where n is the number of nodes in the compute graph, which is a significant improvement over recent literature that proposes formulations with O(n^2) Boolean variables. The authors demonstrate through numerical studies that their approach is up to an order of magnitude faster than recent work, especially for large-scale graphs.

One key contribution of the paper is a retention interval formulation for rematerialization. It greatly simplifies the problem by defining output retention intervals for each node in the computation graph. The authors also introduce a parameter Cv that defines the maximum number of times a node v can be computed in the final sequence, and they demonstrate empirically that this simple complexity reduction retains solution quality even for very small values of Cv.

The authors compare their solution speed with CHECKMATE's while demonstrating equivalence between the solutions. They also show how different local memory limits affect solution speed and final solution value.

Overall, MOCCASIN provides an efficient way to address high memory requirements when deploying large neural network models on edge computing devices. The empirical results demonstrate MOCCASIN's effectiveness compared to other recent works and highlight its scalability to larger graphs.

For a six-year-old kid: Neural networks are like brains that help computers learn and do things. Sometimes they have to run on small devices with not much memory, which can be tricky. Scientists made a new method called MOCCASIN that picks which results to keep and which to throw away and work out again later, so the network fits in the small memory without taking too long. They also made the problem simpler by only letting each part of the brain be worked out a certain number of times. This new way works really well and is faster than other ways people have tried before.

Definitions
- Neural networks: computer programs that learn from data and can perform tasks
- Edge computing devices: small devices, with limited memory, that run programs locally
- Tensor rematerialization/recompute: a technique that reduces memory usage by discarding some intermediate results and recomputing them when they are needed again
- Constraint programming formulation: a mathematical way of stating a problem as variables plus constraints (rules) that a solver must satisfy
- Computation graph: a graph whose nodes are operations and whose edges show how data flows between them in a neural network
- Integer variables/Boolean variables: unknowns in the formulation that take whole-number values or true/false values, respectively
- Retention interval formulation: defining how long the output of each part of the neural network is kept in memory for future use

MOCCASIN: Efficient Tensor Rematerialization for Neural Networks

Deploying and training neural networks on edge computing devices presents several challenges, and limited memory is among the most significant. Tensor rematerialization, or recompute, is often used to reduce the memory needed for training and inference: intermediate outputs are discarded and recomputed later instead of being held in memory. In the paper "MOCCASIN: Efficient Tensor Rematerialization for Neural Networks," the authors consider the problem of minimizing the execution time of compute graphs subject to a memory budget.

Introduction

The authors introduce MOCCASIN, a new constraint programming formulation that minimizes the execution time of a compute graph subject to a memory budget. MOCCASIN has only O(n) integer variables, where n is the number of nodes in the compute graph, which is a significant improvement over recent literature that proposes formulations with O(n^2) Boolean variables.
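To give a feel for the gap between O(n^2) and O(n), the following sketch counts decision variables under two simplified models. The constants are assumptions made only for illustration (two n-by-n Boolean grids for the quadratic case, and a start and an end integer per retention interval with at most `max_computes` intervals per node for the linear case); neither function reproduces the exact variable sets of the paper or of prior work.

```python
def boolean_grid_variables(n: int, grids: int = 2) -> int:
    """Variables in a hypothetical formulation with `grids` n-by-n Boolean matrices."""
    return grids * n * n


def interval_variables(n: int, max_computes: int = 2, ints_per_interval: int = 2) -> int:
    """Variables in a hypothetical formulation with up to `max_computes` intervals
    per node, each described by `ints_per_interval` integers (e.g. start and end)."""
    return ints_per_interval * max_computes * n


for n in (100, 1_000, 10_000):
    quadratic = boolean_grid_variables(n)
    linear = interval_variables(n)
    print(f"n={n:>6}: O(n^2) model {quadratic:>12,} vars, O(n) model {linear:>8,} vars")
```

Under these assumed constants, a 1,000-node graph needs 2,000,000 Boolean variables in the quadratic model and 4,000 integer variables in the linear one; the ratio grows linearly with n.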

Retention Interval Formulation

One key contribution of this paper is a retention interval formulation for rematerialization. It greatly simplifies the problem by defining output retention intervals for each node in the computation graph, that is, the spans during which a node's output is kept in memory. The authors also introduce a parameter Cv that defines the maximum number of times a node v can be computed in the final sequence, and they demonstrate empirically that this simple complexity reduction retains solution quality even for very small values of Cv.
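The idea of retention intervals can be illustrated with a small evaluator. This is a minimal sketch under assumed conventions, not the paper's formulation: each interval is taken to start at the time step where its node is computed and to end at the last step where the output is still held (inclusive), one node is computed per time step, and the four-node graph, sizes, and durations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    node: str
    start: int  # time step at which the node is computed
    end: int    # last time step at which its output is kept (inclusive)


def evaluate(intervals, preds, size, duration):
    """Return (valid, peak_memory, total_time) for a non-empty list of intervals."""
    starts = [iv.start for iv in intervals]
    # Assumed rules: well-formed intervals and one computation per time step.
    valid = all(iv.start <= iv.end for iv in intervals)
    valid = valid and len(set(starts)) == len(starts)
    # Every input of a computation must be held in memory when it starts.
    for iv in intervals:
        for u in preds.get(iv.node, ()):
            held = any(o.node == u and o.start < iv.start <= o.end for o in intervals)
            valid = valid and held
    last = max(iv.end for iv in intervals)
    peak = max(
        sum(size[iv.node] for iv in intervals if iv.start <= t <= iv.end)
        for t in range(min(starts), last + 1)
    )
    total = sum(duration[iv.node] for iv in intervals)
    return valid, peak, total


# Hypothetical graph: a -> b -> c, and d consumes both a and c.
preds = {"b": ["a"], "c": ["b"], "d": ["a", "c"]}
size = {"a": 2, "b": 2, "c": 1, "d": 1}
duration = {"a": 1, "b": 1, "c": 1, "d": 1}

# No rematerialization: a is held from step 1 until d runs at step 4.
keep_a = [Interval("a", 1, 4), Interval("b", 2, 3), Interval("c", 3, 4), Interval("d", 4, 4)]
# Rematerialization: a is dropped after b runs and recomputed just before d.
recompute_a = [Interval("a", 1, 2), Interval("b", 2, 3), Interval("c", 3, 5),
               Interval("a", 4, 5), Interval("d", 5, 5)]

print(evaluate(keep_a, preds, size, duration))       # (True, 5, 4)
print(evaluate(recompute_a, preds, size, duration))  # (True, 4, 5)
```

In this toy case, giving node a two retention intervals lowers the peak memory from 5 to 4 at the cost of one extra unit of execution time, which is the trade-off the optimization balances.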

Comparison to CHECKMATE

The authors compare MOCCASIN's solution speed with CHECKMATE's and demonstrate that the two produce equivalent solutions. They also show how different local memory limits affect solution speed and final solution value.
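How the memory limit and the per-node compute cap interact with the final solution value can be seen on a toy graph. The sketch below is a brute-force search, not MOCCASIN's constraint programming model and not CHECKMATE; the graph, sizes, durations, the uniform cap `cap` (standing in for Cv), and the rule that an output is kept only until its last use before the node is recomputed are all assumptions made for illustration.

```python
def peak_memory(seq, preds, size):
    """Peak memory of a compute sequence, or None if an input is missing."""
    release = list(range(len(seq)))  # release[i]: last step that needs step i's output
    latest = {}                      # node -> index of its most recent computation
    for i, v in enumerate(seq):
        for u in preds.get(v, ()):
            if u not in latest:
                return None
            release[latest[u]] = i
        latest[v] = i
    return max(
        sum(size[seq[i]] for i in range(t + 1) if release[i] >= t)
        for t in range(len(seq))
    )


def best_sequence(nodes, preds, size, duration, budget, cap):
    """Cheapest sequence computing every node at least once and at most `cap`
    times, with peak memory <= budget. Returns (cost, sequence) or (None, None)."""
    best = (None, None)

    def extend(seq, counts, cost):
        nonlocal best
        if best[0] is not None and cost >= best[0]:
            return  # cannot improve on the best sequence found so far
        if all(counts[v] >= 1 for v in nodes):
            peak = peak_memory(seq, preds, size)
            if peak is not None and peak <= budget:
                best = (cost, list(seq))
        for v in nodes:
            ready = all(counts[u] >= 1 for u in preds.get(v, ()))
            if counts[v] < cap and ready:
                counts[v] += 1
                seq.append(v)
                extend(seq, counts, cost + duration[v])
                seq.pop()
                counts[v] -= 1

    extend([], {v: 0 for v in nodes}, 0)
    return best


# Hypothetical graph: a -> b -> c, and d consumes both a and c.
nodes = ["a", "b", "c", "d"]
preds = {"b": ["a"], "c": ["b"], "d": ["a", "c"]}
size = {"a": 2, "b": 2, "c": 1, "d": 1}
duration = {v: 1 for v in nodes}

for budget in (5, 4, 3):
    for cap in (1, 2):
        print(budget, cap, best_sequence(nodes, preds, size, duration, budget, cap))
```

On this example a budget of 5 is met without any recomputation (cost 4), a budget of 4 is infeasible with `cap=1` but feasible with `cap=2` by recomputing a once (cost 5), and a budget of 3 is infeasible for both caps because computing d alone needs 4 units. Exhaustive search like this only works for tiny graphs, which is why a compact formulation matters at scale.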

Conclusion

Overall, MOCCASIN provides an efficient way to address high memory requirements when deploying large neural network models on edge computing devices. The retention interval formulation simplifies the problem significantly, and capping the number of computations per node reduces complexity while retaining solution quality. The empirical results demonstrate MOCCASIN's effectiveness compared to other recent works and highlight its scalability to larger graphs.

Created on 23 May. 2023
