LUT-NN: Towards Unified Neural Network Inference by Table Lookup

AI-generated keywords: DNN Inference, LUT-NN, Centroid Learning, Table Lookup, Hardware Design

AI-generated Key Points

Deep Neural Network (DNN) inference is computationally intensive and costly
LUT-NN is a technique for DNN inference by table lookup
LUT-NN learns typical features, called centroids, of each layer from training data
Centroids are precomputed with model weights and saved in tables for future input
LUT-NN achieves comparable accuracy (<5% difference) with original models on real complex datasets such as CIFAR, ImageNet, and GLUE
LUT-NN simplifies computing operators to only two: closest centroid search and table lookup
LUT-NN has been implemented for Intel and ARM CPUs, reducing model size by up to 3.5x for CNN models and 7x for BERT, with measured latency speedups of up to 7x for BERT and 2x for ResNet
Current hardware design limitations result in lower speedup than the theoretical results
The authors expect first-class table lookup support in future hardware designs to unleash the full potential of LUT-NN
The proposed approach aims to simplify DNN inference and reduce system development and resource costs while keeping accuracy close to the original models

Authors: Xiaohu Tang, Yang Wang, Ting Cao, Li Lyna Zhang, Qi Chen, Deng Cai, Yunxin Liu, Mao Yang

arXiv: 2302.03213v1 (cs.LG)

License: CC BY 4.0

Abstract: DNN inference requires huge effort of system development and resource cost. This drives us to propose LUT-NN, the first trial towards empowering deep neural network (DNN) inference by table lookup, to eliminate the diverse computation kernels as well as save running cost. Based on the feature similarity of each layer, LUT-NN can learn the typical features, named centroids, of each layer from the training data, precompute them with model weights, and save the results in tables. For future input, the results of the closest centroids with the input features can be directly read from the table, as the approximation of layer output. We propose the novel centroid learning technique for DNN, which enables centroid learning through backpropagation, and adapts three levels of approximation to minimize the model loss. By this technique, LUT-NN achieves comparable accuracy (<5% difference) with original models on real complex datasets, including CIFAR, ImageNet, and GLUE. LUT-NN simplifies the computing operators to only two: closest centroid search and table lookup. We implement them for Intel and ARM CPUs. The model size is reduced by up to 3.5x for CNN models and 7x for BERT. Latency-wise, the real speedup of LUT-NN is up to 7x for BERT and 2x for ResNet, much lower than theoretical results because of the current unfriendly hardware design for table lookup. We expect first-class table lookup support in the future to unleash the potential of LUT-NN.

Submitted to arXiv on 07 Feb. 2023

Results of the summarizing process for the arXiv paper: 2302.03213v1

Comprehensive Summary
Key points
Layman's Summary
Blog article

Deep Neural Network (DNN) inference is computationally intensive and demands significant system development effort and resource cost. To address this, the authors propose LUT-NN, a technique that performs DNN inference by table lookup, removing the need for diverse computation kernels and reducing running costs. LUT-NN learns typical features, called centroids, for each layer from the training data. These centroids are precomputed with the model weights and the results are saved in tables. For a new input, the layer finds the centroids closest to the input features and reads the corresponding results directly from the table as an approximation of the layer output.

The authors introduce a centroid learning technique for DNNs that learns centroids through backpropagation and adapts three levels of approximation to minimize model loss. With this technique, LUT-NN stays within 5% of the original models' accuracy on CIFAR, ImageNet, and GLUE, and it reduces the computing operators to only two: closest centroid search and table lookup.

LUT-NN has been implemented for Intel and ARM CPUs. Model size is reduced by up to 3.5x for CNN models and 7x for BERT, and the measured latency speedup reaches up to 7x for BERT and 2x for ResNet. These speedups are much lower than the theoretical results because current hardware is not designed for efficient table lookup; the authors expect first-class table lookup support in future hardware to unleash the full potential of LUT-NN. Overall, the approach aims to simplify DNN inference and cut system development and resource costs while keeping accuracy close to the original models.

Deep neural networks help computers recognize things like pictures and language, but running them can be slow and expensive. LUT-NN helps by using tables to look up answers instead of doing complicated calculations. The tables hold typical patterns that the computer learned from training data, together with the results that go with them. When something new comes in, the computer finds the closest known pattern and reads the answer from the table, so it does not have to do a lot of work each time. LUT-NN works well on big and complex datasets of pictures and language, and it runs on ordinary Intel and ARM processors.

Simplifying Deep Neural Network Inference with LUT-NN

Deep neural networks (DNNs) are powerful tools for machine learning, but their computationally intensive nature can be a challenge. To address this issue, researchers have proposed a technique called LUT-NN that simplifies DNN inference by table lookup. This approach eliminates the need for diverse computation kernels and reduces running costs by learning typical features of each layer from training data.

What is LUT-NN?

LUT-NN stands for Lookup Table Neural Network. It is a technique for DNN inference by table lookup which eliminates the need for diverse computation kernels and reduces running costs by learning typical features, called centroids, of each layer from training data. These centroids are precomputed with model weights and saved in tables. For future input, the results of the closest centroids with the input features can be directly read from the table as an approximation of layer output.
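To make the two operators concrete, here is a minimal Python sketch of table-lookup inference for a single linear layer. It assumes the input vector is split into equal-length sub-vectors, each with its own small set of centroids (a layout this summary does not spell out), and the names build_tables, lut_forward, and sub_len are hypothetical. The toy centroids are hand-picked rather than learned.

```python
# Sketch only: table-lookup approximation of x @ W for one linear layer.
# Assumption: the input is split into sub-vectors, each with its own centroids.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_tables(centroids, weights, sub_len):
    """Precompute centroid-times-weight products.

    centroids[c][k] is the k-th centroid of sub-vector c (length sub_len).
    weights[j] is the weight vector of output j (length = input dimension).
    Returns tables[c][k][j].
    """
    tables = []
    for c, codebook in enumerate(centroids):
        lo = c * sub_len
        tables.append([
            [dot(cent, w[lo:lo + sub_len]) for w in weights]
            for cent in codebook
        ])
    return tables

def lut_forward(x, centroids, tables, sub_len):
    """Approximate the layer output with centroid search plus table lookup."""
    out = [0.0] * len(tables[0][0])
    for c, codebook in enumerate(centroids):
        sub = x[c * sub_len:(c + 1) * sub_len]
        # Operator 1: closest centroid search.
        k = min(range(len(codebook)), key=lambda i: sq_dist(sub, codebook[i]))
        # Operator 2: table lookup, accumulated across sub-vectors.
        out = [o + r for o, r in zip(out, tables[c][k])]
    return out

# Toy data: 4-dimensional input, 2 sub-vectors, 2 centroids each, 2 outputs.
centroids = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
weights = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
tables = build_tables(centroids, weights, sub_len=2)

x = [0.9, 1.1, 0.1, 0.8]
approx = lut_forward(x, centroids, tables, sub_len=2)
exact = [dot(x, w) for w in weights]
print(approx, exact)
```

On this toy input the lookup result is close to, but not equal to, the exact product, which is the approximation the text describes. No multiplication with the weights happens at inference time; it was all done once in build_tables.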

How Does it Work?

The authors introduce a novel centroid learning technique for DNN that enables centroid learning through backpropagation and adapts three levels of approximation to minimize model loss. By this technique, LUT-NN achieves comparable accuracy (<5% difference) with original models on real complex datasets such as CIFAR, ImageNet, and GLUE while simplifying computing operators to only two: closest centroid search and table lookup.
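The summary does not say how the closest-centroid search is made trainable. One common approach, assumed here purely for illustration, is to replace the hard choice with a softmax over negative distances during training so that gradients can reach the centroids. The sketch below does this for a single sub-vector and a single output weight vector, with hand-derived gradients and plain gradient descent. The names soft_assign, loss_and_grad, train, and the temperature tau are hypothetical, and the paper's three levels of approximation are not modeled.

```python
import math

def soft_assign(x, centroids, tau):
    """Softmax over negative squared distances: a differentiable stand-in for argmin."""
    d = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
    m = min(d)  # shift for numerical stability; does not change the result
    e = [math.exp(-(dk - m) / tau) for dk in d]
    s = sum(e)
    return [ek / s for ek in e]

def loss_and_grad(x, centroids, w, tau):
    """Squared error between the soft table output and the exact x . w,
    plus its gradient with respect to every centroid coordinate."""
    p = soft_assign(x, centroids, tau)
    table = [sum(ci * wi for ci, wi in zip(c, w)) for c in centroids]
    y = sum(pk * tk for pk, tk in zip(p, table))
    err = y - sum(xi * wi for xi, wi in zip(x, w))
    grads = []
    for c, pk, tk in zip(centroids, p, table):
        # Path 1 (table entry):      dy/dc = p_k * w
        # Path 2 (soft assignment):  dy/dc = -(2/tau) * p_k * (t_k - y) * (c - x)
        grads.append([
            err * (pk * wi - (2.0 / tau) * pk * (tk - y) * (ci - xi))
            for ci, xi, wi in zip(c, x, w)
        ])
    return 0.5 * err ** 2, grads

def mean_loss(data, centroids, w, tau):
    return sum(loss_and_grad(x, centroids, w, tau)[0] for x in data) / len(data)

def train(data, centroids, w, tau=1.0, lr=0.05, steps=300):
    """Full-batch gradient descent on the centroids; the weights stay fixed."""
    for _ in range(steps):
        total = [[0.0] * len(c) for c in centroids]
        for x in data:
            _, g = loss_and_grad(x, centroids, w, tau)
            total = [[t + gi for t, gi in zip(tr, gr)] for tr, gr in zip(total, g)]
        centroids = [
            [ci - lr * ti / len(data) for ci, ti in zip(c, tr)]
            for c, tr in zip(centroids, total)
        ]
    return centroids

# Toy data (made up for this sketch): four 2-D features, one output weight vector.
data = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
w = [1.0, 0.5]
init = [[0.3, 0.3], [0.6, 0.6]]

before = mean_loss(data, init, w, tau=1.0)
learned = train(data, init, w, tau=1.0)
after = mean_loss(data, learned, w, tau=1.0)
print(before, after)
```

The point of the sketch is that the loss is measured on the layer output rather than on how well the centroids reconstruct the input, so the centroids move toward positions that make the table results accurate. At inference time the soft assignment would be replaced by the hard closest-centroid search.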

Benefits of Using LUT-NN

LUT-NN has been implemented for Intel and ARM CPUs. It reduces model size by up to 3.5x for CNN models and 7x for BERT, and achieves measured latency speedups of up to 7x for BERT and 2x for ResNet. These speedups are lower than the theoretical results because of current hardware design limitations; the authors expect first-class table lookup support in future hardware designs to unlock the technique's full potential. The approach aims to simplify DNN inference and reduce system development and resource costs while keeping accuracy close to the original models.
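As a rough illustration of where theoretical savings can come from, the sketch below counts operations and stored values for a dense layer versus a lookup layer. It assumes the same sub-vector layout as a product-quantization style scheme, and the layer sizes, sub_len, and k are made-up values chosen for the arithmetic, not figures from the paper. The function names are hypothetical.

```python
def dense_cost(n, d, m):
    """Multiply-accumulates for an (n x d) @ (d x m) matrix product."""
    return n * d * m

def lut_cost(n, d, m, sub_len, k):
    """Rough operation count for the lookup version.

    Assumes d is split into d // sub_len sub-vectors with k centroids each.
    """
    codebooks = d // sub_len
    search = n * codebooks * k * sub_len  # distance to every centroid
    lookup = n * codebooks * m            # one table row added per sub-vector
    return search + lookup

def table_entries(d, m, sub_len, k):
    """Number of precomputed values stored in place of the d x m weights."""
    return (d // sub_len) * k * m

# Hypothetical layer: one input row, 768 -> 768, sub-vectors of 32, 16 centroids.
n, d, m, sub_len, k = 1, 768, 768, 32, 16
dense = dense_cost(n, d, m)
lut = lut_cost(n, d, m, sub_len, k)
print(dense, lut, dense / lut)
print(d * m, table_entries(d, m, sub_len, k))
```

A count like this treats a table read as being as cheap as an addition. Hardware that lacks efficient table lookup does not meet that assumption, which is consistent with the gap between measured and theoretical speedup noted above.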

Conclusion

In summary, the researchers have developed LUT-NN, a technique that simplifies deep neural network inference by replacing diverse computation kernels with table lookups, which reduces running costs while keeping accuracy within 5% of the original models. It has been implemented on Intel and ARM CPUs, reducing model size by up to 3.5x for CNN models and 7x for BERT, with speedups of up to 7x for BERT and 2x for ResNet. Further improvements may be possible if first-class table lookup support is added in future hardware designs.

Created on 18 May. 2023
