LUT-NN: Towards Unified Neural Network Inference by Table Lookup
AI-generated Key Points
- Deep Neural Network (DNN) inference is computationally intensive and costly
- LUT-NN is a technique for DNN inference by table lookup
- LUT-NN learns typical features, called centroids, of each layer from training data
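- The products of the centroids and the model weights are precomputed and saved in tables; for a new input, the table entry of the closest centroid is read out as an approximation of the layer output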
- LUT-NN achieves comparable accuracy (<5% difference) with original models on real complex datasets such as CIFAR, ImageNet, and GLUE
- LUT-NN simplifies computing operators to only two: closest centroid search and table lookup
- LUT-NN has been implemented for Intel and ARM CPUs, reducing model size by up to 3.5x for CNN models and 7x for BERT, with measured latency speedups of up to 7x for BERT and 2x for ResNet.
- The measured speedups are much lower than the theoretical ones because current hardware is poorly suited to table lookup.
- The authors expect first-class table lookup support in future hardware designs to unleash the full potential of LUT-NN.
- The proposed approach simplifies DNN inference while maintaining high accuracy levels without requiring extensive system development or resource costs.
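The two operators named in the key points, closest centroid search and table lookup, can be sketched for a single dense layer as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the input is split into equal-length sub-vectors with one codebook per sub-vector (a product-quantization-style layout), and it uses random centroids where LUT-NN would use learned ones. The names `build_tables`, `closest_centroids`, and `lut_linear` are hypothetical.

```python
import numpy as np

def build_tables(centroids, weight):
    """Precompute centroid-times-weight products (done once, offline).

    centroids: (C, K, V) array, C codebooks with K centroids of length V
    weight:    (C * V, M) dense-layer weight matrix
    returns:   (C, K, M) lookup tables
    """
    C, K, V = centroids.shape
    w = weight.reshape(C, V, -1)  # rows of the weight matrix, grouped per codebook
    return np.einsum("ckv,cvm->ckm", centroids, w)

def closest_centroids(x, centroids):
    """Operator 1: index of the nearest centroid for each sub-vector of x."""
    C, K, V = centroids.shape
    sub = x.reshape(x.shape[0], C, V)  # (N, C, V)
    # Squared Euclidean distance from every sub-vector to every centroid: (N, C, K)
    dist = ((sub[:, :, None, :] - centroids[None]) ** 2).sum(axis=-1)
    return dist.argmin(axis=-1)  # (N, C)

def lut_linear(x, centroids, tables):
    """Operator 2: read one table row per codebook and sum the rows."""
    idx = closest_centroids(x, centroids)            # (N, C)
    rows = tables[np.arange(tables.shape[0]), idx]   # (N, C, M)
    return rows.sum(axis=1)                          # (N, M)

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
C, K, V, M = 4, 16, 8, 10
centroids = rng.normal(size=(C, K, V))  # stand-in for learned centroids
weight = rng.normal(size=(C * V, M))
tables = build_tables(centroids, weight)

x = rng.normal(size=(2, C * V))
approx = lut_linear(x, centroids, tables)  # approximates x @ weight
```

When an input is built exactly from centroids, the lookup result matches `x @ weight`; for other inputs the error depends on how well the centroids cover the feature distribution, which is why the centroids are learned from training data.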
Authors: Xiaohu Tang, Yang Wang, Ting Cao, Li Lyna Zhang, Qi Chen, Deng Cai, Yunxin Liu, Mao Yang
Abstract: DNN inference requires huge effort of system development and resource cost. This drives us to propose LUT-NN, the first trial towards empowering deep neural network (DNN) inference by table lookup, to eliminate the diverse computation kernels as well as save running cost. Based on the feature similarity of each layer, LUT-NN can learn the typical features, named centroids, of each layer from the training data, precompute them with model weights, and save the results in tables. For future input, the results of the closest centroids with the input features can be directly read from the table, as the approximation of layer output. We propose the novel centroid learning technique for DNN, which enables centroid learning through backpropagation, and adapts three levels of approximation to minimize the model loss. By this technique, LUT-NN achieves comparable accuracy (<5% difference) with original models on real complex datasets, including CIFAR, ImageNet, and GLUE. LUT-NN simplifies the computing operators to only two: closest centroid search and table lookup. We implement them for Intel and ARM CPUs. The model size is reduced by up to 3.5x for CNN models and 7x for BERT. Latency-wise, the real speedup of LUT-NN is up to 7x for BERT and 2x for ResNet, much lower than theoretical results because of the current unfriendly hardware design for table lookup. We expect first-class table lookup support in the future to unleash the potential of LUT-NN.
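The abstract contrasts measured speedups with theoretical ones but does not spell out a cost model. A rough operation count for one dense layer might look like the sketch below. It assumes a product-quantization-style layout (sub-vectors of length `v`, `k` centroids per codebook), and the dimensions in the example are hypothetical rather than taken from the paper. The functions `dense_cost` and `lut_cost` are illustrative names.

```python
def dense_cost(n, d, m):
    """Multiply-accumulates and parameters of an (n x d) @ (d x m) layer."""
    return {"macs": n * d * m, "params": d * m}

def lut_cost(n, d, m, v, k):
    """Rough cost of a lookup-based layer with sub-vector length v, k centroids."""
    c = d // v  # number of codebooks (assumes v divides d)
    return {
        "search_macs": n * c * k * v,  # distance from each sub-vector to k centroids
        "lookup_adds": n * c * m,      # one table row accumulated per codebook
        "table_entries": c * k * m,    # precomputed centroid-weight products
        "centroid_params": c * k * v,  # the centroids themselves
    }

# Hypothetical layer: one input row, 768 -> 768, sub-vectors of 32, 16 centroids.
dense = dense_cost(n=1, d=768, m=768)
lut = lut_cost(n=1, d=768, m=768, v=32, k=16)
ratio = dense["macs"] / (lut["search_macs"] + lut["lookup_adds"])
```

Under these assumptions the arithmetic count drops sharply, but the lookup path is dominated by memory reads of table rows, which is consistent with the abstract's remark that current hardware limits the real speedup.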