OneBit: Towards Extremely Low-bit Large Language Models

AI-generated keywords: Large Language Models, 1-bit quantization, OneBit, model compression, performance evaluation

AI-generated Key Points

  • The paper introduces a novel approach to model quantization: quantizing the weight matrices of Large Language Models (LLMs) to 1 bit
  • The aim is to reduce the storage and computational overheads of deploying LLMs
  • OneBit combines a 1-bit parameter representation method with a parameter initialization technique based on matrix decomposition
  • Experimental results show that OneBit retains at least 83% of the non-quantized performance even with 1-bit weight matrices
  • The evaluation reports perplexity on datasets such as WikiText2 and C4, together with zero-shot accuracy
  • Lower perplexity indicates better preservation of the original model's output distribution, while high zero-shot accuracy indicates that the compressed models remain robust
  • OneBit is reported to outperform existing quantization techniques for extremely low bit-width deployment of LLMs
  • OneBit is a promising approach for deploying LLMs at reduced bit-widths with limited loss of performance
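
To give a feel for the storage saving mentioned above, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the layer shape (4096 x 4096) is hypothetical, and the assumption that each 1-bit matrix is accompanied by one 16-bit scale per row and per column is ours, not something stated in these key points.

```python
def fp16_bits(rows: int, cols: int) -> int:
    """Bits needed to store a rows x cols matrix in 16-bit floats."""
    return 16 * rows * cols


def one_bit_bits(rows: int, cols: int, scale_bits: int = 16) -> int:
    """Bits for a 1-bit sign matrix plus one scale per row and per column.

    The per-row and per-column scale vectors are an assumption of this sketch.
    """
    return rows * cols + scale_bits * (rows + cols)


rows, cols = 4096, 4096  # hypothetical layer shape
full = fp16_bits(rows, cols)
compact = one_bit_bits(rows, cols)

print(f"FP16:  {full / 8 / 2**20:.2f} MiB")
print(f"1-bit: {compact / 8 / 2**20:.2f} MiB")
print(f"average bits per weight: {compact / (rows * cols):.4f}")
print(f"compression ratio: {full / compact:.1f}x")
```

Under these assumptions the extra scale vectors add well under one percent to the 1-bit matrix, so the average cost stays very close to one bit per weight.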

Authors: Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che

15 pages, 6 figures, 5 tables
License: CC BY 4.0

Abstract: Model quantification uses low bit-width values to represent the weight matrices of models, which is a promising approach to reduce both storage and computational overheads of deploying highly anticipated LLMs. However, existing quantization methods suffer severe performance degradation when the bit-width is extremely reduced, and thus focus on utilizing 4-bit or 8-bit values to quantize models. This paper boldly quantizes the weight matrices of LLMs to 1-bit, paving the way for the extremely low bit-width deployment of LLMs. For this target, we introduce a 1-bit quantization-aware training (QAT) framework named OneBit, including a novel 1-bit parameter representation method to better quantize LLMs as well as an effective parameter initialization method based on matrix decomposition to improve the convergence speed of the QAT framework. Sufficient experimental results indicate that OneBit achieves good performance (at least 83% of the non-quantized performance) with robust training processes when only using 1-bit weight matrices.
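
The abstract names two ingredients, a 1-bit parameter representation and an initialization based on matrix decomposition, but gives no formulas. One way to picture how such pieces could fit together is sketched below in plain Python. This is an illustration under our own assumptions, not the paper's exact procedure: the weight matrix is split into a sign matrix (the 1-bit part) and a rank-1 approximation of its magnitudes, found here by power iteration. The function names `sign_matrix`, `rank1_magnitude`, and `reconstruct` are hypothetical.

```python
import math


def sign_matrix(w):
    """Keep only the sign of each weight: +1 or -1 (zero maps to +1)."""
    return [[1 if x >= 0 else -1 for x in row] for row in w]


def rank1_magnitude(w, iters=50):
    """Approximate |W| by an outer product a b^T using power iteration."""
    absw = [[abs(x) for x in row] for row in w]
    rows, cols = len(absw), len(absw[0])
    b = [1.0] * cols
    for _ in range(iters):
        # a <- |W| b, normalised to unit length
        a = [sum(absw[i][j] * b[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in a)) or 1.0
        a = [x / norm for x in a]
        # b <- |W|^T a, which carries the scale of the approximation
        b = [sum(absw[i][j] * a[i] for i in range(rows)) for j in range(cols)]
    return a, b


def reconstruct(signs, a, b):
    """Rebuild an approximate weight matrix from signs and the two vectors."""
    return [[s * a[i] * b[j] for j, s in enumerate(row)]
            for i, row in enumerate(signs)]


# Toy matrix whose magnitudes happen to be exactly rank 1.
w = [[0.5, -1.0], [-1.5, 3.0]]
signs = sign_matrix(w)
a, b = rank1_magnitude(w)
approx = reconstruct(signs, a, b)
print(signs)
print(approx)
```

For this toy matrix the magnitudes are exactly rank 1, so the reconstruction matches the original up to floating-point error. For a real weight matrix the result would only be an approximation, which is presumably why the framework follows initialization with quantization-aware training.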

Submitted to arXiv on 17 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.11295v1

The paper "OneBit: Towards Extremely Low-bit Large Language Models" proposes quantizing the weight matrices of Large Language Models (LLMs) to 1 bit, with the aim of substantially reducing the storage and computational overheads of deploying LLMs. The proposed 1-bit quantization-aware training (QAT) framework, named OneBit, combines a 1-bit parameter representation method with a parameter initialization technique based on matrix decomposition. Experimental results indicate that OneBit retains at least 83% of the non-quantized performance when using only 1-bit weight matrices.

The evaluation reports perplexity on datasets such as WikiText2 and C4, and zero-shot accuracy on tasks such as WinoGrande, HellaSwag, PIQA, and BoolQ. Lower perplexity indicates better preservation of the original model's output distribution, while high zero-shot accuracy indicates that the compressed models remain robust. The study also analyzes OneBit's ability to transfer knowledge from the original models and compares it with other methods, reporting that OneBit outperforms existing quantization techniques at extremely low bit-widths.

In conclusion, OneBit is a promising approach for deploying LLMs at significantly reduced bit-widths with limited loss of performance, and a useful contribution to model quantization and optimization.
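
As a reminder of what the perplexity metric measures, the short Python sketch below computes it from per-token log-probabilities. The probability values are made up for illustration and are not results from the paper; the helper name `perplexity` is ours.

```python
import math


def perplexity(token_log_probs):
    """Exponential of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token probabilities from two models on the same text.
original = [math.log(p) for p in (0.5, 0.25, 0.5, 0.25)]
quantized = [math.log(p) for p in (0.4, 0.2, 0.4, 0.2)]

print(perplexity(original))
print(perplexity(quantized))
```

A model that assigns lower probability to the observed tokens gets a higher perplexity, so a quantized model whose perplexity stays close to the original's has kept most of the original predictive behaviour on that text.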
Created on 09 Jun. 2024
