GLM-130B: An Open Bilingual Pre-trained Model

AI-generated keywords: GLM-130B, Pre-trained Language Model, Open Access, Reproducibility, Ethical Considerations

AI-generated Key Points

  • GLM-130B is a bilingual pre-trained language model with 130 billion parameters.
  • It aims to provide an open-source alternative to models like GPT-3 and promote openness and inclusivity in LLM research.
  • The reproducibility of GLM-130B is ensured through the release of code, training lessons, and the entire pre-training process.
  • With INT4 quantization, GLM-130B can run inference on 4×RTX 3090 (24G) or 8×RTX 2080 Ti (11G) GPUs, which puts a 100B-scale model within reach of far more researchers.
  • The paper provides detailed information about the training process, including design choices, training strategies for efficiency and stability, and engineering efforts.
  • GLM-130B outperforms GPT-3 175B on a wide range of English benchmarks; the abstract reports that this advantage over GPT-3 is not observed for OPT-175B and BLOOM-176B.
  • It consistently outperforms ERNIE TITAN 3.0 260B, the largest Chinese language model, on related benchmarks.
  • One unique feature of GLM-130B is that it reaches INT4 quantization without quantization-aware training and with almost no performance loss.
  • The paper emphasizes ethical considerations in LLM research while acknowledging potential risks associated with harmful applications of LLMs.
  • Promoting inclusivity and transparency through open access to LLMs can help address fairness, bias, privacy, and truthfulness issues.

Authors: Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang

47 pages
License: CC BY 4.0

Abstract: We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous unexpected technical and engineering challenges, particularly on loss spikes and disconvergence. In this paper, we introduce the training process of GLM-130B including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resultant GLM-130B model offers significant outperformance over GPT-3 175B on a wide range of popular English benchmarks while the performance advantage is not observed in OPT-175B and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN 3.0 260B -- the largest Chinese language model -- across related benchmarks. Finally, we leverage a unique scaling property of GLM-130B to reach INT4 quantization, without quantization aware training and with almost no performance loss, making it the first among 100B-scale models. More importantly, the property allows its effective inference on 4$\times$RTX 3090 (24G) or 8$\times$RTX 2080 Ti (11G) GPUs, the most ever affordable GPUs required for using 100B-scale models. The GLM-130B model weights are publicly accessible and its code, training logs, related toolkit, and lessons learned are open-sourced at https://github.com/THUDM/GLM-130B .
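
The hardware claim in the abstract can be checked with back-of-the-envelope arithmetic. The sketch below counts weight storage only and assumes decimal gigabytes; activations, caches, and framework overhead are ignored, so it gives a lower bound on memory rather than a deployment guide. The function and dictionary names are illustrative and do not come from the GLM-130B code base.

```python
# Rough weight-memory estimate for a 130B-parameter model.
# Assumption: only weight storage is counted (no activations or overhead).

PARAMS = 130e9  # parameter count stated in the abstract

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

# GPU setups named in the abstract: (number of cards, memory per card in GB).
SETUPS = {"4x RTX 3090": (4, 24), "8x RTX 2080 Ti": (8, 11)}


def weight_gb(params, bytes_per_param):
    """Return the weight storage in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def fits(params, precision, setup):
    """Return True if the weights alone fit in the setup's total memory."""
    count, gb_each = SETUPS[setup]
    return weight_gb(params, BYTES_PER_PARAM[precision]) <= count * gb_each


for precision, nbytes in BYTES_PER_PARAM.items():
    for setup, (count, gb_each) in SETUPS.items():
        verdict = "fits" if fits(PARAMS, precision, setup) else "does not fit"
        print(f"{precision}: {weight_gb(PARAMS, nbytes):.0f} GB of weights "
              f"{verdict} in {setup} ({count * gb_each} GB total)")
```

Under these assumptions the INT4 weights take about 65 GB, which is below both 96 GB (4×24) and 88 GB (8×11), while the INT8 weights (130 GB) and FP16 weights (260 GB) exceed both totals.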

Submitted to arXiv on 05 Oct. 2022

Results of the summarizing process for the arXiv paper: 2210.02414v1

This paper introduces GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. The goal of GLM-130B is to provide an open-source alternative to models like GPT-3 and to promote openness and inclusivity in large language model (LLM) research. Reproducibility is supported through the release of the model weights, code, training logs, related toolkit, and lessons learned.

The paper details the training process of GLM-130B, including design choices, training strategies for efficiency and stability, and engineering efforts. It also highlights the challenges faced during training, such as loss spikes and disconvergence.

GLM-130B outperforms GPT-3 175B on a wide range of English benchmarks; the abstract reports that this advantage over GPT-3 is not observed for OPT-175B and BLOOM-176B. It also consistently outperforms ERNIE TITAN 3.0 260B, the largest Chinese language model, on related benchmarks.

One unique feature of GLM-130B is that it reaches INT4 quantization without quantization-aware training and with almost no performance loss. This allows inference on 4×RTX 3090 (24G) or 8×RTX 2080 Ti (11G) GPUs.

The paper emphasizes the importance of ethical considerations in LLM research while acknowledging the risk that LLMs can be used for harmful applications. The authors argue that inclusivity and transparency lead to better defenses against such harms: open access to LLMs lets researchers develop algorithms that identify synthetic text and address fairness, bias, privacy, and truthfulness issues.
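
To make the idea of INT4 quantization without quantization-aware training concrete, here is a minimal sketch of post-training weight quantization. It assumes a symmetric absmax scheme applied to one row of weights, which is a common choice; the summary does not state which scheme the authors use, and all function names here are hypothetical.

```python
# Post-training INT4 weight quantization, sketched with symmetric absmax scaling.
# Assumption: one scale per row; values are mapped to integers in [-7, 7].


def quantize_absmax_int4(row):
    """Quantize a list of floats to 4-bit integers plus one float scale."""
    absmax = max(abs(w) for w in row)
    scale = absmax / 7 if absmax > 0 else 1.0
    q = [max(-7, min(7, round(w / scale))) for w in row]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


def pack_int4(q):
    """Pack two 4-bit two's-complement values into each byte."""
    if len(q) % 2:
        q = q + [0]  # pad to an even length without mutating the input
    return bytes(((hi & 0xF) << 4) | (lo & 0xF)
                 for hi, lo in zip(q[0::2], q[1::2]))


def unpack_int4(packed, n):
    """Inverse of pack_int4; n is the original number of values."""
    out = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0xF):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:n]


weights = [1.4, -0.6, 0.2, 0.0, -1.4]
q, scale = quantize_absmax_int4(weights)
packed = pack_int4(q)
restored = dequantize(unpack_int4(packed, len(weights)), scale)
print(q, len(packed), restored)
```

No training step appears anywhere in the sketch: the scale is computed directly from the trained weights, and each weight occupies half a byte once packed. The rounding error per weight is bounded by half the scale, so how much accuracy survives depends on how evenly the weight values are distributed.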
Created on 31 Oct. 2023
