LLaMA: Open and Efficient Foundation Language Models
AI-generated Key Points
- Hugo Touvron, Thibaut Lavril, Edouard Grave, Guillaume Lample, and their co-authors introduce LLaMA, a collection of foundation language models
- Models range from 7B to 65B parameters and are trained on trillions of tokens using publicly available datasets exclusively
- State-of-the-art models can be trained without proprietary and inaccessible datasets
- A smaller model trained for longer can ultimately be cheaper at inference than a larger model that reaches the same level of performance
- The goal is to train language models that achieve the best possible performance at various inference budgets by training on more tokens than is typically used
- LLaMA-13B outperforms GPT-3 (175B) on most benchmarks despite being more than ten times smaller, and LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B
- All models are released to the research community
- Because training uses only publicly available data, the work is compatible with open-sourcing, which helps democratize access to and study of LLMs
- The paper presents the modifications made to the transformer architecture (Vaswani et al., 2017) and the training method
- It reports how the models perform against other LLMs on a set of standard benchmarks
- It exposes some of the biases and toxicity encoded in the models, using recent responsible AI benchmarks
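The inference-budget point in the list above can be made concrete with a rough cost sketch. The code below is an illustration, not the authors' method. It assumes two common rules of thumb that do not come from this summary: training costs about 6 FLOPs per parameter per token, and generating one token costs about 2 FLOPs per parameter. The model sizes and token counts in the usage lines are hypothetical values chosen for illustration, and the comparison assumes both models reach the same target quality.

```python
# Rough cost model (assumed rules of thumb, not taken from the summary):
#   training  ~ 6 FLOPs per parameter per training token
#   inference ~ 2 FLOPs per parameter per generated token


def training_flops(params: float, train_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * params * train_tokens


def inference_flops(params: float, served_tokens: float) -> float:
    """Approximate compute in FLOPs to serve `served_tokens` tokens."""
    return 2.0 * params * served_tokens


def total_flops(params: float, train_tokens: float, served_tokens: float) -> float:
    """Training plus inference compute over the model's lifetime."""
    return training_flops(params, train_tokens) + inference_flops(params, served_tokens)


def break_even_served_tokens(small: tuple, large: tuple) -> float:
    """Served tokens beyond which the smaller model is cheaper overall.

    `small` and `large` are (params, train_tokens) pairs. Returns 0.0 when
    the smaller model is already cheaper to train.
    """
    (n_small, d_small), (n_large, d_large) = small, large
    if n_small >= n_large:
        raise ValueError("`small` must have fewer parameters than `large`")
    extra_training = training_flops(n_small, d_small) - training_flops(n_large, d_large)
    saving_per_token = 2.0 * (n_large - n_small)
    return max(0.0, extra_training / saving_per_token)


if __name__ == "__main__":
    # Hypothetical setups, assumed to reach the same quality:
    small = (7e9, 1e12)    # 7B parameters trained on 1T tokens
    large = (10e9, 2e11)   # 10B parameters trained on 200B tokens
    tokens = break_even_served_tokens(small, large)
    print(f"Smaller model is cheaper overall after ~{tokens:.2e} served tokens")
```

Under these assumptions the smaller model costs more to train but less per generated token, so it wins once enough tokens have been served. Real costs also depend on hardware utilization, memory, and batching, which this sketch ignores.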
Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.