Code Llama: Open Foundation Models for Code

AI-generated keywords: Code Llama, Language Model, Performance, Instruction-Following, Synthesis

AI-generated Key Points

Code Llama is a family of large language models for code
Offers state-of-the-art performance, infilling capabilities, support for large input contexts, and the ability to follow instructions for programming tasks
Available in multiple flavors including foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct)
All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens
Introduces a dedicated long context fine-tuning (LCFT) stage to handle long sequences effectively
LCFT stage modifies the rotation frequencies of the rotary position embedding used in the foundation model
Instruction fine-tuned models (Code Llama - Instruct) are based on Code Llama and trained to answer questions appropriately
Trained on a proprietary dataset that combines supervised fine-tuning and rejection sampling examples collected through reinforcement learning from human feedback
Capable of performing various code understanding and synthesis tasks such as code summarization, refinement, translation, bug fixing, build error fixing, and solving math problems
Outperforms other publicly available models on various benchmarks
Released under a permissive license allowing for both research and commercial use.

Authors: Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve

arXiv: 2308.12950v2 (cs.CL)

License: CC BY 4.0

Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 53% and 55% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use.
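Infilling based on surrounding content can be pictured as follows. This is only a minimal sketch: the function names and the marker strings `<PRE>`, `<SUF>` and `<MID>` are assumptions made for illustration and are not taken from the text above. The idea is to split a document into a prefix, a middle and a suffix, then present the prefix and suffix to the model so that the middle is what it has to generate.

```python
def split_document(document: str, start: int, end: int) -> tuple[str, str, str]:
    """Split a document into the prefix, the middle to be predicted, and the suffix."""
    return document[:start], document[start:end], document[end:]


def build_infilling_prompt(prefix: str, suffix: str,
                           pre: str = "<PRE>", suf: str = "<SUF>",
                           mid: str = "<MID>") -> str:
    """Arrange the surrounding content so that the missing middle comes last.

    The marker strings are placeholders chosen for this sketch (an assumption).
    """
    return f"{pre}{prefix}{suf}{suffix}{mid}"


source = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
start = source.index("return")   # the middle starts at the return statement
end = source.index("\n\n")       # and ends at the end of that line
prefix, middle, suffix = split_document(source, start, end)
prompt = build_infilling_prompt(prefix, suffix)
print(prompt)
```

A model with infilling support would be expected to continue such a prompt with the missing middle, here the `return` statement.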

Submitted to arXiv on 24 Aug. 2023

Results of the summarizing process for the arXiv paper: 2308.12950v2

Comprehensive Summary
Key points
Layman's Summary
Blog article

Code Llama is a family of large language models for code, based on Llama 2, that offers state-of-the-art performance among open models, infilling capabilities, support for large input contexts and the ability to follow instructions for programming tasks. The models are available in multiple flavors: foundation models (Code Llama), Python specializations (Code Llama - Python) and instruction-following models (Code Llama - Instruct), with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. To handle long sequences effectively, Code Llama introduces a dedicated long context fine-tuning (LCFT) stage. This stage modifies the rotation frequencies of the rotary position embedding used in the foundation model: increasing the base period from 10k to 1 million for fine-tuning gives Code Llama long-range capabilities and reduces the bias towards short-distance attention. The instruction fine-tuned models (Code Llama - Instruct) are based on Code Llama and trained to answer questions appropriately. They are trained on a proprietary dataset that combines supervised fine-tuning and rejection sampling examples collected through reinforcement learning from human feedback. This dataset enables Code Llama - Instruct to inherit instruction-following and safety properties from Llama 2. In addition to program synthesis and infilling, Code Llama can perform various code understanding and synthesis tasks such as code summarization, refinement, translation, bug fixing and build error fixing, as well as solving math problems. Overall, Code Llama provides advanced capabilities for code-related tasks and outperforms other publicly available models on various benchmarks. It is released under a permissive license that allows for both research and commercial use.
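The effect of raising the base period can be illustrated with the standard rotary embedding formula, in which dimension pair i rotates at frequency base^(-2i/d). The sketch below is illustrative only: the head dimension of 128 and the helper names are assumptions, not values stated in the summary. It compares how many positions the slowest-rotating pair covers before completing a full turn under each base.

```python
import math


def rope_frequencies(head_dim: int, base: float) -> list[float]:
    """Rotation frequency of each dimension pair: base ** (-2i / head_dim)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, base: float) -> float:
    """Positions needed for the slowest-rotating pair to complete one full turn."""
    slowest = rope_frequencies(head_dim, base)[-1]
    return 2 * math.pi / slowest


HEAD_DIM = 128  # assumed per-head dimension, for illustration only

original = longest_wavelength(HEAD_DIM, 10_000.0)
extended = longest_wavelength(HEAD_DIM, 1_000_000.0)
print(f"base 10k: slowest pair repeats after ~{original:,.0f} positions")
print(f"base 1M:  slowest pair repeats after ~{extended:,.0f} positions")
```

With the larger base, every pair except the first rotates more slowly, so positional angles stay distinguishable over much longer distances.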

Code Llama is a computer program that helps with coding tasks. It can understand and write code in different programming languages. It comes in different versions for different purposes: general coding (Code Llama), Python coding (Code Llama - Python), and following instructions (Code Llama - Instruct). The program has been trained on long sequences of code and can handle large amounts of information effectively. It can do many things, such as summarizing code, fixing errors, and solving math problems. Code Llama outperforms other publicly available programs of its kind on several benchmarks and can be used for research or business purposes.

Definitions

- Language models: Computer programs that understand and generate human language.
- Performance: How well something works or performs.
- Infilling capabilities: The ability to fill in missing parts or gaps.
- Input contexts: The information or data given to a computer program as input.
- Programming tasks: Activities related to writing computer programs.
- Fine-tuning: Making small adjustments to improve the performance of a model.
- Rotary position embedding: A technique used to represent the position of words or tokens in a sequence.
- Supervised fine-tuning: Training a model using examples where the correct answer is known.
- Rejection sampling: A method used to generate new examples by rejecting some samples based on certain criteria.
- Benchmarks: Standards or tests used to compare the performance of different models or systems.
- Permissive license: A type of license that allows others to use, modify, and distribute software.

Introducing Code Llama: A Family of Large Language Models for Code

Artificial intelligence (AI) has made great strides in recent years, and natural language processing (NLP) is no exception. Researchers have developed powerful models that can understand and generate human-like language, but what about code? In a new paper, researchers from Meta AI introduce Code Llama, a family of large language models for code that offers state-of-the-art performance among open models on several code benchmarks.

Code Llama Foundation Models

The foundation model is the base version of Code Llama. It builds on Llama 2, a transformer whose attention mechanism uses rotary position embeddings (RoPE). RoPE encodes the relative positions of tokens in the input sequence, while attention lets the model focus on specific parts of that sequence. To handle long sequences effectively, Code Llama adds a dedicated long context fine-tuning (LCFT) stage to training. This stage modifies the rotation frequencies used by RoPE, raising the base period from 10k to 1 million, so that the model can better capture long-range dependencies between tokens in larger inputs.
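The relative-position property of rotary embeddings can be checked numerically. The following is a minimal sketch, not the model's actual implementation: the function names, the toy vectors and the choice to pair adjacent dimensions are assumptions for illustration. Each pair of dimensions is rotated by an angle proportional to the token position, and the query-key dot product then depends only on the distance between the two positions.

```python
import math


def rotate(vec, position, base=10_000.0):
    """Apply a rotary position embedding to a vector of even length."""
    dim = len(vec)
    out = []
    for i in range(dim // 2):
        # Each dimension pair rotates at its own frequency.
        angle = position * base ** (-2 * i / dim)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


query = [0.3, -1.2, 0.7, 0.5]  # toy vectors, illustration only
key = [1.0, 0.4, -0.6, 0.9]

# The same offset (3 positions apart) at two different absolute positions.
near = dot(rotate(query, 5), rotate(key, 2))
far = dot(rotate(query, 105), rotate(key, 102))
print(near, far)  # equal up to floating-point error
```

Because only the offset matters, changing the `base` argument changes how quickly the score varies with distance, which is the quantity the LCFT stage adjusts.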

Python Specializations

In addition to the foundation models, Code Llama comes with Python specializations (Code Llama - Python) that are tailored to programming tasks involving Python code. Like the other variants, these models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. According to the abstract, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP.

Instruction Following Models

Another flavor of Code Llama is the instruction-following model (Code Llama - Instruct). This variant is based on Code Llama and is fine-tuned on a proprietary dataset that combines supervised fine-tuning examples with rejection sampling examples collected through reinforcement learning from human feedback. As a result, it inherits instruction-following and safety properties from Llama 2 while still handling code understanding and synthesis tasks such as summarization, refinement, translation and bug fixing. The abstract reports that all Code Llama models outperform every other publicly available model on MultiPL-E.
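As a rough illustration of what a rejection sampling example is, the sketch below draws several candidate responses for a prompt and keeps only the one a scoring function rates highest. It shows the general idea only and is not the pipeline used to build the proprietary dataset; `toy_generate` and `toy_score` are hypothetical stand-ins for a language model and a reward model.

```python
import random


def rejection_sample(prompt, generate, score, num_candidates=4):
    """Draw several candidate responses and keep the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(num_candidates)]
    return max(candidates, key=lambda response: score(prompt, response))


# Hypothetical stand-ins for a language model and a reward model.
_rng = random.Random(0)


def toy_generate(prompt):
    return _rng.choice(["pass", "return a + b", "return a - b", "print(a, b)"])


def toy_score(prompt, response):
    return 1.0 if response == "return a + b" else 0.0


prompt = "Write the body of add(a, b)."
best = rejection_sample(prompt, toy_generate, toy_score, num_candidates=8)
training_example = {"prompt": prompt, "response": best}
print(training_example)
```

The retained prompt and response pair can then serve as a fine-tuning example, while the rejected candidates are discarded.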

Conclusion

All in all, Code Llama provides advanced capabilities for code-related tasks through its multiple flavors: foundation models (Code Llama), Python specializations (Code Llama - Python) and instruction-following models (Code Llama - Instruct). It is released under a permissive license that allows both research and commercial use, so developers can build on these models in either setting.

Created on 20 Nov. 2023
