Large Language Models for Compiler Optimization

AI-generated keywords: Large Language Models, Code Optimization, LLVM Assembly, Instruction Counts, Compiler

AI-generated Key Points

- Large Language Models (LLMs) applied to code optimization
- 7B-parameter transformer model trained from scratch for LLVM assembly optimization
- Model predicts instruction counts before and after optimization, as well as optimized code
- Auxiliary learning tasks enhance model's performance and understanding
- Achieves 3.0% improvement in reducing instruction counts compared to the compiler
- Outperforms two state-of-the-art baselines requiring thousands of compilations
- Strong code reasoning abilities: generates compilable code 91% of the time, emulates compiler output 70% of the time
- Unique focus on optimizing code compared to other LLMs trained on source code for different tasks
- Demonstrates potential of LLMs in improving code performance through automated optimizations

Authors: Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, Hugh Leather

arXiv: 2309.07062v1 (cs.PL)

License: CC BY 4.0

Abstract: We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and outputs a list of compiler options to best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, and the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and improve the model's depth of understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.

Submitted to arXiv on 11 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.07062v1

Comprehensive Summary

This paper explores the novel application of Large Language Models (LLMs) to code optimization. The authors present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes unoptimized assembly as input and outputs a list of compiler options to best optimize the program. During training, the model is also asked to predict the instruction counts before and after optimization, as well as the optimized code itself. These auxiliary learning tasks significantly improve the model's optimization performance and deepen its understanding of the code.

The authors evaluate their approach on a large suite of test programs and report a 3.0% improvement in reducing instruction counts over the compiler. This outperforms two state-of-the-art baselines that require thousands of compilations. The model also demonstrates strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.

Previous LLMs have been trained on source code for tasks such as code search, summarization, and documentation generation, and most general-purpose LLMs are trained at least partly on code. This work differs in that it targets code optimization specifically. Overall, the research shows that LLMs can be applied to compiler optimization and highlights their potential for improving code through automated optimizations.

Layman's Summary

Large Language Models (LLMs) are AI systems that learn from very large amounts of text, including computer code. In this paper, the authors train an LLM with 7 billion parameters to make programs written in LLVM assembly language smaller. Given a program, the model suggests which compiler settings to use. While it is learning, it also practices predicting how many instructions the program has before and after being improved, and writing out the improved program itself. This extra practice makes it better at its main job. Compared to the compiler on its own, it reduces the number of instructions by a further 3%. It beats two other leading tools that need thousands of trial compilations, and the code it writes is valid 91% of the time and identical to the compiler's output 70% of the time. This shows that LLMs have potential for improving code automatically.

Definitions

- Large Language Models (LLMs): AI systems trained on large amounts of text, which can include code.
- Transformer model: The type of neural network that LLMs are built on.
- LLVM assembly optimization: Improving code written in LLVM assembly language; in this paper, making it smaller.
- Instruction counts: The number of instructions a program contains, used here as a measure of code size.
- Compiler: A program that translates human-readable code into machine-readable instructions.

Large Language Models for Code Optimization

Computer programs are often written in high-level programming languages such as Java and C++. Before they can be executed, these programs must be translated into a lower-level form that the computer can run. This translation process is known as compilation, and it involves a number of optimization steps that reduce the size and improve the performance of the program. The paper explores a novel application of Large Language Models (LLMs) to code optimization. The authors present a 7B-parameter transformer model that is trained from scratch to optimize LLVM assembly for code size.

Background

Compilers apply optimizations through pipelines of passes whose selection and ordering have traditionally been tuned by hand by compiler engineers. Picking the best options for an individual program by hand is time-consuming and requires significant expertise. Automated compiler optimization tools have been developed to address this problem, but they typically require thousands of compilations to achieve good results.
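To illustrate why search-based tools can need so many compilations, here is a minimal sketch of a generic random-search autotuner. It is not either of the baselines the paper compares against. The pass names, the toy_cost function (a stand-in for "compile with these options and count instructions"), and its savings table are all invented for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_search(
    passes: Sequence[str],
    cost: Callable[[List[str]], int],
    budget: int,
    max_len: int = 5,
    seed: int = 0,
) -> Tuple[List[str], int, int]:
    """Try up to `budget` pass lists (including the empty one), keep the cheapest.

    Returns (best pass list, its cost, number of cost evaluations).
    Each cost evaluation stands in for one real compilation.
    """
    rng = random.Random(seed)
    best_seq: List[str] = []
    best_cost = cost(best_seq)  # starting point: no passes applied
    evaluations = 1
    while evaluations < budget:
        length = rng.randint(1, max_len)
        candidate = [rng.choice(passes) for _ in range(length)]
        candidate_cost = cost(candidate)
        evaluations += 1
        if candidate_cost < best_cost:
            best_seq, best_cost = candidate, candidate_cost
    return best_seq, best_cost, evaluations


def toy_cost(seq: List[str]) -> int:
    """Toy stand-in for compiling with `seq` and counting instructions."""
    savings = {"pass_a": 10, "pass_b": 7, "pass_c": 3}  # invented numbers
    return 100 - sum(savings.get(p, 0) for p in set(seq))


PASSES = ["pass_a", "pass_b", "pass_c", "pass_d", "pass_e"]  # hypothetical names
best, best_count, compilations = random_search(PASSES, toy_cost, budget=1000)
print(best, best_count, compilations)
```

Every candidate costs one evaluation, so the search budget translates directly into the number of compilations. A model that proposes a pass list directly from the input code avoids this loop.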

Methodology

The authors propose an automated approach based on large language models (LLMs). The model takes unoptimized assembly as input and outputs a list of compiler options that best optimize the program for size reduction. During training, the model is asked to predict instruction counts before and after optimization, as well as the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and enhance its depth of understanding.
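As a rough illustration of the training setup described above, the following sketch lays out one training example with the primary target (the compiler options) and the auxiliary targets (instruction counts and optimized code). The field names, the text layout of the prompt and answer, and the hand-written example values are assumptions for illustration, not the authors' actual data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingExample:
    """One hypothetical training pair; all field names are illustrative."""

    unoptimized_ir: str   # model input: unoptimized LLVM assembly
    pass_list: List[str]  # primary target: compiler options to apply
    count_before: int     # auxiliary target: instruction count of the input
    count_after: int      # auxiliary target: instruction count after optimization
    optimized_ir: str     # auxiliary target: the optimized code itself

    def prompt(self) -> str:
        """Text shown to the model (assumed layout)."""
        return f"[code]\n{self.unoptimized_ir}\n[/code]"

    def answer(self) -> str:
        """Text the model is trained to produce (assumed layout)."""
        return "\n".join([
            "passes: " + " ".join(self.pass_list),
            f"instruction count before: {self.count_before}",
            f"instruction count after: {self.count_after}",
            f"[code]\n{self.optimized_ir}\n[/code]",
        ])


# Hand-written example: promoting the stack slot to a register removes
# the alloca, store, and load, leaving 2 of the original 5 instructions.
example = TrainingExample(
    unoptimized_ir=(
        "define i32 @add(i32 %a, i32 %b) {\n"
        "  %p = alloca i32\n"
        "  store i32 %a, ptr %p\n"
        "  %v = load i32, ptr %p\n"
        "  %r = add i32 %v, %b\n"
        "  ret i32 %r\n"
        "}"
    ),
    pass_list=["mem2reg"],
    count_before=5,
    count_after=2,
    optimized_ir=(
        "define i32 @add(i32 %a, i32 %b) {\n"
        "  %r = add i32 %a, %b\n"
        "  ret i32 %r\n"
        "}"
    ),
)
print(example.prompt())
print(example.answer())
```

At inference time only the pass list would be needed; the counts and the optimized code serve as extra training signal in this sketch.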

Results

The authors evaluate their approach on a large suite of test programs and find that it achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that themselves require thousands of compilations. Additionally, the model demonstrates strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.
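The headline figures can be read as simple ratios. The sketch below shows one plausible way to compute them. The exact metric definitions are not given in this summary, so the formulas (total instructions saved relative to the compiler baseline, and per-program compile and exact-match rates) are assumptions, and the sample numbers are toy values rather than results from the paper.

```python
from typing import List, Sequence, Tuple


def improvement_over_baseline(
    baseline_counts: Sequence[int], model_counts: Sequence[int]
) -> float:
    """Percentage reduction in total instruction count relative to the baseline."""
    if len(baseline_counts) != len(model_counts):
        raise ValueError("need one count per program in both lists")
    total_baseline = sum(baseline_counts)
    total_model = sum(model_counts)
    return 100.0 * (total_baseline - total_model) / total_baseline


def generation_rates(results: List[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Given per-program (compiles, matches_compiler_output) flags,
    return (compile rate %, exact-match rate %)."""
    n = len(results)
    compiled = sum(1 for compiles, _ in results if compiles)
    exact = sum(1 for _, matches in results if matches)
    return 100.0 * compiled / n, 100.0 * exact / n


# Toy instruction counts for three programs (invented for illustration).
baseline = [100, 200, 300]  # counts using the compiler's default choices
model = [90, 200, 280]      # counts using the model's suggested options
improvement = improvement_over_baseline(baseline, model)

# Toy flags for four generated programs (invented for illustration).
flags = [(True, True), (True, False), (False, False), (True, True)]
compile_rate, exact_rate = generation_rates(flags)
print(improvement, compile_rate, exact_rate)
```

Under these definitions, a program on which the model does no better than the compiler contributes nothing to the improvement, so the overall figure reflects gains concentrated in a subset of programs.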

Conclusion

This research shows how Large Language Models can be applied to compiler optimization and highlights their potential for improving code through automated optimizations, outperforming baselines that require thousands of compilations. While previous LLMs have been trained on source code for tasks such as code search, summarization, and documentation generation, this work is unique in its focus on optimizing code.

Created on 15 Sep. 2023
