Repository-Level Prompt Generation for Large Language Models of Code

AI-generated keywords: LLMs, Code Repositories, Prompt Design, Repo-Level Prompt Generator, Context

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

- The authors address the importance of incorporating domain-specific knowledge into prompt design for large language models (LLMs) of code.
- They introduce a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals.
- The proposals draw context from the entire repository, including its structure and other relevant files (e.g. imports, parent class files).
- The technique does not require access to the LLM's weights, so it applies when only black-box access is available.
- In experiments on single-line code autocompletion using repositories from Google Code archives, an oracle constructed from the prompt proposals gives a 36% relative improvement over Codex.
- A model trained to predict a prompt proposal shows significant performance gains over Codex and other baselines.
- The work generates prompts for LLMs of code by leveraging repository-level context.
- The authors release their code, data, and trained checkpoints for further exploration.


Authors: Disha Shrivastava, Hugo Larochelle, Daniel Tarlow

ICML, 2023

arXiv: 2206.12839v3 (cs.LG)

ICML 2023 (Camera-Ready version)

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge in the prompt design process become important. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g. imports, parent class files). Our technique doesn't require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code-autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we can achieve significant performance gains over Codex and other baselines. We release our code, data, and trained checkpoints at: https://github.com/shrivastavadisha/repo_level_prompt_generation.

Submitted to arXiv on 26 Jun. 2022


Results of the summarizing process for the arXiv paper: 2206.12839v3


Comprehensive Summary

In their paper "Repository-Level Prompt Generation for Large Language Models of Code," Disha Shrivastava, Hugo Larochelle, and Daniel Tarlow address the importance of incorporating domain-specific knowledge into the prompt design process for large language models (LLMs) of code. They introduce a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. These proposals draw context from the entire repository, covering both its structure and other relevant files such as imports and parent class files. Notably, the technique does not require access to the weights of the LLM, so it applies even when only black-box access is available. The authors run experiments on single-line code autocompletion using code repositories taken from Google Code archives. They show that an oracle constructed from their prompt proposals achieves a relative improvement of 36% over Codex, which indicates the quality of the proposals. They also train a model to predict a prompt proposal and report significant performance gains over Codex and other baselines. Overall, the work generates prompts for LLMs of code by leveraging repository-level context. The authors release their code, data, and trained checkpoints for further exploration.
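To make the idea of a prompt proposal more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a repository is an in-memory mapping from file path to source text, that files use Java-style `import a.b.C;` statements, and that a proposal is a function that picks context from elsewhere in the repository to place before the code preceding the cursor. The names `Repo`, `PromptProposal`, `imported_file_context`, and `build_prompt`, and the character budget, are all hypothetical.

```python
import re
from typing import Callable, Dict

# Hypothetical types: a repository maps file paths to source text.
Repo = Dict[str, str]
# A prompt proposal maps (repo, path of the file being edited) to extra context.
PromptProposal = Callable[[Repo, str], str]


def imported_file_context(repo: Repo, current_path: str) -> str:
    """Illustrative proposal: gather the text of files imported by the current file."""
    source = repo[current_path]
    # Assumption: Java-style imports such as `import util.Point;`.
    names = re.findall(r"^import\s+([\w.]+);", source, flags=re.MULTILINE)
    chunks = []
    for name in names:
        path = name.replace(".", "/") + ".java"
        if path in repo:  # only files that live in this repository
            chunks.append(repo[path])
    return "\n".join(chunks)


def build_prompt(
    repo: Repo,
    current_path: str,
    prefix: str,
    proposal: PromptProposal,
    max_chars: int = 2000,
) -> str:
    """Place proposal context before the code preceding the cursor, within a budget."""
    context = proposal(repo, current_path)
    # The prefix is always kept whole; the context gets whatever room is left.
    budget = max(max_chars - len(prefix), 0)
    # When truncating, keep the end of the context (closest to the prefix).
    context = context[-budget:] if budget else ""
    return (context + "\n" + prefix) if context else prefix
```

A different proposal (for example, one that pulls context from a parent class file) would plug into `build_prompt` without any other change, which is the point of treating proposals as interchangeable functions.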


Layman's Summary

The authors explain why it matters to give an AI code assistant specific knowledge about the project it is working on when building the prompt, that is, the input text the assistant sees. They introduce a framework called Repo-Level Prompt Generator that learns to build a tailored prompt for each piece of code to be completed. These prompts take into account the structure of the whole code repository and its relevant files. The technique works even without access to the inner workings (the weights) of the AI model. In experiments, a best-case ("oracle") choice among the proposed prompts gave a 36% relative improvement over Codex. The authors also provide code, data, and trained checkpoints for others to use and explore.

Definitions
- Domain-specific knowledge: Specific information or understanding about a particular subject or field.
- Large language models (LLMs): Computer programs that can understand and generate human-like language.
- Prompt: A question or instruction given to a computer program to guide its behavior or output.
- Repository: A place where computer code and related files are stored and managed.
- Baselines: Standard methods or techniques used as a comparison for evaluating new approaches or improvements.

Repository-Level Prompt Generation for Large Language Models of Code: A Comprehensive Guide

Background

Using LLMs to generate code has been a popular research topic in recent years because of its potential applications in software engineering tasks such as autocompletion and bug fixing. As these models are adopted as code assistants (e.g. Codex in GitHub Copilot), techniques for introducing domain-specific knowledge into the prompt become important. To address this, the authors propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals based on repository context. Notably, their technique does not require access to the weights of the LLM, so it applies even when only black-box access is available.
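The black-box property can be illustrated with a short sketch, assuming the LLM is nothing more than a text-in, text-out function and that some separately trained scorer ranks the candidate proposals. Everything named here (`complete_with_selected_proposal`, `echo_llm`, the proposal names, and the scores) is a hypothetical stand-in rather than the authors' code or results.

```python
from typing import Callable, Dict


def complete_with_selected_proposal(
    prompts: Dict[str, str],
    score: Callable[[str], float],
    llm: Callable[[str], str],
) -> str:
    """Query the LLM once, using the prompt of the highest-scoring proposal.

    The LLM is treated as an opaque function: nothing here reads or
    updates its weights.
    """
    best_name = max(prompts, key=score)
    return llm(prompts[best_name])


# Stand-ins for illustration only (not real model outputs or learned scores).
def echo_llm(prompt: str) -> str:
    return "<completion for %d-char prompt>" % len(prompt)


candidate_prompts = {
    "default": "int total = ",
    "imports": "class Point { int x; }\nint total = ",
}
learned_scores = {"default": 0.3, "imports": 0.6}

print(complete_with_selected_proposal(candidate_prompts, learned_scores.get, echo_llm))
```

Because the only interaction with the model is the single `llm(...)` call, the same selection logic would work with any completion service that accepts a prompt string.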

Experiments

The authors conduct experiments on single-line code autocompletion using code repositories obtained from Google Code archives. They demonstrate that an oracle constructed from their prompt proposals achieves a remarkable relative improvement of 36% over Codex, highlighting the quality of these proposals. Additionally, they train a model to predict prompt proposals and show significant performance gains compared to Codex and other baselines.
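The two quantities in this paragraph can be sketched as follows. This assumes, as one plausible reading, that the oracle counts an example as solved if at least one available prompt yields the correct line, and it uses the standard definition of relative improvement. The outcome lists are made-up toy data chosen for easy arithmetic, not results from the paper, and the function names are hypothetical.

```python
from typing import Dict, List


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of examples completed correctly."""
    return sum(outcomes) / len(outcomes)


def oracle_outcomes(per_proposal: Dict[str, List[bool]]) -> List[bool]:
    """Assumed oracle: an example is solved if any proposal solves it."""
    return [any(column) for column in zip(*per_proposal.values())]


def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Toy per-example outcomes (hypothetical): True means the completion was correct.
results = {
    "default": [True, True, False, False],   # baseline prompt
    "imports": [True, False, True, False],
    "parent_class": [False, True, False, False],
}

baseline = success_rate(results["default"])    # 2 of 4 solved
oracle = success_rate(oracle_outcomes(results))  # 3 of 4 solved
print(relative_improvement(oracle, baseline))
```

With this toy data the oracle solves one more example than the baseline, so the relative improvement is 50%. The 36% figure quoted above is the paper's reported value on its own data and is not reproduced by this sketch.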

Conclusion

Overall, this work presents an approach to generating prompts for LLMs of code by leveraging repository-level context. The authors release their code, data, and trained checkpoints for further exploration. The research offers insight into how domain knowledge can be incorporated through prompt design, without retraining the LLM or accessing its weights, and may have implications for areas such as software engineering and natural language processing (NLP).

Created on 15 Sep. 2023


