Repository-Level Prompt Generation for Large Language Models of Code

AI-generated keywords: LLMs, Code Repositories, Prompt Design, Repo-Level Prompt Generator, Context

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Authors address the importance of incorporating domain-specific knowledge in prompt design for large language models (LLMs) of code
  • Introduce a framework called Repo-Level Prompt Generator that generates example-specific prompts using prompt proposals
  • Proposals consider context from the entire repository, including structure and relevant files
  • Technique does not require access to LLM weights, making it applicable with black-box access only
  • Experiments on single-line code autocompletion, using repositories from Google Code archives, show that an oracle constructed from the prompt proposals achieves a 36% relative improvement over Codex
  • A model trained to predict a prompt proposal achieves significant performance gains over Codex and other baselines
  • Presents a novel approach to generating prompts for LLMs of code by leveraging repository-level context
  • Authors provide code, data, and trained checkpoints for further exploration and potential applications in other domains

Authors: Disha Shrivastava, Hugo Larochelle, Daniel Tarlow

ICML 2023 (Camera-Ready version)

Abstract: With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge in the prompt design process become important. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g. imports, parent class files). Our technique doesn't require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code-autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we can achieve significant performance gains over Codex and other baselines. We release our code, data, and trained checkpoints at: https://github.com/shrivastavadisha/repo_level_prompt_generation.
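
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the proposal names (`current_file_only`, `imported_files`), the dictionary-based repository, and the `toy_llm` function are all hypothetical stand-ins. Each prompt proposal is a function that builds a prompt from repository context, the model is only ever called as a black box, and the oracle counts an example as solved if at least one proposal yields the correct line.

```python
import re
from typing import Callable, Dict

# Hypothetical representation: a repository is a mapping from file path to file text.
Repo = Dict[str, str]
# A prompt proposal maps (repository, code before the missing line) to a prompt string.
Proposal = Callable[[Repo, str], str]


def current_file_only(repo: Repo, prefix: str) -> str:
    """Baseline-style proposal: use only the code that precedes the missing line."""
    return prefix


def imported_files(repo: Repo, prefix: str) -> str:
    """Prepend the text of repository files that the current file imports."""
    modules = re.findall(r"^(?:from|import)\s+(\w+)", prefix, flags=re.MULTILINE)
    context = "".join(repo.get(f"{name}.py", "") for name in modules)
    return context + prefix


PROPOSALS: Dict[str, Proposal] = {
    "current_file_only": current_file_only,
    "imported_files": imported_files,
}


def build_prompts(repo: Repo, prefix: str) -> Dict[str, str]:
    """Apply every proposal to one example, giving one candidate prompt per proposal."""
    return {name: proposal(repo, prefix) for name, proposal in PROPOSALS.items()}


def oracle_success(prompts: Dict[str, str], llm: Callable[[str], str], target: str) -> bool:
    """Oracle: the example counts as solved if any candidate prompt yields the target line."""
    return any(llm(prompt).strip() == target.strip() for prompt in prompts.values())


def toy_llm(prompt: str) -> str:
    """Stand-in for a black-box code model; it only sees the prompt text."""
    return "return helper(x)" if "def helper" in prompt else "return x"


# Toy example: the missing line needs a function defined in another file.
repo = {"utils.py": "def helper(x):\n    return x * 2\n"}
prefix = "from utils import helper\n\ndef run(x):\n    "
target = "return helper(x)"

prompts = build_prompts(repo, prefix)
solved = oracle_success(prompts, toy_llm, target)
```

In this toy setup the current-file prompt fails while the imported-file prompt succeeds, so the oracle reports success. A learned model, as described in the abstract, would instead predict which proposal to use for each example; that step is not sketched here.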

Submitted to arXiv on 26 Jun. 2022


Results of the summarizing process for the arXiv paper: 2206.12839v3


In their paper titled "Repository-Level Prompt Generation for Large Language Models of Code," authors Disha Shrivastava, Hugo Larochelle, and Daniel Tarlow address the importance of incorporating domain-specific knowledge in the prompt design process for large language models (LLMs) of code. They introduce a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. These proposals draw context from the entire repository, including its structure and other relevant files such as imports and parent class files. The technique does not require access to the weights of the LLM, so it applies even when only black-box access is available. The authors conduct experiments on single-line code autocompletion using code repositories taken from Google Code archives. They show that an oracle constructed from their prompt proposals achieves a 36% relative improvement over Codex, which indicates the quality of the proposals. They also train a model to predict a prompt proposal and report significant performance gains over Codex and other baselines. Overall, the work presents an approach to generating prompts for LLMs of code that leverages repository-level context. The authors release their code, data, and trained checkpoints, which can be used for further exploration, including potential applications in other domains.
Created on 15 Sep. 2023
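
The 36% figure is a relative improvement, meaning the gain is measured as a fraction of the baseline's score rather than as a difference in percentage points. A small sketch of that arithmetic follows; the function name and the two success rates are hypothetical placeholders chosen only so the ratio comes out to 36%, and they are not numbers reported in the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain over the baseline as a fraction of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Placeholder success rates (in percent), NOT taken from the paper.
baseline_rate = 50.0
oracle_rate = 68.0

gain = relative_improvement(baseline_rate, oracle_rate)
absolute_gain = oracle_rate - baseline_rate
```

With these placeholder values the absolute gain is 18 percentage points while the relative improvement is 36%, which shows why the two measures should not be confused.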

