Automated Unit Test Improvement using Large Language Models at Meta

AI-generated keywords: Meta TestGen-LLM Large Language Models automated test enhancement diff-time deployment

AI-generated Key Points

  • Meta's TestGen-LLM tool utilizes Large Language Models (LLMs) to enhance human-written tests automatically
  • Generated test classes must clear a set of filters that assure measurable improvement over the original test suite and guard against LLM hallucination
  • Deployment of TestGen-LLM at Meta test-a-thons for Instagram and Facebook platforms shows promising results
  • Evaluation on the Reels and Stories products for Instagram: 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage
  • Deploying generated tests at diff time gives engineers the full context of the existing tests and the code under review
  • Construction of TestGen-LLM diffs during Instagram Test-a-thons yielded promising results, with some diffs significantly improving coverage by covering previously untouched methods and files
  • Previous literature reviews confirm prevalence of LLM-based test generation approaches; this paper stands out for extending existing test classes and reporting industrial-scale deployment results

Authors: Nadia Alshahwan, Jubin Chheda, Anastasia Finegenova, Beliz Gokkaya, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

12 pages, 8 figures, 32nd ACM Symposium on the Foundations of Software Engineering (FSE 24)
License: CC BY 4.0

Abstract: This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Instagram and Facebook platforms. In an evaluation on Reels and Stories products for Instagram, 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage. During Meta's Instagram and Facebook test-a-thons, it improved 11.5% of all classes to which it was applied, with 73% of its recommendations being accepted for production deployment by Meta software engineers. We believe this is the first report on industrial scale deployment of LLM-generated code backed by such assurances of code improvement.

Submitted to arXiv on 14 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.09171v1

This paper presents Meta's TestGen-LLM tool, which uses Large Language Models (LLMs) to automatically improve existing human-written tests. Generated test classes must clear a set of filters that assure measurable improvement over the original test suite, which guards against problems caused by LLM hallucination. The paper describes the deployment of TestGen-LLM at Meta test-a-thons for the Instagram and Facebook platforms. In an evaluation on the Reels and Stories products for Instagram, 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage.

The paper emphasizes the value of deploying tests at diff time, because engineers then have the full context of the existing tests and the code under review. Insights into this diff-time deployment mode came from experience gained at a test-a-thon. TestGen-LLM diffs for the Instagram Test-a-thons were initially constructed manually, and the process was automated for subsequent events. During the first Instagram Test-a-thon, 36 engineers landed 105 unit test diffs, 16 of which were generated by TestGen-LLM. Notably, one diff was rejected because its test case lacked an assertion. Outcomes varied, with some diffs significantly improving coverage by covering previously untouched methods and files. The largest coverage improvement came from a diff that covered multiple new files and A/B testing gatekeepers.

In terms of related work, software test generation has been studied extensively within Large Language Model-based Software Engineering (LLMSE). Previous literature reviews confirm the prevalence of LLM-based test generation approaches; this paper stands out for its focus on extending existing test classes and for reporting results from industrial-scale deployment.

Overall, the paper offers insights into automated unit test improvement using LLMs at Meta, deployed at diff time, and reports promising results from real-world use on Instagram and Facebook.
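The filtering idea described above (a generated test is kept only if it builds, passes reliably, and increases coverage) can be illustrated with a minimal Python sketch. This is not Meta's implementation: the `Candidate` type, the `filter_candidates` function, the default of five repeated runs, and the choice to measure coverage cumulatively against already accepted tests are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical LLM-generated test case plus hooks to evaluate it."""
    name: str
    builds: Callable[[], bool]       # returns True if the test compiles
    run: Callable[[], bool]          # returns True if one execution passes
    covered_lines: FrozenSet[str]    # identifiers of lines the test exercises


def passes_reliably(candidate: Candidate, repeats: int) -> bool:
    """Run the test several times; any failure marks it as unreliable."""
    return all(candidate.run() for _ in range(repeats))


def filter_candidates(
    candidates: Iterable[Candidate],
    baseline_coverage: Iterable[str],
    repeats: int = 5,  # assumed value, not taken from the summary
) -> List[Candidate]:
    """Keep only candidates that build, pass reliably, and add coverage."""
    covered = set(baseline_coverage)
    accepted: List[Candidate] = []
    for candidate in candidates:
        if not candidate.builds():
            continue  # filter 1: must build
        if not passes_reliably(candidate, repeats):
            continue  # filter 2: must pass on every repeated run
        new_lines = candidate.covered_lines - covered
        if not new_lines:
            continue  # filter 3: must cover something not yet covered
        accepted.append(candidate)
        covered |= new_lines  # assumption: coverage accumulates
    return accepted
```

Because each filter is checked mechanically, a hallucinated test that does not compile, is flaky, or adds nothing to coverage is discarded before an engineer ever sees it.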
Created on 17 Feb. 2024