Evaluating and Explaining Large Language Models for Code Using Syntactic Structures

AI-generated keywords: Large Language Models (LLMs), ASTxplainer, Abstract Syntax Tree (AST), GitHub projects, Software Engineering

AI-generated Key Points

  • The paper evaluates and explains the effectiveness of Large Language Models (LLMs) for code
  • LLMs are transformer-based neural networks pre-trained on large datasets of natural and programming languages
  • ASTxplainer is introduced as an explainability method specific to LLMs for code, aligning token predictions with Abstract Syntax Tree (AST) nodes
  • ASTxplainer extracts and aggregates normalized model logits within AST structures to provide insights into LLM effectiveness
  • An empirical evaluation is conducted on 12 popular LLMs for code using a curated dataset of popular GitHub projects
  • The evaluation demonstrates the practical benefit of ASTxplainer for LLM evaluation
  • A user study examines the usefulness of an ASTxplainer-derived visualization in aiding end-users' understanding of model predictions
  • Results highlight the potential for ASTxplainer to improve LLM evaluation and aid in interpreting model predictions
  • The work contributes to the field of automated Software Engineering tools and provides insights into the performance and interpretability of LLMs for code

Authors: David N Palacio, Alejandro Velasco, Daniel Rodriguez-Cardenas, Kevin Moran, Denys Poshyvanyk

License: CC BY-SA 4.0

Abstract: Large Language Models (LLMs) for code are a family of high-parameter, transformer-based neural networks pre-trained on massive datasets of both natural and programming languages. These models are rapidly being employed in commercial AI-based developer tools, such as GitHub Copilot. However, measuring and explaining their effectiveness on programming tasks is a challenging proposition, given their size and complexity. The methods for evaluating and explaining LLMs for code are inextricably linked. That is, in order to explain a model's predictions, they must be reliably mapped to fine-grained, understandable concepts. Once this mapping is achieved, new methods for detailed model evaluations are possible. However, most current explainability techniques and evaluation benchmarks focus on model robustness or individual task performance, as opposed to interpreting model predictions. To this end, this paper introduces ASTxplainer, an explainability method specific to LLMs for code that enables both new methods for LLM evaluation and visualizations of LLM predictions that aid end-users in understanding model predictions. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes, by extracting and aggregating normalized model logits within AST structures. To demonstrate the practical benefit of ASTxplainer, we illustrate the insights that our framework can provide by performing an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects. Additionally, we perform a user study examining the usefulness of an ASTxplainer-derived visualization of model predictions aimed at enabling model users to explain predictions. The results of these studies illustrate the potential for ASTxplainer to provide insights into LLM effectiveness, and aid end-users in understanding predictions.

Submitted to arXiv on 07 Aug. 2023

Results of the summarizing process for the arXiv paper: 2308.03873v1

This paper evaluates and explains the effectiveness of Large Language Models (LLMs) for code, which are transformer-based neural networks pre-trained on large datasets of both natural and programming languages. Because the size and complexity of these models make their effectiveness hard to measure and explain, the authors introduce ASTxplainer, an explainability method specific to LLMs for code that aligns token predictions with Abstract Syntax Tree (AST) nodes. This automated method extracts and aggregates normalized model logits within AST structures to provide insights into LLM effectiveness. To demonstrate its practical benefit, the authors conduct an empirical evaluation of 12 popular LLMs for code on a curated dataset of the most popular GitHub projects. They also conduct a user study examining whether an ASTxplainer-derived visualization helps end-users understand model predictions. The results highlight the potential of ASTxplainer to improve LLM evaluation and aid in interpreting model predictions. Overall, this research contributes to the field of automated Software Engineering tools and provides insights into the performance and interpretability of LLMs for code.
Created on 28 Sep. 2023
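
To make the core idea in the summary more concrete, here is a minimal sketch of aligning per-token scores with AST nodes and aggregating them by node type. It is not the authors' implementation: it assumes Python source, uses the standard-library `ast` and `tokenize` modules in place of whatever parser and model tokenizer ASTxplainer actually relies on, assumes ASCII source so that token and AST column offsets agree, and takes made-up per-token probabilities as a stand-in for softmax-normalized model logits. The function names `token_spans`, `innermost_node`, and `aggregate_by_node_type` are hypothetical.

```python
import ast
import io
import tokenize
from collections import defaultdict

# Token types that carry no syntactic content of their own.
_SKIP = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
    tokenize.COMMENT, tokenize.ENDMARKER,
}


def token_spans(source):
    """Yield (text, start, end) per meaningful token; positions are (line, col)."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _SKIP:
            yield tok.string, tok.start, tok.end


def innermost_node(tree, start, end):
    """Return the deepest AST node whose source span covers [start, end)."""
    best = None
    # ast.walk is breadth-first, so the last covering node seen is the deepest.
    for node in ast.walk(tree):
        if getattr(node, "end_lineno", None) is None:
            continue  # nodes such as Module or operators carry no position
        node_start = (node.lineno, node.col_offset)
        node_end = (node.end_lineno, node.end_col_offset)
        if node_start <= start and end <= node_end:
            best = node
    return best


def aggregate_by_node_type(source, token_probs):
    """Average per-token probabilities over the AST node type of each token."""
    tree = ast.parse(source)
    spans = list(token_spans(source))
    if len(spans) != len(token_probs):
        raise ValueError("expected one probability per token")
    buckets = defaultdict(list)
    for (_text, start, end), prob in zip(spans, token_probs):
        node = innermost_node(tree, start, end)
        buckets[type(node).__name__ if node else "Module"].append(prob)
    return {name: sum(ps) / len(ps) for name, ps in buckets.items()}


source = "def add(a, b):\n    return a + b\n"
# Made-up values, one per token, standing in for normalized model logits.
probs = [0.9, 0.2, 0.9, 0.4, 0.9, 0.6, 0.9, 0.9, 0.8, 0.7, 0.3, 0.5]
for name, score in aggregate_by_node_type(source, probs).items():
    print(f"{name}: {score:.2f}")
```

In this sketch a low average for a node type (here, hypothetically, `BinOp`) would suggest the model is less confident on that syntactic category. With a real model, the tokens would be subword units whose character spans need to be mapped onto the parser's positions first.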
