Can LLMs Generate Architectural Design Decisions? - An Exploratory Empirical Study

AI-generated keywords: Architectural Knowledge Management

AI-generated Key Points

  • Study focuses on Architectural Knowledge Management (AKM) and use of Large Language Models (LLMs) for Architecture Decision Records (ADRs)
  • Builds on related work such as Developer-Intent Driven Code Comment Generation and Automatic Identification of Decisions from developer mailing lists
  • Also refers to tools like ADeX for automatic curation of design decision knowledge
  • Evaluation metrics include ROUGE, BLEU, METEOR, and BERTScore
  • Experiment involves gathering 95 ADRs from repositories like arachne-framework, winery, joelparkerhenderson's repository, cardano, and island
  • LLMs explored include GPT-2, GPT-3 variants such as ada and davinci, GPT-3.5, GPT-4, T5 in different sizes (small to XL), T0, and Flan-T5 variants
  • Results show that state-of-the-art models like GPT-4 can generate relevant Design Decisions in a 0-shot setting but fall short of human-level performance
  • More cost-effective models such as GPT-3.5 show promise in few-shot settings while smaller models like Flan-T5 can yield comparable results after fine-tuning
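
To make the overlap-based metrics in the key points above more concrete, here is a minimal sketch of a ROUGE-1-style F1 score between a reference Decision and a generated one. It is an illustration only: the whitespace tokenizer, the function names, and the two example sentences are assumptions, and the study itself presumably relied on standard metric implementations rather than code like this.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace.
    # Standard metric packages tokenize more carefully.
    return text.lower().split()


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference text and a generated text."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example texts, not taken from the dataset.
reference = "we will use python as the primary language"
candidate = "we will use python for the backend"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.3f}")
```

Lexical scores like this reward shared wording, which is one reason an embedding-based metric such as BERTScore is usually reported alongside them.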

Authors: Rudra Dhar, Karthik Vaidhyanathan, Vasudeva Varma

This paper has been accepted to IEEE ICSA 2024 (Main Track - Research Track)
License: CC BY 4.0

Abstract: Architectural Knowledge Management (AKM) involves the organized handling of information related to architectural decisions and design within a project or organization. An essential artifact of AKM is the Architecture Decision Records (ADR), which documents key design decisions. ADRs are documents that capture decision context, decision made and various aspects related to a design decision, thereby promoting transparency, collaboration, and understanding. Despite their benefits, ADR adoption in software development has been slow due to challenges like time constraints and inconsistent uptake. Recent advancements in Large Language Models (LLMs) may help bridge this adoption gap by facilitating ADR generation. However, the effectiveness of LLM for ADR generation or understanding is something that has not been explored. To this end, in this work, we perform an exploratory study that aims to investigate the feasibility of using LLM for the generation of ADRs given the decision context. In our exploratory study, we utilize GPT and T5-based models with 0-shot, few-shot, and fine-tuning approaches to generate the Decision of an ADR given its Context. Our results indicate that in a 0-shot setting, state-of-the-art models such as GPT-4 generate relevant and accurate Design Decisions, although they fall short of human-level performance. Additionally, we observe that more cost-effective models like GPT-3.5 can achieve similar outcomes in a few-shot setting, and smaller models such as Flan-T5 can yield comparable results after fine-tuning. To conclude, this exploratory study suggests that LLM can generate Design Decisions, but further research is required to attain human-level generation and establish standardized widespread adoption.
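
As a rough illustration of the 0-shot and few-shot settings described in the abstract, the sketch below builds text prompts that ask a model to produce the Decision for a given Context. The prompt wording, the section markers, and the function names are hypothetical; the exact prompts used in the paper are not reproduced here, and no model is called.

```python
def zero_shot_prompt(context):
    # Hypothetical instruction text; the paper's actual prompt may differ.
    return (
        "This is an Architecture Decision Record. "
        "Given the Context, write the Decision.\n\n"
        f"## Context\n{context.strip()}\n\n## Decision\n"
    )


def few_shot_prompt(examples, context):
    """Build a prompt from (context, decision) example pairs plus a new Context."""
    parts = ["Each record below pairs a Context with its Decision.\n"]
    for ex_context, ex_decision in examples:
        parts.append(
            f"## Context\n{ex_context.strip()}\n\n"
            f"## Decision\n{ex_decision.strip()}\n"
        )
    # The final record is left open so the model completes the Decision.
    parts.append(f"## Context\n{context.strip()}\n\n## Decision\n")
    return "\n".join(parts)


# Hypothetical example pair, for illustration only.
examples = [
    ("We need a data store for user profiles.", "We will use PostgreSQL."),
]
prompt = few_shot_prompt(examples, "The team needs one language for tooling.")
print(prompt)
```

In the few-shot case the only change is that solved Context and Decision pairs precede the open record, so the same model can be compared across both settings with otherwise identical input.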

Submitted to arXiv on 04 Mar. 2024

Results of the summarizing process for the arXiv paper: 2403.01709v1

In this study on Architectural Knowledge Management (AKM) and the use of Large Language Models (LLMs) for Architecture Decision Records (ADRs), we build on related work such as Developer-Intent Driven Code Comment Generation, Automatic Identification of Decisions from developer mailing lists, and tools like ADeX for automatic curation of design decision knowledge. Drawing on foundational works such as the Goal Question Metric Approach by Basili et al., and using ROUGE, BLEU, METEOR, and BERTScore as evaluation metrics, we aim to understand how well LLMs can support ADR generation.

Our experimental data consists of ADRs gathered from repositories like arachne-framework, winery, joelparkerhenderson's repository, cardano, and island. Through web crawling and manual extraction, we obtained 95 ADRs that adhere to a standard format. From each ADR we extract the Context and Decision components, so that an LLM can be asked to generate the Design Decision for a given Context.

We explore a range of LLMs: GPT-2, GPT-3 variants such as ada and davinci, GPT-3.5, GPT-4, T5 in different sizes (small to XL), T0, and Flan-T5 variants. We apply 0-shot, few-shot, and fine-tuning approaches to these models on our extracted ADR data samples (as shown in Figure 2, where Python is chosen as the primary programming language) and assess how accurately they generate Design Decisions.

Our results indicate that state-of-the-art models like GPT-4 can generate relevant Design Decisions in a 0-shot setting but fall short of human-level performance. More cost-effective models such as GPT-3.5 show promise in few-shot settings, while smaller models like Flan-T5 can yield comparable results after fine-tuning. This exploratory study suggests that LLMs have potential for generating Design Decisions, but further research is needed to achieve human-level generation and establish standardized, widespread adoption in AKM practices.
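
The extraction step summarized above (taking the Context and Decision components out of a standard-format ADR) can be sketched as follows. This is a minimal sketch that assumes ADRs are Markdown files with `## Context` and `## Decision` headings; the function names and the sample ADR are hypothetical and are not taken from the study's dataset or tooling.

```python
import re


def split_adr_sections(markdown_text):
    """Map each '## Heading' in an ADR to the text beneath it."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            # Start a new section keyed by the lowercased heading.
            current = match.group(1).lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def context_decision_pair(markdown_text):
    """Return (context, decision), or None if either section is missing."""
    sections = split_adr_sections(markdown_text)
    if "context" not in sections or "decision" not in sections:
        return None
    return sections["context"], sections["decision"]


# Hypothetical ADR used only to exercise the parser.
sample_adr = """# 1. Choose a primary language

## Status
Accepted

## Context
The team needs one language for services and tooling.

## Decision
We will use Python as the primary programming language.

## Consequences
Contributors need a Python toolchain.
"""

pair = context_decision_pair(sample_adr)
print(pair)
```

Records that lack either heading return `None`, which mirrors the idea of keeping only ADRs that follow the standard format.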
Created on 07 Mar. 2024
