ChatGPT as a tool for User Story Quality Evaluation: Trustworthy Out of the Box?

AI-generated keywords: ChatGPT, User Story Evaluation, Agile Software Development, AI Trustworthiness, Output Stability

AI-generated Key Points

  • ChatGPT, a general-purpose language model, used for evaluating user stories in Agile software development
  • User stories capture end-user needs and facilitate communication within development teams
  • ChatGPT's evaluations, compared against an existing benchmark, align well with human evaluation
  • "Best of three" strategy proposed to improve output stability and address concerns about trustworthiness and reliability in AI
  • User stories commonly used in expressing requirements in Agile software development, following a specific template with elements such as role, goal, and benefit
  • Few-shot prompting technique utilized to evaluate user story quality using ChatGPT
  • Quality criteria presented by Lucassen et al. used for assessing individual user stories and sets of user stories
  • Tõemets' work on predicting the quality of user stories mentioned for monitoring purposes
  • High agreement rates between human evaluations and ChatGPT's evaluations shown across various metrics
  • Study demonstrates that ChatGPT can effectively evaluate the quality of user stories in Agile software development
  • Provides insights into improving output stability of ChatGPT
  • Highlights potential applications of AI in this domain while addressing concerns about trustworthiness and reliability

Authors: Krishna Ronanki, Beatriz Cabrero-Daniel, Christian Berger

9 Pages, 2 Tables, 1 Figure. Accepted at AI-Assisted Agile Software Development Workshop (Co-located with XP 2023)
License: CC BY 4.0

Abstract: In Agile software development, user stories play a vital role in capturing and conveying end-user needs, prioritizing features, and facilitating communication and collaboration within development teams. However, automated methods for evaluating user stories require training in NLP tools and can be time-consuming to develop and integrate. This study explores using ChatGPT for user story quality evaluation and compares its performance with an existing benchmark. Our study shows that ChatGPT's evaluation aligns well with human evaluation, and we propose a "best of three" strategy to improve its output stability. We also discuss the concept of trustworthiness in AI and its implications for non-experts using ChatGPT's unprocessed outputs. Our research contributes to understanding the reliability and applicability of AI in user story evaluation and offers recommendations for future research.
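
The abstract names a "best of three" strategy but does not spell out its mechanics. One plausible reading, assumed here for illustration only, is to ask the model for a verdict on the same user story three times and keep the majority answer. The sketch below uses hypothetical names (`best_of_three`, `scripted_evaluator`) and a scripted stand-in instead of a real ChatGPT call, so it may differ from what the authors actually did.

```python
from collections import Counter
from typing import Callable, Iterable


def best_of_three(evaluate: Callable[[str], bool], story: str) -> bool:
    """Ask the evaluator three times and return the majority verdict.

    Assumption: "best of three" means a majority vote over three runs.
    """
    verdicts = [evaluate(story) for _ in range(3)]
    # With three boolean votes a strict majority always exists.
    return Counter(verdicts).most_common(1)[0][0]


def scripted_evaluator(responses: Iterable[bool]) -> Callable[[str], bool]:
    """Hypothetical stand-in for a ChatGPT call: replays fixed verdicts."""
    pending = iter(responses)
    return lambda story: next(pending)


story = "As a user, I want to reset my password, so that I can regain access."
unstable = scripted_evaluator([True, False, True])
print(best_of_three(unstable, story))  # True
```

In practice `evaluate` would wrap a call to the model and parse its reply into a pass/fail verdict; the voting step stays the same regardless of how the verdict is obtained.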

Submitted to arXiv on 21 Jun. 2023


Results of the summarizing process for the arXiv paper: 2306.12132v1

This study explores the use of ChatGPT, a general-purpose language model, for evaluating the quality of user stories in Agile software development. User stories are important for capturing end-user needs and facilitating communication within development teams. The researchers compare ChatGPT's performance with an existing benchmark and find that it aligns well with human evaluation. They propose a "best of three" strategy to improve output stability and address concerns about trustworthiness and reliability in AI.

In the background section, the authors explain that user stories are commonly used to express requirements in Agile software development. User stories follow a specific template that includes elements such as role, goal, and benefit. The study uses a few-shot prompting technique to evaluate user story quality with ChatGPT: the model is given a small number of examples as conditioning before it is asked to evaluate user stories against defined criteria. The researchers use the quality criteria presented by Lucassen et al., which consist of 13 criteria for assessing individual user stories and sets of user stories. They also mention Tõemets' work on predicting the quality of user stories for monitoring purposes.

The method section includes tables showing agreement rates between human evaluations and ChatGPT's evaluations under different interpretation strategies. The results show high agreement rates across various metrics.

Overall, this study demonstrates that ChatGPT can effectively evaluate the quality of user stories in Agile software development and provides insights into improving its output stability. It highlights the potential applications of AI in this domain while addressing concerns about trustworthiness and reliability.
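
The few-shot prompting and agreement-rate ideas in the summary above can be sketched as follows. Everything here is an illustrative assumption: the function names, the prompt wording, the "atomic" criterion label, the example stories, and the verdict lists are made up and are not taken from the paper. The agreement rate is computed as the plain fraction of matching verdicts, which may not be the exact metric the authors report.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    criterion: str,
    examples: Sequence[Tuple[str, str]],
    story: str,
) -> str:
    """Assemble a prompt: a few labelled examples, then the story to judge."""
    lines = [f"Evaluate each user story against the criterion: {criterion}.", ""]
    for example_story, verdict in examples:
        lines.append(f"User story: {example_story}")
        lines.append(f"Verdict: {verdict}")
        lines.append("")
    # The final story is left without a verdict for the model to complete.
    lines.append(f"User story: {story}")
    lines.append("Verdict:")
    return "\n".join(lines)


def agreement_rate(human: Sequence[str], model: Sequence[str]) -> float:
    """Fraction of stories on which human and model verdicts coincide."""
    if len(human) != len(model) or not human:
        raise ValueError("verdict lists must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human, model))
    return matches / len(human)


# Hypothetical examples following the role / goal / benefit template.
examples = [
    ("As a buyer, I want to save my cart, so that I can check out later.", "pass"),
    ("As a buyer, I want to save my cart and export invoices.", "fail"),
]
prompt = build_few_shot_prompt(
    "atomic",
    examples,
    "As an admin, I want to lock accounts, so that I can stop abuse.",
)
print(prompt)

# Made-up verdicts, used only to show the calculation.
print(agreement_rate(["pass", "fail", "pass", "pass"],
                     ["pass", "fail", "fail", "pass"]))  # 0.75
```

Keeping prompt construction separate from scoring means the same examples can be reused across criteria while the comparison against human labels stays a simple list operation.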
Created on 20 Sep. 2023

