ChatGPT as a tool for User Story Quality Evaluation: Trustworthy Out of the Box?
AI-generated Key Points
- ChatGPT, a general-purpose language model, used for evaluating user stories in Agile software development
- User stories capture end-user needs and facilitate communication within development teams
- ChatGPT's evaluations align well with human evaluation when compared against an existing benchmark
- "Best of three" strategy proposed to improve output stability and address concerns about trustworthiness and reliability in AI
- User stories commonly used in expressing requirements in Agile software development, following a specific template with elements such as role, goal, and benefit
- Few-shot prompting technique utilized to evaluate user story quality using ChatGPT
- Quality criteria presented by Lucassen et al. used for assessing individual user stories and sets of user stories
- Tõemets' work on predicting the quality of user stories mentioned for monitoring purposes
- High agreement rates between human evaluations and ChatGPT's evaluations shown across various metrics
- Study demonstrates that ChatGPT can effectively evaluate the quality of user stories in Agile software development
- Provides insights into improving output stability of ChatGPT
- Highlights potential applications of AI in this domain while addressing concerns about trustworthiness and reliability.
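The "high agreement rates" mentioned in the key points can be made concrete with a small calculation. This summary does not say which metrics the paper reports, so the sketch below assumes the simplest one: the fraction of user stories on which the human verdict and the ChatGPT verdict coincide. The function name `agreement_rate` and the boolean pass/fail labels are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def agreement_rate(human: Sequence[bool], model: Sequence[bool]) -> float:
    """Fraction of items on which two raters give the same verdict.

    Assumption: each verdict is a boolean (True = the story meets the
    quality criterion). The paper's actual metrics may differ.
    """
    if len(human) != len(model):
        raise ValueError("both raters must judge the same number of stories")
    if not human:
        raise ValueError("at least one verdict is required")
    # Count the positions where both raters agree.
    matches = sum(h == m for h, m in zip(human, model))
    return matches / len(human)


# Hypothetical verdicts for four user stories on one criterion.
human_verdicts = [True, False, True, True]
model_verdicts = [True, True, True, True]
rate = agreement_rate(human_verdicts, model_verdicts)
```

Simple percent agreement does not correct for agreement that would occur by chance, so it is best read as a rough indicator only.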
Authors: Krishna Ronanki, Beatriz Cabrero-Daniel, Christian Berger
Abstract: In Agile software development, user stories play a vital role in capturing and conveying end-user needs, prioritizing features, and facilitating communication and collaboration within development teams. However, automated methods for evaluating user stories require training in NLP tools and can be time-consuming to develop and integrate. This study explores using ChatGPT for user story quality evaluation and compares its performance with an existing benchmark. Our study shows that ChatGPT's evaluation aligns well with human evaluation, and we propose a "best of three" strategy to improve its output stability. We also discuss the concept of trustworthiness in AI and its implications for non-experts using ChatGPT's unprocessed outputs. Our research contributes to understanding the reliability and applicability of AI in user story evaluation and offers recommendations for future research.
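The "best of three" strategy from the abstract can be illustrated with a short sketch. The abstract does not spell out the procedure, so this assumes it means asking the model for the same evaluation three times and keeping the verdict that a strict majority of runs agree on. The `evaluate` callable is a hypothetical stand-in for whatever sends the prompt to ChatGPT and returns a label; no real API is called here.

```python
from collections import Counter
from typing import Callable, Optional


def best_of_three(
    evaluate: Callable[[str], str], story: str, runs: int = 3
) -> Optional[str]:
    """Run an evaluation several times and return the majority verdict.

    Assumptions: `evaluate` is a hypothetical function that returns a
    label such as "pass" or "fail" for one user story, and the combining
    rule is a strict majority vote. Returns None when no label wins more
    than half of the runs.
    """
    verdicts = [evaluate(story) for _ in range(runs)]
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count > runs // 2 else None


# Usage with a stub evaluator that simulates an unstable model.
canned = iter(["pass", "fail", "pass"])
verdict = best_of_three(
    lambda s: next(canned),
    "As a user, I want to reset my password",
)
```

With two possible labels and three runs there is always a strict majority, which is one plausible reason to choose an odd number of repetitions.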