Analysis of Software Engineering Practices in General Software and Machine Learning Startups

AI-generated keywords: Software Engineering, Machine Learning, ML Startups, Systematic Literature Review, Software Development Life-cycle

AI-generated Key Points

- Objective: Understand the software engineering practices followed by machine learning (ML) startups and identify any additional needs
- Method: Conducted a systematic literature review of 37 papers published in the last 21 years, covering general software startups and ML startups
- Phases of the software development life-cycle studied: requirement engineering, design, development, quality assurance, and deployment
- Database search performed in IEEE Xplore and ACM Digital Library using terms related to ML startups; alternative terms such as "Deep learning" and "Artificial Intelligence" used to enrich the database for ML startups
- Metadata collected from the database search, including title, author names, abstracts, published year, URLs, citations, etc.
- Web scraper developed using the BeautifulSoup library in Python to extract metadata from ACM Digital Library search results
- Deduplication performed using the Pandas package in Python; duplicate papers dropped based on metadata; RegEx used to drop papers whose abstracts lack the required keywords
- Snowballing techniques employed to increase the number of ML startup papers; citations of relevant papers checked for additional papers against the inclusion-exclusion criteria, adding twenty-one papers to the database for ML startups
- Manual selection process conducted to finalize the papers for analysis
- Total of 92 papers collected: 72 on general software startups and 20 on ML startups, distributed among the different phases of the software development life-cycle

Authors: Bishal Lakha, Kalyan Bhetwal, Nasir U. Eisty

arXiv: 2304.01523v1 (cs.SE)

Accepted at the 21st IEEE/ACIS International Conference on Software Engineering Research, Management and Applications (SERA 2023)

License: CC BY 4.0

Abstract: Context: On top of the inherent challenges startup software companies face applying proper software engineering practices, the non-deterministic nature of machine learning techniques makes it even more difficult for machine learning (ML) startups. Objective: Therefore, the objective of our study is to understand the whole picture of software engineering practices followed by ML startups and identify additional needs. Method: To achieve our goal, we conducted a systematic literature review study on 37 papers published in the last 21 years. We selected papers on both general software startups and ML startups. We collected data to understand software engineering (SE) practices in five phases of the software development life-cycle: requirement engineering, design, development, quality assurance, and deployment. Results: We find some interesting differences in software engineering practices in ML startups and general software startups. The data management and model learning phases are the most prominent among them. Conclusion: While ML startups face many similar challenges to general software startups, the additional difficulties of using stochastic ML models require different strategies in using software engineering practices to produce high-quality products.

Submitted to arXiv on 04 Apr. 2023

Results of the summarizing process for the arXiv paper: 2304.01523v1

Comprehensive Summary

The objective of this study was to understand the software engineering practices followed by machine learning (ML) startups and to identify any additional needs. To achieve this goal, a systematic literature review was conducted on 37 papers published in the last 21 years, covering both general software startups and ML startups. The data collected aimed to capture software engineering practices in five phases of the software development life-cycle: requirement engineering, design, development, quality assurance, and deployment.

Initially, a database search was performed in IEEE Xplore and ACM Digital Library using terms related to ML startups. To enrich the database for ML startups, alternative terms for machine learning such as "Deep learning" and "Artificial Intelligence" were also used, which yielded a few additional useful papers. Metadata including title, author names, abstracts, published year, URLs, citations, and other details was collected from the database search and stored in a CSV file. This allowed for deduplication and validation of papers as well as manual filtering based on abstracts. While IEEE Xplore had built-in features for exporting metadata as a CSV file, ACM Digital Library and Google Scholar did not. Therefore, a web scraper was developed using the BeautifulSoup library in Python to extract metadata from ACM Digital Library search results.

After collecting metadata from various sources for both general software startups and ML startups, deduplication was performed using the Pandas package in Python. Duplicate papers were dropped from the list based on their metadata; additionally, RegEx (regular expressions) was used to identify papers whose abstracts did not contain the required keywords, and these papers were also dropped.

Because the database search alone found very few papers on ML startups, snowballing techniques were employed to increase their number. Citations of relevant papers were checked for additional relevant papers based on the inclusion-exclusion criteria, which added twenty-one papers to the database for ML startups. After deduplication, validation, and snowballing, a manual selection process was conducted to finalize the papers for analysis. Out of 92 total papers collected, 72 belonged to general software startups and 20 belonged to ML startups, distributed among the different phases of the software development life-cycle.

Layman's Summary

The researchers wanted to learn how machine learning startups make software and whether they need any extra help. They read 37 papers from the past 21 years about software startups and machine learning startups. They looked at different parts of making software, like planning, designing, building, testing, and putting it out for people to use. They searched for these papers in online libraries of research papers using words related to machine learning startups. They wrote a program in the Python programming language to collect information from one library's website. They checked the papers to make sure there were no duplicates and no papers missing the right keywords. They also looked at other papers that were mentioned in the ones they found. In total, they had 92 papers: 72 about regular software startups and 20 about machine learning startups.

Definitions
- Software engineering practices: The ways that people make computer programs.
- Machine learning: When computers can learn things on their own without being told exactly what to do.
- Startups: Small new companies that are just starting out.
- Systematic literature review: Reading lots of articles or papers on a specific topic and summarizing what they say.
- Requirement engineering: Figuring out what a computer program needs to be able to do before you start making it.
- Design: Planning how a computer program will look and work before you start building it.
- Development: Building a computer program by writing code.
- Quality assurance: Checking that a computer program works correctly and doesn't have any problems or bugs.

Understanding Software Engineering Practices in Machine Learning Startups

Data Collection

The data collected aimed to understand software engineering practices in five phases of the software development life-cycle: requirement engineering, design, development, quality assurance, and deployment. Initially, a database search was performed in IEEE Xplore and ACM Digital Library using terms related to ML startups. To enrich the database for ML startups, alternative terms for machine learning such as "Deep learning" and "Artificial Intelligence" were also used, which yielded a few additional useful papers. Metadata including title, author names, abstracts, published year, URLs, citations, and other details was collected from the database search and stored in a CSV file. This allowed for deduplication and validation of papers as well as manual filtering based on abstracts. While IEEE Xplore had built-in features for exporting metadata as a CSV file, ACM Digital Library and Google Scholar did not. Therefore, a web scraper was developed using the BeautifulSoup library in Python to extract metadata from ACM Digital Library search results.
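
A minimal sketch of the scraping and CSV-export step, assuming the search-result page has been saved locally. The HTML below, its CSS class names (`result-item`, `title`, `authors`, `year`, `abstract`), and the example papers are hypothetical; ACM Digital Library's actual markup is not described in the source and may differ. The sketch uses BeautifulSoup's CSS selector methods and Python's standard `csv` module.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical saved search-result page; real ACM Digital Library markup differs.
SAMPLE_HTML = """
<ul>
  <li class="result-item">
    <h5 class="title"><a href="https://example.org/doi/1">Software Engineering in ML Startups</a></h5>
    <span class="authors">A. Author; B. Author</span>
    <span class="year">2021</span>
    <div class="abstract">We study how machine learning startups build software.</div>
  </li>
  <li class="result-item">
    <h5 class="title"><a href="https://example.org/doi/2">Requirements in Software Startups</a></h5>
    <span class="authors">C. Author</span>
    <span class="year">2015</span>
    <div class="abstract">A study of requirement engineering in startups.</div>
  </li>
</ul>
"""

FIELDS = ["title", "authors", "year", "url", "abstract"]


def parse_results(html: str) -> list[dict]:
    """Extract one metadata record per search-result item (hypothetical selectors)."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select("li.result-item"):
        link = item.select_one(".title a")
        records.append({
            "title": link.get_text(strip=True),
            "authors": item.select_one(".authors").get_text(strip=True),
            "year": int(item.select_one(".year").get_text(strip=True)),
            "url": link["href"],
            "abstract": item.select_one(".abstract").get_text(strip=True),
        })
    return records


def to_csv(records: list[dict]) -> str:
    """Serialize records to CSV text with a header row, one row per paper."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = parse_results(SAMPLE_HTML)
print(to_csv(records))
```

Writing to a file instead of a string only requires passing a file handle opened with `newline=""` to `csv.DictWriter`; a CSV in this shape can then be concatenated with a CSV exported directly from IEEE Xplore once the column names are aligned.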

Deduplication & Validation

After collecting metadata from various sources for both general software startups and ML startups, deduplication was performed using the Pandas package in Python. Duplicate papers were dropped from the list based on their metadata; additionally, RegEx (regular expressions) was used to identify papers whose abstracts did not contain the required keywords, and these papers were also dropped.
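
A minimal sketch of deduplication and keyword filtering, assuming the merged metadata sits in a Pandas DataFrame with `title`, `abstract`, and `source` columns. The normalization rule (lowercasing and collapsing punctuation before comparing titles) and the keyword pattern are illustrative assumptions; the source only states that duplicates were dropped based on metadata and that RegEx checked abstracts for required keywords.

```python
import pandas as pd

# Hypothetical merged metadata from several sources (e.g. an IEEE Xplore export plus scraped ACM results).
papers = pd.DataFrame({
    "title": [
        "Software Engineering in ML Startups",
        "Software engineering in ML startups.",  # same paper, different casing and punctuation
        "Requirements in Software Startups",
        "A Survey of Compiler Optimizations",
    ],
    "abstract": [
        "We study how machine learning startups build software.",
        "We study how machine learning startups build software.",
        "A study of requirement engineering in software startups.",
        "We benchmark loop unrolling heuristics.",
    ],
    "source": ["IEEE", "ACM", "ACM", "IEEE"],
})

# Normalize titles so trivially different spellings of the same paper collide.
papers["title_key"] = (
    papers["title"]
    .str.lower()
    .str.replace(r"[^a-z0-9]+", " ", regex=True)
    .str.strip()
)
deduped = papers.drop_duplicates(subset="title_key", keep="first")

# Keep only abstracts that mention "startup(s)" (illustrative keyword pattern, case-insensitive).
keyword_pattern = r"\bstart-?ups?\b"
filtered = deduped[deduped["abstract"].str.contains(keyword_pattern, case=False, regex=True)]

print(filtered[["title", "source"]])
```

Normalizing before `drop_duplicates` matters because the same paper exported from two databases rarely matches byte-for-byte; matching on a DOI column, where one is available, is a stricter alternative.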

Snowballing Techniques

Because the database search alone found very few papers on ML startups, snowballing techniques were employed to increase their number. Citations of relevant papers were checked for additional relevant papers based on the inclusion-exclusion criteria, which added twenty-one more papers on ML startups to the database. After deduplication, validation, and snowballing, a manual selection process was conducted to finalize the papers for analysis. Out of 92 total papers collected, 72 belonged to general software startups and 20 belonged to ML startups, distributed among the different phases of the software development life-cycle.
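
Snowballing can be sketched as a breadth-first walk over reference lists. This sketch assumes backward snowballing (following the papers each included paper cites; forward snowballing would walk citing papers instead). The citation data, paper ids, titles, and the keyword-based `mentions_ml_startup` check are all hypothetical; the source does not say how the inclusion-exclusion criteria were applied, so a title check stands in for them here.

```python
import re
from collections import deque
from typing import Callable

# Hypothetical reference lists: paper id -> ids of the papers it cites.
REFERENCES: dict[str, list[str]] = {
    "seed-1": ["p-a", "p-b"],
    "seed-2": ["p-b", "p-c"],
    "p-a": ["p-d"],
    "p-b": [],
    "p-c": ["p-e"],
    "p-d": [],
    "p-e": [],
}

# Hypothetical titles used by the stand-in inclusion check.
TITLES: dict[str, str] = {
    "seed-1": "Engineering ML products in startups",
    "seed-2": "Data management in AI startups",
    "p-a": "Model deployment practices in ML startups",
    "p-b": "A history of relational databases",
    "p-c": "Requirements for deep learning startups",
    "p-d": "Testing ML systems in early-stage startups",
    "p-e": "Graph theory lecture notes",
}


def backward_snowball(seeds: list[str],
                      references: dict[str, list[str]],
                      include: Callable[[str], bool]) -> set[str]:
    """Follow reference lists breadth-first, keeping papers that pass `include`.

    Seeds are assumed already included; only included papers are expanded further.
    """
    included = set(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        for cited in references.get(paper, []):
            if cited in seen:
                continue
            seen.add(cited)
            if include(cited):
                included.add(cited)
                queue.append(cited)
    return included


def mentions_ml_startup(paper_id: str) -> bool:
    """Toy criterion: the title mentions startups and an ML-related term."""
    title = TITLES[paper_id].lower()
    has_startup = re.search(r"\bstartups?\b", title) is not None
    has_ml = re.search(r"\b(ml|ai|machine learning|deep learning)\b", title) is not None
    return has_startup and has_ml


found = backward_snowball(["seed-1", "seed-2"], REFERENCES, mentions_ml_startup)
new_papers = sorted(found - {"seed-1", "seed-2"})
print(new_papers)
```

Expanding only included papers keeps the walk focused on the topic; papers that fail the check are recorded as seen so they are not evaluated twice when several seeds cite them.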

Conclusion

In conclusion, this study provides insight into the software engineering practices followed by machine learning startups through a systematic literature review of 37 papers published in the last 21 years, covering general software startups and ML startups. The data collected aimed to capture software engineering practices in five phases of the software development life-cycle: requirement engineering, design, development, quality assurance, and deployment. This study can help other researchers in the field understand the current state of the art in this domain and serve as a basis for future research in similar domains.

Created on 19 Jul. 2023

Similar papers

- Motivations, Benefits, and Issues for Adopting Micro-Frontends: A Multivocal … (cs.SE; 56.9% similarity)
- Successful Management of Cloud Based Global Software Development Projects: A … (cs.SE; 53.6% similarity)
- Ethics of AI: A Systematic Literature Review of Principles and Challenges (cs.CY; 52.4% similarity)
- A Study of Documentation for Software Architecture (cs.SE; 51.9% similarity)
- The "Collections as ML Data" Checklist for Machine Learning & Cultural Herita… (cs.LG; 51.1% similarity)
- Tracing and Visualizing Human-ML/AI Collaborative Processes through Artifacts… (cs.HC; 49.0% similarity)
- Practical and Ethical Challenges of Large Language Models in Education: A Sys… (cs.CL; 48.4% similarity)

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.