The objective of this study was to understand the software engineering practices followed by machine learning (ML) startups and to identify any additional needs. To achieve this goal, a systematic literature review was conducted on 37 papers published in the last 21 years, focusing on both general software startups and ML startups. The data collected aimed to understand software engineering practices in five phases of the software development life cycle: requirement engineering, design, development, quality assurance, and deployment.
Initially, a database search was performed in IEEE Xplore and ACM Digital Library using terms related to ML startups. To enrich the database for ML startups, alternative terms for machine learning such as "Deep learning" and "Artificial Intelligence" were also searched, which yielded a few additional useful papers. Metadata including title, author names, abstracts, published year, URL citations and other details were collected from the database search and stored in a CSV file. This allowed for deduplication and validation of papers as well as manual filtering based on abstracts. While IEEE Xplore had built-in features for exporting metadata as a CSV file, ACM Digital Library and Google Scholar did not. Therefore, a web scraper was developed using the BeautifulSoup library in Python to extract metadata from ACM Digital Library search results.
After metadata had been collected from the various sources for both general software startups and ML startups, deduplication was performed using the Pandas package in Python. Duplicate papers were dropped from the list based on their metadata. Additionally, regular expressions (RegEx) were used to identify papers whose abstracts did not contain the required keywords, and these papers were also dropped. Because the database search alone found very few papers on ML startups, snowballing techniques were employed: citations of relevant papers were checked against the inclusion-exclusion criteria to identify additional relevant papers, which added twenty-one papers to the database for ML startups. After deduplication, validation, and snowballing, a manual selection process was conducted to finalize the papers for analysis. Out of 92 total papers collected, 72 belonged to general software startups while 20 belonged to ML startups, distributed among the different phases of the software development life cycle.
- Objective: Understand software engineering practices followed by machine learning (ML) startups and identify any additional needs
- Method: Conducted a systematic literature review on 37 papers published in the last 21 years, focusing on general software startups and ML startups
- Phases of software development life cycle studied: requirement engineering, design, development, quality assurance, and deployment
- Database search performed in IEEE Xplore and ACM Digital Library using terms related to ML startups; alternate terminologies like "Deep learning" and "Artificial Intelligence" used to enrich the database for ML startups
- Metadata collected from database search including title, author names, abstracts, published year, URL citations, etc.
- Web scraper developed using BeautifulSoup library in Python to extract metadata from ACM Digital Library search results
- Deduplication performed using Pandas package in Python; duplicate papers dropped based on metadata; RegEx used to drop papers without required keywords in abstracts
- Snowballing techniques employed to increase number of papers for ML startups; relevant paper citations checked for additional relevant papers based on inclusion-exclusion criteria; added twenty-one papers to the database for ML startups
- Manual selection process conducted to finalize papers for analysis
- Total of 92 papers collected: 72 belonged to general software startups while 20 belonged to ML startups, distributed among different phases of software development life cycle
The researchers wanted to learn how machine learning startups make software and whether they need any extra help. They read 37 papers from the past 21 years about software startups and machine learning startups. They looked at different parts of making software, like planning, designing, building, testing, and putting it out for people to use. They searched for these papers in special online libraries using words related to machine learning startups. They wrote a program in a programming language called Python to collect information from one library's website. They checked the papers to make sure there were no duplicates and none that lacked the right information. They also looked at other papers that were mentioned in the ones they found. In total, they had 92 papers: 72 about regular software startups and 20 about machine learning startups.
Definitions
- Software engineering practices: The ways that people make computer programs.
- Machine learning: When computers can learn things on their own without being told exactly what to do.
- Startups: Small new companies that are just starting out.
- Systematic literature review: Reading lots of articles or papers on a specific topic and summarizing what they say.
- Requirement engineering: Figuring out what a computer program needs to be able to do before you start making it.
- Design: Planning how a computer program will look and work before you start building it.
- Development: Building a computer program by writing code.
- Quality assurance: Checking that a computer program works correctly and doesn't have any problems or bugs.
Understanding Software Engineering Practices in Machine Learning Startups
The objective of this study was to understand the software engineering practices followed by machine learning (ML) startups and identify any additional needs. To achieve this goal, a systematic literature review was conducted on 37 papers published in the last 21 years, focusing on both general software startups and ML startups. This article will discuss the methodology used for collecting data from various sources, deduplication of papers, validation of results and snowballing techniques employed to increase the number of papers for ML startups.
Data Collection
The data collected aimed to understand software engineering practices in five phases of the software development life cycle: requirement engineering, design, development, quality assurance, and deployment. Initially, a database search was performed in IEEE Xplore and ACM Digital Library using terms related to ML startups. To enrich the database for ML startups, alternative terms for machine learning such as "Deep learning" and "Artificial Intelligence" were also searched, which yielded a few additional useful papers. Metadata including title, author names, abstracts, published year, URL citations and other details were collected from the database search and stored in a CSV file. This allowed for deduplication and validation of papers as well as manual filtering based on abstracts. While IEEE Xplore had built-in features for exporting metadata as a CSV file, ACM Digital Library and Google Scholar did not. Therefore, a web scraper was developed using the BeautifulSoup library in Python to extract metadata from ACM Digital Library search results.
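As a rough illustration of the scraping step, the sketch below parses a results page with BeautifulSoup and writes the extracted metadata as CSV. It is not the study's scraper: the HTML snippet, the tag and class names (`result`, `title`, `authors`, `year`, `link`, `abstract`), and the helper functions are all hypothetical, and a real results page would need its own selectors and a step to download the page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: these tag and class names are invented for illustration
# and are not the real structure of any digital library's results page.
SAMPLE_HTML = """
<ul>
  <li class="result">
    <span class="title">Paper A</span>
    <span class="authors">Doe, J.; Roe, R.</span>
    <span class="year">2019</span>
    <a class="link" href="https://example.org/paper-a">PDF</a>
    <p class="abstract">We study machine learning startups.</p>
  </li>
  <li class="result">
    <span class="title">Paper B</span>
    <span class="authors">Poe, E.</span>
    <span class="year">2021</span>
    <a class="link" href="https://example.org/paper-b">PDF</a>
  </li>
</ul>
"""

FIELDS = ["title", "authors", "year", "url", "abstract"]


def text_of(item, css_class):
    """Return the stripped text of the first matching child, or '' if absent."""
    node = item.find(class_=css_class)
    return node.get_text(strip=True) if node else ""


def parse_results(html):
    """Extract one metadata dict per search result from a results page."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.find_all("li", class_="result"):
        link = item.find("a", class_="link")
        records.append({
            "title": text_of(item, "title"),
            "authors": text_of(item, "authors"),
            "year": text_of(item, "year"),
            "url": link["href"] if link and link.has_attr("href") else "",
            "abstract": text_of(item, "abstract"),
        })
    return records


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = parse_results(SAMPLE_HTML)
csv_text = to_csv(records)
```

Missing fields are stored as empty strings rather than raising an error, so incomplete results still reach the CSV and can be handled during the later validation and manual filtering.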
Deduplication & Validation
After metadata had been collected from the various sources for both general software startups and ML startups, deduplication was performed using the Pandas package in Python. Duplicate papers were dropped from the list based on their metadata. Additionally, regular expressions (RegEx) were used to identify papers whose abstracts did not contain the required keywords, and these papers were also dropped from the list.
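A minimal sketch of how these two steps might look in Pandas follows. The records, the column names, the choice of normalized title plus year as the duplicate key, and the keyword pattern are all assumptions made for illustration; the study does not state which metadata fields or keywords it used.

```python
import pandas as pd

# Illustrative records only; these are not papers from the study.
papers = pd.DataFrame({
    "title": ["Paper A", "paper a ", "Paper B", "Paper C"],
    "year": [2019, 2019, 2020, 2021],
    "abstract": [
        "Software engineering practices in machine learning startups.",
        "Software engineering practices in machine learning startups.",
        "A survey of compiler optimizations.",
        "How a start-up ships deep learning products.",
    ],
    "source": [
        "IEEE Xplore",
        "ACM Digital Library",
        "IEEE Xplore",
        "ACM Digital Library",
    ],
})

# Normalize titles so that case and stray whitespace do not hide duplicates.
papers["title_key"] = papers["title"].str.strip().str.lower()

# Assumed duplicate key: keep the first occurrence of each (title, year) pair.
deduplicated = papers.drop_duplicates(subset=["title_key", "year"], keep="first")

# Assumed keyword pattern: "startup", "start-up", "start up" and their plurals.
keyword_pattern = r"\bstart[- ]?ups?\b"

# Keep only papers whose abstract mentions the keyword (case-insensitive).
has_keyword = deduplicated["abstract"].str.contains(
    keyword_pattern, case=False, regex=True, na=False
)
filtered = (
    deduplicated[has_keyword]
    .drop(columns="title_key")
    .reset_index(drop=True)
)
```

Passing `na=False` treats a missing abstract as a non-match, so such papers are dropped here; keeping them for manual review instead would be an equally reasonable choice.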
Snowballing Techniques
Because the database search alone found very few papers on ML startups, snowballing techniques were employed to increase their number. Citations of relevant papers were checked against the inclusion-exclusion criteria to identify additional relevant papers, which added twenty-one more papers on ML startups to the database. After deduplication, validation, and snowballing, a manual selection process was conducted to finalize the papers for analysis. Out of 92 total papers collected, 72 belonged to general software startups while 20 belonged to ML startups, distributed among the different phases of the software development life cycle.
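The snowballing loop can be sketched as a breadth-first walk over reference lists. This assumes backward snowballing (following the papers a seed cites), which the text does not specify. The citation map, the abstracts, and the `meets_criteria` check are hypothetical stand-ins, since the actual inclusion-exclusion criteria are not given here and were presumably applied by hand.

```python
from collections import deque

# Hypothetical citation data: paper id -> ids of the papers it cites.
REFERENCES = {
    "seed-1": ["p2", "p3"],
    "seed-2": ["p3", "p4"],
    "p2": ["p5"],
    "p3": [],
    "p4": ["p5"],
    "p5": [],
}

# Hypothetical abstracts used by the placeholder inclusion check.
ABSTRACTS = {
    "p2": "Engineering practices in ML startups.",
    "p3": "A study of database indexing.",
    "p4": "Deployment in AI startups.",
    "p5": "Requirements engineering for ML startups.",
}


def meets_criteria(paper_id):
    """Placeholder inclusion check; not the study's actual criteria."""
    return "startup" in ABSTRACTS.get(paper_id, "").lower()


def snowball(seeds, references, include):
    """Follow reference lists from the seeds, collecting newly included papers."""
    seen = set(seeds)
    queue = deque(seeds)
    added = []
    while queue:
        current = queue.popleft()
        for cited in references.get(current, []):
            if cited in seen:
                continue  # already examined, so it cannot be added twice
            seen.add(cited)
            if include(cited):
                added.append(cited)
                queue.append(cited)  # only included papers are expanded further
    return added


new_papers = snowball(["seed-1", "seed-2"], REFERENCES, meets_criteria)
```

Tracking every examined paper in `seen`, including rejected ones, means each candidate is judged once even when several papers cite it.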
Conclusion
In conclusion, this study provides insight into the software engineering practices followed by machine learning startups through a systematic literature review of 37 papers published in the last 21 years, focusing on general software startups and ML startups. The data collected aimed to understand software engineering practices in five phases of the software development life cycle: requirement engineering, design, development, quality assurance, and deployment. This study can help other researchers in the field understand the current state of the art in this domain and serve as a basis for their future research work in similar domains.