Analysis of Software Engineering Practices in General Software and Machine Learning Startups

AI-generated keywords: Software Engineering, Machine Learning, ML Startups, Systematic Literature Review, Software Development Life-cycle

AI-generated Key Points

  • Objective: Understand software engineering practices followed by machine learning (ML) startups and identify any additional needs
  • Method: Conducted a systematic literature review on 37 papers published in the last 21 years, focusing on general software startups and ML startups
  • Phases of software development life-cycle studied: requirement engineering, design, development, quality assurance, and deployment
  • Database search performed in IEEE Xplore and ACM Digital Library using terms related to ML startups; alternate terminologies like "Deep learning" and "Artificial Intelligence" used to enrich the database for ML startups
  • Metadata collected from database search including title, author names, abstracts, published year, URL citations, etc.
  • Web scraper developed using BeautifulSoup library in Python to extract metadata from ACM Digital Library search results
  • Deduplication performed using Pandas package in Python; duplicate papers dropped based on metadata; RegEx used to drop papers without required keywords in abstracts
  • Snowballing techniques employed to increase number of papers for ML startups; relevant paper citations checked for additional relevant papers based on inclusion-exclusion criteria; added twenty-one papers to the database for ML startups
  • Manual selection process conducted to finalize papers for analysis
  • Total of 92 papers collected: 72 on general software startups and 20 on ML startups, distributed among the different phases of the software development life cycle
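
The metadata-scraping step in the key points above can be illustrated with a minimal sketch. It assumes BeautifulSoup is installed and parses an inline HTML sample rather than a live page. The tag names, class names, and example.org URLs are hypothetical placeholders, not the real ACM Digital Library markup, and the helper names (`text_or_empty`, `extract_metadata`, `to_csv`) are introduced here for illustration only.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one page of search results. The tags and
# class names are illustrative assumptions, not the real ACM markup.
SAMPLE_HTML = """
<ul>
  <li class="result">
    <a class="title" href="https://example.org/paper-1">Software Engineering in ML Startups</a>
    <span class="authors">A. Author, B. Author</span>
    <span class="year">2021</span>
    <p class="abstract">We study how machine learning startups test models.</p>
  </li>
  <li class="result">
    <a class="title" href="https://example.org/paper-2">Startup Practices</a>
    <span class="authors">C. Author</span>
    <span class="year">2019</span>
  </li>
</ul>
"""

FIELDS = ["title", "url", "authors", "year", "abstract"]


def text_or_empty(node):
    """Return the stripped text of a tag, or "" when the tag is missing."""
    return node.get_text(strip=True) if node is not None else ""


def extract_metadata(html):
    """Collect one metadata record per search-result entry."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select("li.result"):
        link = item.select_one("a.title")
        records.append({
            "title": text_or_empty(link),
            "url": link.get("href", "") if link is not None else "",
            "authors": text_or_empty(item.select_one("span.authors")),
            "year": text_or_empty(item.select_one("span.year")),
            "abstract": text_or_empty(item.select_one("p.abstract")),
        })
    return records


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = extract_metadata(SAMPLE_HTML)
csv_text = to_csv(records)
```

Missing fields are stored as empty strings so that every record has the same columns, which keeps the resulting CSV regular for later deduplication and filtering.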

Authors: Bishal Lakha, Kalyan Bhetwal, Nasir U. Eisty

Accepted at the 21st IEEE/ACIS International Conference on Software Engineering Research, Management and Applications (SERA 2023)
License: CC BY 4.0

Abstract: Context: On top of the inherent challenges startup software companies face applying proper software engineering practices, the non-deterministic nature of machine learning techniques makes it even more difficult for machine learning (ML) startups. Objective: Therefore, the objective of our study is to understand the whole picture of software engineering practices followed by ML startups and identify additional needs. Method: To achieve our goal, we conducted a systematic literature review study on 37 papers published in the last 21 years. We selected papers on both general software startups and ML startups. We collected data to understand software engineering (SE) practices in five phases of the software development life-cycle: requirement engineering, design, development, quality assurance, and deployment. Results: We find some interesting differences in software engineering practices in ML startups and general software startups. The data management and model learning phases are the most prominent among them. Conclusion: While ML startups face many similar challenges to general software startups, the additional difficulties of using stochastic ML models require different strategies in using software engineering practices to produce high-quality products.

Submitted to arXiv on 04 Apr. 2023

Results of the summarizing process for the arXiv paper: 2304.01523v1

The objective of this study was to understand the software engineering practices followed by machine learning (ML) startups and to identify any additional needs. To achieve this goal, a systematic literature review was conducted on 37 papers published in the last 21 years, covering both general software startups and ML startups. The data collected aimed to capture software engineering practices in five phases of the software development life-cycle: requirement engineering, design, development, quality assurance, and deployment.

Initially, a database search was performed in IEEE Xplore and ACM Digital Library using terms related to ML startups. Alternate terminologies for machine learning, such as "Deep learning" and "Artificial Intelligence", were also used to enrich the database for ML startups, which yielded a few additional useful papers. Metadata including title, author names, abstracts, published year, URL citations and other details were collected from the database search and stored in a CSV file. This allowed for deduplication and validation of papers as well as manual filtering based on abstracts. While IEEE Xplore had a built-in feature for exporting metadata as a CSV file, ACM Digital Library and Google Scholar did not. Therefore, a web scraper was developed using the BeautifulSoup library in Python to extract metadata from ACM Digital Library search results.

After metadata had been collected from the various sources for both general software startups and ML startups, deduplication was performed using the Pandas package in Python. Duplicate papers were dropped from the list based on their metadata. In addition, RegEx (regular expressions) was used to identify papers whose abstracts did not contain the required keywords, and these papers were also dropped.

Because the database search alone found very few papers on ML startups, snowballing techniques were employed to increase their number. Citations of relevant papers were checked to identify additional relevant papers based on inclusion-exclusion criteria, which added twenty-one papers to the database for ML startups. After deduplication, validation, and snowballing, a manual selection process was conducted to finalize the papers for analysis. Out of 92 total papers collected, 72 belonged to general software startups while 20 belonged to ML startups, distributed among the different phases of the software development life cycle.
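
The deduplication and keyword-filtering step described in this summary can be sketched roughly as follows. This is a minimal illustration, assuming Pandas is installed. The sample records, the choice of normalized title plus year as the duplicate key, and the keyword pattern are all assumptions made here, since the summary does not state which metadata fields or keywords the authors used.

```python
import re

import pandas as pd

# Hypothetical metadata rows standing in for the combined search exports.
papers = pd.DataFrame([
    {"title": "Testing ML Systems in Startups", "year": 2020,
     "source": "IEEE Xplore",
     "abstract": "A study of machine learning startups and their testing practices."},
    {"title": "testing ml systems in startups ", "year": 2020,
     "source": "ACM DL",
     "abstract": "A study of machine learning startups and their testing practices."},
    {"title": "Agile Methods in Large Enterprises", "year": 2015,
     "source": "IEEE Xplore",
     "abstract": "We survey agile adoption in large enterprises."},
    {"title": "Requirements in Early-Stage Companies", "year": 2018,
     "source": "ACM DL",
     "abstract": "How a software startup elicits requirements."},
])

# Assumed duplicate key: lower-cased title with collapsed whitespace, plus year.
papers["title_key"] = (
    papers["title"]
    .str.lower()
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)
deduplicated = papers.drop_duplicates(subset=["title_key", "year"], keep="first")

# Assumed required keyword: "startup", "startups", "start-up", or "start-ups".
KEYWORD_PATTERN = r"\bstart-?ups?\b"
has_keyword = deduplicated["abstract"].str.contains(
    KEYWORD_PATTERN, flags=re.IGNORECASE, regex=True, na=False
)

# Keep only papers whose abstract mentions the required keyword.
selected = (
    deduplicated[has_keyword]
    .drop(columns="title_key")
    .reset_index(drop=True)
)
```

Normalizing the title before comparison catches the same paper exported from two databases with different casing or trailing whitespace. Passing `na=False` treats a missing abstract as not matching, so such papers are dropped rather than raising an error.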
Created on 19 Jul. 2023
