Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach

AI-generated keywords: Autism Machine Learning Speech Audio Diagnosis Home Environment

AI-generated Key Points

- Autism spectrum disorder (ASD) affects behavior, social development, and communication patterns.
- Prevalence of autism has tripled in recent years, with 1 in 54 children now affected.
- Traditional diagnosis is lengthy and labor-intensive, which has motivated the development of systems that can automatically screen for autism.
- Prosody abnormalities are among the clearest signs of autism, including echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns.
- Researchers present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments.
- Three methods were considered: Random Forests trained on extracted audio features, convolutional neural networks (CNNs) trained on spectrograms, and fine-tuned wav2vec 2.0, a state-of-the-art Transformer-based ASR model.
- The classifiers were trained on a novel dataset of cellphone-recorded child speech audio curated from Stanford's Guess What? mobile game.
- The Random Forest classifier achieved 70% accuracy, the fine-tuned wav2vec 2.0 model achieved 77% accuracy, and the CNN achieved 79% accuracy when classifying children's audio as either ASD or NT.
- Models were able to predict autism status when training on a varied selection of home audio clips with inconsistent recording quality, which may be more generalizable to real-world conditions.
- Future work could involve expanding the dataset used for training the classifiers or exploring other machine learning techniques to improve accuracy.

Authors: Nathan A. Chi, Peter Washington, Aaron Kline, Arman Husic, Cathy Hou, Chloe He, Kaitlyn Dunlap, Dennis Wall

arXiv: 2201.00927v1 (cs.SD)

17 pages, 4 figures, submitted to JMIR Pediatrics and Parenting

License: CC BY 4.0

Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental disorder which results in altered behavior, social development, and communication patterns. In past years, autism prevalence has tripled, with 1 in 54 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process, significant attention has been given to developing systems that automatically screen for autism. Prosody abnormalities are among the clearest signs of autism, with affected children displaying speech idiosyncrasies including echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. In this work, we present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments. We consider three methods to detect autism in child speech: first, Random Forests trained on extracted audio features (including Mel-frequency cepstral coefficients); second, convolutional neural networks (CNNs) trained on spectrograms; and third, fine-tuned wav2vec 2.0--a state-of-the-art Transformer-based ASR model. We train our classifiers on our novel dataset of cellphone-recorded child speech audio curated from Stanford's Guess What? mobile game, an app designed to crowdsource videos of autistic and neurotypical children in a natural home environment. The Random Forest classifier achieves 70% accuracy, the fine-tuned wav2vec 2.0 model achieves 77% accuracy, and the CNN achieves 79% accuracy when classifying children's audio as either ASD or NT. Our models were able to predict autism status when training on a varied selection of home audio clips with inconsistent recording quality, which may be more generalizable to real world conditions. These results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.

Submitted to arXiv on 04 Jan. 2022

Results of the summarizing process for the arXiv paper: 2201.00927v1

Comprehensive Summary

Autism spectrum disorder (ASD) is a neurodevelopmental disorder that affects behavior, social development, and communication patterns. The prevalence of autism has tripled in recent years, with 1 in 54 children now affected. Because traditional diagnosis is lengthy and labor-intensive, significant attention has gone to developing systems that can automatically screen for autism. Prosody abnormalities are among the clearest signs of autism: affected children display speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. In this study, the researchers present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments. They consider three methods: Random Forests trained on extracted audio features (including Mel-frequency cepstral coefficients), convolutional neural networks (CNNs) trained on spectrograms, and fine-tuned wav2vec 2.0, a state-of-the-art Transformer-based automatic speech recognition (ASR) model. The classifiers were trained on a novel dataset of cellphone-recorded child speech audio curated from Stanford's Guess What? mobile game, an app designed to crowdsource videos of autistic and neurotypical children in a natural home environment. When classifying children's audio as either ASD or NT, the Random Forest classifier achieved 70% accuracy, the fine-tuned wav2vec 2.0 model 77%, and the CNN 79%. One strength of the study is that the models were able to predict autism status when training on a varied selection of home audio clips with inconsistent recording quality, which may make the approach more generalizable to real-world conditions. Additionally, the researchers noted that they were able to conduct their experiment unobtrusively and without specialized equipment.
Future work could involve expanding the dataset used for training the classifiers or exploring other machine learning techniques to improve accuracy. Overall, these results suggest that machine learning methods offer promise for detecting autism automatically from speech without specialized equipment, which could potentially lead to earlier and more accurate diagnoses of autism spectrum disorder.

Layman's Summary

Autism is a condition that affects how people behave, communicate, and make friends. More kids are found to have autism now than before. Finding out whether a child has autism takes a long time, so scientists are building computer tools that could help check for it faster. Some kids with autism talk in a different way from other kids, for example with a different pitch or rhythm. Scientists made computer programs that listen to kids talking and guess whether they have autism or not. They tested the programs on recordings of kids talking at home, and the programs guessed right about 7 or 8 times out of 10. More recordings might make the programs even better in the future.

Definitions

- Autism spectrum disorder (ASD): A condition that affects behavior, social development, and communication patterns.
- Prevalence: The number of cases of a particular disease or condition present in a population at a given time.
- Diagnosis: The process of identifying a disease or condition by examining someone's symptoms.
- Prosody abnormalities: Differences in the way someone speaks, including pitch, tone, rhythm, and stress patterns.
- Machine learning approaches: Computer algorithms designed to learn from data and improve their performance over time without being explicitly programmed.

Using Machine Learning to Automatically Detect Autism from Speech

Background

Prosody abnormalities are among the clearest signs of autism, with affected children displaying speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. Because these differences are audible, researchers have explored machine learning techniques that detect them automatically from speech recordings. The study discussed here focuses on three different methods: Random Forests trained on extracted audio features (including Mel-frequency cepstral coefficients), convolutional neural networks (CNNs) trained on spectrograms, and fine-tuned wav2vec 2.0, a state-of-the-art Transformer-based ASR model.
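Mel-frequency cepstral coefficients are built on the mel scale, which spaces frequency bands the way human hearing roughly does. The sketch below shows only that first step, using the widely used HTK-style conversion formula. It is an illustration, not the paper's feature pipeline; the function names and the band settings are assumptions made for this example.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz: float, high_hz: float, n_bands: int) -> list:
    """Return n_bands + 2 edge frequencies (Hz), evenly spaced in mels.

    Consecutive triples of edges would define the triangular filters of a
    mel filterbank; building the filters themselves is omitted here.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_bands + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(n_bands + 2)]


# Hypothetical settings: 10 bands between 0 Hz and 8 kHz.
edges = mel_band_edges(0.0, 8000.0, 10)
```

The edges come out closely spaced at low frequencies and widely spaced at high ones, which is the property that makes mel-based features a common choice for speech.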

Methods

The classifiers were trained on a novel dataset of cellphone-recorded child speech audio curated from Stanford's Guess What? mobile game, an app designed to crowdsource videos of autistic and neurotypical children in a natural home environment. The models were trained on this dataset to distinguish ASD from NT based solely on audio recordings, without any other information about the speaker or context provided by parents or clinicians.
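To give a CNN something image-like to work with, audio is commonly converted to a spectrogram: the signal is cut into short overlapping frames and the magnitude of each frame's frequency content is stacked over time. The following is a minimal standard-library sketch of that idea. The frame length, hop size, window, and toy tone are assumptions for illustration, and a real pipeline would use an FFT library rather than this naive transform.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of exactly frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a demo)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of rows, one magnitude spectrum per windowed frame."""
    window = hann(frame_len)
    return [magnitude_spectrum([w * x for w, x in zip(window, frame)])
            for frame in frame_signal(samples, frame_len, hop)]


# Toy input (not study data): a 1 kHz tone sampled at 8 kHz.
# With 64-sample frames each bin is 125 Hz wide, so the tone sits on bin 8.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]
spec = spectrogram(signal)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Each row of `spec` is one time step and each column one frequency bin, which is the two-dimensional layout a CNN can treat like an image.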

Results

When classifying children's audio as either ASD or NT on this dataset, the Random Forest classifier achieved 70% accuracy, the fine-tuned wav2vec 2.0 model 77%, and the CNN 79%. One strength of the study is that it shows promise even when training on a varied selection of home audio clips with inconsistent recording quality, which may generalize better to real-world conditions than recordings made in traditional lab settings under controlled conditions with high-quality equipment. Additionally, because no specialized equipment was required, the researchers could conduct their experiment unobtrusively, without disrupting participants' daily lives.
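The accuracy figures above are the fraction of clips whose predicted label (ASD or NT) matches the true label. A minimal sketch of that calculation follows. The labels are made up for illustration and are not data from the study, and the helper names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="ASD"):
    """Count true/false positives and negatives for a binary task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Fraction of all predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Made-up labels for illustration only; not data from the study.
y_true = ["ASD", "ASD", "ASD", "NT", "NT", "NT", "NT", "ASD", "NT", "ASD"]
y_pred = ["ASD", "NT", "ASD", "NT", "ASD", "NT", "NT", "ASD", "NT", "NT"]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(counts)
```

Keeping the four counts alongside the single accuracy number shows which kind of error a classifier makes, something an accuracy figure alone does not reveal.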

Implications & Future Work

Overall, these results suggest that machine learning methods offer promise for detecting autism automatically from speech without specialized equipment, which could potentially lead to earlier and more accurate diagnoses of autism spectrum disorder. However, further work is needed before these techniques can be used in clinical practice, including expanding the dataset used to train the classifiers and exploring other machine learning techniques, such as transfer learning or deep reinforcement learning algorithms, which may improve accuracy further.

Created on 20 May. 2023
