Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets

AI-generated keywords: Personalized ASR, Disordered Speech, Adaptation Data, Word Error Rate, Practical Approach

AI-generated Key Points

  • Study focuses on personalized automatic speech recognition (ASR) for recognizing disordered speech using small amounts of per-speaker adaptation data
  • Lack of available speech data has been a major challenge in adapting speaker-independent ASR systems for dysarthric speech
  • Researchers trained personalized models for 195 individuals with different types and severities of speech impairment
  • Training sets varied in size from less than one minute to 18-20 minutes of speech data per speaker
  • Word error rate (WER) thresholds were used to determine the Success Percentage, representing the percentage of personalized models that achieved the target WER in different application scenarios
  • In the home automation scenario, 79% of speakers reached the target WER when trained with 18-20 minutes of speech data, and even with only 3-4 minutes of data, 63% still reached the target WER
  • Performance on test sets containing conversational and out-of-domain unprompted phrases showed similar improvements
  • Personalized ASR can benefit individuals with disordered speech even with just a few minutes of recordings, which is significant as recording large amounts of samples per speaker is often impractical and challenging for people with speech impairments
  • Previous studies typically required hours of recorded speech per speaker to achieve substantial WER improvements, whereas this study shows promising results with much smaller amounts of adaptation data
  • The research highlights the potential and feasibility of personalized ASR for individuals with disordered speech, offering insights into optimizing ASR systems for such impairments and a more practical approach that works with limited recording time per speaker
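Every threshold in these key points is expressed as a word error rate (WER). WER is the number of word substitutions, deletions and insertions needed to turn a recognizer's output into the reference transcript, divided by the number of reference words. The Python sketch below shows this standard word-level edit-distance computation. The function name `word_error_rate`, the whitespace tokenization and the example sentence are illustrative assumptions, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein (edit distance) alignment.
    Simplifying assumption of this sketch: whitespace tokenization and
    case-sensitive comparison, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical home-automation command, not taken from the paper's data:
# one deletion ("the") and one substitution ("lights" -> "light").
print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))  # prints 0.4
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many more words than the reference contains.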

Authors: Jimmy Tobin, Katrin Tomanek

Submitted to ICASSP 2022
License: CC BY 4.0

Abstract: This study investigates the performance of personalized automatic speech recognition (ASR) for recognizing disordered speech using small amounts of per-speaker adaptation data. We trained personalized models for 195 individuals with different types and severities of speech impairment with training sets ranging in size from <1 minute to 18-20 minutes of speech data. Word error rate (WER) thresholds were selected to determine Success Percentage (the percentage of personalized models reaching the target WER) in different application scenarios. For the home automation scenario, 79% of speakers reached the target WER with 18-20 minutes of speech; but even with only 3-4 minutes of speech, 63% of speakers reached the target WER. Further evaluation found similar improvement on test sets with conversational and out-of-domain, unprompted phrases. Our results demonstrate that with only a few minutes of recordings, individuals with disordered speech could benefit from personalized ASR.

Submitted to arXiv on 09 Oct. 2021
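The abstract defines Success Percentage as the share of personalized models whose WER reaches a scenario-specific target. A minimal Python sketch of that ratio follows. The threshold value, the per-speaker WERs and the choice of `<=` as the meaning of "reaching" the target are hypothetical; the actual thresholds the authors selected for each application scenario are not given in this summary.

```python
from typing import Sequence


def success_percentage(wers: Sequence[float], target_wer: float) -> float:
    """Percentage of personalized models whose WER reaches the target.

    Assumption of this sketch: a model "reaches" the target when its
    WER is less than or equal to target_wer.
    """
    if not wers:
        raise ValueError("need at least one model's WER")
    hits = sum(1 for w in wers if w <= target_wer)
    return 100.0 * hits / len(wers)


# Hypothetical per-speaker WERs and a hypothetical threshold; these are
# illustrative numbers, not values reported in the paper.
per_speaker_wer = [0.05, 0.12, 0.31, 0.09, 0.18]
print(success_percentage(per_speaker_wer, target_wer=0.15))  # prints 60.0
```

Given real per-speaker results, computing this value separately for each training-set size (for example 18-20 versus 3-4 minutes of speech) would yield the kind of comparison the abstract reports.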

Results of the summarizing process for the arXiv paper: 2110.04612v1

This study focuses on personalized automatic speech recognition (ASR) for recognizing disordered speech using small amounts of per-speaker adaptation data. Previous research has demonstrated promising results in adapting speaker-independent ASR systems to dysarthric speech; however, the lack of available speech data has been a major challenge. Recent work has explored personalizing ASR models for individuals with speech impairments.

The researchers trained personalized models for 195 individuals with different types and severities of speech impairment, with training sets ranging from less than one minute to 18-20 minutes of speech per speaker. They used word error rate (WER) thresholds to determine the Success Percentage, the percentage of personalized models that reached the target WER in a given application scenario. In the home automation scenario, 79% of speakers reached the target WER when trained on 18-20 minutes of speech and, perhaps surprisingly, 63% still reached it with only 3-4 minutes. Evaluation on test sets containing conversational and out-of-domain unprompted phrases showed similar improvements.

These results demonstrate that individuals with disordered speech can benefit from personalized ASR with just a few minutes of recordings. This matters because recording large numbers of samples per speaker is often impractical and difficult for people with speech impairments. Previous studies that reported substantial WER improvements through model personalization typically required hours of recorded speech per speaker; one study achieved an average WER improvement of 75% on a large corpus by recording about two hours per speaker. This study, by contrast, shows promising results with much smaller amounts of adaptation data.
Overall, this research highlights the potential and feasibility of personalized ASR for individuals with disordered speech. It provides insights into optimizing ASR systems for such impairments and offers a more practical approach, one that can be implemented with limited recording time per speaker.
Created on 03 Aug. 2023
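The summary cites an average WER improvement of 75% for an earlier study without saying how the improvement is measured. A common convention is relative WER reduction against the unadapted baseline model, sketched below in Python under that assumption. The function name `relative_wer_improvement` and the example WERs are hypothetical.

```python
def relative_wer_improvement(baseline_wer: float, personalized_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - personalized) / baseline.

    Assumption of this sketch: "WER improvement" means the relative
    reduction of the personalized model's WER versus the baseline model.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - personalized_wer) / baseline_wer


# Hypothetical example: a speaker-independent baseline at 50% WER and a
# personalized model at 12.5% WER give a 75% relative improvement.
print(relative_wer_improvement(0.5, 0.125))  # prints 75.0
```

Under this convention, a 75% improvement means the personalized model makes a quarter of the baseline's errors, regardless of how high the baseline WER was.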
