Speech Disorder Classification Using Extended Factorized Hierarchical Variational Auto-encoders

AI-generated keywords: Speech Disorders, Neural Networks, FHVAE Model, Representation Learning, Classification

AI-generated Key Points

Research focuses on the objective classification of speech disorders in individuals with communication difficulties
Neural networks are proposed for this application due to advancements in speech technology
An extended version of Factorized Hierarchical Variational Auto-encoders (FHVAE) is applied for representation learning on disordered speech
FHVAE model extracts content-related and sequence-related latent variables from speech data
Latent variables are aggregated at both word and sentence levels for improved classification performance
Study demonstrates that the extended model achieves better disentanglement of content-related and sequence-related representations, but both are still needed for the best results in disorder type classification
COPAS database is utilized, containing recordings of Dutch Intelligibility Assessment (DIA) from 319 speakers with various types of speech disorders
Only five types of speakers are selected from the database: control speakers, dysarthria, cleft palate, laryngectomy, and impaired speech secondary to hearing impairment
DIA recordings are segmented into word-level audio pieces based on provided word-level alignment
Data augmentation by changing the speaking rate (0.8x or 1.2x speed) is employed to increase the number of samples available for training models
Overall research highlights successful application of extended FHVAE model for representation learning in classification of speech disorders, emphasizing importance of both content-related and sequence-related representations


Authors: Jinzi Qi, Hugo Van hamme

arXiv: 2106.07337v1 (eess.AS)

5 pages, 2 figures, submitted to INTERSPEECH2021

License: CC BY 4.0

Abstract: Objective speech disorder classification for speakers with communication difficulty is desirable for diagnosis and administering therapy. With the current state of speech technology, it is evident to propose neural networks for this application. But neural network model training is hampered by a lack of labeled disordered speech data. In this research, we apply an extended version of Factorized Hierarchical Variational Auto-encoders (FHVAE) for representation learning on disordered speech. The FHVAE model extracts both content-related and sequence-related latent variables from speech data, and we utilize the extracted variables to explore how disorder type information is represented in the latent variables. For better classification performance, the latent variables are aggregated at the word and sentence level. We show that an extension of the FHVAE model succeeds in the better disentanglement of the content-related and sequence-related representations, but both representations are still required for best results on disorder type classification.

Submitted to arXiv on 14 Jun. 2021


Results of the summarizing process for the arXiv paper: 2106.07337v1

Comprehensive Summary

This research focuses on the objective classification of speech disorders in individuals with communication difficulties. Neural networks are proposed for this application given the current state of speech technology. To address the lack of labeled disordered speech data, an extended version of Factorized Hierarchical Variational Auto-encoders (FHVAE) is applied for representation learning on disordered speech. The FHVAE model extracts both content-related and sequence-related latent variables from speech data, which are then used to explore how information about the type of disorder is represented in the latent variables. For improved classification performance, the latent variables are aggregated at both the word and sentence levels. The study demonstrates that the extension of the FHVAE model achieves better disentanglement of content-related and sequence-related representations; however, both representations are still needed for the best results in disorder type classification.

The COPAS database is used. It contains recordings of the Dutch Intelligibility Assessment (DIA), along with other materials, from 319 speakers with various types of speech disorders such as dysarthria, voice disorders, cleft palate, articulation disorders, laryngectomy, glossectomy and impaired speech secondary to hearing impairment. To ensure a balanced dataset with sufficient data, only five types of speakers (control speakers, dysarthria, cleft palate, laryngectomy and impaired speech secondary to hearing impairment) are selected from the database, each type consisting of 29 speakers. The DIA recordings are segmented into word-level audio pieces based on the provided word-level alignment. Because the available data is limited (approximately 1.6 hours), data augmentation by changing the speaking rate (0.8x or 1.2x speed) is employed to increase the number of samples available for training models.

Overall, this research highlights the successful application of an extended FHVAE model for representation learning in the classification of speech disorders, emphasizing the importance of both content-related and sequence-related representations in achieving accurate disorder type classification.
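The aggregation of latent variables at the word and sentence levels can be illustrated with a minimal sketch. The summary does not say which pooling function the authors use, so element-wise averaging is an assumption here, as are the function names and the `(sentence_id, word_id, vector)` input layout, which are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    return [fmean(column) for column in zip(*vectors)]


def aggregate(segment_latents):
    """Pool segment-level latents to word level, then words to sentence level.

    segment_latents: iterable of (sentence_id, word_id, vector) tuples, one
    per short speech segment (hypothetical layout, assumed for this sketch).
    Returns (word_level, sentence_level) dictionaries of pooled vectors.
    """
    by_word = defaultdict(list)
    for sentence_id, word_id, vector in segment_latents:
        by_word[(sentence_id, word_id)].append(vector)
    word_level = {key: mean_pool(vs) for key, vs in by_word.items()}

    # Sentence vectors are the mean of word vectors, so every word counts
    # equally regardless of how many segments it spans (assumed choice).
    by_sentence = defaultdict(list)
    for (sentence_id, _word_id), vector in word_level.items():
        by_sentence[sentence_id].append(vector)
    sentence_level = {sid: mean_pool(vs) for sid, vs in by_sentence.items()}
    return word_level, sentence_level
```

The same function could be applied separately to the content-related and the sequence-related latent variables, and the pooled vectors then passed to a classifier.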

Layman's Summary

Researchers are studying how to classify speech disorders in people who have trouble communicating. They use a technology called neural networks to help with this. They use a special model called FHVAE to learn about disordered speech. The model separates two kinds of information in the speech: what is being said (content) and properties that stay the same across a whole recording (sequence). They also use a database of recordings from people with different types of speech disorders to train their models. The research shows that both kinds of information are needed to classify speech disorders as well as possible.

Definitions

- Classification: Sorting things into groups based on their similarities.
- Objective: Something that is based on facts and not personal opinions.
- Speech disorders: Problems with speaking or making sounds.
- Communication difficulties: Trouble talking or understanding others.
- Neural networks: Computer models that learn patterns from examples.
- Advancements: Improvements or progress in something.
- Representation learning: Having a computer work out useful descriptions of data on its own.
- Latent variables: Hidden factors or qualities that affect something but are not easily seen.
- Aggregated: Put together or combined into one group.
- Database: A collection of organized information stored in a computer system.
- Control speakers, dysarthria, cleft palate, laryngectomy, impaired speech secondary to hearing impairment: The five speaker groups in the study. Control speakers have no speech disorder; the other four are types of speech disorder.

Classifying Speech Disorders with Neural Networks and Representation Learning

Speech disorders can have a significant impact on an individual's ability to communicate. To help diagnose and classify these disorders, researchers are exploring the use of neural networks for this application due to recent advancements in speech technology. In a recent study, an extended version of Factorized Hierarchical Variational Auto-encoders (FHVAE) was used to explore how information about the type of disorder is represented in latent variables extracted from disordered speech data. The results showed that both content-related and sequence-related representations were necessary for achieving optimal results in disorder type classification.

Background

Objective speech disorder classification is desirable for diagnosis and for administering therapy. However, training neural networks for this task is hampered by a lack of labeled disordered speech data. To address this issue, the study applies representation learning with an extended FHVAE model, which extracts latent variables from speech data that can then be used for classification.
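For orientation, the generative model of the original FHVAE, on which the extended model builds, is commonly written as below. This is a sketch of the base model as it is usually presented, not of the extension studied in the paper. The symbols are not taken from this summary: $z_1$ denotes the content-related (segment-level) latent variable, $z_2$ the sequence-related one, $\mu_2$ a per-sequence mean, $i$ indexes sequences and $n$ segments within a sequence.

```latex
\begin{align}
\mu_2^{(i)} &\sim \mathcal{N}\!\left(0,\ \sigma_{\mu_2}^2 I\right) \\
z_1^{(i,n)} &\sim \mathcal{N}\!\left(0,\ \sigma_{z_1}^2 I\right) \\
z_2^{(i,n)} &\sim \mathcal{N}\!\left(\mu_2^{(i)},\ \sigma_{z_2}^2 I\right) \\
x^{(i,n)} \mid z_1^{(i,n)}, z_2^{(i,n)} &\sim
  \mathcal{N}\!\left(f_{\mu_x}\!\left(z_1^{(i,n)}, z_2^{(i,n)}\right),\
  \operatorname{diag}\!\left(f_{\sigma_x^2}\!\left(z_1^{(i,n)}, z_2^{(i,n)}\right)\right)\right)
\end{align}
```

Because all segments of a sequence share the same $\mu_2^{(i)}$, $z_2$ is encouraged to capture properties that are constant within a sequence, while $z_1$, whose prior does not depend on the sequence, captures what varies from segment to segment.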

Study Design

The COPAS database was used for this study. It contains recordings of the Dutch Intelligibility Assessment (DIA), along with other materials, from 319 speakers with various types of speech disorders such as dysarthria, voice disorders, cleft palate, articulation disorders, laryngectomy, glossectomy and impaired speech secondary to hearing impairment. To ensure a balanced dataset with sufficient data, only five types of speakers (control speakers, dysarthria, cleft palate, laryngectomy and impaired speech secondary to hearing impairment) were selected, each type consisting of 29 speakers. The DIA recordings were segmented into word-level audio pieces based on the provided word-level alignment. Because the available data was limited (approximately 1.6 hours), data augmentation by changing the speaking rate (0.8x or 1.2x speed) was employed to increase the number of samples available for training models.
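The two data preparation steps described above, cutting recordings into words and changing their speed, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the summary does not name the tools used, the alignment format `(word, start_seconds, end_seconds)` and both function names are hypothetical, and speed is changed here by plain linear-interpolation resampling, which alters pitch together with tempo.

```python
def cut_words(samples, sample_rate, alignment):
    """Slice one recording into word-level pieces.

    alignment: list of (word, start_seconds, end_seconds) tuples
    (hypothetical format, assumed for this sketch).
    """
    pieces = []
    for word, start, end in alignment:
        first = int(round(start * sample_rate))
        last = int(round(end * sample_rate))
        pieces.append((word, samples[first:last]))
    return pieces


def change_speed(samples, factor):
    """Resample so that playback is `factor` times faster.

    factor=1.2 gives a shorter signal, factor=0.8 a longer one.
    Uses linear interpolation between neighbouring samples.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(n_out):
        pos = i * factor          # position in the original signal
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(float(samples[-1]))  # clamp at the final sample
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out
```

If every recording were augmented with both factors (an assumption; the summary does not give the resulting total), roughly 1.6 hours of audio would grow by 1.6 / 0.8 = 2.0 hours and 1.6 / 1.2 ≈ 1.33 hours, for about 4.9 hours in total.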

Results

The results demonstrated that the extended FHVAE model achieved better disentanglement of content-related and sequence-related representations. However, both representations were still needed for the best results in disorder type classification, with the latent variables aggregated at the word and sentence levels. This shows that both content-related and sequence-related information contribute to classifying different types of speech disorders accurately with neural networks and representation learning techniques such as FHVAE.

Conclusion

In conclusion, this research highlights the successful application of an extended FHVAE model for representation learning in the classification of speech disorders, emphasizing the importance of both content-related and sequence-related representations in achieving accurate disorder type classification.

Created on 03 Aug. 2023


