Decoupled Multimodal Distilling for Emotion Recognition

AI-generated keywords: Multimodal emotion recognition

AI-generated Key Points

  • The challenge in human multimodal emotion recognition (MER) is perceiving emotions across language, visual, and acoustic modalities, which are inherently heterogeneous and whose contributions vary significantly.
  • Decoupled Multimodal Distillation (DMD) is a novel approach proposed to enhance the discriminative features of each modality by decoupling their representations into modality-irrelevant and modality-exclusive spaces through a self-regression process.
  • DMD utilizes Graph Distillation Units (GD-Units) for each decoupled part, allowing specialized knowledge distillation with dynamic graph structures for flexible knowledge transfer.
  • Experimental results show that DMD consistently outperforms state-of-the-art MER methods, demonstrating its effectiveness in enhancing emotion recognition accuracy.
  • Implementation details involve extracting unimodal language features using GloVe and BERT-base-uncased pre-trained models, encoding video frames via Facet for facial action unit representation, and processing acoustic modality data.
  • On the CMU-MOSI dataset, DMD achieves superior performance compared to existing methods such as EF-LSTM, LF-LSTM, TFN, LMF, MFM, RAVEN, MCTN, MulT, PMR, MISA*, FDMER*, and MICA*.
  • DMD facilitates adaptive crossmodal knowledge distillation for improved emotion recognition across diverse modalities.
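
The decoupling step described in the key points can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each modality has already been projected to a feature vector of the same size, uses plain linear maps in place of learned encoders, and interprets "self-regression" as reconstructing the original feature from the two decoupled parts. All names (`decouple`, `self_regression_loss`, `shared_w`, `private_w`, `decoder_w`) are hypothetical.

```python
import random


def linear(x, w):
    """Multiply a feature vector x (length n) by a weight matrix w (n x m)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def decouple(x, shared_w, private_w):
    """Split one modality's feature into two parts.

    shared_w is reused by every modality  -> modality-irrelevant part.
    private_w belongs to a single modality -> modality-exclusive part.
    """
    return linear(x, shared_w), linear(x, private_w)


def self_regression_loss(x, common, exclusive, decoder_w):
    """Mean squared error between x and its reconstruction from both parts."""
    recon = linear(common + exclusive, decoder_w)
    return sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)


rng = random.Random(0)
DIM, HID = 8, 4  # illustrative sizes, not taken from the paper
features = {
    m: [rng.uniform(-1, 1) for _ in range(DIM)]
    for m in ("language", "visual", "acoustic")
}
shared_w = random_matrix(DIM, HID, rng)
private_w = {m: random_matrix(DIM, HID, rng) for m in features}
decoder_w = {m: random_matrix(2 * HID, DIM, rng) for m in features}

for m, x in features.items():
    common, exclusive = decouple(x, shared_w, private_w[m])
    loss = self_regression_loss(x, common, exclusive, decoder_w[m])
    print(m, len(common), len(exclusive), round(loss, 4))
```

In a trained model the reconstruction loss would be minimized jointly with the task loss, so that the two parts together retain the information in the original feature.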

Authors: Yong Li, Yuanzhi Wang, Zhen Cui

To appear at CVPR 2023, selected as a highlight (10% of accepted papers, 2.5% of submissions)
License: CC ZERO 1.0

Abstract: Human multimodal emotion recognition (MER) aims to perceive human emotions via language, visual and acoustic modalities. Despite the impressive performance of previous MER approaches, the inherent multimodal heterogeneities persist and the contribution of different modalities varies significantly. In this work, we mitigate this issue by proposing a decoupled multimodal distillation (DMD) approach that facilitates flexible and adaptive crossmodal knowledge distillation, aiming to enhance the discriminative features of each modality. Specifically, the representation of each modality is decoupled into two parts, i.e., modality-irrelevant and modality-exclusive spaces, in a self-regression manner. DMD utilizes a graph distillation unit (GD-Unit) for each decoupled part so that distillation on each part can be performed in a more specialized and effective manner. A GD-Unit consists of a dynamic graph where each vertex represents a modality and each edge indicates a dynamic knowledge distillation. This GD paradigm provides a flexible form of knowledge transfer in which the distillation weights are learned automatically, thus enabling diverse crossmodal knowledge transfer patterns. Experimental results show that DMD consistently outperforms state-of-the-art MER methods. Visualization results show that the graph edges in DMD exhibit meaningful distributional patterns w.r.t. the modality-irrelevant and modality-exclusive feature spaces. Code is released at https://github.com/mdswyz/DMD.
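
To make the GD-Unit idea concrete, the following is a rough sketch of a graph with one vertex per modality and one weighted directed edge per ordered pair of modalities. It is an illustration under stated assumptions, not the released code: the edge scores are fixed numbers here (in the paper they are learned), the softmax normalization over incoming edges and the mean absolute difference between logits are choices made for this example, and the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def edge_weights(edge_scores):
    """Turn raw edge scores {(src, dst): score} into distillation weights.

    Assumption: the weights of all edges entering one vertex sum to 1.
    """
    weights = {}
    for dst in sorted({d for _, d in edge_scores}):
        incoming = [(s, score) for (s, d), score in edge_scores.items() if d == dst]
        normed = softmax([score for _, score in incoming])
        for (src, _), w in zip(incoming, normed):
            weights[(src, dst)] = w
    return weights


def graph_distillation_loss(logits, edge_scores):
    """Weighted sum of logit discrepancies over all directed edges."""
    total = 0.0
    for (src, dst), w in edge_weights(edge_scores).items():
        # Illustrative discrepancy: mean absolute difference between logits.
        gap = sum(abs(a - b) for a, b in zip(logits[src], logits[dst])) / len(logits[src])
        total += w * gap
    return total


modalities = ("language", "visual", "acoustic")
logits = {"language": [2.0, -1.0], "visual": [0.5, 0.2], "acoustic": [0.1, 0.1]}
# Placeholder scores; a trained GD-Unit would predict these from vertex features.
edge_scores = {
    (s, d): (1.0 if s == "language" else 0.0)
    for s in modalities for d in modalities if s != d
}

for edge, w in sorted(edge_weights(edge_scores).items()):
    print(edge, round(w, 3))
print("loss:", round(graph_distillation_loss(logits, edge_scores), 4))
```

Because the weights come from scores rather than being fixed by hand, a stronger modality can end up acting as the main teacher for the others, which is the kind of flexible transfer pattern the abstract describes.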

Submitted to arXiv on 24 Mar. 2023


Results of the summarizing process for the arXiv paper: 2303.13802v1

In human multimodal emotion recognition (MER), the challenge lies in effectively perceiving emotions through language, visual, and acoustic modalities, given their inherent heterogeneities and the varying contributions of the different modalities. To address this issue, this work proposes decoupled multimodal distillation (DMD). DMD aims to enhance the discriminative features of each modality by decoupling its representation into modality-irrelevant and modality-exclusive spaces through a self-regression process.

The approach uses a graph distillation unit (GD-Unit) for each decoupled part, allowing specialized and effective knowledge distillation. Each GD-Unit consists of a dynamic graph in which vertices represent modalities and edges indicate dynamic knowledge distillation, enabling flexible knowledge transfer with automatically learned distillation weights. Visualization results reveal meaningful distributional patterns in the graph edges of DMD with respect to the modality-irrelevant and modality-exclusive feature spaces.

The implementation extracts unimodal language features using GloVe and BERT-base-uncased pre-trained models, encodes video frames via Facet for facial action unit representation, and processes acoustic modality data. Experimental results show that DMD consistently outperforms state-of-the-art MER methods; on the CMU-MOSI dataset, it achieves superior performance compared to existing methods such as EF-LSTM, LF-LSTM, TFN, LMF, MFM, RAVEN, MCTN, MulT, PMR, MISA*, FDMER*, and MICA*. Overall, DMD facilitates adaptive crossmodal knowledge distillation for improved emotion recognition across diverse modalities.
Created on 13 Mar. 2025

