Two-Stream Network for Sign Language Recognition and Translation

AI-generated keywords: sign language recognition, sign language translation, dual visual encoder, keypoint sequences, NeurIPS 2022

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Authors Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, and Brian Mak introduce a novel approach to sign language recognition and translation
  • Proposed model named TwoStream-SLR utilizes two separate streams to encode raw videos and keypoint sequences for enhanced sign language understanding
  • Techniques such as bidirectional lateral connections, sign pyramid network with auxiliary supervision, and frame-level self-distillation are explored to facilitate interaction between the streams
  • Model demonstrates competence in sign language recognition tasks
  • Extended model called TwoStream-SLT adds an extra translation network for sign language translation (SLT), which maps sign language video to spoken-language text
  • Experimental results showcase state-of-the-art performance on SLR and SLT tasks across multiple datasets including Phoenix-2014, Phoenix-2014T, and CSL-Daily
  • Research accepted by NeurIPS 2022; code and models publicly available at https://github.com/FangyunWei/SLRT

Authors: Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, Brian Mak

Accepted by NeurIPS 2022. Code and models are available at: https://github.com/FangyunWei/SLRT

Abstract: Sign languages are visual languages using manual articulations and non-manual elements to convey information. For sign language recognition and translation, the majority of existing approaches directly encode RGB videos into hidden representations. RGB videos, however, are raw signals with substantial visual redundancy, leading the encoder to overlook the key information for sign language understanding. To mitigate this problem and better incorporate domain knowledge, such as handshape and body movement, we introduce a dual visual encoder containing two separate streams to model both the raw videos and the keypoint sequences generated by an off-the-shelf keypoint estimator. To make the two streams interact with each other, we explore a variety of techniques, including bidirectional lateral connection, sign pyramid network with auxiliary supervision, and frame-level self-distillation. The resulting model is called TwoStream-SLR, which is competent for sign language recognition (SLR). TwoStream-SLR is extended to a sign language translation (SLT) model, TwoStream-SLT, by simply attaching an extra translation network. Experimentally, our TwoStream-SLR and TwoStream-SLT achieve state-of-the-art performance on SLR and SLT tasks across a series of datasets including Phoenix-2014, Phoenix-2014T, and CSL-Daily. Code and models are available at: https://github.com/FangyunWei/SLRT.
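The abstract names bidirectional lateral connections as one way the video and keypoint streams interact, but this page does not give the exact formulation. The following is a minimal sketch, assuming each stream adds a scaled copy of the other stream's features element-wise. The function name `lateral_exchange`, the mixing weight `alpha`, and the `[frames][channels]` feature layout are all hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Tuple

# Hypothetical layout: one feature vector per frame, [frames][channels].
Features = List[List[float]]


def lateral_exchange(
    video: Features, keypoint: Features, alpha: float = 0.5
) -> Tuple[Features, Features]:
    """Toy bidirectional lateral connection (an assumption, not the paper's code).

    Each stream receives the other stream's pre-fusion features,
    scaled by `alpha`, added element-wise to its own features.
    """
    same_shape = len(video) == len(keypoint) and all(
        len(v) == len(k) for v, k in zip(video, keypoint)
    )
    if not same_shape:
        raise ValueError("both streams must have the same shape")

    # Video stream receives information from the keypoint stream.
    fused_video = [
        [v + alpha * k for v, k in zip(v_row, k_row)]
        for v_row, k_row in zip(video, keypoint)
    ]
    # Keypoint stream receives information from the video stream.
    fused_keypoint = [
        [k + alpha * v for v, k in zip(v_row, k_row)]
        for v_row, k_row in zip(video, keypoint)
    ]
    return fused_video, fused_keypoint


if __name__ == "__main__":
    video_feats = [[1.0, 2.0]]      # one frame, two channels
    keypoint_feats = [[4.0, 6.0]]
    print(lateral_exchange(video_feats, keypoint_feats))
```

Both fused outputs are computed from the pre-fusion inputs, so the exchange is symmetric. In a real network the two streams would continue through separate layers after this step.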

Submitted to arXiv on 02 Nov. 2022

Results of the summarizing process for the arXiv paper: 2211.01367v2

In their paper "Two-Stream Network for Sign Language Recognition and Translation," authors Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, and Brian Mak introduce a two-stream approach to sign language recognition and translation. Sign languages are visual languages that use manual articulations and non-manual elements to convey information. Existing methods often encode RGB videos directly into hidden representations. However, RGB videos are raw signals with substantial visual redundancy, which can cause the encoder to overlook information essential for sign language understanding. To address this issue and better incorporate domain knowledge such as handshape and body movement, the authors propose a dual visual encoder with two separate streams. These streams model the raw videos and the keypoint sequences generated by an off-the-shelf keypoint estimator. To make the two streams interact, the authors explore several techniques: bidirectional lateral connections, a sign pyramid network with auxiliary supervision, and frame-level self-distillation. The resulting model, TwoStream-SLR, performs sign language recognition (SLR). The authors extend TwoStream-SLR into a sign language translation (SLT) model, TwoStream-SLT, by attaching an extra translation network. Both models achieve state-of-the-art performance on SLR and SLT tasks across datasets including Phoenix-2014, Phoenix-2014T, and CSL-Daily. The paper was accepted by NeurIPS 2022, and the code and models are publicly available at https://github.com/FangyunWei/SLRT.
Created on 24 Feb. 2024
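The summary above mentions frame-level self-distillation without describing how it is computed. The following is a rough sketch, assuming the per-frame class probabilities of the two streams are averaged into a pseudo-target and each stream is pulled toward that target with a KL-divergence loss. The function names, the choice of teacher, and the loss weighting are hypothetical, and the paper's actual formulation may differ.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one frame's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target: List[float], pred: List[float]) -> float:
    """KL(target || pred); predictions are clamped to avoid log(0)."""
    return sum(
        t * math.log(t / max(p, 1e-12)) for t, p in zip(target, pred) if t > 0
    )


def self_distillation_loss(
    video_logits: List[List[float]], keypoint_logits: List[List[float]]
) -> float:
    """Toy frame-level self-distillation loss (an assumption, not the paper's code).

    For each frame, the averaged prediction of both streams acts as the
    pseudo-target, and each stream is penalized for diverging from it.
    """
    if len(video_logits) != len(keypoint_logits):
        raise ValueError("both streams must cover the same frames")
    total = 0.0
    for v_frame, k_frame in zip(video_logits, keypoint_logits):
        p_video = softmax(v_frame)
        p_keypoint = softmax(k_frame)
        teacher = [(a + b) / 2 for a, b in zip(p_video, p_keypoint)]
        total += kl_divergence(teacher, p_video)
        total += kl_divergence(teacher, p_keypoint)
    return total / max(len(video_logits), 1)


if __name__ == "__main__":
    agree = self_distillation_loss([[2.0, 0.0]], [[2.0, 0.0]])
    disagree = self_distillation_loss([[2.0, 0.0]], [[0.0, 2.0]])
    print(agree, disagree)
```

When the two streams agree on a frame, the loss for that frame is zero. It grows as their predictions diverge, which is the sense in which the streams supervise each other under this assumption.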
