Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation
AI-generated Key Points
⚠ The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.
- The paper addresses the problem of sign language translation
- Previous research has shown that using a mid-level sign gloss representation improves translation performance
- The authors propose a novel transformer-based architecture for simultaneous sign language recognition and translation
- They utilize a Connectionist Temporal Classification (CTC) loss to bind the recognition and translation problems into a unified architecture
- The joint approach does not rely on ground-truth timing information
- The approach is evaluated on the RWTH-PHOENIX-Weather-2014T dataset, achieving state-of-the-art results for both sign language recognition and translation tasks
- Sign Language Transformers outperform existing sign-video-to-spoken-language and gloss-to-spoken-language translation models
- In some cases the translation networks more than double prior performance, reaching a BLEU-4 score of 21.80 compared to 9.58
- New transformer-based baseline results are presented for several text-to-text sign language translation tasks
- The proposed approach demonstrates significant improvements in both sign language recognition and translation tasks, with potential applications in bridging communication gaps between deaf individuals who use sign language and those who do not understand it.
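The BLEU-4 figures quoted above (21.80 vs. 9.58) are n-gram overlap scores on a 0 to 100 scale, typically computed over a whole test corpus. For readers unfamiliar with the metric, here is a minimal pure-Python sketch of textbook corpus BLEU-4: clipped 1- to 4-gram precisions, a uniform geometric mean, a brevity penalty, one reference per sentence, and no smoothing. The function names are hypothetical, and the exact scoring script behind the paper's numbers is not stated on this page, so tokenization and implementation details may differ.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses: List[List[str]], references: List[List[str]]) -> float:
    """Textbook corpus-level BLEU-4, one reference per hypothesis, scaled to 0-100."""
    matches = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the geometric mean
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: penalize output that is shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    # Precisions 5/6, 3/5, 2/4, 1/3 and equal lengths give 100 * (1/12) ** 0.25
    print(round(corpus_bleu4([hyp], [ref]), 2))  # prints 53.73
```

Because the score is a geometric mean over all four n-gram orders, a corpus with no matching 4-grams scores 0 here; production scorers usually add smoothing options for sentence-level use.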
Authors: Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, Richard Bowden
Abstract: Prior work on Sign Language Translation has shown that having a mid-level sign gloss representation (effectively recognizing the individual signs) improves the translation performance drastically. In fact, the current state-of-the-art in translation requires gloss-level tokenization in order to work. We introduce a novel transformer-based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. This is achieved by using a Connectionist Temporal Classification (CTC) loss to bind the recognition and translation problems into a single unified architecture. This joint approach does not require any ground-truth timing information, simultaneously solves two co-dependent sequence-to-sequence learning problems, and leads to significant performance gains. We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset. We report state-of-the-art sign language recognition and translation results achieved by our Sign Language Transformers. Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models, in some cases more than doubling the performance (9.58 vs. 21.80 BLEU-4 score). We also share new baseline translation results using transformer networks for several other text-to-text sign language translation tasks.
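To make the CTC binding more concrete, here is a minimal pure-Python sketch of the standard CTC forward algorithm, computing the negative log-likelihood of a gloss sequence from per-frame gloss log-probabilities. It also includes a hypothetical `joint_loss` helper that assumes the recognition (CTC) loss and the translation loss (for example, decoder cross-entropy) are combined as a weighted sum with weights `lambda_r` and `lambda_t`. The function names, the blank index 0, and the weighting are illustrative assumptions, not the authors' code. The point the sketch shows is that CTC sums over every frame-to-gloss alignment, so only the order of the glosses is needed, never their timing.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_loss(log_probs: Sequence[Sequence[float]],
             target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of a gloss sequence (hypothetical helper).

    log_probs: T x V per-frame log-probabilities over glosses plus blank.
    target: gloss ids in order, with no frame-level timestamps.
    """
    # Interleave blanks: [blank, g1, blank, g2, ..., blank]
    ext = [blank]
    for g in target:
        ext += [g, blank]
    S = len(ext)

    # alpha[s]: log-prob of all partial alignments ending at ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new = [NEG_INF] * S
        for s in range(S):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total = log_add(total, alpha[s - 1])  # advance to the next symbol
            # Skip over a blank only when the glosses on either side differ
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total = log_add(total, alpha[s - 2])
            new[s] = total + frame[ext[s]]
        alpha = new

    # Valid alignments end on the last gloss or on the trailing blank
    final = alpha[S - 1] if S == 1 else log_add(alpha[S - 1], alpha[S - 2])
    return -final


def joint_loss(recognition_loss: float, translation_loss: float,
               lambda_r: float = 1.0, lambda_t: float = 1.0) -> float:
    """Hypothetical weighted sum of the recognition and translation losses."""
    return lambda_r * recognition_loss + lambda_t * translation_loss


if __name__ == "__main__":
    # 3 frames, vocabulary {0: blank, 1, 2}, uniform per-frame distribution
    uniform = [[math.log(1 / 3)] * 3 for _ in range(3)]
    # 5 of the 27 frame labelings collapse to [1, 2], so loss = -log(5/27)
    print(round(ctc_loss(uniform, [1, 2]), 3))  # prints 1.686
```

The blank-skip rule is what lets CTC represent repeated glosses: for the target `[1, 1]` over three frames, only the labeling `1, blank, 1` is valid. A real system would use a batched, GPU-backed CTC implementation rather than this loop.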