Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network

AI-generated keywords: CRNN, Tracking, DOA estimation, Localization accuracy, Source detection

AI-generated Key Points

  • This paper explores joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN).
  • The CRNN model is adapted to enable spatial tracking of moving sources when trained with dynamic scenes.
  • Performance of the CRNN is compared with a stand-alone tracking method that combines a multi-source estimator and a particle filter.
  • Experiments evaluate performance in various acoustic conditions including anechoic and reverberant scenarios, stationary and moving sources at different angular velocities, and varying numbers of overlapping sources.
  • The CRNN consistently tracks multiple sources more effectively than the parametric method across different acoustic scenarios but has higher localization error.
  • The parametric method achieves improved direction-of-arrival (DOA) estimation when combined with a temporal particle filter tracker but suffers from lower frame recall.
  • Using maximum likelihood estimation instead of reference information for source number estimation reduces the overall performance of the parametric approach, particularly in reverberant and moving source scenarios.
  • Both methods have trade-offs: CRNN outperforms in consistent tracking but struggles with accurate localization, while the parametric method achieves better DOA estimation but has decreased frame recall in certain scenarios.
  • Consideration should be given to both tracking consistency and localization accuracy when choosing between these two methods.
  • Recurrent layers within a CRNN architecture can achieve effective tracking of multiple sound sources, but further improvements are needed for localization accuracy and robustness in challenging acoustic conditions.
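The key points contrast localization error with frame recall. Below is a minimal Python sketch of how these two metrics are commonly computed in CRNN-based localization and detection work. It rests on several assumptions:

- Each frame holds a list of active sources, with DOAs given as (azimuth, elevation) in degrees.
- A brute-force one-to-one assignment stands in for the Hungarian algorithm.
- Frame recall is the fraction of frames whose estimated source count equals the reference count.

The function names are hypothetical, and the exact normalization used in the paper may differ.

```python
import itertools
import math


def angular_distance(a, b):
    """Great-circle angle in degrees between two DOAs given as (azimuth, elevation) in degrees."""
    az1, el1 = map(math.radians, a)
    az2, el2 = map(math.radians, b)
    cos_sigma = (math.sin(el1) * math.sin(el2)
                 + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sigma))))


def frame_pairs(ref, est):
    """Return (sum of matched angular errors, number of matched pairs) for one frame.

    Assumption: the best one-to-one assignment is found by brute force over
    permutations, which is cheap for a handful of overlapping sources.
    """
    if not ref or not est:
        return 0.0, 0
    small, large = (ref, est) if len(ref) <= len(est) else (est, ref)
    best = min(
        sum(angular_distance(s, l) for s, l in zip(small, perm))
        for perm in itertools.permutations(large, len(small))
    )
    return best, len(small)


def evaluate(ref_frames, est_frames):
    """Return (DOA error in degrees, frame recall) over time-aligned frame lists."""
    total_err, total_pairs, recalled = 0.0, 0, 0
    for ref, est in zip(ref_frames, est_frames):
        # A frame counts toward recall when the number of DOAs matches the reference.
        if len(ref) == len(est):
            recalled += 1
        err, n = frame_pairs(ref, est)
        total_err += err
        total_pairs += n
    # Assumption: the DOA error is normalized by the total number of matched pairs.
    doa_error = total_err / total_pairs if total_pairs else float("nan")
    frame_recall = recalled / len(ref_frames)
    return doa_error, frame_recall
```

Under this definition, a frame where the system misses one of two sources lowers frame recall even if the remaining DOA is accurate. This is why the two metrics can move in opposite directions, as described in the key points.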

Authors: Sharath Adavanne, Archontis Politis, Tuomas Virtanen

License: CC BY-NC-SA 4.0

Abstract: This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN). We use a CRNN previously proposed for the localization and detection of stationary sources, and show that the recurrent layers enable the spatial tracking of moving sources when trained with dynamic scenes. The tracking performance of the CRNN is compared with a stand-alone tracking method that combines a multi-source (DOA) estimator and a particle filter. Their respective performance is evaluated in various acoustic conditions such as anechoic and reverberant scenarios, stationary and moving sources at several angular velocities, and with a varying number of overlapping sources. The results show that the CRNN manages to track multiple sources more consistently than the parametric method across acoustic scenarios, but at the cost of higher localization error.
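The stand-alone baseline pairs a per-frame DOA estimator with a particle filter that smooths the estimates over time. The sketch below is a generic bootstrap particle filter for a single source's azimuth, not the paper's tracker. It assumes:

- a nearly-constant angular-velocity motion model,
- a Gaussian likelihood on the wrapped azimuth residual,
- systematic resampling at every frame.

All names and parameter values (`track_azimuth`, `meas_std`, `accel_std`, `wrap_deg`) are hypothetical. Tracking several overlapping sources would also require data association between measurements and tracks, which this sketch omits.

```python
import math
import random


def wrap_deg(x):
    """Wrap an angle in degrees to [-180, 180)."""
    return (x + 180.0) % 360.0 - 180.0


def track_azimuth(measurements, dt=0.1, n_particles=500, meas_std=5.0,
                  accel_std=10.0, seed=0):
    """Smooth noisy per-frame azimuth measurements (degrees) with a bootstrap particle filter."""
    rng = random.Random(seed)
    # Each particle is [azimuth_deg, angular_velocity_deg_per_s], initialized broadly.
    particles = [[rng.uniform(-180.0, 180.0), rng.uniform(-60.0, 60.0)]
                 for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # 1) Predict: constant angular velocity perturbed by random acceleration.
        for p in particles:
            p[1] += rng.gauss(0.0, accel_std) * dt
            p[0] = wrap_deg(p[0] + p[1] * dt)
        # 2) Update: Gaussian likelihood of the wrapped residual between measurement and particle.
        weights = [math.exp(-0.5 * (wrap_deg(z - p[0]) / meas_std) ** 2)
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            weights = [1.0 / n_particles] * n_particles  # fall back to uniform weights
        else:
            weights = [w / total for w in weights]
        # 3) Estimate: weighted circular mean, so averaging works across the +/-180 seam.
        s = sum(w * math.sin(math.radians(p[0])) for w, p in zip(weights, particles))
        c = sum(w * math.cos(math.radians(p[0])) for w, p in zip(weights, particles))
        estimates.append(math.degrees(math.atan2(s, c)))
        # 4) Systematic resampling: one uniform offset, evenly spaced positions.
        u = rng.random()
        cumulative, j, resampled = weights[0], 0, []
        for i in range(n_particles):
            pos = (u + i) / n_particles
            while pos > cumulative and j < n_particles - 1:
                j += 1
                cumulative += weights[j]
            resampled.append(list(particles[j]))
        particles = resampled
    return estimates
```

For example, passing in noisy azimuths of a source rotating at a constant angular velocity returns a smoothed trajectory. Using wrapped residuals and a circular mean keeps the track continuous when the source crosses ±180°.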

Submitted to arXiv on 29 Apr. 2019

Results of the summarizing process for the arXiv paper: 1904.12769v1

This paper explores the joint localization, detection and tracking of sound events using a convolutional recurrent neural network (CRNN). The CRNN model, previously proposed for localizing and detecting stationary sources, is shown to track moving sources spatially when trained with dynamic scenes. Its performance is compared with a stand-alone tracking method that combines a multi-source direction-of-arrival (DOA) estimator and a particle filter.

Both methods are evaluated in various acoustic conditions, including:
  • anechoic and reverberant scenarios,
  • stationary sources and sources moving at different angular velocities,
  • scenes with varying numbers of overlapping sources.

The results show that the CRNN tracks multiple sources more consistently than the parametric method across acoustic scenarios, but at the cost of higher localization error. The parametric method achieves better DOA estimation when combined with a temporal particle filter tracker, but suffers from lower frame recall. Further analysis shows that estimating the number of sources with maximum likelihood estimation, instead of taking it from the reference, reduces the overall performance of the parametric approach. This reduction is most evident in the reverberant and moving-source datasets, highlighting the need for more robust source detection and counting schemes.

Overall, the CRNN outperforms the parametric method in tracking consistency but struggles with accurate localization. The parametric method achieves better DOA estimation with a particle filter tracker, but its frame recall drops markedly in certain scenarios. Both tracking consistency and localization accuracy should therefore be weighed when choosing between the two methods.
In conclusion, this study demonstrates that recurrent layers within a CRNN architecture can achieve effective tracking of multiple sound sources. However, further improvements are needed to enhance localization accuracy and robustness in challenging acoustic conditions.
Created on 26 Jul. 2023
