Unsupervised deep learning identifies semantic disentanglement in single inferotemporal neurons
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Deep supervised neural networks are popular for classifying objects in the primate ventral stream
- Interpreting individual neuron responses in the inferotemporal (IT) region is challenging
- A recent study used a deep unsupervised generative model called beta-VAE to model neural responses to faces in macaque IT
- Beta-VAE can "disentangle" sensory data into interpretable latent factors
- The researchers found a correspondence between generative factors identified by the model and those coded by single IT neurons
- Face images could be reconstructed using signals from a small number of cells
- The ventral visual stream may optimize an objective related to disentangling sensory information
- The neural code produced is low-dimensional and semantically interpretable at the single-unit level
- This research provides insights into how neural networks process and represent complex visual information
Authors: Irina Higgins, Le Chang, Victoria Langston, Demis Hassabis, Christopher Summerfield, Doris Tsao, Matthew Botvinick
Abstract: Deep supervised neural networks trained to classify objects have emerged as popular models of computation in the primate ventral stream. These models represent information with a high-dimensional distributed population code, implying that inferotemporal (IT) responses are also too complex to interpret at the single-neuron level. We challenge this view by modelling neural responses to faces in the macaque IT with a deep unsupervised generative model, beta-VAE. Unlike deep classifiers, beta-VAE "disentangles" sensory data into interpretable latent factors, such as gender or hair length. We found a remarkable correspondence between the generative factors discovered by the model and those coded by single IT neurons. Moreover, we were able to reconstruct face images using the signals from just a handful of cells. This suggests that the ventral visual stream may be optimising the disentangling objective, producing a neural code that is low-dimensional and semantically interpretable at the single-unit level.
Ask questions about this paper to our AI assistant
You can also chat with multiple papers at once here.
⚠The license of the paper does not allow us to build upon its content and the AI assistant only knows about the paper metadata rather than the full article.
Assess the quality of the AI-generated content by voting
Score: 0
Why do we need votes?
Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.
The previous summary was created more than a year ago and can be re-run (if necessary) by clicking on the Run button below.
⚠The license of this specific paper does not allow us to build upon its content and the summarizing tools will be run using the paper metadata rather than the full article. However, it still does a good job, and you can also try our tools on papers with more open licenses.
Similar papers summarized with our AI tools
Navigate through even more similar papers through a
tree representationLook for similar papers (in beta version)
By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.
Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.