Mamba: Linear-Time Sequence Modeling with Selective State Spaces
AI-generated Key Points
- Mamba is an attention-free sequence model built on selective state space models (SSMs), whose parameters are functions of the input so that the model can perform content-based reasoning.
- Mamba achieves state-of-the-art performance across multiple modalities including language, audio, and genomics.
- Mamba offers fast inference (5× higher throughput than Transformers) and scales linearly in sequence length, with performance improving on real data up to million-length sequences.
- The authors compare Mamba to the SaShiMi architecture and training protocols in the context of audio modelling and generation.
- They consider replacing the S4+MLP blocks in SaShiMi with Mamba blocks and provide experiment details in Appendix E.4.
- Pretraining quality on audio waveforms is evaluated on the YouTubeMix dataset, which consists of 4 hours of solo piano music sampled at 16000 Hz.
- Audio pretraining largely follows the standard language modelling setup; the metric is bits per byte (BPB), which differs by a constant factor of log(2) from the standard negative log-likelihood (NLL) loss used when pretraining other modalities.
- For speech generation, the SC09 dataset is used: 1-second clips, sampled at 16000 Hz, of the spoken digits "zero" through "nine" with highly variable characteristics.
- Author affiliations: Albert Gu (Machine Learning Department, Carnegie Mellon University) and Tri Dao (Department of Computer Science, Princeton University).
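The log(2) factor in the audio pretraining point above is the usual conversion between a loss measured in nats and one measured in bits. A minimal sketch, assuming the reported audio metric is bits per byte and that each waveform sample is a single byte-valued token (both are assumptions here, and the function name is hypothetical):

```python
import math

def nats_to_bits(nll_nats):
    """Convert an average per-token NLL from nats (natural log) to bits."""
    return nll_nats / math.log(2)

# Sanity check: a uniform guess over 256 byte values costs log(256) nats per token.
uniform_nll = math.log(256)
print(nats_to_bits(uniform_nll))
```

Because the factor is constant, ranking models by bits per byte or by NLL gives the same ordering.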
Authors: Albert Gu, Tri Dao
Abstract: Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5$\times$ higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
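The selection idea in the abstract (SSM parameters as functions of the input) can be illustrated with a small sequential recurrence. This is a rough sketch, not the authors' implementation: it uses a single scalar channel, a diagonal state matrix, a softplus step size, and simple linear maps from the input to the step size and to B and C. These choices, and every name below (`selective_scan`, `w_delta`, `b_delta`, `w_b`, `w_c`), are assumptions made for illustration. The hardware-aware parallel algorithm mentioned in the abstract is not reproduced here.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def selective_scan(xs, a, w_delta, b_delta, w_b, w_c):
    """Run a single-channel selective SSM over the scalar sequence xs.

    a        : list of N negative floats (diagonal state matrix).
    w_delta,
    b_delta  : scalars; step size is softplus(w_delta * x_t + b_delta).
    w_b, w_c : lists of N floats; B_t = w_b * x_t and C_t = w_c * x_t.
    """
    n = len(a)
    h = [0.0] * n          # hidden state, one entry per state dimension
    ys = []
    for x in xs:
        # Input-dependent step size: a large step decays the old state
        # strongly (forget), a small step mostly preserves it (propagate).
        delta = softplus(w_delta * x + b_delta)
        new_h = []
        for i in range(n):
            a_bar = math.exp(delta * a[i])   # in (0, 1) because a[i] < 0
            b_bar = delta * (w_b[i] * x)     # input-dependent B_t, scaled by the step
            new_h.append(a_bar * h[i] + b_bar * x)
        h = new_h
        # Input-dependent readout C_t applied to the updated state.
        ys.append(sum((w_c[i] * x) * h[i] for i in range(n)))
    return ys

# Hypothetical parameters for a state of size 2.
outputs = selective_scan(
    xs=[1.0, 0.5, -0.25, 2.0],
    a=[-1.0, -0.5],
    w_delta=1.0, b_delta=0.0,
    w_b=[1.0, 0.5], w_c=[0.5, 1.0],
)
print(outputs)
```

Each step does a fixed amount of work per state dimension, so one pass over a sequence of length L with state size N costs on the order of L times N operations, which is the linear scaling in sequence length that the abstract refers to.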