
AvivBick/awesome-ssm-ml

Awesome State-Space Resources for ML

Contributions are welcome! Please read the contribution guidelines before contributing.

Tutorials

Blogposts

  1. S4 Series
  2. The Annotated S4
  3. The Annotated S4D
  4. The Annotated Mamba [code]
  5. Mamba: The Easy Way
  6. A Visual Guide to Mamba and State Space Models
  7. State Space Models: A Modern Approach
  8. Mamba No. 5 (A Little Bit Of...)
  9. Mamba: SSM, Theory, and Implementation in Keras and TensorFlow

Videos

  1. Efficiently Modeling Long Sequences with Structured State Spaces
  2. Do we need Attention? A Mamba Primer
  3. Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math
  4. MAMBA from Scratch
  5. Yannic Kilcher's Video

Surveys (Structured State Space Models)

  1. Modeling Sequences with Structured State Spaces
  2. State Space Models as Foundation Models: A Control Theoretic Overview
  3. State Space Model for New-Generation Network Alternative to Transformers
  4. A Survey on Visual Mamba
  5. Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
  6. A Survey on Structured State Space Sequence (S4) Models

Books (Classical State Space Models)

  1. Linear State-Space Control Systems
  2. Principles of System Identification: Theory and Practice

Foundation

Core Mamba Lineage

  1. Mamba: Linear-Time Sequence Modeling with Selective State Spaces [code]
  2. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (Mamba-2 / SSD)
  3. Mamba-3: Improved Sequence Modeling using State Space Principles

Theory, Analysis, and Limitations

  1. Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets (ICLR 2023)
  2. Simplifying and Understanding State Space Models with Diagonal Linear RNNs
  3. State-space Models with Layer-wise Nonlinearity are Universal Approximators with Exponential Decaying Memory
  4. Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
  5. Structured state-space models are deep Wiener models
  6. Repeat After Me: Transformers are Better than State Space Models at Copying
  7. Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  8. Theoretical Foundations of Deep Selective State-Space Models
  9. The Hidden Attention of Mamba Models
  10. The Illusion of State in State-Space Models
  11. The Expressive Capacity of State Space Models: A Formal Language Perspective
  12. An Empirical Study of Mamba-based Language Models
  13. Longhorn: State Space Models are Amortized Online Learners
  14. Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism

Distillation

  1. Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models [code]
  2. The Mamba in the Llama: Distilling and Accelerating Hybrid Models
  3. Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
  4. Retrieval-Aware Distillation for Transformer-SSM Hybrids

Architectures

Core SSM / Mamba Architectures

  1. Long Range Language Modeling via Gated State Spaces (ICLR 2023)
  2. S5: Simplified State Space Layers for Sequence Modeling (ICLR 2023) [code]
  3. Liquid Structural State-Space Models (ICLR 2023)
  4. Pretraining Without Attention [code]
  5. MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts [code]
  6. BlackMamba: Mixture of Experts for State-Space Models [code]
  7. Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
  8. DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models [code]
  9. S7: Selective and Simplified State Space Layers for Sequence Modeling
  10. Sparsified State-Space Models are Efficient Highway Networks [code]
  11. Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection

Hybrid SSM-Transformer Architectures

  1. Efficient Long Sequence Modeling via State Space Augmented Transformer
  2. Block-State Transformers
  3. Jamba: A Hybrid Transformer-Mamba Language Model
  4. Zamba: A Compact 7B SSM Hybrid Model
  5. Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
  6. Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
  7. Hymba: A Hybrid-head Architecture for Small Language Models
  8. The Zamba2 Suite: Technical Report
  9. Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Related Efficient Sequence Models

Efficient sequence models that are not SSMs, or are only loosely related to them, but are commonly compared with SSMs, Mamba, and other recurrent or linear-time architectures.

  1. Mega: Moving Average Equipped Gated Attention
  2. Hyena Hierarchy: Towards Larger Convolutional Language Models
  3. Resurrecting Recurrent Neural Networks for Long Sequences (ICML 2023)
  4. RWKV: Reinventing RNNs for the Transformer Era
  5. Retentive Network: A Successor to Transformer for Large Language Models
  6. Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  7. xLSTM: Extended Long Short-Term Memory
  8. Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
  9. Gated Delta Networks: Improving Mamba2 with Delta Rule
  10. Raven: High-Recall Sequence Modeling with Sparse Memory Routing [code]
  11. Kimi Linear: An Expressive, Efficient Attention Architecture [code]

SSM Parameterization and Initialization

  1. Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks (NeurIPS 2019)
  2. HiPPO: Recurrent Memory with Optimal Polynomial Projections
  3. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers (NeurIPS 2021)
  4. Efficiently Modeling Long Sequences with Structured State Spaces (ICLR 2022)
  5. Diagonal State Spaces are as Effective as Structured State Spaces (NeurIPS 2022) [code]
  6. On the Parameterization and Initialization of Diagonal State Space Models (NeurIPS 2022)
  7. How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections (ICLR 2023)
  8. Robustifying State-space Models for Long Sequences via Approximate Diagonalization
  9. StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
  10. Spectral State Space Models
  11. From Generalization Analysis to Optimization Designs for State Space Models (ICML 2024)

Systems Optimizations

  1. Quamba: A Post-Training Quantization Recipe for Selective State Space Models
  2. Marconi: Prefix Caching for the Era of Hybrid LLMs (MLSys 2025)
  3. MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
  4. Mamba Drafters for Speculative Decoding

Vision

  1. Long movie clip classification with state-space video models (ECCV 2022) [code]
  2. S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces (NeurIPS 2022)
  3. Efficient Movie Scene Detection using State-Space Transformers (CVPR 2023)
  4. Selective Structured State-Spaces for Long-Form Video Understanding (CVPR 2023)
  5. 2-D SSM: A General Spatial Layer for Visual Transformers [code]
  6. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model [code]
  7. VMamba: Visual State Space Model [code]
  8. U-shaped Vision Mamba for Single Image Dehazing [code]
  9. State Space Models for Event Cameras (CVPR 2024 Spotlight) [code]
  10. MambaIR: A Simple Baseline for Image Restoration with State-Space Model
  11. Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning [code]
  12. VideoMamba: State Space Model for Efficient Video Understanding
  13. Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM [code]
  14. LocalMamba: Visual State Space Model with Windowed Selective Scan [code]
  15. MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models [code]
  16. ZigMa: Zigzag Mamba Diffusion Model (ECCV 2024) [code] [website]
  17. MambaOut: Do We Really Need Mamba for Vision?
  18. SUM: Saliency Unification through Mamba for Visual Attention Modeling [code]
  19. MambaVision: A Hybrid Mamba-Transformer Vision Backbone
  20. DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding [code]

Language

  1. Hungry Hungry Hippos: Towards Language Modeling with State Space Models (ICLR 2023) [code]
  2. MambaByte: Token-free Selective State Space Model [code]
  3. LOCOST: State-Space Models for Long Document Abstractive Summarization [code]
  4. Falcon Mamba: The First Competitive Attention-free 7B Language Model
  5. Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing
  6. LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement

Audio

  1. It's Raw! Audio Generation with State-Space Models (ICML 2022) [code]
  2. Structured State Space Decoder for Speech Recognition and Synthesis
  3. Diagonal State Space Augmented Transformers for Speech Recognition
  4. Multi-Head State Space Model for Speech Recognition
  5. A Neural State-Space Model Approach to Efficient Speech Separation
  6. Spiking Structured State Space Model for Monaural Speech Enhancement
  7. Augmenting conformers with structured state space models for online speech recognition
  8. Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation [code]
  9. SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model [code]
  10. Audio Mamba: Bidirectional State Space Model for Audio Representation Learning [code]
  11. Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis [code]

Time-Series

  1. Deep Kalman Filters
  2. Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data
  3. Structured Inference Networks for Nonlinear State Space Models
  4. Deep State Space Models for Time Series Forecasting (NeurIPS 2018)
  5. FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting (NeurIPS 2022)
  6. Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models
  7. Deep Latent State Space Models for Time-Series Generation (ICML 2023)
  8. Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series
  9. Effectively modeling time series with simple discrete state spaces (ICLR 2023)
  10. Generative AI for End-to-End Limit Order Book Modelling (ICAIF 2023)
  11. On the Performance of Legendre State-Space Models in Short-Term Time Series Forecasting (CCECE 2023)
  12. Is Mamba Effective for Time Series Forecasting?

Medical

  1. Improving the Diagnosis of Psychiatric Disorders with Self-Supervised Graph State Space Models
  2. fMRI-S4: learning short- and long-range dynamic fMRI dependencies using 1D Convolutions and State Space Models
  3. Modeling Multivariate Biosignals with Graph Neural Networks and Structured State Space
  4. Diffusion-based conditional ECG generation with structured state space models
  5. Structured State Space Models for Multiple Instance Learning in Digital Pathology
  6. U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation [code]
  7. SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation [code]
  8. MambaMorph: a Mamba-based Backbone with Contrastive Feature Learning for Deformable MR-CT Registration [code]
  9. Vivim: a Video Vision Mamba for Medical Video Object Segmentation [code]
  10. VM-UNet: Vision Mamba UNet for Medical Image Segmentation
  11. nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model
  12. Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation [code]
  13. MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation
  14. ViM-UNet: Vision Mamba for Biomedical Segmentation (MIDL 2024)
  15. I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling [code]
  16. BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba
  17. MambaRoll: Physics-Driven Autoregressive State Space Models for Medical Image Reconstruction [code]

Genomics / Biology

  1. Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

Tabular

  1. MambaTab: A Plug-and-Play Model for Learning Tabular Data
  2. Mambular: A Sequential Model for Tabular Deep Learning

Reinforcement Learning

  1. Structured State Space Models for In-Context Reinforcement Learning (NeurIPS 2023)
  2. Decision S4: Efficient Sequence-Based RL via State Spaces Layers (ICLR 2023)
  3. Mastering Memory Tasks with World Models (ICLR 2024 oral)

Contributions

🎉 Thank you for considering contributing to our Awesome State Space Models for Machine Learning repository! 🚀

Contribute in 3 Steps:

  1. Fork the Repo: Fork this repo to your GitHub account.

  2. Edit Content: Contribute by adding new resources or improving existing content in the README.md file.

  3. Create a Pull Request: Open a pull request (PR) from your branch to the main repository.

Guidelines

  • Follow the existing structure and formatting.
  • Ensure added resources are relevant to State Space Models in Machine Learning.
  • Verify that links work correctly.

Reporting Issues

If you encounter issues or have suggestions, open an issue on the GitHub repository.

Your contributions make this repository awesome! Thank you! 🙌

License

This project is licensed under the MIT License.

About

Reading list for research topics in state-space models
