Sequence Modeling · Carnegie Mellon University
Mamba: A Serious Attempt to Challenge Attention on Long Sequences
Mamba makes state space models selective: their parameters depend on the input, so the model can decide what to remember and what to forget, while compute still scales linearly with sequence length.
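
To make "selective" concrete, here is a minimal single-channel sketch of a recurrence with input-dependent parameters. It is an illustration under simplifying assumptions, not Mamba's actual implementation. The function name `selective_scan` and the parameters `w_delta`, `b_delta`, `w_B`, and `w_C` are hypothetical. The state matrix is assumed diagonal, the input matrix uses a simple Euler-style discretization, and the loop is sequential pure Python rather than a parallel GPU scan.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, A, w_delta, b_delta, w_B, w_C):
    """Toy selective state space recurrence over a scalar sequence.

    xs               : list of floats, the input sequence
    A                : list of negative floats, diagonal state matrix (length N)
    w_delta, b_delta : scalars that map the input to a step size (hypothetical)
    w_B, w_C         : lists of length N that map the input to B_t and C_t
                       (hypothetical stand-ins for learned projections)
    """
    n = len(A)
    h = [0.0] * n  # hidden state, fixed size regardless of sequence length
    ys = []
    for x in xs:  # one pass over the sequence: O(L * N) work
        # Input-dependent parameters: this is the "selective" part.
        delta = softplus(w_delta * x + b_delta)
        B = [w * x for w in w_B]
        C = [w * x for w in w_C]
        for i in range(n):
            a_bar = math.exp(delta * A[i])  # decay in (0, 1): how much to keep
            b_bar = delta * B[i]            # how strongly to write the input
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(C[i] * h[i] for i in range(n)))
    return ys


if __name__ == "__main__":
    print(selective_scan([1.0, 0.5, -0.2], [-1.0, -2.0], 1.0, 0.0,
                         [1.0, 1.0], [1.0, 1.0]))
```

Because the step size `delta` is computed from the current input, a large step pushes `a_bar` toward 0 and overwrites the state, while a small step keeps `a_bar` near 1 and preserves it. The state `h` has a fixed size, so the cost of the loop grows linearly with the length of `xs`.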