Stars
Implementation of the dynamic chunking mechanism in H-Net by Hwang et al. of Carnegie Mellon
Complete HSK 2.0/3.0 (汉语水平考试) vocabulary lists in JSON
H-Net: Hierarchical Network with Dynamic Chunking
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
Muon is an optimizer for hidden layers in neural networks
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8
What would you do with 1000 H100s...
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
A Triton Kernel for incorporating Bi-Directionality in Mamba2
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
Tile primitives for speedy kernels
Repository of Transformer-based PyTorch time series models
Annotated version of the Mamba paper
UT-Sarulab MOS prediction system using SSL models
RNA-seq prediction with deep convolutional neural networks.
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
A beautiful, simple, clean, and responsive Jekyll theme for academics
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Understand and test language model architectures on synthetic tasks.
A high-throughput and memory-efficient inference and serving engine for LLMs
📋 A list of open LLMs available for commercial use.
Some preliminary explorations of Mamba's context scaling.
CUDA Templates and Python DSLs for High-Performance Linear Algebra