Starred repositories
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models
Fast and memory-efficient exact k-means
Build ultra fast, tiny, and cross-platform desktop apps with TypeScript.
Pure MLX implementations of UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, MMAE, and NNDescent for Apple Silicon. Metal GPU for computation and video rendering.
UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn.
100M tokens. Infinite compute. Lowest val loss wins.
Various ML tidbits in Python/PyTorch and C++
Official implementation of ViT-5: Vision Transformers for The Mid-2020s
Code and models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
Helpful kernel tutorials and examples for tile-based GPU programming
FLA but cuTile
LLaDA2.0 is the diffusion language model series developed by InclusionAI team, Ant Group.
An interface library for RL post training with environments.
Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch)
An open source library designed to provide community examples of Joint Embedding Predictive Architectures (JEPAs). It contains code and examples for learning representations from images, video, and…
Bf-Tree is a modern read-write-optimized concurrent larger-than-memory range index in Rust from MS Research.
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
High-performance zero-copy tensor serialization for Fastest Transmission