Starred repositories
A disk cache for using DuckDB to access data lakes (DuckLake, Iceberg, Delta)
⚡ Super fast clustering for high-dimensional vectors on CPUs (x86, ARM) and GPUs — for Python and C++. 100x faster clustering of vector embeddings than FAISS
Fast and memory-efficient exact k-means
Build ultra-fast, tiny, cross-platform desktop apps with TypeScript.
Pure MLX implementations of UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, MMAE, and NNDescent for Apple Silicon. Metal GPU for computation and video rendering.
UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn.
100M tokens. Infinite compute. Lowest val loss wins.
Various ML tidbits in Python/PyTorch and C++
Official implementation of ViT-5: Vision Transformers for The Mid-2020s
Code and models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
Helpful kernel tutorials and examples for tile-based GPU programming
FLA but cuTile
LLaDA2.0 is the diffusion language model series developed by InclusionAI team, Ant Group.
An interface library for RL post training with environments.
Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch)
An open source library designed to provide community examples of Joint Embedding Predictive Architectures (JEPAs). It contains code and examples for learning representations from images, video, and…
Bf-Tree is a modern read-write-optimized concurrent larger-than-memory range index in Rust from MS Research.