@epwalsh
Stars
🚀 Efficient implementations of state-of-the-art linear attention models
Ship correct and fast LLM kernels to PyTorch
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Simple and efficient DeepSeek V3 SFT using pipeline parallelism and expert parallelism, with both FP8 and BF16 training
Lightweight yet powerful formatter plugin for Neovim
Primary and community-submitted packages for webinstall.dev
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
Configuration with dataclasses + YAML + argparse. Fork of Pyrallis
Elegant easy-to-use neural networks + scientific computing in JAX. https://docs.kidger.site/equinox/
A simple, performant and scalable Jax LLM!
PyTorch emulation library for Microscaling (MX)-compatible data formats
PyTorch building blocks for the OLMo ecosystem
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
GPU programming related news and material links
Ring attention implementation with flash attention
Efficient Triton Kernels for LLM Training
PyTorch implementation of models from the Zamba2 series.
For optimization algorithm research and development.
Tips for Writing a Research Paper using LaTeX
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
PyTorch native quantization and sparsity for training and inference
Simple, safe way to store and distribute tensors