Anyscale
- United States
Stars
MSCCL++: A GPU-driven communication stack for scalable AI applications
Lightweight coding agent that runs in your terminal
You like pytorch? You like micrograd? You love tinygrad! ❤️
《Machine Learning Systems: Design and Implementation》 (V2 is launching soon)
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
SGLang is a high-performance serving framework for large language models and multimodal models.
How to optimize some algorithms in CUDA.
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
A high-performance inference system for large language models, designed for production environments.
FlashInfer: Kernel Library for LLM Serving
A list of awesome compiler projects and papers for tensor computation and deep learning.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Fast inference from large language models via speculative decoding
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
CUDA templates for tile-sparse matrix multiplication based on CUTLASS.