scv119
⌨️
to become a better human being
  • Anyscale
  • United States

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 532 100 Updated Jun 22, 2026

Hand-Rolled GPU communications library

Cuda 95 7 Updated Nov 25, 2025

Lightweight coding agent that runs in your terminal

Rust 92,598 13,689 Updated Jun 22, 2026

You like pytorch? You like micrograd? You love tinygrad! ❤️

Python 33,138 4,197 Updated Jun 22, 2026

Fast, flexible LLM inference

Rust 7,342 629 Updated Jun 22, 2026

LLM training in simple, raw C/CUDA

Cuda 30,298 3,658 Updated Jun 26, 2025

《Machine Learning Systems: Design and Implementation》 (V2 is launching soon)

TeX 4,811 477 Updated Mar 15, 2026

Rust for C++ programmers

Rust 3,857 296 Updated Jun 17, 2026

Grok open release

Python 51,691 8,472 Updated Aug 30, 2024

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 16,533 1,565 Updated May 26, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,514 6,646 Updated Jun 22, 2026

How to optimize some algorithms in CUDA.

Cuda 3,093 279 Updated Jun 21, 2026

Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"

PostScript 21,588 2,563 Updated Jun 30, 2025

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 6,224 573 Updated Aug 22, 2025

A high-performance inference system for large language models, designed for production environments.

C++ 500 41 Updated Dec 19, 2025

FlashInfer: Kernel Library for LLM Serving

Python 5,835 1,066 Updated Jun 22, 2026

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,760 326 Updated Oct 19, 2024

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,933 2,484 Updated Jun 22, 2026

Serving multiple LoRA finetuned LLM as one

Python 1,163 63 Updated May 8, 2024
Cuda 1 Updated Sep 16, 2023

Fast inference from large language models via speculative decoding

Python 916 96 Updated Aug 22, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 4,134 334 Updated Jun 22, 2026

row-major matmul optimization

C++ 733 94 Updated May 14, 2026

CUDA templates for tile-sparse matrix multiplication based on CUTLASS.

C++ 52 4 Updated Mar 1, 2018

JAX implementation of the Llama 2 model

Python 217 24 Updated Feb 2, 2024

Checkpoint/Restore tool

C 3,879 749 Updated Jun 21, 2026