  • NVIDIA
  • Tokyo

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,230 189 Updated Dec 23, 2025

Open ABI and FFI for Machine Learning Systems

C++ 258 43 Updated Dec 23, 2025
Python 152 14 Updated Dec 27, 2024
Cuda 43 10 Updated Dec 10, 2025

C/C++ hooks to integrate with pre-commit

Python 376 82 Updated Mar 20, 2024

Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without un…

Cuda 240 14 Updated Dec 18, 2025

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 4,841 321 Updated Dec 21, 2025

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 175 46 Updated Dec 16, 2025

A Python compiler design toolkit.

Python 459 133 Updated Dec 17, 2025

Triton-based Symmetric Memory operators and examples

Python 67 11 Updated Oct 17, 2025

An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.

Python 48 13 Updated Dec 23, 2025

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 424 48 Updated Dec 20, 2025

Perplexity GPU Kernels

C++ 542 74 Updated Nov 7, 2025

Ship correct and fast LLM kernels to PyTorch

Python 127 15 Updated Dec 18, 2025

Tokamax: A GPU and TPU kernel library.

Python 142 6 Updated Dec 23, 2025

Modern, extensible Python project management

Python 7,042 358 Updated Dec 17, 2025

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python 1,927 149 Updated Aug 26, 2025
Python 65 3 Updated Apr 26, 2025

This repository contains the source code for a static website that provides documentation for each "Graph Break" identified by a Graph Break ID (GBID).

Python 4 3 Updated Dec 22, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,288 114 Updated Dec 16, 2025
Python 1,512 219 Updated Jun 26, 2025

Manages Unified Access to Generative AI Services built on Envoy Gateway

Go 1,280 141 Updated Dec 23, 2025

A Quirky Assortment of CuTe Kernels

Python 714 64 Updated Dec 23, 2025

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, GPUs, etc. Dynolog also …

C++ 356 76 Updated Dec 15, 2025

A syntax-highlighting pager for git, diff, grep, and blame output

Rust 28,453 464 Updated Dec 11, 2025

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

Python 178 15 Updated Dec 23, 2025

TORCH_LOGS parser for PT2

Rust 70 22 Updated Nov 10, 2025

A fast type checker and language server for Python

Rust 5,094 231 Updated Dec 23, 2025

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 694 89 Updated Dec 23, 2025

Universal LLM Deployment Engine with ML Compilation

Python 21,777 1,893 Updated Dec 11, 2025