  • NVIDIA
  • Tokyo

1872 results for source starred repositories

incubator repo for CUDA-TileIR backend

MLIR 101 5 Updated Jan 13, 2026

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

Python 194 19 Updated Feb 7, 2026

MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI training and inference, such as FP8 row-wise quantization and …

Python 49 21 Updated Feb 7, 2026
Python 6 Updated Jul 17, 2025

slime is an LLM post-training framework for RL Scaling.

Python 3,706 502 Updated Feb 5, 2026

JAX support for tvm-ffi abi

C++ 23 3 Updated Dec 10, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,351 416 Updated Feb 6, 2026

Open ABI and FFI for Machine Learning Systems

C++ 333 56 Updated Feb 7, 2026
Python 159 14 Updated Dec 27, 2024
Cuda 49 10 Updated Dec 10, 2025

C/C++ hooks to integrate with pre-commit

Python 379 82 Updated Mar 20, 2024

Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without un…

Cuda 247 15 Updated Jan 29, 2026

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 4,951 332 Updated Feb 7, 2026

A Python compiler design toolkit.

Python 485 145 Updated Feb 7, 2026

Triton-based Symmetric Memory operators and examples

Python 81 12 Updated Jan 15, 2026

An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.

Python 52 15 Updated Feb 6, 2026

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 462 58 Updated Dec 31, 2025

Perplexity GPU Kernels

C++ 559 75 Updated Nov 7, 2025

Ship correct and fast LLM kernels to PyTorch

Python 141 16 Updated Jan 14, 2026

Tokamax: A GPU and TPU kernel library.

Python 169 11 Updated Feb 7, 2026

Modern, extensible Python project management

Python 7,135 361 Updated Feb 3, 2026

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python 2,065 167 Updated Aug 26, 2025
Python 65 3 Updated Apr 26, 2025

This repository contains the source code for a static website that provides documentation for each "Graph Break" identified by a Graph Break ID (GBID).

Python 4 4 Updated Feb 4, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,337 126 Updated Jan 31, 2026
Python 1,525 221 Updated Jun 26, 2025

Manages Unified Access to Generative AI Services built on Envoy Gateway

Go 1,365 162 Updated Feb 7, 2026

A Quirky Assortment of CuTe Kernels

Python 785 79 Updated Feb 7, 2026

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the Linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also …

C++ 362 79 Updated Jan 28, 2026

A syntax-highlighting pager for git, diff, grep, and blame output

Rust 28,950 470 Updated Dec 11, 2025