ocss884
🎯
Focusing

Organizations

@camel-ai

C++ 17 4 Updated Apr 9, 2026

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 9,066 2,311 Updated Mar 30, 2026

Entropy Based Sampling and Parallel CoT Decoding

Python 3,431 321 Updated Nov 13, 2024

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 35,345 3,512 Updated Apr 9, 2026

Helpful kernel tutorials and examples for tile-based GPU programming

Python 695 59 Updated Apr 9, 2026

Machine Learning Engineering Open Book

Python 17,651 1,119 Updated Mar 16, 2026

FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels

Python 157 122 Updated Apr 3, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 2,013 130 Updated Apr 4, 2026

Learning TileLang with 10 puzzles!

Python 168 20 Updated Mar 31, 2026

A Quirky Assortment of CuTe Kernels

Python 918 107 Updated Apr 8, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 625 68 Updated Apr 1, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 696 41 Updated Mar 8, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,316 856 Updated Mar 22, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,264 688 Updated Apr 9, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,361 879 Updated Apr 9, 2026

🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench

Python 134 6 Updated Nov 10, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 5,475 499 Updated Apr 8, 2026

Fast, Flexible and Portable Structured Generation

C++ 1,620 136 Updated Apr 9, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 470 21 Updated Apr 9, 2026

PyTorch emulation library for Microscaling (MX)-compatible data formats

Python 351 48 Updated Jun 18, 2025

Multi-GPU CUDA stress test

C++ 2,146 401 Updated Nov 4, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,001 2,061 Updated Mar 27, 2026

Financial data platform for analysts, quants and AI agents.

Python 65,628 6,515 Updated Apr 9, 2026

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 100,741 12,998 Updated Apr 9, 2026

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 768 200 Updated Apr 2, 2026

Fast and memory-efficient exact attention

Python 23,247 2,602 Updated Apr 8, 2026

Visualize and post-hoc analyze RL training for debugging and understanding

TypeScript 6 Updated Jul 23, 2025

An intuitive and low-overhead instrumentation tool for Python

Python 1,203 41 Updated Jul 8, 2025

Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.

Python 287 51 Updated Apr 2, 2026