ocss884
🎯 Focusing

Organizations

@camel-ai


From Automated Idea Factory to Realization

Shell 578 41 Updated Apr 30, 2026

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 796 52 Updated Apr 21, 2026
C++ 20 4 Updated Apr 24, 2026

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 9,128 2,321 Updated Mar 30, 2026

Entropy Based Sampling and Parallel CoT Decoding

Python 3,431 321 Updated Nov 13, 2024

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 35,514 3,549 Updated Apr 30, 2026

Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming

Python 710 68 Updated Apr 30, 2026

Machine Learning Engineering Open Book

Python 17,835 1,133 Updated Mar 16, 2026

FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels

Python 161 130 Updated Apr 26, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 2,034 134 Updated Apr 28, 2026

Learning TileLang with 10 puzzles!

Python 240 29 Updated Apr 28, 2026

A Quirky Assortment of CuTe Kernels

Python 952 123 Updated Apr 30, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 664 80 Updated Apr 30, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 715 43 Updated Mar 8, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,149 955 Updated Apr 24, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,312 716 Updated Apr 29, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,537 946 Updated Apr 30, 2026

🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench

Python 136 5 Updated Nov 10, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 5,954 538 Updated Apr 30, 2026

Fast, Flexible and Portable Structured Generation

C++ 1,652 143 Updated Apr 29, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 482 25 Updated Apr 27, 2026

PyTorch emulation library for Microscaling (MX)-compatible data formats

Python 353 49 Updated Jun 18, 2025

Multi-GPU CUDA stress test

C++ 2,166 402 Updated Nov 4, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,059 2,072 Updated Mar 27, 2026

Financial data platform for analysts, quants and AI agents.

Python 66,793 6,677 Updated Apr 30, 2026

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 102,812 13,409 Updated Apr 30, 2026

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 814 218 Updated Apr 2, 2026

Fast and memory-efficient exact attention

Python 23,602 2,667 Updated Apr 30, 2026

Visualize and post-hoc analyze RL training for debugging and understanding

TypeScript 6 Updated Jul 23, 2025