Conless

180 results for source starred repositories

A Quirky Assortment of CuTe Kernels

Python 781 75 Updated Feb 4, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 818 60 Updated Jan 14, 2026

An extremely fast Python type checker and language server, written in Rust.

Python 17,004 208 Updated Feb 4, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,304 410 Updated Jan 19, 2026

JavaScript 23 Updated Jan 28, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 739 102 Updated Feb 4, 2026

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python 87 3 Updated Dec 2, 2025

Open ABI and FFI for Machine Learning Systems

C++ 330 54 Updated Feb 3, 2026

Improved build system generator for CPython C, C++, Cython and Fortran extensions

Python 527 125 Updated Jan 26, 2026

Open-source implementation of AlphaEvolve

Python 5,311 832 Updated Feb 4, 2026

Perplexity open source garden for inference technology

Rust 359 28 Updated Dec 25, 2025

Jinja 18 2 Updated Dec 4, 2025

A language-model–powered compressor for natural language text

Python 49 2 Updated Oct 23, 2025

Pie: Programmable LLM Serving

Python 121 15 Updated Feb 4, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 441 15 Updated Feb 4, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,333 124 Updated Jan 31, 2026

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 9,801 835 Updated Jan 30, 2026

KV cache store for distributed LLM inference

C++ 390 34 Updated Nov 13, 2025

Repo for OSDI 2023 paper: "Ship your Critical Section Not Your Data: Enabling Transparent Delegation with TCLocks"

C 21 3 Updated Nov 6, 2024

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 676 154 Updated Feb 3, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,204 118 Updated Feb 4, 2026

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 772 85 Updated Jan 10, 2026

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.

C++ 2,226 227 Updated Feb 4, 2026

WaferLLM: Large Language Model Inference at Wafer Scale

Python 87 11 Updated Jan 7, 2026

[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation

Python 579 32 Updated Nov 11, 2025

Tutorials for NVIDIA CUPTI samples

C++ 50 10 Updated Nov 3, 2025

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Python 1,208 82 Updated Jan 30, 2026