CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 266 20 Updated Dec 20, 2025

An extremely fast Python type checker and language server, written in Rust.

Python 15,430 163 Updated Dec 20, 2025
Python 1,896 152 Updated Dec 21, 2025
JavaScript 21 Updated Dec 18, 2025

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 692 89 Updated Dec 21, 2025

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python 62 2 Updated Dec 2, 2025

Open ABI and FFI for Machine Learning Systems

C++ 257 43 Updated Dec 20, 2025

Improved build system generator for CPython C, C++, Cython and Fortran extensions

Python 526 125 Updated Dec 15, 2025

Open-source implementation of AlphaEvolve

Python 4,933 759 Updated Dec 20, 2025

Perplexity open source garden for inference technology

Rust 307 25 Updated Dec 9, 2025
Jinja 15 2 Updated Dec 4, 2025

A language-model–powered compressor for natural language text

Python 49 2 Updated Oct 23, 2025

Pie: Programmable LLM Serving

Rust 81 11 Updated Dec 20, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 433 14 Updated Dec 16, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,284 112 Updated Dec 16, 2025
440 9 Updated Aug 10, 2025

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 5,032 503 Updated Dec 20, 2025

KV cache store for distributed LLM inference

C++ 376 32 Updated Nov 13, 2025

Repo for OSDI 2023 paper: "Ship your Critical Section Not Your Data: Enabling Transparent Delegation with TCLocks"

C 21 3 Updated Nov 6, 2024

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 563 120 Updated Dec 18, 2025

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,132 105 Updated Dec 21, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 723 73 Updated Nov 30, 2025

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.

C++ 2,149 222 Updated Dec 21, 2025

WaferLLM: Large Language Model Inference at Wafer Scale

Python 77 11 Updated Oct 31, 2025

[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation

Python 565 31 Updated Nov 11, 2025

Tutorials for NVIDIA CUPTI samples

C++ 45 9 Updated Nov 3, 2025

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Python 1,049 75 Updated Nov 25, 2025