- Shanghai Jiao Tong University
- Seattle, WA ⇌ Shanghai, China
- 15:36 (UTC -08:00)
- https://conless.dev/
- @conlesspan
Highlights
- Pro
Stars
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
An extremely fast Python type checker and language server, written in Rust.
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
Improved build system generator for CPython C, C++, Cython and Fortran extensions
Open-source implementation of AlphaEvolve
Perplexity's open-source garden for inference technology
A language-model–powered compressor for natural language text
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Distributed Compiler based on Triton for Parallel Systems
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
KV cache store for distributed LLM inference
Repo for OSDI 2023 paper: "Ship your Critical Section Not Your Data: Enabling Transparent Delegation with TCLocks"
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
WaferLLM: Large Language Model Inference at Wafer Scale
[NeurIPS 2025] Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning