yiliu30

427 results for source starred repositories

A connector program that links opencode sessions to Slack

TypeScript 5 Updated Feb 4, 2026

Train transformer language models with reinforcement learning.

Python 17,288 2,471 Updated Feb 5, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 837 100 Updated Feb 4, 2026
TypeScript 2,559 298 Updated Feb 5, 2026

High Performance LLM Inference Operator Library

C++ 687 56 Updated Feb 5, 2026

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Python 3,571 240 Updated Jan 14, 2026

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 2,165 233 Updated Aug 17, 2024

Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels

Python 112 7 Updated Jul 31, 2023

Implementation of FlashAttention in PyTorch

Python 180 22 Updated Jan 12, 2025

A sparse attention kernel supporting mixed sparse patterns

C++ 452 45 Updated Jan 18, 2026

[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

Scala 122 11 Updated Aug 27, 2024

Nano vLLM

Python 11,503 1,525 Updated Nov 3, 2025

Unofficial description of the CUDA assembly (SASS) instruction sets.

Python 200 19 Updated Jul 18, 2025

Nvidia Instruction Set Specification Generator

Python 311 20 Updated Jul 9, 2024

An unofficial CUDA assembler, for all generations of SASS, hopefully :)

Python 567 99 Updated Apr 20, 2023

Light Image Video Generation Inference Framework

Python 1,921 156 Updated Feb 5, 2026

vLLM Daily Summarization of Merged PRs

39 3 Updated Feb 4, 2026

[ICLR 2026] rCM: SOTA JVP-Based Diffusion Distillation & Few-Step Video Generation & Scaling Up sCM/MeanFlow

Python 524 21 Updated Feb 5, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,309 227 Updated Jan 29, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 823 60 Updated Jan 14, 2026

vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.

Python 249 43 Updated Feb 2, 2026

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 190 21 Updated Feb 5, 2026

A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…

TypeScript 20,778 2,194 Updated Feb 5, 2026

Low overhead tracing library and trace visualizer for pipelined CUDA kernels

C 130 5 Updated Nov 26, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python 630 44 Updated Feb 5, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,901 110 Updated Feb 3, 2026

A framework for efficient model inference with omni-modality models

Python 2,623 389 Updated Feb 5, 2026

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

Python 194 19 Updated Feb 5, 2026

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,472 1,209 Updated Feb 5, 2026

MLIR-based toolkit targeting Intel heterogeneous hardware

C++ 51 16 Updated Feb 5, 2026