CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.

Python 523 65 Updated Jun 17, 2026
Python 6,955 925 Updated Jun 12, 2026

Transforming cold farewells into warm skills? It's giving rebirth era. Welcome to Digital Life 1.0. 🫶

Python 19,649 1,943 Updated Jun 1, 2026

An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.

Rust 194,120 109,928 Updated Jun 8, 2026

Sample code for my CUDA programming book

Cuda 2,072 388 Updated Dec 14, 2025

[CVPR2026] BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers

Python 37 3 Updated Mar 17, 2026

Model Compression Toolbox for Large Language Models and Diffusion Models

Python 788 95 Updated Aug 14, 2025

NVFP4 Flash-Attention 4 on Blackwell

Python 13 1 Updated Jun 21, 2026

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 314 23 Updated May 31, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,418 143 Updated Mar 19, 2026

Let coding agents use ncu skills to analyze CUDA programs automatically!

Shell 112 8 Updated May 25, 2026

Machine Learning Engineering Open Book

Python 18,156 1,152 Updated May 18, 2026

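OpenLovart is an AI-based design platform that makes creative design simple and powerful. Use AI conversation and a smart canvas to quickly bring your design ideas to life.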

TypeScript 261 75 Updated Jan 23, 2026

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 4,295 333 Updated Jun 13, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,536 265 Updated Jun 17, 2026

flex-block-attn: an efficient block sparse attention computation library

Jupyter Notebook 130 14 Updated Dec 26, 2025

HunyuanVideo-1.5: A leading lightweight video generation model

Python 4,492 229 Updated Apr 10, 2026

Learn CUDA with PyTorch

Cuda 336 50 Updated Jun 1, 2026

LLM training in simple, raw C/CUDA

Cuda 30,286 3,658 Updated Jun 26, 2025
Cuda 2 Updated Sep 22, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,482 18,291 Updated Jun 21, 2026
9 Updated Nov 10, 2025
C++ 3 3 Updated Nov 5, 2023

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Python 569 27 Updated Jun 13, 2026

Large Language Model (LLM) Systems Paper List

2,141 110 Updated Jun 21, 2026

A curated list of recent efficient video generation methods.

72 3 Updated Oct 7, 2025

Puzzles for learning Triton; play them with minimal environment configuration!

Python 2 Updated Aug 6, 2025