Starred repositories
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
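The TurboQuant description mentions 3-bit keys and 2-bit values. As a rough illustration of what low-bit KV cache quantization means, here is a minimal symmetric round-to-nearest sketch in NumPy; it is a generic toy, not TurboQuant's actual (near-optimal) scheme or its Triton kernels:

```python
import numpy as np

def quantize(x, bits):
    # Symmetric uniform quantization to `bits` bits (toy per-tensor scale;
    # real KV quantizers typically use per-channel or per-token scales).
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the integer codes.
    return q.astype(np.float32) * scale

np.random.seed(0)
keys = np.random.randn(4, 8).astype(np.float32)  # stand-in for cached keys
qk, sk = quantize(keys, 3)                       # 3-bit keys, as in the blurb
recon = dequantize(qk, sk)
max_err = np.abs(keys - recon).max()             # bounded by half a quant step
```

With round-to-nearest, the reconstruction error for in-range values is at most `scale / 2`, which is why fewer bits (a larger step) trade memory for accuracy.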
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Strong and Open Vision Language Assistant for Mobile Devices
My learning notes for ML SYS.
[IEEE TCSVT'26] 🂡 AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation
This repository contains low-bit quantization papers from 2020 to 2025 at top conferences.
An official implementation of "Scheduling Weight Transitions for Quantization-Aware Training" (ICCV 2025) in PyTorch.
Virtual whiteboard for sketching hand-drawn like diagrams
[Information Fusion 2025] A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
Efficient vision foundation models for high-resolution generation and perception.
EfficientSAM3 compresses SAM3 into lightweight, edge-friendly models via progressive knowledge distillation for fast promptable concept segmentation and tracking.
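EfficientSAM3's "progressive knowledge distillation" builds on the standard distillation objective: train a small student to match a large teacher's softened output distribution. A minimal sketch of that loss (the classic temperature-scaled KL term, not EfficientSAM3's specific pipeline):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) at temperature T, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures (Hinton-style KD).
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

The loss is zero when student and teacher logits agree and grows as their softened distributions diverge; "progressive" schemes typically apply such a loss stage by stage through the network.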
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
A curated list of foundation models for vision and language tasks
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
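The core loop behind speculative decoding: a cheap draft model proposes a block of tokens, and the target model verifies them, keeping the longest agreeing prefix plus its own correction at the first mismatch. A toy greedy version with hypothetical stand-in models (real systems like SGLang verify probabilistically and batch the target's forward pass):

```python
def draft_model(prefix):
    # Hypothetical cheap model: just repeats the last token.
    return prefix[-1]

def target_model(prefix):
    # Hypothetical strong model: repeats the last token on even-length
    # prefixes, emits 0 otherwise (arbitrary, for illustration only).
    return prefix[-1] if len(prefix) % 2 == 0 else 0

def speculative_step(prefix, k=4):
    # 1) Draft proposes k tokens autoregressively.
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_model(proposal))
    # 2) Target verifies each proposed position in order.
    accepted = list(prefix)
    for i in range(len(prefix), len(proposal)):
        t = target_model(proposal[:i])
        if t == proposal[i]:
            accepted.append(t)          # draft token accepted
        else:
            accepted.append(t)          # target's correction replaces the
            break                       # first mismatch; rest is discarded
    return accepted
```

For example, `speculative_step([1, 1])` accepts one drafted `1` (the target agrees there) and then substitutes the target's `0` at the first disagreement. When the draft agrees often, one target verification pass advances the sequence by several tokens.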
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
yangrudan / CUDA-Learn-Note
Forked from xlite-dev/LeetCUDA 🎉 CUDA notes / a compilation of frequently asked interview questions / C++ notes; personal notes, updated occasionally: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
This repository offers tools and guidance for fine-tuning the Siglip2 Vision Transformer (ViT) model. It includes scripts and best practices to adapt the model for custom datasets and tasks. Design…
ACL 2025: Synthetic data generation pipelines for text-rich images.