gcanlin
  • Huawei
  • Shenzhen, China


Early-stage Rust drop-in alternative frontend for vLLM

Rust 26 2 Updated Apr 29, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,535 943 Updated Apr 29, 2026

MLIR-based TileLang Ascend Adapter

C++ 10 12 Updated Apr 29, 2026

Notes on AI infrastructure, inference systems, and engineering trade-offs.

Astro 3 Updated Apr 17, 2026

An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale

Python 340 31 Updated Apr 29, 2026

Bridge local AI coding agents (Claude Code, Cursor, Gemini CLI, Codex) to messaging platforms (Feishu/Lark, DingTalk, Slack, Telegram, Discord, LINE, WeChat Work). Chat with your AI dev assistant f…

Go 6,772 635 Updated Apr 28, 2026

Agent2Agent (A2A) is an open protocol enabling communication and interoperability between opaque agentic applications.

Shell 23,507 2,376 Updated Apr 29, 2026

Synchronizing Claude Code conversations across machines

Python 13 Updated Apr 21, 2026

A smarter cd command. Supports all major shells.

Rust 36,179 812 Updated Apr 13, 2026

Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control

Python 229 17 Updated Feb 26, 2026

vLLM Model plugin for the encoder-decoder BART model

Python 11 7 Updated Apr 10, 2026

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…

Python 1,700 159 Updated Apr 29, 2026

The open source coding agent.

TypeScript 152,079 17,514 Updated Apr 30, 2026

Public repository for Agent Skills

Python 126,169 14,787 Updated Apr 23, 2026

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 7,831 490 Updated Feb 10, 2026

DFlash: Block Diffusion for Flash Speculative Decoding

Python 2,423 174 Updated Apr 26, 2026

Terminal based presentation tool

Go 11,483 310 Updated Aug 21, 2024

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 12,996 1,436 Updated Mar 3, 2026

A debugging and profiling tool that can trace and visualize python code execution

Python 7,620 468 Updated Feb 16, 2026

Community maintained hardware plugin for vLLM on Apple Silicon

Python 1,051 112 Updated Apr 29, 2026

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 666 79 Updated Jan 15, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 715 43 Updated Mar 8, 2026

A high-performance and light-weight router for vLLM large scale deployment

Rust 212 74 Updated Apr 29, 2026

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 3,171 496 Updated Apr 29, 2026

A PyTorch-native inference engine with cache, parallelism, quantization for Diffusion Transformers.

Python 1,155 70 Updated Apr 29, 2026

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 33,502 6,955 Updated Apr 29, 2026

A framework for efficient model inference with omni-modality models

Python 4,558 855 Updated Apr 30, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,334 402 Updated Jan 17, 2026

MLX: An array framework for Apple silicon

C++ 25,860 1,731 Updated Apr 28, 2026

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python 502 34 Updated Nov 19, 2025