Accelerating MoE with IO and Tile-aware Optimizations

Python 582 52 Updated Feb 6, 2026

Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

Python 1,838 197 Updated Oct 4, 2025

[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation

Python 581 32 Updated Nov 11, 2025

NVIDIA FastGen: Fast Generation from Diffusion Models

Python 527 29 Updated Jan 28, 2026

d3LLM: Ultra-Fast Diffusion LLM 🚀

Python 91 2 Updated Feb 4, 2026

Jacobi Forcing: Fast and Accurate Diffusion-style Decoding

Python 154 6 Updated Jan 3, 2026

[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning

Python 65 6 Updated Oct 31, 2025

A feature-rich command-line audio/video downloader

Python 146,756 11,888 Updated Feb 9, 2026

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 32,752 6,757 Updated Feb 12, 2026

Flash Multi-Head Feed-Forward Networks

Python 3 Updated Dec 24, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,637 145 Updated Feb 12, 2026

Helpful kernel tutorials and examples for tile-based GPU programming

Python 639 45 Updated Feb 12, 2026

PCL Community Edition, maintained and managed by community developers

Visual Basic .NET 3,558 54 Updated Feb 11, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 70,111 13,396 Updated Feb 12, 2026

Introduction to Machine Learning Systems

JavaScript 18,115 2,123 Updated Feb 11, 2026

🚀🚀 Efficient implementations of Native Sparse Attention

Python 1,044 13 Updated Sep 29, 2025

Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch

Python 1,934 203 Updated Feb 9, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 23,520 4,411 Updated Feb 12, 2026

Keras implementation of Finite Scalar Quantization

Python 84 5 Updated Oct 31, 2023

Aims to teach Python3 by example

Python 573 70 Updated May 14, 2020

Optimized primitives for collective multi-GPU communication

C++ 4,445 1,134 Updated Feb 3, 2026

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python 3,682 220 Updated Feb 9, 2026

Model Compression Toolbox for Large Language Models and Diffusion Models

Python 753 85 Updated Aug 14, 2025

A unified inference and post-training framework for accelerated video generation.

Python 3,076 261 Updated Feb 10, 2026

Tile primitives for speedy kernels

Cuda 3,138 237 Updated Feb 10, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 5,161 447 Updated Feb 11, 2026