Stars
The best-benchmarked open-source AI memory system. And it's free.
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators.
A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments.
[ACL 2026] OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction https://arxiv.org/abs/2604.25602
SGLang is a high-performance serving framework for large language models and multimodal models.
DefTruth / CUDA-Learn-Notes
Forked from xlite-dev/LeetCUDA
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
FlashMLA: Efficient Multi-head Latent Attention Kernels
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
A machine learning compiler for GPUs, CPUs, and ML accelerators
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
Production-ready platform for agentic workflow development.
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Dynamic Memory Management for Serving LLMs without PagedAttention
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
Vane is an AI-powered answering engine.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.