Organizations

@DeepRec-AI

The best-benchmarked open-source AI memory system. And it's free.

Python 56,137 7,270 Updated Jun 20, 2026

An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.

Rust 194,140 109,918 Updated Jun 8, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,899 79,533 Updated Jun 22, 2026

The Modular Platform (includes MAX & Mojo)

Mojo 26,355 2,844 Updated Jun 22, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 6,527 609 Updated Jun 22, 2026

A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators.

C++ 1,354 233 Updated Jun 22, 2026

A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments.

C++ 93 32 Updated Jun 22, 2026

[ACL 2026] OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction https://arxiv.org/abs/2604.25602

Python 1,969 268 Updated Jun 22, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,519 6,649 Updated Jun 22, 2026

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 83 8 Updated Apr 26, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,709 1,063 Updated Apr 30, 2026

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 5,303 1,815 Updated Feb 26, 2025

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 4,347 843 Updated Jun 22, 2026

Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"

Python 260 27 Updated Jan 31, 2025

EMMA [TMLR 2025]

Python 14 1 Updated Sep 25, 2025

Build, run, and manage agent platforms.

Python 40,797 5,548 Updated Jun 22, 2026

Production-ready platform for agentic workflow development.

TypeScript 146,131 22,981 Updated Jun 22, 2026

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 21,657 2,311 Updated Apr 15, 2026

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.

Python 1,787 670 Updated Feb 24, 2026

A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone

Python 25,674 2,008 Updated Jun 4, 2026

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 1,373 73 Updated Jan 27, 2026

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 19,384 2,482 Updated May 30, 2026

Dynamic Memory Management for Serving LLMs without PagedAttention

C 498 42 Updated Jun 10, 2026

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda 550 90 Updated Sep 8, 2024

Vane is an AI-powered answering engine.

TypeScript 35,406 3,900 Updated Apr 11, 2026

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 161,788 33,567 Updated Jun 22, 2026

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,721 389 Updated Apr 9, 2026

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,881 2,773 Updated Aug 12, 2024