gty111

🎯

Focusing is all you need

Tianyu Guo gty111

Ph.D. student at Sun Yat-sen University, former intern @Tencent. Interests: simulators, GPUs, computer architecture, AI infrastructure, MLSys

111 followers · 107 following

Sun Yat-sen University
Guangzhou
https://gty111.github.io/info/
https://orcid.org/0009-0005-2979-4486

Lists (19)

AI: 46 repositories
Benchmark: 14 repositories
Compiler & DSL: 19 repositories
CV & CG: 10 repositories
Diffusion
Framework: 38 repositories
Hardware
HPC: 42 repositories
Instrumention&Reverse&Assemble
LAB
Math
NLP: 85 repositories
Operating Systems
Recommendation
ROCM: 32 repositories
Simulators: 11 repositories
Template & Theme
Tools: 38 repositories
Tutorial & Examples: 45 repositories

Stars

gty111 / gLLM

gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

Python · 41 stars · 3 forks · Updated Sep 29, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 11,800 stars · 909 forks · Updated Sep 30, 2025

LLaVA-VL / LLaVA-NeXT

Python · 4,296 stars · 408 forks · Updated Sep 14, 2025

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 7,149 stars · 610 forks · Updated Oct 10, 2025

ML-GSAI / LLaDA

Official PyTorch implementation for "Large Language Diffusion Models"

Python · 3,019 stars · 201 forks · Updated Sep 30, 2025

kserve / kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Python · 4,629 stars · 1,268 forks · Updated Oct 10, 2025

llm-d / llm-d

llm-d enables high-performance distributed LLM inference on Kubernetes

Makefile · 1,860 stars · 189 forks · Updated Oct 9, 2025

IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python · 2,195 stars · 182 forks · Updated Mar 27, 2024

MoonshotAI / Kimi-K2

Kimi K2 is the large language model series developed by Moonshot AI team

8,317 stars · 548 forks · Updated Sep 11, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 14,157 stars · 2,523 forks · Updated Oct 10, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ · 1,874 stars · 133 forks · Updated Oct 10, 2025

mit-han-lab / duo-attention

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python · 494 stars · 34 forks · Updated Feb 10, 2025

QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python · 19,476 stars · 1,623 forks · Updated Sep 30, 2025

LMCache / LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Python · 5,508 stars · 630 forks · Updated Oct 10, 2025

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python · 1,161 stars · 103 forks · Updated Oct 2, 2025

HumanMLLM / LLaVA-Scissor

The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Python · 109 stars · 1 fork · Updated Jul 1, 2025

thu-ml / SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda · 2,502 stars · 238 forks · Updated Oct 8, 2025

flexflow / flexflow-serve

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ · 63 stars · 5 forks · Updated Sep 15, 2025

Gen-Verse / MMaDA

[NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python · 1,422 stars · 71 forks · Updated Sep 19, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python · 2,868 stars · 304 forks · Updated Mar 10, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,102 stars · 143 forks · Updated Mar 21, 2025

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python · 1,276 stars · 196 forks · Updated Mar 24, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,590 stars · 948 forks · Updated Oct 10, 2025

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 15,152 stars · 1,091 forks · Updated Oct 10, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,921 stars · 284 forks · Updated May 15, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python · 3,832 stars · 232 forks · Updated Oct 6, 2025

ikatyang / emoji-cheat-sheet

A markdown version emoji cheat sheet

TypeScript · 13,386 stars · 4,563 forks · Updated Oct 10, 2025

ollama / ollama

Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

Go · 153,850 stars · 13,360 forks · Updated Oct 10, 2025

sgl-project / sgl-learning-materials

Materials for learning SGLang

597 stars · 48 forks · Updated Oct 1, 2025

LLMServe / SwiftTransformer

High performance Transformer implementation in C++.

C++ · 135 stars · 16 forks · Updated Jan 18, 2025