Xiaoyu Zhai ryantd

🏎️

Senior MLE @kwai, @StevensInstituteOfTechnology Alumni

68 followers · 254 following

Starred repositories

bytedance / deer-flow

An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of…

Python 73,295 9,907 Updated Jun 23, 2026

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,709 1,063 Updated Apr 30, 2026

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,631 871 Updated Jun 22, 2026

facebookresearch / lingua

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,760 273 Updated Jul 18, 2025

microsoft / superbenchmark

A validation and profiling tool for AI infrastructure

Python 377 88 Updated Jun 22, 2026

xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,638 321 Updated Jun 22, 2026

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 17,313 1,321 Updated Jun 22, 2026

BayesWatch / sequential-imagenet-dataloader

A plug-in replacement for DataLoader to load Imagenet disk-sequentially in PyTorch.

Python 239 20 Updated Aug 18, 2021

jnwatson / py-lmdb

Universal Python binding for the LMDB 'Lightning' Database

C 741 122 Updated Jun 9, 2026

layerism / brpc_faiss_server

Vector Search Engine based on BRPC + FAISS

C++ 152 52 Updated Oct 21, 2019

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda 30,302 3,657 Updated Jun 26, 2025

xai-org / grok-1

Grok open release

Python 51,692 8,472 Updated Aug 30, 2024

openai / transformer-debugger

Python 4,123 241 Updated Apr 15, 2026

google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Python 5,697 598 Updated May 30, 2025

vra / flopth

A simple program to calculate and visualize the FLOPs and Parameters of Pytorch models, with handy CLI and easy-to-use Python API.

Python 131 10 Updated Nov 23, 2024

karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 10,583 1,071 Updated Jul 1, 2024

facebookresearch / DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,636 796 Updated May 31, 2024

cybertronai / gradient-checkpointing

Make huge neural nets fit in memory

Python 2,840 279 Updated Apr 26, 2020

gpuopenanalytics / pynvml

Provide Python access to the NVML library for GPU diagnostics

Python 270 34 Updated Sep 5, 2025

HazyResearch / aisys-building-blocks

Building blocks for foundation models.

627 27 Updated Jan 3, 2024

RAIVNLab / MRL

Code repository for the paper - "Matryoshka Representation Learning"

Jupyter Notebook 641 43 Updated Feb 19, 2024

cli99 / flops-profiler

pytorch-profiler

Python 49 8 Updated Jun 1, 2023

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 1,092 88 Updated Sep 4, 2024

hpcaitech / SwiftInfer

Efficient AI Inference & Serving

Python 480 31 Updated Jan 8, 2024

OpenNLPLab / lightning-attention

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Python 346 30 Updated Feb 23, 2025

huggingface / optimum-nvidia

Python 1,037 104 Updated May 26, 2026

opendilab / awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

4,397 254 Updated May 20, 2026

thuml / depyf

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 810 29 Updated Oct 13, 2025

apple / ml-ferret

Python 8,680 519 Updated Oct 9, 2024

Tiiny-AI / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ 9,577 582 Updated May 11, 2026
