- Hangzhou, Zhejiang, China
- https://khotyn.com/blog
- @khotyn
Starred repositories
Context7 Platform -- Up-to-date code documentation for LLMs and AI code editors
The API to search, scrape, and interact with the web at scale. 🔥
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
A simple, performant and scalable Jax LLM!
An open-source, extensible AI agent that goes beyond code suggestions: install, execute, edit, and test with any LLM
Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators.
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Data driven agentic landscapes and insights. Produced by Ant Open Source and inclusionAI.
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
iTerm2 is a terminal emulator for Mac OS X that does amazing things.
[HPCA 2026] AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
CUDA Templates and Python DSLs for High-Performance Linear Algebra
FlashMLA: Efficient Multi-head Latent Attention Kernels
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
SGLang is a high-performance serving framework for large language models and multimodal models.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Module, Model, and Tensor Serialization/Deserialization