- Ph.D. Candidate @ CUHK-MMLab, B.E. @ UCAS
- Hong Kong
- https://jf-d.github.io/
Stars
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
- Ongoing research training transformer models at scale
- An Open Source Machine Learning Framework for Everyone
- SGLang is a fast serving framework for large language models and vision language models.
- verl: Volcano Engine Reinforcement Learning for LLMs
- Tensors and Dynamic neural networks in Python with strong GPU acceleration
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- A collection of full-time roles in SWE, Quant, and PM for new grads.
- Making large AI models cheaper, faster and more accessible
- MSCCL++: A GPU-driven communication stack for scalable AI applications
- 🚀 Efficient implementations of state-of-the-art linear attention models
- Open-source high-performance RISC-V processor
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
- FlagGems is an operator library for large language models implemented in the Triton Language.
- FlashInfer: Kernel Library for LLM Serving
- The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- A high-throughput and memory-efficient inference and serving engine for LLMs
- An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
- Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
- Seamless operability between C++11 and Python
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerator kernels
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- An extremely fast, scalable memory engine and app: the Memory API for the AI era.
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
- Hackable and optimized Transformers building blocks, supporting a composable construction.
- Tile primitives for speedy kernels