QiJune

QI JUN QiJune

204 followers · 23 following

@NVIDIA
SHANG HAI

Achievements

x2 x3

Stars

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python · 5,823 stars · 1,061 forks · Updated Jun 19, 2026

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook · 965 stars · 49 forks · Updated Mar 29, 2026

microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

C · 495 stars · 42 forks · Updated Jun 10, 2026

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 5,340 stars · 406 forks · Updated Apr 20, 2026

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 13,916 stars · 2,482 forks · Updated Jun 19, 2026

huggingface / optimum-nvidia

Python · 1,038 stars · 104 forks · Updated May 26, 2026

huggingface / candle

Minimalist ML framework for Rust

Rust · 20,511 stars · 1,610 forks · Updated Jun 19, 2026

ModelTC / LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python · 4,132 stars · 334 forks · Updated Jun 18, 2026

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 83,322 stars · 18,221 forks · Updated Jun 19, 2026

meta-llama / llama

Inference code for Llama models

Python · 59,463 stars · 9,790 forks · Updated Jan 26, 2025

secretflow / scql

SCQL (Secure Collaborative Query Language) is a system that allows multiple distrusting parties to run joint analysis without revealing their private data.

Go · 181 stars · 72 forks · Updated Mar 18, 2026

FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.

Python · 9,365 stars · 591 forks · Updated Oct 28, 2024

NVIDIA / trt-samples-for-hackathon-cn

Simple samples for TensorRT programming

Python · 1,661 stars · 350 forks · Updated May 5, 2026

ELS-RD / kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook · 1,586 stars · 99 forks · Updated Jan 28, 2026

huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python · 33,884 stars · 7,066 forks · Updated Jun 19, 2026

microsoft / msccl-tools

Synthesizer for optimal collective communication algorithms

Python · 125 stars · 29 forks · Updated Apr 8, 2024

facebookresearch / metaseq

Repo for external large-scale work

Python · 6,546 stars · 718 forks · Updated Apr 27, 2024

NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT

C++ · 6,426 stars · 935 forks · Updated Mar 27, 2024

google-research / t5x

Python · 2,970 stars · 340 forks · Updated Jun 15, 2026

triton-lang / triton

Development repository for the Triton language and compiler

MLIR · 19,471 stars · 2,947 forks · Updated Jun 19, 2026

microsoft / msccl

Microsoft Collective Communication Library

C++ · 391 stars · 34 forks · Updated Sep 20, 2023

hpcaitech / EnergonAI

Large-scale model inference.

Python · 629 stars · 85 forks · Updated Sep 12, 2023

Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ · 9,400 stars · 1,016 forks · Updated Dec 4, 2025

ConnollyLeon / awesome-Auto-Parallelism

A baseline repository of Auto-Parallelism in Training Neural Networks

Python · 145 stars · 20 forks · Updated Jun 25, 2022

goplus / xgo

XGo is a programming language that reads like plain English. But it's also incredibly powerful — it lets you leverage assets from C/C++, Go, Python, and JavaScript/TypeScript, creating a unified so…

Go · 9,436 stars · 565 forks · Updated Jun 19, 2026

sql-machine-learning / elasticdl

Kubernetes-native Deep Learning Framework

Python · 745 stars · 115 forks · Updated Jan 26, 2024

alpa-projects / alpa

Training and serving large-scale neural networks with auto parallelization.

Python · 3,182 stars · 361 forks · Updated Dec 9, 2023

arogozhnikov / einops

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python · 9,524 stars · 399 forks · Updated May 31, 2026

affjljoo3581 / GPT2

PyTorch Implementation of OpenAI GPT-2

Python · 361 stars · 68 forks · Updated Jul 4, 2024

jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Python · 9,744 stars · 2,094 forks · Updated Apr 16, 2024
