zrbcool

Starred repositories

verl: Volcano Engine Reinforcement Learning for LLMs

Python 20,137 3,483 Updated Mar 23, 2026

Fully open reproduction of DeepSeek-R1

Python 25,959 2,416 Updated Nov 24, 2025

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

Python 9,226 903 Updated Mar 23, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,272 98 Updated Aug 28, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 9,062 1,129 Updated Feb 9, 2026

Accessible large language models via k-bit quantization for PyTorch.

Python 8,068 838 Updated Mar 17, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,233 670 Updated Mar 23, 2026

Zero Bubble Pipeline Parallelism

Python 452 33 Updated May 7, 2025

Pipeline Parallelism for PyTorch

Python 786 88 Updated Aug 21, 2024

Multi-GPU CUDA stress test

C++ 2,135 399 Updated Nov 4, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…

Python 13,301 1,292 Updated Mar 23, 2026

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 17,661 1,418 Updated Feb 8, 2026

Prometheus exporter that mines /proc to report on selected processes

Go 2,097 307 Updated Apr 21, 2025

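A collection of censorship-circumvention proxy providers ("airports") that shut down and absconded (2020-2026); submissions welcome. Ad links are prohibited. 🕸️🔗🈲🙅❌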

4,794 81 Updated Mar 17, 2026

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Python 1,543 228 Updated Dec 15, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 68,936 8,401 Updated Mar 21, 2026

Scripts related to Beijing Unicom IPTV

Python 23 6 Updated Jun 1, 2020

Beijing Telecom IPTV playlist bj-telecom-iptv.m3u

57 12 Updated Dec 31, 2021

Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".

Python 188 17 Updated Apr 18, 2024

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 647 73 Updated Apr 15, 2025

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,358 892 Updated Dec 17, 2024

Optimized primitives for collective multi-GPU communication

C++ 4,542 1,177 Updated Mar 20, 2026

A GPU performance profiling tool for PyTorch models

Python 511 50 Updated Jul 13, 2021

Large models from China

6,421 553 Updated Nov 30, 2024

Example models using DeepSpeed

Python 6,808 1,117 Updated Mar 4, 2026

A userspace out-of-memory killer

C++ 2,019 158 Updated Mar 15, 2026

Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

Python 3,054 229 Updated Apr 14, 2024

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 30,255 4,006 Updated Jul 17, 2024

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 77,239 8,329 Updated May 27, 2025

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 1,438 227 Updated Mar 20, 2024