Starred repositories

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,949 4,072 Updated Jun 13, 2026

Fully open reproduction of DeepSeek-R1

Python 26,302 2,438 Updated Apr 2, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,631 968 Updated Jun 9, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,325 104 Updated Aug 28, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 9,728 1,284 Updated Jun 11, 2026

Accessible large language models via k-bit quantization for PyTorch.

Python 8,265 869 Updated Jun 12, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,389 748 Updated Jun 13, 2026

Zero Bubble Pipeline Parallelism

Python 459 33 Updated May 7, 2025

Pipeline Parallelism for PyTorch

Python 785 87 Updated Aug 21, 2024

Multi-GPU CUDA stress test

C++ 2,228 408 Updated May 31, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …

Python 14,495 1,475 Updated Jun 13, 2026

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 18,478 1,507 Updated May 24, 2026

Prometheus exporter that mines /proc to report on selected processes

Go 2,123 307 Updated Apr 21, 2025

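Censorship circumvention 🕸️: a collected list of proxy providers ("airports") that shut down and disappeared (2020-2026). Submissions welcome. Ad links prohibited 🔗🈲🙅❌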

5,977 90 Updated May 29, 2026

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Python 1,577 229 Updated Dec 15, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,134 8,826 Updated Jun 13, 2026

Scripts related to Beijing Unicom IPTV

Python 24 6 Updated Jun 1, 2020

Beijing Telecom IPTV playlist bj-telecom-iptv.m3u

59 12 Updated Dec 31, 2021

Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".

Python 189 17 Updated Apr 18, 2024

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 712 83 Updated Apr 8, 2026

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,590 908 Updated Dec 17, 2024

Optimized primitives for collective multi-GPU communication

C++ 4,808 1,295 Updated Jun 13, 2026

A GPU performance profiling tool for PyTorch models

Python 511 51 Updated Jul 13, 2021

Large language models from China

6,450 561 Updated Nov 30, 2024

Example models using DeepSpeed

Python 6,822 1,120 Updated May 20, 2026

A userspace out-of-memory killer

C++ 2,034 161 Updated May 28, 2026

Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

Python 3,050 225 Updated Apr 14, 2024

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 30,249 3,991 Updated Jul 17, 2024

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 77,363 8,318 Updated May 27, 2025

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 1,444 226 Updated Mar 20, 2024