jianzhnie/README.md
📊 GitHub Stats



👨‍💻 About Me

I'm an AI engineer focused on building production-grade LLM systems and scalable reinforcement learning frameworks. I love turning cutting-edge research into clean, usable code.


🛠️ Tech Stack

Python, PyTorch, Ray, vLLM, DeepSpeed, Megatron, veRL, HuggingFace, CUDA, Docker, Kubernetes

🧠 LLM & AI Systems

| Project | Description |
| --- | --- |
| mini-vLLM | A compact implementation of vLLM, designed to demystify the complexities of modern LLM serving systems. |
| ScaleTorch | A scalable PyTorch framework for training large models, implementing 4D parallelism (TP, PP, SP, DP). |
| Open-R1 | Open-source DeepSeek-R1-style and RLHF training pipeline. |
| LLMEval | A modular framework to evaluate LLMs across tasks and settings. |
| LLMReasoning | Techniques and toolkit for reasoning with LLMs. |
| LLMToolkit | A PyTorch toolkit for NLP and LLM development. |
| LLamaTuner | Easy and efficient finetuning pipelines for LLMs. |

🎮 Reinforcement Learning

| Project | Description |
| --- | --- |
| Deep-RL-Toolkit | Single-agent RL toolkit (DQN, Rainbow, DDPG, PPO, SAC, TD3, …). |
| Deep-MARL-Toolkit | Multi-agent RL toolkit (VDN, QMIX, MADDPG, MAPPO, …). |
| RLZero | MCTS for general sequential decision making (AlphaZero, MuZero, …). |
| ScaleRL | Simple, scalable distributed RL (A3C, Ape-X, IMPALA, …). |
| CyberAttackSimulator | RL environment for autonomous cyber attack and defense on simulated networks. |

🔧 More Projects

| Project | Description |
| --- | --- |
| Diffusion Toolkit | Image/audio generation with diffusion models in PyTorch. |
| AutoTimm | AutoML for deep learning tasks. |
| AutoTabular | AutoML for tabular data. |

How to reach me 📫

Have an awesome day! 🌟

Pinned

  1. LLamaTuner

    Easy and efficient fine-tuning of LLMs, with efficient quantized training and deployment of large models. Supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, and Falcon.

    Python · 621 stars · 64 forks

  2. Open-R1

    An open-source reproduction of DeepSeek-R1.

    Python · 277 stars · 54 forks

  3. deep-marl-toolkit

    MARLToolkit: the Multi-Agent Reinforcement Learning Toolkit. Includes implementations of MAPPO, MADDPG, QMIX, VDN, COMA, IPPO, QTRAN, MAT...

    Python · 167 stars · 21 forks

  4. deep-rl-toolkit

    RLToolkit is a flexible, highly efficient reinforcement learning framework. Includes implementations of DQN, AC, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

    Python · 9 stars · 2 forks

  5. LLMToolkit

    LLMToolkit is a toolkit for natural language processing (NLP) and large language models (LLMs), built on PyTorch.

    Python · 5 stars · 2 forks

  6. llmtech

    LLMTechSite, focused on the technology ecosystem of artificial general intelligence.

    Python · 12 stars · 5 forks