liziniu

Ziniu Li liziniu

Ph.D. student at The Chinese University of Hong Kong, Shenzhen.

97 followers · 42 following

The Chinese University of Hong Kong, Shenzhen
Shenzhen
www.liziniu.org
@ziniuli

liziniu.github.io Public

HTML 1 Updated Oct 2, 2025
GEM Public

Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)

diversity rl cold-start reasoning generalization distribution-matching large-language-models

Python 41 5 Updated May 12, 2025
offline_rl Public

Python 3 Updated May 2, 2025
cold_start_rl Public

Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?

diversity reinforcement-learning exploration cold-start reasoning sft large-language-models

Python 18 Updated Mar 9, 2025
trl Public
Forked from huggingface/trl

Train transformer language models with reinforcement learning.

Python Apache License 2.0 Updated Mar 3, 2025
verl Public
Forked from volcengine/verl

veRL: Volcano Engine Reinforcement Learning for LLM

Python 8 1 Apache License 2.0 Updated Feb 10, 2025
transformers Public
Forked from huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python Apache License 2.0 Updated Sep 11, 2024
alpaca_eval Public
Forked from tatsu-lab/alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Jupyter Notebook Apache License 2.0 Updated Mar 1, 2024
policy_optimization Public

Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)

bandit stochastic-approximation policy-optimization large-language-models rlhf

Python 28 6 Updated Dec 19, 2023
ReMax Public

Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)

reinforcement-learning policy-gradient large-language-models rlhf

Python 193 14 Updated Dec 16, 2023
HyperDQN Public

Code for ICLR 2022 Paper (HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning)

reinforcement-learning deep-reinforcement-learning exploration hypernetwork iclr-2022

Python 12 1 Updated Nov 28, 2023
ISWBC Public

Code for NeurIPS 2023 Paper (Imitation Learning from Imperfection: Theoretical Justifications and Algorithms)

reinforcement-learning imitation-learning importance-sampling data-selection neurips-2023

Python 7 Updated Sep 22, 2023
clash-for-linux Public
Forked from LopSdir/clash-for-linux

Use Clash as a proxy tool on Linux.

Shell Updated Sep 6, 2023
baby-llama2-chinese Public
Forked from DLLXW/baby-llama2-chinese

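A repository for pretraining a small-parameter Chinese LLaMa2 from scratch and then applying SFT; a single 24 GB GPU is enough to produce a chat-llama2 with basic Chinese question-answering ability.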

Python MIT License Updated Aug 22, 2023
Chinese-LLaMA-Alpaca-2 Public
Forked from ymcui/Chinese-LLaMA-Alpaca-2

Chinese LLaMA-2 & Alpaca-2 LLMs, phase two of the project, with local CPU/GPU training and deployment.

Python Apache License 2.0 Updated Aug 18, 2023
ILwSD Public

Python 3 Updated Jan 30, 2023
iclr-blog-track.github.io Public
Forked from iclr-blog-track/iclr-blog-track.github.io

HTML Other Updated Apr 10, 2022
webpage-template Public
Forked from elliottwu/webpage-template

Adapted from the widely used project webpage template made by the colorful folks.

HTML Updated Aug 8, 2021
bib-merge Public

Python 1 Updated Oct 25, 2020
RL-PPO-Keras Public

Keras implementation of Proximal Policy Optimization (PPO)

keras policy-gradient ppo reinfocement-learning

Python 17 12 Updated Aug 8, 2020
RLX Public

RLX is an RL codebase based on TensorFlow. It implements algorithms like SAC, ACER, GAIL and TRPO. It is easy to use.

Python 3 Updated Jul 28, 2020
CVAE Public

Python Updated Dec 31, 2019
cgmm Public

Python Updated Dec 30, 2019
sample-efficient-bayesian-rl Public
Forked from stratisMarkou/sample-efficient-bayesian-rl

Source for the sample efficient tabular RL submission to the 2019 NIPS workshop on Biological and Artificial RL

Jupyter Notebook MIT License Updated Dec 18, 2019
stable-baselines Public

Python MIT License Updated Nov 29, 2019
dagger Public

Python Updated Nov 11, 2019
baselines Public

Python MIT License Updated Jun 25, 2019
SuperMario Public

Jupyter Notebook Updated Jun 2, 2019
Maze Public

Jupyter Notebook Updated May 28, 2019
go-explore Public
Forked from uber-research/go-explore

Code for Go-Explore: a New Approach for Hard-Exploration Problems

Python Other Updated May 24, 2019
