The Chinese University of Hong Kong, Shenzhen
- Shenzhen
- www.liziniu.org
- @ziniuli
Highlights
- Pro
GEM Public
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
cold_start_rl Public
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
trl Public
Forked from huggingface/trl. Train transformer language models with reinforcement learning.
Python · Apache License 2.0 · Updated Mar 3, 2025
verl Public
Forked from volcengine/verl. veRL: Volcano Engine Reinforcement Learning for LLMs.
transformers Public
Forked from huggingface/transformers. 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Python · Apache License 2.0 · Updated Sep 11, 2024
alpaca_eval Public
Forked from tatsu-lab/alpaca_eval. An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Jupyter Notebook · Apache License 2.0 · Updated Mar 1, 2024
policy_optimization Public
Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)
ReMax Public
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
HyperDQN Public
Code for ICLR 2022 Paper (HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning)
ISWBC Public
Code for NeurIPS 2023 Paper (Imitation Learning from Imperfection: Theoretical Justifications and Algorithms)
clash-for-linux Public
Forked from LopSdir/clash-for-linux. Using Clash as a proxy tool on Linux.
Shell · Updated Sep 6, 2023
baby-llama2-chinese Public
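Forked from DLLXW/baby-llama2-chinese. A repository for pretraining a small-parameter Chinese LLaMa2 from scratch and then applying SFT; a single 24 GB GPU is enough to produce a chat-llama2 with basic Chinese question-answering ability.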
Python · MIT License · Updated Aug 22, 2023
Chinese-LLaMA-Alpaca-2 Public
Forked from ymcui/Chinese-LLaMA-Alpaca-2. Phase two of the Chinese LLaMA & Alpaca large-model project, with local CPU/GPU training and deployment (Chinese LLaMA-2 & Alpaca-2 LLMs).
Python · Apache License 2.0 · Updated Aug 18, 2023
iclr-blog-track.github.io Public
Forked from iclr-blog-track/iclr-blog-track.github.io
HTML · Other license · Updated Apr 10, 2022
webpage-template Public
Forked from elliottwu/webpage-template. Adapted from the widely used project webpage template made by the colorful folks.
HTML · Updated Aug 8, 2021
RL-PPO-Keras Public
Proximal Policy Optimization (PPO) implemented with Keras.
RLX Public
RLX is an easy-to-use RL codebase built on TensorFlow. It implements algorithms such as SAC, ACER, GAIL, and TRPO.
sample-efficient-bayesian-rl Public
Forked from stratisMarkou/sample-efficient-bayesian-rl. Source for the sample-efficient tabular RL submission to the 2019 NIPS workshop on Biological and Artificial RL.
Jupyter Notebook · MIT License · Updated Dec 18, 2019
go-explore Public
Forked from uber-research/go-explore. Code for Go-Explore: a New Approach for Hard-Exploration Problems.
Python · Other license · Updated May 24, 2019