rl-from-human-feedback

Here is 1 public repository matching this topic...

hscspring / rl-llm-nlp

Curated, opinionated index of post-R1 LLM × Reinforcement Learning. Many deep-dive blog posts cross-linked to many papers, covering GRPO, DAPO, DPO, PPO, RLHF, GSPO, CISPO, VAPO, Reward Modeling, MoE RL stability, Verifier-Free RL, Training-Free RL, Agentic RL, and DeepSeek-R1 reproduction.

Updated Apr 25, 2026

