👳‍♂️
I think it’s a new feature. Don’t tell anyone it was an accident.
- Global Knowledge
- Jakarta, Indonesia
- indepeo.dev
rwkv-reward-enhanced Public
This repository contains an enhanced reward model training procedure using RWKV for RLHF. It's a work in progress with a focus on generating diverse trajectories and high-quality answers.