Ultra-instinct ML engineering intern for Claude Code. Reads papers, audits datasets, and ships SFT/DPO/LoRA runs to Hugging Face.
Updated Jun 9, 2026 · Shell
Behavioural protocol package for Claude Code that enforces quality and transparency.
Data Preparation for Large Language Models: a curated companion to our JCST 2026 survey. Covers pre-training, continual pre-training, and post-training (SFT/RLHF/RLAIF) data across collection, filtering, deduplication, generation, and evaluation.