Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
Updated Nov 29, 2024 - Python
[ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"
Source code for the paper "Understanding impacts of human feedback via influence functions"
Experiments for the Neural Interactive Proofs paper
PyINE is a research framework for scalable elicitation and oversight of LLM reasoning, built on instrumented Python programs as a verifiable execution substrate.
Adversarial Deliberation Trees with Mechanistic Verification for scalable LLM oversight
Evaluation infrastructure for AI systems beyond direct human supervision
Nexus is a turn-based environment for training LLM agents to negotiate scarce compute resources (GPU, CPU, memory, bandwidth) under budget constraints and deadline pressure. Agents bid, trade, form coalitions, and strategize with hidden information. Designed for scalable AI oversight as well as multi-agent management research.
GG Tank Watch - frozen public-information archive of a resolved May 2026 chemical emergency. Conduit-only design; responsible-AI safety patterns in production.
A multi-agent real estate valuation engine secured by scalable AI oversight and strict guardrails.
A model-organism benchmark for misalignment and oversight failures in biomedical AI research agents.
Executable audits for latent-state claims in AI alignment and high-stakes evaluation.
Reproducible LLM-as-judge grader for med-student clinical notes: rubric scoring + evidence-cited feedback, with a QWK/Pearson/MAE human-agreement harness (validated on ACI-Bench). Placed 3rd of 200+ at Rutgers Health Hack.
Lean 4 theorem prover with an LLM Prover/Critic loop and SHA-256 theorem locking to catch specification gaming.