scalable-oversight

saifh-github / pyine

PyINE is a research framework for scalable elicitation and oversight of LLM reasoning, built on instrumented Python programs as a verifiable execution substrate.

model-organisms ai-safety code-execution scalable-oversight reasoning-model-evaluation execution-grounded-verification cost-sensitive-evaluation

Updated Jun 4, 2026
Python

Napiersnotes / ScalableOversight-ADT

Adversarial Deliberation Trees with Mechanistic Verification for scalable LLM oversight

python ai-safety ai-alignment scalable-oversight llm-open-source

Updated Jan 9, 2026
Python

ChaoYue0307 / open-world-alignment

Evaluation infrastructure for AI systems beyond direct human supervision

benchmarking alignment ai-safety model-cards ai-evaluation scalable-oversight research-artifact open-world-alignment

Updated Jun 2, 2026
Python

n8mauer / Nexus

Nexus is a turn-based environment for training LLM agents to negotiate scarce compute resources (GPU, CPU, memory, bandwidth) under budget constraints and deadline pressure. Agents bid, trade, form coalitions, and strategize with hidden information. Designed for scalable AI oversight as well as multi-agent management research.

reinforcement-learning cloud-computing multi-agent-systems theory-of-mind negotiation-algorithms scalable-oversight grpo procurement-ai

Updated Mar 11, 2026
Python

Mike-E-Log / gg-tank-watch

GG Tank Watch - frozen public-information archive of a resolved May 2026 chemical emergency. Conduit-only design; responsible-AI safety patterns in production.

python vanilla-js civic-tech ai-safety responsible-ai emergency-information ai-control scalable-oversight frozen-archive

Updated Jun 10, 2026
Python

R123456-123 / oracle-flow

A multi-agent real estate valuation engine secured by scalable AI oversight and strict guardrails.

ai-safety llm-as-a-judge scalable-oversight

Updated Apr 25, 2026
Python

Ella-Afonso / NeuroGuard

A model-organism benchmark for misalignment and oversight failures in biomedical AI research agents.

python evaluations clinical-trials model-organisms ai-safety interpretability scalable-oversight llm-benchmark

Updated May 29, 2026
Python

nelacye / LAMP_IRIS_operational_physiology

Executable audits for latent-state claims in AI alignment and high-stakes evaluation.

ai-safety ai-alignment mechanistic-interpretability scalable-oversight evaluation-auditing

Updated Jun 6, 2026
Python

azrabano23 / aurelis

Reproducible LLM-as-judge grader for medical-student clinical notes: rubric scoring plus evidence-cited feedback, with a QWK/Pearson/MAE human-agreement harness (validated on ACI-Bench). Placed 3rd of 200+ at Rutgers Health Hack.

python medical-education ai-safety llm-evaluation llm-as-judge scalable-oversight

Updated Jun 8, 2026
Python

Cuuper22 / Erdos

Lean 4 theorem prover with an LLM Prover/Critic loop and SHA-256 theorem locking to catch specification gaming.

rust ai theorem-proving mathematics gemini formal-verification automated-reasoning ai-alignment lean4 scalable-oversight

Updated Jun 10, 2026
Python

Here are 15 public repositories matching this topic...

WooooDyy / MathCritique

halfrot / ALaRM

mintaywon / IF_RLHF

SamAdamDay / neural-interactive-proofs

AnaBelenBarbero / ai-wise-council

saifh-github / pyine

Napiersnotes / ScalableOversight-ADT

ChaoYue0307 / open-world-alignment

n8mauer / Nexus

Mike-E-Log / gg-tank-watch

R123456-123 / oracle-flow

Ella-Afonso / NeuroGuard

nelacye / LAMP_IRIS_operational_physiology

azrabano23 / aurelis

Cuuper22 / Erdos
