Autonomous design-space search over Joint-Embedding Predictive Architecture (JEPA) pretraining recipes. A clean fork of
autoresearch-rl, purpose-built for self-supervised pretraining. Prime deployment target: Basilica GPU cloud.
prepare.py --> [data + probe-eval] --> train.py --> [probe_auroc] --> keep/discard --> repeat
 (frozen)                              (mutable)                           |
                                          ^                                |
                                          |       LLM proposes next        |
                                          +-------- params or diff --------+
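The loop in the diagram can be sketched as a small keep/discard ratchet. This is a minimal illustration, not AutoJEPA's implementation: `run_campaign`, `propose`, `evaluate`, and `stall_limit` are hypothetical names, and the rule for switching from parameter proposals to diffs after a stall is an assumed reading of the hybrid policy.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_campaign(
    propose: Callable[[str, Optional[Any]], Any],
    evaluate: Callable[[Any], float],
    iterations: int,
    stall_limit: int = 2,
) -> Tuple[Optional[Any], float, List[Tuple[str, float, str]]]:
    """Keep a candidate only when its score beats the best score so far.

    propose(mode, best) returns the next candidate; mode is "param" until
    `stall_limit` consecutive candidates have been discarded, then "diff".
    evaluate(candidate) returns the campaign objective (e.g. a probe AUROC).
    """
    best: Optional[Any] = None
    best_score = float("-inf")
    stall = 0  # consecutive discarded iterations
    log: List[Tuple[str, float, str]] = []

    for _ in range(iterations):
        mode = "diff" if stall >= stall_limit else "param"
        candidate = propose(mode, best)
        score = evaluate(candidate)
        if score > best_score:
            # Ratchet forward: the new candidate becomes the baseline.
            best, best_score, stall = candidate, score, 0
            log.append((mode, score, "keep"))
        else:
            # Discard: the baseline is unchanged and the stall counter grows.
            stall += 1
            log.append((mode, score, "discard"))
    return best, best_score, log
```

Because a discarded candidate never replaces the baseline, the best score is monotonically non-decreasing across iterations.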
AutoJEPA inherits the autoresearch pattern — frozen prepare.py + mutable train.py, AST-validated LLM-proposed diffs, hybrid (param → diff on stall) policy — and replaces RL/SFT-shaped defaults with JEPA-shaped defaults:
- Probe-based downstream evaluation as the campaign objective (probe_auroc, not training loss; JEPA loss collapses).
- RankMe / LiDAR / latent-variance / effective-rank as hard fail gates against representation collapse.
- VICReg-aware loss defaults (C-JEPA).
- Composable mask primitives as first-class building blocks the LLM combines.
- Forecaster recalibrated for SSL learning curves (long plateau where only probe score moves).
- Multi-seed scoring because JEPA outcomes are seed-sensitive.
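To make the collapse gates concrete, here is a minimal sketch of a RankMe-style check, assuming NumPy is available. RankMe is the exponential of the entropy of the normalized singular values of an embedding matrix. The function names and the `min_rank` threshold are hypothetical; the gates AutoJEPA actually ships may be computed differently.

```python
import numpy as np


def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """RankMe score of an (n_samples, dim) embedding matrix.

    Ranges from about 1 (all embeddings on one line, i.e. collapsed)
    up to min(n_samples, dim) (singular values spread evenly).
    """
    sigma = np.linalg.svd(embeddings, compute_uv=False)
    # Normalize singular values into a distribution; eps avoids log(0).
    p = sigma / (sigma.sum() + eps) + eps
    return float(np.exp(-(p * np.log(p)).sum()))


def passes_collapse_gate(embeddings: np.ndarray, min_rank: float) -> bool:
    """Hypothetical hard gate: fail the iteration if the score is too low."""
    return rankme(embeddings) >= min_rank
```

A run whose encoder maps every input to the same vector scores close to 1 and fails any reasonable threshold, whatever its training loss.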
uv sync --extra dev --extra jepa
uv run autojepa run examples/ijepa-cifar10/config.yaml

Common workflows are wrapped in the Makefile:
make help # list targets
make check # lint + typecheck + full tests
make test-fast # tests excluding slow integration suite

AutoJEPA targets GPU pretraining; Basilica is the prime deployment target and basilica-sdk is a default dependency. Local command and http targets remain available (inherited from autoresearch-rl), but campaign configs default to target: basilica.
target:
  type: basilica
  image: pytorch:2.4.1-cuda12.4
  gpu_count: 1
  gpu_models: [A100, H100]
  memory: 32Gi

Every campaign has two scripts connected by the filesystem, never by imports:
- prepare.py (frozen): runs once via prepare_cmd. Produces data shards, defines the probe-eval pipeline and collapse-detection callbacks. The LLM cannot modify this file. Trust boundary: evaluation integrity is guaranteed by freezing it.
- train.py (mutable): runs each iteration. Reads prepared data, trains the JEPA model (Φc context encoder + Φt EMA target encoder + Ψ predictor), prints metrics to stdout via emit_progress. The LLM proposes diffs in llm_diff or hybrid mode.
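The Φt target encoder mentioned above is an exponential moving average (EMA) of the Φc context encoder rather than a separately trained network. The following is a minimal sketch of that update on flat lists of floats. The `ema_update` name and the default momentum are illustrative assumptions, and a real train.py would apply the same rule to framework tensors.

```python
from typing import List


def ema_update(
    target: List[float], context: List[float], momentum: float = 0.996
) -> List[float]:
    """Return new target weights: momentum * target + (1 - momentum) * context.

    The target encoder receives no gradients; it only trails the context
    encoder, which keeps the prediction targets slowly moving.
    """
    if len(target) != len(context):
        raise ValueError("target and context must have the same length")
    return [momentum * t + (1.0 - momentum) * c for t, c in zip(target, context)]
```

A momentum close to 1 makes the target change slowly. The momentum value and any schedule for it are among the recipe choices a campaign could search over.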
See TODO.md for the live phased plan, CHANGELOG.md for releases, and docs/research/ for the cited research corpus. Architecture writeup: gist 2567a53.
The Phase-2 falsifier (CIFAR I-JEPA) was the kill criterion for the framework approach. The Phase-3 falsifier (trace-jepa) was the kill criterion for the application (JEPA-for-LLM-agent-traces). Both were crossed:
- Phase 2: framework approach validated — LLM-authored diffs produced kept ratchet on Basilica
- Phase 3: probe_auroc = 0.7516 at 5% FPR (kept iter, weights persisted via Git LFS; see v13 evidence in artifacts/trace-jepa/)
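For readers less familiar with the metric, AUROC is the probability that a randomly chosen positive example receives a higher probe score than a randomly chosen negative one. Below is a minimal pure-Python sketch of that definition. The `auroc` helper is hypothetical, and it reproduces neither the project's probe pipeline nor the 5% FPR operating point quoted above.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to a probe that ranks examples at random, and 1.0 to a probe that separates the classes perfectly.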
This project tracks every load-bearing decision in docs/adr/ — 34 ADRs as of v0.2.0. Start with ADR-001 (why a fork) and ADR-004 (probe AUROC as the campaign objective); skim the index for the full set.
FunSearch → AI Scientist v1/v2 → ADAS → AIDE → AlphaEvolve → karpathy/autoresearch → autoresearch-rl → AutoJEPA.
Sibling upstream: ../autoresearch-rl (added as git remote upstream for cherry-pick reference only — capped at ~1h/wk).
Pull requests welcome. See CONTRIBUTING.md for the per-area pre-merge checklist; the CLAUDE.md hard rule applies: do not call a feature done without a realistic-config end-to-end run on the same day you wrote it.
If AutoJEPA helps your research, please cite the architecture writeup:
@misc{pappas2026autojepa,
  title = {AutoJEPA: Autonomous Design-Space Search over JEPA Pretraining Recipes},
  author = {Pappas, Evangelos},
  year = {2026},
  howpublished = {\url{https://github.com/epappas/autojepa}},
}

MIT. See LICENSE.