AutoJEPA

Autonomous design-space search over Joint-Embedding Predictive Architecture (JEPA) pretraining recipes. A clean fork of autoresearch-rl, purpose-built for self-supervised pretraining. Prime deployment target: Basilica GPU cloud.

prepare.py  -->  [data + probe-eval]  -->  train.py  -->  [probe_auroc]  -->  keep/discard  -->  repeat
 (frozen)                                  (mutable)                            |
                                                ^                               |
                                                |    LLM proposes next          |
                                                +-------- params or diff -------+
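The loop above is a keep/discard ratchet. The sketch below illustrates that loop under stated assumptions; it is not AutoJEPA's implementation. run_campaign, propose_params, propose_diff, and stall_patience are hypothetical names. evaluate stands in for a full train.py run scored by probe_auroc. The switch back to parameter proposals after a kept diff is an assumption about how the hybrid policy behaves.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def run_campaign(
    evaluate: Callable[[Params], float],         # hypothetical: one train.py run -> probe_auroc
    propose_params: Callable[[Params], Params],  # hypothetical: LLM tweaks hyperparameters
    propose_diff: Callable[[Params], Params],    # hypothetical: LLM rewrites train.py
    initial: Params,
    iterations: int = 20,
    stall_patience: int = 3,
) -> Tuple[Params, float, List[str]]:
    """Keep/discard ratchet: only strict improvements replace the incumbent."""
    best, best_score = initial, evaluate(initial)
    stall = 0
    modes: List[str] = []
    for _ in range(iterations):
        # Hybrid policy: param proposals until `stall_patience` consecutive
        # discards, then code-diff proposals (assumed to reset on a keep).
        mode = "diff" if stall >= stall_patience else "param"
        modes.append(mode)
        candidate = propose_diff(best) if mode == "diff" else propose_params(best)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score, stall = candidate, score, 0  # keep
        else:
            stall += 1  # discard
    return best, best_score, modes


# Toy usage: a 1-D objective peaked at lr = 0.3.
rng = random.Random(0)
best, score, modes = run_campaign(
    evaluate=lambda p: -(p["lr"] - 0.3) ** 2,
    propose_params=lambda p: {"lr": p["lr"] + rng.uniform(-0.05, 0.05)},
    propose_diff=lambda p: {"lr": rng.uniform(0.0, 1.0)},
    initial={"lr": 0.9},
)
print(f"best lr={best['lr']:.3f} score={score:.4f} modes={modes}")
```

Because the incumbent only changes on a strict improvement, the best score never decreases. The first stall_patience discards are spent on cheap parameter tweaks. After that, the policy escalates to larger rewrites.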

Identity

AutoJEPA inherits the autoresearch pattern — frozen prepare.py + mutable train.py, AST-validated LLM-proposed diffs, hybrid (param → diff on stall) policy — and replaces RL/SFT-shaped defaults with JEPA-shaped defaults:

Probe-based downstream evaluation as the campaign objective (probe_auroc, not training loss: JEPA training loss is an unreliable signal because a collapsed encoder can still drive it down).
RankMe / LiDAR / latent-variance / effective-rank as hard fail gates against representation collapse.
VICReg-aware loss defaults (C-JEPA).
Composable mask primitives as first-class building blocks the LLM combines.
Forecaster recalibrated for SSL learning curves (long plateaus during which only the probe score moves).
Multi-seed scoring because JEPA outcomes are seed-sensitive.
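The collapse gates and multi-seed scoring listed above can be sketched in pure Python. rankme follows the RankMe definition: the exponentiated entropy of the L1-normalized singular values of the embedding matrix. It expects singular values computed elsewhere, for example with numpy.linalg.svd(Z, compute_uv=False). Several pieces are illustrative assumptions, not AutoJEPA's configured behaviour:

- the thresholds min_rank and min_std;
- the mean-minus-standard-deviation seed aggregator;
- every function name in the block.

```python
import math
import statistics
from typing import List, Sequence


def rankme(singular_values: Sequence[float], eps: float = 1e-7) -> float:
    """RankMe: exp(entropy) of L1-normalized singular values; ~1 signals collapse."""
    total = sum(singular_values)
    probs = [s / total + eps for s in singular_values]
    return math.exp(-sum(p * math.log(p) for p in probs))


def mean_latent_std(embeddings: List[List[float]]) -> float:
    """Average per-dimension population std over a batch (rows = samples)."""
    return statistics.mean(statistics.pstdev(col) for col in zip(*embeddings))


def passes_collapse_gates(
    singular_values: Sequence[float],
    embeddings: List[List[float]],
    min_rank: float = 2.0,  # hypothetical threshold
    min_std: float = 0.05,  # hypothetical threshold
) -> bool:
    """Hard gate: failing either check discards the run regardless of probe score."""
    return rankme(singular_values) >= min_rank and mean_latent_std(embeddings) >= min_std


def seed_score(per_seed_auroc: Sequence[float], gates_ok: Sequence[bool]) -> float:
    """Hypothetical multi-seed aggregate: mean minus population std across seeds."""
    if not all(gates_ok):
        return float("-inf")  # any collapsed seed fails the whole candidate
    return statistics.mean(per_seed_auroc) - statistics.pstdev(per_seed_auroc)
```

Penalizing the spread across seeds favours recipes that work reliably, which matters when outcomes are seed-sensitive.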

Quickstart

uv sync --extra dev --extra jepa
uv run autojepa run examples/ijepa-cifar10/config.yaml

Common workflows are wrapped in Makefile:

make help        # list targets
make check       # lint + typecheck + full tests
make test-fast   # tests excluding slow integration suite

Basilica-first

AutoJEPA targets GPU pretraining; Basilica is the prime deployment target and basilica-sdk is a default dependency. The local command and http targets remain available (inherited from autoresearch-rl), but campaign configs default to a basilica target:

target:
  type: basilica
  image: pytorch:2.4.1-cuda12.4
  gpu_count: 1
  gpu_models: [A100, H100]
  memory: 32Gi

The two scripts

Every campaign has two scripts connected by the filesystem, never by imports:

prepare.py (frozen) — runs once via prepare_cmd. Produces data shards, defines the probe-eval pipeline and collapse-detection callbacks. The LLM cannot modify this file. Trust boundary: evaluation integrity is guaranteed by freezing it.
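One way to enforce this trust boundary is sketched below. It assumes two checks:

- a SHA-256 digest of prepare.py recorded after prepare_cmd runs;
- an AST scan of each proposed train.py for imports of prepare.

file_digest, assert_frozen, imports_module, and validate_train_source are hypothetical helpers. AutoJEPA's own freeze and AST-validation mechanisms may work differently.

```python
import ast
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, recorded once after prepare_cmd runs."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_frozen(path: Path, expected_digest: str) -> None:
    """Refuse to score an iteration if the frozen script changed."""
    if file_digest(path) != expected_digest:
        raise RuntimeError(f"{path} was modified; evaluation integrity is void")


def imports_module(source: str, module: str) -> bool:
    """True if `source` imports `module` (absolute or relative form)."""
    tree = ast.parse(source)  # raises SyntaxError for a diff that breaks train.py
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == module for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] == module:
                return True
            if node.module is None and any(a.name == module for a in node.names):
                return True  # e.g. `from . import prepare`
    return False


def validate_train_source(source: str) -> None:
    """Reject a proposed train.py that couples to prepare.py via imports."""
    if imports_module(source, "prepare"):
        raise ValueError("train.py must read prepared data from disk, not import prepare")
```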

train.py (mutable) — runs each iteration. Reads prepared data, trains the JEPA model (Φc context encoder + Φt EMA target encoder + Ψ predictor), prints metrics to stdout via emit_progress. The LLM proposes diffs in llm_diff or hybrid mode.
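The EMA target encoder and the stdout metrics channel can be illustrated with plain Python lists standing in for tensors. ema_update applies Φt ← m·Φt + (1 − m)·Φc after each optimizer step. The linear momentum ramp from 0.996 to 1.0 mirrors the I-JEPA reference recipe but is an assumption here. emit_progress_stub and parse_progress are hypothetical; the real emit_progress output format may differ.

```python
import json
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flat values (stand-in for tensors)


def momentum_at(step: int, total_steps: int, start: float = 0.996, end: float = 1.0) -> float:
    """Linear EMA momentum ramp from `start` to `end` over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac


def ema_update(target: Weights, context: Weights, m: float) -> None:
    """In place: phi_t <- m * phi_t + (1 - m) * phi_c. The target is never trained directly."""
    for name, t_vals in target.items():
        for i, c in enumerate(context[name]):
            t_vals[i] = m * t_vals[i] + (1.0 - m) * c


def emit_progress_stub(**metrics: float) -> str:
    """Hypothetical stand-in for emit_progress: one JSON object per stdout line."""
    line = json.dumps(metrics, sort_keys=True)
    print(line, flush=True)
    return line


def parse_progress(stdout: str) -> List[Dict[str, float]]:
    """Orchestrator side: keep parseable JSON lines, ignore ordinary log noise."""
    records = []
    for raw in stdout.splitlines():
        raw = raw.strip()
        if not raw.startswith("{"):
            continue
        try:
            records.append(json.loads(raw))
        except json.JSONDecodeError:
            continue
    return records
```

With PyTorch modules, the same update is typically written under torch.no_grad() as target_param.mul_(m).add_(context_param, alpha=1 - m) for each parameter pair.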

Roadmap

See TODO.md for the live phased plan, CHANGELOG.md for releases, and docs/research/ for the cited research corpus. Architecture writeup: gist 2567a53.

The Phase-2 falsifier (CIFAR I-JEPA) was the kill criterion for the framework approach. The Phase-3 falsifier (trace-jepa) was the kill criterion for the application (JEPA-for-LLM-agent-traces). Both were passed:

Phase 2: framework approach validated — LLM-authored diffs produced kept ratchet on Basilica
Phase 3: probe_auroc = 0.7516 at 5% FPR (kept iteration, weights persisted via Git LFS; see the v13 evidence in artifacts/trace-jepa/)
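For reference, AUROC is the probability that a randomly chosen positive example scores above a randomly chosen negative one, with ties counting half. The sketch below computes it that way. It also computes the true-positive rate at a fixed false-positive budget, which is one common reading of an "at 5% FPR" operating point. Both function names are hypothetical. The exact metric behind the reported 0.7516 may differ, for example a partial AUROC.

```python
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """P(score of random positive > score of random negative), ties count 1/2.

    Quadratic pairwise comparison: fine for illustration, slow for large sets.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def tpr_at_fpr(
    pos_scores: Sequence[float], neg_scores: Sequence[float], max_fpr: float = 0.05
) -> float:
    """Fraction of positives flagged at a threshold admitting at most max_fpr of negatives."""
    ranked_neg = sorted(neg_scores, reverse=True)
    k = int(max_fpr * len(ranked_neg))  # number of false positives we may tolerate
    threshold = ranked_neg[k] if k < len(ranked_neg) else float("-inf")
    # Only negatives strictly above `threshold` are false positives: at most k of them.
    return sum(p > threshold for p in pos_scores) / len(pos_scores)
```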

Architecture Decisions

This project tracks every load-bearing decision in docs/adr/ — 34 ADRs as of v0.2.0. Start with ADR-001 (why a fork) and ADR-004 (probe AUROC as the campaign objective); skim the index for the full set.

Lineage

FunSearch → AI Scientist v1/v2 → ADAS → AIDE → AlphaEvolve → karpathy/autoresearch → autoresearch-rl → AutoJEPA.

Sibling upstream: ../autoresearch-rl (added as git remote upstream for cherry-pick reference only; time spent tracking it is capped at ~1h/wk).

Contributing

Pull requests welcome. See CONTRIBUTING.md for the per-area pre-merge checklist; the CLAUDE.md hard rule applies: do not call a feature done without a realistic-config end-to-end run on the same day you wrote it.

Citation

If AutoJEPA helps your research, please cite the architecture writeup:

@misc{pappas2026autojepa,
  title  = {AutoJEPA: Autonomous Design-Space Search over JEPA Pretraining Recipes},
  author = {Pappas, Evangelos},
  year   = {2026},
  howpublished = {\url{https://github.com/epappas/autojepa}},
}

License

MIT. See LICENSE.
