
Complete the cutover: runner+fingerprinter on production path, legacy deletion, CLI surface #46

@filipeforattini

Description


Problem Statement

The previous wave of work — PRDs #15, #24, and #25 across 27 sub-issues — built the JobRunner extraction infrastructure and the unified fingerprint::Fingerprinter engine in parallel, but every slice that touched Crawler::process_job or the legacy detection modules deferred the load-bearing change to "the next release" and shipped under #[deprecated] warnings. As of today on main:

  • Crawler::process_job still runs the inline ~600 LOC fetch/extract/detect/escalation block. The runner: Arc<JobRunner> and fingerprinter: Arc<Fingerprinter> fields are constructed on every Crawler and cloned in clone_refs, but no code path inside process_job ever calls them. The runner-as-entry-point goal from PRD Extract JobRunner from crawler.rs (strangler refactor) #15 Q2 is unmet on every active job.
  • src/discovery/tech_fingerprint.rs is still 1312 LOC and is the only detector process_job consults for tech inference. The fingerprint::Fingerprinter engine — 14 Hot sources, 4 Warm sources, 2 Cold sources, 138 unit tests — produces a FingerprintReport nothing in production reads.
  • runner::ChallengeDetector is #[deprecated] but still alive. policy::engine constructs it on every fetch attempt to make the antibot escalation call. runner::JobRunner::run constructs it as a fallback when fingerprinter isn't injected, which is "always" in production because process_job never calls runner.run.
  • antibot::block_detector is #[deprecated] but still alive. Same situation — BlockPatternSource in fingerprint::target::sources ports the logic; nothing reads it.
  • antibot::detect_from_html / detect_from_http_response / detect_from_cookies are still public functions. render::pool calls them post-render. No fingerprint-engine path runs against post-render HTML.
  • error::AntibotVendor and antibot::ChallengeVendor are #[deprecated] enums with From conversions to fingerprint::Vendor. Their conversion paths exist but call sites in policy::engine, runner::JobRunner, antibot::*, crawler.rs, and tests/{escalation,session_scope_policy,antibot_detection}.rs still consume the old variants. 333 deprecation warnings ship in cargo build --tests today.
  • runner::AutoFetcher + AutoOutcome are #[deprecated] per ADR-0004 and have zero consumers — pure dead code that still compiles.
  • runner::SessionStatePlaceholder and ChallengeSignalPlaceholder are still the types JobOutcome.new_session_state and JobOutcome.signals carry. Vec<ChallengeSignalPlaceholder> is the typed output of every successful runner invocation today.
  • Fingerprinter::analyze_warm and analyze_cold don't exist. The Warm/Cold sources (h2 SETTINGS, robots.txt, well-known, favicon, DNS, ASN) are registered in Engine but Engine::analyze_hot is the only dispatcher. No engine method ever invokes the higher-tier sources.
  • TargetContext carries only Hot-tier fields (status, headers, body, final_url). The Warm/Cold sources have empty analyze implementations because the Option<&T> slots from the parent PRD (h2 SETTINGS frame, robots body, well-known probe, favicon bytes, DNS observation, ASN info, peer cert) never made it onto the struct.
  • SelfFingerprint is never populated at runtime. The compute functions (compute_ja3, compute_ja4, compute_h2_settings_fingerprint) take parsed fields, but no plumbing in impersonate::tls::build_connector captures our ClientHello bytes to feed them.
  • The catalog ships PLACEHOLDER hashes ("PLACEHOLDER_chrome131_ja3_md5" and friends). Coherence::compute_coherence runs against the placeholders, which means our_ja3_matches_profile is always either None or Some(true) against a synthetic value — there is no real drift detection on main.
  • There is no CLI surface for the new fingerprint module. crawlex fingerprint <url> does not exist. --deep-fingerprint and --audit-tls flags do not exist. Operators cannot drive the engine from outside a Rust binary.

The honest grade from the prior wave — "infrastructure shipped, runner is dead code in production, god-module unsolved" — is exactly the grade today. The deferred items are the actual cutover. Without them, three PRDs of detection / runner work have zero production impact: every byte fetched on main flows through the same inline dispatch and the same three legacy detectors that existed before PRD #15 started.

Solution

Close the cutover. One unified PRD covering the nine deferred items, sequenced so each step is reviewable and the NDJSON event-stream regression test from issue #16 catches any wire drift across the whole chain.

Phase 1 — process_job cutover (the real surgery).
Replace the inline fetch/extract/detect block in Crawler::process_job with runner.run(&job, &ctx).await. The runner is constructed once with the Arc<Fingerprinter> from the same Crawler instance — antibot detection flows through Fingerprinter::analyze_hot, not through ChallengeDetector. The SessionContext is built from existing Crawler state (proxy router lease, antibot session state, render budgets, resolved policy profile). After this phase, the runner is the production entry point for per-Job execution and crawler.rs shrinks measurably.

Phase 2 — JobOutcome widening.
JobOutcome.signals becomes Vec<fingerprint::Detection> (drops ChallengeSignalPlaceholder). JobOutcome.new_session_state becomes Option<antibot::SessionState> (drops SessionStatePlaceholder). The two remaining placeholder types are deleted. Every caller — runner internals, integration tests, Crawler post-processing — reads the typed shapes.

Phase 3 — Legacy detector deletion.
With every in-tree caller migrated, src/discovery/tech_fingerprint.rs, src/runner/challenge.rs, src/antibot/block_detector.rs, the antibot::detect_* functions, runner::AutoFetcher + AutoOutcome, and the error::AntibotVendor + antibot::ChallengeVendor enums are deleted. The 333 deprecation warnings drop to zero.

Phase 4 — Warm/Cold tier activation.
Fingerprinter::analyze_warm(host) and analyze_cold(host) get real implementations: the engine fetches robots.txt / well-known probes / favicon (via existing ImpersonateClient), pulls DNS records (via existing discovery::dns), runs RDAP lookups (via existing discovery::rdap), and populates the cache. TargetContext gains the optional Warm/Cold slots. Engine::analyze_hot keeps its current speed budget; Warm runs once per host:port per TTL window; Cold runs on operator opt-in.

Phase 5 — Self-fingerprint live capture.
A hook into impersonate::tls::build_connector captures the ClientHello bytes after BoringSSL assembles them. The compute functions in fingerprint::introspect run against the live bytes. SelfFingerprint.profile_expected populates from the catalog. SelfFingerprint.matches_profile and drift_signals carry real values. The first authoritative ClientHello capture per impersonate::Profile replaces the PLACEHOLDER hashes in catalog.rs.

Phase 6 — CLI surface.
crawlex fingerprint <url> subcommand drives the engine end-to-end and prints a FingerprintReport. --deep-fingerprint enables Cold-tier sources. --audit-tls enables the FP-B external oracle (tls.peet.ws). The subcommand reuses the Crawler plumbing so headers / cookies / TLS state come from a real fetch.

After this PRD lands, the runner is the production per-Job entry point, the Fingerprinter is the only detection authority, the legacy modules are gone, the Warm/Cold tiers run when expected, SelfFingerprint reports real outbound identity with drift detection, operators can run crawlex fingerprint example.com from the CLI, and cargo build --tests produces zero deprecation warnings.

User Stories

  1. As an operator, I want every fetched job in production to flow through JobRunner::run, so that the per-Job timings / events / retry decisions that PRD Extract JobRunner from crawler.rs (strangler refactor) #15 specified actually run on real traffic.
  2. As an operator, I want the antibot detection on production fetches to come from Fingerprinter::analyze_hot, so that the Evidence-rich Detections from PRD fingerprint/ module: target+self detection, comparable or better than redblue #25 reach the events / storage / SDK instead of being computed in tests only.
  3. As an operator, I want JobOutcome.signals to carry Vec<fingerprint::Detection>, so that downstream consumers (storage, SDK, events) see the same Evidence model the rest of the codebase uses.
  4. As an operator, I want JobOutcome.new_session_state to carry the real antibot::SessionState, so that the Crawler's session-state commit path matches the type the antibot subsystem already uses.
  5. As an operator, I want cargo build --all-features --tests to emit zero deprecation warnings, so that real deprecations introduced in the future are visible against a clean baseline.
  6. As an operator, I want src/discovery/tech_fingerprint.rs deleted from the tree, so that the 1312-LOC duplicate detection path stops drifting alongside the unified engine.
  7. As an operator, I want src/runner/challenge.rs deleted, so that no caller in the codebase can accidentally pick the deprecated single-purpose detector over the unified engine.
  8. As an operator, I want src/antibot/block_detector.rs and the antibot::detect_* functions deleted, so that block / challenge detection has exactly one home.
  9. As an operator, I want error::AntibotVendor and antibot::ChallengeVendor deleted (both already #[deprecated]), so that fingerprint::Vendor is the only Vendor identity in the crate.
  10. As an operator, I want runner::AutoFetcher + AutoOutcome deleted per ADR-0004, so that the dead-code path is removed instead of carrying it forward indefinitely.
  11. As an operator, I want runner::SessionStatePlaceholder and ChallengeSignalPlaceholder deleted, so that the runner module exports only types operators are meant to construct.
  12. As an operator running crawlex fingerprint https://www.drogasil.com.br, I want a CLI surface that prints the FingerprintReport for that host (CDN / WAF / Antibot / CMS / Ecommerce / TLS profile / coherence warnings), so that recon does not require writing Rust.
  13. As an operator running crawlex fingerprint <url> --deep-fingerprint, I want the Cold tier to fire (DNS / ASN / RDAP), so that I get the full vendor map when I explicitly ask for it.
  14. As an operator running crawlex fingerprint <url> --audit-tls, I want the external oracle (tls.peet.ws) to compare its view of my outbound TLS handshake against the live capture, so that proxy / middlebox alterations surface.
  15. As an operator running any crawl, I want Fingerprinter::analyze_warm to fire once per host per TTL window, so that h2 SETTINGS / robots.txt / well-known / favicon facts arrive on the second fetch of a host without rerunning per-job.
  16. As an operator, I want the Warm-tier cache to honor Fingerprinter::invalidate(host), so that I can force a refresh when I suspect the target rotated its stack mid-crawl.
  17. As an operator, I want SelfFingerprint.ja3_hash to carry the real MD5 hash of our outbound ClientHello, so that comparing against catalog detects BoringSSL regression and proxy alteration.
  18. As an operator, I want SelfFingerprint.h2_settings_fp to carry the real hash of our h2 SETTINGS frame, so that drift versus the Akamai-style fingerprint catalog is detectable.
  19. As an operator, I want the catalog values for Profile::Chrome131Stable / Chrome132Stable / Chrome149Stable to be measured hashes rather than "PLACEHOLDER_*" strings, so that the drift detection logic produces meaningful answers.
  20. As an operator, I want Coherence.our_ja3_matches_profile to report Some(true) on a healthy run and Some(false) on a regression, so that the cross-check actually runs against real data.
  21. As an operator, I want Coherence.their_antibot_compatible_with_our_profile to surface false when the detected antibot vendor is on the flagged list, so that "we look like Chrome131, target is Akamai Bot Manager which currently flags Chrome131" reaches my log before I waste the crawl budget.
  22. As an operator, I want NDJSON wire events to carry fingerprint::Detection payloads (with the Evidence list) on challenge.detected and tech.fingerprint_detected, so that the SDK / dashboards see the same structured output the engine produces internally.
  23. As an operator, I want cargo test --all-features --test runner_ndjson_regression to remain byte-stable across every commit in this PRD, so that the wire contract from issue runner: bootstrap module shells + NDJSON regression test harness #16 is the gating trip wire through the entire cutover.
  24. As an operator, I want cargo test --all-features --test runner_integration to keep passing through every commit, so that the 4-scenario contract from A5 (healthy 200, 403 challenge, connection refused, blackhole route) protects the cutover the same way the unit tests protect each source.
  25. As a contributor, I want Crawler::process_job to read as a thin loop that calls runner.run(...) and post-processes the outcome, so that the orchestrator's role is visually separated from per-job execution after this PRD.
  26. As a contributor, I want src/crawler.rs LOC to drop measurably (target: <3000 LOC, down from current 3618), so that the god-module deepening goal from PRD Extract JobRunner from crawler.rs (strangler refactor) #15 finally has a non-zero result.
  27. As a contributor, I want the JobOutcome.signals: Vec<Detection> migration to be a single boundary change with no intermediate adapters, so that there is one shape consumers read.
  28. As a contributor, I want each phase of the PRD to land as a separate PR that ships green via cargo test --all-features --no-fail-fast, so that the strangler discipline from PRD Extract JobRunner from crawler.rs (strangler refactor) #15 continues.
  29. As a contributor, I want tests/escalation.rs, tests/session_scope_policy.rs, and tests/antibot_detection.rs to migrate from ChallengeVendor/AntibotVendor to fingerprint::Vendor cases, so that the deletion phase has no test-suite fallout.
  30. As a contributor, I want the engine's Warm-tier dispatch to consume the existing ImpersonateClient for robots/well-known/favicon fetches and the existing discovery::dns / discovery::rdap for Cold-tier, so that no new external dependency is introduced for plumbing.
  31. As a contributor, I want the ClientHello capture in impersonate::tls::build_connector to be a thin hook returning bytes (under 100 LOC of new code in impersonate/), so that the impersonate module retains its current shape.
  32. As a contributor, I want a SelfFingerprint::capture_live() async helper that returns a populated SelfFingerprint after a single HTTPS request, so that the CLI surface can call one method to get the full snapshot.
  33. As a contributor adding a new CDN vendor in the future, I want the deletion of tech_fingerprint to be irreversible (no #[cfg] flag preserving the old path), so that we cannot accidentally regress into the dual-path state.
  34. As a contributor, I want the CLI crawlex fingerprint output format to follow the JSON shape of FingerprintReport, so that operators can pipe through jq without bespoke parsing.
  35. As a contributor, I want the CLI subcommand to share construction code with Crawler (same ImpersonateClient, same Fingerprinter defaults), so that the engine behavior is identical whether invoked from a crawl or from the CLI.
  36. As a future maintainer, I want CONTEXT.md to be updated to remove the Avoid note that flags ChallengeDetector as a separate term, so that the glossary reflects the post-cutover reality.
  37. As a future maintainer, I want an ADR (ADR-0005) recording the cutover decision and any irreversible deletions, so that the rationale is one search away if the dual-path state ever returns as a proposal.
  38. As a future maintainer, I want the placeholder hashes in fingerprint::introspect::catalog to be replaced with real measurements as part of this PRD, so that the catalog is not shipped as a deferred-forever stub.

Implementation Decisions

Sequence. Six phases land as separate PRs in this order: (1) process_job cutover, including the policy::engine swap to Fingerprinter → (2) JobOutcome widening → (3) legacy detector deletion → (4) Warm/Cold tier dispatch + TargetContext widening → (5) Self-fingerprint live capture + real catalog hashes → (6) CLI surface. Each PR shipping green keeps main clean. The strangler discipline from PRD #15 continues — no commit lands with a known regression in the existing test suite, and the NDJSON regression test from issue #16 is the byte-for-byte trip wire across all six phases.

Cutover boundary. Crawler::process_job keeps the front (queue pull, admission, budgets, robots, dedupe, rate limit) and the back (storage writes, frontier feed, retry decision honoring caps and cooldowns, session-state commit, run-level events). The middle — fetch dispatch, extract, challenge detect, per-attempt events — becomes a single let outcome = self.runner.run(&job, &ctx).await;. The Render path stays inline for this PRD's scope (render-specific consumption of RenderedPage fields — Web Vitals, screenshot, ScriptSpec outcome — is too entangled to fold into the runner here; a dedicated render-cutover PRD follows).

SessionContext construction. Crawler::process_job builds the SessionContext from existing state per Job: identity from the active ImpersonateClient profile + IdentityBundle; proxy from the ProxyRouter lease; session_state from Crawler.session_states[session_id]; budgets from render_budgets + per-job timing config; policy from the resolved PolicyProfile. The two remaining placeholder types (SessionStatePlaceholder / ChallengeSignalPlaceholder) are deleted in Phase 2 as part of widening JobOutcome.

Fingerprinter injection. Crawler::new constructs the Arc<Fingerprinter> once and injects it into the JobRunner via JobRunner::with_fingerprinter (already added in B14). After the cutover, JobRunner::run always has the Fingerprinter present; the legacy ChallengeDetector fallback path inside the runner becomes unreachable and is removed in Phase 3.

Policy::engine swap. policy::engine replaces the ChallengeDetector::new().detect(status, headers, body) call with Fingerprinter::analyze_hot(&ctx) reading report.antibot. The change is a single function and a small wrapper to build the TargetContext from the same (status, headers, body) slice. The policy::engine retains all its decision logic (retry caps, host cooldowns, budgets); only the detection source changes.

Vendor enum collapse. error::AntibotVendor and antibot::ChallengeVendor are deleted. Every reference (runner::JobRunner, policy::engine, antibot::*, crawler.rs, tests/{escalation,session_scope_policy,antibot_detection}.rs) migrates to fingerprint::Vendor. The From conversions added in B7 are removed in the same commit since the source enums no longer exist.

Legacy module deletion. Phase 3 deletes src/discovery/tech_fingerprint.rs, src/runner/challenge.rs, src/antibot/block_detector.rs, and src/runner/fetcher/auto.rs. The antibot::detect_from_html / detect_from_http_response / detect_from_cookies functions are removed; src/antibot/mod.rs loses 290 LOC, but the bypass, cookie_pin, solver, telemetry, and recaptcha submodules stay (action paths, not detection — out of scope per ADR-0003). The src/antibot/signatures.rs data tables are distributed into the sources that consume them.

JobOutcome widening. JobOutcome.signals: Vec<ChallengeSignalPlaceholder> becomes Vec<fingerprint::Detection>. JobOutcome.new_session_state: Option<SessionStatePlaceholder> becomes Option<antibot::SessionState>. Both placeholder types are deleted. JobRunner::run populates the typed fields directly from Fingerprinter::analyze_hot and from any session-state mutation it observes.

Warm tier dispatch. Fingerprinter::analyze_warm(host: &str, client: &ImpersonateClient) -> Vec<Detection> is a new async method. It fetches https://{host}/robots.txt, /.well-known/security.txt, /.well-known/openid-configuration, and /favicon.ico in parallel, then calls each registered Warm-tier source with the fetched bytes via the source's public classify_* helper. Results cache in WarmCache keyed by host:port with 24h TTL.

Cold tier dispatch. Fingerprinter::analyze_cold(host: &str, dns: &DnsClient, rdap: &RdapClient) -> Vec<Detection> is a new async method gated behind operator opt-in. Resolves A/AAAA/CNAME via existing discovery::dns, queries RDAP via existing discovery::rdap, and feeds the results to DnsSource::classify_cnames / AsnSource::classify. Never auto-runs from process_job.

TargetContext widening. Optional Warm/Cold slots are added: h2_settings: Option<&[(u16, u32)]>, robots_body: Option<&str>, well_known: Option<&WellKnownProbe>, favicon_md5: Option<&str>, peer_cert: Option<&PeerCert>, dns: Option<&DnsObservation>, asn: Option<&AsnInfo>. Hot-tier sources continue to ignore the new fields; Warm/Cold sources read what they need.

ClientHello capture. A thin hook in impersonate::tls::build_connector records the ClientHello bytes assembled by BoringSSL and exposes them via an internal accessor (ImpersonateClient::last_client_hello() -> Option<&[u8]>). SelfFingerprint::capture_live(client: &ImpersonateClient) -> Option<SelfFingerprint> parses the bytes (TLS record + handshake header + extension blocks) and runs compute_ja3 / compute_ja4. The h2 SETTINGS fingerprint is captured similarly, from the first SETTINGS frame we send on each connection.

Catalog hardening. During Phase 5, the test suite captures SelfFingerprint once per Profile against a known endpoint, records the resulting JA3 / JA4 / h2_fp values, and replaces the PLACEHOLDER_* entries in fingerprint::introspect::catalog with the measured hashes. The recording is a one-shot test (#[ignore] by default, run with --ignored to refresh) plus a checked-in golden file under tests/fixtures/. Drift detection then runs against authoritative values.

CLI surface. Phase 6 adds crate::cli::fingerprint::cmd(args) that constructs a minimal Crawler (HTTP-only, no queue, no storage), drives one fetch of the target URL, runs Fingerprinter::analyze_hot followed by Warm-tier when not opted out, and prints the FingerprintReport as JSON to stdout. Flags: --deep-fingerprint (additionally runs Cold tier), --audit-tls (additionally runs the external oracle and includes the result in the JSON output). The existing crawlex CLI binary dispatches the subcommand alongside the existing crawl / other subcommands.

Test suite migration. tests/escalation.rs, tests/session_scope_policy.rs, tests/antibot_detection.rs, and any other test importing ChallengeVendor or AntibotVendor migrate to fingerprint::Vendor. Tests that previously asserted ChallengeDetector::detect(...) semantics call Fingerprinter::analyze_hot(&ctx).antibot and assert on Detection-level Evidence.

Performance. Hot tier per-fetch overhead must stay under 2ms p99 on a 100KB HTML response (already asserted in PRD #25 Phase 4). Warm tier fetches must complete in under 5s per host. Cold tier is opt-in, no budget. process_job total latency must not regress vs main baseline (measured via the existing --bench harness in tests/).

Testing Decisions

Definition of a good test for this work. Tests assert observable behavior at the public boundary of each module under change. For the cutover, that is: the JobOutcome returned by JobRunner::run matches the structured fields that consumer code (storage write, frontier feed) reads. For policy::engine, it is the Decision returned for a given (status, headers, body) triple. For the Fingerprinter Warm/Cold dispatchers, it is the cached FingerprintReport for a host. No assertion on private fields, internal cache layout, or commit-order beyond what the public contract guarantees.

Modules to be tested.

  1. Crawler::process_job post-cutover. A new integration test under tests/ drives a full Crawler::run against a wiremock server (the pattern from tests/mini_http_only.rs and the existing runner_ndjson_regression.rs). Asserts that per-attempt events fire from the runner on the wire, that storage records the same body / signals as before, and that the NDJSON event-kind sequence is byte-identical to the golden from issue runner: bootstrap module shells + NDJSON regression test harness #16.

  2. JobOutcome typed-field migration. Existing 44 runner unit tests are updated to assert on Vec<Detection> instead of Vec<ChallengeSignalPlaceholder> and on Option<antibot::SessionState> instead of Option<SessionStatePlaceholder>. New tests cover the round-trip from Fingerprinter::analyze_hot Detections through JobOutcome.signals to Crawler post-processing.

  3. policy::engine Fingerprinter integration. Existing policy::engine tests stay; their expected Decision outputs do not change because the antibot signal payload that drives Decision::Render arrives the same way. New test cases assert that a 403 + cf-chl-bypass body produces Decision::Render with the same DecisionReason::antibot_challenge shape, and that the Vendor field on the reason uses fingerprint::Vendor::Cloudflare directly.

  4. Fingerprinter::analyze_warm end-to-end. New integration test under tests/ drives the Warm tier against a wiremock-served robots.txt, security.txt, openid-configuration, and favicon. Asserts the FingerprintReport populates the corresponding source slots and the cache holds the entry with the configured TTL.

  5. Fingerprinter::analyze_cold against mocked DNS / RDAP. Test against a mocked DNS client returning a cloudfront.net CNAME and a mocked RDAP client returning AS13335. Asserts the Cold-tier Detections land in report.cdn / report.dns_hosting.

  6. SelfFingerprint::capture_live against a real HTTPS request. Test against a wiremock TLS server (wiremock 0.6 supports TLS). Captures the ClientHello, runs compute_ja3, asserts the hash matches the catalog entry for the active Profile::Chrome131Stable. The catalog entry is the one recorded in the one-shot #[ignore] test described in Phase 5.

  7. Coherence end-to-end. Constructed scenarios: clean profile + no antibot → both bools true; clean profile + Akamai Bot Manager detected → second false + warning; drift introduced via mocked ClientHello with a different cipher list → first false + warning. The drift case proves the live-capture-vs-catalog path produces meaningful answers.

  8. CLI subcommand smoke. Test in tests/cli_fingerprint.rs (new file) drives crawlex fingerprint http://wiremock-mock-url and asserts the printed JSON parses as a FingerprintReport with host, cdn, cookie_pattern, etc. fields populated. A second case runs with --audit-tls against a mocked oracle endpoint and asserts coherence.our_ja3_matches_profile populated.

  9. NDJSON regression byte-stable through all six phases. The trip-wire test from issue runner: bootstrap module shells + NDJSON regression test harness #16 continues to pass through every commit. New event payloads (Evidence list on challenge.detected / tech.fingerprint_detected) are explicitly compatible (new fields, existing fields unchanged) — the golden file gains the new fields in Phase 2 in a deliberate diff commit reviewed alongside the JobOutcome widening.

  10. Test-suite migration trip wire. Pre-deletion, a "migration parity" suite runs both the old ChallengeVendor path and the new fingerprint::Vendor path against the same fixtures and asserts identical outputs. The parity test is deleted alongside the legacy enums in Phase 4.

Prior art in the codebase.

  • tests/runner_ndjson_regression.rs — byte-stable trip wire from slice runner: bootstrap module shells + NDJSON regression test harness #16. Continues to guard every commit.
  • tests/runner_integration.rs — 4-scenario test from A5. Continues to pass through every phase.
  • tests/mini_http_only.rs — pattern for wiremock-backed Crawler::run end-to-end tests.
  • tests/escalation.rs — pattern for (status, headers, body) table-driven antibot detection tests; informs the policy::engine swap test cases.
  • src/fingerprint/target/sources/*::tests — per-source unit-test pattern that the Warm/Cold dispatch tests follow.

Out of Scope

  • The render path's full cutover. Method::Render continues to flow through the inline render dispatch in Crawler::process_job. The render pipeline's consumption of RenderedPage (Web Vitals, screenshot to storage, asset_refs, tech_fingerprint runtime data, ScriptSpec RunOutcome) is entangled with the existing Crawler post-processing in ways that warrant a dedicated render-cutover PRD. The runner's FetchOutput::Rendered variant continues to exist; this PRD does not exercise that variant from process_job.
  • The deeper SessionIdentity unification (architecture review candidate #3 from PRD Extract JobRunner from crawler.rs (strangler refactor) #15). SessionContext.identity carries a thin bundle of ImpersonateClient + IdentityBundle + cookies as it does today. Full unification stays out of scope.
  • The unified LifecycleHook collapsing hooks + events + script (architecture candidate #5). The three layers keep their identities; only the emitter of per-attempt events moves from Crawler to JobRunner, which already happened structurally in slice runner: move per-attempt event/hook emission into JobRunner; cleanup dead helpers in crawler.rs #23 and activates on the wire after Phase 1.
  • The DiscoveryBackend registry (architecture candidate #4). Discovery adapters under discovery/ continue to operate as today.
  • Splitting config.rs (architecture candidate #6). The runner reads from the existing flat Config via SessionContext.policy.
  • Frontier admission deepening (architecture candidate #7). Admission stays on Crawler.
  • New CLI subcommands beyond crawlex fingerprint. The existing crawl / scrape / etc. subcommands are unchanged.
  • New external dependencies for fingerprinting computation. JA3 (md-5 already added in B10), JA4 (sha2 already a dep), h2 SETTINGS hashing (sha2). No new crates.
  • Public catalog data for FP-B beyond the three Chrome profiles already in impersonate::Profile. Adding a Firefox / Safari profile catalog entry is a follow-up PRD if those profiles ever land in impersonate.
  • Active probing (sending crafted requests to elicit specific responses). All Fingerprinter sources remain passive — they observe responses the Crawler / CLI already makes.
  • New configuration surfaces beyond the three flags this PRD adds (--deep-fingerprint, --audit-tls, and the crawlex fingerprint subcommand itself). Anything more granular waits for operator feedback after first ship.

Further Notes

Metadata

Labels: enhancement (New feature or request), needs-triage (Awaiting triage), rust (Pull requests that update Rust code)