Parent
#15
What to build
Promote JobRunner::run to the per-Job entry point. Crawler::process_job becomes: build SessionContext, call runner.run(job, ctx).await, then post-process the returned JobOutcome (storage write, frontier feed, retry decision based on RetryDecision, commit new_session_state if present).
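A rough sketch of the intended process_job flow, to make the ordering concrete. Everything here is a simplified stand-in: the real call is async (runner.run(job, ctx).await) while this sketch is synchronous so it compiles with std only; the Runner trait, the ChallengedRunner fake, the Vec-backed storage/frontier/retry queue, and the two-variant RetryDecision (NoRetry is a hypothetical name) are assumptions for illustration, and the retry policy itself is elided.

```rust
use std::sync::Arc;

// Simplified stand-ins; the real types carry far more than this.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionState(pub u32);
pub struct Job { pub url: String }
pub struct SessionContext { pub state: SessionState }

#[derive(Debug, Clone, PartialEq)]
pub enum RetryDecision { NoRetry, Suggest } // payload omitted in this sketch

pub struct JobOutcome {
    // Ok carries discovered links here, standing in for FetchSuccess.
    pub result: Result<Vec<String>, String>,
    pub retry: RetryDecision,
    pub new_session_state: Option<SessionState>,
}

// Hypothetical trait so the sketch can plug in a fake runner.
pub trait Runner: Send + Sync {
    fn run(&self, job: &Job, ctx: SessionContext) -> JobOutcome;
}

pub struct Crawler {
    pub runner: Arc<dyn Runner>,
    pub session: SessionState,
    pub stored: Vec<String>,      // stand-in for the storage write
    pub frontier: Vec<String>,    // stand-in for the frontier feed
    pub retry_queue: Vec<String>, // stand-in for re-enqueueing
}

impl Crawler {
    pub fn process_job(&mut self, job: Job) {
        // 1. Build the per-call context.
        let ctx = SessionContext { state: self.session.clone() };
        // 2. Hand the job to the runner.
        let outcome = self.runner.run(&job, ctx);
        // 3. Post-process: storage write and frontier feed on success.
        if let Ok(links) = &outcome.result {
            self.stored.push(job.url.clone());
            self.frontier.extend(links.iter().cloned());
        }
        // 4. Retry decision (caps, cooldowns, budgets are elided here).
        if outcome.retry != RetryDecision::NoRetry {
            self.retry_queue.push(job.url.clone());
        }
        // 5. Commit new session state if the runner returned one.
        if let Some(state) = outcome.new_session_state {
            self.session = state;
        }
    }
}

// Fake runner: always fails on a challenge but returns updated state.
pub struct ChallengedRunner;
impl Runner for ChallengedRunner {
    fn run(&self, _job: &Job, ctx: SessionContext) -> JobOutcome {
        JobOutcome {
            result: Err("challenge".to_string()),
            retry: RetryDecision::Suggest,
            new_session_state: Some(SessionState(ctx.state.0 + 1)),
        }
    }
}
```

Note that the session state is committed even when the result is an error, which is the case the challenge path relies on.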
JobRunner is held as Arc<JobRunner>, constructed once at run start with Arc<dyn Fetcher> (a dispatcher that picks Spoof/Render/Auto by Method), Arc<Extractor>, Arc<ChallengeDetector>, Arc<EventSink>, Arc<HookRegistry>. Shared across all workers. Send + Sync, no per-call mutable state on self.
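One way to keep the Send + Sync requirement honest is a compile-time assertion next to the struct. The sketch below assumes a Fetcher trait with Send + Sync supertraits, unit-struct placeholders for the other collaborators, a synchronous run that returns a String, and OS threads in place of the real async workers; StaticFetcher and run_on_workers are hypothetical helpers.

```rust
use std::sync::Arc;
use std::thread;

// Assumed trait shape; the supertraits make `dyn Fetcher` Send + Sync.
pub trait Fetcher: Send + Sync {
    fn fetch(&self, url: &str) -> String;
}

// Placeholders for the real collaborators.
pub struct Extractor;
pub struct ChallengeDetector;
pub struct EventSink;
pub struct HookRegistry;

pub struct JobRunner {
    fetcher: Arc<dyn Fetcher>,
    extractor: Arc<Extractor>,
    detector: Arc<ChallengeDetector>,
    events: Arc<EventSink>,
    hooks: Arc<HookRegistry>,
}

impl JobRunner {
    pub fn new(
        fetcher: Arc<dyn Fetcher>,
        extractor: Arc<Extractor>,
        detector: Arc<ChallengeDetector>,
        events: Arc<EventSink>,
        hooks: Arc<HookRegistry>,
    ) -> Self {
        Self { fetcher, extractor, detector, events, hooks }
    }

    // Takes &self only: no per-call mutable state lives on the runner.
    pub fn run(&self, url: &str) -> String {
        self.fetcher.fetch(url)
    }
}

// Fails to compile if JobRunner ever stops being Send + Sync.
pub fn assert_send_sync<T: Send + Sync>() {}

// Hypothetical fake fetcher for the sketch.
pub struct StaticFetcher;
impl Fetcher for StaticFetcher {
    fn fetch(&self, url: &str) -> String {
        format!("body of {}", url)
    }
}

// Shares one runner across one thread per URL; results keep input order
// because the handles are joined in the order they were spawned.
pub fn run_on_workers(runner: Arc<JobRunner>, urls: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            let runner = Arc::clone(&runner);
            thread::spawn(move || runner.run(&url))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```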
SessionContext carries: Arc<SessionIdentity> (thin placeholder bundling current ImpersonateClient + IdentityBundle + cookies — full unification is out of scope), optional ProxyLease, SessionState, JobBudgets, Arc<PolicyProfile>.
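A possible shape for SessionContext, assuming the field list above. The field names, the contents of every inner struct, and the build_context helper are hypothetical; only the five carried types come from this issue. The point of the sketch is that per-job construction stays cheap because identity and policy are shared by Arc rather than copied.

```rust
use std::sync::Arc;
use std::time::Duration;

// Inner fields are placeholders; the real types are defined elsewhere.
pub struct SessionIdentity {
    pub impersonate_client: String,
    pub identity_bundle: String,
    pub cookies: Vec<(String, String)>,
}
pub struct ProxyLease { pub endpoint: String }
#[derive(Clone, Default)]
pub struct SessionState { pub challenge_solved: bool }
#[derive(Clone)]
pub struct JobBudgets { pub max_wall_time: Duration, pub max_fetches: u32 }
pub struct PolicyProfile { pub name: String }

pub struct SessionContext {
    pub identity: Arc<SessionIdentity>,
    pub proxy: Option<ProxyLease>,
    pub state: SessionState,
    pub budgets: JobBudgets,
    pub policy: Arc<PolicyProfile>,
}

// Hypothetical helper the Crawler could call before each runner.run.
pub fn build_context(
    identity: &Arc<SessionIdentity>,
    policy: &Arc<PolicyProfile>,
    proxy: Option<ProxyLease>,
    state: &SessionState,
    budgets: &JobBudgets,
) -> SessionContext {
    SessionContext {
        identity: Arc::clone(identity), // pointer copy, not a deep clone
        proxy,
        state: state.clone(),
        budgets: budgets.clone(),
        policy: Arc::clone(policy),
    }
}
```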
JobOutcome shape locked: result: Result<FetchSuccess, JobError>, timings: JobTimings (populated on both branches), retry: RetryDecision, new_session_state: Option<SessionState>. JobError variants: Network, Timeout, RenderFailed, ChallengeUnrecoverable, BudgetExhausted, Cancelled.
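The locked shape could look roughly like the following. The four JobOutcome fields and the six JobError variants are taken from this issue, and the four RetryReason names come from the acceptance criteria; the NoRetry variant, the JobTimings fields, the backoff values, and the specific error-to-decision mapping in retry_for are assumptions for illustration. In this sketch EscalateToRender and ChallengeRecoverable are not produced by the error mapping at all.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum JobError {
    Network,
    Timeout,
    RenderFailed,
    ChallengeUnrecoverable,
    BudgetExhausted,
    Cancelled,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RetryReason { EscalateToRender, Timeout, Network, ChallengeRecoverable }

#[derive(Debug, Clone, PartialEq)]
pub enum RetryDecision {
    NoRetry, // hypothetical name for the "do not retry" case
    Suggest { reason: RetryReason, backoff_hint: Option<Duration> },
}

// Hypothetical fields; the issue only says timings exist on both branches.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct JobTimings { pub fetch: Duration, pub total: Duration }

#[derive(Debug)]
pub struct FetchSuccess { pub body: String }
#[derive(Debug, Clone, PartialEq)]
pub struct SessionState { pub challenge_solved: bool }

#[derive(Debug)]
pub struct JobOutcome {
    pub result: Result<FetchSuccess, JobError>,
    pub timings: JobTimings,
    pub retry: RetryDecision,
    pub new_session_state: Option<SessionState>,
}

// Assumed mapping: transient errors suggest a retry, the rest do not.
pub fn retry_for(err: &JobError) -> RetryDecision {
    match err {
        JobError::Network => RetryDecision::Suggest {
            reason: RetryReason::Network,
            backoff_hint: Some(Duration::from_secs(2)),
        },
        JobError::Timeout => RetryDecision::Suggest {
            reason: RetryReason::Timeout,
            backoff_hint: Some(Duration::from_secs(5)),
        },
        JobError::RenderFailed
        | JobError::ChallengeUnrecoverable
        | JobError::BudgetExhausted
        | JobError::Cancelled => RetryDecision::NoRetry,
    }
}

// Builds a failure outcome; timings are still required on this branch.
pub fn failed(err: JobError, timings: JobTimings) -> JobOutcome {
    let retry = retry_for(&err);
    JobOutcome { result: Err(err), timings, retry, new_session_state: None }
}
```

Making timings a plain field rather than part of the Ok payload is what lets a test assert on it after a failure.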
Retry policy stays on Crawler: it reads RetryDecision::Suggest { reason, backoff_hint } and applies retry caps, host cooldowns, and budget accounting before deciding to re-enqueue.
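To illustrate the split, here is a minimal sketch of how the Crawler side might interpret a suggestion. RetryPolicy, Verdict, the field names, and the order of the checks are all hypothetical; NoRetry is an assumed variant name. The runner only suggests, and the policy decides whether and when to re-enqueue.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RetryReason { EscalateToRender, Timeout, Network, ChallengeRecoverable }

#[derive(Debug, Clone, PartialEq)]
pub enum RetryDecision {
    NoRetry, // hypothetical name
    Suggest { reason: RetryReason, backoff_hint: Option<Duration> },
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Requeue { delay: Duration },
    Drop(&'static str),
}

// Hypothetical policy state owned by the Crawler.
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub default_backoff: Duration,
    pub retry_budget_remaining: u32,
    pub host_cooldown_until: HashMap<String, Instant>,
}

impl RetryPolicy {
    pub fn decide(
        &mut self,
        host: &str,
        attempts_so_far: u32,
        decision: &RetryDecision,
        now: Instant,
    ) -> Verdict {
        // The runner's hint is advisory; fall back to a default backoff.
        let backoff = match decision {
            RetryDecision::NoRetry => return Verdict::Drop("no retry suggested"),
            RetryDecision::Suggest { backoff_hint, .. } => {
                backoff_hint.unwrap_or(self.default_backoff)
            }
        };
        if attempts_so_far >= self.max_attempts {
            return Verdict::Drop("retry cap reached");
        }
        if self.retry_budget_remaining == 0 {
            return Verdict::Drop("retry budget exhausted");
        }
        // Never re-enqueue earlier than the host's cooldown allows.
        let cooldown = self
            .host_cooldown_until
            .get(host)
            .map(|until| until.saturating_duration_since(now))
            .unwrap_or(Duration::ZERO);
        self.retry_budget_remaining -= 1;
        Verdict::Requeue { delay: backoff.max(cooldown) }
    }
}
```

Passing now in as a parameter keeps the policy deterministic under test.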
Acceptance criteria
- JobRunner::run is the per-Job entry point called from Crawler::process_job
- SessionContext constructed by Crawler before each call
- JobOutcome returned by value; Crawler post-processes storage, frontier, retry, session-state commit
- JobError enum defined with the listed variants
- RetryDecision::Suggest reasons map cleanly: EscalateToRender, Timeout, Network, ChallengeRecoverable
- Crawler retry path honors retry caps, host cooldowns, budgets when interpreting RetryDecision::Suggest
- #[cfg(test)] mod tests for JobRunner::run with fake Fetcher: assert timings populated on failure, new_session_state returned on challenge
- #[cfg(test)] mod tests for the JobError → RetryDecision mapping
- cargo test --all-features green
- src/crawler.rs LOC measurably reduced (track delta in PR description)
Blocked by