refactor(scheduler): decouple from orchestrator#6276
Conversation
There was a problem hiding this comment.
7 issues found across 20 files
Confidence score: 2/5
- Several high-confidence issues are user-impacting (7–8/10), so merge risk is elevated until key guards/validation are added.
- Most severe:
packages/scheduler/lib/daemons/cleaning/cleaning.daemon.tstakes publicolderThanDaysinto raw SQL intervals without validation, which can break cleanup queries or delete unintended rows. - Daemon resilience is at risk in
packages/scheduler/lib/daemons/monitoring/backpressure-monitoring.daemon.tsandpackages/scheduler/lib/daemons/scheduling/scheduling.daemon.ts: unguardedonEventlistener/telemetry failures can crash processing or roll back scheduling work. - Pay close attention to
packages/scheduler/lib/daemons/cleaning/cleaning.daemon.ts,packages/scheduler/lib/daemons/monitoring/backpressure-monitoring.daemon.ts,packages/scheduler/lib/daemons/scheduling/scheduling.daemon.ts, andpackages/orchestrator/lib/app.ts- input validation, hook isolation, and SSL fallback handling are the main stability risks.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/scheduler/lib/utils/logger.ts">
<violation number="1" location="packages/scheduler/lib/utils/logger.ts:5">
P2: Defaulting the package logger to `noopLogger` silently drops scheduler and migration logs for callers that do not inject a logger.</violation>
</file>
<file name="packages/scheduler/lib/scheduler.ts">
<violation number="1" location="packages/scheduler/lib/scheduler.ts:69">
P2: Logger injection is not instance-safe here because this overwrites the package-global logger for every Scheduler in the process.</violation>
</file>
<file name="packages/orchestrator/lib/app.ts">
<violation number="1" location="packages/orchestrator/lib/app.ts:37">
P1: Use the shared DB SSL flag when `databaseUrl` falls back to `NANGO_DATABASE_URL`/`NANGO_DB_*`; otherwise SSL-required deployments can stop connecting.</violation>
</file>
<file name="packages/orchestrator/lib/scheduler-config.ts">
<violation number="1" location="packages/orchestrator/lib/scheduler-config.ts:37">
P2: Avoid tagging this metric with the full `groupKey`; it creates one time series per queue and can explode monitoring cardinality.
(Based on your team's feedback about limiting high-cardinality monitoring dimensions to essentials.) [FEEDBACK_USED]</violation>
</file>
Tip: instead of fixing issues one by one fix them all with cubic
Re-trigger cubic
There was a problem hiding this comment.
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/scheduler/lib/utils/logger.ts">
<violation number="1" location="packages/scheduler/lib/utils/logger.ts:5">
P2: Defaulting the package logger to `noopLogger` silently drops scheduler and migration logs for callers that do not inject a logger.</violation>
</file>
<file name="packages/scheduler/lib/scheduler.ts">
<violation number="1" location="packages/scheduler/lib/scheduler.ts:69">
P2: Logger injection is not instance-safe here because this overwrites the package-global logger for every Scheduler in the process.</violation>
</file>
<file name="packages/orchestrator/lib/app.ts">
<violation number="1" location="packages/orchestrator/lib/app.ts:37">
P1: Use the shared DB SSL flag when `databaseUrl` falls back to `NANGO_DATABASE_URL`/`NANGO_DB_*`; otherwise SSL-required deployments can stop connecting.</violation>
</file>
<file name="packages/orchestrator/lib/scheduler-config.ts">
<violation number="1" location="packages/orchestrator/lib/scheduler-config.ts:37">
P2: Avoid tagging this metric with the full `groupKey`; it creates one time series per queue and can explode monitoring cardinality.
(Based on your team's feedback about limiting high-cardinality monitoring dimensions to essentials.) [FEEDBACK_USED]</violation>
</file>
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
There was a problem hiding this comment.
6 issues found across 21 files
Confidence score: 2/5
- Merge risk is elevated because
packages/scheduler/lib/db/client.tsmapsssl: truetorejectUnauthorized: false, which can silently disable certificate verification when callers expect standard TLS validation. - There are concrete behavior/accuracy risks in scheduling metrics:
packages/scheduler/lib/daemons/scheduling/scheduling.daemon.tscan repeatedly emittask_droppedfor still-due schedules, andpackages/scheduler/lib/scheduler.tsmay emit the same event before a transaction commits. - Configuration and query-safety concerns add additional risk:
packages/scheduler/lib/daemons/cleaning/cleaning.daemon.tsmay spin in a near-tight loop with invalidtickIntervalMs,packages/scheduler/lib/models/tasks.tsinterpolatesbatchSizeinto raw SQL, andpackages/orchestrator/lib/scheduler-config.tscan create high-cardinality metric tags. - Pay close attention to
packages/scheduler/lib/db/client.ts,packages/scheduler/lib/daemons/scheduling/scheduling.daemon.ts,packages/scheduler/lib/daemons/cleaning/cleaning.daemon.ts,packages/scheduler/lib/models/tasks.ts,packages/orchestrator/lib/scheduler-config.ts,packages/scheduler/lib/scheduler.ts- TLS verification semantics, metric/event correctness, config validation, and SQL parameterization need fixes before relying on this behavior.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/scheduler/lib/utils/logger.ts">
<violation number="1" location="packages/scheduler/lib/utils/logger.ts:5">
P2: Defaulting the package logger to `noopLogger` silently drops scheduler and migration logs for callers that do not inject a logger.</violation>
</file>
<file name="packages/scheduler/lib/scheduler.ts">
<violation number="1" location="packages/scheduler/lib/scheduler.ts:69">
P2: Logger injection is not instance-safe here because this overwrites the package-global logger for every Scheduler in the process.</violation>
</file>
<file name="packages/orchestrator/lib/app.ts">
<violation number="1" location="packages/orchestrator/lib/app.ts:37">
P1: Use the shared DB SSL flag when `databaseUrl` falls back to `NANGO_DATABASE_URL`/`NANGO_DB_*`; otherwise SSL-required deployments can stop connecting.</violation>
</file>
<file name="packages/orchestrator/lib/scheduler-config.ts">
<violation number="1" location="packages/orchestrator/lib/scheduler-config.ts:37">
P2: Avoid tagging this metric with the full `groupKey`; it creates one time series per queue and can explode monitoring cardinality.
(Based on your team's feedback about limiting high-cardinality monitoring dimensions to essentials.) [FEEDBACK_USED]</violation>
<violation number="2" location="packages/orchestrator/lib/scheduler-config.ts:37">
P2: Don't tag `ORCH_QUEUE_BACKPRESSURE` with the raw `groupKey`; it creates one time series per queue and will drive high-cardinality metrics costs/noise. Keep only the bounded `primitive` tag here and log the full key separately if you need it for debugging.
(Based on your team's feedback about limiting high-cardinality monitoring dimensions to essentials.) [FEEDBACK_USED]</violation>
</file>
<file name="packages/scheduler/lib/daemons/scheduling/scheduling.daemon.ts">
<violation number="1" location="packages/scheduler/lib/daemons/scheduling/scheduling.daemon.ts:97">
P2: This `task_dropped` event is emitted for schedules that stay due, so one capped schedule will be counted again on every scheduling tick until the backlog clears.</violation>
</file>
<file name="packages/scheduler/lib/daemons/cleaning/cleaning.daemon.ts">
<violation number="1" location="packages/scheduler/lib/daemons/cleaning/cleaning.daemon.ts:33">
P2: Validate `tickIntervalMs` here as well. Since it is now caller-supplied config, negative/NaN values turn the cleanup daemon into a near-tight loop.</violation>
</file>
Tip: instead of fixing issues one by one fix them all with cubic
Re-trigger cubic
747cd33 to
d376a72
Compare
TBonnin
left a comment
There was a problem hiding this comment.
❤️
I would love to see the backpressure monitoring move outside of scheduler since it is a Nango functions executions / orchesrator problem, not a task scheduling problem. I am fine if you want to do it in next PR
| cleaningTickIntervalMs: envs.ORCHESTRATOR_CLEANING_TICK_INTERVAL_MS, | ||
| monitoringTickIntervalMs: envs.ORCHESTRATOR_BACKPRESSURE_MONITORING_TICK_INTERVAL_MS, | ||
| cleaningOlderThanDays: envs.ORCHESTRATOR_CLEANING_OLDER_THAN_DAYS, | ||
| monitoringTopN: envs.ORCHESTRATOR_BACKPRESSURE_MONITORING_TOP_N |
There was a problem hiding this comment.
can we move the entire backpressure monitoring out of scheduler and into orchestrator? Fine if you want to do it in a separate PR
There was a problem hiding this comment.
Sounds like it makes sense, butI haven't looked too much at it's workings. Is it too specific to the orchestrator?
There was a problem hiding this comment.
2 issues found across 4 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic | Re-trigger cubic
…ts past tx commit
The scheduler package currently has a some orchestrator coupling:
ORCHESTRATOR_*env varsORCH_*metricsThis PR removes that coupling by:
And adapts orchestrator to consume this with no behavior change.
Known design tradeoffs (per cubic review)
loggerdefaults to anoopLoggerrather thangetLogger('scheduler')so the package stays generic. The orchestrator passes a logger so prod is unchanged; anyone constructing a Scheduler elsewhere should passlogger: getLogger('scheduler')(or anyStrictLogger) if they want logs.