feat(runner): send runner telemetry to persist#6209
Conversation
There was a problem hiding this comment.
2 issues found across 15 files
Confidence score: 3/5
- There is concrete reliability risk in
packages/runner/lib/telemetry.ts: the batchprocesspath fire-and-forgets delivery, so failed sends are not retried and shutdown may complete before telemetry is persisted. packages/persist/lib/routes/runner/telemetry/postTelemetry.tssums byte counters without safe bounds, creating overflow/precision-loss risk for large values and potentially corrupting billing/monitoring totals.- Given two medium-high severity findings (6-7/10) with strong confidence, this looks mergeable only with caution because user-visible telemetry accuracy and durability can regress.
- Pay close attention to
packages/runner/lib/telemetry.tsandpackages/persist/lib/routes/runner/telemetry/postTelemetry.ts- async delivery/flush semantics and bounded counter arithmetic need hardening.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/runner/lib/telemetry.ts">
<violation number="1" location="packages/runner/lib/telemetry.ts:46">
P1: The batch `process` function fire-and-forgets telemetry delivery, so failures are not retried and shutdown can return before telemetry is actually persisted.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
ee21bb1 to
31f03a8
Compare
There was a problem hiding this comment.
2 issues found across 18 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/shared/lib/seeders/plan.seeder.ts">
<violation number="1" location="packages/shared/lib/seeders/plan.seeder.ts:52">
P2: Test seeder defaults `export_runner_telemetry` to `true`, but the production default is `false`. This mismatch means tests using `getTestPlan()` exercise the enabled path while production plans start with telemetry off, potentially masking issues.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Tip: cubic used a learning from your PR history. Let your coding agent read cubic learnings directly with the cubic MCP.
Fix all with cubic | Re-trigger cubic
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36262842a5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 66f7979c67
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
66f7979 to
6696e7c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6696e7c02d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 451d059017
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Hey @TBonnin, I've addressed the open reviews in the last commits; here's an AI generated summary of their contents:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80d3d0ad4a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
3 issues found across 9 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/shared/lib/seeders/plan.seeder.ts">
<violation number="1" location="packages/shared/lib/seeders/plan.seeder.ts:52">
P2: Test seeder defaults `export_runner_telemetry` to `true`, but the production default is `false`. This mismatch means tests using `getTestPlan()` exercise the enabled path while production plans start with telemetry off, potentially masking issues.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic | Re-trigger cubic
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 10f374e195
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/shared/lib/seeders/plan.seeder.ts">
<violation number="1" location="packages/shared/lib/seeders/plan.seeder.ts:52">
P2: Test seeder defaults `export_runner_telemetry` to `true`, but the production default is `false`. This mismatch means tests using `getTestPlan()` exercise the enabled path while production plans start with telemetry off, potentially masking issues.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic | Re-trigger cubic
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f09ae45d1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
2 issues found across 2 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic | Re-trigger cubic
6a1c50c to
3e45487
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e454877fa
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| async shutdown({ timeoutMs = 5_000 }: { timeoutMs?: number } = {}) { | ||
| if (!batcher) return Ok(undefined); | ||
|
|
||
| const res = await batcher.shutdown({ timeoutMs }); |
There was a problem hiding this comment.
Retry final telemetry flushes before disposing recorder
When this shutdown runs at the end of a Lambda execution or during resource eviction, a transient persist failure makes Batcher.shutdown() return after the first failed flush() even though flush() has requeued the batch for retry (packages/utils/lib/batcher.ts lines 157-160). Because the recorder is then disposed/no longer scheduled, those queued telemetry events are lost instead of using the configured retry budget; keep flushing until the queue is empty, retries are exhausted, or the timeout is reached.
Useful? React with 👍 / 👎.
|
You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 90637529d4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| logger.warning('Failed to release all locks', { reason: err }); | ||
| } | ||
| await telemetryRecorder | ||
| .shutdown({ timeoutMs: 5000 }) |
There was a problem hiding this comment.
Bound telemetry shutdown while flush is in flight
When telemetry export is enabled and the persist telemetry POST stalls instead of failing quickly, this timeoutMs: 5000 does not actually bound runner completion: Batcher.shutdown() only checks the deadline before calling await this.flush(), and postRunnerTelemetry ultimately uses fetch without an abort timeout, so exec() can hang indefinitely after the user function has finished. Fresh evidence since the earlier timeout thread is that the current shutdown path now delegates to Batcher.shutdown() with no in-flight timeout/race around the flush.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid point, but it won't be relevant until we turn on the flag that gates telemetry exporting, which we're unlikely to do until Monday. Anyways, I have a fix at the ready, will move it to the last PR that's based off of this one.
The TelemetryBag can be silently dropped on runtime errors, making it unsuitable for sensitive telemetry, like billing data.
Introduces an alternative: runner collects telemetry events, aggregates by integration/connection, and flushes to persist via HTTP on interval. Balances reliability and complexity without introducing new transports or requiring major SDK refactors.