Skip to content

feat(runner): send runner telemetry to persist#6209

Merged
ErickRDev merged 22 commits into
masterfrom
erickr/NAN-5668/runner-persist-telemetry
Jun 1, 2026
Merged

feat(runner): send runner telemetry to persist#6209
ErickRDev merged 22 commits into
masterfrom
erickr/NAN-5668/runner-persist-telemetry

Conversation

@ErickRDev

Copy link
Copy Markdown
Contributor

The TelemetryBag can be silently dropped on runtime errors, making it unsuitable for sensitive telemetry, like billing data.

Introduces an alternative: runner collects telemetry events, aggregates by integration/connection, and flushes to persist via HTTP on interval. Balances reliability and complexity without introducing new transports or requiring major SDK refactors.

@linear

linear Bot commented May 21, 2026

Copy link
Copy Markdown

NAN-5668

@ErickRDev ErickRDev self-assigned this May 21, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 15 files

Confidence score: 3/5

  • There is concrete reliability risk in packages/runner/lib/telemetry.ts: the batch process path fire-and-forgets delivery, so failed sends are not retried and shutdown may complete before telemetry is persisted.
  • packages/persist/lib/routes/runner/telemetry/postTelemetry.ts sums byte counters without safe bounds, creating overflow/precision-loss risk for large values and potentially corrupting billing/monitoring totals.
  • Given two medium-high severity findings (6-7/10) with strong confidence, this looks mergeable only with caution because user-visible telemetry accuracy and durability can regress.
  • Pay close attention to packages/runner/lib/telemetry.ts and packages/persist/lib/routes/runner/telemetry/postTelemetry.ts - async delivery/flush semantics and bounded counter arithmetic need hardening.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/runner/lib/telemetry.ts">

<violation number="1" location="packages/runner/lib/telemetry.ts:46">
P1: The batch `process` function fire-and-forgets telemetry delivery, so failures are not retried and shutdown can return before telemetry is actually persisted.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread packages/runner/lib/telemetry.ts Outdated
Comment thread packages/persist/lib/routes/runner/telemetry/postTelemetry.ts Outdated
@ErickRDev ErickRDev force-pushed the erickr/NAN-5668/runner-persist-telemetry branch from ee21bb1 to 31f03a8 Compare May 22, 2026 15:05

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 18 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/shared/lib/seeders/plan.seeder.ts">

<violation number="1" location="packages/shared/lib/seeders/plan.seeder.ts:52">
P2: Test seeder defaults `export_runner_telemetry` to `true`, but the production default is `false`. This mismatch means tests using `getTestPlan()` exercise the enabled path while production plans start with telemetry off, potentially masking issues.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Tip: cubic used a learning from your PR history. Let your coding agent read cubic learnings directly with the cubic MCP.

Fix all with cubic | Re-trigger cubic

Comment thread packages/shared/lib/seeders/plan.seeder.ts
Comment thread packages/runner/lib/telemetry.ts Outdated
@ErickRDev ErickRDev marked this pull request as ready for review May 22, 2026 23:19
@superagent-security superagent-security Bot added the contributor:verified Contributor passed trust analysis. label May 22, 2026
@ErickRDev ErickRDev requested review from TBonnin and rossmcewan May 22, 2026 23:19
@superagent-security superagent-security Bot added the pr:verified PR passed security analysis. label May 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36262842a5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/runner/lib/telemetry.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66f7979c67

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/runner/lib/telemetry.ts Outdated
@ErickRDev ErickRDev force-pushed the erickr/NAN-5668/runner-persist-telemetry branch from 66f7979 to 6696e7c Compare May 26, 2026 12:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6696e7c02d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/runner/lib/telemetry.ts Outdated
Comment thread packages/utils/lib/environment/parse.ts Outdated
Comment thread packages/runner/lib/clients/persist.ts Outdated
Comment thread packages/persist/lib/server.ts Outdated
Comment thread packages/runner/lib/exec.ts
Comment thread packages/runner/lib/telemetry.ts Outdated
Comment thread packages/persist/lib/routes/runner/telemetry/postTelemetry.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 451d059017

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/runner/lib/telemetry.ts Outdated
Comment thread packages/utils/lib/environment/parse.ts
@ErickRDev

Copy link
Copy Markdown
Contributor Author

Hey @TBonnin, I've addressed the open reviews in the last commits; here's an AI generated summary of their contents:

868d6d6 — Resource pool at runner process level

server.ts now maintains a Map<string, EnvironmentResources> keyed on environmentId:secretKey:exportRunnerTelemetry. Instead of creating a new PersistClient + TelemetryRecorder per task execution, getOrCreateEnvironmentResources() reuses existing ones. Pool entries are evicted after RUNNER_RESOURCE_POOL_EVICT_IDLE_MS (default 5 min) of inactivity. On SIGTERM, all recorders are shut down. exec() gains optional persistClient and telemetryRecorder params — when provided by the server, it skips creating its own and also skips shutting them down at the end.

64bf638syncId dimension in telemetry tags

syncId added to WithTags in types/persist/api.ts, to the persist validation schema, and populated from this.syncId in the SDK. Grouping key in telemetry.ts updated to include syncId so events for different syncs don't collapse together.

49504ce — Retry on telemetry send failure

createTelemetryBatcher's process callback now throws the error instead of swallowing it — this signals Batcher to retry the batch. Log level downgraded from errorwarning with "might retry later" message.

80d3d0a — Split runner vs Lambda telemetry config

TelemetryRecorder refactored to accept a TelemetryRecorderConfig object (batch size, flush interval, recordingEnabled). Two separate env var pairs introduced: RUNNER_TELEMETRY_* (defaults: 50 / 5s) for the long-lived runner server, LAMBDA_TELEMETRY_* (defaults: 10 / 2s) for Lambda-style invocations via exec() without an injected recorder. Both capped at max(1000).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80d3d0ad4a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/runner/lib/sdk/sdk.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 9 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/shared/lib/seeders/plan.seeder.ts">

<violation number="1" location="packages/shared/lib/seeders/plan.seeder.ts:52">
P2: Test seeder defaults `export_runner_telemetry` to `true`, but the production default is `false`. This mismatch means tests using `getTestPlan()` exercise the enabled path while production plans start with telemetry off, potentially masking issues.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Fix all with cubic | Re-trigger cubic

Comment thread packages/runner/lib/telemetry.ts
Comment thread packages/runner/lib/server.ts Outdated
Comment thread packages/runner/lib/server.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10f374e195

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/runner/lib/server.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/shared/lib/seeders/plan.seeder.ts">

<violation number="1" location="packages/shared/lib/seeders/plan.seeder.ts:52">
P2: Test seeder defaults `export_runner_telemetry` to `true`, but the production default is `false`. This mismatch means tests using `getTestPlan()` exercise the enabled path while production plans start with telemetry off, potentially masking issues.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Fix all with cubic | Re-trigger cubic

Comment thread packages/runner/lib/server.ts Outdated
@superagent-security superagent-security Bot added pr:flagged PR flagged for review by security analysis. and removed pr:verified PR passed security analysis. labels May 27, 2026

@superagent-security superagent-security Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superagent found 1 security concern(s).

Comment thread packages/runner/lib/server.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f09ae45d1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/runner/lib/telemetry.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 2 files (changes from recent commits).

Tip: Review your code locally with the cubic CLI to iterate faster.

Fix all with cubic | Re-trigger cubic

Comment thread packages/runner/lib/app.ts Outdated
Comment thread packages/runner/lib/server.ts Outdated
@superagent-security superagent-security Bot added pr:verified PR passed security analysis. and removed pr:flagged PR flagged for review by security analysis. labels May 27, 2026
@ErickRDev ErickRDev force-pushed the erickr/NAN-5668/runner-persist-telemetry branch from 6a1c50c to 3e45487 Compare May 28, 2026 16:39
@superagent-security superagent-security Bot removed the contributor:verified Contributor passed trust analysis. label May 28, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e454877fa

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

async shutdown({ timeoutMs = 5_000 }: { timeoutMs?: number } = {}) {
if (!batcher) return Ok(undefined);

const res = await batcher.shutdown({ timeoutMs });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Retry final telemetry flushes before disposing recorder

When this shutdown runs at the end of a Lambda execution or during resource eviction, a transient persist failure makes Batcher.shutdown() return after the first failed flush() even though flush() has requeued the batch for retry (packages/utils/lib/batcher.ts lines 157-160). Because the recorder is then disposed/no longer scheduled, those queued telemetry events are lost instead of using the configured retry budget; keep flushing until the queue is empty, retries are exhausted, or the timeout is reached.

Useful? React with 👍 / 👎.

@ErickRDev ErickRDev requested a review from TBonnin May 28, 2026 17:58
@cubic-dev-ai

cubic-dev-ai Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 90637529d4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

logger.warning('Failed to release all locks', { reason: err });
}
await telemetryRecorder
.shutdown({ timeoutMs: 5000 })

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Bound telemetry shutdown while flush is in flight

When telemetry export is enabled and the persist telemetry POST stalls instead of failing quickly, this timeoutMs: 5000 does not actually bound runner completion: Batcher.shutdown() only checks the deadline before calling await this.flush(), and postRunnerTelemetry ultimately uses fetch without an abort timeout, so exec() can hang indefinitely after the user function has finished. Fresh evidence since the earlier timeout thread is that the current shutdown path now delegates to Batcher.shutdown() with no in-flight timeout/race around the flush.

Useful? React with 👍 / 👎.

@ErickRDev ErickRDev May 28, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point, but it won't be relevant until we turn on the flag that gates telemetry exporting, which we're unlikely to do until Monday. Anyways, I have a fix at the ready, will move it to the last PR that's based off of this one.

@ErickRDev ErickRDev added this pull request to the merge queue Jun 1, 2026
Merged via the queue into master with commit 9a98837 Jun 1, 2026
27 checks passed
@ErickRDev ErickRDev deleted the erickr/NAN-5668/runner-persist-telemetry branch June 1, 2026 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants