TKAI-2: add session tracing propagation #1
this.tracer = new SimpleTracer({
  serviceName: 'valet-worker',
  endpoint: this.env.OTEL_EXPORTER_OTLP_ENDPOINT,
  headers: this.env.OTEL_EXPORTER_OTLP_HEADERS,
I'm not sure why we need these?
These are the endpoint and auth headers for the backend that receives the telemetry. There's no intermediate Grafana agent until this is running in k8s, so the Worker exports directly.
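For context on the headers value: the OpenTelemetry exporter convention for OTEL_EXPORTER_OTLP_HEADERS is a comma-separated list of key=value pairs with percent-encoded values. Whether SimpleTracer takes the raw string or a parsed map isn't shown here, so the helper below (parseOtlpHeaders, a hypothetical name) is only a sketch of how that string could be turned into a headers record for fetch. The credentials in the example are placeholders.

```typescript
// Hypothetical helper: converts an OTEL_EXPORTER_OTLP_HEADERS-style string
// ("key1=value1,key2=value2", values percent-encoded) into a headers record.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    // Split on the FIRST "=" only, so base64 padding in values survives.
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip empty or malformed entries
    const key = pair.slice(0, eq).trim();
    let value = pair.slice(eq + 1).trim();
    try {
      value = decodeURIComponent(value);
    } catch {
      // Keep the raw value if the percent-encoding is malformed.
    }
    if (key) headers[key] = value;
  }
  return headers;
}

// Example with placeholder credentials ("%20" decodes to a space).
const exampleHeaders = parseOtlpHeaders(
  "Authorization=Basic%20dXNlcjp0b2tlbg==,X-Tenant=tenant-1",
);
```

The result can be passed straight to `fetch(endpoint, { method: "POST", headers, body })`.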
private startPromptDispatchSpan(attrs: SpanAttributes): SimpleSpan {
  const queuedAt = this.promptQueue.promptReceivedAt;
  const waitMs = queuedAt > 0 ? Date.now() - queuedAt : 0;
Every call site does span.end() followed by this.flushTracing(), which fires a separate ctx.waitUntil(tracer.flush()), and each flush() POSTs all accumulated spans to the OTLP endpoint. Over a session lifecycle (spawn → dispatch → dispatch → hibernate → restore → dispatch…), this generates many small HTTP requests to Grafana Cloud.
Consider batching: either flush on a timer or threshold inside SimpleTracer (e.g. every 5s or every 50 spans), or consolidate flush calls at session-level boundaries instead of per span. The Runner side already batches naturally, because tool/LLM spans accumulate and are only flushed in the turn's finally block; the Worker should do the same.
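A minimal sketch of the batching policy suggested above, assuming a count threshold of 50 spans and a 5 s max age (the numbers from the comment). FlushPolicy, shouldFlush and countExportRequests are hypothetical names, not SimpleTracer's API. The simulation checks the policy each time a span ends and does one final flush at a session boundary.

```typescript
// Hypothetical policy: flush when the queue reaches a span-count threshold
// OR the oldest queued span has waited at least maxAgeMs.
interface FlushPolicy {
  maxQueuedSpans: number;
  maxAgeMs: number;
}

const DEFAULT_POLICY: FlushPolicy = { maxQueuedSpans: 50, maxAgeMs: 5_000 };

function shouldFlush(
  queuedSpans: number,
  oldestEnqueuedAtMs: number | undefined,
  nowMs: number,
  policy: FlushPolicy = DEFAULT_POLICY,
): boolean {
  if (queuedSpans === 0 || oldestEnqueuedAtMs === undefined) return false;
  if (queuedSpans >= policy.maxQueuedSpans) return true;
  return nowMs - oldestEnqueuedAtMs >= policy.maxAgeMs;
}

// Counts OTLP POSTs for a sequence of span end times: the policy is checked on
// every enqueue, and whatever remains is sent by one boundary flush at the end.
function countExportRequests(
  spanEndTimesMs: number[],
  policy: FlushPolicy = DEFAULT_POLICY,
): number {
  let requests = 0;
  let queued = 0;
  let oldest: number | undefined;
  for (const t of spanEndTimesMs) {
    if (queued === 0) oldest = t;
    queued += 1;
    if (shouldFlush(queued, oldest, t, policy)) {
      requests += 1;
      queued = 0;
      oldest = undefined;
    }
  }
  if (queued > 0) requests += 1; // session-boundary flush (e.g. hibernate)
  return requests;
}
```

With 120 spans ended 10 ms apart, this yields 3 POSTs instead of the 120 a per-span flush would make. Because the age check only runs when a span is enqueued, a quiet buffer is never flushed by the policy alone, which is why explicit flushes at session boundaries (or a real timer) are still needed.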
…-flush msToUnixNano multiplied a JS number by 1_000_000, exceeding MAX_SAFE_INTEGER for present-day epoch milliseconds and silently rounding span timestamps. Adds maxQueuedSpans + scheduleFlush options so a hot tracer can auto-flush in batches once a buffer threshold is hit, with a host hook for ctx.waitUntil. Addresses review feedback on #1 and yourbuddyconner#45.
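To make the precision bug concrete: Date.now() is currently around 1.7e12, so multiplying by 1_000_000 gives about 1.7e18, well above Number.MAX_SAFE_INTEGER (2^53 − 1, about 9.007e15). The sketch below is one possible fix, not necessarily the one in the commit: scale in BigInt and return a decimal string, which is how the protobuf JSON mapping represents 64-bit integers such as OTLP's timeUnixNano. Both function names are hypothetical.

```typescript
// Naive conversion (the bug): the product exceeds Number.MAX_SAFE_INTEGER for
// present-day epoch milliseconds, so low-order digits are silently rounded.
function msToUnixNanoNaive(ms: number): number {
  return ms * 1_000_000;
}

// Hypothetical fix: do the scaling in BigInt and return a decimal string.
// Assumes a non-negative, finite timestamp; any sub-millisecond fraction is
// rounded to the nearest nanosecond (limited by double precision of the input).
function msToUnixNanoSafe(ms: number): string {
  if (!Number.isFinite(ms) || ms < 0) {
    throw new RangeError(`expected a non-negative finite timestamp, got ${ms}`);
  }
  const wholeMs = Math.trunc(ms);
  const fracNs = Math.round((ms - wholeMs) * 1_000_000);
  return (BigInt(wholeMs) * BigInt(1_000_000) + BigInt(fracNs)).toString();
}
```

Usage: `msToUnixNanoSafe(Date.now())` can be placed directly in an OTLP/JSON span's startTimeUnixNano or endTimeUnixNano field.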
The prompt handler signature had grown to 14 positional args (messageId, content, model, author, modelPreferences, attachments, channelType, channelId, opencodeSessionId, continuationContext, threadId, replyChannelType, replyChannelId, traceparent), making call sites and the wire shape unreviewable. Introduces PromptDispatch / PromptHandlerFn so onPrompt and handlePrompt take a single typed object. Updates the agent-client dispatcher, PromptHandler.handlePrompt, the bin.ts callback, and the prompt unit tests to use the new shape. Addresses figitaki review on yourbuddyconner#45.
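A sketch of what the single-object signature might look like. The field names are taken from the commit message; their types and optionality are assumptions. Since the PR is about trace propagation, the example handler also parses the traceparent field using the W3C Trace Context format (version 00 only); parseTraceparent and the recording handler are illustrative, not the code in the PR.

```typescript
// Hypothetical shape: field names from the commit message, types assumed.
interface PromptDispatch {
  messageId: string;
  content: string;
  model?: string;
  author?: unknown;
  modelPreferences?: unknown;
  attachments?: unknown[];
  channelType?: string;
  channelId?: string;
  opencodeSessionId?: string;
  continuationContext?: unknown;
  threadId?: string;
  replyChannelType?: string;
  replyChannelId?: string;
  traceparent?: string; // W3C Trace Context header value
}

type PromptHandlerFn = (dispatch: PromptDispatch) => Promise<void>;

// Parses "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>". Returns
// undefined for missing or malformed values and for all-zero IDs (invalid per spec).
function parseTraceparent(
  value: string | undefined,
): { traceId: string; parentSpanId: string; sampled: boolean } | undefined {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(value ?? "");
  if (!m) return undefined;
  const [, traceId, parentSpanId, flags] = m;
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return undefined;
  return { traceId, parentSpanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Illustrative handler: destructures only the fields it needs, by name,
// instead of depending on the position of 14 arguments.
const received: Array<{ messageId: string; traceId?: string }> = [];
const handlePrompt: PromptHandlerFn = async ({ messageId, traceparent }) => {
  received.push({ messageId, traceId: parseTraceparent(traceparent)?.traceId });
};
```

A call site then reads as `handlePrompt({ messageId, content, traceparent })`, so adding a field no longer shifts every positional argument after it.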
Previously, every span ended on a session-agent dispatch path was followed by an inline flushTracing() call, which fanned out to a separate ctx.waitUntil(flush()) per span and POSTed the whole buffer to OTLP each time. Across spawn → dispatch → dispatch → hibernate → restore lifecycles, this generated many small HTTP requests against Grafana Cloud. Configures SimpleTracer with maxQueuedSpans=50 and a scheduleFlush hook that registers auto-flush promises with ctx.waitUntil. Drops the per-span flushTracing() calls at dispatch sites; explicit flushes remain at session-level boundaries (hibernate, terminate, wake guard, and the child-session error finally blocks), where the DO may go idle before the threshold is hit. Addresses f3nry review on #1.
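A minimal stand-in for the configuration described above, assuming a host-provided waitUntil and an export callback that performs one OTLP POST per batch. BatchingTracer, exportBatch and createSessionTracer are hypothetical; only the option names maxQueuedSpans and scheduleFlush come from the commit, and SimpleTracer's real interface may differ.

```typescript
// Hypothetical span record; a real tracer would carry ids, attributes, etc.
interface FinishedSpan {
  name: string;
  endTimeUnixNano: string;
}

interface BatchingTracerOptions {
  maxQueuedSpans: number;
  exportBatch: (spans: FinishedSpan[]) => Promise<void>; // e.g. one OTLP POST
  scheduleFlush?: (flush: Promise<void>) => void; // host hook, e.g. ctx.waitUntil
}

class BatchingTracer {
  private queue: FinishedSpan[] = [];
  private readonly opts: BatchingTracerOptions;

  constructor(opts: BatchingTracerOptions) {
    this.opts = opts;
  }

  endSpan(span: FinishedSpan): void {
    this.queue.push(span);
    if (this.queue.length >= this.opts.maxQueuedSpans) {
      const pending = this.flush();
      // Hand the promise to the host so the runtime keeps the request alive;
      // without a hook, at least swallow errors to avoid unhandled rejections.
      if (this.opts.scheduleFlush) this.opts.scheduleFlush(pending);
      else pending.catch(() => {});
    }
  }

  // Explicit flush for session-level boundaries (hibernate, terminate, ...).
  flush(): Promise<void> {
    if (this.queue.length === 0) return Promise.resolve();
    const batch = this.queue;
    this.queue = [];
    return this.opts.exportBatch(batch);
  }
}

// Wiring, assuming `ctx` exposes waitUntil(promise) as in the commit message.
type WaitUntilCtx = { waitUntil(promise: Promise<unknown>): void };

function createSessionTracer(
  ctx: WaitUntilCtx,
  exportBatch: (spans: FinishedSpan[]) => Promise<void>,
): BatchingTracer {
  return new BatchingTracer({
    maxQueuedSpans: 50,
    exportBatch,
    scheduleFlush: (flush) => ctx.waitUntil(flush),
  });
}
```

Ending 120 dispatch spans triggers two threshold flushes of 50, each registered with waitUntil; the remaining 20 are only sent when a boundary such as hibernate calls flush() explicitly.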
Summary
Test plan
Need to set.
Verification
pnpm --filter @valet/shared typecheck
pnpm --filter @valet/runner typecheck
pnpm --filter @valet/worker typecheck
pnpm --filter @valet/runner test -- src/prompt.test.ts
pnpm --filter @valet/worker exec vitest run src/durable-objects/prompt-queue.test.ts src/durable-objects/runner-link.test.ts

Refs TKAI-2