Skip to content

fix: consolidate waitUntil utility and protect all Slack fire-and-forget chains#2014

Merged
amikofalvy merged 9 commits intomainfrom
fix/waituntil-consolidation-slack
Feb 14, 2026
Merged

fix: consolidate waitUntil utility and protect all Slack fire-and-forget chains#2014
amikofalvy merged 9 commits intomainfrom
fix/waituntil-consolidation-slack

Conversation

@nick-inkeep
Copy link
Collaborator

@nick-inkeep nick-inkeep commented Feb 14, 2026

Summary

  • Shared getWaitUntil() utility in agents-core — consolidates 3 duplicate implementations into a single promise-based lazy singleton that dynamically imports @vercel/functions at runtime
  • Wrapped all 9 Slack fire-and-forget chains with waitUntil — prevents Vercel serverless functions from freezing before background work (agent execution, modal handling, trace flushing) completes
    • 7 chains in /events route handler
    • 2 chains in /commands route handler (handleQuestionCommand, handleRunCommand)
  • Replaced duplicate implementations in createApp.ts, TriggerService.ts, and scheduledTriggers.ts with the shared utility

Context

On Vercel's serverless runtime, functions can freeze/terminate immediately after the HTTP response is sent. The Slack events handler returns HTTP 200 immediately (within 3 seconds per Slack's requirement) then processes agent execution asynchronously. Without waitUntil, these fire-and-forget promises have no guarantee of completion.

The codebase had 3 independent getWaitUntil() implementations — this PR consolidates them and extends coverage to all 9 unprotected Slack handler chains.

Changes

Commit Scope Description
WU-001 agents-core New getWaitUntil() utility with 6 unit tests
WU-002 agents-api Replace 3 duplicate implementations with shared import
WU-003 agents-work-apps Wrap all 7 /events fire-and-forget chains
Type fix agents-core Ambient type declaration for @vercel/functions
Changeset agents-core Patch bump for new utility
Review fix both Promise-based singleton + /commands route protection

Design decisions

  • Promise-based singleton — concurrent callers share the same import promise, avoiding race conditions
  • Dynamic import — resolves at runtime from host app's node_modules, no direct dependency in agents-core
  • Ambient type declaration — follows existing @napi-rs/keyring pattern
  • Graceful degradation — returns undefined in non-Vercel environments
  • Non-breaking — purely additive; no changes to existing behavior, APIs, or data contracts

Test plan

  • 6 unit tests covering all code paths
  • typecheck, build, lint, format all pass for agents-core
  • CI/CD pipeline validation

Generated with Claude Code

nick-inkeep and others added 6 commits February 13, 2026 18:49
… tests

Add a lazy-cached singleton utility for Vercel's waitUntil function.
Consolidates 3 duplicate implementations into one shared location.

- getWaitUntil(): returns waitUntil fn on Vercel, undefined elsewhere
- Graceful degradation if @vercel/functions import fails
- 6 unit tests covering all paths including edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ared utility

Remove local getWaitUntil() implementations from TriggerService.ts,
scheduledTriggers.ts, and createApp.ts. All now import from @inkeep/agents-core.

Behavior is identical: waitUntil on Vercel, await fallback otherwise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… chains

Wrap all fire-and-forget promise chains in the Slack events handler with
Vercel's waitUntil to prevent serverless function freeze from killing
background work after the HTTP 200 is sent.

Chains wrapped: handleAppMention, handleOpenAgentSelectorModal,
modal_project_select IIFE, handleOpenFollowUpModal, handleMessageShortcut,
handleModalSubmission, handleFollowUpSubmission.

waitUntil is resolved once per request at the top of the handler.
When unavailable (non-Vercel), fire-and-forget works naturally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The shared getWaitUntil utility uses dynamic import('@vercel/functions')
which resolves at runtime from the host app's node_modules. This type
declaration provides TypeScript resolution without adding a direct
dependency, following the existing @napi-rs/keyring pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@changeset-bot
Copy link

changeset-bot bot commented Feb 14, 2026

🦋 Changeset detected

Latest commit: 8c10c2a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-core Patch
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch
@inkeep/agents-manage-mcp Patch
@inkeep/agents-mcp Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link

vercel bot commented Feb 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Feb 14, 2026 3:53am
agents-docs Ready Ready Preview, Comment Feb 14, 2026 3:53am
agents-manage-ui Ready Ready Preview, Comment Feb 14, 2026 3:53am

Request Review

@inkeep
Copy link
Contributor

inkeep bot commented Feb 14, 2026

📝 Documentation Review

Reviewed this PR for documentation impact. No documentation updates are needed.

Reasoning: The waitUntil consolidation and Slack fire-and-forget protection are internal implementation details that:

  • Work automatically without user configuration
  • Gracefully degrade in non-Vercel environments
  • Don't change any user-facing APIs or behavior

The existing Vercel deployment and Slack integration docs remain accurate since they focus on user-facing setup and usage rather than internal runtime behavior.

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(4) Total Issues | Risk: Medium

🟠⚠️ Major (2) 🟠⚠️

🟠 1) packages/agents-work-apps/src/slack/services/commands/index.ts:463 Missing waitUntil coverage on /commands route fire-and-forget chains

files: packages/agents-work-apps/src/slack/routes/events.ts:51-90, packages/agents-work-apps/src/slack/services/commands/index.ts:463, packages/agents-work-apps/src/slack/services/commands/index.ts:626

Issue: The PR description states it "wrapped all 7 Slack fire-and-forget chains," but the /commands route handler (events.ts lines 51-90) calls handleCommand() which ultimately invokes executeAgentInBackground() (lines 463-467, 626-630) as fire-and-forget promises. Unlike the /events route which correctly acquires waitUntil at line 109 and wraps all background work, the /commands route has no waitUntil protection.

Why: On Vercel, when the HTTP response is sent (the empty {} returned to Slack), the serverless function can freeze before the background agent execution (30-second API calls) completes. This causes silent failures for /inkeep <question> and /inkeep run "agent" question slash commands — the exact class of problem this PR aims to solve.

Fix: Either:

  1. Add waitUntil protection at the /commands route level similar to /events:
app.post('/commands', async (c) => {
  const waitUntil = await getWaitUntil();
  // ... existing validation ...
  const response = await handleCommand(payload, waitUntil); // pass waitUntil
  return c.json(response);
});
  1. Or wrap background promises in handleQuestionCommand and handleRunCommand:
const backgroundWork = executeAgentInBackground(...).catch(...);
if (waitUntil) waitUntil(backgroundWork);

Refs:


🟠 2) packages/agents-work-apps/src/slack/services/commands/index.ts:474-583 Missing flushTraces() in background agent execution

Issue: The executeAgentInBackground() function does not call flushTraces() in its error handling or finally block. All 7 fire-and-forget chains in the /events route consistently call .finally(() => flushTraces()) to ensure traces are exported before the function context may be frozen.

Why: If the serverless function freezes before natural trace export, spans from command-triggered agent executions may be lost, making debugging and observability unreliable for slash commands.

Fix: Add flushTraces() call at the end of the background execution:

async function executeAgentInBackground(...): Promise<void> {
  try {
    // ... existing code ...
  } catch (error) {
    logger.error({ error, slackUserId: payload.userId }, 'Background agent execution failed');
    // ... existing error handling ...
  } finally {
    await flushTraces();
  }
}

Refs:

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: wait-until.ts:20-23 Race condition in lazy singleton initialization under concurrent calls

💭 Consider (1) 💭

💭 1) .changeset/subsequent-orange-cricket.md Changeset message could be more specific

Issue: The message "Add shared getWaitUntil utility for Vercel serverless function lifetime extension" is accurate but could better convey the user-facing benefit.

Why: Consumers of the changelog benefit from understanding the reliability improvement, not just the implementation detail.

Fix: Consider: "Add shared getWaitUntil utility that ensures fire-and-forget operations complete reliably on Vercel serverless"


🚫 REQUEST CHANGES

Summary: The consolidation of getWaitUntil() into a shared utility is well-implemented with good test coverage and follows established patterns. However, the PR description claims to protect "all 7 Slack fire-and-forget chains" but the /commands route has similar unprotected fire-and-forget patterns (executeAgentInBackground) that were not addressed. This creates an inconsistency where /events handlers are protected but /commands handlers remain vulnerable to the same serverless freeze issue. Recommend extending waitUntil coverage to the commands handler before merging.

Discarded (5)
Location Issue Reason Discarded
wait-until.test.ts Missing test for concurrent calls Overlaps with inline comment on race condition; implementation fix is more valuable than test coverage
events.ts Slack handler changes not directly unit tested LOW confidence; integration tests exist and testing implementation details adds limited value
wait-until.test.ts:45 Test doesn't verify returned function works Academic completeness only; the mock verifies correct function is returned
wait-until.ts:33-39 Test-only reset function naming is novel Valid observation but the pattern is reasonable and works well for this use case
Various INFO findings Positive observations about pattern consistency No action needed — these confirm the PR follows good practices
Reviewers (5)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-sre 4 2 0 0 1 0 1
pr-review-tests 3 0 0 0 0 0 3
pr-review-consistency 6 0 0 0 0 0 6
pr-review-devops 5 0 1 0 0 0 4
Total 18 2 1 0 1 0 14

Note: pr-review-consistency and pr-review-sre both identified the commands handler gap; findings merged to avoid duplication.

Comment on lines 20 to 23
export async function getWaitUntil(): Promise<((promise: Promise<unknown>) => void) | undefined> {
if (_waitUntilResolved) return _waitUntil;
_waitUntilResolved = true;
if (!process.env.VERCEL) return undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Potential race condition in lazy singleton initialization

Issue: The flag _waitUntilResolved is set to true before the async import completes. If two concurrent calls occur before the first import resolves, both will pass the if (_waitUntilResolved) check simultaneously, triggering duplicate imports.

Why: While functionally harmless (both resolve to the same module), this could cause duplicate warning logs on import failure and is a minor inefficiency under high concurrency.

Fix: Consider storing the import promise itself rather than a boolean flag:

let _importPromise: Promise<((promise: Promise<unknown>) => void) | undefined> | undefined;

export async function getWaitUntil(): Promise<((promise: Promise<unknown>) => void) | undefined> {
  if (_importPromise) return _importPromise;
  _importPromise = (async () => {
    if (!process.env.VERCEL) return undefined;
    try {
      const mod = await import('@vercel/functions');
      return mod.waitUntil;
    } catch (e) {
      logger.warn({ error: e }, 'Failed to import @vercel/functions, waitUntil unavailable');
      return undefined;
    }
  })();
  return _importPromise;
}

Refs:

@github-actions github-actions bot deleted a comment from claude bot Feb 14, 2026
- Fix race condition in getWaitUntil by using promise-based singleton
  pattern (concurrent callers now share the same import promise)
- Add waitUntil + flushTraces to both fire-and-forget chains in
  /commands route (handleQuestionCommand, handleRunCommand)
- Ensures slash command agent execution completes on Vercel serverless

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nick-inkeep
Copy link
Collaborator Author

Addressing Review Feedback (7a2133b)

All three findings have been addressed:

1. Missing waitUntil on /commands route (Major) — Fixed. Added waitUntil + flushTraces() wrapping to both fire-and-forget chains in handleQuestionCommand and handleRunCommand. The commands handler now uses the same protection pattern as the events handler.

2. Missing flushTraces in executeAgentInBackground (Major) — Fixed. Added .finally(() => flushTraces()) to both command fire-and-forget chains, consistent with the events handler pattern.

3. Race condition in lazy singleton (Minor) — Fixed. Switched from boolean flag + mutable variable to a promise-based singleton. Concurrent callers now await the same import promise, eliminating the window where a second caller could get undefined during the await import().

4. Changeset message (Consider) — Declined. The current message accurately describes the technical change for package consumers. The changelog targets developers consuming agents-core, who care about what was added, not the product-level benefit.

@nick-inkeep nick-inkeep changed the title fix: consolidate waitUntil utility and protect Slack fire-and-forget chains fix: consolidate waitUntil utility and protect all Slack fire-and-forget chains Feb 14, 2026
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a re-review scoped to the delta since the last automated review (commit 10b34bc1).

✅ Prior Feedback Addressed

All three findings from the previous review have been addressed:

Prior Finding Resolution
🟠 Major: Missing waitUntil on /commands route ✅ Fixed — Added waitUntil wrapping to both handleQuestionCommand (lines 465-477) and handleRunCommand (lines 636-648)
🟠 Major: Missing flushTraces() in command handlers ✅ Fixed — Both command handlers now use .finally(() => flushTraces()) consistent with events.ts
🟡 Minor: Race condition in lazy singleton ✅ Fixed — Switched from boolean flag + mutable variable to promise-based singleton (_importPromise) that concurrent callers share
💭 Consider: Changeset message ⏭️ Declined — Author's reasoning is valid (changelog targets package consumers)

Delta Verification

wait-until.ts (promise-based singleton):

  • The promise is created and cached synchronously before any await point
  • Concurrent callers return the same promise reference
  • No race window possible — clean implementation

commands/index.ts (waitUntil + flushTraces):

  • Pattern matches events.ts: *Work variable naming, .catch().finally(() => flushTraces()), conditional if (waitUntil) waitUntil(work)
  • Minor structural difference (inline getWaitUntil() vs. top-of-handler) is functionally equivalent due to singleton caching and matches patterns in TriggerService.ts

✅ APPROVE

Summary: The delta changes correctly address all prior review feedback. The promise-based singleton eliminates the race condition, and the command handlers now have consistent waitUntil + flushTraces() coverage matching the events handler pattern. Clean implementation, no new issues found. 🎉

Reviewers (3)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-sre 0 0 0 0 0 0 0
pr-review-consistency 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0

Note: This was a delta-scoped re-review. All reviewers confirmed the fixes are correctly implemented.

@github-actions github-actions bot deleted a comment from claude bot Feb 14, 2026
The package is dynamically imported by agents-core's getWaitUntil()
at runtime. It must remain a dependency of agents-api so the import
resolves, but knip can't trace the dynamic import through the
dependency chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a delta re-review scoped to changes since the last automated review (commit 7a2133b6).

Delta Analysis

Commit reviewed: 2d03674c — "fix(agents-api): ignore @vercel/functions in knip unused dep check"

Aspect Assessment
Configuration correctness ✅ Correct — @vercel/functions is a legitimate runtime dependency
Comment clarity ✅ Clear — explains both what and why
Pattern consistency ✅ Matches existing knip ignore patterns in the monorepo

Why this ignore is necessary:

  • @inkeep/agents-core's getWaitUntil() performs a dynamic import: await import('@vercel/functions')
  • Dynamic imports resolve from the host app's node_modules, not the package declaring them
  • Therefore agents-api must have @vercel/functions installed, but knip's static analysis can't see the transitive usage
  • The ignore with an explanatory comment is the correct solution

Prior Feedback Status

All findings from the initial review have been addressed:

Prior Finding Status
🟠 Missing waitUntil on /commands route ✅ Fixed in 7a2133b6
🟠 Missing flushTraces() in command handlers ✅ Fixed in 7a2133b6
🟡 Race condition in lazy singleton ✅ Fixed in 7a2133b6
💭 Changeset message ⏭️ Declined (author's reasoning valid)

✅ APPROVE

Summary: The delta correctly addresses a knip false-positive by adding @vercel/functions to the ignore list with a clear explanation. All prior review feedback has been addressed. Clean implementation, ready to merge! 🎉

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-devops 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0

Note: This was a delta-scoped re-review of a trivial tooling fix. The DevOps reviewer confirmed the knip configuration is correct.

@github-actions github-actions bot deleted a comment from claude bot Feb 14, 2026
@inkeep
Copy link
Contributor

inkeep bot commented Feb 14, 2026

No documentation updates needed for this PR.

This is an internal infrastructure improvement that consolidates duplicate getWaitUntil() implementations and adds reliability for Vercel serverless deployments. The changes are transparent to users and don't affect any public APIs or user-facing behavior.

The dynamic import resolves at runtime from the host app (agents-api).
Adding a knip config to agents-core to ignore this known pattern,
matching the same approach used in agents-api/knip.config.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a delta re-review scoped to changes since the last automated review (commit 2d03674c).

Delta Analysis

Commit reviewed: 8c10c2ae — "fix(agents-core): ignore @vercel/functions in knip unlisted dep check"

Aspect Assessment
Configuration correctness ✅ Correct — @vercel/functions is correctly ignored as an unlisted dependency
Comment clarity ✅ Clear — explains the dynamic import pattern and host-app resolution
Pattern consistency ✅ Matches existing agents-api/knip.config.ts pattern
Script update ✅ Correct — --config flag properly references the new config file

Why this ignore is necessary:

  • agents-core's getWaitUntil() performs a dynamic import: await import('@vercel/functions')
  • Dynamic imports resolve from the host app's node_modules, not the declaring package
  • @vercel/functions is intentionally NOT a dependency of agents-core — it's a dependency of agents-api
  • Knip's static analysis reports this as "unlisted dependency" — the ignore is the correct solution

Prior Feedback Status

All findings from the initial review were addressed in 7a2133b6:

Prior Finding Status
🟠 Missing waitUntil on /commands route ✅ Fixed
🟠 Missing flushTraces() in command handlers ✅ Fixed
🟡 Race condition in lazy singleton ✅ Fixed
💭 Changeset message ⏭️ Declined (author's reasoning valid)

✅ APPROVE

Summary: The delta correctly fixes a knip false-positive by adding @vercel/functions to the ignore list in agents-core, completing the pair with the existing agents-api ignore added in 2d03674c. All prior review feedback has been addressed. Clean implementation — ship it! 🚀

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Total 0 0 0 0 0 0 0

Note: This was a trivial delta (tooling config only) — no sub-reviewers dispatched.

@amikofalvy amikofalvy merged commit 79dffed into main Feb 14, 2026
15 of 17 checks passed
@amikofalvy amikofalvy deleted the fix/waituntil-consolidation-slack branch February 14, 2026 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants