fix: consolidate waitUntil utility and protect all Slack fire-and-forget chains#2014
fix: consolidate waitUntil utility and protect all Slack fire-and-forget chains#2014amikofalvy merged 9 commits intomainfrom
Conversation
… tests Add a lazy-cached singleton utility for Vercel's waitUntil function. Consolidates 3 duplicate implementations into one shared location. - getWaitUntil(): returns waitUntil fn on Vercel, undefined elsewhere - Graceful degradation if @vercel/functions import fails - 6 unit tests covering all paths including edge cases Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ared utility Remove local getWaitUntil() implementations from TriggerService.ts, scheduledTriggers.ts, and createApp.ts. All now import from @inkeep/agents-core. Behavior is identical: waitUntil on Vercel, await fallback otherwise. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… chains Wrap all fire-and-forget promise chains in the Slack events handler with Vercel's waitUntil to prevent serverless function freeze from killing background work after the HTTP 200 is sent. Chains wrapped: handleAppMention, handleOpenAgentSelectorModal, modal_project_select IIFE, handleOpenFollowUpModal, handleMessageShortcut, handleModalSubmission, handleFollowUpSubmission. waitUntil is resolved once per request at the top of the handler. When unavailable (non-Vercel), fire-and-forget works naturally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The shared getWaitUntil utility uses dynamic import('@vercel/functions')
which resolves at runtime from the host app's node_modules. This type
declaration provides TypeScript resolution without adding a direct
dependency, following the existing @napi-rs/keyring pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: 8c10c2a The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
📝 Documentation Review Reviewed this PR for documentation impact. No documentation updates are needed. Reasoning: The
The existing Vercel deployment and Slack integration docs remain accurate since they focus on user-facing setup and usage rather than internal runtime behavior. |
There was a problem hiding this comment.
PR Review Summary
(4) Total Issues | Risk: Medium
🟠⚠️ Major (2) 🟠⚠️
🟠 1) packages/agents-work-apps/src/slack/services/commands/index.ts:463 Missing waitUntil coverage on /commands route fire-and-forget chains
files: packages/agents-work-apps/src/slack/routes/events.ts:51-90, packages/agents-work-apps/src/slack/services/commands/index.ts:463, packages/agents-work-apps/src/slack/services/commands/index.ts:626
Issue: The PR description states it "wrapped all 7 Slack fire-and-forget chains," but the /commands route handler (events.ts lines 51-90) calls handleCommand() which ultimately invokes executeAgentInBackground() (lines 463-467, 626-630) as fire-and-forget promises. Unlike the /events route which correctly acquires waitUntil at line 109 and wraps all background work, the /commands route has no waitUntil protection.
Why: On Vercel, when the HTTP response is sent (the empty {} returned to Slack), the serverless function can freeze before the background agent execution (30-second API calls) completes. This causes silent failures for /inkeep <question> and /inkeep run "agent" question slash commands — the exact class of problem this PR aims to solve.
Fix: Either:
- Add
waitUntilprotection at the/commandsroute level similar to/events:
app.post('/commands', async (c) => {
const waitUntil = await getWaitUntil();
// ... existing validation ...
const response = await handleCommand(payload, waitUntil); // pass waitUntil
return c.json(response);
});- Or wrap background promises in
handleQuestionCommandandhandleRunCommand:
const backgroundWork = executeAgentInBackground(...).catch(...);
if (waitUntil) waitUntil(backgroundWork);Refs:
- events.ts:109 — waitUntil acquisition pattern
- events.ts:226 — wrapping pattern
- commands/index.ts:463 — unprotected fire-and-forget
🟠 2) packages/agents-work-apps/src/slack/services/commands/index.ts:474-583 Missing flushTraces() in background agent execution
Issue: The executeAgentInBackground() function does not call flushTraces() in its error handling or finally block. All 7 fire-and-forget chains in the /events route consistently call .finally(() => flushTraces()) to ensure traces are exported before the function context may be frozen.
Why: If the serverless function freezes before natural trace export, spans from command-triggered agent executions may be lost, making debugging and observability unreliable for slash commands.
Fix: Add flushTraces() call at the end of the background execution:
async function executeAgentInBackground(...): Promise<void> {
try {
// ... existing code ...
} catch (error) {
logger.error({ error, slackUserId: payload.userId }, 'Background agent execution failed');
// ... existing error handling ...
} finally {
await flushTraces();
}
}Refs:
🟡 Minor (1) 🟡
Inline Comments:
- 🟡 Minor:
wait-until.ts:20-23Race condition in lazy singleton initialization under concurrent calls
💭 Consider (1) 💭
💭 1) .changeset/subsequent-orange-cricket.md Changeset message could be more specific
Issue: The message "Add shared getWaitUntil utility for Vercel serverless function lifetime extension" is accurate but could better convey the user-facing benefit.
Why: Consumers of the changelog benefit from understanding the reliability improvement, not just the implementation detail.
Fix: Consider: "Add shared getWaitUntil utility that ensures fire-and-forget operations complete reliably on Vercel serverless"
🚫 REQUEST CHANGES
Summary: The consolidation of getWaitUntil() into a shared utility is well-implemented with good test coverage and follows established patterns. However, the PR description claims to protect "all 7 Slack fire-and-forget chains" but the /commands route has similar unprotected fire-and-forget patterns (executeAgentInBackground) that were not addressed. This creates an inconsistency where /events handlers are protected but /commands handlers remain vulnerable to the same serverless freeze issue. Recommend extending waitUntil coverage to the commands handler before merging.
Discarded (5)
| Location | Issue | Reason Discarded |
|---|---|---|
wait-until.test.ts |
Missing test for concurrent calls | Overlaps with inline comment on race condition; implementation fix is more valuable than test coverage |
events.ts |
Slack handler changes not directly unit tested | LOW confidence; integration tests exist and testing implementation details adds limited value |
wait-until.test.ts:45 |
Test doesn't verify returned function works | Academic completeness only; the mock verifies correct function is returned |
wait-until.ts:33-39 |
Test-only reset function naming is novel | Valid observation but the pattern is reasonable and works well for this use case |
| Various INFO findings | Positive observations about pattern consistency | No action needed — these confirm the PR follows good practices |
Reviewers (5)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-standards |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
pr-review-sre |
4 | 2 | 0 | 0 | 1 | 0 | 1 |
pr-review-tests |
3 | 0 | 0 | 0 | 0 | 0 | 3 |
pr-review-consistency |
6 | 0 | 0 | 0 | 0 | 0 | 6 |
pr-review-devops |
5 | 0 | 1 | 0 | 0 | 0 | 4 |
| Total | 18 | 2 | 1 | 0 | 1 | 0 | 14 |
Note: pr-review-consistency and pr-review-sre both identified the commands handler gap; findings merged to avoid duplication.
| export async function getWaitUntil(): Promise<((promise: Promise<unknown>) => void) | undefined> { | ||
| if (_waitUntilResolved) return _waitUntil; | ||
| _waitUntilResolved = true; | ||
| if (!process.env.VERCEL) return undefined; |
There was a problem hiding this comment.
🟡 Minor: Potential race condition in lazy singleton initialization
Issue: The flag _waitUntilResolved is set to true before the async import completes. If two concurrent calls occur before the first import resolves, both will pass the if (_waitUntilResolved) check simultaneously, triggering duplicate imports.
Why: While functionally harmless (both resolve to the same module), this could cause duplicate warning logs on import failure and is a minor inefficiency under high concurrency.
Fix: Consider storing the import promise itself rather than a boolean flag:
let _importPromise: Promise<((promise: Promise<unknown>) => void) | undefined> | undefined;
export async function getWaitUntil(): Promise<((promise: Promise<unknown>) => void) | undefined> {
if (_importPromise) return _importPromise;
_importPromise = (async () => {
if (!process.env.VERCEL) return undefined;
try {
const mod = await import('@vercel/functions');
return mod.waitUntil;
} catch (e) {
logger.warn({ error: e }, 'Failed to import @vercel/functions, waitUntil unavailable');
return undefined;
}
})();
return _importPromise;
}Refs:
- Fix race condition in getWaitUntil by using promise-based singleton pattern (concurrent callers now share the same import promise) - Add waitUntil + flushTraces to both fire-and-forget chains in /commands route (handleQuestionCommand, handleRunCommand) - Ensures slash command agent execution completes on Vercel serverless Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addressing Review Feedback (7a2133b)All three findings have been addressed: 1. Missing waitUntil on /commands route (Major) — Fixed. Added 2. Missing flushTraces in executeAgentInBackground (Major) — Fixed. Added 3. Race condition in lazy singleton (Minor) — Fixed. Switched from boolean flag + mutable variable to a promise-based singleton. Concurrent callers now 4. Changeset message (Consider) — Declined. The current message accurately describes the technical change for package consumers. The changelog targets developers consuming |
There was a problem hiding this comment.
PR Review Summary
(0) Total Issues | Risk: Low
This is a re-review scoped to the delta since the last automated review (commit 10b34bc1).
✅ Prior Feedback Addressed
All three findings from the previous review have been addressed:
| Prior Finding | Resolution |
|---|---|
🟠 Major: Missing waitUntil on /commands route |
✅ Fixed — Added waitUntil wrapping to both handleQuestionCommand (lines 465-477) and handleRunCommand (lines 636-648) |
🟠 Major: Missing flushTraces() in command handlers |
✅ Fixed — Both command handlers now use .finally(() => flushTraces()) consistent with events.ts |
| 🟡 Minor: Race condition in lazy singleton | ✅ Fixed — Switched from boolean flag + mutable variable to promise-based singleton (_importPromise) that concurrent callers share |
| 💭 Consider: Changeset message | ⏭️ Declined — Author's reasoning is valid (changelog targets package consumers) |
Delta Verification
wait-until.ts (promise-based singleton):
- The promise is created and cached synchronously before any
awaitpoint - Concurrent callers return the same promise reference
- No race window possible — clean implementation
commands/index.ts (waitUntil + flushTraces):
- Pattern matches
events.ts:*Workvariable naming,.catch().finally(() => flushTraces()), conditionalif (waitUntil) waitUntil(work) - Minor structural difference (inline
getWaitUntil()vs. top-of-handler) is functionally equivalent due to singleton caching and matches patterns inTriggerService.ts
✅ APPROVE
Summary: The delta changes correctly address all prior review feedback. The promise-based singleton eliminates the race condition, and the command handlers now have consistent waitUntil + flushTraces() coverage matching the events handler pattern. Clean implementation, no new issues found. 🎉
Reviewers (3)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-standards |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
pr-review-sre |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
pr-review-consistency |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: This was a delta-scoped re-review. All reviewers confirmed the fixes are correctly implemented.
The package is dynamically imported by agents-core's getWaitUntil() at runtime. It must remain a dependency of agents-api so the import resolves, but knip can't trace the dynamic import through the dependency chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
PR Review Summary
(0) Total Issues | Risk: Low
This is a delta re-review scoped to changes since the last automated review (commit 7a2133b6).
Delta Analysis
Commit reviewed: 2d03674c — "fix(agents-api): ignore @vercel/functions in knip unused dep check"
| Aspect | Assessment |
|---|---|
| Configuration correctness | ✅ Correct — @vercel/functions is a legitimate runtime dependency |
| Comment clarity | ✅ Clear — explains both what and why |
| Pattern consistency | ✅ Matches existing knip ignore patterns in the monorepo |
Why this ignore is necessary:
@inkeep/agents-core'sgetWaitUntil()performs a dynamic import:await import('@vercel/functions')- Dynamic imports resolve from the host app's
node_modules, not the package declaring them - Therefore
agents-apimust have@vercel/functionsinstalled, but knip's static analysis can't see the transitive usage - The ignore with an explanatory comment is the correct solution
Prior Feedback Status
All findings from the initial review have been addressed:
| Prior Finding | Status |
|---|---|
🟠 Missing waitUntil on /commands route |
✅ Fixed in 7a2133b6 |
🟠 Missing flushTraces() in command handlers |
✅ Fixed in 7a2133b6 |
| 🟡 Race condition in lazy singleton | ✅ Fixed in 7a2133b6 |
| 💭 Changeset message | ⏭️ Declined (author's reasoning valid) |
✅ APPROVE
Summary: The delta correctly addresses a knip false-positive by adding @vercel/functions to the ignore list with a clear explanation. All prior review feedback has been addressed. Clean implementation, ready to merge! 🎉
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-devops |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: This was a delta-scoped re-review of a trivial tooling fix. The DevOps reviewer confirmed the knip configuration is correct.
|
No documentation updates needed for this PR. This is an internal infrastructure improvement that consolidates duplicate |
The dynamic import resolves at runtime from the host app (agents-api). Adding a knip config to agents-core to ignore this known pattern, matching the same approach used in agents-api/knip.config.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
PR Review Summary
(0) Total Issues | Risk: Low
This is a delta re-review scoped to changes since the last automated review (commit 2d03674c).
Delta Analysis
Commit reviewed: 8c10c2ae — "fix(agents-core): ignore @vercel/functions in knip unlisted dep check"
| Aspect | Assessment |
|---|---|
| Configuration correctness | ✅ Correct — @vercel/functions is correctly ignored as an unlisted dependency |
| Comment clarity | ✅ Clear — explains the dynamic import pattern and host-app resolution |
| Pattern consistency | ✅ Matches existing agents-api/knip.config.ts pattern |
| Script update | ✅ Correct — --config flag properly references the new config file |
Why this ignore is necessary:
agents-core'sgetWaitUntil()performs a dynamic import:await import('@vercel/functions')- Dynamic imports resolve from the host app's
node_modules, not the declaring package @vercel/functionsis intentionally NOT a dependency ofagents-core— it's a dependency ofagents-api- Knip's static analysis reports this as "unlisted dependency" — the ignore is the correct solution
Prior Feedback Status
All findings from the initial review were addressed in 7a2133b6:
| Prior Finding | Status |
|---|---|
🟠 Missing waitUntil on /commands route |
✅ Fixed |
🟠 Missing flushTraces() in command handlers |
✅ Fixed |
| 🟡 Race condition in lazy singleton | ✅ Fixed |
| 💭 Changeset message | ⏭️ Declined (author's reasoning valid) |
✅ APPROVE
Summary: The delta correctly fixes a knip false-positive by adding @vercel/functions to the ignore list in agents-core, completing the pair with the existing agents-api ignore added in 2d03674c. All prior review feedback has been addressed. Clean implementation — ship it! 🚀
Reviewers (0)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: This was a trivial delta (tooling config only) — no sub-reviewers dispatched.
Summary
getWaitUntil()utility inagents-core— consolidates 3 duplicate implementations into a single promise-based lazy singleton that dynamically imports@vercel/functionsat runtimewaitUntil— prevents Vercel serverless functions from freezing before background work (agent execution, modal handling, trace flushing) completes/eventsroute handler/commandsroute handler (handleQuestionCommand,handleRunCommand)createApp.ts,TriggerService.ts, andscheduledTriggers.tswith the shared utilityContext
On Vercel's serverless runtime, functions can freeze/terminate immediately after the HTTP response is sent. The Slack events handler returns HTTP 200 immediately (within 3 seconds per Slack's requirement) then processes agent execution asynchronously. Without
waitUntil, these fire-and-forget promises have no guarantee of completion.The codebase had 3 independent
getWaitUntil()implementations — this PR consolidates them and extends coverage to all 9 unprotected Slack handler chains.Changes
agents-coregetWaitUntil()utility with 6 unit testsagents-apiagents-work-apps/eventsfire-and-forget chainsagents-core@vercel/functionsagents-core/commandsroute protectionDesign decisions
node_modules, no direct dependency inagents-core@napi-rs/keyringpatternundefinedin non-Vercel environmentsTest plan
Generated with Claude Code