fix: prevent task goroutine context from being canceled by HTTP request lifecycle#900
fix: prevent task goroutine context from being canceled by HTTP request lifecycle#900xlyoung wants to merge 1 commit into
Conversation
…st lifecycle Fixes mark3labs#897 When a tool registered with WithTaskSupport(TaskSupportOptional) or AddTaskTool is called asynchronously via streamable HTTP, the task context was derived from the HTTP request context (r.Context()). After the HTTP handler returned the CreateTaskResult response, Go's HTTP server canceled r.Context(), which cascaded into the task context and killed the handler goroutine. Fix: Use context.WithoutCancel(ctx) to detach from the HTTP request lifecycle while preserving context values (session metadata, etc.). The task context is now only canceled by: - The task's own timeout/TTL - Explicit tasks/cancel from the client - Server shutdown Both executeTaskTool and executeRegularToolAsTask are fixed. Updated the corresponding test to verify internal handler cancellation (timeout-based) rather than relying on parent context cancellation, which is no longer propagated to task goroutines.
|
Connected to Huly®: MCP_G-464 |
WalkthroughTask goroutines for asynchronous tool execution are now detached from HTTP request cancellation using ChangesTask context detachment from HTTP lifecycle
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/server.go`:
- Around line 2023-2027: The task context is currently created with
context.WithCancel(context.WithoutCancel(ctx)) which removes parent deadlines so
TTL cleanup only deletes bookkeeping; fix by making the TTL actually cancel
execution: in executeTaskTool / executeRegularToolAsTask create taskCtx from a
non-request parent (e.g., context.Background()) and apply the TTL via
context.WithTimeout or context.WithDeadline to produce a cancellable context
assigned to taskCtx (or ensure scheduleTaskCleanup calls entry.cancelFunc() /
cancelTask before deleting s.tasks); update scheduleTaskCleanup to invoke
entry.cancelFunc() when TTL expires (then remove bookkeeping) so background
goroutines are actually stopped.
In `@server/task_tool_test.go`:
- Around line 533-543: The test "task tool handler returns context.Canceled
before tasks/cancel called" no longer verifies the `#897` regression because the
handler cancels an inner context (innerCtx) while the parent/request context is
never cancelled; add an additional subcase that explicitly cancels the
parent/request context immediately after calling handleToolCall (simulate parent
cancellation), then assert the task is not marked cancelled/failed until the
handler's own cancellation path runs (i.e., the task remains alive until
innerCtx times out); apply the same change to the similar test block around
lines 577-601 so both scenarios (parent-cancel and handler-internal-cancel) are
covered using the existing test helpers and the same assertions on task state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: cbed8717-457f-4f6c-8212-4bc266995e49
📒 Files selected for processing (2)
server/server.goserver/task_tool_test.go
| // Create cancellable context for this task execution. | ||
| // Use WithoutCancel to detach from the HTTP request lifecycle so the task | ||
| // continues running after the HTTP handler returns. Context values (session | ||
| // metadata, etc.) are preserved. | ||
| taskCtx, cancel := context.WithCancel(context.WithoutCancel(ctx)) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Go's standard library, does context.WithoutCancel preserve the parent context's deadline or Done channel?
💡 Result:
In Go's standard library, context.WithoutCancel does not preserve the parent context's deadline, and its Done channel is nil [1][2][3]. According to the official documentation, context.WithoutCancel returns a derived context that points to the parent context but is explicitly designed to not be canceled when the parent is canceled [1][3]. Specifically, the returned context has the following properties: - Deadline: It returns no deadline [1][2]. - Done channel: Its Done channel is nil [1][2]. - Err: It returns no error [1][2]. - Cause: Calling Cause on it returns nil [1][2]. While it does not carry over the cancellation or deadline signals from the parent, it does maintain the ability to access values stored in the parent context through its Value method, as it still points to the parent [4]. This makes it useful for scenarios where you need to preserve request-scoped values for background tasks while detaching those tasks from the lifetime of the original request [4].
Citations:
- 1: https://go.dev/pkg/context/?m=old
- 2: https://pkg.go.dev/context
- 3: https://pkg.go.dev/context@go1.26.3
- 4: https://boldlygo.tech/archive/2025-07-14-context.withoutcancel/
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the relevant segments around the referenced line ranges
sed -n '2000,2060p' server/server.go
echo "----"
sed -n '2080,2160p' server/server.go
echo "----"
sed -n '1920,2005p' server/server.go
echo "----"
sed -n '2480,2605p' server/server.go
# Locate createTask and task execution/cancellation usage
echo "==== createTask definition/uses ===="
rg -n "func (createTask|\\w*createTask)\\b|\\bcreateTask\\b" server/server.go
echo "----"
# Print the function(s) containing createTask, if found
rg -n "createTask\\(" server/server.go | head -n 20Repository: mark3labs/mcp-go
Length of output: 12006
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "scheduleTaskCleanup\\b|cancelTask\\b|tasks/cancel|TaskStatusCancelled|cancelFunc\\b" server/server.go
echo "----"
# Show likely relevant implementations around those matches
# (Using approximate line windows from rg results via grep for line numbers)
python3 - <<'PY'
import re,subprocess,os,sys,json,math
from pathlib import Path
text=subprocess.check_output(["bash","-lc","rg -n \"scheduleTaskCleanup\\\\b|cancelTask\\\\b|tasks/cancel|TaskStatusCancelled|cancelFunc\\\\b\" server/server.go"],text=True)
lines=[]
for line in text.strip().splitlines():
m=re.match(r"(\d+):", line)
if m:
lines.append(int(m.group(1)))
# print windows around each match
for ln in sorted(set(lines)):
start=max(1,ln-30); end=ln+80
print(f"\n===== server/server.go:{start}-{end} (around {ln}) =====")
subprocess.run(["bash","-lc", f"sed -n '{start},{end}p' server/server.go"], check=False)
PYRepository: mark3labs/mcp-go
Length of output: 53361
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Narrow, targeted inspection to avoid truncation
echo "===== executeTaskTool context-withoutcancel sites ====="
sed -n '2010,2055p' server/server.go
echo "----"
sed -n '2105,2175p' server/server.go
echo "===== createTask TTL scheduling call ====="
sed -n '2470,2565p' server/server.go
echo "===== cancelTask implementation ====="
sed -n '2660,2765p' server/server.goRepository: mark3labs/mcp-go
Length of output: 9723
Ensure task TTL/timeout actually cancels background execution (not just bookkeeping)
executeTaskTool/executeRegularToolAsTaskbuildtaskCtxviacontext.WithCancel(context.WithoutCancel(ctx));context.WithoutCancelremoves the parent deadline and makesDonenil, so the task won’t stop when the request context ends.- Cancellation via
tasks/cancelcallsentry.cancelFunc()correctly (cancelTask), butscheduleTaskCleanup(on TTL expiry) onlydelete(s.tasks, taskID)/ writess.expiredTasksand never cancels the running goroutine. - This means work can continue running after TTL expiry and even after the task record is removed, leaving no reliable way to stop it via TTL.
Fix by applying the TTL as a deadline on the task context (from an appropriate non-request parent) and/or invoking entry.cancelFunc() (or cancelTask) when TTL cleanup triggers, before removing bookkeeping.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/server.go` around lines 2023 - 2027, The task context is currently
created with context.WithCancel(context.WithoutCancel(ctx)) which removes parent
deadlines so TTL cleanup only deletes bookkeeping; fix by making the TTL
actually cancel execution: in executeTaskTool / executeRegularToolAsTask create
taskCtx from a non-request parent (e.g., context.Background()) and apply the TTL
via context.WithTimeout or context.WithDeadline to produce a cancellable context
assigned to taskCtx (or ensure scheduleTaskCleanup calls entry.cancelFunc() /
cancelTask before deleting s.tasks); update scheduleTaskCleanup to invoke
entry.cancelFunc() when TTL expires (then remove bookkeeping) so background
goroutines are actually stopped.
| t.Run("task tool handler returns context.Canceled before tasks/cancel called", func(t *testing.T) { | ||
| // This test verifies that if a handler detects context cancellation | ||
| // (e.g., from parent context timeout) and returns ctx.Err() before | ||
| // (e.g., from its own internal timeout) and returns ctx.Err() before | ||
| // tasks/cancel is explicitly called, the task is still marked as cancelled | ||
| // rather than failed. | ||
| // | ||
| // Note: With the fix for #897 (context.WithoutCancel), the task context | ||
| // is now detached from the parent/HTTP request context. This means the | ||
| // handler must manage its own cancellation via the task's context, not | ||
| // rely on parent context cancellation. We test internal handler cancellation | ||
| // by having the handler cancel its own context via a timeout. |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
This no longer tests the #897 regression.
The handler now cancels itself via innerCtx, but the parent/request context is never canceled. If the production code regressed to context.WithCancel(ctx), this subtest would still pass because the inner timeout fires first. Please keep a case that cancels the parent right after handleToolCall and asserts the task stays alive until its own cancellation path fires.
Also applies to: 577-601
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/task_tool_test.go` around lines 533 - 543, The test "task tool handler
returns context.Canceled before tasks/cancel called" no longer verifies the `#897`
regression because the handler cancels an inner context (innerCtx) while the
parent/request context is never cancelled; add an additional subcase that
explicitly cancels the parent/request context immediately after calling
handleToolCall (simulate parent cancellation), then assert the task is not
marked cancelled/failed until the handler's own cancellation path runs (i.e.,
the task remains alive until innerCtx times out); apply the same change to the
similar test block around lines 577-601 so both scenarios (parent-cancel and
handler-internal-cancel) are covered using the existing test helpers and the
same assertions on task state.
|
@xlyoung can you address the coderabbit comments please? |
|
Hi @ezynda3, thanks for the ping. Ive reviewed both CodeRabbit findings: Comment 1: TTL cleanup doesnt cancel executionYoure right that That said, I agree it should be fixed. Two options:
Happy to go with either approach — what do you prefer? Comment 2: Test doesnt verify #897 regressionGood catch. Ill add a subtest that:
This proves the context is actually detached from the parent, not just relying on the inner timeout. Ill push an update soon. Let me know if you want me to proceed with the TTL fix in this PR or keep it as a separate follow-up. |
ezynda3
left a comment
There was a problem hiding this comment.
Review summary
context.WithoutCancel(ctx) is the right fix here — it detaches the task goroutine from the HTTP request lifecycle while preserving context values, and tasks/cancel / TTL / server shutdown remain the only cancellation paths. Applied to both executeTaskTool and executeRegularToolAsTask. Build, vet, go test ./... -race, golangci-lint, and staticcheck are all clean on HEAD; no regressions vs main.
One blocking item: the updated test file is not gofmt-clean. The project's AGENTS.md expects gofmt-clean code; CI's lint job didn't catch this because .golangci.yml doesn't enable a gofmt/gofumpt linter.
Blocking
server/task_tool_test.go — gofmt regression
Inside t.Run("task tool handler returns context.Canceled before tasks/cancel called", ...), the body from // Step 4: Call tool with task augmentation through the closing }) of the subtest lost one tab of indentation (single tab instead of double). The file compiles and tests pass, but gofmt -d server/task_tool_test.go rewrites ~40 lines.
Fix:
gofmt -w server/task_tool_test.go
Non-blocking suggestions
-
Extract a helper. The new comment block +
context.WithCancel(context.WithoutCancel(ctx))is duplicated verbatim inexecuteTaskToolandexecuteRegularToolAsTask. A small helper likenewDetachedTaskContext(ctx) (context.Context, context.CancelFunc)would centralize theWithoutCancelpolicy and prevent future drift. -
Test coverage for the
tasks/cancelpath. The updated subtest only exercises the internal-timeout cancellation path. Worth confirming a sibling subtest still drives cancellation via an explicittasks/cancelRPC, so that route's mapping toTaskStatusCancelledstays covered after theWithoutCancelchange. -
Test step renumbering. Comments now read 1, 2, 4, 5, 6 (Step 3 went away with
cancelParent). Minor, but renumbering helps future readers.
What I ran
go build ./...— clean on base and HEADgo vet ./...— clean on base and HEADgo test ./... -race -count=1— pass on base and HEADgolangci-lint run(project config) — 0 issuesstaticcheck ./...— cleango mod tidy— no diffgofmt -l server/task_tool_test.go server/server.go—server/task_tool_test.goflagged- Static security scan on
+lines (secrets, exec, SQL, gob, unsafe, InsecureSkipVerify, math/rand, panic, debug leftovers) — all clean
Once gofmt is run, this is good to go from my side.
Fixes #897
Problem
When a tool registered with WithTaskSupport(TaskSupportOptional) or AddTaskTool is called asynchronously via streamable HTTP, the task context was derived from the HTTP request context (r.Context()). After the HTTP handler returned the CreateTaskResult response, Go HTTP server canceled r.Context(), which cascaded into the task context and killed the handler goroutine.
Fix
Use context.WithoutCancel(ctx) to detach from the HTTP request lifecycle while preserving context values (session metadata, etc.).
The task context is now only canceled by:
Both executeTaskTool and executeRegularToolAsTask are fixed.
Testing
Updated the corresponding test to verify internal handler cancellation (timeout-based) rather than relying on parent context cancellation. All existing tests pass with -race.
Summary by CodeRabbit