fix(realtime): downgrade benign WS close codes (1000/1001/1006) from ERROR to WARNING by moshemorad · Pull Request #2038 · HolmesGPT/holmesgpt

moshemorad · 2026-05-14T14:16:18Z

What

WebSocket close codes 1000 (normal), 1001 (going away), and 1006 (abnormal close) are routine operational events for the ConversationsWorker realtime subscription. The worker's own _channel_unhealthy + _full_reconnect loop already recovers from them transparently — logging them at ERROR via logging.exception causes sentry_sdk's logging integration to surface them as actionable Sentry issues, which is noise.

Sentry example: https://robusta-eu.sentry.io/issues/118384168/

Why these three codes are benign

Per RFC 6455:

1000 — normal close.
1001 — peer going away (typical: Supabase pod restart / deploy).
1006 — abnormal close, synthesized locally by the websockets library when the TCP/TLS connection drops without a Close frame (NAT idle-timeout, LB kill, network blip). This code is never sent on the wire.

In all three cases the existing reconnect loop catches the dropped socket on its next health tick and rebuilds with exponential backoff.

Not included: 1008 (policy violation), 1011 (server error), and the 4xxx custom range — those typically indicate auth/RLS misconfig or server bugs and should continue to page.

Changes

holmes/core/conversations_worker/realtime_manager.py:

New module helper _benign_ws_close_code(exc) that classifies a websockets.ConnectionClosed* (walking __cause__/__context__) by close code, returning the code only if in {1000, 1001, 1006}, else None. Uses rcvd/sent dispatch to avoid the websockets ≥ 13 .code deprecation while still handling the rcvd-is-None (1006) case.
6 logging.exception sites downgraded to logging.warning when the exception is benign:
- _thread_entry (defense in depth)
- _run main-loop catch (defense in depth)
- _full_reconnect failure
- _maybe_refresh_auth failure
- _connect_and_subscribe → set_auth failure
- _shutdown_async failure
1 site (_connect_and_subscribe cleanup-after-failed-connect, already in a cleanup path) further downgraded to debug.
Non-benign exceptions continue to log at ERROR exactly as before.

Tests

tests/core/conversations_worker/test_realtime_manager.py — 10 new tests:

Parametrized: helper recognizes 1000 / 1001 / 1006.
Helper rejects 1008 (policy violation), non-WS exceptions, and exceptions that happen to have a .code attribute.
Helper walks __cause__ and __context__ chains.
Helper handles None input.
End-to-end via caplog: _shutdown_async logs at WARNING for a 1006, and still at ERROR for a ValueError.

$ poetry run pytest tests/core/conversations_worker/test_realtime_manager.py --no-cov
============================= 32 passed in 25.36s ==============================

$ poetry run pytest tests/core/conversations_worker/ --no-cov --ignore=tests/core/conversations_worker/integration
============================= 115 passed in 16.51s =============================

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Reduced noisy error logs for certain WebSocket disconnects by treating specific abnormal closures as expected; these now produce warnings, improving operational clarity.
Tests
- Added tests covering WebSocket close classification and shutdown logging to ensure abnormal closures are handled with warning-level logs and other failures still surface as errors.

WebSocket close codes 1000 (normal), 1001 (going away), and 1006 (abnormal close) are routine operational events for the ConversationsWorker realtime subscription — the worker's own _channel_unhealthy + _full_reconnect loop already recovers from them transparently. Logging them at ERROR via logging.exception causes sentry_sdk's logging integration to surface them as actionable Sentry issues, which is noise. Adds _benign_ws_close_code() helper that classifies a websockets ConnectionClosed* exception (walking __cause__/__context__) by close code, returning a code only for {1000, 1001, 1006}. 1008 (policy violation), 1011 (server error), and the 4xxx custom range are intentionally excluded so auth/RLS misconfigs and server bugs still surface. Downgrades 6 logging.exception sites in realtime_manager.py to logging.warning when the exception is a benign close: - _thread_entry (defense in depth) - _run main-loop catch (defense in depth) - _full_reconnect failure - _maybe_refresh_auth failure - _connect_and_subscribe set_auth failure - _shutdown_async failure And one site (cleanup-after-failed-connect, already in a cleanup path) is further downgraded to debug. Non-benign exceptions continue to log at ERROR exactly as before, so real bugs still page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Mohse Morad <moshemorad12340@gmail.com>

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

github-actions · 2026-05-14T14:16:37Z

📂 Previous Runs

📜 #1 · Run @ __055f63e__ (#25865124409) — May 14, 14:23 UTC

✅ Results of HolmesGPT evals

Automatically triggered by commit 055f63e on branch ignore-websocket-1006-err

View workflow logs

Results of HolmesGPT evals

ask_holmes: 11/11 test cases were successful, 0 regressions

Status	Test case	Time	Turns	Tools	Cost	Total tokens	Input	Max input	Output	Max output	Cached	Non-cached	Reasoning	Compactions
✅	09_crashpod	37.6s	6	10	$0.2507	122,285	120,216	23,419	2,069	570	95,871	24,345	86	—
✅	101_loki_historical_logs_pod_deleted	52.8s	7	12	$0.3011	151,909	148,978	25,826	2,931	560	122,085	26,893	171	—
✅	112_find_pvcs_by_uuid	22.7s	4	4	$0.1926	79,110	78,010	21,916	1,100	390	56,082	21,928	126	—
✅	12_job_crashing	38.2s	6	13	$0.2678	136,112	133,988	25,126	2,124	585	108,145	25,843	103	—
✅	176_network_policy_blocking_traffic_no_skills	60.8s	9	18	$0.3488	210,672	207,393	27,892	3,279	808	178,880	28,513	107	—
✅	227_count_configmaps_per_namespace[0]	25.5s	5	10	$0.2039	96,119	94,647	20,667	1,472	519	73,762	20,885	51	—
✅	243_pod_names_contain_service	34.1s	5	8	$0.2213	99,794	98,024	21,771	1,770	453	75,617	22,407	53	—
✅	24_misconfigured_pvc	37.6s	6	13	$0.2495	121,261	119,189	22,855	2,072	583	94,857	24,332	60	—
✅	43_current_datetime_from_prompt	5.9s	1	—	$0.1091	17,102	16,982	16,982	120	120	0	16,982	80	—
✅	51_logs_summarize_errors	22.2s	4	5	$0.1867	77,783	76,721	21,170	1,062	326	55,539	21,182	54	—
✅	61_exact_match_counting	9.3s	2	1	$0.1228	34,572	34,341	17,350	231	163	16,981	17,360	60	—
	Total	31.5s avg	5.0 avg	9.4 avg	$2.4542	1,146,719	1,128,489	27,892	18,230	808	877,819	250,670	951	—

Benchmark comparison unavailable: No ci-benchmark experiments found

Benchmark Comparison Details

Baseline: latest ci-benchmark experiment on master

Status: No ci-benchmark experiments found

Comparison indicators:

±0% — diff under 10% (within noise threshold)
↑N%/↓N% — diff 10-25%
↑N%/↓N% — diff over 25% (significant)

✅ Results of HolmesGPT evals

Automatically triggered by commit 00cf4c2 on branch ignore-websocket-1006-err

View workflow logs

Results of HolmesGPT evals

ask_holmes: 11/11 test cases were successful, 0 regressions

Status	Test case	Time	Turns	Tools	Cost	Total tokens	Input	Max input	Output	Max output	Cached	Non-cached	Reasoning	Compactions
✅	09_crashpod	34.1s	5	10	$0.2320	103,178	101,228	22,710	1,950	671	78,055	23,173	54	—
✅	101_loki_historical_logs_pod_deleted	45.8s	6	11	$0.2807	137,494	134,990	25,977	2,504	851	108,730	26,260	123	—
✅	112_find_pvcs_by_uuid	19.0s	3	3	$0.1781	60,629	59,669	21,601	960	545	38,057	21,612	88	—
✅	12_job_crashing	40.4s	6	16	$0.2867	139,016	136,710	26,376	2,306	533	108,320	28,390	52	—
✅	176_network_policy_blocking_traffic_no_skills	45.8s	7	15	$0.3183	160,428	157,656	26,891	2,772	564	127,311	30,345	119	—
✅	227_count_configmaps_per_namespace[0]	25.2s	5	9	$0.1986	95,141	93,890	20,922	1,251	578	72,955	20,935	75	—
✅	243_pod_names_contain_service	33.2s	5	8	$0.2231	99,417	97,569	21,819	1,848	596	75,169	22,400	52	—
✅	24_misconfigured_pvc	47.8s	8	17	$0.3145	177,717	174,763	25,874	2,954	506	147,891	26,872	112	—
✅	43_current_datetime_from_prompt	5.4s	1	—	$0.1094	17,112	16,982	16,982	130	130	0	16,982	91	—
✅	51_logs_summarize_errors	23.3s	4	5	$0.1871	77,661	76,566	21,111	1,095	346	55,443	21,123	54	—
✅	61_exact_match_counting	9.6s	2	1	$0.1233	34,611	34,365	17,374	246	178	16,981	17,384	65	—
	Total	29.9s avg	4.7 avg	9.5 avg	$2.4518	1,102,404	1,084,388	26,891	18,016	851	828,912	255,476	885	—

Benchmark comparison unavailable: No ci-benchmark experiments found

Benchmark Comparison Details

Baseline: latest ci-benchmark experiment on master

Status: No ci-benchmark experiments found

Comparison indicators:

±0% — diff under 10% (within noise threshold)
↑N%/↓N% — diff 10-25%
↑N%/↓N% — diff over 25% (significant)

📖 Legend

Icon	Meaning
✅	The test was successful
➖	The test was skipped
⚠️	The test failed but is known to be flaky or known to fail
🚧	The test had a setup failure (not a code regression)
🔧	The test failed due to mock data issues (not a code regression)
🚫	The test was throttled by API rate limits/overload
❌	The test failed and should be fixed before merging the PR

🔄 Re-run evals manually

⚠️ Warning: /eval comments always run using the workflow from master, not from this PR branch. If you modified the GitHub Action (e.g., added secrets or env vars), those changes won't take effect.

To test workflow changes, use the GitHub CLI or Actions UI instead:
gh workflow run eval-regression.yaml --repo HolmesGPT/holmesgpt --ref ignore-websocket-1006-err -f markers=regression -f filter=

Option 1: Comment on this PR with /eval:

/eval
tags: regression

Or with more options (one per line):

/eval
model: gpt-4o
tags: regression
id: 09_crashpod
iterations: 5

Run evals on a different branch (e.g., master) for comparison:

/eval
branch: master
tags: regression

Option	Description
`model`	Model(s) to test (default: same as automatic runs)
`tags`	Pytest tags / markers (no default - runs all tests!)
`id`	Eval ID / pytest -k filter (use `/list` to see valid eval names)
`iterations`	Number of runs, max 10
`branch`	Run evals on a different branch (for cross-branch comparison)

Quick re-run: Use /rerun to re-run the most recent /eval on this PR with the same parameters.

Option 2: Trigger via GitHub Actions UI → "Run workflow"

Option 3: Add PR labels to include extra evals (applies to both automatic runs and /eval comments):

Label	Effect
`evals-tag-<name>`	Run tests with tag `<name>` alongside regression
`evals-id-<name>`	Run a specific eval by test ID
`evals-model-<name>`	Override the model (use model list name, e.g. `sonnet-4.5`)

Examples: evals-tag-easy, evals-id-09_crashpod, evals-model-sonnet-4.5

🏷️ Valid tags

benchmark, chain-of-causation, compaction, confluence, context_window, conversation_worker, coralogix, counting, database, datadog, datetime, db-connectors, easy, elasticsearch, embeds, fast, frontend, grafana, hard, images, integration, kafka, kubernetes, leaked-information, logs, loki, manual, mcp, medium, metrics, network, newrelic, no-cicd, numerical, one-test, port-forward, prometheus, question-answer, regression, skills, slackbot, storage, token-limit, toolset-limitation, traces, transparency, victorialogs

🤖 Valid models

deepseek-chat, deepseek-r1-reasoner, deepseek-reasoner, deepseek-v3.2-chat, gemini-3-flash-preview, gemini-3-pro-preview, gemini-3.1-pro-preview, gpt-4.1, gpt-5.2-high-reasoning, gpt-5.3-codex, gpt-5.4, haiku-4.5, kimi-2.5, kimi-2.5-openrouter, opus-4.5, opus-4.6, opus-4.7, qwen-next-80B-instruct, qwen-next-80B-thinking, sonnet-4.5, sonnet-4.6

Commands: /eval · /rerun · /list

CLI: gh workflow run eval-regression.yaml --repo HolmesGPT/holmesgpt --ref ignore-websocket-1006-err -f markers=regression -f filter=

github-actions · 2026-05-14T14:16:41Z

✅ Docker images ready for 1854b34c (built in 1m 17s)

⚠️ Warning: does not support ARM (ARM images are built on release only - not on every PR)

Use these tags to pull the images for testing.

📋 Copy commands

⚠️ Temporary images are deleted after 30 days. Copy to a permanent registry before using them:

gcloud auth configure-docker us-central1-docker.pkg.dev
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:1854b34c
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:1854b34c me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:1854b34c
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:1854b34c
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes-operator:1854b34c
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes-operator:1854b34c me-west1-docker.pkg.dev/robusta-development/development/holmes-operator-dev:1854b34c
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-operator-dev:1854b34c

Patch Helm values in one line (choose the chart you use):

HolmesGPT chart:

helm upgrade --install holmesgpt ./helm/holmes \
  --set registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set image=holmes-dev:1854b34c \
  --set operator.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set operator.image=holmes-operator-dev:1854b34c

Robusta wrapper chart:

helm upgrade --install robusta robusta/robusta \
  --reuse-values \
  --set holmes.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.image=holmes-dev:1854b34c \
  --set holmes.operator.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.operator.image=holmes-operator-dev:1854b34c

netlify · 2026-05-14T14:17:46Z

✅ Deploy Preview for holmes-docs ready!

Name	Link
🔨 Latest commit	`00cf4c2`
🔍 Latest deploy log	https://app.netlify.com/projects/holmes-docs/deploys/6a05db2d587ed10008458e36
😎 Deploy Preview	https://deploy-preview-2038--holmes-docs.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2026-05-14T14:18:54Z

🔬 CLI Performance Benchmark

🟡 Startup Time (no LLM)

Measures holmes version execution time (imports + initialization)

Metric	PR	Master	Change
Cold Start	13.64s	13.50s	+1.0%
Warm Mean	6.33s	6.21s	+2.0%
Warm Min	6.30s	6.18s
Warm Max	6.40s	6.27s

🟡 Full CLI with LLM

Measures holmes ask execution time (OpenRouter + Haiku 4.5)

Metric	PR	Master	Change
Cold Start	16.27s	14.89s	+9.3%
Warm Mean	8.45s	8.64s	-2.2%
Warm Min	7.91s	8.03s
Warm Max	8.93s	9.47s

PR: 1854b34c | Master: 29fa7cfa | Iterations: 5

coderabbitai · 2026-05-14T14:20:43Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8b48566-3135-4522-9505-78fb93c3125c

📥 Commits

Reviewing files that changed from the base of the PR and between 055f63e and 00cf4c2.

📒 Files selected for processing (2)

holmes/core/conversations_worker/realtime_manager.py
tests/core/conversations_worker/test_realtime_manager.py

Walkthrough

Add a helper _is_ws_abnormal_close() (websockets.exceptions) to detect a websocket “abnormal closure” (close code 1006, “no close frame”) and downgrade logging for that condition across realtime manager exception handlers; tests exercise the helper and shutdown logging.

Changes

Abnormal WebSocket 1006 handling

Layer / File(s)	Summary
Abnormal-close detection helper `holmes/core/conversations_worker/realtime_manager.py`	Import `websockets.exceptions` and add `_is_ws_abnormal_close(exc: BaseException) -> bool` that recognizes ConnectionClosed/no-close-frame (1006) cases.
Apply classification to exception handlers `holmes/core/conversations_worker/realtime_manager.py`	Update `_thread_entry`, `_run`, `_full_reconnect`, `_maybe_refresh_auth`, `_connect_and_subscribe` (auth + cleanup), and `_shutdown_async` to check `_is_ws_abnormal_close()` and log a WARNING/debug for 1006 instead of an exception stack; non-websocket errors still log exceptions.
Tests for detection and shutdown logging `tests/core/conversations_worker/test_realtime_manager.py`	Expand imports, add a synthesized 1006 close helper, assert `_is_ws_abnormal_close` classification (true for 1006, false for 1008 and non-ws exceptions), and add shutdown tests verifying 1006-originated close errors emit WARNING logs while other shutdown errors emit ERROR logs.

Sequence Diagram

sequenceDiagram
  participant Handler as Exception handler site
  participant Detector as _is_ws_abnormal_close
  participant Logger as Logger
  participant Reconnector as Reconnect/cleanup flow

  Handler->>Detector: pass caught exception
  Detector->>Detector: inspect type/message for ConnectionClosed / "no close frame"
  alt Detector returns True (1006)
    Detector-->>Handler: True
    Handler->>Logger: log WARNING / debug (no exception stack)
    Handler->>Reconnector: continue retry/reconnect/cleanup path
  else Detector returns False
    Detector-->>Handler: False
    Handler->>Logger: log ERROR / exception stack
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

RoiGlinik

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.84% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: downgrading WebSocket close codes (1006) from ERROR to WARNING logging, which is the core focus of the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Address review feedback that the helper was doing too much. Three simplifications: 1. Scope to 1006 only. The Sentry issue is specifically 1006; the 1000/1001 cases were speculative. 2. Replace the typed close-code dispatch (rcvd/sent, deprecated .code, frozenset lookup) with a plain `isinstance + "no close frame" in str(exc)` check. The phrase is the stable websockets __str__ for code 1006. 3. Drop __cause__/__context__ chain walking. The realtime library surfaces ConnectionClosed* directly at every site we catch. Helper shrinks from ~30 lines to 5; call sites shrink from ~10 lines each to ~6. Net diff vs master drops from +258 to +155 lines. Tests reduced to 5 (was 10): one positive, one for 1008 false-positive guard, one for non-WS exception, plus the two caplog end-to-end checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Mohse Morad <moshemorad12340@gmail.com>

claude Bot reviewed May 14, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(realtime): downgrade benign WS close codes (1000/1001/1006) from ERROR to WARNING#2038

fix(realtime): downgrade benign WS close codes (1000/1001/1006) from ERROR to WARNING#2038
moshemorad wants to merge 2 commits into
masterfrom
ignore-websocket-1006-err

moshemorad commented May 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

claude Bot left a comment

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

✅ Results of HolmesGPT evals

Results of HolmesGPT evals

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

netlify Bot commented May 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Review ran into problems

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

moshemorad commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why these three codes are benign

Changes

Tests

Summary by CodeRabbit

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📂 Previous Runs

✅ Results of HolmesGPT evals

Results of HolmesGPT evals

✅ Results of HolmesGPT evals

Results of HolmesGPT evals

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

netlify Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for holmes-docs ready!

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔬 CLI Performance Benchmark

🟡 Startup Time (no LLM)

🟡 Full CLI with LLM

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Review ran into problems

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

moshemorad commented May 14, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading

netlify Bot commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading