Skip to content

fix(realtime): downgrade benign WS close codes (1000/1001/1006) from ERROR to WARNING#2038

Open
moshemorad wants to merge 2 commits into
masterfrom
ignore-websocket-1006-err
Open

fix(realtime): downgrade benign WS close codes (1000/1001/1006) from ERROR to WARNING#2038
moshemorad wants to merge 2 commits into
masterfrom
ignore-websocket-1006-err

Conversation

@moshemorad
Copy link
Copy Markdown
Collaborator

@moshemorad moshemorad commented May 14, 2026

What

WebSocket close codes 1000 (normal), 1001 (going away), and 1006 (abnormal close) are routine operational events for the ConversationsWorker realtime subscription. The worker's own _channel_unhealthy + _full_reconnect loop already recovers from them transparently — logging them at ERROR via logging.exception causes sentry_sdk's logging integration to surface them as actionable Sentry issues, which is noise.

Sentry example: https://robusta-eu.sentry.io/issues/118384168/

Why these three codes are benign

Per RFC 6455:

  • 1000 — normal close.
  • 1001 — peer going away (typical: Supabase pod restart / deploy).
  • 1006 — abnormal close, synthesized locally by the websockets library when the TCP/TLS connection drops without a Close frame (NAT idle-timeout, LB kill, network blip). This code is never sent on the wire.

In all three cases the existing reconnect loop catches the dropped socket on its next health tick and rebuilds with exponential backoff.

Not included: 1008 (policy violation), 1011 (server error), and the 4xxx custom range — those typically indicate auth/RLS misconfig or server bugs and should continue to page.

Changes

holmes/core/conversations_worker/realtime_manager.py:

  • New module helper _benign_ws_close_code(exc) that classifies a websockets.ConnectionClosed* (walking __cause__/__context__) by close code, returning the code only if in {1000, 1001, 1006}, else None. Uses rcvd/sent dispatch to avoid the websockets ≥ 13 .code deprecation while still handling the rcvd-is-None (1006) case.
  • 6 logging.exception sites downgraded to logging.warning when the exception is benign:
    • _thread_entry (defense in depth)
    • _run main-loop catch (defense in depth)
    • _full_reconnect failure
    • _maybe_refresh_auth failure
    • _connect_and_subscribeset_auth failure
    • _shutdown_async failure
  • 1 site (_connect_and_subscribe cleanup-after-failed-connect, already in a cleanup path) further downgraded to debug.
  • Non-benign exceptions continue to log at ERROR exactly as before.

Tests

tests/core/conversations_worker/test_realtime_manager.py — 10 new tests:

  • Parametrized: helper recognizes 1000 / 1001 / 1006.
  • Helper rejects 1008 (policy violation), non-WS exceptions, and exceptions that happen to have a .code attribute.
  • Helper walks __cause__ and __context__ chains.
  • Helper handles None input.
  • End-to-end via caplog: _shutdown_async logs at WARNING for a 1006, and still at ERROR for a ValueError.
$ poetry run pytest tests/core/conversations_worker/test_realtime_manager.py --no-cov
============================= 32 passed in 25.36s ==============================

$ poetry run pytest tests/core/conversations_worker/ --no-cov --ignore=tests/core/conversations_worker/integration
============================= 115 passed in 16.51s =============================

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Reduced noisy error logs for certain WebSocket disconnects by treating specific abnormal closures as expected; these now produce warnings, improving operational clarity.
  • Tests

    • Added tests covering WebSocket close classification and shutdown logging to ensure abnormal closures are handled with warning-level logs and other failures still surface as errors.

Review Change Stack

WebSocket close codes 1000 (normal), 1001 (going away), and 1006
(abnormal close) are routine operational events for the
ConversationsWorker realtime subscription — the worker's own
_channel_unhealthy + _full_reconnect loop already recovers from them
transparently. Logging them at ERROR via logging.exception causes
sentry_sdk's logging integration to surface them as actionable Sentry
issues, which is noise.

Adds _benign_ws_close_code() helper that classifies a websockets
ConnectionClosed* exception (walking __cause__/__context__) by close
code, returning a code only for {1000, 1001, 1006}. 1008 (policy
violation), 1011 (server error), and the 4xxx custom range are
intentionally excluded so auth/RLS misconfigs and server bugs still
surface.

Downgrades 6 logging.exception sites in realtime_manager.py to
logging.warning when the exception is a benign close:
  - _thread_entry (defense in depth)
  - _run main-loop catch (defense in depth)
  - _full_reconnect failure
  - _maybe_refresh_auth failure
  - _connect_and_subscribe set_auth failure
  - _shutdown_async failure
And one site (cleanup-after-failed-connect, already in a cleanup
path) is further downgraded to debug.

Non-benign exceptions continue to log at ERROR exactly as before, so
real bugs still page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mohse Morad <moshemorad12340@gmail.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

📂 Previous Runs

📜 #1 · Run @ __055f63e__ (#25865124409) — May 14, 14:23 UTC

✅ Results of HolmesGPT evals

Automatically triggered by commit 055f63e on branch ignore-websocket-1006-err

View workflow logs

Results of HolmesGPT evals

  • ask_holmes: 11/11 test cases were successful, 0 regressions
Status Test case Time Turns Tools Cost Total tokens Input Max input Output Max output Cached Non-cached Reasoning Compactions
09_crashpod 37.6s 6 10 $0.2507 122,285 120,216 23,419 2,069 570 95,871 24,345 86
101_loki_historical_logs_pod_deleted 52.8s 7 12 $0.3011 151,909 148,978 25,826 2,931 560 122,085 26,893 171
112_find_pvcs_by_uuid 22.7s 4 4 $0.1926 79,110 78,010 21,916 1,100 390 56,082 21,928 126
12_job_crashing 38.2s 6 13 $0.2678 136,112 133,988 25,126 2,124 585 108,145 25,843 103
176_network_policy_blocking_traffic_no_skills 60.8s 9 18 $0.3488 210,672 207,393 27,892 3,279 808 178,880 28,513 107
227_count_configmaps_per_namespace[0] 25.5s 5 10 $0.2039 96,119 94,647 20,667 1,472 519 73,762 20,885 51
243_pod_names_contain_service 34.1s 5 8 $0.2213 99,794 98,024 21,771 1,770 453 75,617 22,407 53
24_misconfigured_pvc 37.6s 6 13 $0.2495 121,261 119,189 22,855 2,072 583 94,857 24,332 60
43_current_datetime_from_prompt 5.9s 1 $0.1091 17,102 16,982 16,982 120 120 0 16,982 80
51_logs_summarize_errors 22.2s 4 5 $0.1867 77,783 76,721 21,170 1,062 326 55,539 21,182 54
61_exact_match_counting 9.3s 2 1 $0.1228 34,572 34,341 17,350 231 163 16,981 17,360 60
Total 31.5s avg 5.0 avg 9.4 avg $2.4542 1,146,719 1,128,489 27,892 18,230 808 877,819 250,670 951

Benchmark comparison unavailable: No ci-benchmark experiments found

Benchmark Comparison Details

Baseline: latest ci-benchmark experiment on master

Status: No ci-benchmark experiments found

Comparison indicators:

  • ±0% — diff under 10% (within noise threshold)
  • ↑N%/↓N% — diff 10-25%
  • ↑N%/↓N% — diff over 25% (significant)

✅ Results of HolmesGPT evals

Automatically triggered by commit 00cf4c2 on branch ignore-websocket-1006-err

View workflow logs

Results of HolmesGPT evals

  • ask_holmes: 11/11 test cases were successful, 0 regressions
Status Test case Time Turns Tools Cost Total tokens Input Max input Output Max output Cached Non-cached Reasoning Compactions
09_crashpod 34.1s 5 10 $0.2320 103,178 101,228 22,710 1,950 671 78,055 23,173 54
101_loki_historical_logs_pod_deleted 45.8s 6 11 $0.2807 137,494 134,990 25,977 2,504 851 108,730 26,260 123
112_find_pvcs_by_uuid 19.0s 3 3 $0.1781 60,629 59,669 21,601 960 545 38,057 21,612 88
12_job_crashing 40.4s 6 16 $0.2867 139,016 136,710 26,376 2,306 533 108,320 28,390 52
176_network_policy_blocking_traffic_no_skills 45.8s 7 15 $0.3183 160,428 157,656 26,891 2,772 564 127,311 30,345 119
227_count_configmaps_per_namespace[0] 25.2s 5 9 $0.1986 95,141 93,890 20,922 1,251 578 72,955 20,935 75
243_pod_names_contain_service 33.2s 5 8 $0.2231 99,417 97,569 21,819 1,848 596 75,169 22,400 52
24_misconfigured_pvc 47.8s 8 17 $0.3145 177,717 174,763 25,874 2,954 506 147,891 26,872 112
43_current_datetime_from_prompt 5.4s 1 $0.1094 17,112 16,982 16,982 130 130 0 16,982 91
51_logs_summarize_errors 23.3s 4 5 $0.1871 77,661 76,566 21,111 1,095 346 55,443 21,123 54
61_exact_match_counting 9.6s 2 1 $0.1233 34,611 34,365 17,374 246 178 16,981 17,384 65
Total 29.9s avg 4.7 avg 9.5 avg $2.4518 1,102,404 1,084,388 26,891 18,016 851 828,912 255,476 885

Benchmark comparison unavailable: No ci-benchmark experiments found

Benchmark Comparison Details

Baseline: latest ci-benchmark experiment on master

Status: No ci-benchmark experiments found

Comparison indicators:

  • ±0% — diff under 10% (within noise threshold)
  • ↑N%/↓N% — diff 10-25%
  • ↑N%/↓N% — diff over 25% (significant)
📖 Legend
Icon Meaning
The test was successful
The test was skipped
⚠️ The test failed but is known to be flaky or known to fail
🚧 The test had a setup failure (not a code regression)
🔧 The test failed due to mock data issues (not a code regression)
🚫 The test was throttled by API rate limits/overload
The test failed and should be fixed before merging the PR
🔄 Re-run evals manually

⚠️ Warning: /eval comments always run using the workflow from master, not from this PR branch. If you modified the GitHub Action (e.g., added secrets or env vars), those changes won't take effect.

To test workflow changes, use the GitHub CLI or Actions UI instead:

gh workflow run eval-regression.yaml --repo HolmesGPT/holmesgpt --ref ignore-websocket-1006-err -f markers=regression -f filter=

Option 1: Comment on this PR with /eval:

/eval
tags: regression

Or with more options (one per line):

/eval
model: gpt-4o
tags: regression
id: 09_crashpod
iterations: 5

Run evals on a different branch (e.g., master) for comparison:

/eval
branch: master
tags: regression
Option Description
model Model(s) to test (default: same as automatic runs)
tags Pytest tags / markers (no default - runs all tests!)
id Eval ID / pytest -k filter (use /list to see valid eval names)
iterations Number of runs, max 10
branch Run evals on a different branch (for cross-branch comparison)

Quick re-run: Use /rerun to re-run the most recent /eval on this PR with the same parameters.

Option 2: Trigger via GitHub Actions UI → "Run workflow"

Option 3: Add PR labels to include extra evals (applies to both automatic runs and /eval comments):

Label Effect
evals-tag-<name> Run tests with tag <name> alongside regression
evals-id-<name> Run a specific eval by test ID
evals-model-<name> Override the model (use model list name, e.g. sonnet-4.5)

Examples: evals-tag-easy, evals-id-09_crashpod, evals-model-sonnet-4.5

🏷️ Valid tags

benchmark, chain-of-causation, compaction, confluence, context_window, conversation_worker, coralogix, counting, database, datadog, datetime, db-connectors, easy, elasticsearch, embeds, fast, frontend, grafana, hard, images, integration, kafka, kubernetes, leaked-information, logs, loki, manual, mcp, medium, metrics, network, newrelic, no-cicd, numerical, one-test, port-forward, prometheus, question-answer, regression, skills, slackbot, storage, token-limit, toolset-limitation, traces, transparency, victorialogs

🤖 Valid models

deepseek-chat, deepseek-r1-reasoner, deepseek-reasoner, deepseek-v3.2-chat, gemini-3-flash-preview, gemini-3-pro-preview, gemini-3.1-pro-preview, gpt-4.1, gpt-5.2-high-reasoning, gpt-5.3-codex, gpt-5.4, haiku-4.5, kimi-2.5, kimi-2.5-openrouter, opus-4.5, opus-4.6, opus-4.7, qwen-next-80B-instruct, qwen-next-80B-thinking, sonnet-4.5, sonnet-4.6


Commands: /eval · /rerun · /list

CLI: gh workflow run eval-regression.yaml --repo HolmesGPT/holmesgpt --ref ignore-websocket-1006-err -f markers=regression -f filter=

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

Docker images ready for 1854b34c (built in 1m 17s)

⚠️ Warning: does not support ARM (ARM images are built on release only - not on every PR)

Use these tags to pull the images for testing.

📋 Copy commands

⚠️ Temporary images are deleted after 30 days. Copy to a permanent registry before using them:

gcloud auth configure-docker us-central1-docker.pkg.dev
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:1854b34c
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:1854b34c me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:1854b34c
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:1854b34c
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes-operator:1854b34c
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes-operator:1854b34c me-west1-docker.pkg.dev/robusta-development/development/holmes-operator-dev:1854b34c
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-operator-dev:1854b34c

Patch Helm values in one line (choose the chart you use):

HolmesGPT chart:

helm upgrade --install holmesgpt ./helm/holmes \
  --set registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set image=holmes-dev:1854b34c \
  --set operator.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set operator.image=holmes-operator-dev:1854b34c

Robusta wrapper chart:

helm upgrade --install robusta robusta/robusta \
  --reuse-values \
  --set holmes.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.image=holmes-dev:1854b34c \
  --set holmes.operator.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.operator.image=holmes-operator-dev:1854b34c

@netlify
Copy link
Copy Markdown

netlify Bot commented May 14, 2026

Deploy Preview for holmes-docs ready!

Name Link
🔨 Latest commit 00cf4c2
🔍 Latest deploy log https://app.netlify.com/projects/holmes-docs/deploys/6a05db2d587ed10008458e36
😎 Deploy Preview https://deploy-preview-2038--holmes-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

🔬 CLI Performance Benchmark

🟡 Startup Time (no LLM)

Measures holmes version execution time (imports + initialization)

Metric PR Master Change
Cold Start 13.64s 13.50s +1.0%
Warm Mean 6.33s 6.21s +2.0%
Warm Min 6.30s 6.18s
Warm Max 6.40s 6.27s

🟡 Full CLI with LLM

Measures holmes ask execution time (OpenRouter + Haiku 4.5)

Metric PR Master Change
Cold Start 16.27s 14.89s +9.3%
Warm Mean 8.45s 8.64s -2.2%
Warm Min 7.91s 8.03s
Warm Max 8.93s 9.47s

PR: 1854b34c | Master: 29fa7cfa | Iterations: 5

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8b48566-3135-4522-9505-78fb93c3125c

📥 Commits

Reviewing files that changed from the base of the PR and between 055f63e and 00cf4c2.

📒 Files selected for processing (2)
  • holmes/core/conversations_worker/realtime_manager.py
  • tests/core/conversations_worker/test_realtime_manager.py

Walkthrough

Add a helper _is_ws_abnormal_close() (websockets.exceptions) to detect a websocket “abnormal closure” (close code 1006, “no close frame”) and downgrade logging for that condition across realtime manager exception handlers; tests exercise the helper and shutdown logging.

Changes

Abnormal WebSocket 1006 handling

Layer / File(s) Summary
Abnormal-close detection helper
holmes/core/conversations_worker/realtime_manager.py
Import websockets.exceptions and add _is_ws_abnormal_close(exc: BaseException) -> bool that recognizes ConnectionClosed/no-close-frame (1006) cases.
Apply classification to exception handlers
holmes/core/conversations_worker/realtime_manager.py
Update _thread_entry, _run, _full_reconnect, _maybe_refresh_auth, _connect_and_subscribe (auth + cleanup), and _shutdown_async to check _is_ws_abnormal_close() and log a WARNING/debug for 1006 instead of an exception stack; non-websocket errors still log exceptions.
Tests for detection and shutdown logging
tests/core/conversations_worker/test_realtime_manager.py
Expand imports, add a synthesized 1006 close helper, assert _is_ws_abnormal_close classification (true for 1006, false for 1008 and non-ws exceptions), and add shutdown tests verifying 1006-originated close errors emit WARNING logs while other shutdown errors emit ERROR logs.

Sequence Diagram

sequenceDiagram
  participant Handler as Exception handler site
  participant Detector as _is_ws_abnormal_close
  participant Logger as Logger
  participant Reconnector as Reconnect/cleanup flow

  Handler->>Detector: pass caught exception
  Detector->>Detector: inspect type/message for ConnectionClosed / "no close frame"
  alt Detector returns True (1006)
    Detector-->>Handler: True
    Handler->>Logger: log WARNING / debug (no exception stack)
    Handler->>Reconnector: continue retry/reconnect/cleanup path
  else Detector returns False
    Detector-->>Handler: False
    Handler->>Logger: log ERROR / exception stack
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • RoiGlinik
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.84% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: downgrading WebSocket close codes (1006) from ERROR to WARNING logging, which is the core focus of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Address review feedback that the helper was doing too much. Three
simplifications:

  1. Scope to 1006 only. The Sentry issue is specifically 1006; the
     1000/1001 cases were speculative.
  2. Replace the typed close-code dispatch (rcvd/sent, deprecated .code,
     frozenset lookup) with a plain `isinstance + "no close frame" in
     str(exc)` check. The phrase is the stable websockets __str__ for
     code 1006.
  3. Drop __cause__/__context__ chain walking. The realtime library
     surfaces ConnectionClosed* directly at every site we catch.

Helper shrinks from ~30 lines to 5; call sites shrink from ~10 lines
each to ~6. Net diff vs master drops from +258 to +155 lines.

Tests reduced to 5 (was 10): one positive, one for 1008 false-positive
guard, one for non-WS exception, plus the two caplog end-to-end checks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mohse Morad <moshemorad12340@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant