add tags to failing tests #597
Conversation
Walkthrough
This change expands test categorization by introducing new tags in both the codebase and test metadata. Four additional pytest markers and evaluation tags are defined, and multiple test case YAML files are updated to include relevant tags under a new or existing `tags` field.

Changes
Suggested reviewers
📜 Recent review details
Configuration used: CodeRabbit UI
📥 Commits
Reviewing files that changed from the base of the PR and between 9d64fd42c69a2e941bca9e39ccd7a5531015263d and f043d18.
📒 Files selected for processing (14)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (12)
⏰ Context from checks skipped due to timeout of 90000ms (4)
✨ Finishing Touches
Actionable comments posted: 4
🔭 Outside diff range comments (2)
tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_old_tools/test_case.yaml (1)
19-21: `kubectl apply -f./manifest.yaml` will fail – missing space
`kubectl` treats `-f` as a flag whose value must be provided after a space (or with `=`). Without the space, it is parsed as an unknown shorthand flag and the test will error out.

```diff
- kubectl apply -f./manifest.yaml
+ kubectl apply -f ./manifest.yaml
```

Same fix is needed for the corresponding `kubectl delete` line.

tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_all_tools/test_case.yaml (1)
6-9: Missing space after `-f` flag
Same `kubectl apply/delete -f./manifest.yaml` issue as in the sister file.
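As an aside, the difference in tokenization can be sketched with Python's `shlex` (a rough illustration only; kubectl's actual flag parsing is handled by its own CLI library, which this does not reproduce):

```python
import shlex

# Without the space, "-f./manifest.yaml" reaches kubectl as one single token.
bad = shlex.split("kubectl apply -f./manifest.yaml")
# With the space, "-f" and "./manifest.yaml" arrive as separate arguments.
good = shlex.split("kubectl apply -f ./manifest.yaml")

assert bad == ["kubectl", "apply", "-f./manifest.yaml"]
assert good == ["kubectl", "apply", "-f", "./manifest.yaml"]
```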
♻️ Duplicate comments (4)
tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_all_tools/test_case.yaml (1)
2-4: Duplicate feedback: see the earlier comment aboutnetworkvsnetworkingmarker ambiguity.tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_old_tools/test_case.yaml (1)
2-4: Duplicate feedback: see the earlier comment aboutnetworkvsnetworkingmarker ambiguity.tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools/test_case.yaml (1)
2-4: Duplicate feedback: see the earlier comment aboutnetworkvsnetworkingmarker ambiguity.tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_all_tools/test_case.yaml (1)
2-4: Same taxonomy collision as above
Ensure only one of `network`/`networking` survives in the constants file and in all test cases.
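A guard along these lines could catch stray tags early (a minimal sketch; the set below is hypothetical and abbreviated — the real source of truth is `ALLOWED_EVAL_TAGS` in tests/llm/utils/constants.py):

```python
# Hypothetical, abbreviated allowed set; the real one lives in
# tests/llm/utils/constants.py as ALLOWED_EVAL_TAGS.
ALLOWED_EVAL_TAGS = {
    "network", "logs", "misleading-history",
    "k8s-misconfig", "chain-of-causation", "slackbot",
}

def find_unknown_tags(tags):
    """Return any tags that are not part of the allowed taxonomy, sorted."""
    return sorted(set(tags) - ALLOWED_EVAL_TAGS)

# Once only "network" survives, a stray "networking" is flagged immediately.
assert find_unknown_tags(["network", "logs"]) == []
assert find_unknown_tags(["networking", "logs"]) == ["networking"]
```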
🧹 Nitpick comments (4)
tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml (1)
18-20: Minor typo in comment
`Currenlty` → `Currently`.

```diff
-# Currenlty holmes is able to answer the question correctly only around 10% of the time.
+# Currently Holmes is able to answer the question correctly only around 10 % of the time.
```

tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_tools/test_case.yaml (1)
2-4: Potential confusion between `network` vs `networking` tags
`network` is already an established marker across the suite, and now we introduce `networking`.
Unless the two convey distinct semantics, having both will fragment filtering (`pytest -m network` vs `pytest -m networking`) and make dashboards harder to search. If the intent is simply to rename/clarify the old marker, consider:

```diff
-  - networking
+  - network
```

and then deprecate `networking` from `pyproject.toml`.
Otherwise, document the precise difference in `CONTRIBUTING.md` and ensure every test uses the right one.

pyproject.toml (1)
83-87: Minor ordering nit
For readability the marker list has so far been alphabetical.
Keeping that convention helps future reviewers spot duplicates quickly.

```diff
-    "networking: Tests involving networking functionality",
-    "misleading-history: Tests with misleading historical data",
-    "k8s-misconfig: Tests involving Kubernetes misconfigurations",
-    "chain-of-causation: Tests involving chain-of-causation analysis",
-    "slackbot: Tests involving Slack bot functionality",
+    "chain-of-causation: Tests involving chain-of-causation analysis",
+    "k8s-misconfig: Tests involving Kubernetes misconfigurations",
+    "misleading-history: Tests with misleading historical data",
+    "networking: Tests involving networking functionality",
+    "slackbot: Tests involving Slack bot functionality",
```

tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_old_tools/test_case.yaml (1)
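If the alphabetical convention is worth keeping, it could even be asserted in a tiny check (a sketch; the marker strings are copied from the pyproject.toml list, and this check itself is hypothetical, not part of the repo):

```python
# Marker entries as they would appear in pyproject.toml once re-sorted.
markers = [
    "chain-of-causation: Tests involving chain-of-causation analysis",
    "k8s-misconfig: Tests involving Kubernetes misconfigurations",
    "misleading-history: Tests with misleading historical data",
    "networking: Tests involving networking functionality",
    "slackbot: Tests involving Slack bot functionality",
]

# The list is alphabetical by marker name, so sorting it is a no-op.
assert markers == sorted(markers)
```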
5-17: Block scalar starts with an empty line – unintended leading newline
`expected_output: |` is immediately followed by a blank line, which introduces a leading newline into the value. If the evaluation logic performs exact or trimmed comparisons, this may cause false negatives. Delete the empty line or switch to `|-` to strip.
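The effect on comparisons can be illustrated in plain Python (a sketch with made-up string values; how the eval harness actually compares outputs is an assumption here):

```python
# A literal block scalar (|) preceded by a blank line yields a value with a
# leading newline; without the blank line there is none.
with_blank_line = "\nDNS resolution fails for the service\n"
without_blank_line = "DNS resolution fails for the service\n"

# Exact comparison is tripped up by the leading newline...
assert with_blank_line != without_blank_line
# ...while a trimmed comparison still matches.
assert with_blank_line.strip() == without_blank_line.strip()
```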
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between baedc3b and 9d64fd42c69a2e941bca9e39ccd7a5531015263d.
📒 Files selected for processing (14)
- pyproject.toml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/22_high_latency_dbi_down/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/25_misconfigured_ingress_class/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_all_tools/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_old_tools/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_all_tools/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_new_tools/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/42_dns_issues_steps_old_tools/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/46_job_crashing_no_longer_exists/test_case.yaml (1 hunks)
- tests/llm/utils/constants.py (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml (2)
Learnt from: Sheeproid
PR: robusta-dev/holmesgpt#586
File: tests/llm/fixtures/test_ask_holmes/03_what_is_the_command_to_port_forward/test_case.yaml:4-4
Timestamp: 2025-07-02T10:27:17.231Z
Learning: In LLM-as-judge test cases for HolmesGPT, expected outputs should be descriptive rather than prescriptive when testing for flexible responses like port numbers. Using specific values in expected outputs can cause unnecessary test failures when the AI generates different but equally valid responses.
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
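The descriptive-vs-prescriptive distinction from the first learning can be sketched in a test_case.yaml fragment (hypothetical content, not taken from the repo):

```yaml
# Prescriptive – brittle: fails whenever the model picks another valid port.
# expected_output: "Run: kubectl port-forward svc/frontend 8080:80"

# Descriptive – judged on substance, tolerant of equally valid answers.
expected_output: |
  The answer gives a kubectl port-forward command targeting the frontend
  service; any valid local port is acceptable.
```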
tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_new_tools/test_case.yaml (1)
tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_all_tools/test_case.yaml (1)
tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml (1)
tests/llm/fixtures/test_ask_holmes/42_dns_issues_result_old_tools/test_case.yaml (1)
tests/llm/utils/constants.py (1)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: build (3.12)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
- GitHub Check: build (3.12)
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
- GitHub Check: llm_evals
🔇 Additional comments (6)
tests/llm/fixtures/test_ask_holmes/25_misconfigured_ingress_class/test_case.yaml (1)
2-4: Tags field looks correct and consistent with new taxonomy
YAML syntax, indentation, and values all comply with the updated `ALLOWED_EVAL_TAGS`. No further action required.

tests/llm/fixtures/test_ask_holmes/46_job_crashing_no_longer_exists/test_case.yaml (1)
2-3: Tag addition is valid
The new `logs` tag matches the updated constants; formatting is fine.

tests/llm/fixtures/test_ask_holmes/44_slack_statefulset_logs/test_case.yaml (1)
2-4: Tagging change accepted
Both `slackbot` and `misleading-history` are recognized tags, and YAML formatting is sound.

tests/llm/fixtures/test_ask_holmes/21_job_fail_curl_no_svc_account/test_case.yaml (1)
2-4: Tags correctly added
`misleading-history` and `k8s-misconfig` are now part of the allowed set; indentation is consistent.

tests/llm/fixtures/test_ask_holmes/22_high_latency_dbi_down/test_case.yaml (1)
4-5: Tag addition acknowledged
The `chain-of-causation` tag is valid; YAML is well-formed.

tests/llm/fixtures/test_ask_holmes/08_sock_shop_frontend/test_case.yaml (1)
2-3: Tag already covered
`logs` already exists in `ALLOWED_EVAL_TAGS`; no issues here. ✅
9d64fd4 to f043d18
No description provided.