Add HPA diagnosis insights #916
Open
nadaverell wants to merge 1 commit into
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 255cb83.
Summary
HPAs can fail quietly: the target workload may still have healthy-looking pods while autoscaling is capped, unable to read metrics, pinned by configuration, or paused at zero replicas. This PR makes HPA diagnosis a first-class Radar insight so operators can understand autoscaling state directly from Radar instead of reconstructing it from raw HPA conditions.
The goal is not to turn every HPA condition into an alert. The branch separates high-signal list/dashboard states from richer drawer context: broken/capped autoscaling is surfaced prominently, while states like min-bound, stale status, partial metrics, and stabilization remain available in detail views without adding table noise.
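The split between list/dashboard signals and drawer-only context can be pictured as a single predicate over the diagnosis state. This is a minimal sketch, not the Radar implementation: the `State` type and `SurfaceInList` function are hypothetical names, the state strings are the ones named in this PR, and any state this description does not classify (for example `disabled` or `pinned`) is assumed to default to detail-only.

```go
package main

import "fmt"

// State is a hypothetical Go type for the normalized HPA state strings
// named in this PR.
type State string

const (
	LimitedMax         State = "limited_max"
	MetricsUnavailable State = "metrics_unavailable"
	UnableToScale      State = "unable_to_scale"
	MetricsIncomplete  State = "metrics_incomplete"
	LimitedMin         State = "limited_min"
	Stale              State = "stale"
	Stabilized         State = "stabilized"
	OK                 State = "ok"
)

// SurfaceInList reports whether a state belongs in list and dashboard views.
// Broken or capped autoscaling surfaces; every other state, including ones
// not listed here, is assumed to stay detail-only (drawer) context.
func SurfaceInList(s State) bool {
	switch s {
	case LimitedMax, MetricsUnavailable, UnableToScale:
		return true
	default:
		return false
	}
}

func main() {
	for _, s := range []State{LimitedMax, UnableToScale, LimitedMin, Stale, OK} {
		// Print each state and whether it would appear in list views.
		fmt.Printf("%s surfaced=%t\n", s, SurfaceInList(s))
	}
}
```

Keeping the policy in one allowlist makes the conservative default explicit: a new state stays out of tables until someone deliberately promotes it.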
What Changed
Shared HPA Diagnosis Engine
- `pkg/hpadiag`, a shared Go analyzer for autoscaling/v2 HPAs.
- `Diagnosis` with:
  - `state`: normalized HPA state such as `limited_max`, `metrics_unavailable`, `unable_to_scale`, `disabled`, `pinned`, `stale`, `stabilized`, `scaling_up`, and `ok`.
  - `summary`: operator-facing text intended for Radar UI / AI context.
  - `target`: scale target reference.
  - `bounds`: min/max/current/desired replica data plus generation info.
  - `metrics`: normalized configured/current metric rows.
  - `reasons`: raw condition-backed evidence, preserving Kubernetes condition type/reason/message where available.
- `testdata/hpa-diagnosis/cases.json` covering maxed, metrics unavailable, partial metrics missing, unable to scale, disabled, pinned, scaling, stale, min-limited, stabilized, stable, and “at max without controller limit condition.”
Signal Policy
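The condition rules in this section can be sketched as a small classifier. This is a minimal illustration, not the `pkg/hpadiag` code: `Condition` and `Classify` are hypothetical, only a subset of the states is covered, and the precedence between rules (scale failure, then metrics, then capping) is an assumption. The condition types and reasons used (`AbleToScale`, `ScalingActive`, `ScalingLimited`, `TooManyReplicas`, `ScalingDisabled`, `DesiredWithinRange`) are ones the Kubernetes HPA controller reports.

```go
package main

import "fmt"

// Condition is a trimmed, hypothetical stand-in for an autoscaling/v2
// HorizontalPodAutoscalerCondition (the real type also carries Message
// and LastTransitionTime).
type Condition struct {
	Type   string // "AbleToScale", "ScalingActive", or "ScalingLimited"
	Status string // "True", "False", or "Unknown"
	Reason string
}

// findCondition returns the first condition of the given type, if any.
func findCondition(conds []Condition, condType string) (Condition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return Condition{}, false
}

// Classify maps HPA conditions onto a subset of the normalized states.
// Replica counts are deliberately not consulted: current == desired ==
// maxReplicas alone never yields limited_max. States such as pinned, stale,
// stabilized, and scaling_up are out of scope and fall through to "ok".
func Classify(conds []Condition) string {
	if c, ok := findCondition(conds, "AbleToScale"); ok && c.Status == "False" {
		return "unable_to_scale"
	}
	if c, ok := findCondition(conds, "ScalingActive"); ok && c.Status == "False" {
		// ScalingDisabled is the intentional zero-replica case, not a metrics failure.
		if c.Reason == "ScalingDisabled" {
			return "disabled"
		}
		return "metrics_unavailable"
	}
	if c, ok := findCondition(conds, "ScalingLimited"); ok && c.Status == "True" && c.Reason == "TooManyReplicas" {
		return "limited_max"
	}
	return "ok"
}

func main() {
	// Sitting at max, but the controller does not report being capped.
	atMaxNoEvidence := []Condition{
		{Type: "AbleToScale", Status: "True", Reason: "ReadyForNewScale"},
		{Type: "ScalingActive", Status: "True", Reason: "ValidMetricFound"},
		{Type: "ScalingLimited", Status: "False", Reason: "DesiredWithinRange"},
	}
	// The controller wanted more replicas and was capped by maxReplicas.
	capped := []Condition{
		{Type: "ScalingActive", Status: "True", Reason: "ValidMetricFound"},
		{Type: "ScalingLimited", Status: "True", Reason: "TooManyReplicas"},
	}
	fmt.Println(Classify(atMaxNoEvidence)) // ok
	fmt.Println(Classify(capped))          // limited_max
}
```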
- Maxed requires `ScalingLimited=True`/`TooManyReplicas` evidence.
- `current == desired == maxReplicas` is treated as normal unless the controller says it wanted more replicas and was capped.
- `ScalingActive=False` is classified as `metrics_unavailable` unless it is the intentional zero-replica `ScalingDisabled` case.
- `AbleToScale=False` is classified separately as `unable_to_scale`.
Backend Surfaces
- `hpaDiagnosis` for HPA resources.
Frontend / UX
Shared UI / Types
- `@skyhook-io/k8s-ui`.
- `resource-utils-hpa` for table-state classification, label/tone mapping, and status badge generation.
Reviewer Focus
- `pkg/hpadiag`: whether each condition/state maps to the right Radar severity and surface.
- `metrics_incomplete`, `limited_min`, `stale`, or `stabilized` into table warnings.
- `hpaDiagnosis` to resource detail responses is the right shape for Radar app consumers.
Testing
- `go test github.com/skyhook-io/radar/pkg/hpadiag github.com/skyhook-io/radar/internal/k8s github.com/skyhook-io/radar/pkg/ai/context github.com/skyhook-io/radar/pkg/resourcecontext github.com/skyhook-io/radar/internal/server`
- `npm test --workspace @skyhook-io/k8s-ui -- resource-utils-hpa.test.ts WorkloadRenderer.test.tsx ResourceRendererDispatch.test.tsx`
- `make tsc`
- `make test`
- Visual test: ran against `kind-radar-gitops-demo` with live HPA fixtures covering the HPA list, maxed drawer, metrics-unavailable drawer, and workload HPA context.
Notes / Tradeoffs
Note
Medium Risk
Touches problem detection and resource API shape for autoscaling; behavior changes (stricter “maxed”) could alter dashboard issue counts, but logic is fixture-backed and well-tested.
Overview
Introduces `pkg/hpadiag`, a shared analyzer that turns HPA spec/status/conditions into a structured diagnosis (state, summary, bounds, metrics, reasons). “Maxed” now requires controller evidence (ScalingLimited/TooManyReplicas); sitting at max replicas without that condition is no longer flagged. Metrics and scale failures map to `cannot-scale` issues; min-bound, stale, stabilization, and similar states stay detail-only.
Backend: Resource GET responses and topology wrappers add optional `hpaDiagnosis`; resource context and AI summaries use the same analyzer. `DetectHPAProblems` delegates to `hpadiag` instead of inline condition parsing.
Frontend: HPA drawers show a Diagnosis section (badge, reasons, metrics); list/table status uses conservative `resource-utils-hpa` classification. Workloads controlled by an HPA fetch compact scaler diagnosis; `ConditionsSection` supports warning tones for max-limited conditions.
Reviewed by Cursor Bugbot for commit 32c9045. Bugbot is set up for automated code reviews on this repo.