
Distinguish zero concurrency from slow/failed scraping when bucketing #8610

@julz

Description


Describe the feature

Currently we do not differentiate between a scrape that actually reports zero concurrency from a replica and a bucket that simply has no data. This is fine when the network is fast and the autoscaler is not overloaded, because we will have data roughly every second. On a slow or overloaded network, however (or, for example, on a resource-constrained host where the queue-proxy (QP) responds slowly to scrapes), it could cause issues. When we average over the buckets, the empty ones pull the average down, so we could think we have lower load than we actually do and incorrectly scale down (or fail to scale up) replicas.

(This is somewhat related to #8377: if we introduce a work pool, there is a greater danger that scrapes backed up in the queue will not produce stats every second.)
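A minimal sketch of the distinction being requested, assuming a hypothetical `bucket` type with a `hasData` flag (this is not Knative's actual aggregation API). Averaging over every bucket counts a missing scrape as zero load. Averaging only over buckets that received a scrape still counts a genuine zero report, but ignores empty buckets.

```go
package main

import "fmt"

// bucket is a hypothetical per-second aggregation bucket.
// hasData distinguishes "a scrape reported 0" from "no scrape landed here".
type bucket struct {
	concurrency float64
	hasData     bool
}

// naiveAverage divides by the total number of buckets, so an empty
// bucket behaves exactly like a zero-concurrency report.
func naiveAverage(buckets []bucket) float64 {
	if len(buckets) == 0 {
		return 0
	}
	sum := 0.0
	for _, b := range buckets {
		sum += b.concurrency
	}
	return sum / float64(len(buckets))
}

// observedAverage averages only the buckets that actually received a scrape.
// It returns ok=false when the window contains no data at all, so the caller
// can choose what to do instead of assuming zero load.
func observedAverage(buckets []bucket) (avg float64, ok bool) {
	sum, n := 0.0, 0
	for _, b := range buckets {
		if b.hasData {
			sum += b.concurrency
			n++
		}
	}
	if n == 0 {
		return 0, false
	}
	return sum / float64(n), true
}

func main() {
	// Six 1s buckets; slow scrapes only landed in three of them.
	// The third bucket holds a genuine zero-concurrency report.
	window := []bucket{
		{concurrency: 10, hasData: true},
		{},
		{concurrency: 0, hasData: true},
		{},
		{concurrency: 8, hasData: true},
		{},
	}
	fmt.Println(naiveAverage(window)) // 18 / 6 buckets: prints 3
	avg, ok := observedAverage(window)
	fmt.Println(avg, ok) // 18 / 3 observed buckets: prints 6 true
}
```

In this example the naive average reports half the load that was actually observed, which is the under-scaling risk described above. Returning `ok=false` for a window with no data, rather than 0, leaves the fallback (for instance, holding the current scale) to the caller; which fallback the autoscaler should use is a design choice this issue leaves open.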

Metadata

Labels

area/autoscale, help wanted, kind/feature, triage/accepted

