Skip to content

feat(outbound): refactored unified circuit breaker with success-rate and wired up load-biaser#4561

Merged
unleashed merged 29 commits into
mainfrom
amr/refactored-unified-breaker
Jun 11, 2026
Merged

feat(outbound): refactored unified circuit breaker with success-rate and wired up load-biaser#4561
unleashed merged 29 commits into
mainfrom
amr/refactored-unified-breaker

Conversation

@unleashed

Copy link
Copy Markdown
Member

This supersedes #4546.

Here we adapt to linkerd2-proxy-api 0.20.0 (but see below for notes), with a unified breaker (opt-in) that uses both consecutive-failures and success-rate policies, supporting pushback hints. This PR also wires the load-biaser (opt-in).

All features are opt-in, with the explicit goal of leaving default behavior as-is.

[additional details on the PR TBD here -- for the time being the note below is the other most important change]

Note

A relevant note on a design direction question: the success-rate policy in this branch has been switched from using an internal EWMA (similar to Finagle) to a time-based fixed bucket sliding window (similar to Envoy, gRPC and others). This has been done because the EWMA has some edge cases, particularly on low traffic volume, that are cleanly solved by the sliding window due to being rate-independent:

Quoted from Finagle:

Returns a policy based on an exponentially-weighted moving average success rate over a window of requests. A moving average is used so the success rate calculation is biased towards more recent requests; for an endpoint with low traffic, the window will span a longer time period and early successes and failures may not accurately reflect the current health.

The result is that at low traffic with the exponentially moving average a new sample can skew the rate very quickly and trip due to having a large weight. The rationale for a switch is that we are probably going to hit the low traffic case in many cases, potentially confusing users, while the high traffic cases remain covered -- and additionally, instinctively "45 out of 100 failed in the last 10s" seems easier to understand.

This, however, means that the messages in linkerd2-proxy-api, while workable, are (ab)used to signal different things (ie. decay would now be a window length), and while we don't need technically to change the API, it'd be good to do so (we can tune the annotation surface and still use the same 0.20.0 shape though, but that is confusing).

Rolling back to use an EWMA is still an option, though, so if you'd prefer that make it clear in the comments.

/// configured `decay` for any value at or above this floor. Flooring the bucket
/// width alone would instead round a decay between this floor and one bucket up
/// to this span without the caller asking for it.
const MIN_DECAY: Duration =

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throughout this PR we use the name "decay" for what I think is the "window". I understand we have "decay" in the proxy API because we had planned to use EWMA here originally. But I think we should use the more accurate name "window" throughout the proxy code and just include a note where we translate the "decay" parameter in the proxy API and use it for the "window".

If time permits, we can go back and rename this field in the proxy API.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this this has been fixed via the merging of #4565 and the note's been added in fbd850a.

/// policy types to these before it builds the breaker, so the state machine never
/// depends on how the configuration is represented.
#[derive(Debug)]
pub(crate) struct UnifiedBreakerConfig {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this struct identical to UnifiedBreaker, besides having public fields?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this needs cleaning up.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed UnifiedBreakerConfig in b3adddb, and the breaker now takes its parameters directly like the consecutive-failures breaker.

Comment on lines +477 to +480
/// `Err` means the gate was lost. The probe may give no class when a
/// `ResponseFuture` is cancelled before its head resolves, or when a
/// classification send drops on a full channel. The deadline then keeps the
/// loop moving. A probe that is never routed cannot block the breaker or hold

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this need for a deadline or protection against an unrouted probe. The existing consecutive failures breaker has no such protections. Are we sure this is necessary?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If a probe is sent we have a few scenarios in which the recv call is left pending forever, with the task parked and the gate left shut.

For example, a response to the probe may be received, but the classification is dropped because the channel is full, or no response is ever received, or the future is cancelled before the response head is resolved. In those cases we unpark the task via the deadline and try again with the next backoff.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation does not have this protection: https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/outbound/src/http/breaker/consecutive_failures.rs#L98-L105

Are you saying that there's a bug in that code? As far as I know this has never manifested.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there's a bug in that code, but it's not like it's permanent or growing in severity over time. If for whatever reason in the readiness of the endpoint changes, or when the destination controller pushes an updated set of addresses, it resets.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed keeping this consistent with the behavior of consecutive failures and fixing this bug in both breakers as a separate change.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially removed via #4565 and then via aec5108, 1f659ac, a6e552b, f4847af, 43dd9ba, 57a13af, and 1eddfb0.

There's yet another bug reintroduced when merging #4565, which is the removal of the channel draining when probing: while self.rsps.try_recv().is_ok() {}. I assume this is made to keep parity with the existing breaker, so will defer to a separate fix.

@@ -0,0 +1,1167 @@
//! Retry-After and grpc-retry-pushback-ms parsing for rate limiting responses.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These RetryAfterStore and GrpcRetryPushbackStore introduce a lot of complexity that's challenging to reason about. There's a fundamental mismatch here where circuit breaking is a property of the endpoint but the retry-after hints are a property of individual responses.  imagine a scenario where each responsive gave us a different retry-after value and many different responses contributed towards tripping the breaker.

what you have here is some kind of aggregate measure where we maintain a retry-after value for the endpoint which is influenced by each retry-after response.  this introduces complexity and coupling.
I'd recommend either:

  • taking retry-after out of project scope here.  using retry-after to inform probation is potentially a good hint at how overloaded a server is, but using it in this way isn't what the spec intended and there's a fundamental mismatch between endpoint and response.  plus, our jittered exponential backoff is probably going to give us behavior that's good enough, even if the server happens to be signaling a longer retry-after.
  • or, we can make a good faith attempt to incorporate the retry-after hint by only considering the current response (i.e. the one that tripped the breaker, or the probe response).  In this case, we can just put the hint on the Class and remove the stores.  It's not perfect, but it doesn't need to be: we're just treating retry-after as an advisory hint.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think retry hints are a property of individual responses, and the spec for Retry-After makes is quite generic and brief, referencing a follow-up request and scoped to the service. It's common however to see these headers used for a variety of different meanings, with some rate limiters using them to signal different things (like gating resource consumption rates, instead of scoping to the service). This is why there's a draft for rate limiting headers I worked on to provide a scope (and other details).

I think removing the support for hints detracts from the value proposition. If we see the headers in the outbound stack chances are high that they'll refer to the endpoint, and keeping the last-seen hint would be useful in cases where the last response does not provide it, which in an ideal world would not happen, but it can happen and you can safely assume the endpoint is probably as overloaded or more as it was when we last read the hint. Ultimately the motivation is to help avoid the extra load on the endpoint caused by quick probes sent to it that will be unlikely to succeed (the early backoff ones), using the most recent information we got with the deadline we got (so a 30s R-A doesn't necessarily delay probes for 30s, but until the time the R-A asked for if we tripped later), capping the value so that we still keep probing in a reasonable timeframe.

I accept the complexity argument, and "good enough" being a thing. I think we can do better, though, and some others do so (admittedly, their breaker architecture is different and they have specific separate retry layers, but we have to start somewhere).

With that said, I don't think we need a non-core feature now, and can rethink or refactor this, or one of your proposals, later on.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed keeping this out of scope for this initial work.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed via #4565.

//! extends only that endpoint's backoff and one endpoint's failures never gate
//! another.
//!
//! When failure accrual is set up for the target and the policy respects server

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with my above suggestion about retry-after, this file would no longer be needed.

@unleashed unleashed Jun 10, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's the case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed via #4565.

}
}

impl<T> svc::Param<linkerd_proxy_client_policy::PenaltyPeakEwma> for Balance<T> {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not as familiar with how stacks are put together and I don't think we should get into it here so this is not a blocker for this PR. But I do want to raise it as a question for my own education and potentially a note for future cleanup:

Could Balance be parameterized on the specific Load type (Balance<T, L>) so that we can impl only the correct Param types for each Load measure?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, with a bit of refactoring.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred.

classify::Class::Http(Err(_)) => false, // 5xx failure.
// RESOURCE_EXHAUSTED is handled above, so any other accepted Ok code
// is a recovered probe.
classify::Class::Grpc(Ok(_)) => true,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only Grpc(Ok(0)) should be treated as success here. All non-zero gRPC status codes should be treated as failures.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have Grpc(Ok(NOT_FOUND)), and others, that shouldn't fail a probe, and note that class.is_success() returns true already for Grpc(Ok(_)).

I think I can remove a few of these entries and delegate to class.is_success() directly, though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned this up in eefc2bb -- there's arguably a bug I won't fix here (deferred to a separate PR) to keep parity with CF: gRPC's PERMISSION_DENIED is classified as an error and the breaker will count it as such, but it should be exempted from contributing towards a trip.

Base automatically changed from amr/load-biaser to main June 11, 2026 08:54
Comment thread linkerd/load-biaser/src/lib.rs Outdated
Comment on lines +535 to +536
resp.attach_parsed_rate_limit_hint();
let penalty = handle.shared.effective_rtt(hint, &resp);
tracing::debug!(
rtt_secs = penalty.as_secs_f64(),
?hint,
"Failure recorded as effective RTT"
);
handle.penalty = Some(penalty);
drop(handle);
None
} else {
Some(handle)
};

// Successful responses keep the handle until completion.
// Failed responses pass `None` since their penalty was recorded
// above.
let value = handle.shared.effective_rtt(hint, &resp);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we cache the hint and then immediately read it. There isn't anywhere else we read this cached hint. So I think we don't need this caching at all and can simply parse the hint and use it immediately.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred to another PR.

@unleashed unleashed force-pushed the amr/refactored-unified-breaker branch from b3adddb to 8544696 Compare June 11, 2026 17:49
unleashed and others added 19 commits June 11, 2026 20:03
The `ExponentialBackoff` strategy keeps its minimum and maximum
durations in private fields, so callers that hold a backoff have no way
to read the window it covers. Add public accessors that return those
two durations. They let a caller clamp an externally supplied delay to
the configured backoff window.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The broadcast classification channel reported every dropped send the
same way, which conflated two unrelated conditions. When no breaker
consumes classifications, the receiver is dropped and the channel
closes for good. On the default path that is the steady state and not a
fault, so logging it only adds noise. A full channel differs. A
consumer exists yet does not drain fast enough, and that backpressure
is worth surfacing.

Route every send site through a single helper that inspects the
try-send error. A closed channel now emits a quiet trace line noting
there is no consumer, while a full channel emits a debug line flagging
the backpressure. Telling the two apart keeps the common no-consumer
case silent without hiding the one operators care about.

Replace the derived Debug on the channel state with a hand-written impl
so the formatting no longer demands that the class type itself be
Debug, which it need not be at every use site.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Introduce a SuccessRateWindow primitive the unified circuit breaker will
consult to tell whether the recent fraction of healthy responses has
fallen below a threshold. It tracks the ratio with a ring of ten
fixed-duration buckets that span a configurable decay. Recording a
response advances the ring to now, zeroes any bucket aged out of the
window, then tallies the sample into the live bucket. A check sums the
live buckets and trips when the window holds a minimum count of requests
and the ratio is below the threshold. An idle gap past the window clears
every bucket so a quiet endpoint starts cold.

The breaker needs a rate-independent measure here. A decaying moving
average weights each sample by how long since the previous one, so a
burst of failures in a row gives each sample almost no weight and can
hide a complete outage, while the same failures spread out trip sooner.
Exact counts over a fixed window have no such blind spot. A given
fraction of failures reaches the same decision regardless of arrival
rate, the property an operator expects from a success-rate threshold.

The module takes plain parameters rather than a configuration type, so
it stays independent of how the breaker is configured. It is not yet
wired into the breaker. A later change connects it.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
A server under load can tell a client how long to wait before trying
again, through an HTTP Retry-After header or a gRPC
grpc-retry-pushback-ms trailer on a RESOURCE_EXHAUSTED response. The
circuit breaker should honor that signal rather than relying only on its
own escalating backoff, so a backend that asks for a longer pause gets
one and a backend that recovers fast is not punished beyond what it
requested.

This adds the per-endpoint plumbing that captures those hints. A
duration hint store keeps the latest hint, last value wins, stamped with
the instant it was recorded so an old value can be detected and dropped
rather than replayed into a later backoff cycle. Keeping the freshest
hint lets a recovering server that lowers its pushback be honored at the
lower value. Taking a hint subtracts the time already spent waiting, so
the breaker sees only the remaining delay, and an exhausted or overshot
hint clears the slot.

HTTP and gRPC hints live in separate stores kept strictly by source, and
gRPC is parsed only on a genuine gRPC response, gated on a 200 OK so a
429 with a spurious grpc-status cannot leak across. A free function
drains both stores and clamps each hint into the probe backoff's minimum
and maximum before taking the larger, so one response can neither
shorten the base backoff nor push past the ceiling the breaker escalates
toward. The classifiers wrap an inner one and record hints as a side
effect, leaving every decision unchanged. Nothing consumes the stores
yet, and the core gRPC status accessor becomes public for that next
step.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Introduce a circuit breaker that watches two failure signals at once. A
consecutive-failure count reacts the instant an endpoint hard-fails but
stays silent under a partial outage, where requests fail steadily yet
never string together a long enough run to trip. A windowed success
ratio catches that partial degradation, though it lags on a total
outage since it must first gather its minimum sample. Running both in
one breaker pairs the fast reaction of the count with the partial
coverage of the ratio, so either kind of failure opens the circuit.

The ratio comes from a time-windowed counter rather than a time-decayed
average, so the trip decision tracks the failure fraction and not the
request rate. A tight burst and the same burst spread out reach the
same verdict. The ratio dimension is protected at first start and
cannot trip until enough samples sit in its window, and a threshold of
zero disables it. The engine takes only primitive parameters and the
threshold is a plain fraction the caller supplies, so it never sees how
the policy is represented at the configuration layer.

The state machine has three states. Open accepts traffic while tracking
both signals. Shut rejects traffic and runs the backoff, re-reading the
combined Retry-After and gRPC hint each iteration so a fresh, longer
hint from a later probe raises the floor for the waits that follow.
Probation admits one probe, bounded by the backoff ceiling rather than
the window just waited. The probe is strict. An HTTP probe must be
non-5xx and non-429, and a gRPC probe passes for any class other than
RESOURCE_EXHAUSTED, since a 429 there means the endpoint is still rate
limiting.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Add the per-endpoint gating layer that connects each balancer
endpoint's response classification stream to its own breaker. The gate
set reads the target's failure accrual policy and, for each endpoint,
builds a gate whose readiness a breaker task controls and a classifier
that broadcasts each classification over that breaker's channel.

Isolation is the point. The stores that hold Retry-After and gRPC
pushback hints, and the breaker task that reads them, are built per
endpoint, so a hint observed on one endpoint extends only that
endpoint's backoff and a run of failures on one never gates another.
When the policy respects server hints, the classifier is wrapped so the
inner broadcaster records those hints into the endpoint's stores. A
policy that leaves the hint flag off never reads the stores and skips
that wrapper.

A policy that can never trip costs the same as none at all, so a
missing or inert policy resolves to a no-op path: no breaker task and
no hint stores get allocated, and the gate simply never shuts. Hints
are clamped to the chosen policy's maximum backoff, the same ceiling
the breaker applies, so an oversized header is never held in a store
until the breaker would discard it.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Failure accrual was a closed enum of None or ConsecutiveFailures that
every HTTP and gRPC protocol config held by value. That left no room for
the new success-rate breaker and forced the disabled state to be its own
variant. Model it as an opt-in oneof instead. Each protocol config now
holds an Option, where None means accrual is off, so the disabled case
lives at the config level. The enum becomes Consecutive or Unified. The
consecutive policy counts failures in a row, while the unified policy
adds a windowed success-rate threshold on top of a consecutive-failure
ceiling. Both variants also hold a Retry-After preference.

Store the success-rate threshold as basis points in a small newtype
rather than a float. A float is neither Eq nor Hash, which a backend's
cache identity requires, while an integer in the range zero to ten
thousand is both and stays finer than any meaningful success-rate
target. This keeps the protocol configs on their plain derived equality
and hashing. The proto conversion validates the incoming fraction for
range and NaN before quantizing.

The proto conversion dispatches on the oneof kind, reads the Retry-After
preference off the backoff message, and enforces ceilings on the
cold-start request floor and the success-rate window. That window floor
binds only under a nonzero threshold. A zero threshold disables the
success-rate dimension, so any decay is accepted and the
consecutive-failure ceiling stands alone. The outbound breaker wiring
then maps an absent policy to a gate that never closes, a consecutive
policy to the consecutive breaker, and a unified policy to the
success-rate engine. The per-endpoint gate set threads each endpoint's
Retry-After stores to its own breaker, so a hint on one endpoint never
extends another's.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Move the backoff-wait loop, which optionally floors the wait on a
server Retry-After or gRPC pushback hint, out of the unified breaker
and into the retry-after module beside the hint stores. The unified
breaker calls the shared helper; the consecutive-failures breaker
keeps its own hint-free exponential backoff, so only the unified
policy ever floors a wait on a server hint.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The balancer load model now describes a second estimator. Alongside
plain peak-EWMA, a backend may select a variant that penalizes endpoints
that signal rate limiting, biasing P2C toward healthier peers. Extend
the client-policy Load enum with a PenaltyPeakEwma variant and hold the
strategy's fields verbatim from the control-plane policy: the RTT seed
and decay window, the penalty magnitude and its own decay, and the cap
on how far a Retry-After hint may extend that penalty. The policy layer
only records them as part of the backend's identity. The balancer maps
them onto its estimator later.

Every field is a duration, so the new struct and the widened enum keep
deriving Eq and Hash and remain part of a backend's cache key, matching
the existing peak-EWMA strategy. The proto conversion gains the matching
branch. Since each field is optional on the wire, an absent value takes
a documented default while a value that is present but invalid still
surfaces an error rather than being silently discarded. A small helper
expresses that decoding next to the existing required-field decoder.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Pin the failure-accrual proto conversion across both branches of the
discovery oneof and the rules that guard the success-rate path. On the
consecutive branch the failure ceiling comes through, and the
Retry-After preference is read off the backoff message both when the
hint is requested and when it is left unset. A branch with no backoff is
treated as missing, and an accrual with no kind at all is rejected too.

On the unified branch the wire fraction lands in the basis-points
threshold rather than a raw float, so the threshold can stay part of a
backend's cache identity. The measurement window defaults to ten seconds
when absent. A zero threshold disables the success-rate dimension, so a
sub-floor window is accepted there and the consecutive-failure ceiling
stands on its own. Thresholds outside the unit interval, NaN among them,
are rejected at the boundary before any rounding, and a populated
ejection field is ignored since the conversion reads only the discovery
kind.

This boundary is the single place where untrusted control-plane numbers
become typed policy. A regression here would either crash a backend or
admit a breaker configuration that can never trip.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
Give the P2C balancer a second load estimator alongside its Tower
peak-EWMA wrapper. The new one tracks load with the response-aware
biaser, so an endpoint returning rate-limit signals is de-prioritized
for a penalty window. NewPenaltyPeakEwmaBalance builds each endpoint
through NewLoadBiaser and serves only backends whose policy opts into
penalties. A shared helper gives both paths the same queue, metrics, and
endpoint setup, so only the per-endpoint load tracker and pool differ.
Nothing selects the penalty path yet, so the change is additive.

A backend's PenaltyPeakEwma policy maps onto the biaser configuration
field for field, with one exception. The policy has a separate
penalty_decay, yet the biaser records a penalized response as a raised
effective RTT that fades through the same RTT EWMA window. No second
decay remains to drive, so penalty_decay folds into rtt_decay and drops
from the mapping. Both paths also floor the seed RTT at MIN_DEFAULT_RTT,
one millisecond, since the estimate scales with the seed and a near-zero
seed would let P2C over-select a fresh endpoint. RTT is sampled at the
first response data frame.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
The HTTP balance layer now chooses its load estimator from the backend's
policy. NewBalance reads the load policy from the target and dispatches
to svc::Either: the PeakEwma branch builds the existing peak-EWMA
balancer, and the PenaltyPeakEwma branch builds the response-aware
penalty estimator that de-prioritizes endpoints returning rate-limit
signals. The default stays peak-EWMA, so a backend that does not opt in
behaves exactly as before, and RTT is still sampled at the first
response data frame on either path.

Since the two pool service types differ, each branch boxes its response
body so the two unify to a single response type. Boxing does not change
when first-data load completion fires, as the underlying handle still
observes the boxed body when polled and the balancer's response body is
boxed downstream regardless.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
Hold the backend's load policy on the balance target so the HTTP
balancer can pick its endpoint-load estimator per backend. The concrete
balance dispatch now holds the client-policy load oneof in place of a
bare EWMA configuration, and the target gains what the selector reads:
the load itself, an EWMA configuration derived from the named estimator,
and a penalty configuration that defaults to penalty-free for peak-EWMA
so the parameter stays total. A Load::peak_ewma_rtt() helper reads the
decay and seed RTT out of either estimator for the EWMA call sites.

The logical, profile, and policy routers pick the load policy that flows
into the dispatch. Profiles have no penalty configuration, so that path
uses a peak-EWMA default whose RTT settings match the prior balancer,
and a backend opting into nothing behaves as before. Opaque and TLS
routing lack HTTP response classification, so they take only the RTT
settings, and when an operator did set a real penalty the router warns
it ignores that estimator so the drop stays visible. The control-plane
balancer is likewise RTT-only and uses peak-EWMA directly.

The per-backend selector boxes each branch's response body, relaxing its
bounds so the penalty branch may use a distinct, independently boxable
body. That branch samples penalty load at the first response data frame
and wraps the endpoint body in the biaser's completion type, which
differs from the peak-EWMA wrapper. Both bodies box to one response type
regardless.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
Add integration tests that drive the unified circuit breaker through the
dispatch the outbound stack uses to spawn a policy per endpoint, rather
than poking the engine internals directly. The unit tests already pin
the state machine, so these pin the wiring that turns a failure-accrual
policy into a running breaker. Both trip conditions are covered: a run
of consecutive failures opens the circuit at once with no cold-start
guard, and a low windowed success rate opens it when no such run forms.

Recovery goes through bounded probation. A clean probe reopens the
circuit, while a failed or silent probe re-shuts it and advances the
backoff so a still-broken endpoint stays ejected. The two probe verdicts
are covered on both breaker branches, since the consecutive branch keeps
judging a probe by the default classifier while the unified branch
treats a rate-limit signal during probation as a failure.

Server pushback is covered as an opt-in backoff floor clamped to the
ceiling, and isolated per endpoint so one peer's hint cannot stretch
another endpoint's recovery.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
This drops the retry hint support from breaker backoffs.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
unleashed added 10 commits June 11, 2026 20:03
The unified and consecutive-failures breakers no longer read
Retry-After or gRPC pushback hints, so the tests that record a hint
and assert a floored backoff no longer serve their purpose, so
remove them.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The unified failure-accrual policy still had respect_retry_after_hint,
but the breaker stopped reading it when Retry-After support was
removed. Drop it.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The Retry-After and gRPC pushback hint stores are gone, so remove
doc references to them.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
With the unified policy's respect_retry_after_hint field gone, the
tests asserting or setting the flag are no longer useful.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The success-rate window is spec'ed on the proto as decay, a name
kept from an earlier EWMA design. Note the conversion so the rename
to window is explained where the field is being read.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The merge dropping the R-A support machinery for breakers also
removed the breaker's probe deadline to avoid stalled gates,
matching the recovery to the consecutive-failures breaker.

However a couple tests were still relying on the probe timeout,
so remove them and reword comments to match current code.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Removing the probe deadline deleted the timeout that re-shut a silent
probe. This test stayed behind and still asserted that re-shut.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The deadline removal rendered this test's name and comments stale,
so rename it and adapt adjust its comments.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Besides the initial check for a rate-limiting signal, delegate to
the class' is_success method, which covers the same cases we were
manually checking.

There is a latent problem here in which arguably we should check
for gRPC's PERMISSION_DENIED, since it's classified as an error
but it should really not be considered an overloading signal.

While at it also correct module docs and add tests.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The config struct duplicated the breaker fields and fed the
constructor. Take the parameters directly the way the consecutive
failures breaker does, so there is one field list and no extra struct
to maintain.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
@unleashed unleashed force-pushed the amr/refactored-unified-breaker branch from 8544696 to 713e33d Compare June 11, 2026 18:06
@unleashed

Copy link
Copy Markdown
Member Author

Rebased, solved conflicts and removed TODO markers.

@unleashed unleashed merged commit 28007c1 into main Jun 11, 2026
15 of 16 checks passed
@unleashed unleashed deleted the amr/refactored-unified-breaker branch June 11, 2026 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants