feat(outbound): refactored unified circuit breaker with success-rate and wired up load-biaser#4561
Conversation
| /// configured `decay` for any value at or above this floor. Flooring the bucket | ||
| /// width alone would instead round a decay between this floor and one bucket up | ||
| /// to this span without the caller asking for it. | ||
| const MIN_DECAY: Duration = |
There was a problem hiding this comment.
Throughout this PR we use the name "decay" for what I think is the "window". I understand we have "decay" in the proxy API because we had planned to use EWMA here originally. But I think we should use the more accurate name "window" throughout the proxy code and just include a note where we translate the "decay" parameter in the proxy API and use it for the "window".
If time permits, we can go back and rename this field in the proxy API.
| /// policy types to these before it builds the breaker, so the state machine never | ||
| /// depends on how the configuration is represented. | ||
| #[derive(Debug)] | ||
| pub(crate) struct UnifiedBreakerConfig { |
There was a problem hiding this comment.
Isn't this struct identical to UnifiedBreaker, besides having public fields?
There was a problem hiding this comment.
Yes, this needs cleaning up.
There was a problem hiding this comment.
Removed UnifiedBreakerConfig in b3adddb, and the breaker now takes its parameters directly like the consecutive-failures breaker.
| /// `Err` means the gate was lost. The probe may give no class when a | ||
| /// `ResponseFuture` is cancelled before its head resolves, or when a | ||
| /// classification send drops on a full channel. The deadline then keeps the | ||
| /// loop moving. A probe that is never routed cannot block the breaker or hold |
There was a problem hiding this comment.
I don't understand this need for a deadline or protection against an unrouted probe. The existing consecutive failures breaker has no such protections. Are we sure this is necessary?
There was a problem hiding this comment.
Yes. If a probe is sent we have a few scenarios in which the recv call is left pending forever, with the task parked and the gate left shut.
For example, a response to the probe may be received, but the classification is dropped because the channel is full, or no response is ever received, or the future is cancelled before the response head is resolved. In those cases we unpark the task via the deadline and try again with the next backoff.
There was a problem hiding this comment.
The current implementation does not have this protection: https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/outbound/src/http/breaker/consecutive_failures.rs#L98-L105
Are you saying that there's a bug in that code? As far as I know this has never manifested.
There was a problem hiding this comment.
Yes, there's a bug in that code, but it's not like it's permanent or growing in severity over time. If for whatever reason in the readiness of the endpoint changes, or when the destination controller pushes an updated set of addresses, it resets.
There was a problem hiding this comment.
We discussed keeping this consistent with the behavior of consecutive failures and fixing this bug in both breakers as a separate change.
There was a problem hiding this comment.
Partially removed via #4565 and then via aec5108, 1f659ac, a6e552b, f4847af, 43dd9ba, 57a13af, and 1eddfb0.
There's yet another bug reintroduced when merging #4565, which is the removal of the channel draining when probing: while self.rsps.try_recv().is_ok() {}. I assume this is made to keep parity with the existing breaker, so will defer to a separate fix.
| @@ -0,0 +1,1167 @@ | |||
| //! Retry-After and grpc-retry-pushback-ms parsing for rate limiting responses. | |||
There was a problem hiding this comment.
These RetryAfterStore and GrpcRetryPushbackStore introduce a lot of complexity that's challenging to reason about. There's a fundamental mismatch here where circuit breaking is a property of the endpoint but the retry-after hints are a property of individual responses. imagine a scenario where each responsive gave us a different retry-after value and many different responses contributed towards tripping the breaker.
what you have here is some kind of aggregate measure where we maintain a retry-after value for the endpoint which is influenced by each retry-after response. this introduces complexity and coupling.
I'd recommend either:
- taking retry-after out of project scope here. using retry-after to inform probation is potentially a good hint at how overloaded a server is, but using it in this way isn't what the spec intended and there's a fundamental mismatch between endpoint and response. plus, our jittered exponential backoff is probably going to give us behavior that's good enough, even if the server happens to be signaling a longer retry-after.
- or, we can make a good faith attempt to incorporate the retry-after hint by only considering the current response (i.e. the one that tripped the breaker, or the probe response). In this case, we can just put the hint on the Class and remove the stores. It's not perfect, but it doesn't need to be: we're just treating retry-after as an advisory hint.
There was a problem hiding this comment.
I don't think retry hints are a property of individual responses, and the spec for Retry-After makes is quite generic and brief, referencing a follow-up request and scoped to the service. It's common however to see these headers used for a variety of different meanings, with some rate limiters using them to signal different things (like gating resource consumption rates, instead of scoping to the service). This is why there's a draft for rate limiting headers I worked on to provide a scope (and other details).
I think removing the support for hints detracts from the value proposition. If we see the headers in the outbound stack chances are high that they'll refer to the endpoint, and keeping the last-seen hint would be useful in cases where the last response does not provide it, which in an ideal world would not happen, but it can happen and you can safely assume the endpoint is probably as overloaded or more as it was when we last read the hint. Ultimately the motivation is to help avoid the extra load on the endpoint caused by quick probes sent to it that will be unlikely to succeed (the early backoff ones), using the most recent information we got with the deadline we got (so a 30s R-A doesn't necessarily delay probes for 30s, but until the time the R-A asked for if we tripped later), capping the value so that we still keep probing in a reasonable timeframe.
I accept the complexity argument, and "good enough" being a thing. I think we can do better, though, and some others do so (admittedly, their breaker architecture is different and they have specific separate retry layers, but we have to start somewhere).
With that said, I don't think we need a non-core feature now, and can rethink or refactor this, or one of your proposals, later on.
There was a problem hiding this comment.
We discussed keeping this out of scope for this initial work.
| //! extends only that endpoint's backoff and one endpoint's failures never gate | ||
| //! another. | ||
| //! | ||
| //! When failure accrual is set up for the target and the policy respects server |
There was a problem hiding this comment.
I think with my above suggestion about retry-after, this file would no longer be needed.
There was a problem hiding this comment.
Yes, I think that's the case.
| } | ||
| } | ||
|
|
||
| impl<T> svc::Param<linkerd_proxy_client_policy::PenaltyPeakEwma> for Balance<T> { |
There was a problem hiding this comment.
I'm not as familiar with how stacks are put together and I don't think we should get into it here so this is not a blocker for this PR. But I do want to raise it as a question for my own education and potentially a note for future cleanup:
Could Balance be parameterized on the specific Load type (Balance<T, L>) so that we can impl only the correct Param types for each Load measure?
There was a problem hiding this comment.
I think so, with a bit of refactoring.
| classify::Class::Http(Err(_)) => false, // 5xx failure. | ||
| // RESOURCE_EXHAUSTED is handled above, so any other accepted Ok code | ||
| // is a recovered probe. | ||
| classify::Class::Grpc(Ok(_)) => true, |
There was a problem hiding this comment.
I think only Grpc(Ok(0)) should be treated as success here. All non-zero gRPC status codes should be treated as failures.
There was a problem hiding this comment.
You can have Grpc(Ok(NOT_FOUND)), and others, that shouldn't fail a probe, and note that class.is_success() returns true already for Grpc(Ok(_)).
I think I can remove a few of these entries and delegate to class.is_success() directly, though.
There was a problem hiding this comment.
Cleaned this up in eefc2bb -- there's arguably a bug I won't fix here (deferred to a separate PR) to keep parity with CF: gRPC's PERMISSION_DENIED is classified as an error and the breaker will count it as such, but it should be exempted from contributing towards a trip.
| resp.attach_parsed_rate_limit_hint(); | ||
| let penalty = handle.shared.effective_rtt(hint, &resp); | ||
| tracing::debug!( | ||
| rtt_secs = penalty.as_secs_f64(), | ||
| ?hint, | ||
| "Failure recorded as effective RTT" | ||
| ); | ||
| handle.penalty = Some(penalty); | ||
| drop(handle); | ||
| None | ||
| } else { | ||
| Some(handle) | ||
| }; | ||
|
|
||
| // Successful responses keep the handle until completion. | ||
| // Failed responses pass `None` since their penalty was recorded | ||
| // above. | ||
| let value = handle.shared.effective_rtt(hint, &resp); |
There was a problem hiding this comment.
Here we cache the hint and then immediately read it. There isn't anywhere else we read this cached hint. So I think we don't need this caching at all and can simply parse the hint and use it immediately.
There was a problem hiding this comment.
Deferred to another PR.
b3adddb to
8544696
Compare
The `ExponentialBackoff` strategy keeps its minimum and maximum durations in private fields, so callers that hold a backoff have no way to read the window it covers. Add public accessors that return those two durations. They let a caller clamp an externally supplied delay to the configured backoff window. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The broadcast classification channel reported every dropped send the same way, which conflated two unrelated conditions. When no breaker consumes classifications, the receiver is dropped and the channel closes for good. On the default path that is the steady state and not a fault, so logging it only adds noise. A full channel differs. A consumer exists yet does not drain fast enough, and that backpressure is worth surfacing. Route every send site through a single helper that inspects the try-send error. A closed channel now emits a quiet trace line noting there is no consumer, while a full channel emits a debug line flagging the backpressure. Telling the two apart keeps the common no-consumer case silent without hiding the one operators care about. Replace the derived Debug on the channel state with a hand-written impl so the formatting no longer demands that the class type itself be Debug, which it need not be at every use site. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Introduce a SuccessRateWindow primitive the unified circuit breaker will consult to tell whether the recent fraction of healthy responses has fallen below a threshold. It tracks the ratio with a ring of ten fixed-duration buckets that span a configurable decay. Recording a response advances the ring to now, zeroes any bucket aged out of the window, then tallies the sample into the live bucket. A check sums the live buckets and trips when the window holds a minimum count of requests and the ratio is below the threshold. An idle gap past the window clears every bucket so a quiet endpoint starts cold. The breaker needs a rate-independent measure here. A decaying moving average weights each sample by how long since the previous one, so a burst of failures in a row gives each sample almost no weight and can hide a complete outage, while the same failures spread out trip sooner. Exact counts over a fixed window have no such blind spot. A given fraction of failures reaches the same decision regardless of arrival rate, the property an operator expects from a success-rate threshold. The module takes plain parameters rather than a configuration type, so it stays independent of how the breaker is configured. It is not yet wired into the breaker. A later change connects it. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
A server under load can tell a client how long to wait before trying again, through an HTTP Retry-After header or a gRPC grpc-retry-pushback-ms trailer on a RESOURCE_EXHAUSTED response. The circuit breaker should honor that signal rather than relying only on its own escalating backoff, so a backend that asks for a longer pause gets one and a backend that recovers fast is not punished beyond what it requested. This adds the per-endpoint plumbing that captures those hints. A duration hint store keeps the latest hint, last value wins, stamped with the instant it was recorded so an old value can be detected and dropped rather than replayed into a later backoff cycle. Keeping the freshest hint lets a recovering server that lowers its pushback be honored at the lower value. Taking a hint subtracts the time already spent waiting, so the breaker sees only the remaining delay, and an exhausted or overshot hint clears the slot. HTTP and gRPC hints live in separate stores kept strictly by source, and gRPC is parsed only on a genuine gRPC response, gated on a 200 OK so a 429 with a spurious grpc-status cannot leak across. A free function drains both stores and clamps each hint into the probe backoff's minimum and maximum before taking the larger, so one response can neither shorten the base backoff nor push past the ceiling the breaker escalates toward. The classifiers wrap an inner one and record hints as a side effect, leaving every decision unchanged. Nothing consumes the stores yet, and the core gRPC status accessor becomes public for that next step. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Introduce a circuit breaker that watches two failure signals at once. A consecutive-failure count reacts the instant an endpoint hard-fails but stays silent under a partial outage, where requests fail steadily yet never string together a long enough run to trip. A windowed success ratio catches that partial degradation, though it lags on a total outage since it must first gather its minimum sample. Running both in one breaker pairs the fast reaction of the count with the partial coverage of the ratio, so either kind of failure opens the circuit. The ratio comes from a time-windowed counter rather than a time-decayed average, so the trip decision tracks the failure fraction and not the request rate. A tight burst and the same burst spread out reach the same verdict. The ratio dimension is protected at first start and cannot trip until enough samples sit in its window, and a threshold of zero disables it. The engine takes only primitive parameters and the threshold is a plain fraction the caller supplies, so it never sees how the policy is represented at the configuration layer. The state machine has three states. Open accepts traffic while tracking both signals. Shut rejects traffic and runs the backoff, re-reading the combined Retry-After and gRPC hint each iteration so a fresh, longer hint from a later probe raises the floor for the waits that follow. Probation admits one probe, bounded by the backoff ceiling rather than the window just waited. The probe is strict. An HTTP probe must be non-5xx and non-429, and a gRPC probe passes for any class other than RESOURCE_EXHAUSTED, since a 429 there means the endpoint is still rate limiting. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Add the per-endpoint gating layer that connects each balancer endpoint's response classification stream to its own breaker. The gate set reads the target's failure accrual policy and, for each endpoint, builds a gate whose readiness a breaker task controls and a classifier that broadcasts each classification over that breaker's channel. Isolation is the point. The stores that hold Retry-After and gRPC pushback hints, and the breaker task that reads them, are built per endpoint, so a hint observed on one endpoint extends only that endpoint's backoff and a run of failures on one never gates another. When the policy respects server hints, the classifier is wrapped so the inner broadcaster records those hints into the endpoint's stores. A policy that leaves the hint flag off never reads the stores and skips that wrapper. A policy that can never trip costs the same as none at all, so a missing or inert policy resolves to a no-op path: no breaker task and no hint stores get allocated, and the gate simply never shuts. Hints are clamped to the chosen policy's maximum backoff, the same ceiling the breaker applies, so an oversized header is never held in a store until the breaker would discard it. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Failure accrual was a closed enum of None or ConsecutiveFailures that every HTTP and gRPC protocol config held by value. That left no room for the new success-rate breaker and forced the disabled state to be its own variant. Model it as an opt-in oneof instead. Each protocol config now holds an Option, where None means accrual is off, so the disabled case lives at the config level. The enum becomes Consecutive or Unified. The consecutive policy counts failures in a row, while the unified policy adds a windowed success-rate threshold on top of a consecutive-failure ceiling. Both variants also hold a Retry-After preference. Store the success-rate threshold as basis points in a small newtype rather than a float. A float is neither Eq nor Hash, which a backend's cache identity requires, while an integer in the range zero to ten thousand is both and stays finer than any meaningful success-rate target. This keeps the protocol configs on their plain derived equality and hashing. The proto conversion validates the incoming fraction for range and NaN before quantizing. The proto conversion dispatches on the oneof kind, reads the Retry-After preference off the backoff message, and enforces ceilings on the cold-start request floor and the success-rate window. That window floor binds only under a nonzero threshold. A zero threshold disables the success-rate dimension, so any decay is accepted and the consecutive-failure ceiling stands alone. The outbound breaker wiring then maps an absent policy to a gate that never closes, a consecutive policy to the consecutive breaker, and a unified policy to the success-rate engine. The per-endpoint gate set threads each endpoint's Retry-After stores to its own breaker, so a hint on one endpoint never extends another's. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Move the backoff-wait loop, which optionally floors the wait on a server Retry-After or gRPC pushback hint, out of the unified breaker and into the retry-after module beside the hint stores. The unified breaker calls the shared helper; the consecutive-failures breaker keeps its own hint-free exponential backoff, so only the unified policy ever floors a wait on a server hint. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The balancer load model now describes a second estimator. Alongside plain peak-EWMA, a backend may select a variant that penalizes endpoints that signal rate limiting, biasing P2C toward healthier peers. Extend the client-policy Load enum with a PenaltyPeakEwma variant and hold the strategy's fields verbatim from the control-plane policy: the RTT seed and decay window, the penalty magnitude and its own decay, and the cap on how far a Retry-After hint may extend that penalty. The policy layer only records them as part of the backend's identity. The balancer maps them onto its estimator later. Every field is a duration, so the new struct and the widened enum keep deriving Eq and Hash and remain part of a backend's cache key, matching the existing peak-EWMA strategy. The proto conversion gains the matching branch. Since each field is optional on the wire, an absent value takes a documented default while a value that is present but invalid still surfaces an error rather than being silently discarded. A small helper expresses that decoding next to the existing required-field decoder. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Pin the failure-accrual proto conversion across both branches of the discovery oneof and the rules that guard the success-rate path. On the consecutive branch the failure ceiling comes through, and the Retry-After preference is read off the backoff message both when the hint is requested and when it is left unset. A branch with no backoff is treated as missing, and an accrual with no kind at all is rejected too. On the unified branch the wire fraction lands in the basis-points threshold rather than a raw float, so the threshold can stay part of a backend's cache identity. The measurement window defaults to ten seconds when absent. A zero threshold disables the success-rate dimension, so a sub-floor window is accepted there and the consecutive-failure ceiling stands on its own. Thresholds outside the unit interval, NaN among them, are rejected at the boundary before any rounding, and a populated ejection field is ignored since the conversion reads only the discovery kind. This boundary is the single place where untrusted control-plane numbers become typed policy. A regression here would either crash a backend or admit a breaker configuration that can never trip. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Give the P2C balancer a second load estimator alongside its Tower peak-EWMA wrapper. The new one tracks load with the response-aware biaser, so an endpoint returning rate-limit signals is de-prioritized for a penalty window. NewPenaltyPeakEwmaBalance builds each endpoint through NewLoadBiaser and serves only backends whose policy opts into penalties. A shared helper gives both paths the same queue, metrics, and endpoint setup, so only the per-endpoint load tracker and pool differ. Nothing selects the penalty path yet, so the change is additive. A backend's PenaltyPeakEwma policy maps onto the biaser configuration field for field, with one exception. The policy has a separate penalty_decay, yet the biaser records a penalized response as a raised effective RTT that fades through the same RTT EWMA window. No second decay remains to drive, so penalty_decay folds into rtt_decay and drops from the mapping. Both paths also floor the seed RTT at MIN_DEFAULT_RTT, one millisecond, since the estimate scales with the seed and a near-zero seed would let P2C over-select a fresh endpoint. RTT is sampled at the first response data frame. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The HTTP balance layer now chooses its load estimator from the backend's policy. NewBalance reads the load policy from the target and dispatches to svc::Either: the PeakEwma branch builds the existing peak-EWMA balancer, and the PenaltyPeakEwma branch builds the response-aware penalty estimator that de-prioritizes endpoints returning rate-limit signals. The default stays peak-EWMA, so a backend that does not opt in behaves exactly as before, and RTT is still sampled at the first response data frame on either path. Since the two pool service types differ, each branch boxes its response body so the two unify to a single response type. Boxing does not change when first-data load completion fires, as the underlying handle still observes the boxed body when polled and the balancer's response body is boxed downstream regardless. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
Hold the backend's load policy on the balance target so the HTTP balancer can pick its endpoint-load estimator per backend. The concrete balance dispatch now holds the client-policy load oneof in place of a bare EWMA configuration, and the target gains what the selector reads: the load itself, an EWMA configuration derived from the named estimator, and a penalty configuration that defaults to penalty-free for peak-EWMA so the parameter stays total. A Load::peak_ewma_rtt() helper reads the decay and seed RTT out of either estimator for the EWMA call sites. The logical, profile, and policy routers pick the load policy that flows into the dispatch. Profiles have no penalty configuration, so that path uses a peak-EWMA default whose RTT settings match the prior balancer, and a backend opting into nothing behaves as before. Opaque and TLS routing lack HTTP response classification, so they take only the RTT settings, and when an operator did set a real penalty the router warns it ignores that estimator so the drop stays visible. The control-plane balancer is likewise RTT-only and uses peak-EWMA directly. The per-backend selector boxes each branch's response body, relaxing its bounds so the penalty branch may use a distinct, independently boxable body. That branch samples penalty load at the first response data frame and wraps the endpoint body in the biaser's completion type, which differs from the peak-EWMA wrapper. Both bodies box to one response type regardless. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Add integration tests that drive the unified circuit breaker through the dispatch the outbound stack uses to spawn a policy per endpoint, rather than poking the engine internals directly. The unit tests already pin the state machine, so these pin the wiring that turns a failure-accrual policy into a running breaker. Both trip conditions are covered: a run of consecutive failures opens the circuit at once with no cold-start guard, and a low windowed success rate opens it when no such run forms. Recovery goes through bounded probation. A clean probe reopens the circuit, while a failed or silent probe re-shuts it and advances the backoff so a still-broken endpoint stays ejected. The two probe verdicts are covered on both breaker branches, since the consecutive branch keeps judging a probe by the default classifier while the unified branch treats a rate-limit signal during probation as a failure. Server pushback is covered as an opt-in backoff floor clamped to the ceiling, and isolated per endpoint so one peer's hint cannot stretch another endpoint's recovery. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
This drops the retry hint support from breaker backoffs. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The unified and consecutive-failures breakers no longer read Retry-After or gRPC pushback hints, so the tests that record a hint and assert a floored backoff no longer serve their purpose, so remove them. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The unified failure-accrual policy still had respect_retry_after_hint, but the breaker stopped reading it when Retry-After support was removed. Drop it. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The Retry-After and gRPC pushback hint stores are gone, so remove doc references to them. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
With the unified policy's respect_retry_after_hint field gone, the tests asserting or setting the flag are no longer useful. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The success-rate window is spec'ed on the proto as decay, a name kept from an earlier EWMA design. Note the conversion so the rename to window is explained where the field is being read. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The merge dropping the R-A support machinery for breakers also removed the breaker's probe deadline to avoid stalled gates, matching the recovery to the consecutive-failures breaker. However a couple tests were still relying on the probe timeout, so remove them and reword comments to match current code. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Removing the probe deadline deleted the timeout that re-shut a silent probe. This test stayed behind and still asserted that re-shut. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The deadline removal rendered this test's name and comments stale, so rename it and adapt adjust its comments. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
Besides the initial check for a rate-limiting signal, delegate to the class' is_success method, which covers the same cases we were manually checking. There is a latent problem here in which arguably we should check for gRPC's PERMISSION_DENIED, since it's classified as an error but it should really not be considered an overloading signal. While at it also correct module docs and add tests. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
The config struct duplicated the breaker fields and fed the constructor. Take the parameters directly the way the consecutive failures breaker does, so there is one field list and no extra struct to maintain. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
8544696 to
713e33d
Compare
|
Rebased, solved conflicts and removed TODO markers. |
This supersedes #4546.
Here we adapt to linkerd2-proxy-api 0.20.0 (but see below for notes), with a unified breaker (opt-in) that uses both consecutive-failures and success-rate policies, supporting pushback hints. This PR also wires the load-biaser (opt-in).
All features are opt-in, with the explicit goal of leaving default behavior as-is.
[additional details on the PR TBD here -- for the time being the note below is the other most important change]
Note
A relevant note on a design direction question: the success-rate policy in this branch has been switched from using an internal EWMA (similar to Finagle) to a time-based fixed bucket sliding window (similar to Envoy, gRPC and others). This has been done because the EWMA has some edge cases, particularly on low traffic volume, that are cleanly solved by the sliding window due to being rate-independent:
Quoted from Finagle:
The result is that at low traffic with the exponentially moving average a new sample can skew the rate very quickly and trip due to having a large weight. The rationale for a switch is that we are probably going to hit the low traffic case in many cases, potentially confusing users, while the high traffic cases remain covered -- and additionally, instinctively "45 out of 100 failed in the last 10s" seems easier to understand.
This, however, means that the messages in linkerd2-proxy-api, while workable, are (ab)used to signal different things (ie. decay would now be a window length), and while we don't need technically to change the API, it'd be good to do so (we can tune the annotation surface and still use the same 0.20.0 shape though, but that is confusing).
Rolling back to use an EWMA is still an option, though, so if you'd prefer that make it clear in the comments.