refactored unified breaker with retry after by adleong · Pull Request #4566 · linkerd/linkerd2-proxy

adleong · 2026-06-09T22:32:59Z

This PR implements my feedback from #4561, simplifying retry-after hinting to only look at the most recent failure. In order to accomplish this, we store the retry-after hint on the classification.

Tests have not been updated.

Introduce linkerd-ewma, a general-purpose exponentially-weighted moving average crate. The crate provides five public methods on an Ewma struct: new (initializes with INFINITY sentinel), get (returns stored value), add (blends a new sample using exponential decay), add_peak (replaces stored value when the new sample exceeds it), and add_rate (derives a rate from the inverse of the elapsed interval and feeds it through add). This is being added in spite of tower::PeakEwma because this is not limited to middleware-based RTT computing. We specifically plan to use this implementation for a load biasing feature and a success-rate circuit breaker policy, which would otherwise not be possible. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Extend linkerd-ewma with the API surface needed for success-rate circuit breaking. A MIN_DECAY constant (1 ms) is now applied in both constructors so that a zero-duration decay never produces division-by-zero or NaN results in downstream arithmetic. New methods: new_with_value sets an explicit initial sample instead of the INFINITY sentinel, reset overwrites both value and timestamp for breaker recovery, and get_at projects the stored value forward through exponential decay without mutating internal state. Also add_peak is now decay-aware: it projects the stored value to the candidate timestamp before deciding whether to replace it, and it unconditionally replaces INFINITY so that the first real sample always takes effect even at the construction timestamp. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
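
As a rough illustration of the behavior the two linkerd-ewma commit messages above describe, here is a minimal sketch of a decay-aware peak EWMA. The field names, the `retain` helper, and the exact decay formula (`exp(-elapsed / decay)`) are assumptions made for the example and are not taken from the crate.

```rust
use std::time::{Duration, Instant};

/// Smallest decay window; assumed to mirror the MIN_DECAY guard (1 ms).
const MIN_DECAY: Duration = Duration::from_millis(1);

/// Illustrative EWMA. Field names and formulas are assumptions.
#[derive(Debug, Clone)]
pub struct Ewma {
    decay_secs: f64,
    value: f64,
    updated_at: Instant,
}

impl Ewma {
    /// Starts at the INFINITY sentinel, meaning "no sample yet".
    #[must_use]
    pub fn new(decay: Duration, now: Instant) -> Self {
        Self {
            // Clamp so a zero decay never divides by zero below.
            decay_secs: decay.max(MIN_DECAY).as_secs_f64(),
            value: f64::INFINITY,
            updated_at: now,
        }
    }

    pub fn get(&self) -> f64 {
        self.value
    }

    /// Fraction of the stored value that survives until `at`.
    fn retain(&self, at: Instant) -> f64 {
        let elapsed = at.saturating_duration_since(self.updated_at);
        (-elapsed.as_secs_f64() / self.decay_secs).exp()
    }

    /// Projects the stored value forward without mutating state.
    pub fn get_at(&self, at: Instant) -> f64 {
        if self.value.is_infinite() {
            return self.value;
        }
        self.value * self.retain(at)
    }

    /// Blends a sample in; samples at or before the stored timestamp are dropped.
    pub fn add(&mut self, sample: f64, at: Instant) {
        if at <= self.updated_at {
            return;
        }
        if self.value.is_infinite() {
            self.value = sample;
        } else {
            let w = self.retain(at);
            self.value = self.value * w + sample * (1.0 - w);
        }
        self.updated_at = at;
    }

    /// Replaces the value when the sample exceeds the decayed value, and
    /// always replaces the INFINITY sentinel; otherwise blends via `add`.
    pub fn add_peak(&mut self, sample: f64, at: Instant) {
        if self.value.is_infinite() || sample > self.get_at(at) {
            self.value = sample;
            self.updated_at = self.updated_at.max(at);
        } else {
            self.add(sample, at);
        }
    }
}
```

Passing the timestamp explicitly, rather than reading the clock inside the type, is what lets `get_at` stay read-only in this sketch.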

Add a retry_after module to linkerd-http-classify with shared parsing functions for extracting backoff hints from HTTP and gRPC responses. parse_retry_after handles 429/503 responses with both delay-seconds and HTTP-date formats per RFC 7231, capping the returned duration at a caller-specified maximum. parse_grpc_retry_pushback reads the grpc-retry-pushback-ms header per the gRPC A6 spec, rejecting negative values and capping positive ones. We use the httpdate crate for the actual RFC 7231 HTTP-date parsing. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
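
The parsing rules in the commit message above can be sketched roughly as follows. This is a simplified stand-in that works on plain strings and status codes instead of real header types, and it handles only the delay-seconds form of Retry-After (the HTTP-date form, which the commit delegates to the httpdate crate, is left out). The function names `parse_delay_seconds` and `parse_pushback_ms` are hypothetical.

```rust
use std::time::Duration;

/// Parses the delay-seconds form of a Retry-After value (hypothetical helper).
/// Only 429 and 503 responses are considered, and the result is capped at `max`.
pub fn parse_delay_seconds(status: u16, value: &str, max: Duration) -> Option<Duration> {
    if status != 429 && status != 503 {
        return None;
    }
    // Non-numeric values (such as an HTTP-date) fall through to None here.
    let secs: u64 = value.trim().parse().ok()?;
    Some(Duration::from_secs(secs).min(max))
}

/// Parses a grpc-retry-pushback-ms value (hypothetical helper).
/// The value is read as an i32; negatives are rejected, positives capped.
pub fn parse_pushback_ms(value: &str, max: Duration) -> Option<Duration> {
    let ms: i32 = value.trim().parse().ok()?;
    if ms < 0 {
        return None;
    }
    Some(Duration::from_millis(ms as u64).min(max))
}
```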

…re penalties Introduce the linkerd-load-biaser crate, which wraps any tower::Service to provide per-endpoint load metrics for P2C balancing. The crate tracks request latency via EWMA and injects penalties when failure responses are detected, steering traffic away from unhealthy endpoints. Penalty injection covers HTTP 429/503/5xx and gRPC RESOURCE_EXHAUSTED/UNAVAILABLE trailers-only responses (not streaming gRPC failures since we can only access headers here). For responses with backoff hints, Retry-After on HTTP 429/503 or grpc-retry-pushback-ms on gRPC trailers-only errors, the penalty is amplified so that the EWMA value remains meaningful through the server-requested backoff window. The amplification is clamped to prevent infinity from permanently disabling the endpoint. The load metric is computed as `max(rtt * (pending + 1), penalty)`, where `rtt` is the peak-EWMA latency, and `pending` is the number of in-flight requests. This is returned via tower::load::Load for direct P2C integration. The load biaser is disabled by default, preserving RTT-only behavior (PeakEwma equivalent), unless explicitly activated. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
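
The load formula quoted in this commit message, `max(rtt * (pending + 1), penalty)`, can be written out as a small function. The name `load` and the choice of milliseconds as `f64` are assumptions for the example.

```rust
/// Load metric as described in the commit message above:
/// max(rtt * (pending + 1), penalty). Units (ms as f64) are an assumption.
pub fn load(rtt_ms: f64, pending: u32, penalty_ms: f64) -> f64 {
    (rtt_ms * (f64::from(pending) + 1.0)).max(penalty_ms)
}
```

With an RTT of 10 ms and two in-flight requests, the RTT term is 30 ms; a 500 ms penalty on an otherwise idle endpoint dominates the RTT term and yields 500.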

These cover the complete load biasing lifecycle, including penalty injection, hint parsing, cancellation safety via PinnedDrop, and backwards-compatible behavior when disabled (i.e. RTT-only behavior equivalent to PeakEwma). Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Co-authored-by: katelyn martin <kate@buoyant.io>

…_rate_limit_hint The _max parameter was accepted for API symmetry with rate_limit_hint(max) but intentionally unused: the method always caches the uncapped raw value so each consumer can apply its own cap via rate_limit_hint(max). Removing the parameter for now since we probably won't need it in the future, and if so we can always put it back in place. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

…or and accessor Make the inner Duration field private and provide CachedRateLimitHint::new() for construction and duration_capped(max) for reads. This prevents consumers from bypassing the per-caller cap that rate_limit_hint(max) enforces, since the cached value is intentionally uncapped. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Explain why a standalone EWMA crate exists instead of using Tower's RttEstimate: it is private, mutates on read, and cannot support the penalty dimension that failure-aware load balancing requires. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The crate only uses tokio::time, so disable the default feature set to avoid pulling unnecessary features into the dependency declaration. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The cancellation test uses tokio::sync::oneshot which requires the sync feature. This compiled only because workspace feature unification pulled it in from other crates. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Replace raw string literals with the module-level constant for consistency with how HTTP tests use http::header::RETRY_AFTER. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Consistent with Ewma::new which already has this attribute. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Inspect the grpc-status header only on HTTP 200 responses whose content-type starts with application/grpc. Without this a non-gRPC upstream that happens to include a grpc-status header would be considered a gRPC failure and penalized by the load biaser. The same check is applied to the gRPC retry-pushback-ms parsing in the ReponseFailureHint trait implementation. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Up until now we mapped every non-zero gRPC status code to FailureHint::InternalError, penalizing client errors like CANCELLED, INVALID_ARGUMENT, NOT_FOUND, etc. These don't indicate server health issues and should not steer traffic away from the endpoint. Restrict penalty injection to server-side error codes that indicate endpoint problems: UNKNOWN (2), DEADLINE_EXCEEDED (4), INTERNAL (13), and DATA_LOSS (15), alongside the existing RESOURCE_EXHAUSTED (8) and UNAVAILABLE (14) statuses. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
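
Taken together with the gRPC gating described in the preceding commit message, the status filtering might look roughly like the sketch below. It covers only the gRPC path (HTTP 429/503/5xx handling is separate), operates on plain values instead of real header types, and all function names are hypothetical.

```rust
/// True only for responses that look like genuine gRPC: HTTP 200 with a
/// content-type that starts with application/grpc.
pub fn is_grpc_response(status: u16, content_type: Option<&str>) -> bool {
    status == 200 && content_type.map_or(false, |ct| ct.starts_with("application/grpc"))
}

/// Server-side gRPC codes named in the commit message: UNKNOWN (2),
/// DEADLINE_EXCEEDED (4), RESOURCE_EXHAUSTED (8), INTERNAL (13),
/// UNAVAILABLE (14), DATA_LOSS (15).
pub fn grpc_code_penalized(code: u32) -> bool {
    matches!(code, 2 | 4 | 8 | 13 | 14 | 15)
}

/// A grpc-status header is inspected only on a genuine gRPC response.
pub fn should_penalize(status: u16, content_type: Option<&str>, grpc_status: Option<u32>) -> bool {
    is_grpc_response(status, content_type) && grpc_status.map_or(false, grpc_code_penalized)
}
```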

Ensure only those gRPC status codes indicating server-side errors inject penalties. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Verify that consecutive 429 responses at 1s intervals keep the penalty at the configured level, confirming the EWMA peak resets the decayed value rather than accumulating. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Add a `last_update()` getter that returns the timestamp of the most recent EWMA update. Callers that need to detect staleness (i.e. idle periods where the EWMA has decayed to the point that a single sample dominates) can compare this against the current time and, for example, require more samples before making a decision. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Co-authored-by: katelyn martin <git@katelyn.world>

- Drop unused add_rate, last_update
- Correct MIN_DECAY enforcement comment
- Note on ignoring negative do-not-retry pushbacks

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

…A RTT We now keep a single RTT EWMA and a load of `rtt * (pending + 1)`, exactly like Tower's PeakEwma. A success records its measured RTT, while a failure now records a computed effective RTT through the same peak-EWMA logic, using the Retry-After or grpc-retry-pushback hint when present, or otherwise penalizing the RTT with a base value. In-flight requests are now counted the way Tower's PeakEwma counts them, using Arc's strong count and measuring on cancellation. Finally, an explicit completion tracker can use `PendingUntilFirstData` for measurement to more closely match previous behavior. `linkerd-ewma` is still a separate crate because we feed it a penalty value rather than a measured RTT, and since Tower's `RttEstimate` is private (at the moment) and advances its decay clock on read, it can neither accept an injected observation nor be read under a shared lock. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

…fault Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The gRPC A6 spec defines grpc-retry-pushback-ms as an i32. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Conveys meaning without coupling type nor constant. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Add a test exercising the case of a sample at or below the still undecayed peak. It should not replace the peak, but compute the value blending in the new sample. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Ensure that add() discards a sample whose timestamp is at or before the stored one. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

A hint below the base penalty such as 0 records a low effective RTT and can make a failing endpoint look healthier than it should. Ensure a failure's recorded measurement is at least the base penalty, so that retry hints take effect only when they exceed that penalty. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
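
The floor described above amounts to taking the larger of the hint and the base penalty. A minimal sketch, with the hypothetical name `failure_rtt`:

```rust
use std::time::Duration;

/// Effective RTT recorded for a failure: the server hint when it exceeds the
/// base penalty, otherwise the base penalty itself. The name is illustrative.
pub fn failure_rtt(hint: Option<Duration>, base_penalty: Duration) -> Duration {
    hint.map_or(base_penalty, |h| h.max(base_penalty))
}
```

A hint of 0 therefore records the base penalty, so a failing endpoint never looks healthier than an unhinted failure would make it.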

test_rtt_tracked_after_request resolved instantly under paused time and only checked that the RTT moved minimally. Drive a request that takes a measurable delay and assert the recorded RTT reflects it. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Existing tests raised the pending count with disabled handles or a single request. Try now with two concurrent requests and assert the strong count reports two pending, then assert the count falls when they resolve. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Version 0.20.0 adds the proto surface the load-balancing and circuit breaker work builds on: the penalty peak-EWMA load variant and the unified failure-accrual kind. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The `ExponentialBackoff` strategy keeps its minimum and maximum durations in private fields, so callers that hold a backoff have no way to read the window it covers. Add public accessors that return those two durations. They let a caller clamp an externally supplied delay to the configured backoff window. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The broadcast classification channel reported every dropped send the same way, which conflated two unrelated conditions. When no breaker consumes classifications, the receiver is dropped and the channel closes for good. On the default path that is the steady state and not a fault, so logging it only adds noise. A full channel differs. A consumer exists yet does not drain fast enough, and that backpressure is worth surfacing. Route every send site through a single helper that inspects the try-send error. A closed channel now emits a quiet trace line noting there is no consumer, while a full channel emits a debug line flagging the backpressure. Telling the two apart keeps the common no-consumer case silent without hiding the one operators care about. Replace the derived Debug on the channel state with a hand-written impl so the formatting no longer demands that the class type itself be Debug, which it need not be at every use site. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
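
The closed-versus-full distinction can be sketched with the standard library's bounded channel, whose `TrySendError` has the same two cases. This is a stand-in: the channel type used by the proxy is not shown in the commit message, and `SendOutcome` and `try_send_class` are hypothetical names. In the real helper the two failure arms would emit a trace and a debug log line respectively.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// What happened to a classification send (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub enum SendOutcome {
    Sent,
    /// Receiver dropped: no breaker consumes classifications (quiet case).
    NoConsumer,
    /// Receiver exists but is not draining fast enough (worth surfacing).
    Backpressure,
}

/// Single helper through which every send site would be routed.
pub fn try_send_class<T>(tx: &SyncSender<T>, class: T) -> SendOutcome {
    match tx.try_send(class) {
        Ok(()) => SendOutcome::Sent,
        Err(TrySendError::Disconnected(_)) => SendOutcome::NoConsumer,
        Err(TrySendError::Full(_)) => SendOutcome::Backpressure,
    }
}
```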

Introduce a SuccessRateWindow primitive the unified circuit breaker will consult to tell whether the recent fraction of healthy responses has fallen below a threshold. It tracks the ratio with a ring of ten fixed-duration buckets that span a configurable decay. Recording a response advances the ring to now, zeroes any bucket aged out of the window, then tallies the sample into the live bucket. A check sums the live buckets and trips when the window holds a minimum count of requests and the ratio is below the threshold. An idle gap past the window clears every bucket so a quiet endpoint starts cold. The breaker needs a rate-independent measure here. A decaying moving average weights each sample by how long since the previous one, so a burst of failures in a row gives each sample almost no weight and can hide a complete outage, while the same failures spread out trip sooner. Exact counts over a fixed window have no such blind spot. A given fraction of failures reaches the same decision regardless of arrival rate, the property an operator expects from a success-rate threshold. The module takes plain parameters rather than a configuration type, so it stays independent of how the breaker is configured. It is not yet wired into the breaker. A later change connects it. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
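
A ring of ten buckets along the lines described above might be sketched as follows. Only the type name `SuccessRateWindow` comes from the commit message; the field layout, method names, and the way the ring is advanced are assumptions for illustration.

```rust
use std::time::{Duration, Instant};

const BUCKETS: usize = 10;

#[derive(Clone, Copy, Default)]
struct Bucket {
    ok: u32,
    total: u32,
}

/// Illustrative fixed-window success-rate counter.
pub struct SuccessRateWindow {
    buckets: [Bucket; BUCKETS],
    bucket_len: Duration,
    start: Instant,
    /// Absolute index (since `start`) of the live bucket.
    head: u64,
}

impl SuccessRateWindow {
    pub fn new(window: Duration, now: Instant) -> Self {
        Self {
            buckets: [Bucket::default(); BUCKETS],
            // Guard against a zero-length bucket.
            bucket_len: (window / BUCKETS as u32).max(Duration::from_millis(1)),
            start: now,
            head: 0,
        }
    }

    /// Moves the ring to `now`, zeroing every bucket that aged out.
    /// A gap longer than the window clears all ten buckets.
    fn advance(&mut self, now: Instant) {
        let elapsed = now.saturating_duration_since(self.start);
        let idx = (elapsed.as_nanos() / self.bucket_len.as_nanos()) as u64;
        let steps = idx.saturating_sub(self.head).min(BUCKETS as u64);
        for i in 1..=steps {
            let slot = ((idx - steps + i) % BUCKETS as u64) as usize;
            self.buckets[slot] = Bucket::default();
        }
        self.head = self.head.max(idx);
    }

    /// Tallies one response into the live bucket.
    pub fn record(&mut self, now: Instant, success: bool) {
        self.advance(now);
        let live = &mut self.buckets[(self.head % BUCKETS as u64) as usize];
        live.total += 1;
        if success {
            live.ok += 1;
        }
    }

    /// Trips when the window holds at least `min_requests` samples and the
    /// success ratio is below `threshold`.
    pub fn is_tripped(&mut self, now: Instant, threshold: f64, min_requests: u32) -> bool {
        self.advance(now);
        let (ok, total) = self
            .buckets
            .iter()
            .fold((0u32, 0u32), |(o, t), b| (o + b.ok, t + b.total));
        total > 0 && total >= min_requests && f64::from(ok) / f64::from(total) < threshold
    }
}
```

Because the decision uses exact counts, four failures and one success give a ratio of 0.2 whether they arrive in one millisecond or are spread across the window.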

A server under load can tell a client how long to wait before trying again, through an HTTP Retry-After header or a gRPC grpc-retry-pushback-ms trailer on a RESOURCE_EXHAUSTED response. The circuit breaker should honor that signal rather than relying only on its own escalating backoff, so a backend that asks for a longer pause gets one and a backend that recovers fast is not punished beyond what it requested. This adds the per-endpoint plumbing that captures those hints. A duration hint store keeps the latest hint, last value wins, stamped with the instant it was recorded so an old value can be detected and dropped rather than replayed into a later backoff cycle. Keeping the freshest hint lets a recovering server that lowers its pushback be honored at the lower value. Taking a hint subtracts the time already spent waiting, so the breaker sees only the remaining delay, and an exhausted or overshot hint clears the slot. HTTP and gRPC hints live in separate stores kept strictly by source, and gRPC is parsed only on a genuine gRPC response, gated on a 200 OK so a 429 with a spurious grpc-status cannot leak across. A free function drains both stores and clamps each hint into the probe backoff's minimum and maximum before taking the larger, so one response can neither shorten the base backoff nor push past the ceiling the breaker escalates toward. The classifiers wrap an inner one and record hints as a side effect, leaving every decision unchanged. Nothing consumes the stores yet, and the core gRPC status accessor becomes public for that next step. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
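
The store semantics above (last value wins, elapsed time subtracted on take, separate stores clamped and combined) can be sketched like this. `HintStore` and `combined_hint` are hypothetical names, and the sketch ignores the concurrency that a real per-endpoint store would need.

```rust
use std::time::{Duration, Instant};

/// Holds the latest hint and the instant it was recorded (illustrative).
#[derive(Default)]
pub struct HintStore {
    slot: Option<(Duration, Instant)>,
}

impl HintStore {
    /// Last value wins: a newer hint overwrites the old one, even if lower.
    pub fn record(&mut self, hint: Duration, now: Instant) {
        self.slot = Some((hint, now));
    }

    /// Takes the remaining delay and clears the slot. An exhausted or
    /// overshot hint yields None.
    pub fn take(&mut self, now: Instant) -> Option<Duration> {
        let (hint, recorded_at) = self.slot.take()?;
        let waited = now.saturating_duration_since(recorded_at);
        hint.checked_sub(waited).filter(|d| !d.is_zero())
    }
}

/// Drains both stores, clamps each hint into [min, max], and returns the
/// larger. Assumes min <= max (Ord::clamp panics otherwise).
pub fn combined_hint(
    http: &mut HintStore,
    grpc: &mut HintStore,
    now: Instant,
    min: Duration,
    max: Duration,
) -> Option<Duration> {
    let http_hint = http.take(now).map(|d| d.clamp(min, max));
    let grpc_hint = grpc.take(now).map(|d| d.clamp(min, max));
    // Option orders None below Some, so this picks the larger present hint.
    http_hint.max(grpc_hint)
}
```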

Introduce a circuit breaker that watches two failure signals at once. A consecutive-failure count reacts the instant an endpoint hard-fails but stays silent under a partial outage, where requests fail steadily yet never string together a long enough run to trip. A windowed success ratio catches that partial degradation, though it lags on a total outage since it must first gather its minimum sample. Running both in one breaker pairs the fast reaction of the count with the partial coverage of the ratio, so either kind of failure opens the circuit.

The ratio comes from a time-windowed counter rather than a time-decayed average, so the trip decision tracks the failure fraction and not the request rate. A tight burst and the same burst spread out reach the same verdict. The ratio dimension is protected at first start and cannot trip until enough samples sit in its window, and a threshold of zero disables it. The engine takes only primitive parameters and the threshold is a plain fraction the caller supplies, so it never sees how the policy is represented at the configuration layer.

The state machine has three states. Open accepts traffic while tracking both signals. Shut rejects traffic and runs the backoff, re-reading the combined Retry-After and gRPC hint each iteration so a fresh, longer hint from a later probe raises the floor for the waits that follow. Probation admits one probe, bounded by the backoff ceiling rather than the window just waited. The probe is strict. An HTTP probe must be non-5xx and non-429, and a gRPC probe passes for any class other than RESOURCE_EXHAUSTED, since a 429 there means the endpoint is still rate limiting.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
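
The two-signal trip decision can be reduced to a small predicate. This sketch takes already-computed counts as input and does not model the Open/Shut/Probation state machine or the backoff; `TripInputs`, `TripPolicy`, and `should_trip` are hypothetical names.

```rust
/// Snapshot of the two signals the breaker watches (illustrative).
pub struct TripInputs {
    pub consecutive_failures: u32,
    pub window_ok: u32,
    pub window_total: u32,
}

/// Primitive parameters only, as the commit message describes.
pub struct TripPolicy {
    pub max_consecutive: u32,
    /// Cold-start guard: the ratio cannot trip below this sample count.
    pub min_requests: u32,
    /// Success-rate threshold as a plain fraction; zero disables the ratio.
    pub threshold: f64,
}

/// Either signal alone is enough to trip.
pub fn should_trip(inputs: &TripInputs, policy: &TripPolicy) -> bool {
    let by_count = inputs.consecutive_failures >= policy.max_consecutive;
    let by_ratio = policy.threshold > 0.0
        && inputs.window_total > 0
        && inputs.window_total >= policy.min_requests
        && f64::from(inputs.window_ok) / f64::from(inputs.window_total) < policy.threshold;
    by_count || by_ratio
}
```

A steady 50% failure rate never builds a long run, so only the ratio arm catches it; a hard outage trips the count arm before the window has gathered its minimum sample.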

Add the per-endpoint gating layer that connects each balancer endpoint's response classification stream to its own breaker. The gate set reads the target's failure accrual policy and, for each endpoint, builds a gate whose readiness a breaker task controls and a classifier that broadcasts each classification over that breaker's channel. Isolation is the point. The stores that hold Retry-After and gRPC pushback hints, and the breaker task that reads them, are built per endpoint, so a hint observed on one endpoint extends only that endpoint's backoff and a run of failures on one never gates another. When the policy respects server hints, the classifier is wrapped so the inner broadcaster records those hints into the endpoint's stores. A policy that leaves the hint flag off never reads the stores and skips that wrapper. A policy that can never trip costs the same as none at all, so a missing or inert policy resolves to a no-op path: no breaker task and no hint stores get allocated, and the gate simply never shuts. Hints are clamped to the chosen policy's maximum backoff, the same ceiling the breaker applies, so an oversized header is never held in a store until the breaker would discard it. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Failure accrual was a closed enum of None or ConsecutiveFailures that every HTTP and gRPC protocol config held by value. That left no room for the new success-rate breaker and forced the disabled state to be its own variant.

Model it as an opt-in oneof instead. Each protocol config now holds an Option, where None means accrual is off, so the disabled case lives at the config level. The enum becomes Consecutive or Unified. The consecutive policy counts failures in a row, while the unified policy adds a windowed success-rate threshold on top of a consecutive-failure ceiling. Both variants also hold a Retry-After preference.

Store the success-rate threshold as basis points in a small newtype rather than a float. A float is neither Eq nor Hash, which a backend's cache identity requires, while an integer in the range zero to ten thousand is both and stays finer than any meaningful success-rate target. This keeps the protocol configs on their plain derived equality and hashing. The proto conversion validates the incoming fraction for range and NaN before quantizing.

The proto conversion dispatches on the oneof kind, reads the Retry-After preference off the backoff message, and enforces ceilings on the cold-start request floor and the success-rate window. That window floor binds only under a nonzero threshold. A zero threshold disables the success-rate dimension, so any decay is accepted and the consecutive-failure ceiling stands alone.

The outbound breaker wiring then maps an absent policy to a gate that never closes, a consecutive policy to the consecutive breaker, and a unified policy to the success-rate engine. The per-endpoint gate set threads each endpoint's Retry-After stores to its own breaker, so a hint on one endpoint never extends another's.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
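
The basis-points idea can be illustrated with a small newtype. The name `BasisPoints`, the `f32` input type, and the rounding choice are assumptions; the commit message only states that the fraction is validated for range and NaN before quantizing.

```rust
/// Success-rate threshold in basis points (0..=10_000). Being an integer,
/// it can derive Eq and Hash, which a float cannot.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BasisPoints(u16);

impl BasisPoints {
    /// Validates the incoming fraction before quantizing. NaN and values
    /// outside the unit interval are rejected.
    pub fn from_fraction(fraction: f32) -> Option<Self> {
        if fraction.is_nan() || !(0.0..=1.0).contains(&fraction) {
            return None;
        }
        Some(Self((f64::from(fraction) * 10_000.0).round() as u16))
    }

    /// Converts back to a plain fraction for the engine.
    pub fn as_fraction(self) -> f64 {
        f64::from(self.0) / 10_000.0
    }
}
```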

Move the backoff-wait loop, which optionally floors the wait on a server Retry-After or gRPC pushback hint, out of the unified breaker and into the retry-after module beside the hint stores. The unified breaker calls the shared helper; the consecutive-failures breaker keeps its own hint-free exponential backoff, so only the unified policy ever floors a wait on a server hint. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The balancer load model now describes a second estimator. Alongside plain peak-EWMA, a backend may select a variant that penalizes endpoints that signal rate limiting, biasing P2C toward healthier peers. Extend the client-policy Load enum with a PenaltyPeakEwma variant and hold the strategy's fields verbatim from the control-plane policy: the RTT seed and decay window, the penalty magnitude and its own decay, and the cap on how far a Retry-After hint may extend that penalty. The policy layer only records them as part of the backend's identity. The balancer maps them onto its estimator later. Every field is a duration, so the new struct and the widened enum keep deriving Eq and Hash and remain part of a backend's cache key, matching the existing peak-EWMA strategy. The proto conversion gains the matching branch. Since each field is optional on the wire, an absent value takes a documented default while a value that is present but invalid still surfaces an error rather than being silently discarded. A small helper expresses that decoding next to the existing required-field decoder. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
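
The "absent takes a default, present but invalid is an error" decoding rule might be sketched as below. The wire representation here, an optional `(seconds, nanos)` pair, is a simplified stand-in for a proto duration, and `duration_or_default` and `InvalidDuration` are hypothetical names.

```rust
use std::time::Duration;

/// Error for a value that is present on the wire but not a valid duration.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidDuration;

/// Decodes an optional wire duration. `wire` is a stand-in for an optional
/// proto duration, given as (seconds, nanos).
pub fn duration_or_default(
    wire: Option<(i64, i32)>,
    default: Duration,
) -> Result<Duration, InvalidDuration> {
    match wire {
        // Absent: take the documented default.
        None => Ok(default),
        // Present and valid: use it.
        Some((secs, nanos)) if secs >= 0 && (0..1_000_000_000).contains(&nanos) => {
            Ok(Duration::new(secs as u64, nanos as u32))
        }
        // Present but invalid: surface an error, never fall back silently.
        Some(_) => Err(InvalidDuration),
    }
}
```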

Pin the failure-accrual proto conversion across both branches of the discovery oneof and the rules that guard the success-rate path. On the consecutive branch the failure ceiling comes through, and the Retry-After preference is read off the backoff message both when the hint is requested and when it is left unset. A branch with no backoff is treated as missing, and an accrual with no kind at all is rejected too. On the unified branch the wire fraction lands in the basis-points threshold rather than a raw float, so the threshold can stay part of a backend's cache identity. The measurement window defaults to ten seconds when absent. A zero threshold disables the success-rate dimension, so a sub-floor window is accepted there and the consecutive-failure ceiling stands on its own. Thresholds outside the unit interval, NaN among them, are rejected at the boundary before any rounding, and a populated ejection field is ignored since the conversion reads only the discovery kind. This boundary is the single place where untrusted control-plane numbers become typed policy. A regression here would either crash a backend or admit a breaker configuration that can never trip. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Give the P2C balancer a second load estimator alongside its Tower peak-EWMA wrapper. The new one tracks load with the response-aware biaser, so an endpoint returning rate-limit signals is de-prioritized for a penalty window. NewPenaltyPeakEwmaBalance builds each endpoint through NewLoadBiaser and serves only backends whose policy opts into penalties. A shared helper gives both paths the same queue, metrics, and endpoint setup, so only the per-endpoint load tracker and pool differ. Nothing selects the penalty path yet, so the change is additive. A backend's PenaltyPeakEwma policy maps onto the biaser configuration field for field, with one exception. The policy has a separate penalty_decay, yet the biaser records a penalized response as a raised effective RTT that fades through the same RTT EWMA window. No second decay remains to drive, so penalty_decay folds into rtt_decay and drops from the mapping. Both paths also floor the seed RTT at MIN_DEFAULT_RTT, one millisecond, since the estimate scales with the seed and a near-zero seed would let P2C over-select a fresh endpoint. RTT is sampled at the first response data frame. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The HTTP balance layer now chooses its load estimator from the backend's policy. NewBalance reads the load policy from the target and dispatches to svc::Either: the PeakEwma branch builds the existing peak-EWMA balancer, and the PenaltyPeakEwma branch builds the response-aware penalty estimator that de-prioritizes endpoints returning rate-limit signals. The default stays peak-EWMA, so a backend that does not opt in behaves exactly as before, and RTT is still sampled at the first response data frame on either path. Since the two pool service types differ, each branch boxes its response body so the two unify to a single response type. Boxing does not change when first-data load completion fires, as the underlying handle still observes the boxed body when polled and the balancer's response body is boxed downstream regardless. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Hold the backend's load policy on the balance target so the HTTP balancer can pick its endpoint-load estimator per backend. The concrete balance dispatch now holds the client-policy load oneof in place of a bare EWMA configuration, and the target gains what the selector reads: the load itself, an EWMA configuration derived from the named estimator, and a penalty configuration that defaults to penalty-free for peak-EWMA so the parameter stays total. A Load::peak_ewma_rtt() helper reads the decay and seed RTT out of either estimator for the EWMA call sites. The logical, profile, and policy routers pick the load policy that flows into the dispatch. Profiles have no penalty configuration, so that path uses a peak-EWMA default whose RTT settings match the prior balancer, and a backend opting into nothing behaves as before. Opaque and TLS routing lack HTTP response classification, so they take only the RTT settings, and when an operator did set a real penalty the router warns it ignores that estimator so the drop stays visible. The control-plane balancer is likewise RTT-only and uses peak-EWMA directly. The per-backend selector boxes each branch's response body, relaxing its bounds so the penalty branch may use a distinct, independently boxable body. That branch samples penalty load at the first response data frame and wraps the endpoint body in the biaser's completion type, which differs from the peak-EWMA wrapper. Both bodies box to one response type regardless. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Add integration tests that drive the unified circuit breaker through the dispatch the outbound stack uses to spawn a policy per endpoint, rather than poking the engine internals directly. The unit tests already pin the state machine, so these pin the wiring that turns a failure-accrual policy into a running breaker. Both trip conditions are covered: a run of consecutive failures opens the circuit at once with no cold-start guard, and a low windowed success rate opens it when no such run forms. Recovery goes through bounded probation. A clean probe reopens the circuit, while a failed or silent probe re-shuts it and advances the backoff so a still-broken endpoint stays ejected. The two probe verdicts are covered on both breaker branches, since the consecutive branch keeps judging a probe by the default classifier while the unified branch treats a rate-limit signal during probation as a failure. Server pushback is covered as an opt-in backoff floor clamped to the ceiling, and isolated per endpoint so one peer's hint cannot stretch another endpoint's recovery. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Signed-off-by: Alex Leong <alex@buoyant.io>

cratelyn · 2026-06-12T19:52:23Z

my understanding is that this can be closed, because we opted to take #4565. is that correct @unleashed?

unleashed and others added 30 commits May 21, 2026 20:25

Update linkerd/http/classify/Cargo.toml

342f7d9

Co-authored-by: katelyn martin <kate@buoyant.io>

Update linkerd/load-biaser/Cargo.toml

11ee334

Co-authored-by: katelyn martin <kate@buoyant.io>

build(ewma): disable default tokio features

6ed442b

The crate only uses tokio::time, so disable the default feature set to avoid pulling unnecessary features into the dependency declaration. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(classify): use GRPC_RETRY_PUSHBACK_MS constant in tests

b74b52a

Replace raw string literals with the module-level constant for consistency with how HTTP tests use http::header::RETRY_AFTER. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

fix(ewma): fix typo in test comment

ae6de47

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): add #[must_use] to LoadBiaser::new

3023422

Consistent with Ewma::new which already has this attribute. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

test(load-biaser): add tests for extended gRPC status classification

12adaca

Ensure only those gRPC status codes indicating server-side errors inject penalties. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

test(ewma): verify last_update across construction, changes, and reads

8fe67af

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Apply suggestions from code review

98aafb8

Co-authored-by: katelyn martin <git@katelyn.world>

fix(ewma,http-classify): address feedback

d7c03b5

- Drop unused add_rate, last_update
- Correct MIN_DECAY enforcement comment
- Note on ignoring negative do-not-retry pushbacks

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): measure RTT to first response data frame by de…

2589a46

…fault Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): store the failure penalty as integer milliseconds

ee9e722

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): name the default RTT and decay durations

48e1cc3

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

docs(load-biaser): name the gRPC status codes in the failure classifier

163df42

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

docs(ewma): reword stale crate comment

761a019

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

fix(http-classify): parse grpc pushback as i32

73ce450

The gRPC A6 spec defines grpc-retry-pushback-ms as an i32. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

unleashed and others added 24 commits June 8, 2026 15:24

refactor(ewma): use is_infinite instead of f64::INFINITY

841abfa

Conveys meaning without coupling type nor constant. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

test(ewma): test add_peak when providing a lower measurement

cfad958

Add a test exercising the case of a sample at or below the still undecayed peak. It should not replace the peak, but compute the value blending in the new sample. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

test(ewma): add() ignores old timestamp measurements

1d592a9

Ensure that add() discards a sample whose timestamp is at or before the stored one. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

build(Cargo): update linkerd2-proxy-api to 0.20.0

08ebd95

Version 0.20.0 adds the proto surface the load-balancing and circuit breaker work builds on: the penalty peak-EWMA load variant and the unified failure-accrual kind. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

WIP

97a7dea

Signed-off-by: Alex Leong <alex@buoyant.io>

updated

ed1fea2

Signed-off-by: Alex Leong <alex@buoyant.io>

use latest retry-after hint

35c64d7

Signed-off-by: Alex Leong <alex@buoyant.io>

adleong requested a review from a team as a code owner June 9, 2026 22:33

fmt

540b000

Signed-off-by: Alex Leong <alex@buoyant.io>

unleashed force-pushed the amr/refactored-unified-breaker branch 2 times, most recently from 8544696 to 713e33d Compare June 11, 2026 18:06

Base automatically changed from amr/refactored-unified-breaker to main June 11, 2026 18:20


adleong wants to merge 61 commits into main from alex/refactored-unified-breaker-with-retry-after
