Resolved -
Overview
A configuration change required by our bank enrichment data partner contained a defect that caused the bank enrichment check to fail. Affected API responses included the reason code `BVERR`, which indicates an error with bank verification.
Although Sardine tested this configuration change and performed a staged rollout with monitoring, our testing and monitoring were insufficient and missed certain configuration combinations, which resulted in this incident.
Timeline
- 2025-02-20: the code defect was introduced, but the problematic code path was not yet executed
- 2025-12-11 14:13:40: rollout of the feature to the first group of clients started
- 2025-12-11 17:02:40: rollout of the feature to the second group of clients started
- 2025-12-11 20:14:17: rollout of the feature to the third group of clients started
- 2025-12-15 18:01:40: rollout of the feature to the fourth group of clients started
- 2025-12-16 21:09:04: bug identified and feature flag disabled to mitigate impact
What went wrong
- Insufficient test coverage: e2e tests didn't cover the specific configuration combination that triggered the defect
- Insufficient monitoring and rollout process: although we performed a staged rollout and had standard monitors on HTTP response codes, we didn’t notice the issue because our data provider responded with a 200 (OK) status code
- Logging incident: coinciding with this rollout, we had a separate incident in which some application logs were being dropped. During the rollout we confirmed the absence of error logs, but that absence was caused by the separate incident.
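The coverage gap in the first item can be illustrated with a minimal sketch that enumerates every combination of configuration options instead of testing each option in isolation. The option names and the `check_bank_enrichment` function below are hypothetical stand-ins for illustration only; this report does not describe the actual options or code.

```python
from itertools import product

# Hypothetical configuration options; the real ones are not described in this report.
CONFIG_OPTIONS = {
    "new_provider_format": [False, True],
    "legacy_account_lookup": [False, True],
    "region": ["us", "eu"],
}

def check_bank_enrichment(config):
    """Hypothetical stand-in for the check under test; returns a list of reason codes."""
    # Simulated defect: the check fails only when two options are enabled together.
    if config["new_provider_format"] and config["legacy_account_lookup"]:
        return ["BVERR"]
    return []

def all_configs(options):
    """Yield one dict per combination of option values."""
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))

def failing_configs(options):
    """Return every configuration whose check emits the BVERR reason code."""
    return [c for c in all_configs(options) if "BVERR" in check_bank_enrichment(c)]
```

Running `failing_configs(CONFIG_OPTIONS)` in an e2e suite and asserting that the result is empty would surface a defect that appears only when particular options are combined.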
Action items
- Enhance automated monitors to include more granular error reason codes
- Update the internal runbook to define a more robust monitoring process
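As a rough sketch of the first action item, a monitor could track the share of successful (HTTP 200) responses that carry a given reason code, since status-code monitors alone did not reveal this issue. The response shape (`status`, `reason_codes`) and the 5% threshold are assumptions made for illustration, not the actual API schema or alerting policy.

```python
from collections import Counter

def reason_code_rates(responses):
    """Return the fraction of HTTP 200 responses that carry each reason code."""
    ok = [r for r in responses if r["status"] == 200]
    if not ok:
        return {}
    # Count each code at most once per response.
    counts = Counter(code for r in ok for code in set(r.get("reason_codes", [])))
    return {code: n / len(ok) for code, n in counts.items()}

def should_alert(responses, code="BVERR", threshold=0.05):
    """Alert when the given reason code exceeds the (assumed) threshold rate."""
    return reason_code_rates(responses).get(code, 0.0) > threshold
```

Comparing this rate between the rollout group and a control group at each stage would give a signal that does not depend on HTTP status codes or on error logs being present.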
Dec 17, 16:03 UTC