Tag stats for origin backend storage downloads by namespace #563
Pull request overview
Adds namespace tagging to remote-backend blob download metrics so origin storage usage/latency can be analyzed per namespace.
Changes:
- Tag the `download_remote_blob` timer metric with `namespace`.
- Tag the `downloads` counter metric with `namespace`.
r.stats.Tagged(map[string]string{"namespace": namespace}).
	Counter("downloads").Inc(1)
Tagging these metrics with the raw namespace value can create very high-cardinality time series (e.g., default configs include a catch-all backend rule namespace: .*, and origin endpoints accept namespace from request params). This can significantly increase metrics load/cost and may destabilize the metrics backend; consider tagging by a bounded dimension instead (such as matched backend rule / backend type, or a normalized namespace prefix) or explicitly documenting/enforcing an allowlist of namespaces to tag.
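One way to bound the tag values, sketched under assumptions: map each raw namespace to a small, fixed set of prefixes and send everything else to a single fallback bucket. The `knownPrefixes` list and the `namespaceTag` helper are hypothetical names for illustration, not part of this PR or of Kraken's configuration.

```go
package main

import "strings"

// knownPrefixes is a hypothetical allowlist of namespace prefixes that are
// safe to expose as tag values. The real list would come from configuration.
var knownPrefixes = []string{"docker/", "artifacts/"}

// namespaceTag maps a raw, possibly user-supplied namespace to a bounded tag
// value: the matching prefix without its trailing slash, or "other" when no
// prefix matches. The number of distinct tag values is at most
// len(knownPrefixes) + 1.
func namespaceTag(namespace string) string {
	for _, p := range knownPrefixes {
		if strings.HasPrefix(namespace, p) {
			return strings.TrimSuffix(p, "/")
		}
	}
	return "other"
}
```

With this in place, the tagging call would pass `namespaceTag(namespace)` instead of the raw `namespace`, so a catch-all rule cannot produce an unbounded number of time series.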
r.stats.Tagged(map[string]string{"namespace": namespace}).
	Timer("download_remote_blob").Record(t)
r.stats.Tagged(map[string]string{"namespace": namespace}) is called repeatedly (also again for the downloads counter below), which creates a new scope and allocates a new map each time. Consider creating a single namespaceStats := r.stats.Tagged(...) once in the closure and reusing it for both the timer and counter to reduce allocations and keep tagging consistent in one place.
One of the most useful dimensions for tracking our usage patterns is the `namespace`; we even talk to different storage backends based on it (GCS, Terrablob, UPT, etc.). Therefore, I'm tagging the latency and number of downloads from remote storage backends by namespace, so we can get more insight into how Kraken is used.