metric for difference between endpoints that has persisted for t seconds #37

@ties

As a user, I want a metric for the difference between various sources that has persisted for t seconds, so that I can monitor that my various sources (rtr, json, ...) converge.

Situation

  • Two different RPs
  • Refresh at different times

Because the publication of VRPs is continuous, the RPs will have a slightly different view of what VRPs exist. If you want to monitor that they converge, you could alert on: "the difference is continuously non-zero for 30 minutes" (and assume it drops to zero at some point in time).
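To make that alert condition concrete, here is a minimal sketch of the "continuously non-zero for 30 minutes" check over a series of samples. The function name `nonzero_for` and the `(timestamp, diff)` sample format are assumptions made for illustration; in practice this would more likely be an alerting rule in the monitoring system than hand-written code.

```python
from typing import Iterable, Optional, Tuple


def nonzero_for(samples: Iterable[Tuple[float, int]], window_seconds: float) -> bool:
    """Return True if the diff has been non-zero without interruption
    for at least window_seconds, measured up to the latest sample.

    samples: (unix_timestamp, diff_count) pairs in increasing time order.
    """
    nonzero_since: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, diff in samples:
        if diff == 0:
            # Any zero sample resets the streak.
            nonzero_since = None
        elif nonzero_since is None:
            # Start of a new non-zero streak.
            nonzero_since = ts
        last_ts = ts
    if nonzero_since is None or last_ts is None:
        return False
    return last_ts - nonzero_since >= window_seconds


# The total never touches zero, so the alert fires, even if every
# individual VRP only differed for a short time.
busy = [(0, 5), (600, 3), (1200, 7), (1800, 2)]
print(nonzero_for(busy, 30 * 60))

# One zero sample in between resets the streak, so no alert.
quiet = [(0, 5), (600, 3), (1200, 0), (1800, 2)]
print(nonzero_for(quiet, 30 * 60))
```

Note that the check only looks at the total count, not at which objects differ, so it cannot distinguish one long-lived difference from a stream of short-lived ones.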

In practice, this causes false positives if updates are frequent enough: new differences keep appearing before older ones resolve, so the total never reaches zero. Another way to go is to check which objects in A are not in B and were first seen in A at least visibility_seconds ago. That way you can have

This is similar to what I added to rtrmon, where there is a vrp_diff metric for objects that were first seen in the source at least visibility_seconds ago.
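As a rough illustration of the idea, the sketch below tracks when each object was first seen in source A and counts only the objects missing from B that have been visible in A for at least `visibility_seconds`. This is not rtrmon's code: `FirstSeenTracker`, `visible_diff`, and the tuple representation of a VRP are hypothetical, and forgetting objects that disappear from A (so that a re-appearance restarts the clock) is an assumption.

```python
from typing import Dict, Hashable, Iterable, Set


class FirstSeenTracker:
    """Remembers when each object was first seen in one source."""

    def __init__(self) -> None:
        self.first_seen: Dict[Hashable, float] = {}

    def update(self, objects: Iterable[Hashable], now: float) -> None:
        current = set(objects)
        # Assumption: forget objects that are gone, so that a later
        # re-appearance counts as newly seen.
        for gone in set(self.first_seen) - current:
            del self.first_seen[gone]
        # Keep the earliest timestamp for objects that are still present.
        for obj in current:
            self.first_seen.setdefault(obj, now)


def visible_diff(lhs: FirstSeenTracker, rhs_objects: Set[Hashable],
                 now: float, visibility_seconds: float) -> int:
    """Count objects in lhs that are not in rhs and were first seen
    in lhs at least visibility_seconds ago."""
    return sum(
        1
        for obj, seen in lhs.first_seen.items()
        if obj not in rhs_objects and now - seen >= visibility_seconds
    )


# Hypothetical VRPs as (prefix, max_length, asn) tuples.
old_vrp = ("192.0.2.0/24", 24, 64500)
new_vrp = ("198.51.100.0/24", 24, 64501)

a = FirstSeenTracker()
a.update({old_vrp}, now=0)
a.update({old_vrp, new_vrp}, now=1000)

# At t=1100, B has neither VRP: both count instantly, but only the
# old one has been visible in A for 256 seconds or more.
print(visible_diff(a, set(), now=1100, visibility_seconds=0))
print(visible_diff(a, set(), now=1100, visibility_seconds=256))
```

With a threshold longer than the expected refresh interval of the other source, a freshly published object does not count as a difference until the other side has had a fair chance to pick it up.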

Maybe a real set of metrics is clearer:

# HELP rpki_vrps Total number of VRPS/amount of differents.
# TYPE rpki_vrps gauge
rpki_vrps{server="primary",type="diff",url="http://routinator-1:9556/json"} 1110
rpki_vrps{server="primary",type="total",url="http://routinator-1:9556/json"} 143981
rpki_vrps{server="secondary",type="diff",url="https://ca-software/api/monitoring/roa-prefixes"} 1
rpki_vrps{server="secondary",type="total",url="https://ca-software/api/monitoring/roa-prefixes"} 142872
# HELP rtr_serial Serial of the RTR session.
# TYPE rtr_serial gauge
rtr_serial{server="primary",url="http://routinator-1:9556/json"} 0
rtr_serial{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 0
# HELP rtr_session ID of the RTR session.
# TYPE rtr_session gauge
rtr_session{server="primary",url="http://routinator-1:9556/json"} 0
rtr_session{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 0
# HELP update Timestamp of last update.
# TYPE update gauge
update{server="primary",url="http://routinator-1:9556/json"} 1.637752522e+09
update{server="secondary",url="https://ca-software/api/monitoring/roa-prefixes"} 1.63775261e+09
# HELP vrp_diff Number of VRPS in [lhs_url] that are not in [rhs_url] that were first seen [visibility_seconds] ago in lhs.
# TYPE vrp_diff gauge
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="0"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="1024"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="1706"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="256"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="3411"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="56"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="596"} 1110
vrp_diff{lhs_url="http://routinator-1:9556/json",rhs_url="https://ca-software/api/monitoring/roa-prefixes",visibility_seconds="851"} 1110
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="0"} 1
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1024"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1706"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="256"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="3411"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="56"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="596"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="851"} 0
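The samples above are listed in lexical label order (visibility_seconds="1024" comes before "256"), which makes the progression hard to read. A small sketch that parses the `vrp_diff` lines from such text and orders the thresholds numerically per direction is shown here; the function name `parse_vrp_diff` is hypothetical, and the regular expressions only cover the simple label syntax seen in this output, not the full exposition format.

```python
import re
from typing import Dict, Tuple

# Matches lines such as: vrp_diff{a="x",b="y"} 1110
_SAMPLE = re.compile(r'^vrp_diff\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_vrp_diff(text: str) -> Dict[Tuple[str, str], Dict[int, float]]:
    """Map (lhs_url, rhs_url) to {visibility_seconds: value},
    with thresholds in increasing numeric order."""
    result: Dict[Tuple[str, str], Dict[int, float]] = {}
    for line in text.splitlines():
        match = _SAMPLE.match(line.strip())
        if not match:
            continue  # skip HELP/TYPE comments and other metrics
        labels = dict(_LABEL.findall(match.group("labels")))
        key = (labels["lhs_url"], labels["rhs_url"])
        threshold = int(labels["visibility_seconds"])
        result.setdefault(key, {})[threshold] = float(match.group("value"))
    # Dicts keep insertion order, so sorting here fixes the ordering.
    return {key: dict(sorted(series.items())) for key, series in result.items()}


sample = '''# TYPE vrp_diff gauge
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="0"} 1
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="1024"} 0
vrp_diff{lhs_url="https://ca-software/api/monitoring/roa-prefixes",rhs_url="http://routinator-1:9556/json",visibility_seconds="256"} 0
'''

for (lhs, rhs), series in parse_vrp_diff(sample).items():
    print(lhs, "->", rhs, series)
```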

In the diagram you see that the instantaneous difference grows, but the long-term difference never grows:
Screenshot 2021-11-24 at 12 23 15
