feat: Add tokio-metrics runtime metrics integration#256
Conversation
jlizen
left a comment
There was a problem hiding this comment.
Right now nothing prevents calling subscribe_tokio_runtime_metrics twice, which would just spawn two reporter tasks appending duplicate metrics. It won't break anything, but it's almost certainly users won't intend that to happen.
Logging a warning seems fine to me for now
| /// Configuration for Tokio runtime metrics bridge subscriptions. | ||
| #[derive(Debug, Clone, Copy)] | ||
| #[non_exhaustive] | ||
| pub struct TokioRuntimeMetricsConfig { |
There was a problem hiding this comment.
But, would it be worth having a different option, to fold the current runtime metrics into a given metrics struct? Some sort of #[metrics(with_tokio_metrics)] macro support?
Definitely not something to impl in this PR, but wondering if it merits a backlog (any opinion @rcoh ?)
There was a problem hiding this comment.
I think the cleanest thing would be to have EmbeddedTokioMetrics that had a ::default() impl so you could embed it as a field (rather than needing to drop into having macro support)
c70838c to
ef8db10
Compare
|
Following up on tokio-rs/tokio-metrics#121 (comment) With tokio-metrics using
Generic method // Default (Identity — snake_case):
ServiceMetrics::subscribe_tokio_runtime_metrics(config);
// PascalCase:
ServiceMetrics::subscribe_tokio_runtime_metrics::<PascalCase>(config);Generic config let config = TokioRuntimeMetricsConfig::default()
.with_name_style::<PascalCase>();Config becomes Now If we introduce a new public enum we could also do: Runtime enum on config let config = TokioRuntimeMetricsConfig::default()
.with_name_style(MetricNameStyle::PascalCase);(dispatches via If we support folding via #[metrics] with automatic appending Here @jlizen mentioned some sort of macro support for folding RuntimeMetrics into another metrics struct (I'm assuming that would automatically pull in the RuntimeMetrics data). If we go that route then we could solve this in multiple ways at the struct level rather than subscription config level, one example: What already works as this stands now: since RuntimeMetrics is a subfield, users can flatten it into their own Do we have any preferences here, or other approaches I'm missing? cc @rcoh |
|
I think I have a slight preference for the enum. No issues with mirroring it. We could eventually have it set on the sink too for runtime-cased metrics. Overall approach seems good to me |
d227e56 to
dab8555
Compare
| /// Configuration for Tokio runtime metrics bridge subscriptions. | ||
| #[derive(Debug, Clone, Copy)] | ||
| #[non_exhaustive] | ||
| pub struct TokioRuntimeMetricsConfig { |
There was a problem hiding this comment.
I think the cleanest thing would be to have EmbeddedTokioMetrics that had a ::default() impl so you could embed it as a field (rather than needing to drop into having macro support)
| ) -> (tokio::task::AbortHandle, JoinHandle<()>) { | ||
| let interval = config.interval; | ||
| let name_style = config.name_style; | ||
| let worker = tokio::spawn(async move { |
There was a problem hiding this comment.
we will eventually (but can probably punt) need to provide an overridable executor
There was a problem hiding this comment.
Good idea, would you be ok with doing this as a follow up? I can follow up with some work that does it for metrique in general, we do some hardcoded spawning in metrics-rs integration and in some of the internals as well.
I'm also curious as to how to catch this on CI, tests or even dial9 warnings, I have some ideas I'd like to try out later
| #[doc(hidden)] | ||
| pub type ShutdownFn = Box<dyn FnOnce() + Send>; | ||
|
|
||
| #[doc(hidden)] | ||
| pub type ShutdownRegistry = Mutex<Vec<ShutdownFn>>; |
There was a problem hiding this comment.
what are we attempting to achieve by making these doc hidden? register_shutdown_fn is not doc hidden an in practice, we would't be able to change these types. Probably worth just making newtypes for them for backwards compat and not trying to hide them in the API.
| tracing::debug!("tokio runtime metrics reporter stopped"); | ||
| }); | ||
| let worker_abort = worker.abort_handle(); | ||
| let monitor = tokio::spawn(async move { |
There was a problem hiding this comment.
do we actually need two separate tasks here? (Maybe?) Is this a general pattern that all bridge-reporters are going to need?
There was a problem hiding this comment.
I added this earlier when Jess asked about guarding against task panics and having an error. Is not truly needed contract-wise, I now simplified to use a single abort handle so we only pass one in the abort closure. Other bridges could opt not to do this, but perhaps it's preferred that we catch these and use tracing to log errors rather than having a single task panic and go into stderr?
| toml = "0.9" | ||
| walkdir = "2" | ||
|
|
||
| [patch.crates-io] |
There was a problem hiding this comment.
I'm looking into this, because without it I get compilation errors:, note the there are multiple different versions of crate metrique_core in the dependency graph:
error[E0599]: no method named `close` found for struct `tokio_metrics::RuntimeMetrics` in the current scope
--> metrique-util/src/tokio_metrics_reporter.rs:126:33
|
126 | entry: snapshot.close(),
| ^^^^^
|
note: there are multiple different versions of crate `metrique_core` in the dependency graph
--> /Users/joulei/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/metrique-core-0.1.18/src/lib.rs:131:1
|
131 | pub trait CloseValue {
| ^^^^^^^^^^^^^^^^^^^^ this is the expected trait
|
::: metrique-core/src/lib.rs:131:1
|
131 | pub trait CloseValue {
| -------------------- this is the trait that was imported
= help: you can use `cargo tree` to explore your dependency tree
help: some of the expressions' fields have a method of the same namecargo is pulling metrique-core from crates.io for tokio-metrics, and from path for metrique.
So at least we'll need it for local development until release. But I'm not sure yet if once this is released in crates.io, cargo will know how to unify the crates.io metrique deps from tokio-metrics, with the path deps that metrique has between crates. I'm trying to find docs on this or what others libs are doing (both serde and tokio have similar patches)
There was a problem hiding this comment.
Got it ok makes sense. Just add a comment
Adds a tokio-metrics-bridge feature to metrique-util that periodically appends tokio RuntimeMetrics snapshots to a global entry sink. This is powered by a new shutdown registry on AttachHandle. Background tasks register cleanup functions via register_shutdown_fn, which run in LIFO order before the sink detaches. Under tokio_unstable, poll_time_histogram bucket counts are paired with their ranges from the runtime handle and emitted as a separate distribution metric entry.
3573c8d to
22c41be
Compare
Closes #248
Depends on
tokio-rs/tokio-metrics#121 - once merged and released, will update
metrique-util/Cargo.tomlfrom git dep to crates.io version. The[patch.crates-io]section in the workspaceCargo.tomlshould needs to be removed.Summary
Adds a new feature
tokio-metrics-bridgetometrique-utilwhich adds a bridge for tokio-metrics.Main Changes
joininAttachHandleinto a shutdown registry. Background tasks register cleanup functions viaregister_shutdown_fnthat run (in LIFO order) before the sink detaches on drop.tokio-metrics-bridgefeature withAttachGlobalEntrySinkTokioMetricsExt::subscribe_tokio_runtime_metrics. Spawns a reporter task that appendsRuntimeMetricssnapshots at a configurable interval, automatically aborted on handle drop.Usage
See
metrique/examples/tokio-metrics.rsfor a better example.How it works
The
RuntimeMetricsstruct in tokio-metrics derivesmetrique_writer::Entry(behind themetrique-integrationfeature flag), so each snapshot is appended directly to the sink as a structured metric record with all runtime fields (worker utilization, park counts, queue depths, steal counts, poll durations, etc.).Open question: duplicate subscriptions
Right now nothing prevents calling
subscribe_tokio_runtime_metricstwice, which would just spawn two reporter tasks appending duplicate metrics. It won't break anything, but it's almost certainly users won't intend that to happen.I left it unguarded for now since the options each have tradeoffs and I'd rather get some input:
Happy to add a guard if there is a preferred approach, otherwise I'm leaning towards having at least a warning.
Update: this now warns of duplicate subscriptions
Testing
🔏 By submitting this pull request