Skip to content

feat: Add tokio-metrics runtime metrics integration#256

Merged
rcoh merged 33 commits into
awslabs:mainfrom
yulnr:tokio-metrics-integration
Apr 13, 2026
Merged

feat: Add tokio-metrics runtime metrics integration#256
rcoh merged 33 commits into
awslabs:mainfrom
yulnr:tokio-metrics-integration

Conversation

@yulnr
Copy link
Copy Markdown
Collaborator

@yulnr yulnr commented Mar 24, 2026

Closes #248

Depends on

tokio-rs/tokio-metrics#121 - once merged and released, will update metrique-util/Cargo.toml from git dep to crates.io version. The [patch.crates-io] section in the workspace Cargo.toml should needs to be removed.

Summary

Adds a new feature tokio-metrics-bridge to metrique-util which adds a bridge for tokio-metrics.

Main Changes

  • metrique-writer-core: Refactors join in AttachHandle into a shutdown registry. Background tasks register cleanup functions via register_shutdown_fn that run (in LIFO order) before the sink detaches on drop.
  • metrique-util: New tokio-metrics-bridge feature with AttachGlobalEntrySinkTokioMetricsExt::subscribe_tokio_runtime_metrics. Spawns a reporter task that appends RuntimeMetrics snapshots at a configurable interval, automatically aborted on handle drop.

Usage

use metrique_util::{AttachGlobalEntrySinkTokioMetricsExt, TokioRuntimeMetricsConfig};

// After attaching a sink:
let _handle = ServiceMetrics::attach_to_stream(emf.output_to(std::io::stderr()));

// Subscribe: reporter runs in the background, aborted when _handle drops.
let config = TokioRuntimeMetricsConfig::default().with_interval(Duration::from_secs(30));
ServiceMetrics::subscribe_tokio_runtime_metrics(config);

See metrique/examples/tokio-metrics.rs for a better example.

How it works

The RuntimeMetrics struct in tokio-metrics derives metrique_writer::Entry (behind the metrique-integration feature flag), so each snapshot is appended directly to the sink as a structured metric record with all runtime fields (worker utilization, park counts, queue depths, steal counts, poll durations, etc.).

Open question: duplicate subscriptions

Right now nothing prevents calling subscribe_tokio_runtime_metrics twice, which would just spawn two reporter tasks appending duplicate metrics. It won't break anything, but it's almost certainly users won't intend that to happen.

I left it unguarded for now since the options each have tradeoffs and I'd rather get some input:

  • Panic on duplicate: track whether a sink type has already subscribed and panic on the second call. Needs a per-type flag.
  • Just log a warning
  • Do nothing

Happy to add a guard if there is a preferred approach, otherwise I'm leaning towards having at least a warning.

Update: this now warns of duplicate subscriptions

Testing

  • Adds some shutdown registry tests to metrique-writer-core
  • Adds tests to metrique-util for the tokio-metrics-feature

🔏 By submitting this pull request

  • I confirm that I've made a best effort attempt to update all relevant documentation.
  • I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Copy Markdown
Contributor

@jlizen jlizen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now nothing prevents calling subscribe_tokio_runtime_metrics twice, which would just spawn two reporter tasks appending duplicate metrics. It won't break anything, but it's almost certainly users won't intend that to happen.

Logging a warning seems fine to me for now

Comment thread metrique-util/src/lib.rs Outdated
Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
/// Configuration for Tokio runtime metrics bridge subscriptions.
#[derive(Debug, Clone, Copy)]
#[non_exhaustive]
pub struct TokioRuntimeMetricsConfig {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API to emit on an interval makes sense to me.

But, would it be worth having a different option, to fold the current runtime metrics into a given metrics struct? Some sort of #[metrics(with_tokio_metrics)] macro support?

Definitely not something to impl in this PR, but wondering if it merits a backlog (any opinion @rcoh ?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cleanest thing would be to have EmbeddedTokioMetrics that had a ::default() impl so you could embed it as a field (rather than needing to drop into having macro support)

Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
Comment thread metrique-writer-core/src/global.rs Outdated
Comment thread metrique-util/README.md
Comment thread metrique-util/src/tokio_metrics_reporter.rs
Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
@yulnr
Copy link
Copy Markdown
Collaborator Author

yulnr commented Mar 31, 2026

Following up on tokio-rs/tokio-metrics#121 (comment)

With tokio-metrics using #[metrics(subfield)] without rename_all, I'm looking into how to let callers pick a name style.

InflectableEntry is generic over NS: NameStyle, so to expose this to callers we have some options:

Generic method

// Default (Identity — snake_case):
ServiceMetrics::subscribe_tokio_runtime_metrics(config);

// PascalCase:
ServiceMetrics::subscribe_tokio_runtime_metrics::<PascalCase>(config);

Generic config

let config = TokioRuntimeMetricsConfig::default()
    .with_name_style::<PascalCase>();

Config becomes TokioRuntimeMetricsConfig<PascalCase>

Now If we introduce a new public enum we could also do:

Runtime enum on config

let config = TokioRuntimeMetricsConfig::default()
    .with_name_style(MetricNameStyle::PascalCase);

(dispatches via match internally)
I think much better for users, only downside is we'd have a mirror of NameStyle enum from metrique-macro, but that should be easy to keep in sync.

If we support folding via #[metrics] with automatic appending

Here @jlizen mentioned some sort of macro support for folding RuntimeMetrics into another metrics struct (I'm assuming that would automatically pull in the RuntimeMetrics data). If we go that route then we could solve this in multiple ways at the struct level rather than subscription config level, one example: #[metrics(rename_all = "PascalCase")]

What already works as this stands now: since RuntimeMetrics is a subfield, users can flatten it into their own #[metrics(rename_all = "PascalCase", with_tokio_metrics)] struct and the naming is inherited. Though this is less likely to be used that way because users would need to write the monitor loop and append to sink themselves.

Do we have any preferences here, or other approaches I'm missing?

cc @rcoh

@rcoh
Copy link
Copy Markdown
Contributor

rcoh commented Mar 31, 2026

I think I have a slight preference for the enum. No issues with mirroring it. We could eventually have it set on the sink too for runtime-cased metrics.

Overall approach seems good to me

@yulnr yulnr force-pushed the tokio-metrics-integration branch 2 times, most recently from d227e56 to dab8555 Compare April 6, 2026 11:12
@yulnr yulnr marked this pull request as ready for review April 10, 2026 14:51
Copy link
Copy Markdown
Contributor

@rcoh rcoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment thread metrique-util/src/dynamic_inflection.rs
/// Configuration for Tokio runtime metrics bridge subscriptions.
#[derive(Debug, Clone, Copy)]
#[non_exhaustive]
pub struct TokioRuntimeMetricsConfig {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cleanest thing would be to have EmbeddedTokioMetrics that had a ::default() impl so you could embed it as a field (rather than needing to drop into having macro support)

Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
) -> (tokio::task::AbortHandle, JoinHandle<()>) {
let interval = config.interval;
let name_style = config.name_style;
let worker = tokio::spawn(async move {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will eventually (but can probably punt) need to provide an overridable executor

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, would you be ok with doing this as a follow up? I can follow up with some work that does it for metrique in general, we do some hardcoded spawning in metrics-rs integration and in some of the internals as well.
I'm also curious as to how to catch this on CI, tests or even dial9 warnings, I have some ideas I'd like to try out later

Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
Comment thread metrique-util/src/tokio_metrics_reporter.rs Outdated
Comment thread metrique-writer-core/src/global.rs Outdated
Comment on lines +255 to +259
#[doc(hidden)]
pub type ShutdownFn = Box<dyn FnOnce() + Send>;

#[doc(hidden)]
pub type ShutdownRegistry = Mutex<Vec<ShutdownFn>>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are we attempting to achieve by making these doc hidden? register_shutdown_fn is not doc hidden an in practice, we would't be able to change these types. Probably worth just making newtypes for them for backwards compat and not trying to hide them in the API.

tracing::debug!("tokio runtime metrics reporter stopped");
});
let worker_abort = worker.abort_handle();
let monitor = tokio::spawn(async move {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we actually need two separate tasks here? (Maybe?) Is this a general pattern that all bridge-reporters are going to need?

Copy link
Copy Markdown
Collaborator Author

@yulnr yulnr Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this earlier when Jess asked about guarding against task panics and having an error. Is not truly needed contract-wise, I now simplified to use a single abort handle so we only pass one in the abort closure. Other bridges could opt not to do this, but perhaps it's preferred that we catch these and use tracing to log errors rather than having a single task panic and go into stderr?

Comment thread Cargo.toml
toml = "0.9"
walkdir = "2"

[patch.crates-io]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needed?

Copy link
Copy Markdown
Collaborator Author

@yulnr yulnr Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into this, because without it I get compilation errors:, note the there are multiple different versions of crate metrique_core in the dependency graph:

error[E0599]: no method named `close` found for struct `tokio_metrics::RuntimeMetrics` in the current scope
   --> metrique-util/src/tokio_metrics_reporter.rs:126:33
    |
126 |                 entry: snapshot.close(),
    |                                 ^^^^^
    |
note: there are multiple different versions of crate `metrique_core` in the dependency graph
   --> /Users/joulei/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/metrique-core-0.1.18/src/lib.rs:131:1
    |
131 | pub trait CloseValue {
    | ^^^^^^^^^^^^^^^^^^^^ this is the expected trait
    |
   ::: metrique-core/src/lib.rs:131:1
    |
131 | pub trait CloseValue {
    | -------------------- this is the trait that was imported
    = help: you can use `cargo tree` to explore your dependency tree
help: some of the expressions' fields have a method of the same name

cargo is pulling metrique-core from crates.io for tokio-metrics, and from path for metrique.

So at least we'll need it for local development until release. But I'm not sure yet if once this is released in crates.io, cargo will know how to unify the crates.io metrique deps from tokio-metrics, with the path deps that metrique has between crates. I'm trying to find docs on this or what others libs are doing (both serde and tokio have similar patches)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it ok makes sense. Just add a comment

@yulnr yulnr force-pushed the tokio-metrics-integration branch from 3573c8d to 22c41be Compare April 13, 2026 11:17
Comment thread metrique-writer-core/src/global.rs Outdated
Comment thread metrique-writer-core/src/global.rs Outdated
@rcoh rcoh merged commit 5110172 into awslabs:main Apr 13, 2026
10 checks passed
@fmzbl fmzbl mentioned this pull request May 4, 2026
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

metrique <-> tokio-metrics bridge

3 participants