-
Notifications
You must be signed in to change notification settings - Fork 81
Add samply-markers crate #710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
What a gift! Thank you! (But I haven't looked beyond the commit message.) |
Please give it a try. I'd appreciate your feedback. It's slightly different than what you implemented, and I've removed the dependency on But I'm hoping the documentation generated by |
1a6295a to
2ca0131
Compare
1ba0c8d to
3337731
Compare
This commit adds a new workspace for the `samply-markers` crate starting at version `0.0.0`. Since this is my first PR to this project, introducing a new component, I will bump the version to 0.1.0 once it is fully reviewed, accepted, and ready.
Source: feldera/feldera@1b448b7 Original author: Ben Pfaff <blp@feldera.com> File renamed from crates/adapters/src/samply.rs to ./samply-markers/src/unix.rs
This commit runs cargo-fmt on the ported commit from feldera/feldera@1b448b7 to ensure that the code style is up to date with samply's repository.
This commit adds all of the required dependencies in order to get the ported code from feldera/feldera@1b448b7 to build. Some of these dependencies may be removed as I update the API to my vision for the project.
This commit removes all of the code from unix.rs that is not specific to actually writing to the `mmap` files on unix systems. This preserves the original author's attributions.
This commit adds the `SamplyTimestamp` implementation with a default provider that always returns zero.
This commit adds the `SamplyMarker` implementation with a default provider that intentionally does not write the markers to a file.
This commit updates the `unix.rs` implementation to be compatible with the uniform internal provider APIs that markers require in order to function.
This commit adds unimplemented stubs for a Windows provider that would conform to the uniform internal provider APIs.
This commit adds a `SamplyTimer` abstraction which is a scoped RAII object that records the start time it was created and emits an interval marker when it goes out of scope.
This commit adds a declarative macro wrapper around the `SamplyMarker` implementation with the intent to be the primary way of emitting markers.
This commit adds a declarative macro wrapper around the `SamplyTimer` implementation with the intent to be the primary way of constructing timer objects to emit intervals.
This commit adds a declarative macro wrapper that will automatically measure the runtime of a provided sync or async code block by emitting an intterval marker when the code block has finished running.
This commit adds a `prelude` module to the `samply-markers` crate so that users have an easy way to glob import everything that they need to instrument their code with markers.
This commit adds a top-level doc comment to the `samply-markers` crate.
This commit updates the way Samply adds markers to use instant marker timing, instead of a span marker timing, if the start time and end time of the span are exactly the same. Admittedly, I don't like this change, and I would like to revert it once a different marker file format is potentially supported. I think that a zero-length interval should be possible if that is actually what was measured. But as of right now, I have no way to emit an actual instant marker given the current file-format restrictions.
This commit adds integreation tests. The tests for Unix systems are fully fleshed out, ensuring that markers are correctly written to the expected files. The for Windows systems currently expect to panic.
This commit adds a project-level README file to the samply-markers crate.
This commit ensures that .truncate(true) is set when creating marker files to clarify that existing files should be overwritten. This is unlikely to happen with the temp directory file format containing the PID and TID, but if it were to happen, it's better to be safe than sorry. This also fixes the clippy::suspicious_open_options warning.
This commit ensures that the `unix.rs` file correctly handles emitting markers on macOS. I hadn't realized in previous commits that the functionality was only working on Linux. It should work on both systems now.
This commit adds a safety comment to every unsafe block in the `unix.rs` file, explaining why I blieve the call is justified.
This commit adds an examples directory with instructions on how to build, run, and profile the examples shown in the main README file.
This commit improves the performance of samply-markers
when enabled, based on benchmarks of the Fibonacci example
increased to n = 30 and measured with hyperfine.
The Fibonacci example at n = 30 makes 2,692,536 recursive
calls, emitting the same number of markers. This provides
a great stress test for marker emission overhead.
The configurations are as follows:
- disabled: samply-markers disabled
- enabled-baseline: samply-markers enabled before any performance improvements
- buffered-writes: adds the buffered writes improvement
- itoa-buffers: adds the itoa improvement on top of buffered writes
Linux results:
Benchmark 1: ./fib30-disabled
Time (mean ± σ): 128.2 ms ± 0.6 ms [User: 127.2 ms, System: 0.9 ms]
Range (min … max): 127.1 ms … 129.6 ms 23 runs
Benchmark 2: ./fib30-enabled-baseline
Time (mean ± σ): 3.144 s ± 0.009 s [User: 0.924 s, System: 2.220 s]
Range (min … max): 3.134 s … 3.161 s 10 runs
Benchmark 3: ./fib30-enabled-buffered-writes
Time (mean ± σ): 522.9 ms ± 1.1 ms [User: 478.9 ms, System: 43.7 ms]
Range (min … max): 520.2 ms … 524.1 ms 10 runs
Benchmark 4: ./fib30-enabled-itoa-buffers
Time (mean ± σ): 423.1 ms ± 1.8 ms [User: 377.8 ms, System: 45.1 ms]
Range (min … max): 419.6 ms … 426.3 ms 10 runs
Summary
./fib30-disabled ran
3.30 ± 0.02 times faster than ./fib30-enabled-itoa-buffers
4.08 ± 0.02 times faster than ./fib30-enabled-buffered-writes
24.53 ± 0.14 times faster than ./fib30-enabled-baseline
macOS results:
Benchmark 1: ./fib30-disabled
Time (mean ± σ): 90.7 ms ± 0.4 ms [User: 89.4 ms, System: 0.9 ms]
Range (min … max): 90.0 ms … 91.6 ms 31 runs
Benchmark 2: ./fib30-enabled-baseline
Time (mean ± σ): 4.384 s ± 0.053 s [User: 0.578 s, System: 3.783 s]
Range (min … max): 4.295 s … 4.495 s 10 runs
Benchmark 3: ./fib30-enabled-buffered-writes
Time (mean ± σ): 257.4 ms ± 1.8 ms [User: 237.6 ms, System: 16.9 ms]
Range (min … max): 255.1 ms … 261.8 ms 11 runs
Benchmark 4: ./fib30-enabled-itoa-buffers
Time (mean ± σ): 187.6 ms ± 1.1 ms [User: 169.9 ms, System: 16.1 ms]
Range (min … max): 186.3 ms … 190.1 ms 15 runs
Summary
./fib30-disabled ran
2.07 ± 0.01 times faster than ./fib30-enabled-itoa-buffers
2.84 ± 0.02 times faster than ./fib30-enabled-buffered-writes
48.32 ± 0.62 times faster than ./fib30-enabled-baseline
Profiles:
Ubuntu:
* fib30-disabled: https://share.firefox.dev/48id64U
* fib30-enabled-baseline: https://share.firefox.dev/47K8Taf
* fib30-enabled-buffered-writes: https://share.firefox.dev/4oK5Uog
* fib30-enabled-itoa-buffers: https://share.firefox.dev/47ZtGVQ
macOS:
* fib30-disabled: https://share.firefox.dev/3LMmaX2
* fib30-enabled-baseline: https://share.firefox.dev/49TPODF
* fib30-enabled-buffered-writes: https://share.firefox.dev/4oFo1LP
* fib30-enabled-itoa-buffers: https://share.firefox.dev/3K6rLqG
Overall:
- Using buffered writes dramatically reduces overhead.
- Using itoa provides an additional ~19% improvement on Linux and ~27% on macOS.
3337731 to
8a5e032
Compare
Overview
This is a minimal initial implementation of #349 based off of @blp's commit feldera/feldera@1b448b7.
I've pulled the commit directly into my pull request, so as to preserve the original attributions, and built the crate around the Unix mmap capabilities. Windows is stubbed out, but not yet implemented.
Examples
The README and generated rust docs are full of examples to look at.
Crate Name Reservation
I've also reserved the crate name in good faith that I'd like to release this eventually.
Though, if for some reason this falls through, I'd be happy to relinquish ownership of the crate. I'm not a fan of name squatting without substance or genuine intent.
Current Status
That being said, I'm not ready to release the crate as
0.1.0yet. This is just the first pull request.This implementation does the bare minimum that is (mostly) compatible with the current
samplyimplementation (I did have to make one small change that I would prefer to revert in time).What's implemented:
Future Work
The following are all considerations that I would like to handle before a
0.1.0crate release, but aren't necessarily critical functionality.Windows Support
Windows is currently stubbed out with
unimplemented!(). The implementation will need platform-specific equivalents for the Unix mmap-based approach.Note
Of all the topics, I'm most okay with releasing
0.1.0without this.I don't have a Windows dev machine to implement or test this on.
Runtime Detection
I would like to have runtime detection for emitting markers if possible. Perhaps samply itself could set
SAMPLY_MARKERS=1for the process to run, or if the user already hasSAMPLY_MARKERS=0in the current session, then Samply could preserve that.That would be a lot more ergonomic to use, since markers would either emit if samply is recording, or not if samply is not, and likely not cost much performance compared to the currently implemented compile-time disablement.
Extended Marker Data
I would like to allow
samply-markersto output more optional data that can take advantage of runtime marker schemas, but this will require more changes to Samply itself.Ideally, I'd like to redo the format and parsing of the marker files (which would be a breaking change).
I could do this in a backward-compatible, or perhaps I could make my own format that outputs to
samply-marker-{PID}-{TID}.txt.I'm not sure how important it is that the current format remains intact.
Known Issues
The following are all considerations that will need to be handled before a
0.1.0crate release. But I don't think they should block merging this PR, creating these as sub-issues, and getting this crate hooked up to CI for testing.Buffer Flushing
Currently, the marker-file buffers use thread-local storage (TLS) and ensure flushing on drop. This approach doesn't work for long-lived threads, such as those in Rayon's global thread pool. As a result, buffered marker data may never be written out for long-lived threads.
I really don't want to have to mutex-guard my writes to the buffers, but at this point, I can't think of a safe solution that isn't a centralized flush mechanism on the main thread.
Buffer Size
The current implementation has a 64 KiB buffer per thread to write markers, because it performed well in the single not-very-principled hyperfine benchmark that I ran. But 64 KiB feels like a lot. I'll definitely need to do more research and benchmarking into what is an ideal buffer size to have for each thread, both in terms of size and speed.
Ctrl+C Handling
This issue is related to buffer flushing. Whatever solution is developed for the flushing problem will likely be used here as well, but explicit handling of
Ctrl+Cwill still be needed to ensure that buffers are properly flushed before program termination.Attaching to a Process with Markers
Markers are emitted from the process that is being recorded, and this can make things look akward in the profiler UI when samply is attached to an already-running process. It means that the marker start time for some intervals could be earlier than the profiling start time.
I think the cleanest solution here would be to filter markers as they are added to the JSON, to ensure that any interval markers' start times are equal to or after the profiling start time.
This would require Samply to know the start time of the PID it just attached to. There may be system calls to find this information (I haven't looked too deeply yet), or if we revise the marker file format, we could ensure that every marker file includes a global first time-since-epoch-style timestamp that is guaranteed to be earlier than the first marker. This would allow Samply to reason about the marker timestamp data more easily.
@mstange, perhaps we could collaborate on this in a new issue once this stack is reviewed and (hopefully) landed.