Benchmark parameters I by arximboldi · Pull Request #56 · libnonius/nonius

arximboldi · 2016-08-03T09:45:12Z

Hi!

I managed to get started with the feature. There is work to be done, but I open the pull request in case you want to do an early review so I can change my approach if needed. Note that this branch depends on the changes in #54

Here's the progress:

Benchmark writers can declare one or more parameters of any streamable type.
Benchmark parameters can be set to a value from the command line.
The benchmark runner can be told from the command line try a whole range (linear or exponential) of range values for one (and only one) parameter.
Improve the handling of parameter types.
Testing (I know, one should do that first, but the areas of the code I needed to touch did not have any tests yet so I just hacked my way through.)
Update the HTML reporter to show a graph with the different means and variances for the test run with different parameters.

Nice to have that I might not do:

Update documentation mentioning this feature.
Do something special for other reporters when trying a parameter range.
Improve parameter parsing on the command line so one can escape the separator : in string parameters.
Add runs through ranges of values over multiple parameters, using support for 3D and/or contour plots from Plotly.
Explicitly defining benchmark suites (capturing parameters at suite level for brevity)

RFC @rmartinho

rmartinho · 2016-08-03T09:45:19Z

The build server cannot build this PR because the author is not whitelisted. A meatbag will have to look at it.

This is an automated message.

rmartinho · 2016-08-04T16:56:37Z

Thanks for putting this up so I can review it. I hope I can get some time on sunday. If you want you can rebase on devel now that I merged the other PR.

The user may declare parameters by doing, at the top level: ```cpp NONIUS_PARAM(name, <default_value>); ``` This declares a parameter. Parameters can be then passed to the `main` program by passing arguments of the form: ``` -p <name>:<value> ``` The parameters have to be accessed from the chronometer. Since parameters can have any type and are string-addressed, they can only be finally parsed or unboxed and looked up in the benchmark. To stop users from mistakenly adding that cost to their tests, they are forced to extract the parameters before, from the chronometer. Of course they can cheat by capturing the chronometer, but the documentation should guide users in the right direction. A good example: ``` NONIUS_BENCHMARK("push_back vector", [](nonius::chronometer meter) { auto n = meter.param("size"); auto storage = std::vector<std::vector<char>>(meter.runs()); meter.measure([&](int i) { auto& x = storage[i]; for (auto j = 0; j < n; ++j) x.push_back(static_cast<char>(j)); }); }) ```

The parameter name is now required to be a string already. There is no fundamental reason why parameter names should be limited to identifier names rules. The fact that `NONIUS_DETAIL_UNIQUE_NAME` already exists is convenient here. This changes syntax for the user from: ```cpp NONIUS_PARAM(name, 42); ``` To: ```cpp NONIUS_PARAM("name", 42) ``` Note the semicolon is not needed anymore either.

The syntax is: ``` -p <name>:<oper>:<init>:<delta>:<steps> ``` This tells that the benchmark should be run with `<steps>` different values for parameter `<name>`. Each value is computed as: ``` next = last <oper> deta ``` Where `<oper>` may be `+` or `*` as for testing with linear or exponential increments (the later is useful when checking logarithmic algorithms or subtle non-linearities in the "constant factors" of an implementation).

The old code was convoluted, with too many assertions and implicit assumptions. Instead of generating all the samples that we want to try in the argument parsing, we delay this logic to a more appropriate place later, without loss of information. The only drawback of the new configuration representation is that it may be less flexible, as the previous one allowed to easily add passing explicitly all the various parmeters to try. We can, though, easily add this later by adding more options to the configuration.

The functions `go_benchmark` and `go_param` where extracted, since the function had grown too big to my taste after the parameter running logic had been introduced. Also, the main loop has been simplified to avoid the need to `continue`, which is more structured IMHO.

We always store parameters generically as `std::string`. The user implicitly suggests a concrete value type with the type of the default value of the parameter. We use that type when we need to do arithmetic on the parameter and also when checking that it parses correctly when passed from the command line. We do this by casting the strings back to the original `T` using the `param_spec` interface. This might seem overkill. Another approach could be to type erase the concrete values once intially parsed from the command line, using a `boost::any`-like type erasure class. However that approach is less flexible, because it requires the user to provide the specific concrete type when calling `chronometer::param<T>()` and just a mistake in signedness or whatever could be fatal. Just parsing everything into and back from strings is more forgiving on numeric types, which is the most common case. This commit adds a new `example7` that where using ranged parameter would not work correctly previously.

In the last commit, we broke parameters of types that do not support `+` and `*` operators. Some of these parameters might be useful, for example `std::string`. Now, parameters may or may not support these operations. When they do not support, the benchmarks will fail when we try to run the parameter through a range with those operations. Funny, we can do: ``` ./bin/examples/example6 -p string:+:hola-mundo:-yeah:10 ``` But we can't: ``` ./bin/examples/example6 -p string:*:hola-mundo:-oh,no!:10 ```

This commit refactors a lot of the code added in this branch. The original goal was to reverse how `go()` runs through various parameter values. Before it did like: for each benchmark for each parameter run-benchmark() Now it is more like for each parameter for each benchmark run-benchmark() This new structure should make writing reporters easier, since it better fits how the data could be presented to the user in meaningful ways -- we'll see more of this in later commits. A couple of more refactorings has sneaked in this commit: - `param_map` and `param_registry` etc. are not in `detail::` anymore. This is to better fit other parts of the library (e.g. benchmark, reporter) that are not in `detail` and arguably at the same level of abstraction from the user API POV. - A couple of functions now have a more "data driven" approach, trying to reuse previously computed data more often, as to improve the overall performance.

The old API for benchmarks that have fixtures is like this: ```cpp NONIUS_BENCHMARK([] (nonius::chronometer meter){ // setup code m.measure([&] { // benchmark code }) }); ``` That API is nice, but since it has to assume that the benchmark code can always mutate the setup, the setup can not be reused across samples. This is ok in general, but for benchmarks that have expensive setups relative to their measured time, time might be wasted uneedesly. Our alternative syntax is like: ```cpp NONIUS_BENCHMARK([] (nonius::parameters meter){ // immutable setup code return [=] { // benchmark code } }); ``` Or even further: ```cpp NONIUS_BENCHMARK([] (nonius::parameters params){ // immutable setup code return [=] (nonius::chronometer meter) { // mutable setup code meter.measure([&] { // benchmark code }) } }); ``` The idea is that when the benchmark function can take a `parameters` argument, it is assumed to be a factory that then returns the actual benchmark function. This factory function is called *only once* for a given set of parameters (in `prepare(...)`) and the resulting benchmark function is kept in the `execution_plan` and reused for collecting all the samples. Furthermore, this syntax may be more intuitive for people that need parameters, but do not need a chronometer.

When trying to figure out how long to run a test to get a meaningful value, we may fall into an infinite loop when the whole benchmark has been optimized away. We now try to detect this case by hard-limiting the number of iterations we do in the loop. This partly solves libnonius#52. A complete solution involves properly deoptimizing the return value of the benchmark function.

The exception message would get mixed together with other lines of messages creating confusion.

Now all the logic of deciding which parameters to execute is contained in `generate_params`. We removed the "feature" that `parameters` looks up default values. This way the code we remove dependencies on global state, and make the feature more testable. `generate_params` now merges the default values with the fixed values with the values generated for a parameter run.

The command line parser would only look at the first instance of an argument. This would prevent us from passing multiple parameters to the benchmark runner. Examples that did not work but now do work: ``` # Would only set string=foo, discarding size example6 -p string:foo -p size:42 ``` ``` # Would only run size:42,52,..., discarding string example6 -p size:+:42:10:10 -p string:foo ```

arximboldi · 2016-08-08T14:41:20Z

Thanks! No problem with the timing, take your time. In the meantime I will be using the branch in the project that motivated the change, which might help polishing it.

rmartinho · 2016-08-09T09:05:10Z

+NONIUS_PARAM("x", 1.0)
+
+template <typename Fn>
+struct volatilize_fn {


I suppose this is here because of #52, right? If it works, I'd say it should be part of what nonius does when running the code and not in the examples. Basically, I think that any such technique that can be applied generically should be built-in.

I have been researching a bit on optimizer control, and I put up some code for that (#57). If you could have a look at it, that'd be great.

rmartinho · 2016-08-09T11:46:04Z

It might be important for nonius to know which benchmarks parameters are used by each benchmark. As is, all benchmarks are repeated for all parameter values, even if they don't depend on them. This is fine if you group sets of benchmarks each in its own runner but not so nice if you put different ones together. I'm not sure how important such a use case would be, but I see having to split benchmarks across multiple runners as an inconvenience given that the runners have filtering capabilities.

I'm not sure how to have that work, but maybe some kind of benchmark grouping is needed. Just bringing it up for discussion.

arximboldi · 2016-08-09T12:21:50Z

@rmartinho with regard to grouping. I also thought of that. The best solution I thought of so far is to add a notion of "suite". Parameters can then be scoped to a suite. Furthermore, this syntax might make it easier to use parameter objects as proxy to query the parameters without runtime cost nor global state. Here is an example of how benchmarks could look like:

NONIUS_SUITE("push_back", [] {
    NONIUS_PARAM(N, std::size_t{10});
    NONIUS_BENCHMARK("vector", [=] {
        for (auto i = 0u; i < N; ++i) 
              ...
    })
    NONIUS_BENCHMARK("string", [=] {
        for (auto i = 0u; i < N; ++i) 
             ...
    })
})

I was hesitant to implement that in this branch because it requires even deeper changes to the runner and registration system, possibly breaking compatibility. If you like it I can try implementing it to see how it feels...

ThePhD · 2016-08-09T13:07:22Z

If I could throw my two cents in here...

Y E S .

Grouping is a BIG thing for me. I was going to implement it myself when I had time, because I was curious to see how well many of the library I benched performed when you scaled up a certain parameter, and when you compared it across libraries. Replacing the x-axis value of "sample #" for the default graph with instead a kind of "grouping A", "grouping B", and (for the HTML graph bits) doing a scatter-bar there in to represent all the samples of that grouping would be immensely helpful.

(I ended up using matplotlib with python to create a fancier graph to do what I want: this is what it ended up looking like: http://sol2.readthedocs.io/en/latest/benchmarks.html)

rmartinho · 2016-08-09T13:57:03Z

Let's try that on a separate branch (though you can branch off of this one if you want; this will definitely be merged). See issue #59.

arximboldi · 2016-08-09T14:44:51Z

Cool. I just pushed a little fix on this branch for an old bug. I will move further experimenting with syntax to a new branch.

rmartinho · 2016-08-12T10:20:26Z

@@ -0,0 +1,176 @@
+// Nonius - C++ benchmarking tool
+//
+// Written in 2014 by Martinho Fernandes <martinho.fernandes@gmail.com>


Ha, I see you copied these from some existing stale ones. Since this is PD this notice is somewhat irrelevant, but if you want to claim credit for this file, you're welcome to do so. Otherwise I'll probably do a sweeping change in the repo to something in the spirit of "by the nonius contributors nonius@rmf.io"

Hehe, yup!

I think long term it's easiest to put something like "by the nonius contributors nonius@rmf.io, see AUTHORS file." Then the AUTHORS file may list maintainers, contributors, etc.

arximboldi force-pushed the parameters branch from 91490c2 to 0729a7f Compare August 3, 2016 18:30

arximboldi force-pushed the parameters branch from 0e218a5 to b88d81e Compare August 4, 2016 17:57

arximboldi added 8 commits August 5, 2016 11:39

Define basic API for declaring and using parameters

376c5a2

Run through a parameter range when the option is enabled

77a4a26

arximboldi force-pushed the parameters branch 5 times, most recently from 222ae88 to 6ce1fa3 Compare August 6, 2016 12:07

arximboldi added 11 commits August 6, 2016 14:08

Rename param_map to parameters

ccaaaba

Improve output of standard_reporter on error

cb0f2a4

The exception message would get mixed together with other lines of messages creating confusion.

Test changes in detail::benchmark_function

be91a3e

Test new parameters API

fa1f10f

Test parameter run generation

cacc15b

arximboldi force-pushed the parameters branch from 6ce1fa3 to a3cb512 Compare August 6, 2016 12:11

arximboldi changed the title ~~[WIP] Benchmark parameters~~ Benchmark parameters Aug 6, 2016

arximboldi added 2 commits August 8, 2016 18:50

Fix missing override warning

c5b48e6

Fix large numbers would be truncated incorrectly in HTML reporter

ebb807a

rmartinho reviewed Aug 9, 2016
View reviewed changes

rmartinho added this to the Parameterized benchmarking milestone Aug 9, 2016

rmartinho self-assigned this Aug 9, 2016

rmartinho modified the milestones: Boostless, Parameterized benchmarking, RELEASE: 1.2 Aug 9, 2016

arximboldi mentioned this pull request Aug 10, 2016

Benchmark parameters II (API improvements) #63

Merged

arximboldi changed the title ~~Benchmark parameters~~ Benchmark parameters I Aug 10, 2016

arximboldi force-pushed the parameters branch from 4023d3e to ebb807a Compare August 11, 2016 08:21

rmartinho mentioned this pull request Aug 11, 2016

Document parameters #64

Open

rmartinho reviewed Aug 12, 2016
View reviewed changes

rmartinho merged commit ebb807a into libnonius:devel Sep 1, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark parameters I#56

Benchmark parameters I#56
rmartinho merged 22 commits into
libnonius:develfrom
arximboldi:parameters

arximboldi commented Aug 3, 2016 •

edited

Loading

Uh oh!

rmartinho commented Aug 3, 2016

Uh oh!

rmartinho commented Aug 4, 2016

Uh oh!

arximboldi commented Aug 8, 2016

Uh oh!

rmartinho Aug 9, 2016 •

edited

Loading

Uh oh!

rmartinho commented Aug 9, 2016 •

edited

Loading

Uh oh!

arximboldi commented Aug 9, 2016 •

edited

Loading

Uh oh!

ThePhD commented Aug 9, 2016

Uh oh!

rmartinho commented Aug 9, 2016 •

edited

Loading

Uh oh!

arximboldi commented Aug 9, 2016

Uh oh!

rmartinho Aug 12, 2016 •

edited

Loading

Uh oh!

arximboldi Aug 14, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

arximboldi commented Aug 3, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rmartinho commented Aug 3, 2016

Uh oh!

rmartinho commented Aug 4, 2016

Uh oh!

arximboldi commented Aug 8, 2016

Uh oh!

rmartinho Aug 9, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rmartinho commented Aug 9, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arximboldi commented Aug 9, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ThePhD commented Aug 9, 2016

Uh oh!

rmartinho commented Aug 9, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arximboldi commented Aug 9, 2016

Uh oh!

rmartinho Aug 12, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arximboldi Aug 14, 2016

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

arximboldi commented Aug 3, 2016 •

edited

Loading

rmartinho Aug 9, 2016 •

edited

Loading

rmartinho commented Aug 9, 2016 •

edited

Loading

arximboldi commented Aug 9, 2016 •

edited

Loading

rmartinho commented Aug 9, 2016 •

edited

Loading

rmartinho Aug 12, 2016 •

edited

Loading