Benchmark parameters I#56
Conversation
|
The build server cannot build this PR because the author is not whitelisted. A meatbag will have to look at it. This is an automated message. |
|
Thanks for putting this up so I can review it. I hope I can get some time on sunday. If you want you can rebase on devel now that I merged the other PR. |
The user may declare parameters by doing, at the top level:
```cpp
NONIUS_PARAM(name, <default_value>);
```
This declares a parameter. Parameters can be then passed to the `main`
program by passing arguments of the form:
```
-p <name>:<value>
```
The parameters have to be accessed from the chronometer. Since
parameters can have any type and are string-addressed, they can only be
finally parsed or unboxed and looked up in the benchmark. To stop users
from mistakenly adding that cost to their tests, they are forced to
extract the parameters before, from the chronometer. Of course they can
cheat by capturing the chronometer, but the documentation should guide
users in the right direction. A good example:
```
NONIUS_BENCHMARK("push_back vector", [](nonius::chronometer meter)
{
auto n = meter.param("size");
auto storage = std::vector<std::vector<char>>(meter.runs());
meter.measure([&](int i) {
auto& x = storage[i];
for (auto j = 0; j < n; ++j)
x.push_back(static_cast<char>(j));
});
})
```
The parameter name is now required to be a string already. There is no
fundamental reason why parameter names should be limited to identifier
names rules. The fact that `NONIUS_DETAIL_UNIQUE_NAME` already exists
is convenient here.
This changes syntax for the user from:
```cpp
NONIUS_PARAM(name, 42);
```
To:
```cpp
NONIUS_PARAM("name", 42)
```
Note the semicolon is not needed anymore either.
The syntax is:
```
-p <name>:<oper>:<init>:<delta>:<steps>
```
This tells that the benchmark should be run with `<steps>` different
values for parameter `<name>`. Each value is computed as:
```
next = last <oper> deta
```
Where `<oper>` may be `+` or `*` as for testing with linear or
exponential increments (the later is useful when checking logarithmic
algorithms or subtle non-linearities in the "constant factors" of an
implementation).
The old code was convoluted, with too many assertions and implicit assumptions. Instead of generating all the samples that we want to try in the argument parsing, we delay this logic to a more appropriate place later, without loss of information. The only drawback of the new configuration representation is that it may be less flexible, as the previous one allowed to easily add passing explicitly all the various parmeters to try. We can, though, easily add this later by adding more options to the configuration.
The functions `go_benchmark` and `go_param` where extracted, since the function had grown too big to my taste after the parameter running logic had been introduced. Also, the main loop has been simplified to avoid the need to `continue`, which is more structured IMHO.
We always store parameters generically as `std::string`. The user implicitly suggests a concrete value type with the type of the default value of the parameter. We use that type when we need to do arithmetic on the parameter and also when checking that it parses correctly when passed from the command line. We do this by casting the strings back to the original `T` using the `param_spec` interface. This might seem overkill. Another approach could be to type erase the concrete values once intially parsed from the command line, using a `boost::any`-like type erasure class. However that approach is less flexible, because it requires the user to provide the specific concrete type when calling `chronometer::param<T>()` and just a mistake in signedness or whatever could be fatal. Just parsing everything into and back from strings is more forgiving on numeric types, which is the most common case. This commit adds a new `example7` that where using ranged parameter would not work correctly previously.
222ae88 to
6ce1fa3
Compare
In the last commit, we broke parameters of types that do not support `+` and `*` operators. Some of these parameters might be useful, for example `std::string`. Now, parameters may or may not support these operations. When they do not support, the benchmarks will fail when we try to run the parameter through a range with those operations. Funny, we can do: ``` ./bin/examples/example6 -p string:+:hola-mundo:-yeah:10 ``` But we can't: ``` ./bin/examples/example6 -p string:*:hola-mundo:-oh,no!:10 ```
This commit refactors a lot of the code added in this branch. The
original goal was to reverse how `go()` runs through various
parameter values. Before it did like:
for each benchmark
for each parameter
run-benchmark()
Now it is more like
for each parameter
for each benchmark
run-benchmark()
This new structure should make writing reporters easier, since it better
fits how the data could be presented to the user in meaningful ways --
we'll see more of this in later commits.
A couple of more refactorings has sneaked in this commit:
- `param_map` and `param_registry` etc. are not in `detail::` anymore.
This is to better fit other parts of the library (e.g. benchmark,
reporter) that are not in `detail` and arguably at the same level of
abstraction from the user API POV.
- A couple of functions now have a more "data driven" approach, trying
to reuse previously computed data more often, as to improve the
overall performance.
The old API for benchmarks that have fixtures is like this:
```cpp
NONIUS_BENCHMARK([] (nonius::chronometer meter){
// setup code
m.measure([&] {
// benchmark code
})
});
```
That API is nice, but since it has to assume that the benchmark code can
always mutate the setup, the setup can not be reused across samples.
This is ok in general, but for benchmarks that have expensive setups
relative to their measured time, time might be wasted uneedesly.
Our alternative syntax is like:
```cpp
NONIUS_BENCHMARK([] (nonius::parameters meter){
// immutable setup code
return [=] {
// benchmark code
}
});
```
Or even further:
```cpp
NONIUS_BENCHMARK([] (nonius::parameters params){
// immutable setup code
return [=] (nonius::chronometer meter) {
// mutable setup code
meter.measure([&] {
// benchmark code
})
}
});
```
The idea is that when the benchmark function can take a `parameters`
argument, it is assumed to be a factory that then returns the actual
benchmark function. This factory function is called *only once* for a
given set of parameters (in `prepare(...)`) and the resulting benchmark
function is kept in the `execution_plan` and reused for collecting all
the samples.
Furthermore, this syntax may be more intuitive for people that need
parameters, but do not need a chronometer.
When trying to figure out how long to run a test to get a meaningful value, we may fall into an infinite loop when the whole benchmark has been optimized away. We now try to detect this case by hard-limiting the number of iterations we do in the loop. This partly solves libnonius#52. A complete solution involves properly deoptimizing the return value of the benchmark function.
The exception message would get mixed together with other lines of messages creating confusion.
Now all the logic of deciding which parameters to execute is contained in `generate_params`. We removed the "feature" that `parameters` looks up default values. This way the code we remove dependencies on global state, and make the feature more testable. `generate_params` now merges the default values with the fixed values with the values generated for a parameter run.
The command line parser would only look at the first instance of an
argument. This would prevent us from passing multiple parameters to the
benchmark runner. Examples that did not work but now do work:
```
# Would only set string=foo, discarding size
example6 -p string:foo -p size:42
```
```
# Would only run size:42,52,..., discarding string
example6 -p size:+:42:10:10 -p string:foo
```
|
Thanks! No problem with the timing, take your time. In the meantime I will be using the branch in the project that motivated the change, which might help polishing it. |
| NONIUS_PARAM("x", 1.0) | ||
|
|
||
| template <typename Fn> | ||
| struct volatilize_fn { |
There was a problem hiding this comment.
I suppose this is here because of #52, right? If it works, I'd say it should be part of what nonius does when running the code and not in the examples. Basically, I think that any such technique that can be applied generically should be built-in.
I have been researching a bit on optimizer control, and I put up some code for that (#57). If you could have a look at it, that'd be great.
|
It might be important for nonius to know which benchmarks parameters are used by each benchmark. As is, all benchmarks are repeated for all parameter values, even if they don't depend on them. This is fine if you group sets of benchmarks each in its own runner but not so nice if you put different ones together. I'm not sure how important such a use case would be, but I see having to split benchmarks across multiple runners as an inconvenience given that the runners have filtering capabilities. I'm not sure how to have that work, but maybe some kind of benchmark grouping is needed. Just bringing it up for discussion. |
|
@rmartinho with regard to grouping. I also thought of that. The best solution I thought of so far is to add a notion of "suite". Parameters can then be scoped to a suite. Furthermore, this syntax might make it easier to use parameter objects as proxy to query the parameters without runtime cost nor global state. Here is an example of how benchmarks could look like: NONIUS_SUITE("push_back", [] {
NONIUS_PARAM(N, std::size_t{10});
NONIUS_BENCHMARK("vector", [=] {
for (auto i = 0u; i < N; ++i)
...
})
NONIUS_BENCHMARK("string", [=] {
for (auto i = 0u; i < N; ++i)
...
})
})I was hesitant to implement that in this branch because it requires even deeper changes to the runner and registration system, possibly breaking compatibility. If you like it I can try implementing it to see how it feels... |
|
If I could throw my two cents in here... Y E S . Grouping is a BIG thing for me. I was going to implement it myself when I had time, because I was curious to see how well many of the library I benched performed when you scaled up a certain parameter, and when you compared it across libraries. Replacing the x-axis value of "sample #" for the default graph with instead a kind of "grouping A", "grouping B", and (for the HTML graph bits) doing a scatter-bar there in to represent all the samples of that grouping would be immensely helpful. (I ended up using matplotlib with python to create a fancier graph to do what I want: this is what it ended up looking like: http://sol2.readthedocs.io/en/latest/benchmarks.html) |
|
Let's try that on a separate branch (though you can branch off of this one if you want; this will definitely be merged). See issue #59. |
|
Cool. I just pushed a little fix on this branch for an old bug. I will move further experimenting with syntax to a new branch. |
| @@ -0,0 +1,176 @@ | |||
| // Nonius - C++ benchmarking tool | |||
| // | |||
| // Written in 2014 by Martinho Fernandes <martinho.fernandes@gmail.com> | |||
There was a problem hiding this comment.
Ha, I see you copied these from some existing stale ones. Since this is PD this notice is somewhat irrelevant, but if you want to claim credit for this file, you're welcome to do so. Otherwise I'll probably do a sweeping change in the repo to something in the spirit of "by the nonius contributors nonius@rmf.io"
There was a problem hiding this comment.
Hehe, yup!
I think long term it's easiest to put something like "by the nonius contributors nonius@rmf.io, see AUTHORS file." Then the AUTHORS file may list maintainers, contributors, etc.
Hi!
I managed to get started with the feature. There is work to be done, but I open the pull request in case you want to do an early review so I can change my approach if needed. Note that this branch depends on the changes in #54
Here's the progress:
Nice to have that I might not do:
:in string parameters.RFC @rmartinho