tachometer

tachometer is a tool for running benchmarks in web browsers. It uses repeated sampling and statistics to reliably identify even tiny differences in runtime.

Install

npm i tachometer

Usage

npx tachometer bench1.html [bench2.html ...]

Why?

Even if you run the same JavaScript, on the same browser, on the same machine, on the same day, you'll still get a different result every time. But if you take enough repeated samples and apply the right statistics, you can reliably identify even tiny differences in runtime.

Example

Let's test two approaches for adding elements to a page. First create two HTML files:

inner.html

<script type="module">
  import * as bench from '/bench.js';
  bench.start();
  for (let i = 0; i < 100; i++) {
    document.body.innerHTML += '<button></button>';
  }
  bench.stop();
</script>

append.html

<script type="module">
  import * as bench from '/bench.js';
  bench.start();
  for (let i = 0; i < 100; i++) {
    document.body.append(document.createElement('button'));
  }
  bench.stop();
</script>

Now run tachometer:

npx tachometer append.html inner.html

Tachometer opens Chrome and loads each HTML file, measuring the time between bench.start() and bench.stop(). It round-robins between the two files, running each at least 50 times.

[==============================================------------] 79/100 chrome append.html

After a few seconds, the results are ready:

┌─────────────┬─────────────────┬─────────────────┬─────────────────┐
│ Benchmark   │        Avg time │   vs inner.html │  vs append.html │
├─────────────┼─────────────────┼─────────────────┼─────────────────┤
│ inner.html  │ 7.23ms - 8.54ms │                 │          slower │
│             │                 │        -        │    851% - 1091% │
│             │                 │                 │ 6.49ms - 7.80ms │
├─────────────┼─────────────────┼─────────────────┼─────────────────┤
│ append.html │ 0.68ms - 0.79ms │          faster │                 │
│             │                 │       90% - 92% │        -        │
│             │                 │ 6.49ms - 7.80ms │                 │
└─────────────┴─────────────────┴─────────────────┴─────────────────┘

This tells us that using the document.body.append approach instead of the innerHTML approach would be between 90% and 92% faster on average. The ranges tachometer reports are 95% confidence intervals for the percent change from one benchmark to another. See Interpreting results for more information.

Features

  • Measure your own specific timings with the /bench.js module, by setting the window.tachometerResult global (or by polling an arbitrary JS expression), or measure First Contentful Paint on any local or remote URL.

  • Compare benchmarks by round-robin between two or more files, URLs, URL query string parameters, or browsers, to measure which is faster or slower, and by how much, with statistical significance.

  • Swap dependency versions of any NPM package you depend on, to compare published versions, remote GitHub branches, or local git repos.

  • Automatically sample until we have enough precision to answer the question you are asking.

  • Remote control browsers running on different machines using remote WebDriver.

Sampling

Minimum sample size

By default, a minimum of 50 samples are taken from each benchmark. You can change the minimum sample size with the --sample-size flag or the sampleSize JSON config option.

Auto sample

After the initial 50 samples, tachometer will continue taking samples until there is a clear statistically significant difference between all benchmarks, for up to 3 minutes.

You can change this duration with the --timeout flag or the timeout JSON config option, measured in minutes. Set --timeout=0 to disable auto sampling entirely. Set --timeout=60 to sample for up to an hour.

Auto sample conditions

You can also configure which statistical conditions tachometer should check for when deciding when to stop auto sampling by configuring auto sample conditions.

To set auto sample conditions from the command-line, use the --auto-sample-conditions flag with a comma-delimited list:

--auto-sample-conditions=0%,10%

To set auto sample conditions from a JSON config file, use the autoSampleConditions property with an array of strings (including if there is only one condition):

{
  "autoSampleConditions": ["0%", "10%"]
}

An auto sample condition can be thought of as a point of interest on the number-line of either absolute milliseconds, or relative percent change. By setting a condition, you are asking tachometer to try to shrink the confidence interval until it is unambiguously placed on one side or the other of that condition.

Example condition   Question
0%                  Is A faster or slower than B at all? (The default)
10%                 Is A faster or slower than B by at least 10%?
+10%                Is A slower than B by at least 10%?
-10%                Is A faster than B by at least 10%?
-10%, +10%          (Same as 10%)
0%, 10%, 100%       Is A at all, a little, or a lot slower or faster than B?
0.5ms               Is A faster or slower than B by at least 0.5 milliseconds?

In the following example, we have set --auto-sample-conditions=10%, meaning we are interested in knowing whether A differs from B by at least 10% in either direction. The sample size automatically increases until the confidence interval is narrow enough to place the estimated difference squarely on one side or the other of both conditions.

      <------------------------------->     n=50  X -10% X +10%
                <------------------>        n=100 ✔️ -10% X +10%
                    <----->                 n=200 ✔️ -10% ✔️ +10%

  |---------|---------|---------|---------| difference in runtime
-20%      -10%        0       +10%      +20%

n    = sample size
<--> = confidence interval for percent difference of mean runtimes
✔️    = resolved condition
X    = unresolved condition

In this example, at n=50 we are not sure whether A is faster or slower than B by more than 10%. By n=100 we have ruled out that B is faster than A by more than 10%, but we're still not sure whether it's slower by more than 10%. By n=200 we have also ruled out that B is slower than A by more than 10%, so we stop sampling. Note that we still don't know which is faster in absolute terms; we only know that whatever the difference is, it is smaller than 10% in either direction (and if we did want to know, we could add 0 to our conditions).

Note that, if the actual difference is very close to a condition, then it is likely that the condition will never be met, and the timeout will expire.
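
The stopping rule illustrated above can be sketched in a few lines. This is a minimal illustration, not tachometer's implementation: the function names isResolved and allResolved are hypothetical, and the three intervals are made-up values chosen to resemble the rows of the diagram.

```javascript
// Sketch only: tachometer's real stopping logic may differ.
// Intervals and conditions are plain numbers, in percent.

// A condition (a point on the number line) is resolved once the
// confidence interval lies entirely on one side of it.
function isResolved(interval, condition) {
  return interval.high < condition || interval.low > condition;
}

// Sampling can stop when every condition is resolved.
function allResolved(interval, conditions) {
  return conditions.every((c) => isResolved(interval, c));
}

// "10%" expands to the two points -10 and +10.
const conditions = [-10, 10];

// Hypothetical intervals shaped like the three rows of the diagram.
const byN = {
  50: {low: -16, high: 16},
  100: {low: -6, high: 13},
  200: {low: -2, high: 4},
};

for (const [n, interval] of Object.entries(byN)) {
  console.log(`n=${n}`, allResolved(interval, conditions));
}
// prints: n=50 false, n=100 false, n=200 true
```

At n=100 the -10% point is already resolved but +10% is not, so sampling would continue, which matches the middle row of the diagram.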

Measurement modes

Tachometer supports six modes of measurement (four for time, one for memory, one for CPU), controlled with the measurement config file property, or the --measure flag.

If measurement is an array, then all of the given measurements will be retrieved from each page load. Each measurement from a page is treated as its own benchmark.

A measurement can specify a name property that will be used to display its results.

Performance API

Retrieve a measure, mark, or paint timing from the performance.getEntriesByName API. Note this mode can only be used with a config file.

For example, in your benchmark:

performance.mark('foo-start');
// Do some work ...
performance.mark('foo-stop');
performance.measure('foo', 'foo-start', 'foo-stop');

And in your config file:

"benchmarks": [
  {
    "measurement": {
      "mode": "performance",
      "entryName": "foo"
    }
  }
]

The following performance entry types are supported:

  • measure: Retrieve the duration of a user-defined interval between two marks. Use for measuring the timing of a specific chunk of your code.
  • mark: Retrieve the startTime of a user-defined instant. Use for measuring the time between initial page navigation and a specific point in your code.
  • paint: Retrieve the startTime of a built-in paint measurement (e.g. first-contentful-paint).
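
To see which field each entry type exposes, the entries can be read back with the standard Performance API. The snippet below is only a sketch of what such a lookup might look like; how tachometer itself reads the entry is not shown here, and the loop is a placeholder workload.

```javascript
// Record two marks around a placeholder workload, then a measure between them.
performance.mark('foo-start');
let sum = 0;
for (let i = 0; i < 1e5; i++) {
  sum += i;
}
performance.mark('foo-stop');
performance.measure('foo', 'foo-start', 'foo-stop');

// A measure entry carries a duration; a mark entry carries a startTime.
const [measureEntry] = performance.getEntriesByName('foo');
const [markEntry] = performance.getEntriesByName('foo-stop');

console.log(measureEntry.entryType, measureEntry.duration);
console.log(markEntry.entryType, markEntry.startTime);
```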

Callback

By default with local (non-URL) benchmarks, or when the --measure flag is set to callback, your page is responsible for calling the start() and stop() functions from the /bench.js module. This mode is appropriate for micro benchmarks, or any other kind of situation where you want full control over the beginning and end times.

Global result

When the --measure flag is set to global, then you can assign an arbitrary millisecond result to the window.tachometerResult global. In this mode, tachometer will poll until it finds a result assigned here.

const start = performance.now();
for (let i = 0; i < 1000; i++) {}
window.tachometerResult = performance.now() - start;

This mode is appropriate when you need full control of the measured time, or when you can't use callback mode because you are not using tachometer's built-in server.

Alternatively, to poll an arbitrary JS expression in global measurement mode (rather than window.tachometerResult), set --measurement-expression to the JS expression to poll. This option is useful for scenarios where you cannot easily modify the code under test to assign to window.tachometerResult but are otherwise able to extract a measurement from the page using JavaScript.
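
As an illustration, a page might already publish its own timing on some application object. In the sketch below, the names myMetrics and renderMs are hypothetical, and runWorkload stands in for the code under test; a run could then pass --measure=global together with --measurement-expression set to an expression such as window.myMetrics.renderMs.

```javascript
// Hypothetical page code: `myMetrics` and `renderMs` are illustrative
// names, not part of tachometer.
function runWorkload() {
  let total = 0;
  for (let i = 0; i < 1000; i++) {
    total += i;
  }
  return total;
}

const start = performance.now();
const result = runWorkload();

// globalThis is `window` in a browser page, so the value is reachable
// as window.myMetrics.renderMs from a polled expression.
globalThis.myMetrics = {renderMs: performance.now() - start, result};
```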

First Contentful Paint (FCP)

When the --measure flag is set to fcp, or when the benchmark is an external URL, then the First Contentful Paint (FCP) time will be automatically extracted from your page using the Performance Timeline API. This interval begins at initial navigation, and ends when the browser first renders any DOM content. Currently, only Chrome supports the first-contentful-paint performance timeline entry. In this mode, calling the start() and stop() functions is not required, and has no effect.

Memory (Chromium memory-infra)

When the --measure flag is set to memory, or when a config-file measurement object has "mode": "memory", tachometer captures a memory dump from Chromium's memory-infra tracing subsystem at the end of each sample and auto-discovers every (processRole, allocator, attribute) tuple present in the dump. Each tuple becomes its own measurement and is statistically compared between variants the same way timing results are. There are no per-metric or per-process knobs - tachometer just reports on every category Chromium emits.

Memory measurement is Chromium-only (chrome, edge); using it with Firefox/Safari/IE will produce a clear error at startup.

Configuration:

"benchmarks": [
  {
    "measurement": {
      "mode": "memory",
      "dumpLevel": "detailed",
      "gcBefore": true
    }
  }
]
  • dumpLevel: light or detailed (default).
  • gcBefore — when true (default), force a garbage collection via the DevTools HeapProfiler.collectGarbage command before capturing the dump. This significantly reduces noise on V8 heap measurements.
  • maxAllocatorDepth — when set, drops allocators whose /-separated path depth exceeds the given value. malloc has depth 1, malloc/partitions has depth 2, malloc/partitions/allocator/buckets/bucket_0000016 has depth 5. Chromium memory-infra reports parent allocators as the sum of their children's roll-up attributes (size, effective_size, allocated_size, …), so dropping deeper paths preserves the high-level picture while collapsing per-bucket and per-sub-arena noise. A typical detailed dump produces 20,000+ rows; setting maxAllocatorDepth: 2 typically cuts that to a few hundred while keeping every top-level category. Attributes that only exist at leaves (e.g. per-bucket fragmentation) are skipped entirely when this is set. Default: unlimited.
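
The depth rule for maxAllocatorDepth can be sketched as follows. The helper names allocatorDepth and filterByDepth are hypothetical and only illustrate the counting described above; they are not tachometer's internal functions.

```javascript
// Depth is the number of /-separated segments in the allocator path.
function allocatorDepth(allocator) {
  return allocator.split('/').length;
}

// Keep only allocators whose depth does not exceed the cutoff.
function filterByDepth(allocators, maxAllocatorDepth) {
  return allocators.filter((a) => allocatorDepth(a) <= maxAllocatorDepth);
}

const allocators = [
  'malloc',
  'malloc/partitions',
  'malloc/partitions/allocator/buckets/bucket_0000016',
];

console.log(allocators.map(allocatorDepth)); // [ 1, 2, 5 ]
console.log(filterByDepth(allocators, 2)); // [ 'malloc', 'malloc/partitions' ]
```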

Equivalent CLI flags: --measure=memory, --memory-dump-level=<light|detailed>, --memory-gc-before-dump=<true|false>, and --memory-max-allocator-depth=<N>.

Attribute filtering. Memory-infra emits dozens of attributes per allocator: bytes (size, effective_size, allocated_size, …), counters (object_count, alloc_count, …), pool bookkeeping (regular_pool_usage, brp_pool_largest_reservation, …), boolean flags (is_peak_rss_resettable, is_prepaint, …), and derived rates (syscalls_per_minute, brp_quarantined_bytes_per_minute, …). Tachometer tracks only the bytes-focused subset by default:

  • size — the primary "bytes allocated for this dump" metric every named allocator reports.
  • effective_size — bytes attributed after sharing is split across owners (the "fair share" view of memory).
  • process_totals.peak_resident_set_size, process_totals.private_footprint_bytes — process-level RSS-style attributes (in bytes) on the platforms that report them. In current Chromium these are the only process_totals attributes exposed via the Tracing.requestMemoryDump trace path. resident_set_bytes is also tracked for forward/cross-browser compatibility, but Chromium does not currently emit it under process_totals in the trace (the non-peak resident-set value is only available through a separate memory-instrumentation API that tachometer does not use), so it normally does not appear in results.

Everything else is dropped during enumeration. This is hard-coded because the dropped attributes don't answer a memory-cost question; they only add noise to the result table.

How auto-discovery works. Before any recorded sample is taken, tachometer opens each benchmark page once, captures a probe dump, and enumerates every (processRole, allocator, attribute) tuple in it. The union of tuples across every variant becomes the result set used for the whole run, so two variants of the same benchmark always produce the same row set and can be compared directly. Once probing is done, tachometer takes its regular samples; each sample fires a single Tracing.requestMemoryDump, and every discovered tuple is read out of that one dump.

Result rows. Each row is labelled <bench> [memory:<processRole>:<allocator>.<attribute>]. Process roles come straight from Chromium's process_name metadata, lowercased (e.g. renderer, browser, gpu process, service: network.mojom.networkservice). When multiple processes share the same role (e.g. several renderers), tachometer sums their values for that tuple. A category that is present in one variant but missing in another is recorded as 0 for the missing variant, with one caveat: when the baseline value for a comparison is exactly 0, the relative-percent column is rendered as n/a (division by zero is undefined); the absolute byte delta is still shown.
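
A rough sketch of the summing and n/a rules described above, assuming a simplified input shape: the readings array, its field names, and the helpers sumByTuple and relativeChange are all hypothetical and exist only for illustration.

```javascript
// Hypothetical per-process readings; two renderers share the same role.
const readings = [
  {role: 'renderer', allocator: 'malloc', attribute: 'size', value: 1000},
  {role: 'renderer', allocator: 'malloc', attribute: 'size', value: 500},
  {role: 'browser', allocator: 'malloc', attribute: 'size', value: 2000},
];

// Sum values for processes that share a role, keyed like the row labels.
function sumByTuple(entries) {
  const totals = new Map();
  for (const {role, allocator, attribute, value} of entries) {
    const key = `memory:${role}:${allocator}.${attribute}`;
    totals.set(key, (totals.get(key) ?? 0) + value);
  }
  return totals;
}

// Relative change is undefined when the baseline is exactly 0.
function relativeChange(value, baseline) {
  if (baseline === 0) {
    return 'n/a';
  }
  return ((value - baseline) * 100) / baseline;
}

const totals = sumByTuple(readings);
console.log(totals.get('memory:renderer:malloc.size')); // 1500
console.log(relativeChange(4096, 0)); // n/a
```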

Comparison grouping. Each auto-discovered category only compares against the same category across variants. There is no renderer:malloc.size vs browser:v8/main/heap.size cross-pair noise in the table.

Memory results are rendered with units of B/KiB/MiB/GiB (depending on magnitude) instead of ms. The auto-sample condition syntax accepts byte suffixes too — e.g. --auto-sample-conditions=0KiB,+10KiB,-1% will stop sampling once the absolute byte difference is well-resolved at the 10 KiB boundary or once the relative difference is resolved at 1 %. Supported byte suffixes are B, KiB, MiB, and GiB. Byte and millisecond absolute conditions are partitioned by unit: a +1KiB condition only applies to memory results and a +0.1ms condition only applies to timing results, so mixed benchmark suites can share a single condition list safely.
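
One way to picture the unit partitioning is a small parser that tags each condition with its unit. This is a hypothetical sketch, not tachometer's parser: the function name parseCondition, the returned object shape, and the regular expression are assumptions, and only the suffixes named above are handled.

```javascript
const BYTE_FACTORS = {B: 1, KiB: 1024, MiB: 1024 ** 2, GiB: 1024 ** 3};

// Parse one condition such as "10%", "+0.5ms", or "-1KiB".
function parseCondition(text) {
  const pattern = /^([+-]?)(\d+(?:\.\d+)?)(%|ms|B|KiB|MiB|GiB)$/;
  const match = pattern.exec(text.trim());
  if (match === null) {
    throw new Error(`Unrecognized condition: ${text}`);
  }
  const [, sign, digits, suffix] = match;
  const magnitude = Number(digits);
  // An unsigned, non-zero condition means "in either direction".
  let points;
  if (magnitude === 0) {
    points = [0];
  } else if (sign === '') {
    points = [-magnitude, magnitude];
  } else {
    points = [sign === '-' ? -magnitude : magnitude];
  }
  if (suffix === '%') {
    return {kind: 'relative', unit: '%', points};
  }
  if (suffix === 'ms') {
    return {kind: 'absolute', unit: 'ms', points};
  }
  // Byte suffixes are normalized to plain bytes.
  const bytes = points.map((p) => p * BYTE_FACTORS[suffix]);
  return {kind: 'absolute', unit: 'B', points: bytes};
}

const parsed = '0KiB,+10KiB,-1%,+0.1ms'.split(',').map(parseCondition);

// Memory rows ignore ms conditions; timing rows ignore byte conditions.
const forMemory = parsed.filter((c) => c.unit !== 'ms');
const forTiming = parsed.filter((c) => c.unit !== 'B');
console.log(forMemory.length, forTiming.length); // 3 2
```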

A single benchmark can combine timing and memory measurements by passing an array to measurement — both are collected from the same page load, and each is statistically compared independently. Only one mode: "memory" entry is allowed per benchmark (auto-discovery already covers every category, so duplicates would be redundant; tachometer fails at startup with a clear error if more than one is declared).

Result rows are filtered by a curated rule list

Memory-infra emits hundreds of (processRole, allocator, attribute) tuples per dump. The raw list is noisy and shifts between Chromium versions, which makes runs hard to compare. Tachometer ships with a hard-coded curated rule list that filters and aggregates the discovered tuples into a focused, stable set of result rows:

  • Renderer bytes-level totals for each top-level subsystem: renderer:blink_gc.size/.effective_size, renderer:malloc.size/.effective_size, renderer:v8.size/.effective_size, renderer:partition_alloc.size/.effective_size.
  • Renderer OS-level RSS when the platform reports it: renderer:process_totals.peak_resident_set_size, renderer:process_totals.private_footprint_bytes.
  • Service processes (NetworkService, StorageService, TracingService, …) collapsed into one summed row each: memory:sum:all-services-malloc-size and memory:sum:all-services-malloc-effective-size.
  • Browser process malloc totals as a coarse indicator.

This is intentionally not user-configurable. Every benchmark in every repo gets the same rows, so results are directly comparable without per-config bikeshedding. The full list is the memoryDefaultCategories constant in src/defaults.ts — if a category you care about is missing for a real benchmark, propose a change there.

Iterating on the rule list is supported via the diagnostic JSON output: pass --memory-categories-file=<path>.json to write a report describing what the run actually did. The file has:

  • config — the focused-default filters that ran before the rule list (maxAllocatorDepth, tracked attributes).
  • discovered.tuples — every (processRole, allocator, attribute) tuple memory-infra emitted (after the focused-default filters). Plus a perSpec count so you can see when one variant discovers more than another.
  • output.tupleRows and output.aggregateRows — the rows that made it into the result table. Each aggregate lists its sources (the exact tuples that contributed to the sum).
  • excludedByRule — per-exclude pattern, the tuples it removed.
  • droppedNoMatch — tuples that were discovered but matched no include rule. Skim this list to find rows worth proposing for the baked-in memoryDefaultCategories.
  • optionalRulesWithNoMatches: include patterns flagged optional: true that didn't match anything this run (some rules in the baked list are conditional on platform / Chromium version).

The file is tiny (kilobytes) and written every time the flag is set; you can check one in next to a benchmark config and diff across runs to spot Chromium emission changes.

CPU time (Chromium)

When the --measure flag is set to cpu, or when a config-file measurement object has "mode": "cpu", tachometer measures main-thread renderer CPU time using the Chrome DevTools Protocol Performance.getMetrics API (enabled in threadTicks time domain). Like memory, this is Chromium-only (chrome / edge).

{
  "benchmarks": [
    {
      "url": "my-benchmark.html",
      "measurement": ["callback", "cpu"]
    }
  ]
}

CPU is a companion metric: it has no completion signal of its own, so it snapshots its value whenever a paired timing measurement (callback, fcp, or a global/expression poll) finishes. A cpu measurement therefore requires at least one timing measurement in the same benchmark:

  • --measure=cpu on the CLI automatically expands to [<the url's default timing measurement>, cpu].
  • A bare "measurement": "cpu" (or ["cpu"]) in a config file has the url-appropriate default timing measurement injected ahead of it.
  • Otherwise list them explicitly, e.g. "measurement": ["callback", "cpu"].

The reported value for each sample is the delta between two cumulative counter snapshots: a baseline taken on the fresh about:blank tab before navigation, and an end snapshot taken when the timing companion completes. Capturing the baseline before navigation makes the measurement load-inclusive — synchronous page-load work (HTML parsing, synchronous <script>s, and the initial style/layout) is counted, not just post-load activity. (Enabling the counters after load, as earlier versions did, zeroed them once load had finished and made sync/load-bound benchmarks read ~0.) CPU is most meaningful with a callback companion, which fires when your benchmark signals it is done — fcp fires before the page finishes loading, leaving a near-empty measurement window.

A single cpu measurement auto-expands into one result row per discovered sub-metric, each statistically compared independently:

  • cpu:mainThread:TaskDuration — total main-thread task time.
  • cpu:mainThread:ScriptDuration — time running JavaScript.
  • cpu:mainThread:RecalcStyleDuration — style recalculation.
  • cpu:mainThread:LayoutDuration — layout.
  • cpu:mainThread:V8CompileDuration — V8 compilation.

(The exact set is the intersection of the curated cpuDefaultMetrics list in src/defaults.ts with what the running Chromium build emits.)
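
The delta-and-intersection idea can be sketched as below. The snapshot objects and their values are hypothetical, the curatedMetrics array simply repeats the five names listed above rather than reading cpuDefaultMetrics, and the helper cpuDeltas is illustrative only.

```javascript
// The five sub-metric names listed above (a stand-in for cpuDefaultMetrics).
const curatedMetrics = [
  'TaskDuration',
  'ScriptDuration',
  'RecalcStyleDuration',
  'LayoutDuration',
  'V8CompileDuration',
];

// Subtract the baseline snapshot from the end snapshot for every curated
// metric that both snapshots actually contain.
function cpuDeltas(baseline, end) {
  const rows = {};
  for (const name of curatedMetrics) {
    if (!(name in baseline) || !(name in end)) {
      continue; // not emitted by this (hypothetical) browser build
    }
    rows[`cpu:mainThread:${name}`] = end[name] - baseline[name];
  }
  return rows;
}

// Hypothetical cumulative counters before navigation and at completion.
const baseline = {TaskDuration: 5, ScriptDuration: 1, LayoutDuration: 0};
const end = {TaskDuration: 45, ScriptDuration: 21, LayoutDuration: 4, Nodes: 120};

console.log(cpuDeltas(baseline, end));
```

Metrics outside the curated list (Nodes here) and curated metrics missing from the snapshots produce no row.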

Caveats:

  • Main-thread renderer only. Web/Service Workers, the compositor and raster threads, GPU process, network service, and browser process are all excluded.
  • Sub-metrics overlap and are NOT additive. ScriptDuration, LayoutDuration, etc. are slices that can nest inside TaskDuration; don't sum them.
  • Unit is ms. CPU rows share the ms unit (and +1ms-style auto-sample conditions) with wall-clock results, but their cpu:mainThread: compare key keeps them from being compared against wall-clock timing rows.
  • Cannot be combined with memory in the same benchmark: the memory dump and its forced garbage collection consume main-thread CPU and would contaminate the measurement. Measure them in separate runs.
  • CPU and timing windows differ. Because the CPU baseline is taken before navigation (load-inclusive) while a page's own timing expression typically starts at some in-page t0, the CPU window is usually wider than the timing window. A CPU sub-metric can therefore read slightly higher than a page-reported duration — they cover different intervals, so don't expect them to match exactly.

Equivalent CLI flag: --measure=cpu.

Interpreting results

Average runtime

The first column of output is the average runtime of the benchmark. This is a 95% confidence interval for the number of milliseconds that elapsed during the benchmark. When you run only one benchmark, this is the only output.

Difference table

When you run multiple benchmarks together, you'll get an NxN table summarizing all of the differences in runtimes, both in absolute and relative terms (percent-change).

In this example screenshot we're comparing for loops, each running with a different number of iterations (1, 1000, 1001, and 3000):

This table tells us:

  • 1 iteration was between 65% and 73% faster than 1000 iterations.

  • 1000 iterations was between 179% and 263% slower than 1 iteration. Note that the difference between 1-vs-1000 and 1000-vs-1 is the choice of which runtime is used as the reference in the percent-change calculation, where the reference runtime comes from the column labeled "vs X".

  • The difference between 1000 and 1001 iterations was ambiguous. We can't tell which is faster, because the difference was too small. 1000 iterations could be as much as 13% faster, or as much as 21% slower, than 1001 iterations.
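
The asymmetry between "faster" and "slower" percentages comes purely from the choice of reference, which a small calculation makes concrete. The runtimes below are hypothetical point values, not the intervals from the table.

```javascript
// Percent change of `value` relative to `reference`.
function percentChange(value, reference) {
  return ((value - reference) * 100) / reference;
}

// Hypothetical mean runtimes in milliseconds.
const fast = 3;
const slow = 10;

console.log(percentChange(fast, slow).toFixed(1)); // -70.0 (70% faster)
console.log(percentChange(slow, fast).toFixed(1)); // 233.3 (233% slower)
```

The same 7 ms gap reads as 70% when divided by the slower runtime and as roughly 233% when divided by the faster one.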

Confidence intervals

Loosely speaking, a confidence interval is a range of plausible values for a parameter like runtime, and the confidence level (which tachometer always fixes to 95%) corresponds to the degree of confidence we have that the interval contains the true value of that parameter. See Wikipedia for more information about confidence intervals.

    <------------->   Wider confidence interval
                      High variance and/or low sample size

         <--->   Narrower confidence interval
                 Low variance and/or high sample size

 |---------|---------|---------|---------|
-1%      -0.5%       0%      +0.5%      +1%

The way tachometer shrinks confidence intervals is by increasing the sample size. The central limit theorem means that, even when we have high variance data, and even when that data is not normally distributed, as we take more and more samples, we'll be able to calculate a more and more precise estimate of the true mean of the data.
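
The effect of sample size on interval width can be sketched with a simple calculation. This example assumes a normal approximation with the critical value 1.96; tachometer's actual statistics may differ (for example, by using a different distribution), and the sample data is synthetic.

```javascript
// 95% confidence interval for the mean, using a normal approximation
// (critical value 1.96). This is an assumption made for illustration.
function confidenceInterval95(samples) {
  const n = samples.length;
  const mean = samples.reduce((a, b) => a + b, 0) / n;
  const variance =
    samples.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1);
  const margin = 1.96 * Math.sqrt(variance / n);
  return {low: mean - margin, high: mean + margin};
}

// Synthetic runtimes in milliseconds: the same spread, repeated.
const pattern = [9, 10, 11];
const makeSamples = (repeats) =>
  Array.from({length: repeats}, () => pattern).flat();

const small = confidenceInterval95(makeSamples(10)); // n = 30
const large = confidenceInterval95(makeSamples(100)); // n = 300

console.log('n=30 ', small);
console.log('n=300', large);
```

Both sample sets have the same mean and nearly the same variance, so the narrower interval at n=300 comes from the larger sample size alone.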

Swap NPM dependencies

Tachometer has specialized support for swapping in custom versions of any NPM dependency in your package.json. This can be used to compare the same benchmark against one or more versions of a library it depends on.

Use the benchmarks.packageVersions JSON config property to specify the version to swap in, like this:

{
  "benchmarks": [
    {
      "name": "my-benchmark",
      "url": "my-benchmark.html",
      "packageVersions": {
        "label": "my-label",
        "dependencies": {
          "my-package": "github:MyOrg/my-repo#my-branch"
        }
      }
    }
  ]
}

The version for a dependency can be any of the following:

  • Any version range supported by NPM, including semver ranges, git repos, and local paths. See the NPM documentation for more details.

  • For monorepos, or other git repos where the package.json is not located at the root of the repository (which is required for NPM's git install function), you can use an advanced git configuration object (schema) in place of the NPM version string, e.g.:

    {
      "benchmarks": [
        {
          "name": "my-benchmark",
          "url": "my-benchmark.html",
          "packageVersions": {
            "label": "my-label",
            "dependencies": {
              "my-package": {
                "kind": "git",
                "repo": "git@github.com:MyOrg/my-repo.git",
                "ref": "my-branch",
                "subdir": "packages/my-package",
                "setupCommands": ["npm install", "npm run build"]
              }
            }
          }
        }
      ]
    }

You can also use the --package-version flag to specify a version to swap in from the command-line, with format [label=]package@version. Note that the advanced git install configuration is not supported from the command line:

tach mybench.html \
  --package-version=my-package@1.0.0 \
  --package-version=my-label=my-package@github:MyOrg/my-repo#my-branch
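
For illustration, the [label=]package@version format could be split apart as follows. This is a hypothetical parser, not tachometer's: the function name parsePackageVersion and the returned field names are assumptions, and it assumes that any "=" in the value separates the label.

```javascript
// Split a flag value of the form [label=]package@version.
function parsePackageVersion(flagValue) {
  // A label, if present, is everything before the first "=".
  const eq = flagValue.indexOf('=');
  const label = eq === -1 ? undefined : flagValue.slice(0, eq);
  const rest = eq === -1 ? flagValue : flagValue.slice(eq + 1);
  // Search from index 1 so a scoped name such as @scope/pkg keeps
  // its leading "@".
  const at = rest.indexOf('@', 1);
  if (at === -1) {
    throw new Error(`Expected [label=]package@version: ${flagValue}`);
  }
  return {
    label,
    packageName: rest.slice(0, at),
    version: rest.slice(at + 1),
  };
}

console.log(parsePackageVersion('my-package@1.0.0'));
console.log(
  parsePackageVersion('my-label=my-package@github:MyOrg/my-repo#my-branch')
);
```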

When you specify a dependency to swap, the following happens:

  1. The package.json file closest to your benchmark HTML file is found.

  2. A copy of this package.json, with the new dependency version swapped in, is written to the system's temp directory (use --npm-install-dir to change this location), and npm install is run in that directory.

  3. A separate server is started for each custom NPM installation, where any request for the benchmark's node_modules/ directory is served from that location.
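
Step 2 above amounts to producing a modified copy of package.json. A minimal sketch of that copy, assuming a hypothetical helper named swapDependencies and made-up package contents (writing the file and running npm install are left out):

```javascript
// Return a copy of the package.json object with the given dependency
// versions swapped in; the original object is left untouched.
function swapDependencies(packageJson, swaps) {
  return {
    ...packageJson,
    dependencies: {...packageJson.dependencies, ...swaps},
  };
}

// Hypothetical package.json contents.
const original = {
  name: 'my-benchmarks',
  dependencies: {'my-package': '^1.0.0', 'other-package': '^2.0.0'},
};

const swapped = swapDependencies(original, {
  'my-package': 'github:MyOrg/my-repo#my-branch',
});

console.log(JSON.stringify(swapped, null, 2));
```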

NOTE: Tachometer will re-use NPM install directories as long as the dependencies you specified haven't changed, and the version of tachometer used to install it is the same. To always do a fresh npm install, set the --force-clean-npm-install flag.

JavaScript module imports

JavaScript module imports with bare module specifiers (e.g. import {foo} from 'mylib';) will be automatically transformed to browser-compatible path imports using Node-style module resolution (e.g. import {foo} from './node_modules/mylib/index.js';).

This feature can be disabled with the --resolve-bare-modules=false flag, or the resolveBareModules: false JSON config file property.
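
The distinction between bare and path specifiers can be illustrated with a small check. The helper isBareSpecifier below is a hypothetical sketch of that distinction only; it does not perform the Node-style resolution and is not tachometer's code.

```javascript
// A specifier is "bare" when it is neither a relative or absolute path
// nor a full URL.
function isBareSpecifier(specifier) {
  if (
    specifier.startsWith('./') ||
    specifier.startsWith('../') ||
    specifier.startsWith('/')
  ) {
    return false;
  }
  try {
    new URL(specifier); // parses only if the specifier is a full URL
    return false;
  } catch {
    return true;
  }
}

console.log(isBareSpecifier('mylib')); // true
console.log(isBareSpecifier('./node_modules/mylib/index.js')); // false
console.log(isBareSpecifier('/bench.js')); // false
```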

Browsers

Browser   Headless   FCP
chrome    yes        yes
firefox   yes        no
safari    no         no
edge      no         no
ie        no         no

WebDriver plugins

Tachometer comes with WebDriver plugins for Chrome, Safari, Firefox, and Internet Explorer.

For Edge, follow the Microsoft WebDriver installation documentation.

If you encounter errors while driving IE, see the Required Configuration section of the WebDriver IE plugin documentation. In particular, setting "Enable Protected Mode" so that it is consistently either enabled or disabled across all security zones appears to resolve NoSuchSessionError errors.

On-demand dependencies

Tachometer will install WebDriver plugins for Chrome, Firefox, and IE on demand. The first time that Tachometer runs a benchmark in any of these browsers, it will install the appropriate plugin via NPM or Yarn if it is not already installed.

If you wish to avoid on-demand installations like this, you can install the related packages (chromedriver, geckodriver and iedriver, respectively) ahead of time with npm install, for example:

npm install tachometer chromedriver

In the example above, Tachometer will detect the manually installed chromedriver package and will skip any attempt to install it on-demand later.

Headless

If supported by the browser, you can launch in headless mode by adding "headless": true to the browser JSON config, or by appending -headless to the browser name when using the CLI flag (e.g. --browser=chrome-headless).

Binary path and arguments

WebDriver automatically finds the location of the browser binary, and launches it with a default set of arguments.

To customize the binary path (Chrome and Firefox only), use the binary property in the browser JSON config. For example, to launch Chrome Canary from its standard location on macOS:

{
  "name": "chrome",
  "binary": "/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"
}

To pass additional arguments to the binary (Chrome and Firefox only), use the addArguments property in the browser JSON config. To remove one of the arguments that WebDriver sets by default (Chrome only), use removeArguments (see example in next section).

To configure Firefox preferences that are usually set from the about:config page, use the preferences property in the browser JSON config.

Profiles

It is normally recommended to use the default behavior whereby a new, empty browser profile is created when the browser is launched, so that state from your personal profile (cookies, extensions, cache, etc.) does not influence benchmark results.

However, in some cases it may be useful to use an existing browser profile, for example if the webpage you are benchmarking requires being signed into an account.

In Chrome and Firefox, use the profile JSON config option to specify an existing profile to use. Other browsers do not yet support this option.

Chrome

To find your current profile location in Chrome, visit chrome://version and look for "Profile Path".

If there is an existing Chrome process using this profile, you must first terminate it. You also need to close all open tabs, or disable the "Continue where you left off" startup setting, because tachometer does not expect to find any existing tabs.

You may also need to remove the use-mock-keychain default argument if you encounter authentication problems.

For example, using the standard location of the default user profile on macOS:

{
  "benchmarks": [
    {
      "url": "mybench.html",
      "browser": {
        "name": "chrome",
        "profile": "/Users/<username>/Library/Application Support/Google/Chrome",
        "removeArguments": ["use-mock-keychain"]
      }
    }
  ]
}

Firefox

To find your current profile location in Firefox, visit about:support and look for "Profile Folder" or "Profile Directory".

Note when using the profile option in Firefox, the profile directory is copied to a temporary location.

You may encounter a no such file or directory, stat '.../lock' error, due to a bug in selenium-webdriver. Deleting this lock file should resolve the error.

For example, using the standard location of user profiles on macOS:

{
  "benchmarks": [
    {
      "url": "mybench.html",
      "browser": {
        "name": "firefox",
        "profile": "/Users/<username>/Library/Application Support/Firefox/Profiles/<profile-name>"
      }
    }
  ]
}

Performance traces

Once you determine that something is slower or faster in comparison to something else, investigating why is the natural next step. To assist in determining why, consider collecting performance traces. These traces can be used to determine what the browser is doing differently between two versions of code.

When the trace option is turned on in Chromium-based browsers, each tachometer sample will produce a JSON file that can be viewed in Chromium's about:tracing tool. Enter about:tracing in the URL bar of Chromium, click load, and select the json file you want to view. Check out the about:tracing doc page to learn more about using the trace event profiling tool.

To turn on tracing with the default configuration, add trace: true to a Chromium browser's config object. This config turns on tracing with some default categories enabled and puts the JSON files into a directory called logs in your current working directory.

For example:

{
  "benchmarks": [
    {
      "name": "my-benchmark",
      "url": "my-benchmark.html",
      "browser": {
        "name": "chrome",
        "trace": true
      }
    }
  ]
}

To customize where the log files are placed or which categories of events are traced, pass an object as the trace config, as demonstrated below. The categories property is a list of trace categories to collect. The logDir property is the directory in which to store the log files. If it is relative, it is resolved relative to the current working directory.

{
  "benchmarks": [
    {
      "name": "my-benchmark",
      "url": "my-benchmark.html",
      "browser": {
        "name": "chrome",
        "trace": {
          "categories": ["blink", "cc", "netlog", "toplevel", "v8"],
          "logDir": "results/trace-logs"
        }
      }
    }
  ]
}

To see the available trace categories, enter about:tracing in the URL bar of a Chromium browser. Press "Record" in the top right (1), then expand the "Edit categories" section (2). All the categories available for tracing are listed there. Note that for the "Disabled by Default Categories", you must prefix the name with the string disabled-by-default- when adding it to your tachometer config. For example, to enable the disabled-by-default audio category shown below (3), specify disabled-by-default-audio in your browser.trace.categories tachometer config.

[Screenshot: about:tracing app demonstrating the steps above]
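The prefixing rule above is easy to get wrong by hand, so here is a minimal sketch of generating the trace object programmatically. The buildTraceConfig helper and its parameters are hypothetical and not part of tachometer; only the categories and logDir properties and the disabled-by-default- prefix come from this document.

```javascript
// Hypothetical helper (not part of tachometer): builds the value of a
// browser's "trace" config property from two lists of category names.
const DISABLED_PREFIX = 'disabled-by-default-';

function buildTraceConfig(categories, disabledByDefault = [], logDir = 'logs') {
  // Add the prefix to disabled-by-default categories unless already present.
  const prefixed = disabledByDefault.map((name) =>
    name.startsWith(DISABLED_PREFIX) ? name : DISABLED_PREFIX + name
  );
  return {categories: [...categories, ...prefixed], logDir};
}

const browser = {
  name: 'chrome',
  trace: buildTraceConfig(['blink', 'v8'], ['audio'], 'results/trace-logs'),
};

// Paste the printed object into the "browser" property of a benchmark.
console.log(JSON.stringify(browser, null, 2));
```

Here the audio category ends up in the list as disabled-by-default-audio, while blink and v8 are passed through unchanged.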

Tracing can also be enabled via command line flags. See the table at the end of the file for details.
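As a rough sketch of what such a command line could look like, the snippet below assembles the three tracing flags listed in that table. The traceArgs helper is hypothetical, and the --flag=value form is assumed to work for these flags in the same way it is used for --browser elsewhere in this document.

```javascript
// Hypothetical helper (not part of tachometer): assembles the tracing flags.
function traceArgs({logDir, categories} = {}) {
  const args = ['--trace'];
  if (logDir) {
    args.push(`--trace-log-dir=${logDir}`);
  }
  if (categories && categories.length > 0) {
    // --trace-cat takes a single comma-separated string of category names.
    args.push(`--trace-cat=${categories.join(',')}`);
  }
  return args;
}

const command = [
  'npx',
  'tachometer',
  'my-benchmark.html',
  ...traceArgs({logDir: 'results/trace-logs', categories: ['blink', 'v8']}),
].join(' ');

console.log(command);
// prints: npx tachometer my-benchmark.html --trace --trace-log-dir=results/trace-logs --trace-cat=blink,v8
```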

Remote control

Tachometer can control and benchmark browsers running on remote machines by using the Standalone Selenium Server, which supports macOS, Windows, and Linux.

This may be useful if you want to develop on one platform but benchmark on another, or if you want to use a dedicated benchmarking computer for better performance isolation.

Note that you will need to know the IP addresses of both your local and remote machines for the setup steps below. You can typically use ipconfig on Windows, ifconfig on macOS, and ip on Linux to find these addresses. The machines must be able to initiate connections to each other in both directions, so if you encounter problems, a firewall or NAT may be blocking the connection.

On the remote machine:

  1. Install a Java Development Kit (JDK) if you don't already have one.

  2. Download the latest Standalone Selenium Server .jar file from seleniumhq.org.

  3. Download the driver plugins for the browsers you intend to remote control from seleniumhq.org. Note that if you download a plugin archive file, the archive contents must be extracted and placed either in the current working directory for the next command, or in a directory that is included in your $PATH environment variable.

  4. Launch the Standalone Selenium Server.

    java -jar selenium-server-standalone-<version>.jar

On the local machine:

  1. Use the --browser flag or the browser config file property with syntax <browser>@<remote-url> to tell tachometer the IP address or hostname of the remote Standalone Selenium Server to launch the browser from. Note that 4444 is the default port, and the /wd/hub URL suffix is required.

    --browser=chrome@http://my-remote-machine:4444/wd/hub

  2. Use the --host flag to configure the network interface address that tachometer's built-in static server will listen on (unless you are only benchmarking external URLs that do not require the static server). By default, for security, tachometer listens on 127.0.0.1 and will not be accessible from the remote machine unless you change this to an IP address or hostname that will be accessible from the remote machine.

  3. If needed, use the --remote-accessible-host flag to configure the URL that the remote browser will use when making requests to your local tachometer static server. By default this will match --host, but in some network configurations it may need to be different (e.g. if the machines are separated by a NAT).
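The three local-machine steps above can be sketched as a small function that assembles the flags. The remoteArgs helper and the addresses in the usage example are placeholders chosen for illustration; only the flag names, the default port 4444, and the /wd/hub suffix come from the steps above.

```javascript
// Hypothetical helper (not part of tachometer): builds the flags needed to
// benchmark a browser on a remote Standalone Selenium Server.
function remoteArgs({browser, remoteHost, port = 4444, localHost, remoteAccessibleHost}) {
  // Step 1: <browser>@<remote-url>, including the required /wd/hub suffix.
  const args = [`--browser=${browser}@http://${remoteHost}:${port}/wd/hub`];
  // Step 2: the address the local static server listens on.
  if (localHost) {
    args.push(`--host=${localHost}`);
  }
  // Step 3: only needed when the remote browser must use a different address.
  if (remoteAccessibleHost) {
    args.push(`--remote-accessible-host=${remoteAccessibleHost}`);
  }
  return args;
}

// Placeholder addresses; substitute the ones from your own network.
const args = remoteArgs({
  browser: 'chrome',
  remoteHost: 'my-remote-machine',
  localHost: '192.168.1.10',
});

console.log(['npx', 'tachometer', 'mybench.html', ...args].join(' '));
```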

Config file

Use the --config flag to control tachometer with a JSON configuration file. Defaults are the same as the corresponding command-line flags.

All paths in a config file are relative to the path of the config file itself.

You will typically want to set root to the directory that contains your package's node_modules/ folder, so that the web server will be able to resolve bare-module imports.

For example, a file called benchmarks/foo/tachometer.json might look like this:

{
  "root": "../..",
  "sampleSize": 50,
  "timeout": 3,
  "autoSampleConditions": ["0%", "1%"],
  "benchmarks": [
    {
      "name": "foo",
      "url": "foo/bar.html?baz=123",
      "browser": {
        "name": "chrome",
        "headless": true,
        "windowSize": {
          "width": 800,
          "height": 600
        }
      },
      "measure": "fcp",
      "packageVersions": {
        "label": "my-branch",
        "dependencies": {
          "mylib": "github:Polymer/mylib#my-branch"
        }
      }
    }
  ]
}
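The autoSampleConditions values in the example above use the "N%" form; the corresponding --auto-sample-conditions flag also accepts "Nms". As a rough illustration of how such strings break down into a number and a unit, here is a small parser. The parseCondition name, the returned object shape, and the acceptance of a leading sign or decimal values are assumptions made for this sketch, not tachometer's actual parsing code.

```javascript
// Hypothetical parser (not tachometer's implementation) for condition
// strings of the form "N%" (relative) or "Nms" (absolute).
function parseCondition(text) {
  // Assumption: an optional sign and decimal part are allowed.
  const match = /^([+-]?\d+(?:\.\d+)?)(%|ms)$/.exec(text.trim());
  if (match === null) {
    throw new Error(`Invalid auto-sample condition: ${text}`);
  }
  return {
    value: Number(match[1]),
    unit: match[2] === '%' ? 'percent' : 'ms',
  };
}

// The command-line flag takes a comma-delimited list of conditions.
const conditions = '0%,1%,5ms'.split(',').map(parseCondition);
console.log(conditions);
```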

Use the expand property in a benchmark object to recursively generate multiple variations of the same benchmark configuration. For example, to test the same benchmark file with two different browsers, you can use expand instead of duplicating the entire benchmark configuration:

{
  "benchmarks": [
    {
      "url": "foo/bar.html",
      "expand": [
        {
          "browser": "chrome"
        },
        {
          "browser": "firefox"
        }
      ]
    }
  ]
}

This is equivalent to:

{
  "benchmarks": [
    {
      "url": "foo/bar.html",
      "browser": "chrome"
    },
    {
      "url": "foo/bar.html",
      "browser": "firefox"
    }
  ]
}
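To make the equivalence concrete, the sketch below flattens a benchmark object that uses expand into the list of benchmarks it stands for. It illustrates the idea only: expandBenchmark is a hypothetical function, and the shallow merge of each variant over its parent is an assumption, since this document does not specify how tachometer merges nested properties.

```javascript
// Illustrative sketch (not tachometer's implementation): recursively
// flattens a benchmark config that uses "expand".
function expandBenchmark(benchmark) {
  const {expand, ...base} = benchmark;
  if (!Array.isArray(expand) || expand.length === 0) {
    return [base];
  }
  // Assumption: each variant is shallow-merged over the parent's properties.
  // A variant may itself contain "expand", which is handled by the recursion.
  return expand.flatMap((variant) => expandBenchmark({...base, ...variant}));
}

const expanded = expandBenchmark({
  url: 'foo/bar.html',
  expand: [{browser: 'chrome'}, {browser: 'firefox'}],
});

console.log(JSON.stringify(expanded, null, 2));
```

Running this on the first config above yields the two benchmark objects shown in the second.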

Pinned metrics

A single benchmark can produce many metrics (for example fcp, a named callback measurement, or a memory aggregate). Use the top-level pinnedMetrics property to declare which metric names matter most for a config, in priority order:

{
  "pinnedMetrics": ["build-time", "total-time"],
  "benchmarks": [
    {
      "url": "foo/bar.html",
      "measurement": [
        {"name": "build-time", "mode": "performance", "entryName": "build"},
        {"name": "total-time", "mode": "performance", "entryName": "total"}
      ]
    }
  ]
}

Each entry must match a measurement's resolved metric name — either an explicit measurement.name, or the label tachometer derives when one isn't given (such as fcp, callback, the global expression, or memory:sum:<name> for a named memory aggregate). The list is declared once per config and applies across every variant, since metric names are shared across variants.

For memory metrics, prefer pinning a named aggregate (memory:sum:<name>, declared via a measurement's categories/sumAs) rather than a raw memory:tuple:... label, whose process-role and allocator segments can vary across browser versions and platforms.

pinnedMetrics is purely declarative metadata: it never changes what is measured or how statistics are computed. When you write results with --json-file, tachometer copies the list verbatim into a top-level pinnedMetrics field (omitted when empty) so downstream consumers — such as report generators — can highlight or prioritise those metrics. If a pinned name matches no produced metric, tachometer prints a non-fatal warning so you can catch typos.
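A downstream consumer may want to perform the same kind of typo check itself. The following is a minimal sketch of that check, assuming the consumer already has the list of produced metric names in hand. The unmatchedPinnedMetrics function and the wording of the warning are hypothetical and do not reflect tachometer's internal code or its exact message.

```javascript
// Hypothetical check (not tachometer's implementation): returns the pinned
// names that match no produced metric, preserving priority order.
function unmatchedPinnedMetrics(pinnedMetrics, producedMetricNames) {
  const producedSet = new Set(producedMetricNames);
  return pinnedMetrics.filter((name) => !producedSet.has(name));
}

// "totl-time" is a deliberate typo to show the check firing.
const pinned = ['build-time', 'totl-time'];
const produced = ['build-time', 'total-time', 'fcp'];

for (const name of unmatchedPinnedMetrics(pinned, produced)) {
  // Non-fatal: report the problem and carry on.
  console.warn(`Pinned metric "${name}" matches no produced metric`);
}
```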

CLI usage

Run a benchmark from a local file:

tach foo.html

Compare a benchmark with different URL parameters:

tach foo.html?i=1 foo.html?i=2

Benchmark index.html in a directory:

tach foo/bar

Benchmark First Contentful Paint time of a remote URL:

tach http://example.com

| Flag | Default | Description |
| --- | --- | --- |
| --help | false | Show documentation |
| --root | ./ | Root directory to search for benchmarks |
| --host | 127.0.0.1 | Which host to run on |
| --port | 8080, 8081, ..., 0 | Which port to run on (comma-delimited preference list, 0 for random) |
| --config | (none) | Path to JSON config file |
| --package-version / -p | (none) | Specify an NPM package version to swap in |
| --browser / -b | chrome | Which browsers to launch in automatic mode, comma-delimited (chrome, firefox, safari, edge, ie) |
| --window-size | 1024,768 | "width,height" in pixels of the browser windows that will be created |
| --sample-size / -n | 50 | Minimum number of times to run each benchmark |
| --auto-sample-conditions | 0% | The degrees of difference to try and resolve when auto-sampling ("N%" or "Nms", comma-delimited) |
| --timeout | 3 | The maximum number of minutes to spend auto-sampling |
| --measure | callback | Which measurement to take (callback, global, fcp, memory, cpu) |
| --measurement-expression | window.tachometerResult | JS expression to poll for on page to retrieve measurement result when measure setting is set to global |
| --memory-dump-level | detailed | When --measure=memory, dump level of detail (light or detailed) |
| --memory-gc-before-dump | true | When --measure=memory, whether to force a garbage collection before the dump |
| --memory-max-allocator-depth | (unlimited) | When --measure=memory, drop allocators whose path depth exceeds this value (e.g. 2 collapses per-bucket sub-allocators into their parents) |
| --remote-accessible-host | matches --host | When using a browser over a remote WebDriver connection, the URL that those browsers should use to access the local tachometer server |
| --npm-install-dir | system temp dir | Where to install custom package versions |
| --force-clean-npm-install | false | Always do a from-scratch NPM install when using custom package versions |
| --csv-file | (none) | Save statistical summary to this CSV file |
| --csv-file-raw | (none) | Save raw sample measurements to this CSV file |
| --memory-categories-file | (none) | When --measure=memory, save a diagnostic JSON describing what the probe discovered, what each categories rule did, and what was dropped |
| --json-file | (none) | Save results to this JSON file |
| --manual | false | Don't run automatically, just show URLs and collect results |
| --trace | false | Enable performance tracing |
| --trace-log-dir | ${cwd}/logs | The directory to put tracing log files |
| --trace-cat | default categories | The tracing categories to record. Should be a string of comma-separated category names |
