par

Simple task parallelism for C++20.

Alternative to the most commonly used subset of OpenMP.

Example

Also see complete working examples in the repo.

Trivial for loop

par

const auto data = make_some_data();
par::pfor({/*default run opts*/}, 0, data.size(), [&](int i) {
    compute_something(data[i]);
});

OpenMP

const auto data = make_some_data();
#pragma omp parallel for // implied default options
for (int i = 0; i < data.size(); ++i) {
    compute_something(data[i]);
}

Job-specific data

par

struct job_data {
    job_data(const par::job_info& info)
        : rng(info.job_index)
        , dist(0, 1)
    {}
    std::mt19937 rng;
    std::uniform_real_distribution<float> dist;
};

const auto data = make_some_data();
par::pfor<job_data>({.max_par = 5}, 0, data.size(), [&](int i, job_data& jd) {
    float r = jd.dist(jd.rng);
    monte_carlo_sample(data[i], r);
});

OpenMP

const auto data = make_some_data();
#pragma omp parallel num_threads(5)
{
    std::mt19937 rng(omp_get_thread_num());
    std::uniform_real_distribution<float> dist(0, 1);
    #pragma omp for
    for (int i = 0; i < data.size(); ++i) {
        float r = dist(rng);
        monte_carlo_sample(data[i], r);
    }
}

Why not just use OpenMP?

The initial motivation for this library was the use of a thread sanitizer. Using OpenMP with thread sanitization leads to a slew of false positives. To fix this, one needs to either turn the thread sanitizer off for potentially big chunks of code, or build OpenMP itself with thread sanitization enabled. Incorporating either of these (especially a custom OpenMP build) into a build system is very unpleasant.

Thread sanitizers are good. The vast majority of software only needs a small subset of OpenMP: parallel for loops and basic scheduling. Par aims to provide that subset without the build-system baggage and without any sacrifice in performance.

As a side effect, par allows finer-grained control over the thread pool. You can instantiate multiple thread pools with different sizes, which makes it easier to integrate par into larger systems: you control the number of threads allotted to each subsystem.

Why not use std::execution::par?

I don't like std::execution::par. It's acceptable for small one-shot tasks that you run once and forget, but it's terrible for integrating into larger systems with more moving parts. It's a black box: you can't control how many threads it uses, and you can't control how tasks are scheduled. Additionally, it only works on iterators, which makes certain algorithms harder to express.

Features

Par has about the same overhead as OpenMP. See more in the performance document.

  • par::thread_pool: The thread pool. Multiple thread pools can be instantiated. A global one is used by default by runners. The global thread pool is lazily initialized on first use and lives until process termination.
  • Runners:
    • par::prun: run a generic task in parallel. The provided function receives a job index.
    • par::pchunk: run a task in parallel over chunks of work. The provided function receives the chunk range.
    • par::pfor: run a for loop in parallel. The provided function receives the current index.
      • allows specifying job-specific data
      • allows specifying chunks of iterations to be processed by each job
  • Runner options par::run_opts. See run_opts.hpp for details.
    • .max_par: maximum parallelism (number of concurrent jobs). Defaults to the number of thread pool threads.
    • .sched: scheduling strategy
      • schedule_dynamic (default): jobs are assigned dynamically to threads as they finish previous jobs. Suitable for unbalanced workloads.
      • schedule_static: each thread is assigned a fixed set of jobs at the start. Suitable for balanced workloads.
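To make the difference between the two scheduling strategies concrete, here is a minimal sketch written with the standard library only. It is not par's implementation and does not use par's thread pool. `run_static` and `run_dynamic` are hypothetical helpers that spawn plain `std::thread`s to illustrate the idea: static scheduling fixes each worker's slice before any work starts, while dynamic scheduling lets workers claim the next index as they become free.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Static scheduling (illustrative, not par's code): every worker receives one
// fixed, contiguous slice of [begin, end) that is computed up front.
template <typename Func>
void run_static(std::size_t num_workers, std::size_t begin, std::size_t end, Func func) {
    assert(num_workers > 0 && begin <= end);
    const std::size_t total = end - begin;
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < num_workers; ++w) {
        const std::size_t lo = begin + total * w / num_workers;
        const std::size_t hi = begin + total * (w + 1) / num_workers;
        workers.emplace_back([lo, hi, &func] {
            for (std::size_t i = lo; i < hi; ++i) func(i);
        });
    }
    for (auto& t : workers) t.join();
}

// Dynamic scheduling (illustrative, not par's code): workers repeatedly claim
// the next unprocessed index from a shared atomic counter, so a worker that
// finishes early simply picks up more work.
template <typename Func>
void run_dynamic(std::size_t num_workers, std::size_t begin, std::size_t end, Func func) {
    assert(num_workers > 0 && begin <= end);
    std::atomic<std::size_t> next{begin};
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < num_workers; ++w) {
        workers.emplace_back([&next, end, &func] {
            for (;;) {
                const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
                if (i >= end) break;
                func(i);
            }
        });
    }
    for (auto& t : workers) t.join();
}
```

With balanced work both variants behave about the same. When some indices take much longer than others, the static variant can leave workers idle while one slice finishes, which is why a dynamic strategy tends to suit unbalanced workloads.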

Notable unsupported OpenMP features

  • No guided scheduling.
  • No barriers or other synchronization primitives.
  • Limited nested parallelism support: only dynamically scheduled tasks can be nested.
  • No thread ids. Instead job_index is used, but with dynamic scheduling multiple job indices may end up being executed by the same thread. Use std::this_thread::get_id() if you need the actual thread id.
  • No extended features like atomic, SIMD, reductions, etc.
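Since there is no built-in reduction, one possible workaround is to give each job its own accumulator slot, keyed by job index rather than thread id, and combine the slots after the parallel region. The sketch below shows the pattern with the standard library only. `parallel_sum` is a hypothetical helper, not part of par, and it spawns plain `std::thread`s instead of using a thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Manual reduction (illustrative): each job owns exactly one slot in `partial`,
// indexed by its job index, so no two jobs ever write to the same element.
inline long long parallel_sum(const std::vector<int>& values, std::size_t num_jobs) {
    assert(num_jobs > 0);
    std::vector<long long> partial(num_jobs, 0);
    std::vector<std::thread> threads;
    for (std::size_t job = 0; job < num_jobs; ++job) {
        threads.emplace_back([&values, &partial, job, num_jobs] {
            long long local = 0; // accumulate locally to avoid false sharing
            for (std::size_t i = job; i < values.size(); i += num_jobs) {
                local += values[i];
            }
            partial[job] = local; // single write into this job's own slot
        });
    }
    for (auto& t : threads) t.join();
    // Sequential combine step after all jobs have finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Keying the slots by job index keeps the result independent of which thread happens to execute which job, which matters when several job indices may run on the same thread.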

Usage

Note that par, and code using it, require at least C++20. Any C++20 capable compiler should work.

The easiest, and currently only supported, way to add par to a project is as a CPM.cmake package. If you are using this package manager (and you should be), you only need to add this line to your CMakeLists.txt: CPMAddPackage(gh:iboB/par@0.1.0). Update the version "0.1.0" to the one you want.

In your CMake code link with par::par. The relevant CMake configurations are:

  • BUILD_SHARED_LIBS - respected; controls whether par is built as a shared or a static library.
  • par_STATIC - if set to ON, par is always built as a static library, regardless of BUILD_SHARED_LIBS.

The build is very straightforward and other ways of integration, though not explicitly supported, should work:

  • As a submodule/subrepo: par bundles CPM, so this should just work as well, as long as there are no dependency clashes.
  • Copy the code/ subdirectory into your project and make sure the dependencies are available.
  • Whatever your heart desires

Dependencies

The tests, examples, and configuration depend on various packages, but par's code itself only depends on:

These libraries are header-only and have no dependencies of their own. If you copy code or create an alternative build process, things should be relatively easy to set up.

License

This software is distributed under the MIT Software License.

See accompanying file LICENSE or copy here.

Copyright © 2025-2026 Borislav Stanimirov
