There are currently multiple uses of OpenMP locking mechanisms in the CUDA / HIP / SYCL code, which forces OpenMP-specific #ifdef blocks into those sources. Can these be replaced with C++ standard library mechanisms (e.g. std::mutex) to remove the preprocessor conditionals? What would the performance impact of such a change be?
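A minimal sketch of what the replacement could look like, assuming the OpenMP locks guard host-side shared state through the usual omp_init_lock / omp_set_lock / omp_unset_lock / omp_destroy_lock pattern. SharedCounter and time_contended_updates are hypothetical names for illustration only. A plain omp_lock_t maps naturally to std::mutex (both are non-recursive), and omp_nest_lock_t would map to std::recursive_mutex. One performance point worth keeping in mind: in a build without OpenMP, the #ifdef currently removes the lock entirely, whereas std::mutex always costs at least an uncontended lock/unlock (typically an atomic operation in the fast path). The small harness below times contended updates so the two variants can be measured on the target platform rather than guessed.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state that was previously protected by an omp_lock_t
// inside an OpenMP-specific #ifdef block.
struct SharedCounter {
  // Replaces omp_lock_t; construction/destruction replace omp_init_lock /
  // omp_destroy_lock, so no explicit init/teardown calls are needed.
  std::mutex mtx;
  std::int64_t value = 0;

  void add(std::int64_t delta) {
    // RAII guard replaces the omp_set_lock / omp_unset_lock pair and also
    // releases the lock if the critical section throws.
    std::lock_guard<std::mutex> guard(mtx);
    value += delta;
  }
};

// Hammers the mutex from nthreads std::threads, writes the final count to
// *final_value, and returns the elapsed wall-clock time in seconds.
double time_contended_updates(int nthreads, int iters_per_thread,
                              std::int64_t* final_value) {
  assert(nthreads > 0 && iters_per_thread >= 0 && final_value != nullptr);

  SharedCounter counter;
  std::vector<std::thread> workers;
  workers.reserve(nthreads);

  const auto start = std::chrono::steady_clock::now();
  for (int t = 0; t < nthreads; ++t) {
    workers.emplace_back([&counter, iters_per_thread] {
      for (int i = 0; i < iters_per_thread; ++i) {
        counter.add(1);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  const auto stop = std::chrono::steady_clock::now();

  *final_value = counter.value;
  return std::chrono::duration<double>(stop - start).count();
}
```

For example, time_contended_updates(4, 100000, &v) should leave v equal to 400000; comparing its runtime against an equivalent omp_lock_t version (built with OpenMP enabled, same thread count) gives a first-order answer for the contended case. Results will depend on the OpenMP runtime (e.g. libgomp vs LLVM libomp) and the standard library implementation, since some OpenMP runtimes spin before blocking while std::mutex on Linux typically sits on a futex-based pthread mutex. In practice, with mainstream runtimes, OpenMP worker threads are ordinary OS threads, so std::mutex can also be used from inside existing OpenMP parallel regions, though the OpenMP specification itself does not formally guarantee this interoperation.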