biski64: Fast and Robust Pseudo-Random Number Generator

This repository contains biski64, an extremely fast pseudo-random number generator (PRNG) with a guaranteed minimum period of 2^64. It is designed for non-cryptographic applications where speed and statistical quality are important.

The library is available on crates.io and the documentation can be found on docs.rs.

Features

High Performance: - Significantly faster than standard library generators and modern high-speed PRNGs like xoroshiro128++ and xoshiro256++.
Exceptional Statistical Quality: - Easily passes BigCrush and terabytes of PractRand. Scaled down versions show even better mixing efficiency than well respected PRNGs like JSF.
Guaranteed Minimum Period: - Incorporates a 64-bit Weyl sequence to ensure a minimum period of 2^64.
Proven Injectivity: - Invertible algorithm with proven injectivity via Z3 Prover.
Rust Ecosystem Integration: - The library is no_std compatible and implements the standard RngCore and SeedableRng traits from rand_core for easy use.
C Integration: - Only requires stdint.h.

Rust Installation

Add biski64 and rand to your Cargo.toml dependencies:

[dependencies]
biski64 = "0.3.2"
rand = "0.9"

Basic Usage

use rand::{RngCore, SeedableRng};
use biski64::Biski64Rng;

let mut rng = Biski64Rng::seed_from_u64(12345);
let num = rng.next_u64();

Performance

Rust:

  biski64            0.401 ns/call
  xoshiro256++       0.610 ns/call
  xoroshiro128++     0.883 ns/call

C:

  biski64            0.368 ns/call
  wyrand             0.368 ns/call
  sfc64              0.368 ns/call
  xoshiro256++       0.552 ns/call
  xoroshiro128++     0.732 ns/call
  PCG128_XSL_RR_64   1.197 ns/call

Java:

  biski64            0.491 ns/call
  xoshiro256++       0.739 ns/call
  xoroshiro128++     0.790 ns/call
  ThreadLocalRandom  0.846 ns/call
  Java.util.Random   5.315 ns/call

Tested on an AMD Ryzen 9 7950X3D

Rust Algorithm

// In the actual implementation, these are fields of the Biski64Rng struct.
let (mut fast_loop, mut mix, mut loop_mix) = (0u64, 0u64, 0u64);

#[inline(always)]
pub fn next_u64(&mut self) -> u64 {
    // The output is calculated from the current state.
    let output = self.mix.wrapping_add(self.loop_mix);

    (self.fast_loop, self.mix, self.loop_mix) = (
        self.fast_loop.wrapping_add(0x9999999999999999),
        self.mix.rotate_left(16).wrapping_add(self.loop_mix.rotate_left(40)),
        self.fast_loop ^ self.mix,
    );

    output
}

C Algorithm

// Helper for rotation
static inline uint64_t rotateLeft(const uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

uint64_t biski64() {
  uint64_t output = mix + loopMix;
  uint64_t oldLoopMix = loopMix;

  loopMix = fast_loop ^ mix;
  mix = rotateLeft(mix, 16) + rotateLeft(oldLoopMix, 40);

  fast_loop += 0x9999999999999999;

  return output;
  }

Java Algorithm

public long nextLong() {
  final long output = mix + loopMix;
  final long oldLoopMix = loopMix;

  loopMix = fastLoop ^ mix;
  mix = Long.rotateLeft( mix, 16 ) + Long.rotateLeft( oldLoopMix, 40 );
  fastLoop += 0x9999999999999999L;

  return output;
  }

(Note: See test files for full seeding and usage examples.)

Parallel Streams

biski64 is well-suited for parallel applications, and parallel streams can be implemented as follows:

Randomly seed mix, and loop_mix for each stream as normal.
To ensure maximal separation between sequences, space starting values for each streams' fast_loop:

uint64_t cycles_per_stream = (2^64 - 1 ) / num_streams;
fast_loop = i * cycles_per_stream * 0x9999999999999999ULL;

where i is the stream index (0, 1, 2, ...) as demonstrated in the C, Rust and Java example code

Scaled Down Testing

A key test for any random number generator is to see how it performs when its internal state is drastically reduced. This allows for practical testing of the core mixing algorithm. biski64 performs exceptionally well in this regard.

State Variable Size	Total State	Practrand Failure
8-bit	24 bits	2^22 bytes (4 MB)
16-bit	48 bits	2^40 bytes (1 TB)
32-bit	96 bits	did not fail, tested to 2^47 bytes (128 TB)
64-bit	192 bits	did not fail, tested to 2^47 bytes (128 TB)

The results for the 8-bit and 16-bit scaled down versions show that biski64 exceeds the mixing efficiency (in terms of PractRand bytes passed per total state size) of even the well respected and tested JSF PRNG.

For comparison:

PRNG	Total State	Practrand Failure	Ratio (failed exponent/state bits)
biski(8-bit)	24 bits	2^22 bytes	91.7%
jsf8	32 bits	2^28 bytes	87.5%

PRNG	Total State	Practrand Failure	Ratio (failed exponent/state bits)
biski(16-bit)	48 bits	2^40 bytes	83.3%
jsf16	64 bits	2^47 bytes	73.4%

The scaled down test results for biski64 indicate strong performance of its core mixing algorithm (suggesting that the strong performance at full 64-bit size is not a result simply of a large state, but of its robust mixing performance).

BigCrush Comparison

BigCrush was run 100 times on biski64 (and multiple reference PRNGs).

Assuming test failure with a p-value below 0.001 or above 0.999 implies a 0.2% probability of a single test yielding a "false positive" purely by chance with an ideal generator. Running BigCrush 100 times (for a total of 25400 sub-tests), around 50.8 such chance "errors" would be anticipated.

biski64, 51 failed subtests (out of 25400 total)
  5 subtests failed twice

wyrand, 55 failed subtests (out of 25400 total)
  1 subtest failed THREE times
  5 subtests failed twice

sfc64, 70 failed subtests (out of 25400 total)
  1 subtest failed THREE times
  12 subtests failed twice

xoroshiro128++, 54 failed subtests (out of 25400 total)
  1 subtest failed FOUR times
  4 subtests failed twice

xoroshiro256++, 60 failed subtests (out of 25400 total)
  1 subtest failed THREE times
  5 subtests failed twice

pcg128_xsl_rr_64, 47 failed subtests (out of 25400 total)
  1 subtest failed FIVE times
  4 subtests failed twice

(Note: For an ideal random generator, seeing three or more failures for a specific subtest would not be expected.)

Design

Motivated by M.E. O'Neill's post, Does It Beat the Minimal Standard - the initial design for biski64 used a scaled down version with 8-bit state variables. This allowed for fast iteration using PractRand.

Once the algorithm was settled on, parameters were selected as follows:

Additive Constant - Multiple additive constants were tested using a 16-bit state variable version. For each constant all possible rotation constants r1 and r2 (r1 < r2) were iterated over and PractRand was run to determine how far each (additiveConstant, r1, r2) variant could get. The highest performing constant was 0x9999 - passing the most rotational pairs at the highest level. This constant was then scaled to 0x9999999999999999 in the full 64-bit version.
Rotation Constants - It was observed that bitsPerStateVar / 4 showed up frequently at the highest level for r1, as well as integers near bitsPerStateVar * (ɸ - 1) for r2. This was then used as a heuristic for rotation constants in the 32-bit and 64-bit versions.
BigCrush Confirmation - BigCrush was run 100 times on five selected full size 64-bit variants (using the above heuristics). The best observed r2 values were 39 and 40 - supporting the theory that r1=bits/4 and r2=bits * (ɸ - 1) were solid heuristics. r2=40 was then deemed the best (as it exhibited no subtests failing three times) - and then used for the full 64-bit version.

(r1=16, r2=37) 64 subtests failed (out of 25400 total)
   2 subtests failed THREE times
   7 subtests failed twice

(r1=16, r2=39) 46 subtests failed (out of 25400 total)
   1 subtest failed THREE times
   4 subtests failed twice

(r1=16, r2=40) 51 subtests failed (out of 25400 total)
   5 subtests failed twice

(r1=16, r2=41) 60 subtests failed (out of 25400 total)
   1 subtest failed THREE times
   6 subtests failed twice

(r1=16, r2=43) 46 subtests failed (out of 25400 total)
   1 subtest failed FOUR times
   1 subtest failed THREE times
   1 subtest failed twice

Notes

Created by Daniel Cota and named after his cat Biscuit - a small and fast Egyptian Mau.

Name		Name	Last commit message	Last commit date
Latest commit History 125 Commits
benches		benches
c		c
java		java
src		src
tests		tests
z3		z3
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

biski64: Fast and Robust Pseudo-Random Number Generator

Features

Rust Installation

Basic Usage

Performance

Rust Algorithm

C Algorithm

Java Algorithm

Parallel Streams

Scaled Down Testing

BigCrush Comparison

Design

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

danielcota/biski64

Folders and files

Latest commit

History

Repository files navigation

biski64: Fast and Robust Pseudo-Random Number Generator

Features

Rust Installation

Basic Usage

Performance

Rust Algorithm

C Algorithm

Java Algorithm

Parallel Streams

Scaled Down Testing

BigCrush Comparison

Design

Notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages