This repository contains biski64
, an extremely fast pseudo-random number generator (PRNG) with a guaranteed minimum period of 2^64. It is designed for non-cryptographic applications where speed and statistical quality are important.
The library is available on crates.io and the documentation can be found on docs.rs.
- High Performance: - Significantly faster than standard library generators and modern high-speed PRNGs like
xoroshiro128++
andxoshiro256++
. - Exceptional Statistical Quality: - Easily passes BigCrush and terabytes of PractRand. Scaled down versions show even better mixing efficiency than well respected PRNGs like JSF.
- Guaranteed Minimum Period: - Incorporates a 64-bit Weyl sequence to ensure a minimum period of 2^64.
- Proven Injectivity: - Invertible algorithm with proven injectivity via Z3 Prover.
- Rust Ecosystem Integration: - The library is
no_std
compatible and implements the standardRngCore
andSeedableRng
traits fromrand_core
for easy use. - C Integration: - Only requires stdint.h.
Add biski64
and rand
to your Cargo.toml
dependencies:
[dependencies]
biski64 = "0.3.2"
rand = "0.9"
use rand::{RngCore, SeedableRng};
use biski64::Biski64Rng;
let mut rng = Biski64Rng::seed_from_u64(12345);
let num = rng.next_u64();
- Rust:
biski64 0.401 ns/call
xoshiro256++ 0.610 ns/call
xoroshiro128++ 0.883 ns/call
- C:
biski64 0.368 ns/call
wyrand 0.368 ns/call
sfc64 0.368 ns/call
xoshiro256++ 0.552 ns/call
xoroshiro128++ 0.732 ns/call
PCG128_XSL_RR_64 1.197 ns/call
- Java:
biski64 0.491 ns/call
xoshiro256++ 0.739 ns/call
xoroshiro128++ 0.790 ns/call
ThreadLocalRandom 0.846 ns/call
Java.util.Random 5.315 ns/call
Tested on an AMD Ryzen 9 7950X3D
// In the actual implementation, these are fields of the Biski64Rng struct.
let (mut fast_loop, mut mix, mut loop_mix) = (0u64, 0u64, 0u64);
#[inline(always)]
pub fn next_u64(&mut self) -> u64 {
// The output is calculated from the current state.
let output = self.mix.wrapping_add(self.loop_mix);
(self.fast_loop, self.mix, self.loop_mix) = (
self.fast_loop.wrapping_add(0x9999999999999999),
self.mix.rotate_left(16).wrapping_add(self.loop_mix.rotate_left(40)),
self.fast_loop ^ self.mix,
);
output
}
// Helper for rotation
static inline uint64_t rotateLeft(const uint64_t x, int k) {
return (x << k) | (x >> (64 - k));
}
uint64_t biski64() {
uint64_t output = mix + loopMix;
uint64_t oldLoopMix = loopMix;
loopMix = fast_loop ^ mix;
mix = rotateLeft(mix, 16) + rotateLeft(oldLoopMix, 40);
fast_loop += 0x9999999999999999;
return output;
}
public long nextLong() {
final long output = mix + loopMix;
final long oldLoopMix = loopMix;
loopMix = fastLoop ^ mix;
mix = Long.rotateLeft( mix, 16 ) + Long.rotateLeft( oldLoopMix, 40 );
fastLoop += 0x9999999999999999L;
return output;
}
(Note: See test files for full seeding and usage examples.)
biski64
is well-suited for parallel applications, and parallel streams can be implemented as follows:
- Randomly seed
mix
, andloop_mix
for each stream as normal. - To ensure maximal separation between sequences, space starting values for each streams'
fast_loop
:
uint64_t cycles_per_stream = (2^64 - 1 ) / num_streams;
fast_loop = i * cycles_per_stream * 0x9999999999999999ULL;
where i is the stream index (0, 1, 2, ...) as demonstrated in the C, Rust and Java example code
A key test for any random number generator is to see how it performs when its internal state is drastically reduced. This allows for practical testing of the core mixing algorithm. biski64
performs exceptionally well in this regard.
State Variable Size | Total State | Practrand Failure |
---|---|---|
8-bit | 24 bits | 2^22 bytes (4 MB) |
16-bit | 48 bits | 2^40 bytes (1 TB) |
32-bit | 96 bits | did not fail, tested to 2^47 bytes (128 TB) |
64-bit | 192 bits | did not fail, tested to 2^47 bytes (128 TB) |
The results for the 8-bit and 16-bit scaled down versions show that biski64
exceeds the mixing efficiency (in terms of PractRand bytes passed per total state size) of even the well respected and tested JSF PRNG.
For comparison:
PRNG | Total State | Practrand Failure | Ratio (failed exponent/state bits) |
---|---|---|---|
biski(8-bit) | 24 bits | 2^22 bytes | 91.7% |
jsf8 | 32 bits | 2^28 bytes | 87.5% |
PRNG | Total State | Practrand Failure | Ratio (failed exponent/state bits) |
---|---|---|---|
biski(16-bit) | 48 bits | 2^40 bytes | 83.3% |
jsf16 | 64 bits | 2^47 bytes | 73.4% |
The scaled down test results for biski64
indicate strong performance of its core mixing algorithm (suggesting that the strong performance at full 64-bit size is not a result simply of a large state, but of its robust mixing performance).
BigCrush was run 100 times on biski64
(and multiple reference PRNGs).
Assuming test failure with a p-value below 0.001 or above 0.999 implies a 0.2% probability of a single test yielding a "false positive" purely by chance with an ideal generator. Running BigCrush 100 times (for a total of 25400 sub-tests), around 50.8 such chance "errors" would be anticipated.
biski64, 51 failed subtests (out of 25400 total)
5 subtests failed twice
wyrand, 55 failed subtests (out of 25400 total)
1 subtest failed THREE times
5 subtests failed twice
sfc64, 70 failed subtests (out of 25400 total)
1 subtest failed THREE times
12 subtests failed twice
xoroshiro128++, 54 failed subtests (out of 25400 total)
1 subtest failed FOUR times
4 subtests failed twice
xoroshiro256++, 60 failed subtests (out of 25400 total)
1 subtest failed THREE times
5 subtests failed twice
pcg128_xsl_rr_64, 47 failed subtests (out of 25400 total)
1 subtest failed FIVE times
4 subtests failed twice
(Note: For an ideal random generator, seeing three or more failures for a specific subtest would not be expected.)
Motivated by M.E. O'Neill's post, Does It Beat the Minimal Standard - the initial design for biski64
used a scaled down version with 8-bit state variables. This allowed for fast iteration using PractRand.
Once the algorithm was settled on, parameters were selected as follows:
- Additive Constant - Multiple additive constants were tested using a 16-bit state variable version. For each constant all possible rotation constants r1 and r2 (r1 < r2) were iterated over and PractRand was run to determine how far each (additiveConstant, r1, r2) variant could get. The highest performing constant was 0x9999 - passing the most rotational pairs at the highest level. This constant was then scaled to 0x9999999999999999 in the full 64-bit version.
- Rotation Constants - It was observed that bitsPerStateVar / 4 showed up frequently at the highest level for r1, as well as integers near bitsPerStateVar * (ɸ - 1) for r2. This was then used as a heuristic for rotation constants in the 32-bit and 64-bit versions.
- BigCrush Confirmation - BigCrush was run 100 times on five selected full size 64-bit variants (using the above heuristics). The best observed r2 values were 39 and 40 - supporting the theory that r1=bits/4 and r2=bits * (ɸ - 1) were solid heuristics. r2=40 was then deemed the best (as it exhibited no subtests failing three times) - and then used for the full 64-bit version.
(r1=16, r2=37) 64 subtests failed (out of 25400 total)
2 subtests failed THREE times
7 subtests failed twice
(r1=16, r2=39) 46 subtests failed (out of 25400 total)
1 subtest failed THREE times
4 subtests failed twice
(r1=16, r2=40) 51 subtests failed (out of 25400 total)
5 subtests failed twice
(r1=16, r2=41) 60 subtests failed (out of 25400 total)
1 subtest failed THREE times
6 subtests failed twice
(r1=16, r2=43) 46 subtests failed (out of 25400 total)
1 subtest failed FOUR times
1 subtest failed THREE times
1 subtest failed twice
Created by Daniel Cota and named after his cat Biscuit - a small and fast Egyptian Mau.