Standard ML library for faster parsing of Reals (floats/doubles), heavily
inspired by fast_float.
Features a zero-allocation fast path that is as much as 8x faster than the
Basis implementations of Real.scan and Real.fromString. The slow
path currently falls back on Real.scan.
FastReal.from_string is meant to be a drop-in replacement for
Real.fromString. Note that SML implementations disagree on how to parse
certain input strings; see issue #3 for an example.
Compatible with the smlpkg
package manager.
Here are timings with MaPLe on my
MacBook Air (2022, M2 chip). The input set is generated by
test/RealStringGen with approximately 95% of inputs hitting the fast path of
FastReal.
MaPLe v0.5.3 (8 threads), `Real.scan`, 1 million input strings (approx 10MB):
avg 0.1208s
min 0.1148s
max 0.1259s
average throughput: 87.8 MB/s
MaPLe v0.5.3 (8 threads), `FastReal.from_chars`, 1 million input strings (approx 10MB):
avg 0.0163s
min 0.0152s
max 0.0195s
average throughput: 650.8 MB/s <----- ~7.4x improvement over Real.scan
The next major TODO would be to use SIMD/vectorization for even more speedup.
See issue #2.
With SIMD I wouldn't be surprised if we could get 2-4x additional
throughput, perhaps more. I wonder if eventually we could compete in raw
performance with fast_float.
There is one source file, tested with both MLton and MaPLe:

    lib/github.com/shwestrick/sml-fast-real/sources.mlb

The library defines a functor, FastReal,
which takes an implementation of Reals as input.
The input structure needs to also
define a function fromLargeWord: LargeWord.word -> real which rounds the
input value (interpreted as an unsigned integer) to the nearest representable
floating point value.
This function is used on the fast path; ideally, it
should have very low overhead and zero allocation.
In MLton (and MaPLe), suitable functions are MLton.Real32.fromLargeWord and
MLton.Real64.fromLargeWord. See below for example usage.
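
For compilers without the MLton structures, the requirement above could in principle be met with Basis functions alone. The following is only a sketch under assumptions: the name PortableReal64 is hypothetical, the library has not been stated to support other compilers, and this version may allocate (LargeInt is often arbitrary precision), so it does not meet the low-overhead ideal described above. It relies on Real64.fromLargeInt rounding according to the current rounding mode, which is assumed to be the default TO_NEAREST.

```sml
(* Hypothetical portable input structure; not part of the library. *)
structure PortableReal64 =
struct
  open Real64

  (* LargeWord.toLargeInt treats the word as unsigned.
   * Real64.fromLargeInt rounds using the current rounding mode,
   * assumed here to be the default TO_NEAREST. *)
  fun fromLargeWord (w: LargeWord.word) : real =
    Real64.fromLargeInt (LargeWord.toLargeInt w)
end

(* With the library loaded, this could then be passed to the functor:
 *   structure FR = FastReal(PortableReal64)
 *)
```

On MLton and MaPLe, the built-in MLton.Real64.fromLargeWord remains the better choice. The functor itself has the following signature.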

    functor FastReal
      (R:
       sig
         include REAL
         val fromLargeWord: LargeWord.word -> real
       end):
    sig
      (* implicitly defines a sequence of characters
       *   [ get(start), get(start+1), ..., get(stop-1) ]
       *)
      type chars = {start: int, stop: int, get: int -> char}
      type result_with_info = {result: R.real, num_chomped: int, fast_path: bool}

      val from_chars: chars -> R.real option
      val from_chars_with_info: chars -> result_with_info option
      val from_string: string -> R.real option
      val from_string_with_info: string -> result_with_info option
    end = ...

Example .mlb file:

    $(SML_LIB)/basis/basis.mlb
    $(SML_LIB)/basis/mlton.mlb  (* for MLton.Real64 *)
    lib/github.com/shwestrick/sml-fast-real/sources.mlb
    main.sml

Example main.sml:

    structure R64 =
    struct
      open MLton.Real64 (* need this for fromLargeWord *)
      open Real64
    end

    structure FR = FastReal(R64)

    val r = valOf (FR.from_string "123.456E-1")
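
The other entry points can be used in the same way. The following is a minimal sketch, assuming the R64 and FR structures from main.sml. The string `line`, the index bounds, and the reading of num_chomped as the number of characters consumed are illustrative assumptions, not documented behavior.

```sml
(* Assumes R64 and FR are defined as in main.sml. *)

(* Parse a number embedded in a larger string without copying it out.
 * The record describes the characters line[2], ..., line[5], i.e. "3.25";
 * get takes absolute indices, per the comment in the signature. *)
val line = "x=3.25,y=-1e3"
val chars = {start = 2, stop = 6, get = fn i => String.sub (line, i)}
val x = FR.from_chars chars (* : R64.real option *)

(* The _with_info variants also report how the result was obtained. *)
val _ =
  case FR.from_string_with_info "123.456E-1" of
    NONE => print "parse failed\n"
  | SOME {result, num_chomped, fast_path} =>
      print (R64.toString result
             ^ " (num_chomped = " ^ Int.toString num_chomped
             ^ ", fast_path = " ^ Bool.toString fast_path ^ ")\n")
```

Passing a chars record rather than a string lets the caller parse directly out of an existing buffer, such as a large input file, without building an intermediate substring for each number.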