Simple and fast implementation of the Levenshtein distance algorithm.
And nothing more.
It utilizes Span<T> over stackallock to avoid allocations and improve performance.
And ArrayPool<T> in case of exceed MaxStackAllocSize static property value.
It has a lot of optimizations and tricks to improve performance:
- Do not use
unsafecode. - Reducing array indexings.
- Allocating only array for current line calculation and reusing it for next line calculation.
- Zero index is not part of the line and stored in a separate variable.
- etc.
And in result it extremely fast on Windows and MacOS. It even faster than Fastenshtein library (a little bit 😏).
| Method | Mean | Error | StdDev | Ratio | Gen0 | Allocated |
|---|---|---|---|---|---|---|
| Fastenshtein | 7.613 us | 0.0295 us | 0.0276 us | 1.00 | 0.9308 | 5896 B |
| FastenshteinZeroAlloc | 6.845 us | 0.0136 us | 0.0120 us | 0.90 | - | - |
| Zeroshtein | 6.958 us | 0.0093 us | 0.0083 us | 0.91 | - | - |
| ZeroshteinInMemory | 12.122 us | 0.0263 us | 0.0246 us | 1.59 | - | - |
| StringSimilarity | 17.948 us | 0.0691 us | 0.0647 us | 2.36 | 1.7395 | 11064 B |
| NinjaNye | 19.700 us | 0.0586 us | 0.0490 us | 2.59 | 7.1411 | 44880 B |
| FuzzyStrings | 77.905 us | 0.2432 us | 0.2031 us | 10.23 | 26.8555 | 169040 B |
| Quickenshtein | 23.650 us | 0.0324 us | 0.0271 us | 3.11 | - | - |
Benchmark was performed in next environment:
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22000.1574/21H2/SunValley)
Intel Core i7-9700K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET SDK=7.0.102
[Host] : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2
DefaultJob : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2
| Method | Mean | Error | StdDev | Ratio | Gen0 | Allocated |
|---|---|---|---|---|---|---|
| Fastenshtein | 5.690 us | 0.0240 us | 0.0225 us | 1.00 | 0.9384 | 5896 B |
| FastenshteinZeroAlloc | 5.237 us | 0.0142 us | 0.0132 us | 0.92 | - | - |
| Zeroshtein | 5.469 us | 0.0105 us | 0.0093 us | 0.96 | - | - |
| ZeroshteinInMemory | 6.906 us | 0.0188 us | 0.0176 us | 1.21 | - | - |
| StringSimilarity | 8.884 us | 0.0173 us | 0.0162 us | 1.56 | 1.7548 | 11064 B |
| NinjaNye | 11.147 us | 0.0263 us | 0.0219 us | 1.96 | 7.1411 | 44880 B |
| FuzzyStrings | 46.450 us | 0.1166 us | 0.0974 us | 8.17 | 26.9165 | 169040 B |
| Quickenshtein | 5.692 us | 0.0115 us | 0.0108 us | 1.00 | - | - |
Benchmark was performed in next environment:
BenchmarkDotNet=v0.13.5, OS=macOS Ventura 13.2.1 (22D68) [Darwin 22.3.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK=7.0.101
[Host] : .NET 7.0.1 (7.0.122.56804), Arm64 RyuJIT AdvSIMD
DefaultJob : .NET 7.0.1 (7.0.122.56804), Arm64 RyuJIT AdvSIMD
var distance = Zeroshtein.Levenshtein.Distance("horce", "rose");By default Zeroshtein uses stackallock to allocate calculation array in stack only.
And if it exceeds MaxStackAllocSize static property value it uses ArrayPool<T> to allocate array in memory.
Default value of MaxStackAllocSize is 1024 bytes.
But can be changed to any value, it is a static property.
Zeroshtein.Levenshtein.MaxStackAllocSize = 4096;It should be set with caution.
Big value can cause StackOverflowException in case of long strings.
This exception can't be caught and handled.
It will cause application crash.