Add arm64 NEON path (~1.7x) #27

Merged
zeebo merged 2 commits into zeebo:master from klauspost:add-arm64-neon
Jan 23, 2026
@klauspost
Contributor

@klauspost klauspost commented Jan 19, 2026

Tested with QEMU. My test script:

echo Building ARM64 test binary...
set GOOS=linux
set GOARCH=arm64
go test -c -o xxh3_arm64.test

if %ERRORLEVEL% neq 0 (
    echo Build failed!
    SET GOOS=windows
    SET GOARCH=amd64
    exit /b %ERRORLEVEL%
)

SET GOOS=windows
SET GOARCH=amd64

echo Ensuring QEMU is active...
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
echo Running tests in ARM64 container...
docker run --rm --platform linux/arm64 -e GOGC=20 -e GOMEMLIMIT=2GiB -v "%cd%:/work" -w /work arm64v8/alpine ./xxh3_arm64.test %*

QEMU is dirt slow with NEON it seems.

This also adds CI on `ubuntu-24.04-arm`, which should run the tests on real hardware, and updates various other CI aspects.

Went for NEON since SVE supposedly gives little advantage and NEON is available across the platform.
Didn't go for the scalar+NEON hybrid approach of xxhash.h, since I'd rather not make it too complex.
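For readers unfamiliar with what the NEON path vectorizes, here is a minimal scalar sketch of the XXH3 per-stripe accumulation step as described in the reference xxhash.h. It is not the code from this PR; the function name `accumulateStripe` is made up for illustration. NEON registers are 128 bits wide, so a vector version can presumably handle two of these 64-bit lanes per instruction.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// accumulateStripe applies one XXH3 accumulation step to a 64-byte stripe.
// acc holds the eight 64-bit lanes; stripe and secret must each supply at
// least 64 bytes. Hypothetical helper for illustration only.
func accumulateStripe(acc *[8]uint64, stripe, secret []byte) {
	for i := 0; i < 8; i++ {
		dataVal := binary.LittleEndian.Uint64(stripe[8*i:])
		dataKey := dataVal ^ binary.LittleEndian.Uint64(secret[8*i:])
		// The raw input word is added to the neighbouring lane.
		acc[i^1] += dataVal
		// The product of the low and high 32-bit halves of the keyed
		// word is added to the lane itself.
		acc[i] += (dataKey & 0xFFFFFFFF) * (dataKey >> 32)
	}
}

func main() {
	var acc [8]uint64
	stripe := make([]byte, 64)
	secret := make([]byte, 64)
	for i := range stripe {
		stripe[i] = byte(i)
		secret[i] = byte(255 - i)
	}
	accumulateStripe(&acc, stripe, secret)
	fmt.Println(acc)
}
```

The loop body has no dependency between lane pairs, which is what makes it a natural fit for SIMD.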

Has not been benchmarked on real hardware. Maybe someone with a Mac can give it a spin?

AI assisted code, fwiw.

@klauspost
Contributor Author

Thanks to @leitzler for benchmarking on "Apple M1 Ultra":

benchmark                                    old ns/op     new ns/op     delta
BenchmarkFixed128/0/default-20               2.89          2.89          -0.07%
BenchmarkFixed128/0/seed-20                  3.85          3.54          -8.02%
BenchmarkFixed128/1/default-20               3.35          3.37          +0.60%
BenchmarkFixed128/1/seed-20                  3.60          3.64          +1.08%
BenchmarkFixed128/2/default-20               3.12          3.21          +2.85%
BenchmarkFixed128/2/seed-20                  3.54          3.76          +6.36%
BenchmarkFixed128/3/default-20               3.01          3.02          +0.40%
BenchmarkFixed128/3/seed-20                  3.50          3.51          +0.06%
BenchmarkFixed128/4/default-20               3.31          3.31          +0.12%
BenchmarkFixed128/4/seed-20                  3.56          3.56          +0.08%
BenchmarkFixed128/8/default-20               3.31          3.30          -0.42%
BenchmarkFixed128/8/seed-20                  3.57          3.59          +0.50%
BenchmarkFixed128/9/default-20               3.81          3.81          -0.05%
BenchmarkFixed128/9/seed-20                  3.95          3.92          -0.58%
BenchmarkFixed128/16/default-20              3.80          3.81          +0.40%
BenchmarkFixed128/16/seed-20                 3.95          3.91          -1.01%
BenchmarkFixed128/17/default-20              4.79          4.82          +0.73%
BenchmarkFixed128/17/seed-20                 5.10          5.09          -0.14%
BenchmarkFixed128/32/default-20              4.79          4.79          -0.17%
BenchmarkFixed128/32/seed-20                 5.07          5.11          +0.73%
BenchmarkFixed128/33/default-20              6.67          6.65          -0.31%
BenchmarkFixed128/33/seed-20                 7.13          7.18          +0.59%
BenchmarkFixed128/64/default-20              6.65          6.68          +0.45%
BenchmarkFixed128/64/seed-20                 7.13          7.18          +0.71%
BenchmarkFixed128/65/default-20              8.69          8.54          -1.75%
BenchmarkFixed128/65/seed-20                 9.26          9.27          +0.11%
BenchmarkFixed128/96/default-20              8.55          8.57          +0.29%
BenchmarkFixed128/96/seed-20                 9.24          9.27          +0.24%
BenchmarkFixed128/97/default-20              10.3          10.4          +0.39%
BenchmarkFixed128/97/seed-20                 11.2          11.2          +0.09%
BenchmarkFixed128/128/default-20             10.4          10.4          +0.00%
BenchmarkFixed128/128/seed-20                11.2          11.2          -0.62%
BenchmarkFixed128/129/default-20             12.3          12.3          -0.08%
BenchmarkFixed128/129/seed-20                13.2          13.3          +0.15%
BenchmarkFixed128/240/default-20             18.0          18.0          +0.06%
BenchmarkFixed128/240/seed-20                19.5          19.5          +0.00%
BenchmarkFixed128/241/default-20             26.7          18.3          -31.58%
BenchmarkFixed128/241/seed-20                36.3          28.3          -22.00%
BenchmarkFixed128/512/default-20             43.1          27.4          -36.43%
BenchmarkFixed128/512/seed-20                52.8          37.7          -28.59%
BenchmarkFixed128/1024/default-20            75.8          45.4          -40.20%
BenchmarkFixed128/1024/seed-20               85.9          56.3          -34.44%
BenchmarkFixed128/8192/default-20            573           317           -44.70%
BenchmarkFixed128/8192/seed-20               571           327           -42.72%
BenchmarkFixed128/102400/default-20          6891          3893          -43.51%
BenchmarkFixed128/102400/seed-20             6916          3917          -43.36%
BenchmarkFixed128/1024000/default-20         69178         38905         -43.76%
BenchmarkFixed128/1024000/seed-20            68461         38912         -43.16%
BenchmarkFixed128/10240000/default-20        716488        413344        -42.31%
BenchmarkFixed128/10240000/seed-20           717202        414561        -42.20%
BenchmarkFixed128/102400000/default-20       6983667       4023573       -42.39%
BenchmarkFixed128/102400000/seed-20          6962106       4029868       -42.12%
BenchmarkFixed64/0/default-20                2.94          2.90          -1.56%
BenchmarkFixed64/0/seed-20                   3.21          3.21          -0.06%
BenchmarkFixed64/1/default-20                2.89          2.89          +0.00%
BenchmarkFixed64/1/seed-20                   2.98          2.98          +0.07%
BenchmarkFixed64/2/default-20                3.21          2.89          -9.87%
BenchmarkFixed64/2/seed-20                   2.89          2.89          -0.03%
BenchmarkFixed64/3/default-20                2.66          2.65          -0.34%
BenchmarkFixed64/3/seed-20                   2.74          2.75          +0.29%
BenchmarkFixed64/4/default-20                2.64          2.64          +0.08%
BenchmarkFixed64/4/seed-20                   2.80          2.81          +0.07%
BenchmarkFixed64/8/default-20                2.68          2.67          -0.34%
BenchmarkFixed64/8/seed-20                   2.81          2.81          +0.07%
BenchmarkFixed64/9/default-20                2.69          2.70          +0.11%
BenchmarkFixed64/9/seed-20                   2.89          2.91          +0.66%
BenchmarkFixed64/16/default-20               2.69          2.69          +0.11%
BenchmarkFixed64/16/seed-20                  2.91          2.90          -0.41%
BenchmarkFixed64/17/default-20               3.58          3.57          -0.28%
BenchmarkFixed64/17/seed-20                  3.88          3.86          -0.54%
BenchmarkFixed64/32/default-20               3.58          3.57          -0.28%
BenchmarkFixed64/32/seed-20                  3.87          3.86          -0.18%
BenchmarkFixed64/33/default-20               5.18          5.21          +0.62%
BenchmarkFixed64/33/seed-20                  5.62          5.61          -0.25%
BenchmarkFixed64/64/default-20               5.18          5.20          +0.23%
BenchmarkFixed64/64/seed-20                  5.64          5.61          -0.48%
BenchmarkFixed64/65/default-20               6.79          6.80          +0.25%
BenchmarkFixed64/65/seed-20                  7.41          7.40          -0.12%
BenchmarkFixed64/96/default-20               6.84          6.78          -0.91%
BenchmarkFixed64/96/seed-20                  7.41          7.40          -0.18%
BenchmarkFixed64/97/default-20               8.32          8.33          +0.12%
BenchmarkFixed64/97/seed-20                  9.24          9.22          -0.31%
BenchmarkFixed64/128/default-20              8.34          8.32          -0.13%
BenchmarkFixed64/128/seed-20                 9.25          9.21          -0.40%
BenchmarkFixed64/129/default-20              9.02          9.02          -0.02%
BenchmarkFixed64/129/seed-20                 10.0          10.2          +1.90%
BenchmarkFixed64/240/default-20              15.2          15.1          -0.46%
BenchmarkFixed64/240/seed-20                 16.6          16.6          -0.24%
BenchmarkFixed64/241/default-20              23.8          15.1          -36.49%
BenchmarkFixed64/241/seed-20                 33.3          25.8          -22.66%
BenchmarkFixed64/512/default-20              40.6          24.4          -39.93%
BenchmarkFixed64/512/seed-20                 49.6          35.0          -29.39%
BenchmarkFixed64/1024/default-20             72.9          42.3          -41.92%
BenchmarkFixed64/1024/seed-20                82.4          53.1          -35.61%
BenchmarkFixed64/8192/default-20             562           314           -44.15%
BenchmarkFixed64/8192/seed-20                568           326           -42.70%
BenchmarkFixed64/102400/default-20           6909          3906          -43.47%
BenchmarkFixed64/102400/seed-20              6951          3919          -43.62%
BenchmarkFixed64/1024000/default-20          68946         38963         -43.49%
BenchmarkFixed64/1024000/seed-20             68682         39200         -42.93%
BenchmarkFixed64/10240000/default-20         721518        414624        -42.53%
BenchmarkFixed64/10240000/seed-20            712357        413686        -41.93%
BenchmarkFixed64/102400000/default-20        6996173       4087764       -41.57%
BenchmarkFixed64/102400000/seed-20           6961685       4109113       -40.98%
BenchmarkHasher64/16/go/plain-20             7.07          7.09          +0.24%
BenchmarkHasher64/16/go/seed-20              7.70          7.75          +0.60%
BenchmarkHasher64/64/go/plain-20             10.0          10.2          +1.50%
BenchmarkHasher64/64/go/seed-20              10.6          10.6          +0.19%
BenchmarkHasher64/256/go/plain-20            31.0          22.5          -27.35%
BenchmarkHasher64/256/go/seed-20             40.6          33.8          -16.80%
BenchmarkHasher64/1024/go/plain-20           89.1          59.5          -33.18%
BenchmarkHasher64/1024/go/seed-20            99.4          70.5          -29.04%
BenchmarkHasher64/4096/go/plain-20           294           180           -38.52%
BenchmarkHasher64/4096/go/seed-20            296           180           -39.24%
BenchmarkHasher64/16384/go/plain-20          1141          665           -41.71%
BenchmarkHasher64/16384/go/seed-20           1124          670           -40.43%
BenchmarkHasher64/65536/go/plain-20          4406          2618          -40.58%
BenchmarkHasher64/65536/go/seed-20           4413          2634          -40.31%
BenchmarkHasher64/262144/go/plain-20         17515         10391         -40.67%
BenchmarkHasher64/262144/go/seed-20          17606         10445         -40.67%
BenchmarkHasher64/1048576/go/plain-20        69923         41668         -40.41%
BenchmarkHasher64/1048576/go/seed-20         70326         41717         -40.68%
BenchmarkHasher64/4194304/go/plain-20        280944        168411        -40.06%
BenchmarkHasher64/4194304/go/seed-20         281212        168019        -40.25%
BenchmarkHasher64/16777216/go/plain-20       1142426       693005        -39.34%
BenchmarkHasher64/16777216/go/seed-20        1146059       702704        -38.69%
BenchmarkHasher64/67108864/go/plain-20       4564818       2736551       -40.05%
BenchmarkHasher64/67108864/go/seed-20        4550971       2756270       -39.44%
BenchmarkHasher64/268435456/go/plain-20      18057221      10967945      -39.26%
BenchmarkHasher64/268435456/go/seed-20       18106736      11039005      -39.03%
BenchmarkHasher128/16/go/plain-20            8.00          8.08          +0.96%
BenchmarkHasher128/16/go/seed-20             8.50          8.38          -1.42%
BenchmarkHasher128/64/go/plain-20            11.5          11.4          -0.44%
BenchmarkHasher128/64/go/seed-20             12.0          12.0          +0.08%
BenchmarkHasher128/256/go/plain-20           33.9          26.1          -22.85%
BenchmarkHasher128/256/go/seed-20            43.5          36.0          -17.33%
BenchmarkHasher128/1024/go/plain-20          92.5          62.6          -32.37%
BenchmarkHasher128/1024/go/seed-20           102           72.7          -28.69%
BenchmarkHasher128/4096/go/plain-20          300           183           -38.98%
BenchmarkHasher128/4096/go/seed-20           297           182           -38.54%
BenchmarkHasher128/16384/go/plain-20         1122          672           -40.13%
BenchmarkHasher128/16384/go/seed-20          1127          669           -40.60%
BenchmarkHasher128/65536/go/plain-20         4396          2615          -40.51%
BenchmarkHasher128/65536/go/seed-20          4532          2630          -41.97%
BenchmarkHasher128/262144/go/plain-20        17555         10394         -40.79%
BenchmarkHasher128/262144/go/seed-20         17584         10442         -40.62%
BenchmarkHasher128/1048576/go/plain-20       70007         41570         -40.62%
BenchmarkHasher128/1048576/go/seed-20        70545         41778         -40.78%
BenchmarkHasher128/4194304/go/plain-20       281493        168289        -40.22%
BenchmarkHasher128/4194304/go/seed-20        282385        168582        -40.30%
BenchmarkHasher128/16777216/go/plain-20      1145056       691152        -39.64%
BenchmarkHasher128/16777216/go/seed-20       1148202       691947        -39.74%
BenchmarkHasher128/67108864/go/plain-20      4544374       2745257       -39.59%
BenchmarkHasher128/67108864/go/seed-20       4546367       2749636       -39.52%
BenchmarkHasher128/268435456/go/plain-20     18088278      10871372      -39.90%
BenchmarkHasher128/268435456/go/seed-20      18171299      10950518      -39.74%
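
The jump between the 240-byte and 241-byte rows lines up with where XXH3 switches to its long-input stripe loop: in the reference xxhash.h the mid-size limit is 240 bytes. A rough sketch of that dispatch, assuming this package follows the same thresholds (the names `midSizeMax` and `pathFor` are hypothetical):

```go
package main

import "fmt"

// midSizeMax mirrors the 240-byte mid-size limit of the reference
// implementation; assumed, not taken from this repository.
const midSizeMax = 240

// pathFor reports which size class an input of n bytes falls into.
func pathFor(n int) string {
	switch {
	case n <= 16:
		return "short (0-16)"
	case n <= 128:
		return "medium (17-128)"
	case n <= midSizeMax:
		return "mid-size (129-240)"
	default:
		return "long (stripe loop)"
	}
}

func main() {
	for _, n := range []int{16, 128, 240, 241, 8192} {
		fmt.Printf("%5d bytes -> %s\n", n, pathFor(n))
	}
}
```

Only the last class would be expected to reach the new NEON code, which matches the rows at 240 bytes and below staying flat.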

benchmark                                    old MB/s     new MB/s     speedup
BenchmarkFixed128/1/default-20               298.58       296.84       0.99x
BenchmarkFixed128/1/seed-20                  277.55       274.55       0.99x
BenchmarkFixed128/2/default-20               640.87       623.07       0.97x
BenchmarkFixed128/2/seed-20                  565.06       531.31       0.94x
BenchmarkFixed128/3/default-20               996.73       992.63       1.00x
BenchmarkFixed128/3/seed-20                  855.84       855.44       1.00x
BenchmarkFixed128/4/default-20               1209.84      1208.44      1.00x
BenchmarkFixed128/4/seed-20                  1123.45      1122.65      1.00x
BenchmarkFixed128/8/default-20               2413.15      2423.86      1.00x
BenchmarkFixed128/8/seed-20                  2240.67      2229.51      1.00x
BenchmarkFixed128/9/default-20               2362.87      2364.33      1.00x
BenchmarkFixed128/9/seed-20                  2280.97      2293.92      1.01x
BenchmarkFixed128/16/default-20              4213.98      4196.86      1.00x
BenchmarkFixed128/16/seed-20                 4047.95      4088.56      1.01x
BenchmarkFixed128/17/default-20              3551.43      3525.31      0.99x
BenchmarkFixed128/17/seed-20                 3335.08      3340.06      1.00x
BenchmarkFixed128/32/default-20              6673.66      6684.09      1.00x
BenchmarkFixed128/32/seed-20                 6309.57      6263.63      0.99x
BenchmarkFixed128/33/default-20              4947.59      4962.86      1.00x
BenchmarkFixed128/33/seed-20                 4625.19      4598.31      0.99x
BenchmarkFixed128/64/default-20              9623.88      9580.85      1.00x
BenchmarkFixed128/64/seed-20                 8972.46      8908.30      0.99x
BenchmarkFixed128/65/default-20              7477.52      7610.55      1.02x
BenchmarkFixed128/65/seed-20                 7019.00      7011.05      1.00x
BenchmarkFixed128/96/default-20              11233.47     11200.03     1.00x
BenchmarkFixed128/96/seed-20                 10386.14     10361.95     1.00x
BenchmarkFixed128/97/default-20              9379.85      9345.34      1.00x
BenchmarkFixed128/97/seed-20                 8661.01      8653.76      1.00x
BenchmarkFixed128/128/default-20             12361.12     12353.71     1.00x
BenchmarkFixed128/128/seed-20                11372.79     11445.66     1.01x
BenchmarkFixed128/129/default-20             10507.78     10513.61     1.00x
BenchmarkFixed128/129/seed-20                9737.36      9718.02      1.00x
BenchmarkFixed128/240/default-20             13354.29     13348.07     1.00x
BenchmarkFixed128/240/seed-20                12318.04     12317.65     1.00x
BenchmarkFixed128/241/default-20             9031.09      13201.21     1.46x
BenchmarkFixed128/241/seed-20                6644.46      8517.47      1.28x
BenchmarkFixed128/512/default-20             11880.23     18686.93     1.57x
BenchmarkFixed128/512/seed-20                9693.55      13573.45     1.40x
BenchmarkFixed128/1024/default-20            13503.90     22580.26     1.67x
BenchmarkFixed128/1024/seed-20               11916.32     18176.34     1.53x
BenchmarkFixed128/8192/default-20            14292.96     25850.60     1.81x
BenchmarkFixed128/8192/seed-20               14337.19     25030.67     1.75x
BenchmarkFixed128/102400/default-20          14860.05     26302.05     1.77x
BenchmarkFixed128/102400/seed-20             14805.77     26141.23     1.77x
BenchmarkFixed128/1024000/default-20         14802.50     26320.61     1.78x
BenchmarkFixed128/1024000/seed-20            14957.44     26315.73     1.76x
BenchmarkFixed128/10240000/default-20        14291.94     24773.52     1.73x
BenchmarkFixed128/10240000/seed-20           14277.70     24700.82     1.73x
BenchmarkFixed128/102400000/default-20       14662.78     25450.02     1.74x
BenchmarkFixed128/102400000/seed-20          14708.19     25410.26     1.73x
BenchmarkFixed64/1/default-20                345.97       346.07       1.00x
BenchmarkFixed64/1/seed-20                   335.45       335.26       1.00x
BenchmarkFixed64/2/default-20                622.85       691.15       1.11x
BenchmarkFixed64/2/seed-20                   692.13       692.21       1.00x
BenchmarkFixed64/3/default-20                1127.70      1131.53      1.00x
BenchmarkFixed64/3/seed-20                   1095.42      1092.21      1.00x
BenchmarkFixed64/4/default-20                1515.73      1514.43      1.00x
BenchmarkFixed64/4/seed-20                   1426.55      1425.47      1.00x
BenchmarkFixed64/8/default-20                2982.89      2993.04      1.00x
BenchmarkFixed64/8/seed-20                   2844.84      2842.68      1.00x
BenchmarkFixed64/9/default-20                3341.43      3337.72      1.00x
BenchmarkFixed64/9/seed-20                   3111.86      3091.91      0.99x
BenchmarkFixed64/16/default-20               5953.01      5946.38      1.00x
BenchmarkFixed64/16/seed-20                  5503.25      5526.10      1.00x
BenchmarkFixed64/17/default-20               4751.51      4764.65      1.00x
BenchmarkFixed64/17/seed-20                  4384.80      4409.27      1.01x
BenchmarkFixed64/32/default-20               8931.82      8955.68      1.00x
BenchmarkFixed64/32/seed-20                  8280.49      8294.34      1.00x
BenchmarkFixed64/33/default-20               6368.06      6328.62      0.99x
BenchmarkFixed64/33/seed-20                  5870.57      5885.21      1.00x
BenchmarkFixed64/64/default-20               12342.39     12315.96     1.00x
BenchmarkFixed64/64/seed-20                  11351.27     11405.58     1.00x
BenchmarkFixed64/65/default-20               9575.03      9552.02      1.00x
BenchmarkFixed64/65/seed-20                  8773.03      8783.21      1.00x
BenchmarkFixed64/96/default-20               14040.96     14170.15     1.01x
BenchmarkFixed64/96/seed-20                  12953.51     12976.55     1.00x
BenchmarkFixed64/97/default-20               11658.90     11645.09     1.00x
BenchmarkFixed64/97/seed-20                  10492.44     10525.38     1.00x
BenchmarkFixed64/128/default-20              15354.97     15374.80     1.00x
BenchmarkFixed64/128/seed-20                 13840.17     13896.39     1.00x
BenchmarkFixed64/129/default-20              14306.60     14309.61     1.00x
BenchmarkFixed64/129/seed-20                 12892.36     12645.66     0.98x
BenchmarkFixed64/240/default-20              15812.90     15884.70     1.00x
BenchmarkFixed64/240/seed-20                 14415.46     14451.81     1.00x
BenchmarkFixed64/241/default-20              10128.93     15947.47     1.57x
BenchmarkFixed64/241/seed-20                 7233.29      9350.91      1.29x
BenchmarkFixed64/512/default-20              12594.20     20964.74     1.66x
BenchmarkFixed64/512/seed-20                 10313.69     14607.16     1.42x
BenchmarkFixed64/1024/default-20             14050.11     24193.25     1.72x
BenchmarkFixed64/1024/seed-20                12423.37     19291.25     1.55x
BenchmarkFixed64/8192/default-20             14565.90     26080.34     1.79x
BenchmarkFixed64/8192/seed-20                14419.50     25168.20     1.75x
BenchmarkFixed64/102400/default-20           14821.27     26215.85     1.77x
BenchmarkFixed64/102400/seed-20              14732.19     26126.41     1.77x
BenchmarkFixed64/1024000/default-20          14852.26     26281.56     1.77x
BenchmarkFixed64/1024000/seed-20             14909.38     26122.58     1.75x
BenchmarkFixed64/10240000/default-20         14192.30     24697.06     1.74x
BenchmarkFixed64/10240000/seed-20            14374.82     24753.04     1.72x
BenchmarkFixed64/102400000/default-20        14636.57     25050.37     1.71x
BenchmarkFixed64/102400000/seed-20           14709.08     24920.22     1.69x
BenchmarkHasher64/16/go/plain-20             2262.56      2257.08      1.00x
BenchmarkHasher64/16/go/seed-20              2077.89      2065.61      0.99x
BenchmarkHasher64/64/go/plain-20             6392.42      6301.90      0.99x
BenchmarkHasher64/64/go/seed-20              6061.74      6049.19      1.00x
BenchmarkHasher64/256/go/plain-20            8258.78      11366.53     1.38x
BenchmarkHasher64/256/go/seed-20             6307.13      7580.54      1.20x
BenchmarkHasher64/1024/go/plain-20           11490.26     17195.49     1.50x
BenchmarkHasher64/1024/go/seed-20            10306.03     14522.16     1.41x
BenchmarkHasher64/4096/go/plain-20           13951.61     22686.49     1.63x
BenchmarkHasher64/4096/go/seed-20            13844.12     22777.54     1.65x
BenchmarkHasher64/16384/go/plain-20          14362.61     24634.82     1.72x
BenchmarkHasher64/16384/go/seed-20           14574.80     24469.78     1.68x
BenchmarkHasher64/65536/go/plain-20          14874.83     25030.95     1.68x
BenchmarkHasher64/65536/go/seed-20           14850.77     24883.10     1.68x
BenchmarkHasher64/262144/go/plain-20         14966.95     25228.33     1.69x
BenchmarkHasher64/262144/go/seed-20          14889.15     25098.02     1.69x
BenchmarkHasher64/1048576/go/plain-20        14996.12     25165.05     1.68x
BenchmarkHasher64/1048576/go/seed-20         14910.25     25135.26     1.69x
BenchmarkHasher64/4194304/go/plain-20        14929.32     24905.18     1.67x
BenchmarkHasher64/4194304/go/seed-20         14915.08     24963.34     1.67x
BenchmarkHasher64/16777216/go/plain-20       14685.61     24209.38     1.65x
BenchmarkHasher64/16777216/go/seed-20        14639.05     23875.22     1.63x
BenchmarkHasher64/67108864/go/plain-20       14701.32     24523.16     1.67x
BenchmarkHasher64/67108864/go/seed-20        14746.05     24347.71     1.65x
BenchmarkHasher64/268435456/go/plain-20      14865.82     24474.54     1.65x
BenchmarkHasher64/268435456/go/seed-20       14825.17     24317.00     1.64x
BenchmarkHasher128/16/go/plain-20            1999.79      1980.62      0.99x
BenchmarkHasher128/16/go/seed-20             1883.21      1910.42      1.01x
BenchmarkHasher128/64/go/plain-20            5571.61      5593.48      1.00x
BenchmarkHasher128/64/go/seed-20             5342.39      5338.02      1.00x
BenchmarkHasher128/256/go/plain-20           7559.03      9798.43      1.30x
BenchmarkHasher128/256/go/seed-20            5882.76      7114.99      1.21x
BenchmarkHasher128/1024/go/plain-20          11066.39     16361.89     1.48x
BenchmarkHasher128/1024/go/seed-20           10050.57     14093.39     1.40x
BenchmarkHasher128/4096/go/plain-20          13636.35     22345.03     1.64x
BenchmarkHasher128/4096/go/seed-20           13810.70     22465.71     1.63x
BenchmarkHasher128/16384/go/plain-20         14605.75     24390.76     1.67x
BenchmarkHasher128/16384/go/seed-20          14533.13     24475.91     1.68x
BenchmarkHasher128/65536/go/plain-20         14909.67     25059.87     1.68x
BenchmarkHasher128/65536/go/seed-20          14460.69     24917.72     1.72x
BenchmarkHasher128/262144/go/plain-20        14932.91     25221.48     1.69x
BenchmarkHasher128/262144/go/seed-20         14908.35     25105.73     1.68x
BenchmarkHasher128/1048576/go/plain-20       14978.20     25224.45     1.68x
BenchmarkHasher128/1048576/go/seed-20        14863.98     25098.72     1.69x
BenchmarkHasher128/4194304/go/plain-20       14900.19     24923.15     1.67x
BenchmarkHasher128/4194304/go/seed-20        14853.15     24879.94     1.68x
BenchmarkHasher128/16777216/go/plain-20      14651.88     24274.28     1.66x
BenchmarkHasher128/16777216/go/seed-20       14611.73     24246.39     1.66x
BenchmarkHasher128/67108864/go/plain-20      14767.46     24445.39     1.66x
BenchmarkHasher128/67108864/go/seed-20       14760.99     24406.46     1.65x
BenchmarkHasher128/268435456/go/plain-20     14840.30     24691.96     1.66x
BenchmarkHasher128/268435456/go/seed-20      14772.50     24513.50     1.66x
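
For anyone cross-checking the two tables, the delta, MB/s and speedup columns can be reproduced roughly as below. This is a small sketch with made-up helper names, using the `BenchmarkFixed128/102400/default-20` row as input. Results can differ from the table in the last digits because the printed ns/op values are already rounded.

```go
package main

import "fmt"

// deltaPercent returns the relative change in ns/op; negative means faster.
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// throughputMBs converts bytes per op and ns/op into MB/s (1 MB = 1e6 bytes).
func throughputMBs(bytes int, nsPerOp float64) float64 {
	return float64(bytes) / nsPerOp * 1e3
}

// speedup is the ratio of new to old throughput.
func speedup(oldMBs, newMBs float64) float64 {
	return newMBs / oldMBs
}

func main() {
	fmt.Printf("delta:      %.2f%%\n", deltaPercent(6891, 3893))
	fmt.Printf("throughput: %.0f MB/s\n", throughputMBs(102400, 6891))
	fmt.Printf("speedup:    %.2fx\n", speedup(14860.05, 26302.05))
}
```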

@klauspost changed the title from "Add arm64 NEON path" to "Add arm64 NEON path (~1.7x)" on Jan 20, 2026
@egonelbre

Results from M4

goos: darwin
goarch: arm64
pkg: github.com/zeebo/xxh3
cpu: Apple M4 Max
                                │   v0.log~    │              v1.log~               │
                                │    sec/op    │   sec/op     vs base               │
Fixed128/0/default-16              1.996n ± 3%   2.156n ± 6%   +8.01% (p=0.002 n=6)
Fixed128/0/seed-16                 2.602n ± 3%   2.488n ± 2%   -4.36% (p=0.002 n=6)
Fixed128/1/default-16              2.284n ± 2%   2.270n ± 4%        ~ (p=0.818 n=6)
Fixed128/1/seed-16                 2.347n ± 4%   2.308n ± 2%        ~ (p=0.093 n=6)
Fixed128/2/default-16              2.258n ± 2%   2.306n ± 4%        ~ (p=0.509 n=6)
Fixed128/2/seed-16                 2.298n ± 4%   2.337n ± 2%        ~ (p=0.180 n=6)
Fixed128/3/default-16              1.987n ± 2%   1.979n ± 2%   -0.43% (p=0.030 n=6)
Fixed128/3/seed-16                 2.235n ± 1%   2.228n ± 1%   -0.34% (p=0.039 n=6)
Fixed128/4/default-16              2.240n ± 2%   2.225n ± 0%   -0.65% (p=0.006 n=6)
Fixed128/4/seed-16                 2.269n ± 1%   2.471n ± 0%   +8.86% (p=0.002 n=6)
Fixed128/8/default-16              2.251n ± 1%   2.229n ± 0%   -0.98% (p=0.006 n=6)
Fixed128/8/seed-16                 2.237n ± 1%   2.473n ± 0%  +10.53% (p=0.002 n=6)
Fixed128/9/default-16              2.483n ± 1%   2.475n ± 0%   -0.32% (p=0.006 n=6)
Fixed128/9/seed-16                 2.541n ± 1%   2.496n ± 0%   -1.79% (p=0.002 n=6)
Fixed128/16/default-16             2.490n ± 2%   2.478n ± 0%   -0.48% (p=0.002 n=6)
Fixed128/16/seed-16                2.552n ± 1%   2.495n ± 0%   -2.27% (p=0.002 n=6)
Fixed128/17/default-16             2.999n ± 1%   2.986n ± 0%   -0.45% (p=0.002 n=6)
Fixed128/17/seed-16                3.307n ± 2%   3.157n ± 0%   -4.54% (p=0.002 n=6)
Fixed128/32/default-16             2.987n ± 4%   2.981n ± 0%   -0.20% (p=0.011 n=6)
Fixed128/32/seed-16                3.305n ± 1%   3.158n ± 0%   -4.45% (p=0.002 n=6)
Fixed128/33/default-16             4.139n ± 1%   4.123n ± 1%   -0.41% (p=0.041 n=6)
Fixed128/33/seed-16                4.602n ± 1%   4.619n ± 0%   +0.36% (p=0.048 n=6)
Fixed128/64/default-16             4.147n ± 0%   4.106n ± 0%   -1.00% (p=0.002 n=6)
Fixed128/64/seed-16                4.620n ± 1%   4.617n ± 1%        ~ (p=0.500 n=6)
Fixed128/65/default-16             5.319n ± 2%   5.333n ± 0%        ~ (p=0.372 n=6)
Fixed128/65/seed-16                5.770n ± 1%   5.716n ± 0%   -0.94% (p=0.002 n=6)
Fixed128/96/default-16             5.319n ± 2%   5.327n ± 0%        ~ (p=1.000 n=6)
Fixed128/96/seed-16                5.784n ± 2%   5.713n ± 0%   -1.23% (p=0.002 n=6)
Fixed128/97/default-16             6.469n ± 0%   6.434n ± 0%   -0.53% (p=0.002 n=6)
Fixed128/97/seed-16                6.991n ± 0%   6.939n ± 0%   -0.75% (p=0.002 n=6)
Fixed128/128/default-16            6.462n ± 0%   6.435n ± 0%   -0.41% (p=0.002 n=6)
Fixed128/128/seed-16               6.968n ± 0%   6.944n ± 0%   -0.34% (p=0.002 n=6)
Fixed128/129/default-16            7.784n ± 0%   7.759n ± 0%   -0.31% (p=0.002 n=6)
Fixed128/129/seed-16               8.532n ± 0%   8.470n ± 0%   -0.72% (p=0.002 n=6)
Fixed128/240/default-16            11.22n ± 0%   11.17n ± 0%   -0.40% (p=0.013 n=6)
Fixed128/240/seed-16               12.20n ± 0%   12.20n ± 0%        ~ (p=0.758 n=6)
Fixed128/241/default-16            17.27n ± 0%   11.23n ± 0%  -34.97% (p=0.002 n=6)
Fixed128/241/seed-16               24.23n ± 1%   17.65n ± 0%  -27.16% (p=0.002 n=6)
Fixed128/512/default-16            28.23n ± 1%   15.92n ± 0%  -43.58% (p=0.002 n=6)
Fixed128/512/seed-16               35.17n ± 1%   22.56n ± 0%  -35.85% (p=0.002 n=6)
Fixed128/1024/default-16           49.99n ± 0%   28.13n ± 0%  -43.73% (p=0.002 n=6)
Fixed128/1024/seed-16              57.05n ± 0%   34.02n ± 0%  -40.36% (p=0.002 n=6)
Fixed128/8192/default-16           369.7n ± 0%   235.9n ± 0%  -36.17% (p=0.002 n=6)
Fixed128/8192/seed-16              374.9n ± 0%   239.8n ± 0%  -36.05% (p=0.002 n=6)
Fixed128/102400/default-16         4.575µ ± 0%   2.976µ ± 0%  -34.94% (p=0.002 n=6)
Fixed128/102400/seed-16            4.562µ ± 0%   2.979µ ± 0%  -34.71% (p=0.002 n=6)
Fixed128/1024000/default-16        45.79µ ± 0%   29.87µ ± 0%  -34.77% (p=0.002 n=6)
Fixed128/1024000/seed-16           45.50µ ± 0%   29.87µ ± 0%  -34.35% (p=0.002 n=6)
Fixed128/10240000/default-16       459.9µ ± 0%   301.2µ ± 0%  -34.51% (p=0.002 n=6)
Fixed128/10240000/seed-16          458.1µ ± 1%   301.1µ ± 0%  -34.26% (p=0.002 n=6)
Fixed128/102400000/default-16      4.594m ± 1%   2.999m ± 0%  -34.72% (p=0.002 n=6)
Fixed128/102400000/seed-16         4.568m ± 0%   3.005m ± 1%  -34.21% (p=0.002 n=6)
Fixed64/0/default-16               2.014n ± 6%   1.985n ± 1%        ~ (p=0.132 n=6)
Fixed64/0/seed-16                  2.252n ± 1%   2.237n ± 3%        ~ (p=0.288 n=6)
Fixed64/1/default-16               2.074n ± 4%   1.991n ± 6%        ~ (p=0.089 n=6)
Fixed64/1/seed-16                  2.045n ± 3%   2.307n ± 2%  +12.84% (p=0.002 n=6)
Fixed64/2/default-16               2.233n ± 1%   2.224n ± 0%   -0.38% (p=0.002 n=6)
Fixed64/2/seed-16                  2.057n ± 6%   2.228n ± 1%   +8.26% (p=0.002 n=6)
Fixed64/3/default-16               1.985n ± 0%   1.978n ± 0%   -0.35% (p=0.013 n=6)
Fixed64/3/seed-16                  1.981n ± 0%   1.980n ± 0%        ~ (p=0.080 n=6)
Fixed64/4/default-16               1.992n ± 1%   1.980n ± 0%   -0.58% (p=0.004 n=6)
Fixed64/4/seed-16                  1.982n ± 0%   1.981n ± 0%        ~ (p=0.145 n=6)
Fixed64/8/default-16               1.982n ± 0%   1.980n ± 0%   -0.15% (p=0.009 n=6)
Fixed64/8/seed-16                  1.983n ± 0%   1.983n ± 0%        ~ (p=0.835 n=6)
Fixed64/9/default-16               1.983n ± 0%   1.981n ± 0%        ~ (p=0.262 n=6)
Fixed64/9/seed-16                  2.229n ± 0%   2.225n ± 0%        ~ (p=0.173 n=6)
Fixed64/16/default-16              1.982n ± 0%   1.983n ± 0%        ~ (p=0.141 n=6)
Fixed64/16/seed-16                 2.228n ± 0%   2.224n ± 0%        ~ (p=0.058 n=6)
Fixed64/17/default-16              2.243n ± 0%   2.237n ± 0%   -0.25% (p=0.006 n=6)
Fixed64/17/seed-16                 2.482n ± 0%   2.493n ± 2%   +0.40% (p=0.030 n=6)
Fixed64/32/default-16              2.243n ± 0%   2.247n ± 0%        ~ (p=0.264 n=6)
Fixed64/32/seed-16                 2.481n ± 0%   2.478n ± 0%        ~ (p=0.054 n=6)
Fixed64/33/default-16              3.231n ± 2%   3.225n ± 0%        ~ (p=0.913 n=6)
Fixed64/33/seed-16                 3.472n ± 0%   3.471n ± 0%   -0.04% (p=0.048 n=6)
Fixed64/64/default-16              3.224n ± 0%   3.221n ± 0%        ~ (p=0.091 n=6)
Fixed64/64/seed-16                 3.471n ± 0%   3.471n ± 1%        ~ (p=0.394 n=6)
Fixed64/65/default-16              4.218n ± 0%   4.226n ± 0%   +0.18% (p=0.043 n=6)
Fixed64/65/seed-16                 4.638n ± 0%   4.740n ± 0%   +2.21% (p=0.002 n=6)
Fixed64/96/default-16              4.211n ± 0%   4.213n ± 0%        ~ (p=0.327 n=6)
Fixed64/96/seed-16                 4.608n ± 0%   4.949n ± 0%   +7.39% (p=0.002 n=6)
Fixed64/97/default-16              5.200n ± 0%   5.198n ± 0%        ~ (p=0.130 n=6)
Fixed64/97/seed-16                 5.649n ± 0%   5.651n ± 0%        ~ (p=0.797 n=6)
Fixed64/128/default-16             5.205n ± 0%   5.199n ± 0%        ~ (p=0.223 n=6)
Fixed64/128/seed-16                5.659n ± 0%   5.649n ± 0%        ~ (p=0.132 n=6)
Fixed64/129/default-16             5.612n ± 0%   5.616n ± 0%        ~ (p=0.130 n=6)
Fixed64/129/seed-16                6.306n ± 0%   6.340n ± 1%   +0.56% (p=0.002 n=6)
Fixed64/240/default-16             9.182n ± 0%   9.183n ± 0%        ~ (p=0.777 n=6)
Fixed64/240/seed-16                10.28n ± 0%   10.50n ± 0%   +2.19% (p=0.002 n=6)
Fixed64/241/default-16            15.560n ± 1%   9.209n ± 0%  -40.82% (p=0.002 n=6)
Fixed64/241/seed-16                22.44n ± 0%   16.09n ± 1%  -28.28% (p=0.002 n=6)
Fixed64/512/default-16             26.45n ± 0%   14.26n ± 0%  -46.08% (p=0.002 n=6)
Fixed64/512/seed-16                33.30n ± 0%   20.88n ± 1%  -37.31% (p=0.002 n=6)
Fixed64/1024/default-16            48.25n ± 0%   27.75n ± 0%  -42.50% (p=0.002 n=6)
Fixed64/1024/seed-16               55.07n ± 0%   32.04n ± 0%  -41.82% (p=0.002 n=6)
Fixed64/8192/default-16            367.4n ± 0%   236.6n ± 0%  -35.60% (p=0.002 n=6)
Fixed64/8192/seed-16               373.2n ± 0%   238.0n ± 0%  -36.24% (p=0.002 n=6)
Fixed64/102400/default-16          4.570µ ± 0%   2.975µ ± 1%  -34.91% (p=0.002 n=6)
Fixed64/102400/seed-16             4.558µ ± 0%   2.980µ ± 0%  -34.62% (p=0.002 n=6)
Fixed64/1024000/default-16         45.76µ ± 0%   29.92µ ± 0%  -34.61% (p=0.002 n=6)
Fixed64/1024000/seed-16            45.46µ ± 0%   29.91µ ± 0%  -34.21% (p=0.002 n=6)
Fixed64/10240000/default-16        459.7µ ± 0%   301.0µ ± 0%  -34.52% (p=0.002 n=6)
Fixed64/10240000/seed-16           458.0µ ± 0%   301.2µ ± 0%  -34.24% (p=0.002 n=6)
Fixed64/102400000/default-16       4.590m ± 1%   3.001m ± 0%  -34.62% (p=0.002 n=6)
Fixed64/102400000/seed-16          4.567m ± 0%   3.005m ± 0%  -34.20% (p=0.002 n=6)
Hasher64/16/go/plain-16            4.970n ± 1%   4.961n ± 1%        ~ (p=0.288 n=6)
Hasher64/16/go/seed-16             5.530n ± 3%   5.454n ± 0%   -1.39% (p=0.045 n=6)
Hasher64/64/go/plain-16            6.224n ± 1%   6.236n ± 0%        ~ (p=0.167 n=6)
Hasher64/64/go/seed-16             6.768n ± 0%   6.734n ± 0%   -0.51% (p=0.004 n=6)
Hasher64/256/go/plain-16           19.99n ± 0%   14.33n ± 0%  -28.31% (p=0.002 n=6)
Hasher64/256/go/seed-16            26.89n ± 1%   21.57n ± 0%  -19.80% (p=0.002 n=6)
Hasher64/1024/go/plain-16          59.97n ± 0%   43.26n ± 0%  -27.86% (p=0.002 n=6)
Hasher64/1024/go/seed-16           67.09n ± 0%   48.90n ± 0%  -27.11% (p=0.002 n=6)
Hasher64/4096/go/plain-16          195.4n ± 0%   126.3n ± 0%  -35.36% (p=0.002 n=6)
Hasher64/4096/go/seed-16           196.7n ± 0%   126.4n ± 0%  -35.74% (p=0.002 n=6)
Hasher64/16384/go/plain-16         738.9n ± 0%   486.0n ± 0%  -34.23% (p=0.002 n=6)
Hasher64/16384/go/seed-16          747.9n ± 0%   484.9n ± 0%  -35.17% (p=0.002 n=6)
Hasher64/65536/go/plain-16         2.904µ ± 0%   1.927µ ± 0%  -33.64% (p=0.002 n=6)
Hasher64/65536/go/seed-16          2.949µ ± 1%   1.927µ ± 0%  -34.63% (p=0.002 n=6)
Hasher64/262144/go/plain-16       11.562µ ± 0%   7.697µ ± 0%  -33.43% (p=0.002 n=6)
Hasher64/262144/go/seed-16        11.696µ ± 0%   7.698µ ± 0%  -34.18% (p=0.002 n=6)
Hasher64/1048576/go/plain-16       46.03µ ± 0%   30.72µ ± 0%  -33.26% (p=0.002 n=6)
Hasher64/1048576/go/seed-16        46.75µ ± 1%   30.72µ ± 0%  -34.30% (p=0.002 n=6)
Hasher64/4194304/go/plain-16       185.2µ ± 1%   122.9µ ± 0%  -33.63% (p=0.002 n=6)
Hasher64/4194304/go/seed-16        187.7µ ± 0%   123.0µ ± 0%  -34.49% (p=0.002 n=6)
Hasher64/16777216/go/plain-16      744.7µ ± 1%   496.3µ ± 1%  -33.35% (p=0.002 n=6)
Hasher64/16777216/go/seed-16       754.4µ ± 1%   496.9µ ± 2%  -34.13% (p=0.002 n=6)
Hasher64/67108864/go/plain-16      2.965m ± 0%   1.981m ± 1%  -33.20% (p=0.002 n=6)
Hasher64/67108864/go/seed-16       3.007m ± 0%   1.976m ± 0%  -34.28% (p=0.002 n=6)
Hasher64/268435456/go/plain-16    11.866m ± 0%   7.900m ± 0%  -33.42% (p=0.002 n=6)
Hasher64/268435456/go/seed-16     12.035m ± 2%   7.904m ± 0%  -34.32% (p=0.002 n=6)
Hasher128/16/go/plain-16           5.484n ± 1%   5.481n ± 1%        ~ (p=0.394 n=6)
Hasher128/16/go/seed-16            5.963n ± 0%   5.952n ± 1%        ~ (p=0.504 n=6)
Hasher128/64/go/plain-16           7.224n ± 1%   7.226n ± 0%        ~ (p=0.699 n=6)
Hasher128/64/go/seed-16            7.719n ± 1%   7.800n ± 1%   +1.05% (p=0.002 n=6)
Hasher128/256/go/plain-16          21.84n ± 0%   16.30n ± 0%  -25.41% (p=0.002 n=6)
Hasher128/256/go/seed-16           29.39n ± 1%   23.21n ± 0%  -21.02% (p=0.002 n=6)
Hasher128/1024/go/plain-16         61.87n ± 1%   43.61n ± 0%  -29.52% (p=0.002 n=6)
Hasher128/1024/go/seed-16          68.51n ± 7%   50.41n ± 1%  -26.43% (p=0.002 n=6)
Hasher128/4096/go/plain-16         197.1n ± 0%   126.2n ± 0%  -35.97% (p=0.002 n=6)
Hasher128/4096/go/seed-16          198.2n ± 0%   126.3n ± 0%  -36.26% (p=0.002 n=6)
Hasher128/16384/go/plain-16        740.7n ± 0%   485.2n ± 0%  -34.49% (p=0.002 n=6)
Hasher128/16384/go/seed-16         750.3n ± 0%   485.8n ± 1%  -35.25% (p=0.002 n=6)
Hasher128/65536/go/plain-16        2.894µ ± 0%   1.926µ ± 0%  -33.45% (p=0.002 n=6)
Hasher128/65536/go/seed-16         2.937µ ± 0%   1.926µ ± 0%  -34.43% (p=0.002 n=6)
Hasher128/262144/go/plain-16      11.544µ ± 0%   7.699µ ± 0%  -33.31% (p=0.002 n=6)
Hasher128/262144/go/seed-16       11.737µ ± 0%   7.694µ ± 0%  -34.45% (p=0.002 n=6)
Hasher128/1048576/go/plain-16      46.02µ ± 0%   30.71µ ± 0%  -33.26% (p=0.002 n=6)
Hasher128/1048576/go/seed-16       46.73µ ± 0%   30.72µ ± 0%  -34.26% (p=0.002 n=6)
Hasher128/4194304/go/plain-16      184.7µ ± 0%   122.9µ ± 0%  -33.46% (p=0.002 n=6)
Hasher128/4194304/go/seed-16       187.3µ ± 0%   122.9µ ± 0%  -34.39% (p=0.002 n=6)
Hasher128/16777216/go/plain-16     745.5µ ± 0%   505.1µ ± 2%  -32.25% (p=0.002 n=6)
Hasher128/16777216/go/seed-16      757.5µ ± 1%   502.0µ ± 2%  -33.72% (p=0.002 n=6)
Hasher128/67108864/go/plain-16     2.964m ± 1%   1.976m ± 1%  -33.35% (p=0.002 n=6)
Hasher128/67108864/go/seed-16      3.012m ± 0%   1.974m ± 0%  -34.45% (p=0.002 n=6)
Hasher128/268435456/go/plain-16   11.837m ± 0%   7.904m ± 0%  -33.23% (p=0.002 n=6)
Hasher128/268435456/go/seed-16    12.005m ± 0%   7.897m ± 0%  -34.21% (p=0.002 n=6)
geomean                            143.4n        117.1n       -18.36%
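
For reading the deltas above: benchstat reports the change in time per op, so a row's speedup factor is old/new rather than the percentage itself. A minimal sketch using the `Fixed128/10240000/default-16` row (459.9µ to 301.2µ); the helper names `speedup` and `deltaPercent` are hypothetical and not part of xxh3 or benchstat:

```go
package main

import "fmt"

// speedup converts old and new time-per-op into a speedup factor (old/new).
// Hypothetical helper for illustration only.
func speedup(oldTime, newTime float64) float64 {
	return oldTime / newTime
}

// deltaPercent computes the relative change in time per op,
// (new-old)/old*100, which is how the "vs base" column reads.
func deltaPercent(oldTime, newTime float64) float64 {
	return (newTime - oldTime) / oldTime * 100
}

func main() {
	// Values copied from the Fixed128/10240000/default-16 row, in microseconds.
	oldTime, newTime := 459.9, 301.2
	fmt.Printf("delta   %.2f%%\n", deltaPercent(oldTime, newTime))
	fmt.Printf("speedup %.2fx\n", speedup(oldTime, newTime))
}
```

On this row, a delta of roughly -34.5% works out to a speedup of roughly 1.5x, which is why the large-input rows cluster around that factor.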

@zeebo
Owner

zeebo commented Jan 21, 2026

this seems awesome as usual. i just got back from a vacation so i'll take a look sometime soon.

@zeebo zeebo merged commit 84cc04f into zeebo:master Jan 23, 2026
13 checks passed
@zeebo
Owner

zeebo commented Jan 23, 2026

i realized the last tag was almost 4 years ago before the 128 bit hasher and a bunch of improvements, so i arbitrarily tagged v1.1.0 and pushed it 😄 thanks!

@klauspost klauspost deleted the add-arm64-neon branch January 23, 2026 21:14