RISC-V integration and JIT compiler #3725

Merged
xmrig merged 4 commits into xmrig:dev from SChernykh:dev on Oct 23, 2025

@SChernykh commented Oct 22, 2025

Contributor

Closes #1924

@SChernykh

Contributor Author

@Slayingripper please test on your board (with and without -DARCH=native for cmake)

@SChernykh

Contributor Author

Some tests in QEMU (two screenshots attached).

@Slayingripper

Contributor

CMake without -DARCH=native: the JIT compiler port made a huge difference. The hashrate is now up to 100 H/s, an almost 3x increase! (screenshot attached)

CMake with -DARCH=native: (screenshot attached)

@SChernykh

Contributor Author

Can you also build https://github.com/tevador/RandomX/ and run the benchmark there?

./randomx-benchmark --mine --jit --largePages --softAes --threads 8 --init 8

I need this to check if I didn't break anything while porting the code.

@Slayingripper

Contributor

All good

-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting default build type: Release
-- Performing Test _march=rv64gc_cxx
-- Performing Test _march=rv64gc_cxx - Success
-- Setting CXX flag -march=rv64gc
-- Performing Test _march=rv64gc_c
-- Performing Test _march=rv64gc_c - Success
-- Setting C flag -march=rv64gc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Performing Test HAVE_CXX_ATOMICS
-- Performing Test HAVE_CXX_ATOMICS - Success
-- Configuring done (5.5s)
-- Generating done (0.1s)
-- Build files have been written to: /root/RandomX/build
root@orangepirv2:~/RandomX/build# make -j 8
[  2%] Building CXX object CMakeFiles/randomx.dir/src/aes_hash.cpp.o
[ 11%] Building CXX object CMakeFiles/randomx.dir/src/cpu.cpp.o
[ 11%] Building C object CMakeFiles/randomx.dir/src/argon2_ref.c.o
[ 14%] Building CXX object CMakeFiles/randomx.dir/src/bytecode_machine.cpp.o
[ 14%] Building C object CMakeFiles/randomx.dir/src/argon2_avx2.c.o
[ 17%] Building C object CMakeFiles/randomx.dir/src/argon2_ssse3.c.o
[ 20%] Building CXX object CMakeFiles/randomx.dir/src/dataset.cpp.o
[ 23%] Building CXX object CMakeFiles/randomx.dir/src/soft_aes.cpp.o
[ 26%] Building C object CMakeFiles/randomx.dir/src/virtual_memory.c.o
[ 29%] Building CXX object CMakeFiles/randomx.dir/src/vm_interpreted.cpp.o
[ 32%] Building CXX object CMakeFiles/randomx.dir/src/allocator.cpp.o
[ 35%] Building CXX object CMakeFiles/randomx.dir/src/assembly_generator_x86.cpp.o
[ 38%] Building CXX object CMakeFiles/randomx.dir/src/instruction.cpp.o
[ 41%] Building CXX object CMakeFiles/randomx.dir/src/randomx.cpp.o
[ 44%] Building CXX object CMakeFiles/randomx.dir/src/superscalar.cpp.o
[ 47%] Building CXX object CMakeFiles/randomx.dir/src/vm_compiled.cpp.o
[ 50%] Building CXX object CMakeFiles/randomx.dir/src/vm_interpreted_light.cpp.o
[ 52%] Building C object CMakeFiles/randomx.dir/src/argon2_core.c.o
[ 55%] Building CXX object CMakeFiles/randomx.dir/src/blake2_generator.cpp.o
[ 58%] Building CXX object CMakeFiles/randomx.dir/src/instructions_portable.cpp.o
[ 61%] Building C object CMakeFiles/randomx.dir/src/reciprocal.c.o
[ 64%] Building CXX object CMakeFiles/randomx.dir/src/virtual_machine.cpp.o
[ 67%] Building CXX object CMakeFiles/randomx.dir/src/vm_compiled_light.cpp.o
[ 70%] Building C object CMakeFiles/randomx.dir/src/blake2/blake2b.c.o
[ 73%] Building C object CMakeFiles/randomx.dir/src/jit_compiler_rv64_static.S.o
[ 76%] Building CXX object CMakeFiles/randomx.dir/src/jit_compiler_rv64.cpp.o
[ 79%] Linking CXX static library librandomx.a
[ 79%] Built target randomx
[ 85%] Building CXX object CMakeFiles/randomx-tests.dir/src/tests/tests.cpp.o
[ 85%] Building CXX object CMakeFiles/randomx-codegen.dir/src/tests/code-generator.cpp.o
[ 91%] Building CXX object CMakeFiles/randomx-benchmark.dir/src/tests/affinity.cpp.o
[ 91%] Building CXX object CMakeFiles/randomx-benchmark.dir/src/tests/benchmark.cpp.o
[ 94%] Linking CXX executable randomx-codegen
[ 94%] Built target randomx-codegen
[ 97%] Linking CXX executable randomx-benchmark
[ 97%] Built target randomx-benchmark
[100%] Linking CXX executable randomx-tests
[100%] Built target randomx-tests
root@orangepirv2:~/RandomX/build# ./randomx-benchmark --mine --jit --largePages --softAes --threads 8 --init 8
RandomX benchmark v1.2.1
 - Argon2 implementation: reference
 - full memory mode (2080 MiB)
 - JIT compiled mode 
 - software AES mode
 - large pages mode
 - batch mode
Initializing (8 threads) ...
Memory initialized in 29.2877 s
Initializing 8 virtual machine(s) ...
Running benchmark (1000 nonces) ...
Calculated result: 10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Reference result:  10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1

@SChernykh

Contributor Author

And the last line (after the reference result) - performance (hashes per second)?

@Slayingripper

Contributor

> And the last line (after the reference result) - performance (hashes per second)?

Performance: 86.8955 hashes per second
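
The check in this exchange comes down to two things: the calculated hash has to equal the reference hash, and the last line reports the hashrate. A minimal Python sketch of that check follows. The helper name `check_benchmark_output` is made up for illustration, and the regular expressions only assume the line format seen in the output pasted in this thread.

```python
import re

def check_benchmark_output(output: str):
    """Return (hashes_match, hashes_per_second) parsed from benchmark output.

    Hypothetical helper: the patterns assume the line format seen in this thread.
    """
    calc = re.search(r"Calculated result:\s*([0-9a-f]{64})", output)
    ref = re.search(r"Reference result:\s*([0-9a-f]{64})", output)
    perf = re.search(r"Performance:\s*([0-9.]+) hashes per second", output)
    if calc is None or ref is None:
        raise ValueError("result lines not found in benchmark output")
    # The performance line is optional; report None when it is missing.
    rate = float(perf.group(1)) if perf else None
    return calc.group(1) == ref.group(1), rate

# Lines copied from the benchmark run posted above.
sample = """\
Calculated result: 10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Reference result:  10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Performance: 86.8955 hashes per second
"""
print(check_benchmark_output(sample))  # (True, 86.8955)
```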

@SChernykh

Contributor Author

All good then. XMRig has a better soft AES implementation, so it should be a bit faster even with the same JIT compiler.

@SChernykh

Contributor Author

One more question: RxDataset_riscv.h and the whole src/crypto/riscv folder don't seem to be used anywhere. What is the purpose of these files?

@Slayingripper

Contributor

> One more question: RxDataset_riscv.h and the whole src/crypto/riscv folder don't seem to be used anywhere. What is the purpose of these files?

I had forgotten about this, thinking that RISC-V had some crypto extension, but until this happens, this won't get utilised. I was hoping to figure out a way to implement this, but maybe it's too early for RISC-V. You could remove them

@Slayingripper

Contributor

> One more question: RxDataset_riscv.h and the whole src/crypto/riscv folder don't seem to be used anywhere. What is the purpose of these files?

> I had forgotten about this, thinking that RISC-V had some crypto extension, but until this happens, this won't get utilised. I was hoping to figure out a way to implement this, but maybe it's too early for RISC-V. You could remove them

I was digging into the Orange Pi's documentation in the hopes I could figure out a way to utilise the "AI" features. But I guess it was just marketing hype.

@xmrig added this to the v6 milestone Oct 23, 2025
@xmrig merged commit 3ecacf0 into xmrig:dev Oct 23, 2025

@SChernykh

Contributor Author

@Slayingripper can you also run ./xmrig --export-topology and attach the generated xml file here? Maybe I will be able to better tune the auto-config for this CPU.

@Slayingripper

Contributor

topology.xml

@SChernykh

Contributor Author

Unfortunately (or fortunately) nothing unusual there, so auto-config should already create 8 threads for it.

@Slayingripper

Contributor

Got this running on a Mango Pi with the D1 processor (screenshot attached).

So I guess I can confirm that it works on two different boards.

@KiritakeKumi

After seeing your work merged, I conducted tests on a device with an SG2042 chip (64 cores, 128GB RAM). Here's a summary of my benchmark results.

PS: This device doesn't have RVV 1.0 vector acceleration.

Options used: --bench=1M --algo=rx/0 -t 20 (screenshot of the results attached)

@SChernykh

Contributor Author

@KiritakeKumi That's too low hashrate for SG2042. Did you test the latest dev branch? Also, why not 32 threads?

@Slayingripper

Contributor

> SG2042

This is very suspicious, since the same chip is used in the XMR MINER X5, if I'm not mistaken. The low hashrate is quite interesting.

@SChernykh

Contributor Author

X5 uses SG2042R - a custom version of SG2042. I suspect it has hardware AES + vector instructions + some extra instructions specifically for common RandomX code sequences. But the regular SG2042 shouldn't be more than 10x slower anyway.

@Slayingripper commented Dec 16, 2025

Contributor

> X5 uses SG2042R - a custom version of SG2042. I suspect it has hardware AES + vector instructions + some extra instructions specifically for common RandomX code sequences. But the regular SG2042 shouldn't be more than 10x slower anyway.

Yes, I agree, but at least it's working. Thinking about this for a second, and although this is not a 1:1 comparison: since the RV2 gets around 100 H/s on 8 cores, scaling that linearly to 64 cores gives (100*64)/8 = 800 H/s, so on the surface the benchmark does make sense.

I will also add that both of these chips have relatively small L1 caches, which, if I'm not mistaken, also impacts performance.
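
The linear-scaling estimate in this comment can be written out as a tiny Python sketch. The function name is made up for illustration, and the model is deliberately naive: it assumes hashrate grows in proportion to core count and ignores clock speed, microarchitecture, cache size and memory bandwidth.

```python
def linear_scaling_estimate(measured_hs: float, measured_cores: int, target_cores: int) -> float:
    """Scale a measured hashrate linearly with core count (a deliberately naive model)."""
    return measured_hs * target_cores / measured_cores

# Figures from this comment: about 100 H/s on the 8-core RV2, 64 cores on the SG2042.
print(linear_scaling_estimate(100, 8, 64))  # 800.0
```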

@SChernykh

Contributor Author

RV2 does 37 h/s on a single core though, even with only 512 KB cache. SG2042 has 64 MB cache, so it can run 32 threads at full speed - it should be much faster per thread.
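
The "64 MB cache, so 32 threads" figure follows from the 2 MiB scratchpad that RandomX (rx/0) uses per thread. Below is a rough Python sketch of that rule of thumb; the function name is hypothetical, and it assumes the whole cache is available for scratchpads, which is a simplification.

```python
SCRATCHPAD_MIB = 2  # RandomX (rx/0) scratchpad size per mining thread

def max_full_speed_threads(cache_mib: float, cores: int) -> int:
    """Threads whose scratchpads fit in cache at once, capped by the core count."""
    return min(cores, int(cache_mib // SCRATCHPAD_MIB))

print(max_full_speed_threads(64, 64))  # 32: SG2042, 64 MB cache, 64 cores
print(max_full_speed_threads(0.5, 8))  # 0: RV2, 512 KB is less than one scratchpad
```

By this estimate not even one scratchpad fits in the RV2's cache, which is why its 37 H/s on a single core is a useful lower bound for what a core with enough cache should manage.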

@KiritakeKumi

> @KiritakeKumi That's too low hashrate for SG2042. Did you test the latest dev branch? Also, why not 32 threads?

Yes, I'm using the latest dev branch.

Regarding the core count, I found that 20 cores is optimal; the worse results at higher core counts might be due to NUMA optimization issues?

Compared to the SG2042, the SG2042R should have an added RandomX hardware accelerator.

@SChernykh

Contributor Author

> Regarding the core count, I found that 20 cores is optimal; the worse results at higher core counts might be due to NUMA optimization issues?

Thread affinity can also be important. When you set thread count manually, threads are not fixed to specific cores. Can you run ./xmrig --export-topology and attach the generated topology.xml here? Then I will be able to tell which thread configuration is optimal.

@KiritakeKumi

> Regarding the core count, I found that 20 cores is optimal; the worse results at higher core counts might be due to NUMA optimization issues?

> Thread affinity can also be important. When you set thread count manually, threads are not fixed to specific cores. Can you run ./xmrig --export-topology and attach the generated topology.xml here? Then I will be able to tell which thread configuration is optimal.

topology.xml
thanks

@SChernykh

Contributor Author

Weird, it doesn't show cache size per core. But it does show that this CPU is split into 4 NUMA nodes, so running XMRig in auto mode is important. Try to use --cpu-max-threads-hint=31 or --cpu-max-threads-hint=32 instead of setting the thread count explicitly.
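
For anyone who wants to check the NUMA layout in their own topology.xml, here is a small Python sketch. It assumes the exported file follows the hwloc XML layout, in which each hardware object is an `object` element with a `type` attribute such as `NUMANode` or `Core`. The function name and the sample document are invented for illustration and are far simpler than a real export.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_objects_by_type(xml_text: str) -> Counter:
    """Count <object> elements by their "type" attribute (assumed hwloc XML layout)."""
    root = ET.fromstring(xml_text)
    return Counter(obj.get("type") for obj in root.iter("object"))

# Hypothetical, heavily simplified sample: 4 NUMA nodes with 2 cores each.
node = '<object type="NUMANode"/><object type="Core"/><object type="Core"/>'
sample = '<topology><object type="Machine">' + "".join(
    f'<object type="Group">{node}</object>' for _ in range(4)
) + "</object></topology>"

counts = count_objects_by_type(sample)
print(counts["NUMANode"], counts["Core"])  # 4 8
```

On a real file, pass the contents of topology.xml instead of `sample`. More than one `NUMANode` is a hint that letting XMRig place the threads automatically is worth trying before setting a thread count by hand.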

@KiritakeKumi

After setting it up, I can now achieve a speed of around 2100 H/s.

Without RVV acceleration, I think this speed is quite reasonable now.
