animica: add SHA3-256 CUDA kernel + animicaHash plugin export #218

Open
animicaorg wants to merge 1 commit into xmrig:master from animicaorg:animica-mining

Adds the CUDA SHA3-256 kernel for Animica (Layer-1 hashshare PoW) — companion to xmrig/xmrig#3817.

With this plugin loaded, an xmrig built with -DWITH_ANIMICA=ON resolves the animicaHash symbol via dlsym, and xmrig --algo animica --url pool.animica.org:3333 mines on the GPU via CUDA. Without it (e.g. stock upstream xmrig-cuda), xmrig's CudaLib::hasAnimicaSupport() probe returns false, the runner emits one yellow warning, and mining falls back cleanly to OpenCL/CPU — no segfault, no broken state.

What's in this PR

Kernel (src/Animica/Animica.cu)

  • One CUDA thread per nonce. Builds the 136-byte rate block (32-byte prefix + 8-byte LE nonce + 0x06 ... 0x80 SHA3 padding), runs Keccak-f[1600] for 24 rounds inlined in registers, squeezes the first 32 bytes, big-endian-compares against the 256-bit target.
  • Atomic-reserves a result slot when the digest beats the target. Per-share record is just the 4-byte nonce — xmrig's submit path re-derives the digest host-side via the matching AnimicaHash.cpp::sha3_256 (which produces bit-identical output by construction; same round constants, same 0x06/0x80 padding).
  • One device-side state struct per device_id (capped at 16 GPUs/host).
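
The per-nonce work described above can be sketched on the host in plain C++. This is a minimal reference sketch, not the code in Animica.cu or AnimicaHash.cpp: the names `sketch::animica_digest`, `sketch::sha3_256_single_block`, and `sketch::meets_target` are hypothetical, and the comparison assumes "digest ≤ target" as worded in the commit message. It relies only on standard SHA3-256 (rate 136 bytes, 0x06 domain byte, final 0x80 bit) over the 40-byte message of prefix plus little-endian nonce.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

constexpr std::size_t kRate = 136;  // SHA3-256 rate in bytes (1088 bits)
using Digest = std::array<uint8_t, 32>;

inline uint64_t rotl64(uint64_t x, int n) { return (x << n) | (x >> (64 - n)); }

// Keccak-f[1600]: 24 rounds of theta, rho+pi, chi, iota over 25 lanes.
inline void keccak_f1600(uint64_t st[25]) {
    static const uint64_t rc[24] = {
        0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
        0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
        0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
        0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
        0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
        0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
        0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
        0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL};
    static const int rotc[24] = {1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
                                 27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44};
    static const int piln[24] = {10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
                                 15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1};
    for (int r = 0; r < 24; ++r) {
        uint64_t bc[5];
        for (int i = 0; i < 5; ++i)  // theta: column parities
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        for (int i = 0; i < 5; ++i) {
            const uint64_t t = bc[(i + 4) % 5] ^ rotl64(bc[(i + 1) % 5], 1);
            for (int j = 0; j < 25; j += 5) st[j + i] ^= t;
        }
        uint64_t t = st[1];  // rho + pi: rotate lanes and permute positions
        for (int i = 0; i < 24; ++i) {
            const int j = piln[i];
            const uint64_t tmp = st[j];
            st[j] = rotl64(t, rotc[i]);
            t = tmp;
        }
        for (int j = 0; j < 25; j += 5) {  // chi: non-linear row mix
            for (int i = 0; i < 5; ++i) bc[i] = st[j + i];
            for (int i = 0; i < 5; ++i)
                st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
        }
        st[0] ^= rc[r];  // iota
    }
}

// SHA3-256 of a message that fits in one rate block (len < 136 bytes).
inline Digest sha3_256_single_block(const uint8_t* msg, std::size_t len) {
    assert(len < kRate);
    uint8_t block[kRate] = {0};
    for (std::size_t i = 0; i < len; ++i) block[i] = msg[i];
    block[len] ^= 0x06;        // SHA3 domain separation + first padding bit
    block[kRate - 1] ^= 0x80;  // final padding bit
    uint64_t st[25] = {0};
    for (std::size_t i = 0; i < kRate; ++i)  // absorb as little-endian lanes
        st[i / 8] ^= static_cast<uint64_t>(block[i]) << (8 * (i % 8));
    keccak_f1600(st);
    Digest out{};
    for (std::size_t i = 0; i < out.size(); ++i)  // squeeze first 32 bytes
        out[i] = static_cast<uint8_t>(st[i / 8] >> (8 * (i % 8)));
    return out;
}

// 32-byte prefix followed by the nonce as 8 little-endian bytes.
inline Digest animica_digest(const uint8_t prefix[32], uint64_t nonce) {
    uint8_t msg[40];
    for (int i = 0; i < 32; ++i) msg[i] = prefix[i];
    for (int i = 0; i < 8; ++i) msg[32 + i] = static_cast<uint8_t>(nonce >> (8 * i));
    return sha3_256_single_block(msg, sizeof(msg));
}

// Big-endian comparison: byte 0 is most significant. Assumes "<=" wins.
inline bool meets_target(const Digest& digest, const Digest& target) {
    for (std::size_t i = 0; i < digest.size(); ++i)
        if (digest[i] != target[i]) return digest[i] < target[i];
    return true;  // equal to the target
}

}  // namespace sketch
```

A host-side reference of this shape is one way to cross-check GPU results: recompute `animica_digest(prefix, nonce)` for every nonce the kernel reports and confirm `meets_target` agrees.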

Plugin export (src/xmrig-cuda.{h,cpp})

  • XMRIG_EXPORT bool animicaHash(nvid_ctx*, uint8_t*, uint64_t, uint32_t, uint32_t*, uint32_t*, uint32_t*) matches the signature xmrig's CudaLib uv_dlsym's.
  • setJob() for the ANIMICA family is a deliberate no-op — the kernel reads job_blob directly per call, no device-side scratch carried between turns (in contrast to KawPow's DAG or RandomX's dataset).
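
The optional-symbol probe on the host side can be illustrated roughly as follows. This is a sketch under assumptions, not xmrig's CudaLib: `PluginProbe` and the injected `resolve` callback are hypothetical stand-ins for the loader and for dlsym/uv_dlsym, and only the function-pointer type is taken from the signature quoted above.

```cpp
#include <cstdint>
#include <functional>

struct nvid_ctx;  // opaque here; the real definition lives in xmrig-cuda

// Pointer type matching the exported animicaHash signature.
using AnimicaHashFn = bool (*)(nvid_ctx*, uint8_t*, uint64_t, uint32_t,
                               uint32_t*, uint32_t*, uint32_t*);

// Hypothetical loader stand-in. `resolve` plays the role of dlsym/uv_dlsym:
// it returns the symbol address, or nullptr when the plugin lacks the export.
class PluginProbe {
public:
    explicit PluginProbe(const std::function<void*(const char*)>& resolve)
        : m_hash(reinterpret_cast<AnimicaHashFn>(resolve("animicaHash"))) {}

    // False against a plugin without the export, so the caller can warn
    // once and fall back to another backend instead of calling a null pointer.
    bool hasAnimicaSupport() const { return m_hash != nullptr; }

    AnimicaHashFn hash() const { return m_hash; }

private:
    AnimicaHashFn m_hash;
};
```

Treating the symbol as optional is what keeps a stock plugin usable: a missing export becomes a capability flag rather than a load failure.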

Algorithm registration (src/crypto/common/Algorithm.{h,cpp})

  • Algorithm::ANIMICA_SHA3 = 0x41010000 and Family::ANIMICA = 0x41000000, mirroring the encoding in the companion xmrig PR.
  • Added to the parse table under XMRIG_ALGO_ANIMICA.
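
The relationship between the two constants can be shown in a few lines. This sketch assumes the family is carried in the top byte of the algorithm id and recovered with a `0xff000000` mask; the enum and function names below are simplified placeholders, not the declarations in Algorithm.h.

```cpp
#include <cstdint>

namespace sketch {

enum Family : uint32_t { UNKNOWN = 0, ANIMICA = 0x41000000 };
enum Id : uint32_t { INVALID = 0, ANIMICA_SHA3 = 0x41010000 };

// Assumption: the family occupies the most significant byte of the id.
constexpr Family family(Id id) {
    return static_cast<Family>(id & 0xff000000u);
}

// Checked at compile time: the algorithm id falls inside its family.
static_assert(family(ANIMICA_SHA3) == ANIMICA, "ANIMICA_SHA3 must map to ANIMICA");

}  // namespace sketch
```

Under that assumption, further Animica variants could be added as 0x4102xxxx and so on without touching the family check.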

Build glue (CMakeLists.txt, cmake/CUDA.cmake)

  • option(WITH_ANIMICA "..." ON). XMRIG_ALGO_ANIMICA define plumbed alongside XMRIG_ALGO_KAWPOW.
  • CUDA_ANIMICA_SOURCES list with the .cu + .h.

Footprint

When WITH_ANIMICA=OFF the binary is byte-identical to upstream. Animica adds ~300 lines including the kernel; no new dependencies (uses only the CUDA Runtime API and the existing xmrig-cuda nvid_ctx).

Companion PR

CPU + OpenCL paths, algorithm enum, Stratum + AICF dispatch in xmrig: xmrig/xmrig#3817.

Build verification

The kernel has not been compiled or run on hardware yet: the dev box has no CUDA toolkit installed. The next step is to compile it with nvcc and run it on a real GPU against pool.animica.org:3333. That end-to-end test (xmrig + xmrig-cuda + Animica pool) is the same one the CPU and OpenCL paths already pass.

This is the xmrig-cuda counterpart to the Animica integration in
ercmine/xmrig@animica-mining. With this plugin loaded, an xmrig built
with -DWITH_ANIMICA=ON resolves the `animicaHash` symbol via dlsym,
and `--algo animica` mines on the GPU via CUDA.

What's in this commit:
- src/Animica/Animica.cu — the kernel. One CUDA thread per nonce:
  builds the single 136-byte rate block (32-byte prefix + 8-byte LE
  nonce + 0x06 ... 0x80 SHA3 padding), runs Keccak-f[1600] for 24
  rounds inlined in registers, squeezes the first 32 bytes, and
  big-endian-compares against the 256-bit target. Atomic-reserves a
  slot in a small results buffer when the digest is ≤ target.
- src/Animica/Animica.h — single-function dispatcher signature
  matching the CudaLib::animicaHash() symbol xmrig dlsym's.
- src/xmrig-cuda.h / src/xmrig-cuda.cpp — `XMRIG_EXPORT bool
  animicaHash(...)` exposed at the plugin boundary, gated by
  XMRIG_ALGO_ANIMICA. setJob() for the ANIMICA family is a deliberate
  no-op — the kernel reads job_blob directly per call, no device-side
  scratch carried between turns.
- src/cryptonight.h — forward declaration of Animica::hash() so the
  rest of the plugin compiles without including the .cu header.
- src/crypto/common/Algorithm.h / .cpp — adds ANIMICA_SHA3
  (0x41010000) to the algorithm enum and the parse table, mirroring
  the encoding chosen in ercmine/xmrig.
- CMakeLists.txt + cmake/CUDA.cmake — `WITH_ANIMICA=ON` build flag
  (defaults on), CUDA_ANIMICA_SOURCES list, and the
  XMRIG_ALGO_ANIMICA preprocessor define plumbed alongside KAWPOW /
  RANDOMX.
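
The atomic slot reservation mentioned for the results buffer follows a common pattern, sketched here on the host with `std::atomic` standing in for CUDA's `atomicAdd`. The struct layout, the capacity `kMaxResults`, and the function names are hypothetical; the actual buffer in Animica.cu may differ.

```cpp
#include <atomic>
#include <cstdint>

namespace sketch {

constexpr uint32_t kMaxResults = 4;  // hypothetical capacity of the buffer

struct Results {
    std::atomic<uint32_t> count{0};   // number of reservation attempts so far
    uint32_t nonces[kMaxResults] = {};
};

// Reserve a slot, then write to it. fetch_add hands every caller a distinct
// index, so concurrent winners never write the same entry. Returns false
// when the buffer is already full and the nonce is dropped.
inline bool report(Results& r, uint32_t nonce) {
    const uint32_t slot = r.count.fetch_add(1, std::memory_order_relaxed);
    if (slot >= kMaxResults) return false;
    r.nonces[slot] = nonce;
    return true;
}

// The counter can overshoot the capacity, so readers clamp it.
inline uint32_t stored(const Results& r) {
    const uint32_t c = r.count.load(std::memory_order_relaxed);
    return c < kMaxResults ? c : kMaxResults;
}

}  // namespace sketch
```

Because winning nonces are rare, a small fixed buffer with overflow dropped is usually enough, and the reader only needs the clamped count to know how many entries are valid.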

What the host xmrig sees:
- On plugin dlopen, the optional `uv_dlsym(animicaHash)` in
  CudaLib succeeds (was a no-op against stock xmrig-cuda).
- CudaLib::hasAnimicaSupport() returns true.
- CudaAnimicaRunner::run() dispatches into this kernel and reports
  any winning nonce back to the worker submit path.

Not in this commit (intentional):
- No GPU build has been verified on this host: the dev box has no CUDA
  toolkit installed. Compiling the .cu with nvcc on a CUDA-equipped
  builder is the next manual test, after which the dlopen + Animica
  end-to-end run against pool.animica.org:3333 is the same exercise
  the CPU/OpenCL paths already pass.