Speed up (sv2 native) CPU miner #1901

Merged
plebhash merged 12 commits into stratum-mining:main from Sjors:2025/09/mine-faster on Sep 23, 2025

Sjors (Collaborator) commented Sep 23, 2025

On my M4 MacBook Pro this results in a 10x speedup.

cargo bench --bench hasher_bench -- --quiet
...
mining_device_hasher/baseline_block_hash/full
                        time:   [365.94 ns 366.43 ns 366.92 ns]
mining_device_hasher/fast_midstate/compress256
                        time:   [44.996 ns 45.057 ns 45.119 ns]

This PR is entirely vibe coded and I'm not much of a Rust expert. If the code isn't usable, maybe it can serve as inspiration. Happy to clean things up too.

It's mainly achieved by:

  • using midstate
  • using native SHA256 instructions (provided by the sha2 crate)
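For readers unfamiliar with the midstate trick: an 80-byte block header spans two 64-byte SHA-256 blocks, and the first 64 bytes (version, previous block hash, and the first 28 bytes of the merkle root) stay fixed while the nonce is rolled. Their compression can be computed once and cached, so each nonce costs one compression for the rest of the header plus one for the second SHA-256 pass, instead of three. The sketch below is a minimal, std-only illustration of that idea. It uses a portable SHA-256 compression function rather than the sha2 crate's hardware-accelerated one, and the function names (`compress`, `midstate`, `second_block`, `sha256d_from_midstate`) are hypothetical, not the PR's FastSha256d API.

```rust
// Std-only SHA-256 (FIPS 180-4) to illustrate the midstate idea; not hardware-accelerated.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];
const H0: [u32; 8] = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19];

/// One SHA-256 compression: fold a 64-byte block into `state`.
fn compress(state: &mut [u32; 8], block: &[u8; 64]) {
    let mut w = [0u32; 64];
    for i in 0..16 {
        w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
    }
    for i in 16..64 {
        let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
        let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
    }
    let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = *state;
    for i in 0..64 {
        let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
        let ch = (e & f) ^ (!e & g);
        let t1 = h.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
        let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
        let t2 = s0.wrapping_add((a & b) ^ (a & c) ^ (b & c));
        (h, g, f, e, d, c, b, a) = (g, f, e, d.wrapping_add(t1), c, b, a, t1.wrapping_add(t2));
    }
    for (s, v) in state.iter_mut().zip([a, b, c, d, e, f, g, h]) {
        *s = s.wrapping_add(v);
    }
}

/// Serialize the eight state words big-endian into a 32-byte digest.
fn digest(state: &[u32; 8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, word) in state.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Plain SHA-256 of any message: pad, then compress each 64-byte block.
fn sha256(msg: &[u8]) -> [u8; 32] {
    let mut data = msg.to_vec();
    data.push(0x80);
    while data.len() % 64 != 56 { data.push(0); }
    data.extend_from_slice(&(msg.len() as u64 * 8).to_be_bytes());
    let (mut state, mut block) = (H0, [0u8; 64]);
    for chunk in data.chunks_exact(64) {
        block.copy_from_slice(chunk);
        compress(&mut state, &block);
    }
    digest(&state)
}

/// Midstate: the state after the first 64 header bytes, computed once per job.
fn midstate(header: &[u8; 80]) -> [u32; 8] {
    let (mut state, mut block0) = (H0, [0u8; 64]);
    block0.copy_from_slice(&header[..64]);
    compress(&mut state, &block0);
    state
}

/// Padded second block: header[64..80], 0x80, zeros, then the length (640 bits).
fn second_block(header: &[u8; 80]) -> [u8; 64] {
    let mut block = [0u8; 64];
    block[..16].copy_from_slice(&header[64..80]);
    block[16] = 0x80;
    block[56..].copy_from_slice(&640u64.to_be_bytes());
    block
}

/// Double SHA-256 per nonce from the cached midstate: two compressions, not three.
fn sha256d_from_midstate(mid: &[u32; 8], block1: &[u8; 64]) -> [u8; 32] {
    let mut state = *mid;
    compress(&mut state, block1); // completes the first SHA-256 pass
    let mut block = [0u8; 64];
    block[..32].copy_from_slice(&digest(&state));
    block[32] = 0x80;
    block[56..].copy_from_slice(&256u64.to_be_bytes());
    let mut state2 = H0;
    compress(&mut state2, &block); // second pass: a 32-byte message fits in one block
    digest(&state2)
}

fn main() {
    let mut header = [0x5au8; 80];
    let mid = midstate(&header); // computed once; bytes 0..64 never change below
    for nonce in 0u32..4 {
        header[76..80].copy_from_slice(&nonce.to_le_bytes()); // nonce = header bytes 76..80
        assert_eq!(sha256d_from_midstate(&mid, &second_block(&header)), sha256(&sha256(&header)));
    }
}
```

In a real miner the `compress` call would be replaced by an accelerated implementation (for example the sha2 crate's compression function with native SHA instructions); the midstate bookkeeping stays the same.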

On my machine, when mining at difficulty 1, it's the difference between annoyingly slow and finding a block in about 10 seconds.

Performance is comparable to https://github.com/pooler/cpuminer, even though the latter doesn't use native SHA instructions (this PR took some inspiration from reading that code, but didn't copy it).

I also vibe coded a GPU miner in Metal, which gives another 3x speedup, but adds a lot of complexity. Here's the branch: https://github.com/Sjors/stratum/tree/2025/09/gpu-miner

Sjors force-pushed the 2025/09/mine-faster branch from 078bf19 to b833f48 on September 23, 2025 10:27
Sjors (Collaborator, Author) commented Sep 23, 2025

I added 2da36b2 to first run cargo +1.75.0 update. This reduces churn in the Cargo.lock files in later commits, and probably also makes rebases easier.

Sjors force-pushed the 2025/09/mine-faster branch from b833f48 to de59c0d on September 23, 2025 11:04
Sjors (Collaborator, Author) commented Sep 23, 2025

Tightened the bench dependencies a bit in 05016a9 so the benchmarks should build with Rust 1.75 (avoiding rayon and half).

Sjors force-pushed the 2025/09/mine-faster branch from 97ded11 to 9791c10 on September 23, 2025 11:25
plebhash (Member) commented

whenever I think of CPU miners, I don't really put performance at the top of the priority list... overall, it's not going to be profitable anyway, and I just need something that will have some hashrate while complying with the Sv1 or Sv2 protocol

whether it's 1 or 100 kH/s, I don't think it makes much of a difference, as long as difficulty is set accordingly
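The point that raw hashrate matters little "as long as difficulty is set accordingly" can be made concrete. Under the usual approximation that a difficulty-1 share takes about 2^32 hashes on average, the expected time per share is difficulty × 2^32 / hashrate: roughly 50 days at 1 kH/s and roughly 12 hours at 100 kH/s for a difficulty-1 share. A minimal sketch; the function names are hypothetical and the 2^32 factor is an approximation, not an exact Sv2 target calculation.

```rust
/// Average number of hashes per share, using the common approximation
/// that difficulty 1 corresponds to about 2^32 hashes.
fn expected_hashes(difficulty: f64) -> f64 {
    difficulty * 4_294_967_296.0 // 2^32
}

/// Average seconds between shares for a miner producing `hashrate` hashes per second.
fn expected_seconds_per_share(difficulty: f64, hashrate: f64) -> f64 {
    expected_hashes(difficulty) / hashrate
}

fn main() {
    for (label, hashrate) in [("1 kH/s", 1e3), ("100 kH/s", 1e5)] {
        let secs = expected_seconds_per_share(1.0, hashrate);
        println!("{}: about {:.1} hours per difficulty-1 share", label, secs / 3600.0);
    }
}
```

Scaling difficulty in proportion to hashrate keeps the expected share interval constant: 100x the hashrate at 100x the difficulty gives the same average time per share.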

and when it comes to the Sv1 CPU miner, tbh these days I lean way more towards just using https://github.com/pooler/cpuminer rather than trying to maintain/improve the SRI mining-device crate

minerd is much more battle-proven with regard to Sv1 protocol compliance and overall robustness

for Integration Tests, where it's useful to have Rust APIs that control a Sv1 CPU miner, I have a branch that replaces mining-device with minerd: https://github.com/plebhash/stratum/tree/2025-08-30-minerd-itf

Sjors (Collaborator, Author) commented Sep 23, 2025

For the purpose of giving sv2 demos I would prefer to use the sv2 native miner, so I don't have to bother people (or myself) with the translator role. And having it a bit faster makes that easier. Especially because difficulty 1 is the minimum on custom signets.

Shourya742 (Member) commented

> For the purpose of giving sv2 demos I would prefer to use the sv2 native miner, so I don't have to bother people (or myself) with the translator role. And having it a bit faster makes that easier. Especially because difficulty 1 is the minimum on custom signets.

You can try this: https://github.com/plebhash/sv2-cpu-miner, for sv2 native miner.

Sjors (Collaborator, Author) commented Sep 23, 2025

TIL... @plebhash is that based on this project or completely independent?

plebhash (Member) commented

When I wrote my comment above I had Sv1 CPU miners in mind. For some reason my brain was parsing this PR as a modification to mining-device-sv1, not mining-device.

As a result I'm introducing unnecessary noise into this PR. I apologize.

Yeah, minerd is not a substitute for mining-device, since it doesn't do Sv2.

> TIL... @plebhash is that based on this project or completely independent?

https://github.com/plebhash/sv2-cpu-miner is built on top of https://github.com/plebhash/sv2-services (fka tower-stratum), which also builds on top of SRI low-level crates but aims to be a unified framework for building Sv2 apps

but it's only experimental, and since it was kinda rejected by the SRI community, it's not official and I'm not sure how much time/effort I'll keep allocating to it.

but also, this alternative CPU miner is deliberately designed to de-prioritize performance in favor of other design decisions, so it wouldn't solve your problems anyway.


if this was trying to optimize mining-device-sv1, I guess I'd be more opinionated that the engineering effort was not really worth it.

no hard opinions against trying to speed up mining-device.

Sjors changed the title from "Speed up CPU miner" to "Speed up (sv2 native) CPU miner" on Sep 23, 2025
plebhash (Member) commented

tACK

@Sjors can you rebase against main?

Prevent churn in later commits.
Also adds a benchmark.

- Use sha2 with compress+asm and add a midstate-based double-SHA256 hasher (FastSha256d)
- Integrate optimized hasher into Miner::next_share() to speed up the nonce loop

Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
- FastSha256d: mutate the first-pass second 64-byte block in place
  - Update only time and nonce (block1[4..8], block1[12..16]) before compress256
  - Reduces per-nonce memory churn in the hashing hot path
- Replace Target conversions with word-wise little-endian u32 comparison
  - Avoids allocations/conversions and allows early-out on first inequality

Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
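The two changes in the commit above can be sketched without any hashing. The padded second 64-byte block of the header is built once, and per nonce only the time (bytes 4..8) and nonce (bytes 12..16) fields are overwritten. The share check reads the hash and target as eight little-endian 32-bit words and compares from the most significant word down, returning at the first word that differs. This std-only sketch assumes both values are 32-byte little-endian integers (Bitcoin's internal byte order) and that a hash equal to the target counts as valid; the function names are hypothetical, not the PR's code.

```rust
/// Build the padded second SHA-256 block of an 80-byte header once per job:
/// header[64..80] (merkle root tail, time, bits, nonce), then 0x80, zeros, length.
fn second_block(header: &[u8; 80]) -> [u8; 64] {
    let mut block = [0u8; 64];
    block[..16].copy_from_slice(&header[64..80]);
    block[16] = 0x80; // SHA-256 padding marker
    block[56..].copy_from_slice(&640u64.to_be_bytes()); // 80 bytes = 640 bits
    block
}

/// Per nonce, overwrite only time (block[4..8]) and nonce (block[12..16]) in place.
/// Header fields are serialized little-endian.
fn set_time_and_nonce(block: &mut [u8; 64], time: u32, nonce: u32) {
    block[4..8].copy_from_slice(&time.to_le_bytes());
    block[12..16].copy_from_slice(&nonce.to_le_bytes());
}

/// Little-endian 32-bit word `i` of a 256-bit value (0 = least significant).
fn word(bytes: &[u8; 32], i: usize) -> u32 {
    u32::from_le_bytes([bytes[4 * i], bytes[4 * i + 1], bytes[4 * i + 2], bytes[4 * i + 3]])
}

/// True if `hash <= target` as little-endian 256-bit integers. Compares from the
/// most significant word down and returns at the first word that differs.
fn hash_meets_target(hash: &[u8; 32], target: &[u8; 32]) -> bool {
    for i in (0..8).rev() {
        let (h, t) = (word(hash, i), word(target, i));
        if h != t {
            return h < t;
        }
    }
    true // hash == target
}

fn main() {
    let mut block1 = second_block(&[0u8; 80]);
    for nonce in 0u32..3 {
        set_time_and_nonce(&mut block1, 1_700_000_000, nonce);
        // A miner would now compress `block1` from the cached midstate and
        // run the second SHA-256 pass before checking the target.
    }
    let mut target = [0u8; 32];
    target[28..].copy_from_slice(&0x0000_ffffu32.to_le_bytes());
    println!("{}", hash_meets_target(&[0u8; 32], &target)); // an all-zero hash always passes
    println!("{:02x?}", &block1[..16]);
}
```

Because almost every candidate hash already differs from the target in its most significant word, the comparison usually finishes after a single word, and no big-integer conversion is needed.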
Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
- Print a concise MH/s estimate per batch size in microbatch_bench for familiar throughput units.
- Reduce verbosity:
  - Default batch sizes: 1, 8, 32, 128 (override via MINING_DEVICE_BATCH_SIZES).
  - Criterion timings: sample_size=10, warm_up=100ms, measurement=1s.
- README: note MH/s output, document env override and concise defaults.

Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
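A sketch of the benchmark knobs from this commit. The variable name MINING_DEVICE_BATCH_SIZES and the defaults 1, 8, 32, 128 come from the commit message; the comma-separated format, the fallback to the defaults on invalid input, and the function names are assumptions, not the benchmark's actual code. MH/s is simply hashes completed per second divided by 10^6.

```rust
use std::env;
use std::time::Duration;

/// Defaults named in the commit message.
const DEFAULT_BATCH_SIZES: [usize; 4] = [1, 8, 32, 128];

/// Parse an override such as "1,16,256" (assumed comma-separated). Falls back to
/// the defaults if the value is missing or any entry is not a positive integer.
fn parse_batch_sizes(value: Option<&str>) -> Vec<usize> {
    value
        .and_then(|s| {
            s.split(',')
                .map(|part| part.trim().parse::<usize>().ok().filter(|&n| n > 0))
                .collect::<Option<Vec<usize>>>()
        })
        .unwrap_or_else(|| DEFAULT_BATCH_SIZES.to_vec())
}

/// Throughput in MH/s: hashes completed divided by elapsed seconds, scaled by 1e-6.
fn mhs(hashes: u64, elapsed: Duration) -> f64 {
    hashes as f64 / elapsed.as_secs_f64() / 1e6
}

fn main() {
    let raw = env::var("MINING_DEVICE_BATCH_SIZES").ok();
    println!("batch sizes: {:?}", parse_batch_sizes(raw.as_deref()));
    // Example conversion: 50 million hashes in 2 seconds.
    println!("{:.1} MH/s", mhs(50_000_000, Duration::from_secs(2)));
}
```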
Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
Previously it would measure one core and extrapolate, but not all cores are equal.

Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
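Measuring every core at the same time, rather than timing one core and multiplying by the core count, captures effects such as performance and efficiency cores running at different speeds (as on Apple silicon) or clock changes under full load. The std-only sketch below assumes that approach; `dummy_attempt` is a hypothetical stand-in for the real hasher, and `measure_all_cores` is not the PR's function.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one hash attempt; a real miner would hash a header here.
fn dummy_attempt(x: u64) -> u64 {
    x.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(17)
}

/// Run attempts on every available core concurrently for `duration` and
/// return the combined attempts per second.
fn measure_all_cores(duration: Duration) -> f64 {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let stop = Arc::new(AtomicBool::new(false));
    let total = Arc::new(AtomicU64::new(0));
    let start = Instant::now();
    let workers: Vec<_> = (0..cores)
        .map(|id| {
            let (stop, total) = (Arc::clone(&stop), Arc::clone(&total));
            thread::spawn(move || {
                let (mut x, mut count) = (id as u64, 0u64);
                loop {
                    // Work in batches so the shared stop flag is checked rarely.
                    for _ in 0..1024 {
                        x = black_box(dummy_attempt(x));
                    }
                    count += 1024;
                    if stop.load(Ordering::Relaxed) {
                        break;
                    }
                }
                total.fetch_add(count, Ordering::Relaxed);
            })
        })
        .collect();
    thread::sleep(duration);
    stop.store(true, Ordering::Relaxed);
    for worker in workers {
        worker.join().expect("worker thread panicked");
    }
    total.load(Ordering::Relaxed) as f64 / start.elapsed().as_secs_f64()
}

fn main() {
    let rate = measure_all_cores(Duration::from_millis(500));
    println!("{:.1} M attempts/s across all cores", rate / 1e6);
}
```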
Sjors force-pushed the 2025/09/mine-faster branch from 9791c10 to 3c21721 on September 23, 2025 16:53
Sjors (Collaborator, Author) commented Sep 23, 2025

Rebased

plebhash merged commit 9affb14 into stratum-mining:main on Sep 23, 2025
11 checks passed
Sjors deleted the 2025/09/mine-faster branch on September 23, 2025 17:32
Sjors (Collaborator, Author) commented Sep 23, 2025

Feeling emboldened :-)


3 participants