Speed up (sv2 native) CPU miner #1901
078bf19 to b833f48
|
I added 2da36b2 to first do a |
b833f48 to de59c0d
|
Tightened the bench dependencies a bit in 05016a9 so it should work with Rust 1.75 (avoiding |
97ded11 to 9791c10
|
Whenever I think of CPU miners, I don't really put performance at the top of the priority list. Overall, it's not going to be profitable anyway, and I just need something that will have some hashrate while complying with the Sv1 or Sv2 protocol. Whether it's 1 or 100 kH/s, I don't think it makes much of a difference, as long as difficulty is set accordingly. And when it comes to the Sv1 CPU miner, tbh these days I lean way more towards just using https://github.com/pooler/cpuminer rather than trying to maintain/improve SRI
for Integration Tests, where it's useful to have Rust APIs that control a Sv1 CPU miner, I have a branch that replaces |
|
For the purpose of giving sv2 demos I would prefer to use the sv2 native miner, so I don't have to bother people (or myself) with the translator role. And having it a bit faster makes that easier. Especially because difficulty 1 is the minimum on custom signets. |
You can try this for an sv2 native miner: https://github.com/plebhash/sv2-cpu-miner |
|
TIL... @plebhash is that based on this project or completely independent? |
|
When I wrote my comment above I had Sv1 CPU miners in mind. For some reason my brain was parsing this PR as a modification to As a result I'm introducing unnecessary noise into this PR. I apologize. Yeah,
https://github.com/plebhash/sv2-cpu-miner is built on top of https://github.com/plebhash/sv2-services (fka but it's only experimental, and since it was kinda rejected by the SRI community, it's not official and I'm not sure how much time/effort I'll continue allocating to it. Also, this alternative CPU miner is designed such that performance is deliberately de-prioritized as a trade-off with other design decisions, so it would not solve your problems anyway. if this was trying to optimize no hard opinions against trying to speed up |
|
tACK @Sjors can you rebase against |
Prevent churn in later commits.
Also adds a benchmark.

- Use sha2 with compress+asm and add a midstate-based double-SHA256 hasher (FastSha256d)
- Integrate optimized hasher into Miner::next_share() to speed up the nonce loop

Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
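To illustrate the midstate idea named in the commit message above, here is a minimal, dependency-free sketch. It is not the PR's `FastSha256d`: the type `MidstateSha256d`, the method `hash_with_nonce`, and the hand-written `compress` function are hypothetical stand-ins (the commit itself says it relies on the `sha2` crate's compression routine). The point it shows is that the first 64 bytes of an 80-byte block header never change while the nonce is scanned, so their SHA-256 state can be computed once and reused.

```rust
// SHA-256 round constants and initial state (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];
const IV: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];

/// One SHA-256 compression step: folds a 64-byte block into `state`.
fn compress(state: &mut [u32; 8], block: &[u8; 64]) {
    let mut w = [0u32; 64];
    for i in 0..16 {
        w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
    }
    for i in 16..64 {
        let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
        let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
    }
    let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = *state;
    for i in 0..64 {
        let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
        let ch = (e & f) ^ (!e & g);
        let t1 = h.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
        let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
        let maj = (a & b) ^ (a & c) ^ (b & c);
        let t2 = s0.wrapping_add(maj);
        h = g; g = f; f = e; e = d.wrapping_add(t1);
        d = c; c = b; b = a; a = t1.wrapping_add(t2);
    }
    for (s, v) in state.iter_mut().zip([a, b, c, d, e, f, g, h]) {
        *s = s.wrapping_add(v);
    }
}

/// Hypothetical midstate hasher for 80-byte block headers.
struct MidstateSha256d {
    midstate: [u32; 8], // SHA-256 state after header bytes 0..64
    block1: [u8; 64],   // header bytes 64..80 followed by fixed padding
}

impl MidstateSha256d {
    fn new(header: &[u8; 80]) -> Self {
        let mut midstate = IV;
        let mut block0 = [0u8; 64];
        block0.copy_from_slice(&header[..64]);
        compress(&mut midstate, &block0); // done once per job
        let mut block1 = [0u8; 64];
        block1[..16].copy_from_slice(&header[64..]);
        block1[16] = 0x80; // padding marker
        block1[56..].copy_from_slice(&(80u64 * 8).to_be_bytes()); // message length in bits
        Self { midstate, block1 }
    }

    /// Double SHA-256 of the header with `nonce` written at header bytes 76..80.
    fn hash_with_nonce(&mut self, nonce: u32) -> [u8; 32] {
        self.block1[12..16].copy_from_slice(&nonce.to_le_bytes());
        let mut state = self.midstate; // reuse the cached first-block state
        compress(&mut state, &self.block1);
        // Second pass: the 32-byte digest plus padding fits in a single block.
        let mut block = [0u8; 64];
        for (i, word) in state.iter().enumerate() {
            block[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
        }
        block[32] = 0x80;
        block[56..].copy_from_slice(&(32u64 * 8).to_be_bytes());
        let mut state = IV;
        compress(&mut state, &block);
        let mut out = [0u8; 32];
        for (i, word) in state.iter().enumerate() {
            out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
        }
        out
    }
}
```

With this layout each nonce costs two compression calls instead of three, and the only per-nonce write to the header is the four nonce bytes. Calling `MidstateSha256d::new(&header)` once and then `hash_with_nonce(n)` in a loop is the intended usage in this sketch.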
- FastSha256d: mutate the first-pass second 64-byte block in place
- Update only time and nonce (block1[4..8], block1[12..16]) before compress256
- Reduces per-nonce memory churn in the hashing hot path
- Replace Target conversions with word-wise little-endian u32 comparison
- Avoids allocations/conversions and allows early-out on first inequality

Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
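The word-wise target check described in this commit message could look roughly like the sketch below. The function name `meets_target` is hypothetical, and two things are assumed rather than taken from the PR: both the hash and the target are 32-byte arrays holding a 256-bit integer in little-endian byte order, and a hash equal to the target counts as valid.

```rust
/// Returns true when `hash` <= `target`, reading both as 256-bit
/// little-endian integers. Words are compared from the most significant
/// (bytes 28..32) downwards, so most hashes are rejected after one compare.
fn meets_target(hash: &[u8; 32], target: &[u8; 32]) -> bool {
    for i in (0..8).rev() {
        let h = u32::from_le_bytes([hash[4 * i], hash[4 * i + 1], hash[4 * i + 2], hash[4 * i + 3]]);
        let t = u32::from_le_bytes([target[4 * i], target[4 * i + 1], target[4 * i + 2], target[4 * i + 3]]);
        if h != t {
            return h < t; // early-out on the first word that differs
        }
    }
    true // all eight words equal
}
```

Because no big-integer type is constructed, the check allocates nothing and usually returns after reading a single word of each operand.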
- Print a concise MH/s estimate per batch size in microbatch_bench for familiar throughput units.
- Reduce verbosity:
  - Default batch sizes: 1, 8, 32, 128 (override via MINING_DEVICE_BATCH_SIZES).
  - Criterion timings: sample_size=10, warm_up=100ms, measurement=1s.
- README: note MH/s output, document env override and concise defaults.

Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
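As a rough sketch of the two pieces this commit message mentions, the snippet below parses a batch-size override and converts a hash count into MH/s. The environment variable name and the defaults come from the commit message; the comma-separated format, the fallback behaviour on bad input, and all function names are assumptions made for illustration, not the benchmark's actual code.

```rust
use std::time::Duration;

/// Defaults listed in the commit message.
const DEFAULT_BATCH_SIZES: [usize; 4] = [1, 8, 32, 128];

/// Parses a comma-separated list such as "4, 16" (assumed format).
/// Falls back to the defaults when nothing usable is supplied.
fn parse_batch_sizes(raw: Option<&str>) -> Vec<usize> {
    let parsed: Vec<usize> = raw
        .unwrap_or("")
        .split(',')
        .filter_map(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .collect();
    if parsed.is_empty() {
        DEFAULT_BATCH_SIZES.to_vec()
    } else {
        parsed
    }
}

/// Reads the override from the environment variable named in the commit.
fn batch_sizes_from_env() -> Vec<usize> {
    parse_batch_sizes(std::env::var("MINING_DEVICE_BATCH_SIZES").ok().as_deref())
}

/// Converts a hash count and elapsed time into megahashes per second.
fn mega_hashes_per_sec(hashes: u64, elapsed: Duration) -> f64 {
    hashes as f64 / elapsed.as_secs_f64() / 1e6
}
```

Keeping the parser separate from the environment lookup makes it easy to test without touching process state.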
Previously it would measure one core and extrapolate, but not all cores are equal.

Assisted-by: GitHub Copilot
Assisted-by: OpenAI GPT-5
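One way to measure every core instead of extrapolating from one is to run the workload on all threads at the same time and sum the counts, as in the sketch below. The function names and the closure-based `work` parameter are hypothetical; the miner's real measurement code may be structured quite differently.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `work` on `threads` threads concurrently for roughly `window`
/// and returns the summed number of calls per second. Slow and fast
/// cores each contribute what they actually achieved.
fn measure_total_rate<F>(threads: usize, window: Duration, work: F) -> f64
where
    F: Fn(u32) -> u32 + Sync,
{
    let total = AtomicU64::new(0);
    let start = Instant::now();
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                let mut done = 0u64;
                let mut nonce = 0u32;
                while start.elapsed() < window {
                    // Check the clock only once per small batch.
                    for _ in 0..1024 {
                        black_box(work(nonce));
                        nonce = nonce.wrapping_add(1);
                    }
                    done += 1024;
                }
                total.fetch_add(done, Ordering::Relaxed);
            });
        }
    });
    total.load(Ordering::Relaxed) as f64 / start.elapsed().as_secs_f64()
}

/// Number of threads to use when the caller has no preference.
fn default_thread_count() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

A call such as `measure_total_rate(default_thread_count(), Duration::from_secs(1), |n| n.wrapping_mul(31))` would report the combined rate for a stand-in workload; in a real miner the closure would hash a header with the given nonce.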
… pest_generator, pest_meta)
9791c10 to 3c21721
|
Rebased |
|
Feeling emboldened :-) |
On my M4 MacBook Pro this results in a 10x speedup.
    cargo bench --bench hasher_bench -- --quiet
    ...
    mining_device_hasher/baseline_block_hash/full time: [365.94 ns 366.43 ns 366.92 ns]
    mining_device_hasher/fast_midstate/compress256 time: [44.996 ns 45.057 ns 45.119 ns]

This PR is entirely vibe coded and I'm not much of a Rust expert. Maybe if the code isn't usable it could serve as inspiration. Happy to clean things up too.
It's mainly achieved by:
On my machine, when mining at difficulty 1 it's the difference between annoyingly slow and finding a block in about 10 seconds.
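For a feel of the numbers, the expected time to find a block can be estimated from the hashrate. The sketch below uses the common approximation that difficulty 1 takes about 2^32 hashes on average; the function names are hypothetical, and treating a benchmark's nanoseconds-per-iteration figure as the full per-hash cost of one thread is an assumption.

```rust
/// Average hashes needed per unit of difficulty (approximation: 2^32).
const HASHES_PER_DIFFICULTY: f64 = 4_294_967_296.0;

/// Converts a per-hash cost in nanoseconds into hashes per second.
fn hashrate_from_ns_per_hash(ns_per_hash: f64) -> f64 {
    1e9 / ns_per_hash
}

/// Expected seconds to find a block at `difficulty` with the given hashrate.
fn expected_seconds(difficulty: f64, hashes_per_sec: f64) -> f64 {
    difficulty * HASHES_PER_DIFFICULTY / hashes_per_sec
}
```

Under these assumptions, 45.057 ns per hash is roughly 22 MH/s on one thread, which works out to a little over three minutes per block at difficulty 1. Running on many threads divides that time further, provided scaling across cores is close to ideal.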
Performance is comparable to https://github.com/pooler/cpuminer despite the latter not using native SHA instructions (I took some inspiration from reading that code, but didn't copy it).
I also vibe coded a GPU miner in Metal, which adds another 3x speed increase, but with a lot of extra complexity. Here's the branch: https://github.com/Sjors/stratum/tree/2025/09/gpu-miner