feat: cuda refinement #876

Merged
JesseTheRobot merged 10 commits into master from feat/cuda-refinement on Nov 24, 2025

@JesseTheRobot JesseTheRobot commented Oct 16, 2025

Member

Describe the changes
This PR:

  • Fixes a flaw in the CUDA C++ that caused batches of more than 8192 chunks to segfault
  • Fixes another flaw in the CUDA C++ that corrupted data when chunks were overprovisioned (more chunks requested for packing than blocks * threads per block)
  • Adds logic to control the blocks and threads per block of the packing kernel
  • Adds logic to automatically best-guess the blocks and threads per block
    NB: testing has shown that we get the best performance by matching the underlying hardware: blocks = SM count, and threads per block = physical CUDA core count per SM. There is likely room for improvement here with more in-depth testing (using Nsight etc.).
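
For reviewers, here is a rough CPU-side model of the two ideas above: the best-guess launch configuration and a stride loop that keeps overprovisioned batches correct. This is plain C++, not the kernel from this PR. `LaunchConfig`, `guess_launch_config` and `count_visits` are hypothetical names made up for illustration, and the SM count and cores-per-SM values are assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical launch configuration; names are illustrative, not from the PR.
struct LaunchConfig {
    unsigned blocks;            // grid size
    unsigned threads_per_block; // block size
};

// Best-guess heuristic described above: one block per SM and one thread per
// physical CUDA core in that SM. Both inputs are assumed to come from the
// caller (for example a device query plus a per-architecture core table).
LaunchConfig guess_launch_config(unsigned sm_count, unsigned cores_per_sm) {
    assert(sm_count > 0 && cores_per_sm > 0);
    return LaunchConfig{sm_count, cores_per_sm};
}

// CPU model of a grid-stride loop. Each simulated thread starts at its global
// index and advances by the total thread count, so every chunk is visited
// exactly once even when num_chunks > blocks * threads_per_block.
std::vector<unsigned> count_visits(LaunchConfig cfg, std::size_t num_chunks) {
    std::vector<unsigned> visits(num_chunks, 0);
    const std::size_t stride =
        static_cast<std::size_t>(cfg.blocks) * cfg.threads_per_block;
    for (unsigned block = 0; block < cfg.blocks; ++block) {
        for (unsigned thread = 0; thread < cfg.threads_per_block; ++thread) {
            // Equivalent of blockIdx.x * blockDim.x + threadIdx.x in a kernel.
            const std::size_t first =
                static_cast<std::size_t>(block) * cfg.threads_per_block + thread;
            for (std::size_t i = first; i < num_chunks; i += stride) {
                ++visits[i]; // a real kernel would pack chunk i here
            }
        }
    }
    return visits;
}
```

For example, `count_visits(guess_launch_config(4, 128), 10000)` models a grid of 512 threads handling 10,000 chunks: each thread runs at most 20 stride iterations, and no chunk is skipped or handled twice.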

Checklist

  • [] Tests have been added/updated for the changes.
  • Documentation has been updated for the changes (if applicable).
  • The code follows Rust's style guidelines.

@JesseTheRobot JesseTheRobot marked this pull request as ready for review November 23, 2025 17:30
@JesseTheRobot JesseTheRobot changed the title from "wip: cuda refinement" to "feat: cuda refinement" Nov 23, 2025
@JesseTheRobot JesseTheRobot merged commit 51ce8b6 into master Nov 24, 2025
17 checks passed
@JesseTheRobot JesseTheRobot deleted the feat/cuda-refinement branch November 24, 2025 13:02
JesseTheRobot added a commit that referenced this pull request Nov 27, 2025
* wip

* wip

* fix: use stride-loop

* feat: handle GPU errors

* feat: improve kernel to allow for >8192 chunks

* chore: add packing benchmark

* feat: add code for basic automatic block and blocks per thread

* chore: disable spellchecking

* chore: fmt

* chore: actually fix the spellchecker