A GPU-accelerated molecular dynamics engine in Rust + CUDA, designed for bit-wise reproducibility: identical inputs produce byte-identical trajectory and log files across runs on the same GPU.
- Lennard-Jones pair forces (O(N²) kernel) with the minimum-image convention for periodic boundary conditions.
- Velocity Verlet integration in either an ordinary f32 mode (lossy) or a compensated (f32, f64) mode (lossless) that supports bit-exact time reversal.
- Single-stream CUDA execution and a deterministic segmented reduction so that floating-point sums are performed in the same order on every run.
- Extended-XYZ trajectory output, CSV diagnostic log (step, time, KE, T), and a per-stage performance summary measured with CUDA events plus host wall-clocks.
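As an illustration of the first feature, the following is a minimal host-side sketch of a Lennard-Jones pair force under the minimum-image convention in an orthorhombic box. It is not the engine's CUDA kernel: the function names `min_image` and `lj_force` are hypothetical, the arithmetic is in f64 for readability, and no cutoff is applied.

```rust
/// Shift one displacement component to its nearest periodic image.
fn min_image(d: f64, box_len: f64) -> f64 {
    d - box_len * (d / box_len).round()
}

/// Lennard-Jones force on particle i due to particle j in an orthorhombic
/// periodic box. Hypothetical helper for illustration only; the particles
/// must not coincide (r = 0 would divide by zero).
fn lj_force(
    ri: [f64; 3],
    rj: [f64; 3],
    box_len: [f64; 3],
    epsilon: f64,
    sigma: f64,
) -> [f64; 3] {
    // Minimum-image displacement r_ij = r_i - r_j.
    let mut d = [0.0; 3];
    for k in 0..3 {
        d[k] = min_image(ri[k] - rj[k], box_len[k]);
    }
    let r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
    let sr2 = sigma * sigma / r2;
    let sr6 = sr2 * sr2 * sr2;
    // F_i = 24 eps (2 (sigma/r)^12 - (sigma/r)^6) / r^2 * r_ij
    let scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2;
    [scale * d[0], scale * d[1], scale * d[2]]
}
```

For two particles at x = -4.5 and x = 4.5 in a box of length 10, the minimum-image separation is 1 rather than 9, so with sigma = 1 the pair sits on the repulsive wall even though the particles are on opposite sides of the primary cell.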
See docs/architecture.md for the data flow,
reproducibility strategy, and per-kernel design. Every behaviour the engine
ships with is canonically described under rqm/; the source tree
references those entities by stable IDs (rq-XXXXXXXX).
- An NVIDIA GPU with a recent driver.
- CUDA Toolkit 11.8 or newer on PATH so nvcc can compile the device kernels at build time.
- Rust (the project uses Cargo edition 2024; install via rustup).
cargo build --release
The build script invokes nvcc for each .cu file under kernels/,
embeds the resulting PTX, and produces the heddlemd binary at
target/release/heddlemd.
A complete 10,000-atom Lennard-Jones argon example lives at
examples/lj-10000-argon/. It runs 100
integration timesteps in roughly a second on a recent NVIDIA GPU.
From the project root:
./target/release/heddlemd run examples/lj-10000-argon/argon.in.toml
(Or cargo run --release -- run examples/lj-10000-argon/argon.in.toml.)
A run produces three files alongside the config:
- argon.out.xyz: 11 trajectory frames (steps 0, 10, …, 100) in extended-XYZ format. Each frame is self-describing (lattice vectors, column layout, simulation time). The trajectory frames can be reloaded as an init file.
- argon.out.log: CSV with step, time, kinetic_energy, temperature; one header line plus 21 data rows.
- argon.out.timings: a fixed-width text table with one row per instrumented stage: per-kernel timings (CUDA events) and host stages (config_load, init_load, gpu_init, host_to_device_upload, device_to_host_download, trajectory_write, log_write, velocity_generation, total_runtime). Columns: count, total_ms, mean_us, min_us, max_us.
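The kinetic_energy and temperature columns of the log are related by the equipartition theorem. The sketch below shows that relation in SI units; the function names are hypothetical, and the number of degrees of freedom is left as a parameter because this README does not state whether the engine counts 3N or 3N - 3 (after drift removal).

```rust
/// Boltzmann constant in J/K (exact in the 2019 SI).
const K_B: f64 = 1.380649e-23;

/// Total kinetic energy in joules: sum over particles of (1/2) m |v|^2.
fn kinetic_energy(masses: &[f64], velocities: &[[f64; 3]]) -> f64 {
    masses
        .iter()
        .zip(velocities)
        .map(|(&m, v)| 0.5 * m * (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]))
        .sum()
}

/// Instantaneous temperature in kelvin from equipartition:
/// KE = (n_dof / 2) k_B T, so T = 2 KE / (n_dof k_B).
/// `n_dof` is supplied by the caller (an assumption of this sketch).
fn temperature(kinetic_energy: f64, n_dof: usize) -> f64 {
    2.0 * kinetic_energy / (n_dof as f64 * K_B)
}
```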
By convention, config filenames end in .in.toml and the loader
derives the default output paths from the filename root and each
phase's name (argon.in.toml with phase name = "run" →
argon.out.run.{xyz,log,timings}). The runner rejects a config path
that does not match the suffix. The example's
README.md describes the lattice
layout and how to regenerate argon.in.xyz.
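The naming convention can be sketched as follows. This is an illustration of the rule as described, not the loader's actual code: `default_output_path` is a hypothetical name, and returning `None` stands in for the runner rejecting a path without the .in.toml suffix.

```rust
use std::path::{Path, PathBuf};

/// Derive a default output path from a config path, a phase name, and an
/// output extension ("xyz", "log", or "timings").
/// Returns None when the file name does not end in ".in.toml".
fn default_output_path(config: &Path, phase: &str, ext: &str) -> Option<PathBuf> {
    let name = config.file_name()?.to_str()?;
    // "argon.in.toml" -> root "argon"
    let root = name.strip_suffix(".in.toml")?;
    // Outputs are placed alongside the config file.
    Some(config.with_file_name(format!("{root}.out.{phase}.{ext}")))
}
```

Under these assumptions, examples/lj-10000-argon/argon.in.toml with phase name "run" maps to examples/lj-10000-argon/argon.out.run.xyz, while a path such as argon.toml yields `None`.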
A simulation is fully specified by two files:
- A TOML config that pins everything affecting the trajectory:
RNG seed, target temperature, particle-type masses, per-pair
Lennard-Jones coefficients, and one or more [[phase]] blocks, each
carrying its own n_steps, dt, integrator mode, and optional
thermostat/barostat/output. SI units throughout (metres, kilograms,
seconds, joules, kelvin). Per-phase output paths and cadences live in
the optional [phase.output] sub-table; see rqm/io/config-schema.md
for the full field reference.
- An extended-XYZ init file carrying the particle count, simulation
box (orthorhombic Lattice="lx 0 0 0 ly 0 0 0 lz"), per-particle type
names, positions, and optionally velocities. Positions must lie inside
the primary cell [-L/2, L/2) per axis. Velocities are optional; absent
velocities are sampled from a Maxwell-Boltzmann distribution at the
configured temperature using a deterministic ChaCha8 RNG seeded by the
config seed, with the centre-of-mass drift removed. See
rqm/io/init-state-file.md.
The runner currently accepts one particle type per simulation; the schema is forward-compatible with multi-type runs once the kernel supports them.
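The two geometric constraints on the init file (an orthorhombic lattice and positions inside the primary cell) can be checked along these lines. This is a sketch under stated assumptions, not the engine's parser: both function names are hypothetical, the input is the text between the quotes of the Lattice attribute, and the requirement that box lengths be positive is an assumption of this example.

```rust
/// Parse the nine numbers of Lattice="lx 0 0 0 ly 0 0 0 lz" and return
/// [lx, ly, lz]. Returns None if the count is not nine, a token is not a
/// number, an off-diagonal entry is non-zero, or a length is not positive.
fn parse_orthorhombic_lattice(s: &str) -> Option<[f64; 3]> {
    let v = s
        .split_whitespace()
        .map(|t| t.parse::<f64>().ok())
        .collect::<Option<Vec<f64>>>()?;
    if v.len() != 9 {
        return None;
    }
    for (i, &x) in v.iter().enumerate() {
        // Indices 0, 4, 8 are the diagonal of the row-major 3x3 matrix.
        if i % 4 != 0 && x != 0.0 {
            return None;
        }
    }
    let lengths = [v[0], v[4], v[8]];
    if lengths.iter().all(|&l| l > 0.0) {
        Some(lengths)
    } else {
        None
    }
}

/// True when every coordinate lies in the half-open interval [-L/2, L/2).
fn in_primary_cell(pos: [f64; 3], box_len: [f64; 3]) -> bool {
    pos.iter()
        .zip(box_len.iter())
        .all(|(&x, &l)| x >= -0.5 * l && x < 0.5 * l)
}
```

The interval is half-open, so a coordinate of exactly -L/2 is accepted and exactly +L/2 is not.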
heddlemd lint <config> runs every input-validation check the runner
would perform — TOML parse, init-file load, topology load,
output-path collisions, box-vs-cutoff geometry — without touching the
GPU or writing any files. It is designed for HPC contexts where a long
queue makes trial-and-error iteration expensive: lint on a
login node and fix the reported problems up front. Add --with-gpu to extend
the lint through init_device, slot construction, and force-field
allocation when a GPU is available. See the
CLI Reference chapter for the full
specification.
The <root>.out.xyz trajectory and the <root>.out.log log are
byte-identical across two runs of the same config on the same GPU.
The <root>.out.timings file is intentionally not reproducible:
wall-clock measurements vary run-to-run and would corrupt the
comparison if mixed with the deterministic outputs.
Cross-hardware reproducibility is not a goal; CUDA permits FMA
contraction differences between GPUs.
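Byte-identical output depends on performing floating-point sums in a fixed order, because floating-point addition is not associative. The host-side sketch below illustrates the idea with a segmented sum in f32; it is not the engine's CUDA reduction, and `segmented_sum` and `segment_len` are hypothetical names.

```rust
/// Sum `values` in a fixed order: each segment of `segment_len` elements is
/// summed left to right, then the per-segment partials are summed left to
/// right. The same input and segment length always give the same bits.
fn segmented_sum(values: &[f32], segment_len: usize) -> f32 {
    assert!(segment_len > 0, "segment_len must be positive");
    let partials: Vec<f32> = values
        .chunks(segment_len)
        .map(|seg| seg.iter().fold(0.0f32, |acc, &x| acc + x))
        .collect();
    partials.iter().fold(0.0f32, |acc, &p| acc + p)
}
```

For the input [1.0e8, 1.0, -1.0e8, 1.0], a segment length of 2 gives 0.0 and a segment length of 4 gives 1.0, since 1.0 is below the f32 spacing near 1.0e8. A reduction whose grouping varied from run to run could therefore change the result, which is why the grouping has to be pinned.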
src/ Rust host code: I/O, runner, GPU buffer wrappers
kernels/ CUDA C source for the device kernels (compiled to PTX)
docs/architecture.md System design and data flow
rqm/ Canonical requirements, by feature
examples/ Ready-to-run input bundles
tests/ Integration tests (one per requirements file)
This repository follows a requirements-driven workflow: every
feature has a canonical description under rqm/ with Gherkin scenarios,
and every type, function, and test in src/ and tests/ carries the
stable rq-XXXXXXXX ID of the requirement it implements. The traceability
registry at rqm/registry.json is rebuilt by
./.claude/skills/plan-feature/rqm.sh index.
Two skills assist this loop:
- /plan-feature drafts or extends a requirements file, asks clarifying questions, and stamps stable IDs on every heading, API item, and scenario.
- /implement writes the code and tests for an existing requirements file: one test per Gherkin scenario, annotated with the scenario's rq- ID.
When iterating on a feature, edit the requirements file first, then ask
the assistant to update the implementation. This keeps rqm/ as the
source of truth: if src/ were deleted, the requirements files would
be enough to reproduce the engine.
LLMs are susceptible to prompt injection and data poisoning. When using this repo with an agentic assistant:
- Run the assistant inside a sandboxed container (the included Podman setup blocks the assistant from running outside one).
- Never expose private SSH keys, credentials, or write access to remote repositories.
- Review every generated change before pushing.