Tags · NVIDIA/warp

v1.14.0

Highlights:
- Extend CPU APIC graph capture serialization to replay backward
  launches, tiled kernels, richer launch arguments, and structs or
  indexed arrays carrying Warp array buffers
- Add multi-environment warp.fem geometries with environment-aware
  lookup and environment-first partitions for batched solves
- Add reusable and batched warp.optim.linear solvers with preallocated
  solver state and batch_offsets support
- Add pluggable Python logging through wp.set_logger(),
  wp.ScopedLogger, and wp.config.log_level
- Relax CPU/GPU array launch validation for HMM and ATS systems with
  wp.can_access() and LaunchArrayAccessMode controls
- Promote JAX integration to stable top-level APIs and deprecate
  warp.jax_experimental
- Add portable tile FFT and solver fallbacks for CPU and libmathdx-free
  GPU builds, plus wp.tile_empty()
- Fix math and autodiff correctness for NaN min/max/clamp/atomics,
  composite-component writes, curlnoise gradients, and large tile
  offsets
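
The NaN item above concerns how min/max-style operations treat NaN inputs. As a minimal sketch in plain Python (not Warp code, and not a statement of which convention Warp adopted), the following shows why NaN handling needs an explicit rule: a comparison-based `min` gives an order-dependent answer. The helper `nan_propagating_min` is a hypothetical name used only for illustration.

```python
import math

nan = float("nan")

# Python's built-in min() keeps its first argument unless a later one
# compares strictly smaller, and every ordering comparison with NaN is False.
first = min(nan, 1.0)   # NaN is kept because (1.0 < nan) is False
second = min(1.0, nan)  # 1.0 is kept because (nan < 1.0) is False


def nan_propagating_min(a: float, b: float) -> float:
    """One possible convention: return NaN if either input is NaN."""
    if math.isnan(a) or math.isnan(b):
        return nan
    return min(a, b)
```

With a comparison-only implementation the result depends on argument order, which is the kind of inconsistency a correctness fix in this area would need to rule out.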

See the full changelog for more details:
https://github.com/NVIDIA/warp/releases/tag/v1.14.0

May 31, 2026
b943176

v1.13.0

Highlights:
- Add experimental graph capture serialization (wp.capture_save/wp.capture_load) with portable .wrp format and standalone C++ replay on both GPU and CPU
- Add wp.bfloat16 scalar type with array allocation, kernel execution, autodiff, DLPack, PyTorch, and JAX interop
- Add pluggable CUDA allocator interface (wp.set_cuda_allocator) with built-in RAPIDS Memory Manager (RMM) integration
- Add scoped memory tracking with C++-layer call-site attribution via wp.ScopedMemoryTracker
- Add experimental cuBQL BVH backend for wp.Mesh ray queries on dense meshes
- Add new tile primitives: wp.tile_dot, wp.tile_axpy, wp.tile_stack family, wp.tile_scatter_add/masked, wp.tile_query_valid
- Add double-precision (wp.float64) support to warp.fem
- Remove Python 3.9 support (Python 3.10 is now the minimum)
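
For readers unfamiliar with the bfloat16 format mentioned above: it keeps the sign and 8 exponent bits of a 32-bit float but only 7 mantissa bits, so it is the upper 16 bits of a float32. The sketch below emulates that rounding in plain Python with `struct`. It is illustrative only: it does not use Warp, the function names are hypothetical, and NaN inputs are not handled.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def round_to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    Assumes x is finite and within float32 range; NaN is not handled.
    """
    bits = float32_bits(x)
    # Add a bias so that truncating the low 16 bits rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    upper = ((bits + rounding_bias) >> 16) & 0xFFFF
    # Re-expand to float32 by zero-filling the discarded mantissa bits.
    return struct.unpack("<f", struct.pack("<I", upper << 16))[0]
```

For example, `round_to_bfloat16(3.14159)` yields `3.140625`, which shows how coarse the 7-bit mantissa is compared with float32.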

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.13.0

May 4, 2026
130a55e

v1.12.1

Highlights:

- Fix kernel dispatch using incorrect block_dim across devices, causing
  crashes or memory corruption in tile kernels
- Fix silent precision loss in compile-time constants passed to 64-bit
  scalar constructors (wp.float64(), wp.int64(), wp.uint64())
- Fix wp.HashGrid neighbor queries missing results for negative coordinates
- Fix augmented assignments with subscript/attribute targets double-evaluating
  the target expression (e.g., s.field += expr, arr[i] *= expr)
- Fix wp.tile_matmul() and wp.tile_fft() ignoring module-level enable_backward
- Fix @wp.func with tile parameters failing to compile with shared-memory tiles
- Fix struct field assignments converting Warp scalar types to plain Python types
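
The augmented-assignment item above can be illustrated in plain Python. This sketch does not reproduce Warp's code generator. It only shows the general hazard: lowering `target op= expr` to `target = target op expr` evaluates the target expression twice, which is observable when that expression has side effects. The names `next_index`, `calls`, and `data` are hypothetical.

```python
calls = []


def next_index() -> int:
    """Index expression with a visible side effect."""
    calls.append("next_index")
    return 0


data = [10]

# Naive lowering: the subscript expression appears twice, so it runs twice.
data[next_index()] = data[next_index()] + 1
naive_calls = len(calls)

calls.clear()

# A true augmented assignment evaluates the subscript expression once.
data[next_index()] += 1
correct_calls = len(calls)
```

Here `naive_calls` is 2 and `correct_calls` is 1, while `data` ends up as `[12]` in both styles because the index is constant. With a non-idempotent index expression the two forms would diverge.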

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.12.1

Apr 6, 2026
fbceb2a

v1.12.0

Highlights:
- Add experimental hardware-accelerated texture sampling on CUDA GPUs with wp.Texture1D/2D/3D and wp.texture_sample()
- Add subscript-style type hints (e.g., wp.array[float]) for better Pyright/Pylance compatibility
- Add tile arithmetic operators (*, /) with broadcast, differentiable FFT, and wp.tile_from_thread()
- Add jax.vmap() support for Warp kernels and callables via jax_kernel() and jax_callable()
- Add quaternion/spatial helpers, approximate math intrinsics, and wp.print_diagnostics()
- Add B-spline shape functions to warp.fem
- Allow NVRTC compilation without a CUDA driver for ahead-of-time compilation in Docker builds

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.12.0

Mar 6, 2026
e6c3ba2

v1.11.1

Highlights:

- Fix wp.tile_matmul() sometimes producing NaN results when using the
  `c = wp.tile_matmul(a, b)` form due to reading uninitialized output memory
- Fix wp.static() incorrectly resolving loop variables to same-named global
  Python variables when used for static loop unrolling in kernels
- Fix segfault in conditional expressions (ternary if/else) when one branch
  accesses an array element and the other branch is taken
- Fix CUDA graphs with multiple temporary allocations using more memory than
  necessary due to improper sequencing of memory free operations
- Fix @wp.func decorated functions showing generic types in Pyright/Pylance
  instead of their actual signatures on Python 3.10+
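
The conditional-expression item above describes a branch that accesses an array element being evaluated even though the other branch is taken. As a rough plain-Python analogy (not Warp's generated code), the sketch below contrasts a lazy conditional expression with an eager select helper that evaluates both operands first. `select_eager` is a hypothetical helper.

```python
def select_eager(cond: bool, if_true: float, if_false: float) -> float:
    """Both operands are already evaluated by the time this runs."""
    return if_true if cond else if_false


values = [1.0, 2.0, 3.0]
i = 5  # deliberately out of range

# Lazy: values[i] is never evaluated because the condition is False.
lazy = values[i] if i < len(values) else 0.0

# Eager: values[i] is evaluated as an argument, so the access fails.
try:
    select_eager(i < len(values), values[i], 0.0)
    eager_failed = False
except IndexError:
    eager_failed = True
```

In Python the eager form raises `IndexError`. In compiled code without bounds checks the analogous out-of-range read could crash instead.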

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.11.1

Feb 1, 2026
173e179

v1.11.0

Highlights:
- Add group-aware construction and queries for wp.Bvh and wp.Mesh to support multi-environment workloads
- Add wp.grad() to evaluate function gradients inline during the forward pass
- Add options to reduce JIT compilation time with precompiled headers, optimization level control, and parallel module compilation
- Extend wp.tile_map() to support n-ary operations (up to 8 arguments) and add wp.tile_randf()/wp.tile_randi() for random tile generation
- Add unpack operator (*) support in kernels for vectors, matrices, quaternions, and array slices

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.11.0

Jan 2, 2026
8a3c350

v1.10.1

Highlights:

- Fix module="unique" kernels to properly reuse existing module objects,
  avoiding unnecessary overhead (especially noticeable on macOS)
- Fix kernel-local arrays (wp.zeros() in kernels): .ptr access, indexing,
  and shape parameter handling
- Fix code generation ordering for custom gradient functions (@wp.func_grad)
  when used with nested function calls
- Fix loops containing wp.static() expressions to unroll correctly regardless
  of max_unroll settings
- Fix reference cycles in wp.fem.Temporary and wp.fem.ShapeBasisSpace

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.10.1

Dec 1, 2025
7e719ed

v1.10.0

Highlights:
- Add experimental JAX automatic differentiation support with jax_kernel(enable_backward=True)
- Add in-place wp.Bvh.rebuild() with CUDA graph support for allocation-free BVH updates
- Improve built-in function call performance from Python by up to 70× through caching
- Add tile programming enhancements: axis-specific reductions, component indexing, wp.tile_full()
- Remove warp.sim module (superseded by Newton library)

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.10.0

Nov 2, 2025
c19d0de

v1.9.1

Highlights:

- Fix crash when using radix sort on multiple streams
- Fix memory management issues with shared tiles (double frees, leaks)
- Restore support for older GPU architectures (Maxwell, Pascal, Volta)
  when building with CUDA 12
- Fix TypeError with tuple type hints on Python 3.9/3.10
- Fix empty slice operations arr[i:i] that caused indexing errors

See the full changelog for more details:
https://github.com/NVIDIA/warp/releases/tag/v1.9.1

Oct 1, 2025
c60ce15

v1.9.0

Highlights:
- Rewrite wp.MarchingCubes in pure Warp, with support for CPU and GPU devices and differentiability
- Add wp.compile_aot_module() and wp.load_aot_module() to support basic ahead-of-time workflows
- Add more flexible indexing support for wp.matrix()/wp.vector()/wp.quaternion() types
- Add support for IntEnum and IntFlag inside Warp kernels
- Add indexed tile operations: wp.tile_index_load(), wp.tile_index_store(), and wp.tile_index_atomic_add()
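
As background for the IntEnum and IntFlag item above, the sketch below uses only Python's standard `enum` module to show that members of both types are integers, which is presumably what makes them convenient to pass into compiled kernels. It does not show how Warp handles them internally, and the `Shape` and `Collision` classes are hypothetical examples.

```python
from enum import IntEnum, IntFlag


class Shape(IntEnum):
    """Hypothetical enumeration of mutually exclusive choices."""
    SPHERE = 0
    BOX = 1


class Collision(IntFlag):
    """Hypothetical bit flags that can be combined with |."""
    NONE = 0
    STATIC = 1
    DYNAMIC = 2


# Flags combine bitwise and still behave as plain integers.
mask = Collision.STATIC | Collision.DYNAMIC
```

`Shape.BOX` compares equal to `1`, and `int(mask)` is `3`, so both kinds of value carry over to integer arithmetic without conversion tables.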

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.9.0

Sep 5, 2025
d4440b4
