Tags: NVIDIA/warp

v1.14.0

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:
- Extend CPU APIC graph capture serialization to replay backward
  launches, tiled kernels, richer launch arguments, and structs or
  indexed arrays carrying Warp array buffers
- Add multi-environment warp.fem geometries with environment-aware
  lookup and environment-first partitions for batched solves
- Add reusable and batched warp.optim.linear solvers with preallocated
  solver state and batch_offsets support
- Add pluggable Python logging through wp.set_logger(),
  wp.ScopedLogger, and wp.config.log_level
- Relax CPU/GPU array launch validation for HMM and ATS systems with
  wp.can_access() and LaunchArrayAccessMode controls
- Promote JAX integration to stable top-level APIs and deprecate
  warp.jax_experimental
- Add portable tile FFT and solver fallbacks for CPU and libmathdx-free
  GPU builds, plus wp.tile_empty()
- Fix math and autodiff correctness for NaN min/max/clamp/atomics,
  composite-component writes, curlnoise gradients, and large tile
  offsets

See the full changelog for more details:
https://github.com/NVIDIA/warp/releases/tag/v1.14.0
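
To illustrate why NaN handling in min/max/clamp is a correctness concern, here is a plain-Python sketch (not Warp code). It shows that a comparison-based minimum gives order-dependent results when NaN is involved, next to a NaN-propagating variant. The helper `nan_propagating_min` is hypothetical, and the highlights above do not say which convention Warp adopts.

```python
import math

nan = float("nan")

# Python's built-in min() relies on "<", and every comparison with NaN is
# False, so the result depends on argument order.
order_a = min(nan, 1.0)   # keeps the first argument (NaN)
order_b = min(1.0, nan)   # keeps the first argument (1.0)


def nan_propagating_min(a: float, b: float) -> float:
    """Hypothetical helper: return NaN if either input is NaN."""
    if math.isnan(a) or math.isnan(b):
        return nan
    return a if a < b else b


print(math.isnan(order_a), order_b)                 # True 1.0
print(math.isnan(nan_propagating_min(1.0, nan)))    # True
```

Whichever convention is chosen, the point is that it is applied consistently regardless of argument order.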

v1.13.0

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:
- Add experimental graph capture serialization (wp.capture_save/wp.capture_load) with portable .wrp format and standalone C++ replay on both GPU and CPU
- Add wp.bfloat16 scalar type with array allocation, kernel execution, autodiff, DLPack, PyTorch, and JAX interop
- Add pluggable CUDA allocator interface (wp.set_cuda_allocator) with built-in RAPIDS Memory Manager (RMM) integration
- Add scoped memory tracking with C++-layer call-site attribution via wp.ScopedMemoryTracker
- Add experimental cuBQL BVH backend for wp.Mesh ray queries on dense meshes
- Add new tile primitives: wp.tile_dot, wp.tile_axpy, wp.tile_stack family, wp.tile_scatter_add/masked, wp.tile_query_valid
- Add double-precision (wp.float64) support to warp.fem
- Remove Python 3.9 support (Python 3.10 is now the minimum)

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.13.0
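
For readers unfamiliar with the bfloat16 format, the sketch below shows in plain Python (no Warp required) how a 32-bit float maps to bfloat16: the value keeps the sign and the 8 exponent bits of float32 but only the top 7 mantissa bits. The helper names are hypothetical, and this illustrates the number format only, not how `wp.bfloat16` is implemented.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Round a float to bfloat16 (round-to-nearest-even); return its 16 bits."""
    if x != x:                      # NaN: return a canonical quiet NaN
        return 0x7FC0
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add half of the discarded range, plus the lowest kept bit so that
    # exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Expand 16 bfloat16 bits back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x


one_bits = float32_to_bfloat16_bits(1.0)
pi_bf16 = bfloat16_bits_to_float(float32_to_bfloat16_bits(3.14159265))

print(hex(one_bits))   # 0x3f80
print(pi_bf16)         # 3.140625
```

The second result shows the trade-off: bfloat16 covers the same range as float32 but carries only about three significant decimal digits.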

v1.12.1

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:

- Fix kernel dispatch using incorrect block_dim across devices, causing
  crashes or memory corruption in tile kernels
- Fix silent precision loss in compile-time constants passed to 64-bit
  scalar constructors (wp.float64(), wp.int64(), wp.uint64())
- Fix wp.HashGrid neighbor queries missing results for negative coordinates
- Fix augmented assignments with subscript/attribute targets double-evaluating
  the target expression (e.g., s.field += expr, arr[i] *= expr)
- Fix wp.tile_matmul() and wp.tile_fft() ignoring module-level enable_backward
- Fix @wp.func with tile parameters failing to compile with shared-memory tiles
- Fix struct field assignments converting Warp scalar types to plain Python types
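
As a rough illustration of the "silent precision loss" item above, the following plain-Python sketch shows what happens when a 64-bit value passes through a narrower intermediate representation. It is not Warp code, and the highlights do not describe the actual mechanism of the bug; the helper `through_float32` is hypothetical.

```python
import struct


def through_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through 32-bit storage."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 0.1
narrowed = through_float32(value)
print(value == narrowed)               # False
print(abs(value - narrowed) < 1e-8)    # True: small, but not zero

# The same hazard exists for 64-bit integers routed through a double.
big = 2**53 + 1
print(int(float(big)) == big)          # False
```

No exception is raised in either case, which is why this class of error is easy to miss.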

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.12.1
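
The augmented-assignment fix is easiest to understand through the semantics it restores. This plain-Python sketch (not Warp code; `index` is a hypothetical helper with a visible side effect) contrasts a naive rewrite of `arr[i] *= expr`, which evaluates the target expression twice, with Python's own augmented assignment, which evaluates it once.

```python
calls = []


def index() -> int:
    """Index expression with a visible side effect."""
    calls.append("index")
    return 0


arr = [10]

# Naive desugaring: the target expression appears twice, so it runs twice.
calls.clear()
arr[index()] = arr[index()] * 3
naive_calls = len(calls)

# Python's augmented assignment evaluates the target expression once.
calls.clear()
arr[index()] *= 3
augmented_calls = len(calls)

print(naive_calls, augmented_calls, arr)   # 2 1 [90]
```

When the target expression is pure, the two forms agree; they differ only when evaluating it is costly or has side effects.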

v1.12.0

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:
- Add experimental hardware-accelerated texture sampling on CUDA GPUs with wp.Texture1D/2D/3D and wp.texture_sample()
- Add subscript-style type hints (e.g., wp.array[float]) for better Pyright/Pylance compatibility
- Add tile arithmetic operators (*, /) with broadcast, differentiable FFT, and wp.tile_from_thread()
- Add jax.vmap() support for Warp kernels and callables via jax_kernel() and jax_callable()
- Add quaternion/spatial helpers, approximate math intrinsics, and wp.print_diagnostics()
- Add B-spline shape functions to warp.fem
- Allow NVRTC compilation without a CUDA driver for ahead-of-time compilation in Docker builds

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.12.0
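
As background on subscript-style hints such as `wp.array[float]`, the sketch below shows one standard Python mechanism that lets a class accept `Class[dtype]` in annotations. The `Array` class is a hypothetical stand-in; how Warp implements this is an assumption not covered by the highlights, and this sketch addresses only runtime behaviour, not static checkers.

```python
import types
import typing


class Array:
    """Hypothetical stand-in for an array class; not Warp's implementation."""

    def __class_getitem__(cls, dtype):
        # Record the element type in a standard generic alias object.
        return types.GenericAlias(cls, (dtype,))


def scale(values: Array[float], factor: float) -> None:
    """Annotated with the subscript form; the body is irrelevant here."""


hint = typing.get_type_hints(scale)["values"]
print(typing.get_origin(hint) is Array, typing.get_args(hint))
# True (<class 'float'>,)
```

Tools that inspect annotations can then recover both the container class and the element type from the alias.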

v1.11.1

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:

- Fix wp.tile_matmul() sometimes producing NaN results when using the
  `c = wp.tile_matmul(a, b)` form due to reading uninitialized output memory
- Fix wp.static() incorrectly resolving loop variables to same-named global
  Python variables when used for static loop unrolling in kernels
- Fix segfault in conditional expressions (ternary if/else) when one branch
  accesses an array element and the other branch is taken
- Fix CUDA graphs with multiple temporary allocations using more memory than
  necessary due to improper sequencing of memory free operations
- Fix @wp.func decorated functions showing generic types in Pyright/Pylance
  instead of their actual signatures on Python 3.10+

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.11.1
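
The conditional-expression item describes a hazard that can be shown in plain Python: a branch that indexes an array must not be evaluated when the other branch is taken. The sketch below is not Warp code, and the root cause of the Warp bug is not stated in the highlights; `eager_select` is a hypothetical helper that stands in for any construct that evaluates both branches up front.

```python
def eager_select(cond, if_true, if_false):
    """Both arguments are already evaluated by the time this runs."""
    return if_true if cond else if_false


data = []   # empty array: data[0] is out of bounds
i = 0

# Lazy: the conditional expression never evaluates the untaken branch.
lazy = data[i] if i < len(data) else -1.0

# Eager: the out-of-bounds read happens even though the guard is False.
try:
    eager = eager_select(i < len(data), data[i], -1.0)
except IndexError:
    eager = "IndexError"

print(lazy, eager)   # -1.0 IndexError
```

Python reports the bad read as an exception; in compiled native code the same access can crash the process instead.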

v1.11.0

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:
- Add group-aware construction and queries for wp.Bvh and wp.Mesh to support multi-environment workloads
- Add wp.grad() to evaluate function gradients inline during the forward pass
- Add options to reduce JIT compilation time with precompiled headers, optimization level control, and parallel module compilation
- Extend wp.tile_map() to support n-ary operations (up to 8 arguments) and add wp.tile_randf()/wp.tile_randi() for random tile generation
- Add unpack operator (*) support in kernels for vectors, matrices, quaternions, and array slices

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.11.0

v1.10.1

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:

- Fix module="unique" kernels to properly reuse existing module objects,
  avoiding unnecessary overhead (especially noticeable on macOS)
- Fix kernel-local arrays (wp.zeros() in kernels): .ptr access, indexing,
  and shape parameter handling
- Fix code generation ordering for custom gradient functions (@wp.func_grad)
  when used with nested function calls
- Fix loops containing wp.static() expressions to unroll correctly regardless
  of max_unroll settings
- Fix reference cycles in wp.fem.Temporary and wp.fem.ShapeBasisSpace

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.10.1
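
To show why reference cycles matter for objects that own resources, here is a plain-Python sketch (not Warp code, and specific to CPython's reference counting). Objects in a cycle stay alive until the garbage collector runs, while a weak back-reference lets them be released immediately. The `Node` class and both helper functions are hypothetical.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.owner = None


def make_cycle():
    a, b = Node(), Node()
    a.owner, b.owner = b, a       # strong references in both directions
    return weakref.ref(a)


def make_weak_link():
    a, b = Node(), Node()
    a.owner = b
    b.owner = weakref.ref(a)      # back-reference is weak, so no cycle
    return weakref.ref(a)


gc.disable()                      # keep the demo independent of automatic GC
try:
    cyc = make_cycle()
    weak = make_weak_link()
    alive_before_gc = (cyc() is not None, weak() is not None)
    gc.collect()
    alive_after_gc = (cyc() is not None, weak() is not None)
finally:
    gc.enable()

print(alive_before_gc, alive_after_gc)   # (True, False) (False, False)
```

For objects that hold large buffers, the delay before a cycle is collected translates into memory that is held longer than needed.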

v1.10.0

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:
- Add experimental JAX automatic differentiation support with jax_kernel(enable_backward=True)
- Add in-place wp.Bvh.rebuild() with CUDA graph support for allocation-free BVH updates
- Improve built-in function call performance from Python by up to 70× through caching
- Add tile programming enhancements: axis-specific reductions, component indexing, wp.tile_full()
- Remove warp.sim module (superseded by Newton library)

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.10.0
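
The highlights credit the built-in call speedup to caching without saying what is cached. As a generic sketch of the technique, assuming the expensive step is overload resolution keyed by argument types, the plain-Python example below memoizes that lookup. Every name here (`OVERLOADS`, `resolve`, `call_builtin`) is hypothetical and none of it reflects Warp's actual code.

```python
from typing import Callable

# Hypothetical overload table: argument types -> implementation.
OVERLOADS: dict[tuple[type, ...], Callable] = {
    (int, int): lambda a, b: a + b,
    (float, float): lambda a, b: a + b,
}

_cache: dict = {}
resolve_count = 0


def resolve(arg_types):
    """Stand-in for an expensive overload search."""
    global resolve_count
    resolve_count += 1
    return OVERLOADS[arg_types]


def call_builtin(*args):
    """Dispatch on argument types, resolving each signature only once."""
    key = tuple(type(a) for a in args)
    impl = _cache.get(key)
    if impl is None:
        impl = _cache[key] = resolve(key)
    return impl(*args)


results = [call_builtin(1, 2), call_builtin(3, 4), call_builtin(1.5, 2.5)]
print(results, resolve_count)   # [3, 7, 4.0] 2
```

Three calls trigger only two resolutions because the second `(int, int)` call hits the cache.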

v1.9.1

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:

- Fix crash when using radix sort on multiple streams
- Fix memory management issues with shared tiles (double frees, leaks)
- Restore support for older GPU architectures (Maxwell, Pascal, Volta)
  when building with CUDA 12
- Fix TypeError with tuple type hints on Python 3.9/3.10
- Fix empty slice operations arr[i:i] that caused indexing errors

See the full changelog for more details:
https://github.com/NVIDIA/warp/releases/tag/v1.9.1
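
For reference, this is how an empty slice of the form `arr[i:i]` behaves on a plain Python list: it is valid and selects zero elements. The sketch is not Warp code and only shows the conventional semantics that such a slice is expected to follow.

```python
data = [10, 20, 30]
i = 1

empty = data[i:i]   # start == stop selects nothing
start, stop, step = slice(i, i).indices(len(data))

print(empty, len(range(start, stop, step)))   # [] 0
```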

v1.9.0

This tag was signed with the committer’s verified signature.
shi-eric Eric Shi

Highlights:
- wp.MarchingCubes rewrite in pure Warp, supporting CPU and GPU devices and differentiability
- wp.compile_aot_module() and wp.load_aot_module() to support basic ahead-of-time workflows
- More flexible indexing support for wp.matrix()/wp.vector()/wp.quaternion() types
- Support for IntEnum and IntFlag inside Warp kernels
- Add indexed tile operations: wp.tile_index_load(), wp.tile_index_store(), and wp.tile_index_atomic_add()

See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.9.0
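
As a reminder of what `IntEnum` and `IntFlag` provide, the plain-Python sketch below shows that their members behave as integers and that flags combine bitwise. The `Shape` and `Flags` types are hypothetical examples, and how Warp lowers such values inside kernels is not covered by the highlights.

```python
from enum import IntEnum, IntFlag


class Shape(IntEnum):    # hypothetical example enum
    SPHERE = 0
    BOX = 1


class Flags(IntFlag):    # hypothetical example flags
    ACTIVE = 1
    VISIBLE = 2


combined = Flags.ACTIVE | Flags.VISIBLE

print(int(Shape.BOX), Shape.BOX == 1)                    # 1 True
print(int(combined), bool(combined & Flags.VISIBLE))     # 3 True
```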