Tags: NVIDIA/warp
v1.14.0 Highlights:
- Extend CPU APIC graph capture serialization to replay backward launches, tiled kernels, richer launch arguments, and structs or indexed arrays carrying Warp array buffers
- Add multi-environment warp.fem geometries with environment-aware lookup and environment-first partitions for batched solves
- Add reusable and batched warp.optim.linear solvers with preallocated solver state and batch_offsets support
- Add pluggable Python logging through wp.set_logger(), wp.ScopedLogger, and wp.config.log_level
- Relax CPU/GPU array launch validation for HMM and ATS systems with wp.can_access() and LaunchArrayAccessMode controls
- Promote JAX integration to stable top-level APIs and deprecate warp.jax_experimental
- Add portable tile FFT and solver fallbacks for CPU and libmathdx-free GPU builds, plus wp.tile_empty()
- Fix math and autodiff correctness for NaN min/max/clamp/atomics, composite-component writes, curlnoise gradients, and large tile offsets
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.14.0
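NaN handling in min/max is easy to get wrong because every comparison against NaN is false, so a comparison-based implementation returns a result that depends on argument order. The stdlib-only sketch below shows that pitfall with Python's built-in min() and a NaN-propagating alternative. The helper name `min_propagate_nan` is hypothetical, and the convention it follows (in the spirit of IEEE 754-2019 `minimum`, ignoring its signed-zero rule) is an assumption, not necessarily the one Warp adopted.

```python
import math

nan = float("nan")

# Built-in min() keeps the current candidate unless a later argument
# compares strictly smaller; any comparison with NaN is False.
order_a = min(nan, 1.0)  # NaN stays the candidate
order_b = min(1.0, nan)  # 1.0 stays the candidate, NaN is silently dropped


def min_propagate_nan(a: float, b: float) -> float:
    """Return NaN if either input is NaN, otherwise the smaller value.

    Assumed convention for illustration only: NaN-propagating like
    IEEE 754-2019 `minimum`, but without its -0.0 < +0.0 ordering rule.
    """
    if math.isnan(a) or math.isnan(b):
        return nan
    return a if a <= b else b


print(order_a, order_b)                # nan 1.0
print(min_propagate_nan(1.0, nan))     # nan, regardless of argument order
```

A propagating convention makes the result independent of argument order, which also keeps gradients and atomics that build on min/max from silently hiding invalid values.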
v1.13.0 Highlights:
- Add experimental graph capture serialization (wp.capture_save/wp.capture_load) with a portable .wrp format and standalone C++ replay on both GPU and CPU
- Add wp.bfloat16 scalar type with array allocation, kernel execution, autodiff, DLPack, PyTorch, and JAX interop
- Add pluggable CUDA allocator interface (wp.set_cuda_allocator) with built-in RAPIDS Memory Manager (RMM) integration
- Add scoped memory tracking with C++-layer call-site attribution via wp.ScopedMemoryTracker
- Add experimental cuBQL BVH backend for wp.Mesh ray queries on dense meshes
- Add new tile primitives: wp.tile_dot, wp.tile_axpy, the wp.tile_stack family, wp.tile_scatter_add/masked, wp.tile_query_valid
- Add double-precision (wp.float64) support to warp.fem
- Remove Python 3.9 support (Python 3.10 is now the minimum)
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.13.0
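bfloat16 is the upper half of an IEEE 754 float32: 1 sign bit, the same 8 exponent bits, and only 7 mantissa bits, so it keeps float32's range at much lower precision. The stdlib-only sketch below illustrates the format with round-to-nearest-even conversion; it is not Warp's conversion code, and the helper names are hypothetical.

```python
import math
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Round a Python float to bfloat16 and return its 16-bit pattern.

    Sketch only: the value is first rounded to float32 by struct, so rare
    double-rounding cases can differ from a direct float64 -> bfloat16
    conversion, and magnitudes beyond float32 range raise OverflowError.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if math.isnan(x):
        # Force a quiet-NaN bit so truncation cannot turn NaN into infinity.
        return (bits >> 16) | 0x0040
    # Round to nearest, ties to even, on the 16 low bits being dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a Python float (exact)."""
    (value,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return value


b = float_to_bfloat16_bits(3.14159)
print(hex(b), bfloat16_bits_to_float(b))  # 0x4049 3.140625
```

Widening back to float32 is exact because it only appends zero bits; all precision loss happens in the rounding step.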
v1.12.1 Highlights:
- Fix kernel dispatch using incorrect block_dim across devices, causing crashes or memory corruption in tile kernels
- Fix silent precision loss in compile-time constants passed to 64-bit scalar constructors (wp.float64(), wp.int64(), wp.uint64())
- Fix wp.HashGrid neighbor queries missing results for negative coordinates
- Fix augmented assignments with subscript/attribute targets double-evaluating the target expression (e.g., s.field += expr, arr[i] *= expr)
- Fix wp.tile_matmul() and wp.tile_fft() ignoring module-level enable_backward
- Fix @wp.func with tile parameters failing to compile with shared-memory tiles
- Fix struct field assignments converting Warp scalar types to plain Python types
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.12.1
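In Python, an augmented assignment evaluates its target expression once, then reads, combines, and writes back through that single evaluation; double evaluation becomes observable whenever the subscript or attribute base has side effects. The plain-Python sketch below (no Warp) shows the reference semantics the fix aligns with, using hypothetical side-effecting helpers `next_index()` and `get_state()`.

```python
calls = {"index": 0, "state": 0}


def next_index() -> int:
    """Hypothetical side-effecting index: 0 on the first call, 1 on the next."""
    i = calls["index"]
    calls["index"] += 1
    return i


class State:
    def __init__(self) -> None:
        self.field = 1.0


state = State()


def get_state() -> State:
    """Hypothetical accessor that counts how often it is called."""
    calls["state"] += 1
    return state


arr = [10.0, 20.0, 30.0]

# Evaluated once: arr[0] is read, doubled, and stored back into arr[0].
# A double evaluation would read arr[0] but write arr[1].
arr[next_index()] *= 2.0

# The attribute base get_state() is also evaluated exactly once.
get_state().field += 0.5

print(arr, calls, state.field)  # [20.0, 20.0, 30.0] {'index': 1, 'state': 1} 1.5
```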
v1.12.0 Highlights:
- Add experimental hardware-accelerated texture sampling on CUDA GPUs with wp.Texture1D/2D/3D and wp.texture_sample()
- Add subscript-style type hints (e.g., wp.array[float]) for better Pyright/Pylance compatibility
- Add tile arithmetic operators (*, /) with broadcast, differentiable FFT, and wp.tile_from_thread()
- Add jax.vmap() support for Warp kernels and callables via jax_kernel() and jax_callable()
- Add quaternion/spatial helpers, approximate math intrinsics, and wp.print_diagnostics()
- Add B-spline shape functions to warp.fem
- Allow NVRTC compilation without a CUDA driver for ahead-of-time compilation in Docker builds
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.12.0
v1.11.1 Highlights:
- Fix wp.tile_matmul() sometimes producing NaN results when using the `c = wp.tile_matmul(a, b)` form due to reading uninitialized output memory
- Fix wp.static() incorrectly resolving loop variables to same-named global Python variables when used for static loop unrolling in kernels
- Fix segfault in conditional expressions (ternary if/else) when one branch accesses an array element and the other branch is taken
- Fix CUDA graphs with multiple temporary allocations using more memory than necessary due to improper sequencing of memory free operations
- Fix @wp.func decorated functions showing generic types in Pyright/Pylance instead of their actual signatures on Python 3.10+
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.11.1
v1.11.0 Highlights:
- Add group-aware construction and queries for wp.Bvh and wp.Mesh to support multi-environment workloads
- Add wp.grad() to evaluate function gradients inline during the forward pass
- Add options to reduce JIT compilation time with precompiled headers, optimization level control, and parallel module compilation
- Extend wp.tile_map() to support n-ary operations (up to 8 arguments) and add wp.tile_randf()/wp.tile_randi() for random tile generation
- Add unpack operator (*) support in kernels for vectors, matrices, quaternions, and array slices
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.11.0
v1.10.1 Highlights:
- Fix module="unique" kernels to properly reuse existing module objects, avoiding unnecessary overhead (especially noticeable on macOS)
- Fix kernel-local arrays (wp.zeros() in kernels): .ptr access, indexing, and shape parameter handling
- Fix code generation ordering for custom gradient functions (@wp.func_grad) when used with nested function calls
- Fix loops containing wp.static() expressions to unroll correctly regardless of max_unroll settings
- Fix reference cycles in wp.fem.Temporary and wp.fem.ShapeBasisSpace
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.10.1
v1.10.0 Highlights:
- Add experimental JAX automatic differentiation support with jax_kernel(enable_backward=True)
- Add in-place wp.Bvh.rebuild() with CUDA graph support for allocation-free BVH updates
- Improve built-in function call performance from Python by up to 70× through caching
- Add tile programming enhancements: axis-specific reductions, component indexing, wp.tile_full()
- Remove warp.sim module (superseded by Newton library)
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.10.0
v1.9.1 Highlights:
- Fix crash when using radix sort on multiple streams
- Fix memory management issues with shared tiles (double frees, leaks)
- Restore support for older GPU architectures (Maxwell, Pascal, Volta) when building with CUDA 12
- Fix TypeError with tuple type hints on Python 3.9/3.10
- Fix empty slice operations arr[i:i] that caused indexing errors
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.9.1
v1.9.0 Highlights:
- wp.MarchingCubes rewrite in pure Warp, supporting CPU and GPU devices and differentiability
- wp.compile_aot_module() and wp.load_aot_module() to support basic ahead-of-time workflows
- More flexible indexing support for wp.matrix()/wp.vector()/wp.quaternion() types
- Support for IntEnum and IntFlag inside Warp kernels
- Add indexed tile operations: wp.tile_index_load(), wp.tile_index_store(), and wp.tile_index_atomic_add()
See the full changelog for more details: https://github.com/NVIDIA/warp/releases/tag/v1.9.0