PyTorch/XLA 2.8 release

Released by @bhavya01 on 13 Aug 22:39 · 167 commits to master since this release

Highlights

  • Broader Platform Support: Build support has been added for Python 3.12 and 3.13.

  • New Quantization Features: Introduced support for weight-8-bit, activation-8-bit (w8a8) quantized matrix multiplication, including both the kernel and the Torch/XLA wrapper.

  • Torchax Enhancements: The torchax library gains torchax.amp.autocast, support for bicubic and bilinear resampling, and improved interoperability with Flax.

  • Distributed Computing: Added support for the torch.distributed.scatter collective operation and fixed logic for all_gather_into_tensor.
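To illustrate the idea behind the new w8a8 feature, here is a minimal, self-contained sketch of 8-bit-weight, 8-bit-activation quantized matrix multiplication in plain Python. It is not the PyTorch/XLA kernel or its Torch/XLA wrapper; it only demonstrates the general technique (symmetric per-tensor quantization, integer-domain accumulation, dequantization by the product of scales), and all function names here are invented for the example.

```python
def quantize_per_tensor(mat, num_bits=8):
    """Symmetric per-tensor quantization to the signed int8 range.

    Returns the quantized matrix and the scale needed to dequantize it.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    amax = max(abs(v) for row in mat for v in row) or 1.0
    scale = amax / qmax
    q = [[max(-qmax - 1, min(qmax, round(v / scale))) for v in row]
         for row in mat]
    return q, scale


def w8a8_matmul(activations, weights):
    """Quantize both operands to int8, multiply and accumulate in the
    integer domain, then dequantize with the product of the two scales."""
    qa, sa = quantize_per_tensor(activations)
    qw, sw = quantize_per_tensor(weights)
    rows, inner, cols = len(qa), len(qa[0]), len(qw[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = sum(qa[i][k] * qw[k][j] for k in range(inner))  # int accumulate
            out[i][j] = acc * sa * sw  # dequantize the accumulator
    return out


a = [[0.5, -1.0], [2.0, 0.25]]   # activations
w = [[1.0, 0.0], [0.0, 1.0]]     # weights (identity, for easy checking)
print(w8a8_matmul(a, w))          # close to the float matmul a @ w
```

Because the weight matrix is the identity, the output should match the activations up to quantization error, which for int8 per-tensor scales is on the order of 1% here.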

Bug Fixes

  • Fixed an issue in EmbeddingDenseBackward by removing an unnecessary cast of padding_idx to double.
  • Corrected all_gather_into_tensor logic.
  • Resolved an issue where Allgather would incorrectly check tuple shapes.
  • Fixed unspecified behavior in custom calls.
  • Addressed a bug in the ragged paged attention kernel by applying a KV mask to filter out NaNs.
  • Corrected the reloading of sharded optimizer parameter groups and master weights.
  • Fixed an issue where NoneRemover was not returning the modified list/tuple.

Deprecations

  • A warning will now be surfaced upon initialization if the deprecated XLA:CUDA device is used. Nightly CUDA builds have been removed.
  • The devkind parameter of xla_model.xla_device is deprecated.
  • ShapeOfXlaOp is deprecated in favor of GetShape.

What's Changed

New Contributors

Full Changelog: v2.7.0...v2.8.0