BWD nn test with dynamic input without sigmoid results in a new error #4374

@vanbasten23

🐛 Bug

BWD nn test with dynamic input without sigmoid results in a new error.
A similar model, the BWD nn test with dynamic input with sigmoid, results in an error in autograd (#4322). So I replaced the sigmoid with relu, and the new model fails with a different error:

Traceback (most recent call last):
  File "pytorch/xla/test/test_dynamic_shape_backward_models.py", line 82, in <module>
    train(model, loss_fn=criterion, optimizer=optimizer)
  File "pytorch/xla/test/test_dynamic_shape_backward_models.py", line 69, in train
    loss.backward()
  File "/home/ptxla/.local/lib/python3.8/site-packages/torch/_tensor.py", line 484, in backward
    torch.autograd.backward(
  File "/home/ptxla/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: torch_xla/csrc/helpers.cpp:273 : Check failed: out_size <= size_at_dyndim / input_shape.dimensions( input_dynamic_dimension) (10 vs. 1)
*** Begin stack trace ***
        tsl::CurrentStackTrace[abi:cxx11]()
        torch_xla::XlaHelpers::GetDynamicReshapeInfo(xla::Shape const&, absl::lts_20220623::Span<long const>)
        torch_xla::XlaHelpers::GetDynamicReshape(xla::Shape const&, absl::lts_20220623::Span<long const>)
        torch_xla::Permute::MakePermuteShape(xla::Shape const&, absl::lts_20220623::Span<long const>)
        torch_xla::ViewInfo::ViewInfo(torch_xla::ViewInfo::Type, xla::Shape, std::vector<long, std::allocator<long> >)
        torch_xla::tensor_methods::transpose(c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > const&, long, long)
        torch_xla::XLANativeFunctions::t(at::Tensor const&)


        at::_ops::t::redispatch(c10::DispatchKeySet, at::Tensor const&)

        at::_ops::t::redispatch(c10::DispatchKeySet, at::Tensor const&)

        at::_ops::t::call(at::Tensor const&)

        torch::autograd::generated::AddmmBackward0::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)

        torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&)
        torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&)
        torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool)
        torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool)


        clone
*** End stack trace ***
Unable to map dynamic dimension of shape f32[<=80,10]{1,0} to output sizes (10, 80)

Full error output with print statements.

To Reproduce

Run the script from the PR on a TPU VM:

export XRT_TPU_CONFIG="localservice;0;localhost:51011"
export XLA_EXPERIMENTAL="nonzero:masked_select"
python3 pytorch/xla/test/test_dynamic_shape_backward_models.py
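
For reference, a minimal sketch of the kind of model and training step involved; the layer sizes, mask, and variable names below are illustrative assumptions, not the contents of test_dynamic_shape_backward_models.py:

# Minimal sketch of the failing pattern (assumed, not the actual test file).
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# Build an input whose batch dimension is dynamic: boolean-mask indexing goes
# through nonzero/masked_select, which XLA_EXPERIMENTAL enables to produce a
# bounded dynamic size (<=80 on dim 0 here).
x = torch.rand(80, 10, device=device)
x_dyn = x[x[:, 0] > 0.5]          # f32[<=80,10], dim 0 is dynamic

model = nn.Sequential(nn.Linear(10, 1), nn.ReLU()).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

out = model(x_dyn)
loss = criterion(out, torch.zeros_like(out))

# The crash happens here: AddmmBackward0 transposes the saved dynamic
# activation via t(), and XlaHelpers::GetDynamicReshapeInfo cannot map the
# dynamic dimension of f32[<=80,10] onto the permuted output sizes (10, 80).
loss.backward()
optimizer.step()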

Expected behavior

It shouldn't crash.

Environment

  • Reproducible on XLA backend [CPU/TPU]: TPU
  • torch_xla version: HEAD

Additional context

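Based on the stack trace, the failure comes from transposing a tensor whose dynamic dimension is dim 0 (t() -> Permute::MakePermuteShape -> XlaHelpers::GetDynamicReshape). A single-op reduction along those lines may hit the same check; this is an assumption inferred from the trace, not verified against the test:

# Hypothetical single-op reduction, assuming the dynamic batch dimension is
# produced the same way as in the repro sketch above.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.rand(80, 10, device=device)
x_dyn = x[x[:, 0] > 0.5]   # f32[<=80,10], dim 0 dynamic

# t() builds a Permute view; MakePermuteShape calls GetDynamicReshape, which
# rejects mapping the dynamic dim 0 of f32[<=80,10] to output sizes (10, 80).
y = x_dyn.t()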