
Conversation

vanbasten23 (Collaborator)

No description provided.

@vanbasten23 (Collaborator Author)

I have a question about my test test_forward_pass_nn_model_compile_once in this PR: in the test I run the forward pass twice and expect it to be compiled only once. I assume met.metric_data("CompileTime")[0] indicates the number of compilations, but the actual value is 3 instead of 1. Do you have any pointers on how to start debugging this? @JackCaoG @miladm

xm.mark_step()
# TODO: figure out if met.metric_data("CompileTime") indicates
# the number of compilations. Also figure out why the counter now is 3 instead of the expected 1.
np.testing.assert_equal(met.metric_data('CompileTime')[0], 1)
Collaborator

dump the IR graphs, then you will know what got executed.
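
For reference, a minimal sketch of one way to dump the IR graphs: XLA_SAVE_TENSORS_FILE / XLA_SAVE_TENSORS_FMT are standard torch_xla debugging environment variables, and torch_xla._XLAC._get_xla_tensors_text prints the pending IR for specific tensors; the tensors built below are only illustrative.

# Option 1: dump every traced graph to a file (set before the process starts):
#   XLA_SAVE_TENSORS_FILE=/tmp/ir_dump.txt XLA_SAVE_TENSORS_FMT=text python3 test_dynamic_shape_models.py
# Option 2: print the IR that feeds specific live tensors:
import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(10, 2, device=device)
y = x.relu().sum()
print(torch_xla._XLAC._get_xla_tensors_text([y]))  # pending IR graph that produces `y`
xm.mark_step()  # cuts the trace; dumps land in XLA_SAVE_TENSORS_FILE if set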

Collaborator Author

I got the IR dump; there are 30 ## BEGIN_GRAPH entries for the 10 iterations. I think that means the IR graph gets compiled into HLO 3 times per iteration, right?

Also, does the metric "CompileTime" refer to compiling from IR to HLO or compiling from HLO to LLO?

Plus, with the actual met.metric_data('CompileTime')[0] being 3, I think the dynamic-shape behavior is what we expected, right? The compile count doesn't grow with the number of iterations. Is my understanding correct?
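
A small sketch of how the metric can be inspected (torch_xla.debug.metrics is the standard metrics module; the metric names below are the ones already used in this thread):

import torch_xla.debug.metrics as met

print(met.metrics_report())                    # full human-readable report
data = met.metric_data('CompileTime')          # (total samples, accumulator, samples)
print('graph compilations so far:', data[0])   # one sample is recorded per graph compilation
print('graph executions so far:', met.metric_data('ExecuteTime')[0])
met.clear_metrics()                            # reset between test phases if needed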

vanbasten23 marked this pull request as ready for review December 3, 2022 00:55
@vanbasten23 (Collaborator Author)

Right now the newly added tests succeed on TPU but fail on CPU with this error:

ERROR: test_forward_pass_dynamic_input_correctness (__main__.TestDynamicShapeModels)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/pytorch/xla/test/test_dynamic_shape_models.py", line 51, in test_forward_pass_dynamic_input_correctness
    xm.mark_step()
  File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.14-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 953, in mark_step
    wait=xu.getenv_as('XLA_SYNC_WAIT', bool, False))
RuntimeError: INVALID_ARGUMENT: From /job:localservice/replica:0/task:0:
2 root error(s) found.
  (0) INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
	 [[{{node XRTCompile}}]]
	 [[XRTCompile_G3]]
  (1) INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
	 [[{{node XRTCompile}}]]
0 successful operations.
0 derived errors ignored.
Recent warning and error logs:
  0 successful operations.
  0 derived errors ignored.
  Recent warning and error logs:
    OP_REQUIRES failed at xrt_compile_ops.cc:221 : INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
  OP_REQUIRES failed at xrt_compile_ops.cc:221 : INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13

I found a similar issue, but it doesn't explain why it fails. I wonder if you have encountered this issue before. @miladm @JackCaoG

@JackCaoG (Collaborator)

JackCaoG commented Dec 6, 2022

maybe check where the error is from (somewhere in XLA) and see why it failed?

y_pred = model(x_test)
before_train = criterion(y_pred.squeeze(), y_test)
xm.mark_step()
np.testing.assert_equal(met.metric_data('CompileTime')[0], 3)
Collaborator

why is it 3 here?

Collaborator Author

Does the "CompileTime" here refer to compiling IR graph to HLO graph, or compiling HLO to LLO/executable?

The IR dump shows 3 graphs: 2 for before_train = criterion(y_pred.squeeze(), y_test) and 1 for xm.mark_step(). "CompileTime" doesn't grow linearly with the number of iterations. Does 3 match your expectation?

Collaborator

hmm, so the second graph

[ScheduleSyncTensorsGraph]
TensorsGraphInfo:
  __bool__ (/home/ptxla/.local/lib/python3.8/site-packages/torch/__init__.py:212)
  binary_cross_entropy (/home/ptxla/.local/lib/python3.8/site-packages/torch/nn/functional.py:3087)
  forward (/home/ptxla/.local/lib/python3.8/site-packages/torch/nn/modules/loss.py:619)
  _call_impl (/home/ptxla/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1480)
  test_forward_pass_dynamic_input_compile_once (pytorch/xla/test/test_dynamic_shape_models.py:71)
  _callTestMethod (/usr/local/lib/python3.8/unittest/case.py:633)
  run (/usr/local/lib/python3.8/unittest/case.py:676)
  __call__ (/usr/local/lib/python3.8/unittest/case.py:736)
  run (/usr/local/lib/python3.8/unittest/suite.py:122)
  __call__ (/usr/local/lib/python3.8/unittest/suite.py:84)
  run (/usr/local/lib/python3.8/unittest/suite.py:122)
  __call__ (/usr/local/lib/python3.8/unittest/suite.py:84)
  run (/usr/local/lib/python3.8/unittest/runner.py:176)
  runTests (/usr/local/lib/python3.8/unittest/main.py:271)
  __init__ (/usr/local/lib/python3.8/unittest/main.py:101)
  <module> (pytorch/xla/test/test_dynamic_shape_models.py:93)
 
Hashes: (5c2a92a233f40275064b7ca64d2c16ba)
 
## BEGIN_GRAPH
IR {
  %0 = f32[1]{0} xla::device_data(), location=convert@module.py:1128, device=TPU:0
  %1 = f32[1,10]{1,0} xla::device_data(), location=convert@module.py:1128, device=TPU:0
  %2 = f32[10,1]{0,1} aten::permute(%1), location=forward@linear.py:114, dims=(1, 0)
  %3 = f32[10]{0} xla::device_data(), location=convert@module.py:1128, device=TPU:0
  %4 = f32[10,2]{0,1} xla::device_data(), location=convert@module.py:1128, device=TPU:0
  %5 = f32[2,10]{1,0} aten::permute(%4), location=forward@linear.py:114, dims=(1, 0)
  %6 = f32[5,2]{0,1} xla::device_data(), location=create_dynamic_test_data@test_dynamic_shape_models.py:85, device=TPU:0
  %7 = s32[5,2]{0,1} xla::cast(%6), location=create_dynamic_test_data@test_dynamic_shape_models.py:86, type=s32, dtype=Int, stype=Float
  %8 = (s32[<=10,2]{1,0}, s32[]) aten::nonzero(%7), num_outputs=2, location=create_dynamic_test_data@test_dynamic_shape_models.py:86
  %9 = f32[<=10,2]{1,0} xla::cast(%8.0), location=create_dynamic_test_data@test_dynamic_shape_models.py:86, type=f32, dtype=Float, stype=Int
  %10 = f32[<=10,10]{1,0} aten::addmm(%9, %5, %3), location=forward@linear.py:114
  %11 = f32[<=10,10]{1,0} aten::relu(%10), location=relu@functional.py:1457
  %12 = f32[<=10,1]{1,0} aten::addmm(%11, %2, %0), location=forward@linear.py:114
  %13 = f32[<=10,1]{1,0} aten::sigmoid(%12), location=forward@activation.py:294
  %14 = f32[<=10]{0} aten::view(%13), location=binary_cross_entropy@functional.py:3087, output_size=(10)
  %15 = s32[] aten::size(%14), ROOT=0
}

is a bit concerning; it seems like we materialize the size via a bool operator somewhere. I would like to understand where that happens in a follow-up PR.
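
A sketch of the pattern that seems to trigger the extra graph, going by the TensorsGraphInfo frames above (the tensors and the size comparison below are illustrative, not the exact code in binary_cross_entropy):

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t = torch.ones(5, 2, device=device)
# With XLA_EXPERIMENTAL=nonzero this result has a dynamic first dimension, f32[<=10, 2].
nz = torch.nonzero(t.int()).float()
# A Python-level size check (as functional.py:3087 does before computing the loss)
# calls __bool__ on the comparison, which forces the dynamic size to be
# materialized, i.e. a small graph ending in aten::size gets compiled and run.
if nz.size(0) != t.size(0):
    pass
xm.mark_step()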

@JackCaoG (Collaborator)

JackCaoG commented Dec 7, 2022

Hmm, I think I know what's going on. In the CPU compiler (tensorflow/compiler/xla/service/cpu/cpu_compiler.cc):

dynamic_padder_options.shape_check_mode = DynamicDimensionInference::ShapeCheckMode::kCompileTime;

which will fail if at compile time it cannot verify that two shapes are equivalent; this pretty much blocks the dynamic shape work. In GPU it is being set to kRuntime, which only checks shape equality at run time. I think we can follow up on why CPU does not support this check, but run this test only on GPU and TPU for now.

@vanbasten23 (Collaborator Author)

vanbasten23 commented Dec 7, 2022

In GPU it is being set to kRuntime

How do you know "In GPU it is being set to kRuntime"?

Also, how do we usually follow up on why CPU doesn't support this check? Do we just ask Blake or open a GitHub issue for TensorFlow?

@JackCaoG (Collaborator)

JackCaoG commented Dec 7, 2022

In GPU it is being set to kRuntime

How do you know "In GPU it is being set to kRuntime"?

Also, how do we usually follow up on why CPU doesn't support this check? Do we just ask Blake or open a GitHub issue for TensorFlow?

I just searched the error message in the XLA code base and found kCompileTime and kRunTime. Then I searched kRunTime and found GPU is using it.


def test_forward_pass_dynamic_input_correctness(self):
  losses = []
  for dev in [torch.device('gpu'), xla_dev]:
Collaborator Author

@miladm @JackCaoG is it possible to get 2 devices on one machine?

This test is designed to verify that the model produces the same losses on different devices.

Collaborator

you shouldn't need this test; we expect the HLO generation part to be mostly device-independent.

@vanbasten23 (Collaborator Author)

I think we can follow up on why CPU does not support this check, but run this test only on GPU and TPU for now.

This is done now. I've also created #4298 to track the CPU issue. Can you take another look at the PR?

vanbasten23 requested a review from JackCaoG December 8, 2022 00:25


@unittest.skipIf(
    xm.get_xla_supported_devices("CPU"),
Collaborator

can you check whether this test actually gets run on GPU? Check the GPU test log. What could happen is that the GPU CI can also get the CPU device. It would be better to specifically check whether you can get a GPU or TPU device.
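
A sketch of the kind of check being suggested here, assuming the xm.get_xla_supported_devices API used elsewhere in this test (the skip message is illustrative):

import unittest
import torch_xla.core.xla_model as xm

@unittest.skipUnless(
    xm.get_xla_supported_devices("GPU") or xm.get_xla_supported_devices("TPU"),
    'This test only runs when a GPU or TPU device is available; it fails on CPU, '
    'see https://github.com/pytorch/xla/issues/4298.')
class TestDynamicShapeModels(unittest.TestCase):
  ...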

@JackCaoG (Collaborator) left a comment

Feel free to merge it after you verify the test actually gets run on GPU. I also think we don't need to compare GPU and CPU HLO in this test.

@vanbasten23 (Collaborator Author)

vanbasten23 commented Dec 9, 2022

Feel free to merge it after you verify the test actually gets run on GPU.

You are right. With @unittest.skipIf(xm.get_xla_supported_devices("CPU"), ...), my test didn't run on GPU:

OK
+ run_dynamic python3 /tmp/pytorch/xla/test/test_dynamic_shape_models.py --verbosity=2
+ [[ '' == \1 ]]
+ echo 'Running in DynamicShape mode: python3' /tmp/pytorch/xla/test/test_dynamic_shape_models.py --verbosity=2
Running in DynamicShape mode: python3 /tmp/pytorch/xla/test/test_dynamic_shape_models.py --verbosity=2
+ XLA_EXPERIMENTAL=nonzero:masked_select:masked_scatter
+ run_test python3 /tmp/pytorch/xla/test/test_dynamic_shape_models.py --verbosity=2
+ python3 /tmp/pytorch/xla/test/test_dynamic_shape_models.py --verbosity=2
test_forward_pass_dynamic_input_compile_once (__main__.TestDynamicShapeModels) ... skipped 'The tests fail on CPU. See https://github.com/pytorch/xla/issues/4298 for more detail.'
test_forward_pass_dynamic_input_correctness (__main__.TestDynamicShapeModels) ... skipped 'The tests fail on CPU. See https://github.com/pytorch/xla/issues/4298 for more detail.'

----------------------------------------------------------------------
Ran 2 tests in 0.003s

I've pushed another commit to fix that.

I also think we don't need to compare GPU and CPU HLO in this test.

I modified the test_forward_pass_dynamic_input_correctness test so that it runs the same forward pass twice on the same device and checks that the 2 losses are the same.

Edit: I verified that the test runs on GPU.
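
An illustrative sketch of that shape of test: the model and data below are stand-ins modeled on the IR dump earlier in the thread, not the exact contents of test_dynamic_shape_models.py.

import numpy as np
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

def run_once(device):
  torch.manual_seed(0)  # same weights on both runs
  model = nn.Sequential(nn.Linear(2, 10), nn.ReLU(),
                        nn.Linear(10, 1), nn.Sigmoid()).to(device)
  criterion = nn.BCELoss()
  # With XLA_EXPERIMENTAL=nonzero this input has a dynamic first dimension, f32[<=10, 2].
  x_test = torch.nonzero(torch.ones(5, 2, device=device).int()).float()
  y_test = torch.ones(10, device=device)
  loss = criterion(model(x_test).squeeze(), y_test)
  xm.mark_step()
  return loss.item()

device = xm.xla_device()
# run the same forward pass twice on the same device and compare the losses
np.testing.assert_allclose(run_once(device), run_once(device))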

vanbasten23 merged commit 9894c96 into master Dec 9, 2022
@JackCaoG (Collaborator)

JackCaoG commented Dec 9, 2022

@cicirori @ymwangg Feel free to give this a try; we are still working on enabling the backward pass in #4289.

miladm added the dynamism (Dynamic Shape Features) and testing (Testing and coverage related issues) labels Dec 13, 2022