Add a forward pass nn model with dynamism test. #4256
Conversation
I have a question about my test:
test/test_dynamic_shape_models.py
xm.mark_step()
# TODO: figure out if met.metric_data("CompileTime") indicates
# the number of compilations. Also figure out why the counter now is 3 instead of the expected 1.
np.testing.assert_equal(met.metric_data('CompileTime')[0], 1)
Dump the IR graphs, then you will know what got executed.
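For example, a minimal sketch (using the standard XLA_SAVE_TENSORS_FILE debugging env var; the "## BEGIN_GRAPH" marker is what the text dump writes) to save the graphs and count them:

# Run the test with the IR dump enabled, e.g.
#   XLA_SAVE_TENSORS_FILE=/tmp/ir_dump.txt python test/test_dynamic_shape_models.py
# then count how many graphs were recorded.
import os

dump_path = os.environ.get('XLA_SAVE_TENSORS_FILE', '/tmp/ir_dump.txt')
with open(dump_path) as f:
  num_graphs = sum(1 for line in f if line.startswith('## BEGIN_GRAPH'))
print('graphs recorded:', num_graphs)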
I got the IR dump; there are 30 ## BEGIN_GRAPH sections for the 10 iterations. I think that means for each iteration the IR graph gets compiled into HLO 3 times, right?
Also, does the metric "CompileTime" refer to compiling from IR to HLO or compiling from HLO to LLO?
Plus, with the actual met.metric_data('CompileTime')[0] being 3, I think the dynamic behavior is what we expected, right? The number of compilations doesn't grow with the number of iterations. Is my understanding correct?
Right now the newly added tests succeed on TPU but fail on CPU with an error:
I found a similar issue but it doesn't explain why it fails. I wonder if you have encountered this issue before @miladm @JackCaoG.
Maybe check where the error is from (somewhere in XLA) and see why it failed?
y_pred = model(x_test)
before_train = criterion(y_pred.squeeze(), y_test)
xm.mark_step()
np.testing.assert_equal(met.metric_data('CompileTime')[0], 3)
why is it 3 here?
Does the "CompileTime" here refer to compiling IR graph to HLO graph, or compiling HLO to LLO/executable?
The IR dump shows 3 graphs: 2 for before_train = criterion(y_pred.squeeze(), y_test)
and 1 for xm.mark_step()
. The "CompileTime" doesn't grow linearly with the number of iteration. Does 3
match your expectation?
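To make the check concrete, this is roughly what I want to assert, assuming met.metric_data('CompileTime')[0] really is the number of compilations so far (a sketch only):

import torch_xla.debug.metrics as met

def compile_count():
  # met.metric_data returns (total_samples, accumulated_value, samples);
  # the first element is how many times the metric was recorded.
  data = met.metric_data('CompileTime')
  return data[0] if data is not None else 0

count_after_first_step = compile_count()
# ... run the remaining forward passes and xm.mark_step() calls here ...
assert compile_count() == count_after_first_step, (
    'graph was recompiled even though only the dynamic dimension changed')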
hmm, so the second graph
[ScheduleSyncTensorsGraph]
TensorsGraphInfo:
__bool__ (/home/ptxla/.local/lib/python3.8/site-packages/torch/__init__.py:212)
binary_cross_entropy (/home/ptxla/.local/lib/python3.8/site-packages/torch/nn/functional.py:3087)
forward (/home/ptxla/.local/lib/python3.8/site-packages/torch/nn/modules/loss.py:619)
_call_impl (/home/ptxla/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1480)
test_forward_pass_dynamic_input_compile_once (pytorch/xla/test/test_dynamic_shape_models.py:71)
_callTestMethod (/usr/local/lib/python3.8/unittest/case.py:633)
run (/usr/local/lib/python3.8/unittest/case.py:676)
__call__ (/usr/local/lib/python3.8/unittest/case.py:736)
run (/usr/local/lib/python3.8/unittest/suite.py:122)
__call__ (/usr/local/lib/python3.8/unittest/suite.py:84)
run (/usr/local/lib/python3.8/unittest/suite.py:122)
__call__ (/usr/local/lib/python3.8/unittest/suite.py:84)
run (/usr/local/lib/python3.8/unittest/runner.py:176)
runTests (/usr/local/lib/python3.8/unittest/main.py:271)
__init__ (/usr/local/lib/python3.8/unittest/main.py:101)
<module> (pytorch/xla/test/test_dynamic_shape_models.py:93)
Hashes: (5c2a92a233f40275064b7ca64d2c16ba)
## BEGIN_GRAPH
IR {
%0 = f32[1]{0} xla::device_data(), location=convert@module.py:1128, device=TPU:0
%1 = f32[1,10]{1,0} xla::device_data(), location=convert@module.py:1128, device=TPU:0
%2 = f32[10,1]{0,1} aten::permute(%1), location=forward@linear.py:114, dims=(1, 0)
%3 = f32[10]{0} xla::device_data(), location=convert@module.py:1128, device=TPU:0
%4 = f32[10,2]{0,1} xla::device_data(), location=convert@module.py:1128, device=TPU:0
%5 = f32[2,10]{1,0} aten::permute(%4), location=forward@linear.py:114, dims=(1, 0)
%6 = f32[5,2]{0,1} xla::device_data(), location=create_dynamic_test_data@test_dynamic_shape_models.py:85, device=TPU:0
%7 = s32[5,2]{0,1} xla::cast(%6), location=create_dynamic_test_data@test_dynamic_shape_models.py:86, type=s32, dtype=Int, stype=Float
%8 = (s32[<=10,2]{1,0}, s32[]) aten::nonzero(%7), num_outputs=2, location=create_dynamic_test_data@test_dynamic_shape_models.py:86
%9 = f32[<=10,2]{1,0} xla::cast(%8.0), location=create_dynamic_test_data@test_dynamic_shape_models.py:86, type=f32, dtype=Float, stype=Int
%10 = f32[<=10,10]{1,0} aten::addmm(%9, %5, %3), location=forward@linear.py:114
%11 = f32[<=10,10]{1,0} aten::relu(%10), location=relu@functional.py:1457
%12 = f32[<=10,1]{1,0} aten::addmm(%11, %2, %0), location=forward@linear.py:114
%13 = f32[<=10,1]{1,0} aten::sigmoid(%12), location=forward@activation.py:294
%14 = f32[<=10]{0} aten::view(%13), location=binary_cross_entropy@functional.py:3087, output_size=(10)
%15 = s32[] aten::size(%14), ROOT=0
}
is a bit concerning; it seems like we materialize the size via a bool operator somewhere. I would like to understand where that happens in a follow-up PR.
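For reference, a rough sketch of the pattern I suspect, based on the binary_cross_entropy -> __bool__ frames in the stack trace above (the exact torch internals here are an assumption on my part):

import torch
import torch_xla.core.xla_model as xm

dev = xm.xla_device()
x = torch.ones(5, 2, device=dev)
y = torch.nonzero(x.int()).float()  # leading dimension becomes the dynamic <=10
target = torch.ones(10, device=dev)

# Roughly what the shape check inside binary_cross_entropy has to do: the
# comparison must produce a concrete Python bool, and that __bool__ call is
# what forces the dynamic size to be materialized (the aten::size root above).
if y.shape[0] != target.shape[0]:
  raise ValueError('target size does not match input size')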
Hmm, I think I know what's going on. The CPU compiler (tensorflow/compiler/xla/service/cpu/cpu_compiler.cc) will fail if it cannot verify at compile time that two shapes are equivalent, and this pretty much blocks the dynamic shape work. On GPU it is being set to kRuntime.
How do you know "In GPU it is being set to kRuntime"? Also, how do we usually follow up on why CPU doesn't support this check? Do we just ask Blake or open a GitHub issue for TensorFlow?
I just searched for the error message in the XLA code base and found where it is set.
test/test_dynamic_shape_models.py
def test_forward_pass_dynamic_input_correctness(self):
  losses = []
  for dev in [torch.device('gpu'), xla_dev]:
You shouldn't need this test; we expect the HLO generation part to be mostly device-independent.
This is done now. I also created #4298 to track it. Can you take another look at the PR?
test/test_dynamic_shape_models.py
@unittest.skipIf(
    xm.get_xla_supported_devices("CPU"),
Can you check whether this test actually gets run on GPU? Check the GPU test log. What could happen is that the GPU CI can also get the CPU device. It would be better if you specifically check whether you can get a GPU or TPU device.
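Something along these lines is what I have in mind (a sketch only; the class name and skip message are placeholders):

import unittest
import torch_xla.core.xla_model as xm

class TestDynamicShapeModels(unittest.TestCase):  # placeholder class name

  # Skip unless a GPU or TPU device is actually available, instead of skipping
  # whenever a CPU device exists (which is also true on the GPU CI).
  @unittest.skipIf(
      not xm.get_xla_supported_devices("GPU") and
      not xm.get_xla_supported_devices("TPU"),
      "The test only works on GPU and TPU; it currently fails on CPU.")
  def test_forward_pass_dynamic_input_compile_once(self):
    ...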
Feel free to merge it after you verify the test actually gets run on GPU. I also think we don't need to compare GPU and CPU HLO in this test.
You are right. I've pushed another commit to fix that. Edit: I verified that the test runs on GPU.