FIX Multi GPU tests: explicit device map #2484
Conversation
Some multi-GPU tests used device_map="auto", but some recent changes in accelerate resulted in parameters being moved to a single device. Now set the device map explicitly to avoid that. Also add a more rigorous check to ensure that the parameters really are on multiple devices.
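For illustration, a minimal sketch of what such a device-placement check could look like (the helper name and assertion message are hypothetical, not the actual test code from this PR):

```python
import torch

def assert_params_on_multiple_devices(model: torch.nn.Module) -> None:
    # Collect the set of devices that the parameters actually live on.
    devices = {p.device for p in model.parameters()}
    # With the regression, device_map="auto" collapsed this set to a single entry.
    assert len(devices) > 1, f"Expected parameters on multiple devices, got {devices}"
```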
@SunMarc Could you please check if this change makes sense, and also whether you have an idea about this new error? It feels like moving the device of the layer should not require creating a completely new one.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks, just a nit
> It feels like moving the device of the layer should not require creating a completely new one.
The tensors are recreated and this is not done in place; that's why we did it that way. As for why we are calling param_cls, again, I'm not so sure. This is potentially related to bnb. I can dig into it if needed.
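As a rough sketch of the pattern being discussed (the names and constructor arguments here are assumptions for illustration, not accelerate's actual implementation; quantized bnb parameter classes may need additional state):

```python
import torch

def move_param(module: torch.nn.Module, name: str, device: torch.device) -> None:
    old_value = module._parameters[name]
    # .to() returns a new tensor rather than moving the existing one in place,
    # so the parameter object is rebuilt around the moved data.
    param_cls = type(old_value)  # e.g. nn.Parameter, or a quantized subclass from bnb
    new_data = old_value.data.to(device)
    module._parameters[name] = param_cls(new_data, requires_grad=old_value.requires_grad)
```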
tests/test_gpu_examples.py
Outdated
device_map = {
    "": 0,
    "model.decoder.layers.11": 1,
    "model.decoder.layers.11.activation_fn": 1,
    "model.decoder.layers.11.fc1": 1,
    "model.decoder.layers.11.fc2": 1,
    "model.decoder.layers.11.final_layer_norm": 1,
    "model.decoder.layers.11.self_attn": 1,
    "model.decoder.layers.11.self_attn.k_proj": 1,
    "model.decoder.layers.11.self_attn.out_proj": 1,
    "model.decoder.layers.11.self_attn.q_proj": 1,
    "model.decoder.layers.11.self_attn.v_proj": 1,
    "model.decoder.layers.11.self_attn_layer_norm": 1,
}
I think it would be better to have something like this instead:
device_map = {
    'model.decoder.embed_tokens': 0, 'lm_head': 0, 'model.decoder.embed_positions': 0,
    'model.decoder.project_out': 0, 'model.decoder.project_in': 0,
    'model.decoder.layers.0': 0, 'model.decoder.layers.1': 0, 'model.decoder.layers.2': 0,
    'model.decoder.layers.3': 0, 'model.decoder.layers.4': 0, 'model.decoder.layers.5': 0,
    'model.decoder.layers.6': 0, 'model.decoder.layers.7': 0, 'model.decoder.layers.8': 0,
    'model.decoder.layers.9': 0, 'model.decoder.layers.10': 0, 'model.decoder.layers.11': 1,
    'model.decoder.layers.12': 1, 'model.decoder.layers.13': 1, 'model.decoder.layers.14': 1,
    'model.decoder.layers.15': 1, 'model.decoder.layers.16': 1, 'model.decoder.layers.17': 1,
    'model.decoder.layers.18': 1, 'model.decoder.layers.19': 1, 'model.decoder.layers.20': 1,
    'model.decoder.layers.21': 1, 'model.decoder.layers.22': 1, 'model.decoder.layers.23': 1,
}
I can change that, although there are only 12 layers here, so I would drop everything past 11. Just for my understanding, why would this be better?
Oh, my bad, I looked at the facebook/opt-350m model, which has layers up to index 23, but the actual quantized model is facebook/opt-125m. Let's use this then:
device_map = {
    'model.decoder.embed_tokens': 0, 'lm_head': 0, 'model.decoder.embed_positions': 0,
    'model.decoder.project_out': 0, 'model.decoder.project_in': 0,
    'model.decoder.layers.0': 0, 'model.decoder.layers.1': 0, 'model.decoder.layers.2': 0,
    'model.decoder.layers.3': 0, 'model.decoder.layers.4': 0, 'model.decoder.layers.5': 0,
    'model.decoder.layers.6': 1, 'model.decoder.layers.7': 1, 'model.decoder.layers.8': 1,
    'model.decoder.layers.9': 1, 'model.decoder.layers.10': 1, 'model.decoder.layers.11': 1,
}
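For context, a minimal sketch of how an explicit device_map like this would typically be passed to from_pretrained and then verified across both GPUs (the 8-bit quantization config is an assumption; the actual test setup may differ):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    device_map=device_map,  # the explicit map suggested above
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # assumption: bnb 8-bit
)
# With an explicit map, parameters should land on both GPUs rather than a single device.
devices = {p.device for p in model.parameters()}
assert torch.device("cuda:0") in devices and torch.device("cuda:1") in devices
```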
@SunMarc I took your suggested device_map.
Wow, nice that it got fixed by magic xD
Notes:
These tests require GPUs and are thus not part of the normal CI. Also, there is now another error with
pytest tests/test_gpu_examples.py::PeftTorchaoGPUTests::test_causal_lm_training_multi_gpu_torchao_1_int8_dynamic_activation_int8_weight: