OFT: several improvements to make OFT faster and more memory efficient #2575
Conversation
@BenjaminBossan Hi, could you please review this PR? It would be great. Best,
Thanks a lot for this big update to OFT. I didn't have time for an in-depth review yet, but I did a high level review and have some comments, please check.
My biggest concern right now is that the PR is backwards incompatible for the following reasons:
- a new module, `OFTRotationModule`, is being used
- a new option, `use_cayley_neumann`, is added and enabled by default
- some parameters are renamed

We should instead strive to make the change backwards compatible. This would mean:
- Not sure if possible, but initialize `OFTRotationModule` in a way that the module works the same as previously. Remap `state_dict` parameters if necessary.
- Default to `use_cayley_neumann=False`.
- Don't rename the parameters, or ensure that `state_dict`s are mapped accordingly.
I think it's fine not to concern ourselves with forwards compatibility. That means it's not necessary for OFT checkpoints created after this PR is merged to be compatible with older PEFT versions.
As for the quantized layers, also thanks a lot for adding all of them. I haven't checked the details yet. Maybe it would be good to move those to a separate, self-contained PR, WDYT?
Apart from this, we'll also have to add some tests for the new parameters and quantized layers, but let's first deal with the points mentioned above.
PS: If the prospect of making these changes backwards compatible is too much, we can also consider creating a new PEFT method like "OFTv2" and leave the current OFT as is.
src/peft/tuners/lora/layer.py (outdated)
```python
    lora_bias=lora_bias,
)

breakpoint()
```
Let's remove this.
src/peft/tuners/oft/aqlm.py (outdated)
```diff
@@ -0,0 +1,101 @@
+# Copyright 2024-present the HuggingFace Inc. team.
```
```diff
- # Copyright 2024-present the HuggingFace Inc. team.
+ # Copyright 2025-present the HuggingFace Inc. team.
```
Here and below.
Dear Benjamin, thank you so much for the quick reply and the suggestions. I am actually the author of both OFT and this newer version of OFT (the paper will be released on arXiv once the PR gets accepted). I totally understand the concerns about backward compatibility; here are some explanations for the changes: I changed the code to add the `OFTRotationModule` module. Personally, I hope and think it would be great if we could keep a single OFTConfig (instead of adding a separate OFTConfigV2) to avoid confusion for users, because essentially the algorithm does not change but is simply much faster and also supports fine-tuning quantized layers. Would it be possible to simply add a check that detects whether a checkpoint is an old one and, in that case, tells the user to pip install an older version? Best,
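As background on the `use_cayley_neumann` option discussed in this thread, here is a minimal, self-contained sketch of the idea: the Cayley transform Q = (I + A)(I - A)^-1 of a skew-symmetric block A is orthogonal, and a truncated Neumann series can replace the explicit matrix inverse. This is an illustration of the concept under those assumptions, not the actual PEFT implementation.

```python
import torch

def cayley_neumann(A: torch.Tensor, num_terms: int = 5) -> torch.Tensor:
    """Approximate Q = (I + A) @ inverse(I - A) for skew-symmetric A using a
    truncated Neumann series (I - A)^-1 ~= I + A + A^2 + ..., avoiding an
    explicit matrix inverse. Illustrative sketch only."""
    n = A.shape[-1]
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    inv_approx = I.clone()
    A_power = I.clone()
    for _ in range(1, num_terms):
        A_power = A_power @ A              # A, A^2, A^3, ...
        inv_approx = inv_approx + A_power  # partial Neumann sum
    return (I + A) @ inv_approx

B = torch.randn(8, 8) * 0.01
A = B - B.T  # skew-symmetric, so the exact Cayley transform is orthogonal
Q = cayley_neumann(A, num_terms=5)
print(torch.linalg.norm(Q @ Q.T - torch.eye(8)))  # small for small-norm A
```

The approximation error shrinks with the norm of A and with the number of Neumann terms, which is presumably why the number of terms is exposed as a config option.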
BenjaminBossan
left a comment
Thanks for offering the perspective, I understand your argument. After some further exploration of the code, it looks like there is no easy way to make the transition backwards compatible. I think the least we should do, however, is to check if a user tries to load an old OFT checkpoint and raise an error with a helpful error message. For this, we could for instance add some code after this line:
peft/src/peft/utils/save_and_load.py, line 435 in a27406c:
```python
peft_model_state_dict = {renamed_dora_weights(k): v for k, v in peft_model_state_dict.items()}
```
The check could be something like this:
```python
elif config.peft_type == PeftType.OFT:
    # new OFT adapter is backwards incompatible, see #2575
    if any(".oft_s." in key for key in peft_model_state_dict):
        raise ValueError("Trying to load old OFT checkpoint, which is no longer supported. Please install PEFT <= v0.15.2 to load it or train a new OFT adapter.")
```

Next week, I should hopefully find the opportunity to do some testing with the new OFT method to check if I can replicate the improvements that you mentioned.
Before we can merge, we also need to update some of the tests. For instance, this test no longer works with the new OFT parameters:
Lines 1882 to 1898 in a27406c
```python
def test_oft_add_new_adapter_does_not_change_device(self, mlp):
    # same as first test, but using OFT
    config = OFTConfig(target_modules=["lin0"])
    model = get_peft_model(mlp, config)
    model = model.to(self.device)
    model.lin0.oft_r.cpu()
    # check that the adapter is indeed on CPU and the base model on GPU
    assert model.lin0.oft_r.default.device.type == "cpu"
    assert model.lin0.base_layer.weight.device.type == self.device
    model.add_adapter("other", config)
    # check that after adding a new adapter, the old adapter is still on CPU
    assert model.lin0.oft_r.default.device.type == "cpu"
    # the rest should be on GPU
    assert model.lin0.base_layer.weight.device.type == self.device
    assert model.lin0.oft_r.other.device.type == self.device
```
Furthermore, since quantized layers are now supported, let's add GPU tests for them. As an example, here is a LoRA test for GPTQ. You can essentially copy this test and switch the method from LoRA to OFT.
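As a rough illustration of what such a test could look like, here is a sketch modeled on the existing LoRA GPTQ test; the model ID is a placeholder and the trainer/dataset setup is elided, so this is an assumption about shape rather than the final test code:

```python
import pytest
from transformers import AutoModelForCausalLM

from peft import OFTConfig, get_peft_model, prepare_model_for_kbit_training


@pytest.mark.single_gpu_tests
def test_oft_causal_lm_training_gptq():
    # placeholder: reuse whatever GPTQ-quantized checkpoint the LoRA test loads
    model_id = "<gptq-quantized-model>"
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    model = prepare_model_for_kbit_training(model)

    config = OFTConfig(
        r=0,
        oft_block_size=32,
        target_modules=["q_proj", "v_proj"],  # illustrative choice
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)

    # ...run a few training steps with transformers.Trainer, as in the LoRA test,
    # then assert that a train loss was logged:
    # assert trainer.state.log_history[-1]["train_loss"] is not None
```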
src/peft/tuners/oft/layer.py (outdated)
```python
self.eps = {}
self.block_share = {}
# For Embedding layer
self.oft_embedding_R = nn.ParameterDict({})
```
Is not used, right?
Yes, updated
src/peft/tuners/oft/layer.py (outdated)
```python
elif isinstance(base_layer, nn.MultiheadAttention):
    if not base_layer._qkv_same_embed_dim:
        raise ValueError(f"Only same dim for query/key/value is supported as of now for {self.__class__}.")
    in_features, out_features = base_layer.embed_dim, 3 * base_layer.embed_dim
```
MHA is not supported, so this can be removed, right?
Yes, updated.
src/peft/tuners/oft/layer.py (outdated)
```python
warnings.warn("Unscaling operation for OFT not supported! Keeping scale to 1.")

def update_layer(self, adapter_name, r, oft_block_size, module_dropout, coft, eps, block_share, init_weights):
def _check_forward_args(self, x, *args, **kwargs):
```
This method is used in LoRA to check that we can perform mixed adapter predictions (i.e., in the same batch, having some samples with adapter X and other samples with adapter Y). See:

peft/src/peft/tuners/lora/layer.py, line 537 in a27406c:
```python
def _mixed_batch_forward(
```
As this is not supported by OFT, we can remove this method.
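For context, mixed-batch inference on the LoRA side is driven by an `adapter_names` keyword at call time, roughly like this (the adapter names below are placeholders):

```python
# peft_model has two LoRA adapters loaded, e.g. "default" and "other";
# one adapter name is supplied per sample in the batch
outputs = peft_model(**inputs, adapter_names=["default", "other", "default"])
```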
OK, updated
src/peft/tuners/oft/layer.py (outdated)
```python
self._check_forward_args(x, *args, **kwargs)
adapter_names = kwargs.pop("adapter_names", None)
```
As mentioned above, since OFT does not support _mixed_batch_forward, these lines can be safely removed.
OK, updated.
src/peft/tuners/oft/model.py (outdated)
```python
    new_module.requires_grad_(False)
    self._replace_module(parent, target_name, new_module, target)
else:
    breakpoint()
```
Remove
OK, updated
src/peft/tuners/oft/layer.py (outdated)
```python
if init_weights is False:
    nn.init.normal_(self.oft_r[adapter_name], mean=0.0, std=0.1)
    nn.init.normal_(self.oft_s[adapter_name], mean=1.0, std=0.1)
    nn.init.normal_(self.oft_R[adapter_name], mean=0.0, std=0.1)
```
This fails for me because self.oft_R[adapter_name] is an OFTRotationModule, did you mean:
```diff
- nn.init.normal_(self.oft_R[adapter_name], mean=0.0, std=0.1)
+ nn.init.normal_(self.oft_R[adapter_name].weight, mean=0.0, std=0.1)
```
Updated
@BenjaminBossan Thank you so much for the detailed reply and the suggestions. I just pushed a version with the required changes and added tests.
BenjaminBossan
left a comment
Thanks for the updates, there isn't much missing for this PR.
Apart from the smaller comments I left, ideally we should also have one test per quantization method which was added in this PR. For this, please check test_gpu_examples.py, where you can see the test classes for each quantization method. Could you please add OFT examples?
I would also like to do some of my own testing, but this will have to wait until next week. I hope this is fine.
src/peft/tuners/oft/awq.py (outdated)
```diff
@@ -0,0 +1,145 @@
+# Copyright 2024-present the HuggingFace Inc. team.
```
Let's update all the years to 2025.
OK, updated.
There is now the option to use OFT with use_cayley_neumann=True and use_cayley_neumann=False. Let's ensure that we have at least one test case for each one of those.
@BenjaminBossan Where do you suggest adding this test? Best,
It should be sufficient to extend the existing OFT test cases here:
peft/tests/test_custom_models.py, lines 300 to 322 in bd893a8:
```python
########
# OFT #
########
("Vanilla MLP 1 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": "lin0"}),
("Vanilla MLP 2 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"]}),
("Vanilla MLP 5 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "modules_to_save": ["lin1"]}),
(
    "Vanilla MLP 6 OFT",
    "MLP",
    OFTConfig,
    {
        "r": 2,
        "target_modules": ["lin0"],
        "module_dropout": 0.1,
    },
),
("Vanilla MLP 7 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "coft": True}),
("Vanilla MLP 8 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "block_share": True}),
("Vanilla MLP 9 OFT", "MLP", OFTConfig, {"r": 2, "target_modules": ["lin0"], "coft": True, "block_share": True}),
("Conv2d 1 OFT", "Conv2d", OFTConfig, {"r": 5, "target_modules": ["conv2d"]}),
("Conv2d 3 OFT", "Conv2d", OFTConfig, {"r": 5, "target_modules": ["conv2d"], "coft": True}),
("Conv2d 4 OFT", "Conv2d", OFTConfig, {"r": 5, "target_modules": ["conv2d"], "block_share": True}),
("Conv2d 5 OFT", "Conv2d", OFTConfig, {"r": 5, "target_modules": ["conv2d"], "coft": True, "block_share": True}),
```
@BenjaminBossan Hi Benjamin, I just updated the tests. Would it be possible for you to provide some feedback this week? We would really like to publish the paper to arXiv :)
@zqiu24 Okay, let's try our best to get this merged this week. I tried to run the new tests with:
but I get an error:
Could you please check if the tests pass for you and, if they don't, fix the remaining issues? I'm also currently running some experiments and hit an issue when the base model dtype != float32. I think I know the error, but I'll get back to you once I'm sure I've pinned it down. PS: I think I found the cause, see my comment below.
src/peft/tuners/oft/layer.py (outdated)
```python
x = self._cast_input_dtype(x, scaled_rotated_weight.dtype)
bias = self._cast_input_dtype(self.get_base_layer().bias, scaled_rotated_weight.dtype)
result = F.linear(input=x, weight=scaled_rotated_weight, bias=bias)
result = self.base_layer(x, *args, **kwargs)
```
So this line errors for me when the base layer is bfloat16. The reason is that x has been upcast to float32 due to x = self._cast_input_dtype(x, oft_R.weight.dtype). I could fix the problem by changing this line to:
```diff
- result = self.base_layer(x, *args, **kwargs)
+ result = self.base_layer(x.to(previous_dtype), *args, **kwargs)
```
Thanks, updated.
@BenjaminBossan Thank you so much :) I will make the changes as soon as I get the review. Yes, the above issue is because we changed the default now. For some reason, when I run the tests locally, the run breaks with:

INTERNALERROR> File "/home/wliu/anaconda3/envs/peft_pr/lib/python3.12/site-packages/coverage/sqlitedb.py", line 138, in _execute

so I cannot see the error logs.
The macOS errors are unrelated.
@BenjaminBossan So if I determine the best default hyperparameters, is it ready to be merged? Best,
I guess we can merge as is, but normally we would not want to change the default parameters later. Thus, ideally, we should have them set before merging. If you have enough time to test better defaults, please go ahead.
Yes, thanks. I am just curious about the performance gap. I will test it locally and give the go-ahead from my side. Best,
@BenjaminBossan Would you mind testing the current version again to see if the performance is better?
src/peft/tuners/oft/layer.py (outdated)
```python
self._active_adapter = adapter_name

self.update_layer(adapter_name, r, oft_block_size, module_dropout, coft, eps, block_share, init_weights)
breakpoint()
```
Please remove
Sorry, forgot to delete this.
I ran it again, with the updated code and otherwise the same settings. The final results are very similar (showing the most relevant metrics):

Before:
```json
"test accuracy": 0.4450341167551175,
"train loss": 0.6357718685865402,
"cuda_memory_reserved_avg": 11804291891,
"cuda_memory_max": 22189965312,
"cuda_memory_reserved_99th": 17720976343,
"train_time": 1967.2921348049422,
```

After:
```json
"test accuracy": 0.4473085670962851,
"train loss": 0.6358395928144455,
"cuda_memory_reserved_avg": 11783439070,
"cuda_memory_max": 22198353920,
"cuda_memory_reserved_99th": 17720934400,
"train_time": 2134.552857758019,
```

If you want, you should quickly be able to run the benchmark yourself. You need this file:

```json
{
  "auto_mapping": null,
  "base_model_name_or_path": null,
  "bias": "none",
  "block_share": false,
  "coft": false,
  "eps": 6e-05,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": false,
  "init_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "module_dropout": 0.0,
  "modules_to_save": null,
  "num_cayley_neumann_terms": 5,
  "oft_block_size": 32,
  "peft_type": "OFT",
  "r": 0,
  "revision": null,
  "target_modules": null,
  "task_type": null,
  "use_cayley_neumann": false
}
```

and then run the benchmark script (you might need some extra requirements). At the end of the day, we can still merge and figure out if there are better settings later.
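If it helps, here is a rough sketch of building an equivalent configuration programmatically; the base model and target modules are placeholder choices for illustration, not part of the benchmark setup:

```python
from transformers import AutoModelForCausalLM

from peft import OFTConfig, get_peft_model

# mirrors the adapter_config.json above: block size 32, Cayley-Neumann disabled
config = OFTConfig(
    r=0,
    oft_block_size=32,
    use_cayley_neumann=False,
    num_cayley_neumann_terms=5,
    target_modules=["q_proj", "v_proj"],  # illustrative choice
)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(model, config)
model.print_trainable_parameters()
```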
@BenjaminBossan Thank you for the help and for running the OFT tests, I really appreciate it :)
Okay, then let's deal with potentially better experimental settings later. LMK if the PR is ready for a final review.
@BenjaminBossan Please go ahead with the final review. Best,
BenjaminBossan
left a comment
Thanks a lot for this update. I found only a few minor issues, they should be quick to fix, otherwise this LGTM. I only skimmed the different quantization methods like AQLM, AWQ etc. as the code is mostly the same.
I think it would be good to add tests for each supported quantization method in test_gpu_examples.py (single GPU tests are enough) to ensure that they work and keep on working, but this can be added in a later PR.
```python
from trl import SFTTrainer
from peft import OFTConfig

if use_quantization:
```
This snippet uses a mix of 2 and 4 spaces for indentation. Let's use 4 spaces consistently.
Updated.
src/peft/tuners/oft/bnb.py (outdated)
```diff
@@ -0,0 +1,388 @@
+# Copyright 2023-present the HuggingFace Inc. team.
```
Here and below, the date is sometimes wrong.
```diff
- # Copyright 2023-present the HuggingFace Inc. team.
+ # Copyright 2025-present the HuggingFace Inc. team.
```
Updated.
tests/test_gptqmodel.py (outdated)
```python
# assert loss is not None
assert trainer.state.log_history[-1]["train_loss"] is not None

@pytest.mark.single_gpu_tests
```
```diff
- @pytest.mark.single_gpu_tests
+ @pytest.mark.multi_gpu_tests
```
Updated.
@BenjaminBossan Hi, are there any other changes required from my side? Best,
BenjaminBossan
left a comment
Thanks for the improvement to, and extension of, OFT. The changes LGTM.
Make OFT faster and more memory efficient. This new version of OFT is not backwards compatible with older checkpoints and vice versa. To load older checkpoints, downgrade PEFT to 0.15.2 or lower.
Hi there,
I want to make some changes to OFT. The major problem with OFT is that it is relatively slow and not memory efficient for large models. This commit does not change the training logic but makes several improvements to make it faster.
Core Improvements:
Hopefully, you can let me add these improvements to the OFT training code.
Below are some test results from fine-tuning with Qwen2.5-7B.
Best,