Conversation

@younesbelkada
Contributor

What does this PR do?

BLIP-2 is a multi-modal model for image captioning. It is widely used for captioning natural images, but fine-tuning such a model remains a challenge due to its size, the largest checkpoint being blip2-flan-t5-xxl (~24GB). Hence, we should leverage peft to give users the possibility to fine-tune this model at low cost.

This PR adds BLIP2 support to peft and also adds an example script.
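
For context, here is a minimal sketch of the kind of LoRA fine-tuning setup this enables; the model id, target modules, and hyper-parameters are illustrative assumptions rather than the exact values used by the example script in this PR.

```py
# Hedged sketch: wrap BLIP-2 with LoRA adapters via peft so that only a small
# fraction of the parameters is trained. All values below are assumptions.
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import LoraConfig, get_peft_model

model_name = "Salesforce/blip2-flan-t5-xl"  # smaller sibling of blip2-flan-t5-xxl
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name, torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q", "v"],  # attention projections of the T5 language model inside BLIP-2
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights remain trainable
```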

cc @pacman100

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Apr 4, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada younesbelkada requested a review from pacman100 April 4, 2023 08:37
Contributor

@pacman100 pacman100 left a comment


this is a cool example @younesbelkada. Thank you for adding it 🚀.

Left comments

Comment on lines 142 to 145

if peft_config.task_type == "VISION_2_SEQ" and not isinstance(peft_config, LoraConfig):
    raise ValueError("Vision2Seq task type is only supported with LORA")


This isn't required if the task type is left unspecified. For unspecified tasks, lines 146-148 already fall back to the LoRA model via PeftModel, since a task-specific sub-class isn't required for the LoRA method.
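
As a hedged illustration of that fallback (not the PR's code): with task_type left unset, get_peft_model simply wraps the base model in the generic PeftModel, which is all LoRA needs.

```py
# Illustrative only; the checkpoint and target modules are assumptions.
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, PeftModel, get_peft_model

base_model = AutoModelForVision2Seq.from_pretrained("Salesforce/blip2-flan-t5-xl")
lora_config = LoraConfig(r=8, lora_alpha=32, target_modules=["q", "v"])  # no task_type
peft_model = get_peft_model(base_model, lora_config)
assert isinstance(peft_model, PeftModel)  # generic wrapper; no Vision2Seq sub-class required
```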

Comment on lines 1037 to 1105


class PeftModelForVision2Seq(PeftModel):
    """
    Peft model for vision to text models.

    Args:
        model ([`~transformers.PreTrainedModel`]): Base transformer model.
        peft_config ([`PeftConfig`]): Peft config.

    Example:
        ```py
        >>> from transformers import AutoModelForVision2Seq
        >>> from peft import PeftModelForVision2Seq, get_peft_config
        >>> config = {
        ...     "peft_type": "LORA",
        ...     "task_type": "VISION_2_SEQ",
        ...     "inference_mode": False,
        ...     "r": 8,
        ...     "target_modules": ["q", "v"],
        ...     "lora_alpha": 32,
        ...     "lora_dropout": 0.1,
        ...     "merge_weights": False,
        ...     "fan_in_fan_out": False,
        ...     "enable_lora": None,
        ...     "bias": "none",
        ... }
        >>> peft_config = get_peft_config(config)
        >>> model = AutoModelForVision2Seq.from_pretrained("Salesforce/blip2-flan-t5-xl")
        >>> peft_model = PeftModelForVision2Seq(model, peft_config)
        >>> peft_model.print_trainable_parameters()
        trainable params: 1843200 || all params: 775873280 || trainable%: 0.23756456724479544
        ```
    """

    def __init__(self, model, peft_config: PeftConfig):
        super().__init__(model, peft_config)
        self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation

    def forward(
        self,
        pixel_values=None,
        attention_mask=None,
        decoder_input_ids=None,
        decoder_attention_mask=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
        **kwargs,
    ):
        r"""
        A simple wrapper around the base model's forward method.
        """
        return self.base_model(
            pixel_values=pixel_values,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            decoder_attention_mask=decoder_attention_mask,
            labels=labels,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
            **kwargs,
        )

Following the previous comment, this isn't required if we aren't supporting methods other than LoRA.

lora_alpha=32,
lora_dropout=0.05,
bias="none",
task_type="VISION_2_SEQ",

Keeping this unspecified will automatically use the LoRA model via the PeftModel object, since a task-specific class isn't a requirement for LoRA.
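
A rough sketch of the suggested change to the script's config; arguments not visible in this hunk (such as r and target_modules) are assumptions.

```py
from peft import LoraConfig

config = LoraConfig(
    r=16,                       # assumed; not shown in this hunk
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q", "v"],  # assumed
    # task_type removed: LoRA works through the generic PeftModel wrapper
)
```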

@younesbelkada younesbelkada changed the title from "Add BLIP2" to "Add BLIP2 Example" on Apr 4, 2023
Contributor

@pacman100 pacman100 left a comment


Thank you @younesbelkada for iterating, LGTM! 🤗

@pacman100 pacman100 merged commit 382b178 into huggingface:main Apr 6, 2023
@younesbelkada younesbelkada deleted the add-pix2struct branch April 6, 2023 08:10
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025
