Conversation

@younesbelkada
Contributor

What does this PR do?

BLIP-2 is a multi-modal model for image captioning. It is widely used for captioning natural images, but fine-tuning such a model remains a challenge due to its size, the largest checkpoint being blip2-flan-t5-xxl (~24GB). Hence, we should leverage peft to give users the possibility to fine-tune this model at low cost.

This PR adds BLIP2 support to peft and also adds an example script.
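
For context, here is a minimal sketch of the kind of LoRA fine-tuning setup this enables; the model id, target modules, and hyper-parameters are illustrative assumptions rather than the exact values used by the example script in this PR.

```py
# Hedged sketch: wrap BLIP-2 with LoRA adapters via peft so that only a small
# fraction of the parameters is trained. All values below are assumptions.
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import LoraConfig, get_peft_model

model_name = "Salesforce/blip2-flan-t5-xl"  # smaller sibling of blip2-flan-t5-xxl
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name, torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q", "v"],  # attention projections of the T5 language model inside BLIP-2
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights remain trainable
```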

cc @pacman100

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Apr 4, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada younesbelkada requested a review from pacman100 April 4, 2023 08:37
Contributor

@pacman100 pacman100 left a comment


this is a cool example @younesbelkada. Thank you for adding it 🚀.

Left comments

Comment on lines 142 to 145

if peft_config.task_type == "VISION_2_SEQ" and not isinstance(peft_config, LoraConfig):
    raise ValueError("Vision2Seq task type is only supported with LORA")


This isn't required if the task type is left unspecified. For unspecified tasks, lines 146-148 already fall back to the LoRA model via PeftModel, since a task-specific sub-class isn't required for the LoRA method.
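
As a hedged illustration of that fallback (not the PR's code): with task_type left unset, get_peft_model simply wraps the base model in the generic PeftModel, which is all LoRA needs.

```py
# Illustrative only; the checkpoint and target modules are assumptions.
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, PeftModel, get_peft_model

base_model = AutoModelForVision2Seq.from_pretrained("Salesforce/blip2-flan-t5-xl")
lora_config = LoraConfig(r=8, lora_alpha=32, target_modules=["q", "v"])  # no task_type
peft_model = get_peft_model(base_model, lora_config)
assert isinstance(peft_model, PeftModel)  # generic wrapper; no Vision2Seq sub-class required
```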

Comment on lines 1037 to 1105


class PeftModelForVision2Seq(PeftModel):
    """
    Peft model for vision to text models.

    Args:
        model ([`~transformers.PreTrainedModel`]): Base transformer model.
        peft_config ([`PeftConfig`]): Peft config.

    Example:
        ```py
        >>> from transformers import AutoModelForVision2Seq
        >>> from peft import PeftModelForVision2Seq, get_peft_config
        >>> config = {
        ...     "peft_type": "LORA",
        ...     "task_type": "VISION_2_SEQ",
        ...     "inference_mode": False,
        ...     "r": 8,
        ...     "target_modules": ["q", "v"],
        ...     "lora_alpha": 32,
        ...     "lora_dropout": 0.1,
        ...     "merge_weights": False,
        ...     "fan_in_fan_out": False,
        ...     "enable_lora": None,
        ...     "bias": "none",
        ... }
        >>> peft_config = get_peft_config(config)
        >>> model = AutoModelForVision2Seq.from_pretrained("Salesforce/blip2-flan-t5-xl")
        >>> peft_model = PeftModelForVision2Seq(model, peft_config)
        >>> peft_model.print_trainable_parameters()
        trainable params: 1843200 || all params: 775873280 || trainable%: 0.23756456724479544
        ```
    """

    def __init__(self, model, peft_config: PeftConfig):
        super().__init__(model, peft_config)
        self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation

    def forward(
        self,
        pixel_values=None,
        attention_mask=None,
        decoder_input_ids=None,
        decoder_attention_mask=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
        **kwargs,
    ):
        r"""
        A simple wrapper around the base model's forward method.
        """
        return self.base_model(
            pixel_values=pixel_values,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            decoder_attention_mask=decoder_attention_mask,
            labels=labels,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
            **kwargs,
        )

Following the previous comment, this isn't required if we aren't supporting methods other than LoRA.

lora_alpha=32,
lora_dropout=0.05,
bias="none",
task_type="VISION_2_SEQ",

Keeping this unspecified will automatically use the LoRA model via the PeftModel object, since a task-specific class isn't a requirement for LoRA.
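
A rough sketch of the suggested change to the script's config; arguments not visible in this hunk (such as r and target_modules) are assumptions.

```py
from peft import LoraConfig

config = LoraConfig(
    r=16,                       # assumed; not shown in this hunk
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q", "v"],  # assumed
    # task_type removed: LoRA works through the generic PeftModel wrapper
)
```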

@younesbelkada younesbelkada changed the title from "Add BLIP2" to "Add BLIP2 Example" on Apr 4, 2023
Contributor

@pacman100 pacman100 left a comment


Thank you @younesbelkada for iterating, LGTM! 🤗

@pacman100 pacman100 merged commit 382b178 into huggingface:main Apr 6, 2023
@younesbelkada younesbelkada deleted the add-pix2struct branch April 6, 2023 08:10
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025
