Add BLIP2 Example #260
Conversation
The documentation is not available anymore as the PR was closed or merged.
pacman100 left a comment
This is a cool example @younesbelkada. Thank you for adding it 🚀. Left comments.
src/peft/mapping.py (outdated)
```python
if peft_config.task_type == "VISION_2_SEQ" and not isinstance(peft_config, LoraConfig):
    raise ValueError("Vision2Seq task type is only supported with LORA")
```
This isn't required if the task type is left unspecified. For unspecified task types, lines 146-148 already use LoRA via `PeftModel`, as a task-specific subclass isn't required for the LoRA method.
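A minimal sketch of the suggested usage (not the PR's code): with `task_type` left unspecified, `get_peft_model` falls back to the generic `PeftModel` wrapper, which is all the LoRA method needs. The checkpoint and `target_modules` below are assumptions for illustration.

```python
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

# No task_type is set, so get_peft_model returns a plain PeftModel
lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1, target_modules=["q", "v"])
model = AutoModelForVision2Seq.from_pretrained("Salesforce/blip2-flan-t5-xl")
peft_model = get_peft_model(model, lora_config)  # generic PeftModel, no task-specific subclass
peft_model.print_trainable_parameters()
```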
src/peft/peft_model.py (outdated)
````python
class PeftModelForVision2Seq(PeftModel):
    """
    Peft model for vision to text models.

    Args:
        model ([`~transformers.PreTrainedModel`]): Base transformer model.
        peft_config ([`PeftConfig`]): Peft config.

    Example:

        ```py
        >>> from transformers import AutoModelForVision2Seq
        >>> from peft import PeftModelForVision2Seq, get_peft_config

        >>> config = {
        ...     "peft_type": "LORA",
        ...     "task_type": "VISION_2_SEQ",
        ...     "inference_mode": False,
        ...     "r": 8,
        ...     "target_modules": ["q", "v"],
        ...     "lora_alpha": 32,
        ...     "lora_dropout": 0.1,
        ...     "merge_weights": False,
        ...     "fan_in_fan_out": False,
        ...     "enable_lora": None,
        ...     "bias": "none",
        ... }

        >>> peft_config = get_peft_config(config)
        >>> model = AutoModelForVision2Seq.from_pretrained("Salesforce/blip2-flan-t5-xl")
        >>> peft_model = PeftModelForVision2Seq(model, peft_config)
        >>> peft_model.print_trainable_parameters()
        trainable params: 1843200 || all params: 775873280 || trainable%: 0.23756456724479544
        ```
    """

    def __init__(self, model, peft_config: PeftConfig):
        super().__init__(model, peft_config)
        self.base_model_prepare_inputs_for_generation = self.base_model.prepare_inputs_for_generation

    def forward(
        self,
        pixel_values=None,
        attention_mask=None,
        decoder_input_ids=None,
        decoder_attention_mask=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
        **kwargs,
    ):
        r"""
        A simple wrapper around the base model's forward method.
        """
        return self.base_model(
            pixel_values=pixel_values,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            decoder_attention_mask=decoder_attention_mask,
            labels=labels,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
            **kwargs,
        )
````
Following the previous comment, this isn't required if we aren't supporting methods other than LoRA.
```python
lora_alpha=32,
lora_dropout=0.05,
bias="none",
task_type="VISION_2_SEQ",
```
Keeping this unspecified will automatically use the LoRA model via the `PeftModel` object, as a task-specific class isn't a requirement for LoRA.
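A short sketch of what the suggested change might look like; values other than the ones shown in the diff above are illustrative assumptions.

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,                       # illustrative value, not taken from the diff
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q", "v"],  # assumed target modules
    # task_type is omitted, so the generic PeftModel is used automatically
)
```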
pacman100 left a comment
Thank you @younesbelkada for iterating, LGTM! 🤗
Add BLIP2 Example
* fix ds issue
* more comments
What does this PR do?
BLIP-2 is a multi-modal model capable of image-captioning tasks. It is widely used for natural image captioning, but fine-tuning such a model remains a challenge due to its size, the largest variant being `blip2-flan-t5-xxl` (~24GB). Hence, we should leverage `peft` to offer users the possibility to fine-tune this model at low cost.

This PR adds BLIP2 support for `peft`, and also adds an example script.

cc @pacman100
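For readers, a minimal sketch of the approach this PR enables, not the example script itself; the checkpoint, target modules, and hyperparameter values here are illustrative assumptions.

```python
import torch
from transformers import AutoProcessor, Blip2ForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "Salesforce/blip2-opt-2.7b"  # assumed checkpoint; the xxl variant is ~24GB
processor = AutoProcessor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

# Wrap only the attention projections with LoRA adapters; everything else stays frozen.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
                         target_modules=["q_proj", "k_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the parameters is trainable
```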