
Conversation

Contributor

@pacman100 commented Jun 28, 2023

What does this PR do?

  1. For many tasks, users leverage AutoModel to get embeddings and then post-process them. One of the most popular use cases is semantic similarity via bi-encoder models.
  2. This PR enables that workflow in PEFT (see the sketch after this list).
  3. Fixes How to use PEFT to wrap an encoder? #348
  4. Adds an example of how to use it.
  5. Also shows how to use PEFT with custom models on top of Transformers.
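
For illustration, a minimal sketch of the intended usage; the base model name and LoRA hyperparameters below are placeholders, not values taken from this PR:

```python
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Placeholder encoder; any model loadable via AutoModel should work the same way.
base_model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# LoRA config using the feature-extraction task type added in this PR.
peft_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()

# The wrapped model still returns the usual encoder outputs, e.g. last_hidden_state.
inputs = tokenizer("a query about running shoes", return_tensors="pt")
embeddings = model(**inputs).last_hidden_state
```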

ToDo:

  • Add tests
  • Add docs
  • Add a complete example

@review-notebook-app

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

@HuggingFaceDocBuilderDev commented Jun 28, 2023

The documentation is not available anymore as the PR was closed or merged.

Member

@BenjaminBossan left a comment

I'm still new to a lot of this, so I can't give a full review.

Here are some observations:

  • I couldn't discern how the notebook peft_prompt_tuning_seq2seq_with_generate.ipynb relates to embeddings. Could this be elaborated? Also, it contains a big error message in the first cell.
  • The notebook peft_lora_embedding_semantic_similarity.ipynb also contains an error message at the end. If it takes too long to run, could the number of epochs or the dataset size be reduced? Or at least the KeyboardInterrupt could be caught gracefully to avoid showing the error (a sketch of this is shown after this list).
  • In general, at least for me personally, it would be helpful if the notebooks could add a few sentences to explain what is going on and what the use case is.
  • There are no tests for the new class. Are we okay with that? Will it be added later?
  • Similarly, there is also no documentation for the new class (doc entry, docstring), even though this seems to be a big addition.
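
A minimal, self-contained sketch of catching the interrupt gracefully; the loop and helper below are hypothetical stand-ins for the notebook's actual training code:

```python
num_epochs = 3  # illustrative value

def train_one_epoch() -> None:
    # Hypothetical stand-in for the notebook's real training step.
    pass

try:
    for epoch in range(num_epochs):
        train_one_epoch()
except KeyboardInterrupt:
    # Exit cleanly so the saved notebook does not end with a traceback.
    print("Training interrupted; continuing with the partially trained model.")
```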

Contributor Author

@pacman100 commented Jun 28, 2023

Hello @BenjaminBossan, the example uses the PEFT Embedding task to get embeddings for queries and products, and then computes the cosine similarity between them to determine whether the query and product are similar/relevant.

Also, the first cell doesn't have an error; it is just using the dataset from cache.

Yes, will be adding tests and docs.

Also working on a full example for this.
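
A minimal sketch of that similarity step, reusing the model and tokenizer from the sketch in the PR description above; the mean pooling and the example strings are illustrative, not the exact code from the notebook:

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden_state, attention_mask):
    # Average the token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

query_inputs = tokenizer("running shoes for flat feet", return_tensors="pt")
product_inputs = tokenizer("stability running shoe with arch support", return_tensors="pt")

with torch.no_grad():
    query_emb = mean_pool(model(**query_inputs).last_hidden_state, query_inputs["attention_mask"])
    product_emb = mean_pool(model(**product_inputs).last_hidden_state, product_inputs["attention_mask"])

# A cosine similarity close to 1 suggests the query and product are relevant to each other.
similarity = F.cosine_similarity(query_emb, product_emb)
print(similarity.item())
```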

@BenjaminBossan
Member

the example uses the PEFT Embedding task to get embeddings for queries and products, and then computes the cosine similarity between them to determine whether the query and product are similar/relevant.

Also, the first cell doesn't have an error; it is just using the dataset from cache.

You mean in peft_lora_embedding_semantic_similarity.ipynb? Yes, but my comment referred to peft_prompt_tuning_seq2seq_with_generate.ipynb, which had a bunch of changes in this PR, and I was not sure how those relate to the PR.

Yes, will be adding tests and docs.

Also working on a full example for this.

Fantastic. I'll do another review once they are added.

@pacman100
Contributor Author

but my comment referred to peft_prompt_tuning_seq2seq_with_generate.ipynb, which had a bunch of changes in this PR, and I was not sure how those relate to the PR.

Those were due to `make style` and `make quality`.

@BenjaminBossan
Member

Those were due to `make style` and `make quality`.

Ah I see, so just disregard my comments about that notebook.

@pacman100 changed the title from "add support for getting embeddings using PEFT" to "add support for Feature Extraction using PEFT" on Jul 1, 2023
Member

@stevhliu left a comment

Thanks for adding a guide for such a cool use-case, this is great! 👏

Remember to add the docs to the toctree to properly build it!

pacman100 and others added 6 commits July 13, 2023 11:24
@pacman100
Contributor Author

Remember to add the docs to the toctree to properly build it!

Done

Contributor

@younesbelkada left a comment

Very nice addition, I learned a lot! I left 2 questions!

Member

@BenjaminBossan left a comment

I focused on the documentation for this review; I will review the code once the tests pass.

Great docs overall, thanks for putting so much work into this. I have a few minor comments. Please take a look.


## Setup

Start by installing 🤗 PEFT from [source](https://moon-ci-docs.huggingface.co/docs/peft/pr_647/en/install#source), and then navigate to the directory containing the training scripts for fine-tuning DreamBooth with LoRA:
Member

Should this be linking to moon-ci? Also, is it necessary to install from source? We plan to have the release very soon.

pacman100 and others added 5 commits July 13, 2023 15:08
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Contributor

@younesbelkada left a comment

Still looking great, thanks!

Member

@BenjaminBossan left a comment

Overall, this looks good to me, nice work.

I have one concern left, however. According to the coverage report, the following lines are not covered by tests:

peft/src/peft/peft_model.py

Lines 1614 to 1644 in e2f7ff8

batch_size = input_ids.shape[0]
if attention_mask is not None:
    # concat prompt attention mask
    prefix_attention_mask = torch.ones(batch_size, peft_config.num_virtual_tokens).to(attention_mask.device)
    attention_mask = torch.cat((prefix_attention_mask, attention_mask), dim=1)
if kwargs.get("position_ids", None) is not None:
    warnings.warn("Position ids are not supported for parameter efficient tuning. Ignoring position ids.")
    kwargs["position_ids"] = None
if kwargs.get("token_type_ids", None) is not None:
    warnings.warn("Token type ids are not supported for parameter efficient tuning. Ignoring token type ids")
    kwargs["token_type_ids"] = None
kwargs.update(
    {
        "attention_mask": attention_mask,
        "output_attentions": output_attentions,
        "output_hidden_states": output_hidden_states,
        "return_dict": return_dict,
    }
)
if peft_config.peft_type == PeftType.PREFIX_TUNING:
    past_key_values = self.get_prompt(batch_size)
    return self.base_model(input_ids=input_ids, past_key_values=past_key_values, **kwargs)
else:
    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    prompts = self.get_prompt(batch_size=batch_size)
    prompts = prompts.to(inputs_embeds.dtype)
    inputs_embeds = torch.cat((prompts, inputs_embeds), dim=1)
    return self.base_model(inputs_embeds=inputs_embeds, **kwargs)

I double-checked locally and indeed I never hit those lines. On inspection, only LoRA and IA³ are being tested. So should the tests be extended to include prompt learning, or is there something else that could be done here?
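
For reference, a minimal sketch of a forward pass that should exercise that prompt-learning branch; the model name and config values are illustrative, not taken from the PR's tests:

```python
from transformers import AutoModel, AutoTokenizer
from peft import PromptTuningConfig, TaskType, get_peft_model

base_model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A prompt-learning config routes through the virtual-token branch quoted above.
peft_config = PromptTuningConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    num_virtual_tokens=8,
)
model = get_peft_model(base_model, peft_config)

inputs = tokenizer("a test sentence", return_tensors="pt")
outputs = model(**inputs)
# The virtual tokens are prepended, so the sequence length grows by num_virtual_tokens.
print(outputs.last_hidden_state.shape)
```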

@pacman100
Contributor Author

I have one concern left, however. According to the coverage report, the following lines are not covered by tests:

The latest commit should fix this.

@pacman100 requested a review from BenjaminBossan on July 13, 2023 12:29
Member

@BenjaminBossan left a comment

Thanks for the fixes, this LGTM now.

@pacman100 merged commit 92d38b5 into main on Jul 13, 2023
@pacman100 deleted the smangrul/add-emb-task branch on July 13, 2023 12:43
@AngledLuffa

This is great! Thanks for the support and for knocking off that old issue of mine.

Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
* add support for embedding with peft

* add example and resolve code quality issues

* update notebook example post fixing the loss

* adding full example with inference notebook

* quality ✨

* add tests, docs, guide and rename task_type to be inline with Hub

* fixes

* fixes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update peft_model.py

* fixes

* final fixes

* Update _toctree.yml

* fixes and make style and make quality

* deberta exception with checkpointing

* Update docs/source/task_guides/semantic-similarity-lora.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update docs/source/task_guides/semantic-similarity-lora.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* resolve comments

* testing prompt learning methods

* Update testing_common.py

* fix the tests

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>