LARF: Layer-Aware Representation Filtering

Overview

LARF identifies the safety-sensitive layers of an LLM and uses a bi-representation of the data at those layers to detect safety-degrading samples in a fine-tuning dataset.

Installation

Clone the repository:

git clone https://github.com/LLLeoLi/LARF.git
cd LARF

Install dependencies:

conda create --name LARF python=3.10
conda activate LARF
pip install -r requirements.txt

Usage

1. Identify the Safety-Sensitive Layers

Run the layer-scaling script:

bash scripts/scaling_llama.sh
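As a rough illustration of what a layer-scaling search can look like, the sketch below scales one layer at a time and ranks layers by how far the refusal rate moves from the unscaled baseline. This is an assumption inferred from the script name, not a description of what scripts/scaling_llama.sh does. The names rank_layers_by_sensitivity and toy_refusal_rate, the scale factors, and the toy numbers are all hypothetical.

```python
from typing import Callable, List, Sequence


def rank_layers_by_sensitivity(
    num_layers: int,
    refusal_rate_when_scaled: Callable[[int, float], float],
    baseline: float,
    scales: Sequence[float] = (0.8, 1.2),
) -> List[int]:
    """Return layer indices ordered from most to least safety-sensitive.

    refusal_rate_when_scaled(layer, scale) is a hypothetical callback that
    reports the refusal rate after multiplying that layer's weights by scale.
    """
    sensitivity = {}
    for layer in range(num_layers):
        # Largest absolute shift in refusal rate over the tested scale factors.
        sensitivity[layer] = max(
            abs(refusal_rate_when_scaled(layer, s) - baseline) for s in scales
        )
    return sorted(sensitivity, key=sensitivity.get, reverse=True)


def toy_refusal_rate(layer: int, scale: float) -> float:
    # Toy stand-in for a real evaluation: only layer 10 affects refusals.
    return 0.9 + (0.5 * (scale - 1.0) if layer == 10 else 0.0)


ranking = rank_layers_by_sensitivity(32, toy_refusal_rate, baseline=0.9)
most_sensitive = ranking[0]
```

In a real run the callback would scale the chosen layer in the actual model and measure refusals on a set of harmful prompts. The top-ranked layers are then presumably the ones to pass to the filtering step.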

2. Filter Your Dataset with LARF

Run the filtering script, passing the layer range with --layer_num_start and --layer_num_end:

python get_bi_rep_llama.py --layer_num_start 10 --layer_num_end 11
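To make the filtering idea concrete, here is a minimal sketch of one way representation-based filtering can work. Each sample's hidden representation is compared with two reference vectors, one averaged from safe examples and one from unsafe examples, and the samples that lean most toward the unsafe side are dropped. Reading "bi-representation" as this two-reference comparison is an assumption, and the scoring rule, function names, and toy vectors are hypothetical. It does not describe the internals of get_bi_rep_llama.py.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def degradation_score(rep: Vector, safe_ref: Vector, unsafe_ref: Vector) -> float:
    """Higher means the sample sits closer to the unsafe reference."""
    return cosine(rep, unsafe_ref) - cosine(rep, safe_ref)


def filter_samples(
    reps: Sequence[Vector], safe_ref: Vector, unsafe_ref: Vector, drop_fraction: float
) -> List[int]:
    """Return indices of samples kept after dropping the highest-scoring ones."""
    scores = [degradation_score(r, safe_ref, unsafe_ref) for r in reps]
    num_drop = int(len(reps) * drop_fraction)
    ranked = sorted(range(len(reps)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:num_drop])
    return [i for i in range(len(reps)) if i not in dropped]


# Toy 2-D "representations"; real ones would come from the selected layers.
safe_ref = mean_vector([[1.0, 0.0], [0.9, 0.1]])
unsafe_ref = mean_vector([[0.0, 1.0], [0.1, 0.9]])
reps = [[1.0, 0.1], [0.1, 1.0], [0.8, 0.3], [0.2, 0.9]]
kept = filter_samples(reps, safe_ref, unsafe_ref, drop_fraction=0.5)
```

Dropping a fixed fraction keeps the sketch simple. A score threshold would work equally well if the scores are calibrated.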

3. Fine-tune with LoRA

Train the model with custom datasets:

bash scripts/train_multipule_llama.sh

4. Safety Evaluation

Evaluate model outputs for safety:

bash scripts/llama_guard.sh
python llama_guard.py
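Llama Guard replies with "safe" or "unsafe" on the first line of its output, and for unsafe cases it lists the violated categories on the following line. The sketch below shows one way to turn a list of such raw verdicts into an unsafe rate. How llama_guard.py stores its results is not stated here, so the list-of-strings input and the helper names is_unsafe and unsafe_rate are assumptions.

```python
from typing import Sequence


def is_unsafe(verdict: str) -> bool:
    """True when the first non-empty line of a Llama Guard reply is 'unsafe'."""
    lines = verdict.strip().splitlines()
    first = lines[0].strip().lower() if lines else ""
    return first == "unsafe"


def unsafe_rate(verdicts: Sequence[str]) -> float:
    """Fraction of replies judged unsafe; 0.0 for an empty list."""
    if not verdicts:
        return 0.0
    return sum(is_unsafe(v) for v in verdicts) / len(verdicts)


# Hypothetical raw replies; the category code after "unsafe" is ignored.
example_verdicts = ["safe", "unsafe\nS1", "safe", " unsafe\nS9"]
rate = unsafe_rate(example_verdicts)
```

Comparing this rate between a model fine-tuned on the original dataset and one fine-tuned on the filtered dataset gives a simple view of how much the filtering helped.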

5. Analysis

Open the Jupyter notebook for analysis:

jupyter notebook analysis.ipynb
