This repository contains an early release of the code and deployment instructions for the paper LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization.
- September 2025: The first PyPI package is released. Integrate LoSiA into your code with a single-line function call!
- August 2025: LoSiA is accepted to EMNLP 2025 Main Conference!
- June 2025: The first release of the open-source code.
LoSiA (Low-Resources Subnet Integration Adaptation) is a novel Parameter-Efficient Fine-Tuning (PEFT) framework that dynamically identifies and optimizes critical sub-networks within LLMs, enabling parameter-efficient full-rank fine-tuning with low latency.
The identification of core sub-networks consists mainly of:
- Calculation of parameter importance scores via sensitivity-based metrics
- Usage of greedy algorithms to select optimal input/output neuron subsets
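The two steps above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the score |w * g| is one common sensitivity-based metric (an assumption here), and the row-then-column selection is a simple greedy heuristic chosen for clarity. The function names `importance_scores` and `greedy_subnet` are hypothetical.

```python
def importance_scores(weight, grad):
    """Sensitivity-style score |w * g| per parameter (an assumed, common choice)."""
    return [[abs(w * g) for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weight, grad)]


def greedy_subnet(scores, n_in, n_out):
    """Pick n_in input neurons (rows), then n_out output neurons (columns).

    Greedy heuristic: rows are ranked by their total score; columns are
    then ranked by the score they collect inside the chosen rows only.
    """
    row_totals = [sum(row) for row in scores]
    rows = sorted(range(len(scores)), key=lambda i: row_totals[i], reverse=True)[:n_in]
    n_cols = len(scores[0])
    col_totals = [sum(scores[i][j] for i in rows) for j in range(n_cols)]
    cols = sorted(range(n_cols), key=lambda j: col_totals[j], reverse=True)[:n_out]
    return sorted(rows), sorted(cols)


# Toy 3x4 layer: weights and their gradients from one backward pass.
weight = [[0.5, -1.0, 0.1, 0.0],
          [2.0, 0.3, -0.2, 1.0],
          [0.0, 0.1, 0.1, -0.1]]
grad = [[0.2, 0.1, 0.0, 0.3],
        [0.5, -1.0, 0.1, 0.2],
        [0.1, 0.0, 0.2, 0.1]]

scores = importance_scores(weight, grad)
rows, cols = greedy_subnet(scores, n_in=2, n_out=2)
print(rows, cols)
```

Only the weights at the intersection of the selected rows and columns would then be treated as the trainable core subnet.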
We design a novel mechanism to organize the optimizations of multiple layers. Specifically, LoSiA:
- Fine-tunes only the identified core subnets instead of full layers
- Implements asynchronous periodic re-localization to adapt to dynamic training patterns
- Applies learning rate rewarming during subnet updates for more stable training
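The scheduling ideas in the list above can be pictured with a small sketch. It assumes that layers are staggered evenly across one period and that rewarming is a linear ramp; both are illustrative assumptions, and the helper names `is_relocalization_step` and `rewarm_factor` are hypothetical rather than part of the LoSiA code.

```python
def layer_offset(layer_idx, num_layers, period):
    """Stagger layers evenly across one period (assumed staggering rule)."""
    return (layer_idx * period) // num_layers


def is_relocalization_step(step, layer_idx, num_layers, period):
    """True when this layer re-selects its subnet at the given step."""
    offset = layer_offset(layer_idx, num_layers, period)
    return step > 0 and (step - offset) % period == 0


def rewarm_factor(step, layer_idx, num_layers, period, rewarm_steps):
    """Learning-rate multiplier in (0, 1] that ramps up after each re-localization."""
    offset = layer_offset(layer_idx, num_layers, period)
    first = offset if offset > 0 else period
    if step < first:
        return 1.0  # no re-localization has happened yet for this layer
    since = (step - offset) % period  # steps since the last re-localization
    return min(1.0, (since + 1) / rewarm_steps)


# With 4 layers and a period of 100, layer 1 re-localizes at steps 25, 125, 225, ...
steps = [s for s in range(300) if is_relocalization_step(s, 1, 4, 100)]
print(steps)
print(rewarm_factor(25, 1, 4, 100, rewarm_steps=10))
```

Because each layer has its own offset, only a few layers change their subnet at any given step, and each one briefly lowers its own learning rate right after the switch.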
LoSiA-Pro is an equivalent but refined implementation of LoSiA that boosts training efficiency and lowers GPU memory consumption. It includes:
- Reduction of activation storage by only saving subnet activations
- Replacement of full gradient computation with low-rank matrix multiplication in back-propagation
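To see why storing only subnet activations is sufficient, consider a linear layer Y = XW, whose weight gradient is X^T (dL/dY). The sketch below checks, on toy data in plain Python, that the gradient of the selected sub-matrix can be computed from the selected activation columns alone. It is an illustration of the underlying identity, not the LoSiA-Pro kernel; the helper names are hypothetical.

```python
def matmul_t(a, b):
    """Return a^T @ b for nested lists a (n x p) and b (n x q)."""
    p, q = len(a[0]), len(b[0])
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(q)]
            for i in range(p)]


def take_columns(m, idx):
    """Keep only the columns listed in idx."""
    return [[row[j] for j in idx] for row in m]


# Batch of 2 tokens, layer with 3 inputs and 4 outputs: Y = X @ W.
x = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 4.0]]
grad_y = [[0.5, 0.0, 1.0, -1.0],
          [2.0, 1.0, 0.0, 3.0]]

in_idx, out_idx = [0, 2], [1, 3]  # selected input / output neurons

# Full gradient dL/dW = X^T @ dL/dY, then slice out the subnet.
full = matmul_t(x, grad_y)
full_sliced = [[full[i][j] for j in out_idx] for i in in_idx]

# Subnet-only gradient: needs just the selected activation columns.
x_sub = take_columns(x, in_idx)
sub = matmul_t(x_sub, take_columns(grad_y, out_idx))

print(sub == full_sliced)
```

The subnet path multiplies smaller matrices and never touches the unselected columns of X, which is where the activation-memory saving comes from.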
Here is the pseudo code of LoSiA:
To develop within a local directory, set up the training environment with the following commands:
conda create -n losia python=3.8
conda activate losia
cd LoSiA
pip install -r requirements.txt
pip install flash_attn
We have also released a PyPI package that can be installed via the following command:
pip install losia
If the g++ toolchain is absent from the environment (usually reported by ninja):
conda install -c conda-forge gxx_linux-64
conda install -c conda-forge libxcrypt
The experiment scripts are tested on Python 3.8 with PyTorch 2.4.1+cu121 and CUDA 12.4.
To adopt LoSiA as the optimizer, use attach_losia in optimizer.py (losia.optimizer.attach_losia for the PyPI package). This function creates layer-wise parameter groups and LoSiA optimizers, and registers a post-backward hook for per-layer weight updates. An example of its usage is shown below:
import os
import torch
from transformers import AutoConfig, AutoModelForCausalLM
from losia.optimizer import attach_losia
# from optimizer import attach_losia
model_path = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2").cuda()
model_config = AutoConfig.from_pretrained(model_path)
model.train()
for n, p in model.named_parameters():
    p.requires_grad = True
attach_losia(
    model, model_config,
    num_training_steps = 10000,
    lr = 3e-5,
    rank_factor = 1.0/8.0,
    period = 100
)

A concise implementation of a model-training pipeline using LoSiA is provided in example.py.
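For a rough sense of what rank_factor = 1/8 means in terms of trainable weights, the sketch below counts subnet entries for two LLaMA-2 7B layer shapes (hidden size 4096, MLP intermediate size 11008). It assumes the core subnet keeps a rank_factor fraction of both the input and the output neurons of a layer; the helper `subnet_size` is hypothetical and is not part of the losia package.

```python
def subnet_size(d_in, d_out, rank_factor):
    """Trainable entries if the subnet keeps rank_factor of each dimension (assumption)."""
    n_in = int(d_in * rank_factor)
    n_out = int(d_out * rank_factor)
    return n_in * n_out


# LLaMA-2 7B shapes: hidden size 4096, MLP intermediate size 11008.
layers = {"attention projection": (4096, 4096), "MLP projection": (4096, 11008)}
for name, (d_in, d_out) in layers.items():
    kept = subnet_size(d_in, d_out, rank_factor=0.125)
    total = d_in * d_out
    print(f"{name}: {kept} of {total} weights ({kept / total:.4f})")
```

Under this assumption, a rank factor p leaves roughly p squared of each layer trainable, so 1/8 corresponds to about 1/64 of the weights.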
Note that attach_losia automatically registers a backward hook; consequently, the weight update and learning-rate step are performed inside loss.backward(), so the training loop needs neither an explicit scheduler step nor an explicit optimizer step.
After LoSiA is attached to the backbone model, the learning rate scheduler and parameter gradients are managed automatically by the LoSiA optimizers in post-backward hooks. Calling backward() once per iteration is enough for training:
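for epoch in range(args.epochs):
    for batch_idx, batch in enumerate(dataloader):
        # labels: target token ids for this batch, prepared by your data pipeline
        loss = model(**batch, labels=labels).loss
        # LoSiA updates the weights and steps the scheduler inside this call
        loss.backward()

For more details, please refer to torchrun_main.py and optimizer.py.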
In this repository, we take LLaMA-2 7B fine-tuning as an example. You can download the backbone model from meta-llama/Llama-2-7b-hf. The training scripts are located in the /scripts folder. Run them with the following commands (taking common-sense task training as an example):
cd scripts
bash datasets_download.sh
bash run_losia.sh

This script runs training on eight common-sense reasoning tasks. For example, train social_i_qa with the following command:
# This takes about 19GB of GPU memory.
# --rank_factor: rank factor p, controlling the scale of the core subnet
# --period: time slot T for reselection
# Optional flags (append them to the command to enable):
#   --activation_checkpointing   enable gradient checkpointing
#   --use_pro                    train with LoSiA-Pro
torchrun --standalone --nproc_per_node 1 $(dirname "$0")/../torchrun_main.py \
    --model_path meta-llama/Llama-2-7b-hf \
    --dataset_name siqa \
    --dataset_path allenai/social_i_qa \
    --save_dir LLaMA-2-7B-SIQA \
    --lr 5e-5 \
    --batch_size 16 \
    --rank_factor 0.125 \
    --period 50 \
    --max_length 256 \
    --epochs 3 \
    --pad_to_max_len \
    --warmup_steps_ratio 0.1 \
    --grad_clipping 1.0 \
    --dtype bfloat16 \
    --single_gpu \
    --scheduler cosine_restarts \
    --optimizer losia_adamw_per_layer

LoSiA is developed based on the training framework of GaLore. We use EleutherAI/lm-evaluation-harness for evaluation; please follow the instructions of that project for deployment.
If you find this work helpful, we would greatly appreciate a citation.
@misc{wang2025losiaefficienthighrankfinetuning,
      title={LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization},
      author={Xujia Wang and Yunjia Qi and Bin Xu},
      year={2025},
      eprint={2507.04487},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.04487},
}