LoRA fine-tuning GLM-4V-9B on four 24 GB 4090s still runs out of GPU memory #785

Description

@programmingsky

System Info

Python 3.12
CUDA 12.4
4 × NVIDIA GeForce RTX 4090 (24 GB each)

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts and tasks

Reproduction

```
[rank3]: OutOfMemoryError: CUDA out of memory. Tried to allocate 986.00 MiB. GPU 3 has a total capacity of 23.64 GiB of which 499.69 MiB is free. Including non-PyTorch memory, this process has 23.15 GiB
[rank3]: memory in use. Of the allocated memory 18.71 GiB is allocated by PyTorch, and 3.83 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting
[rank3]: PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
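
As the error message itself suggests, one mitigation for allocator fragmentation is setting `PYTORCH_CUDA_ALLOC_CONF` before CUDA is initialized. A minimal sketch of doing this from Python (exporting the variable in the launch shell before `torchrun`/`deepspeed` works the same way, since ranks inherit the environment):

```python
import os

# Must be set before torch initializes CUDA, i.e. before the first allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

x = torch.zeros(1, device="cuda")  # the allocator now uses expandable segments
```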
My YAML file is as follows:

```yaml
data_config:
  train_file: train.jsonl
  val_file: dev.jsonl
  test_file: test.jsonl
  num_proc: 1

combine: True
freezeV: True
max_input_length: 256
max_output_length: 64

swanlab: "local"  # set to local if you don't use the cloud

training_args:
  # see `transformers.Seq2SeqTrainingArguments`
  output_dir: ./output
  max_steps: 3000
  # needs to be tuned for the dataset
  learning_rate: 5e-4
  # settings for data loading
  per_device_train_batch_size: 1
  gradient_accumulation_steps: 16
  dataloader_num_workers: 1
  remove_unused_columns: false
  # settings for saving checkpoints
  save_strategy: steps
  save_steps: 500
  # settings for logging
  log_level: info
  logging_strategy: steps
  logging_steps: 10
  run_name: "glm4-lora-finetune"
  # settings for evaluation
  per_device_eval_batch_size: 4
  eval_strategy: steps
  eval_steps: 500
  # settings for optimizer
  adam_epsilon: 1e-6
  # uncomment the following line to detect nan or inf values
  # debug: underflow_overflow
  predict_with_generate: true
  # see `transformers.GenerationConfig`
  generation_config:
    max_new_tokens: 64
  # set your absolute deepspeed path here
  deepspeed: configs/ds_zero_3.json
  bf16: true
  # deepspeed: configs/ds_zero_2.json

peft_config:
  peft_type: LORA
  task_type: CAUSAL_LM
  r: 8
  lora_alpha: 32
  lora_dropout: 0.1
  target_modules: ["query_key_value"]
```
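
For reference, the `peft_config` block above corresponds to constructing the adapter directly with `peft`. A minimal sketch; the model id `THUDM/glm-4v-9b` and `trust_remote_code=True` are assumptions based on the public GLM-4V release, not taken from this issue:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],
)

# Assumed model id; GLM-4V ships custom modeling code, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the query_key_value LoRA adapters train
```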

Expected behavior

I hope the LoRA fine-tuning can run to completion successfully.
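
Since ZeRO-3 without offload still runs out of memory here, one knob that usually lowers per-GPU memory further is CPU offload of the optimizer state and parameters. A sketch of such a DeepSpeed config (my own assumption, not the contents of the repo's configs/ds_zero_3.json), expressed as the dict that `transformers` also accepts in place of a JSON path:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical ZeRO-3 + CPU offload config; trades GPU memory for host RAM and speed.
ds_zero3_offload = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = Seq2SeqTrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    deepspeed=ds_zero3_offload,  # a dict works here as well as a file path
)
```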
