Description
System Info / 系統信息
py3.12
cuda12.4
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
- The official example scripts / 官方的示例脚本
- My own modified scripts / 我自己修改的脚本和任务
Reproduction / 复现过程
[rank3]: OutOfMemoryError: CUDA out of memory. Tried to allocate 986.00 MiB. GPU 3 has a total capacity of 23.64 GiB of which 499.69 MiB is free. Including non-PyTorch memory, this process has 23.15 GiB
[rank3]: memory in use. Of the allocated memory 18.71 GiB is allocated by PyTorch, and 3.83 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting
[rank3]: PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
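Following the suggestion in the error message itself, one thing to try is enabling the expandable-segments allocator before launching training to reduce fragmentation. A minimal sketch (the commented launch command is illustrative — adjust the script name, GPU count, and arguments to your actual setup):

```shell
# Enable PyTorch's expandable-segments allocator to mitigate fragmentation,
# as suggested by the OOM message.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Illustrative launch command (assumed entry point and paths):
# torchrun --nproc_per_node=4 finetune.py data/ THUDM/glm-4-9b-chat configs/lora.yaml
```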
My YAML file is as follows:
data_config:
  train_file: train.jsonl
  val_file: dev.jsonl
  test_file: test.jsonl
  num_proc: 1
combine: True
freezeV: True
max_input_length: 256
max_output_length: 64
swanlab: "local"  # set to local if you don't use the cloud
training_args:
  # see `transformers.Seq2SeqTrainingArguments`
  output_dir: ./output
  max_steps: 3000
  # needs to be tuned for the dataset
  learning_rate: 5e-4
  # settings for data loading
  per_device_train_batch_size: 1
  gradient_accumulation_steps: 16
  dataloader_num_workers: 1
  remove_unused_columns: false
  # settings for saving checkpoints
  save_strategy: steps
  save_steps: 500
  # settings for logging
  log_level: info
  logging_strategy: steps
  logging_steps: 10
  run_name: "glm4-lora-finetune"
  # settings for evaluation
  per_device_eval_batch_size: 4
  eval_strategy: steps
  eval_steps: 500
  # settings for optimizer
  adam_epsilon: 1e-6
  # uncomment the following line to detect nan or inf values
  # debug: underflow_overflow
  predict_with_generate: true
  # see `transformers.GenerationConfig`
  generation_config:
    max_new_tokens: 64
  # set your absolute deepspeed path here
  deepspeed: configs/ds_zero_3.json
  bf16: true
  # deepspeed: configs/ds_zero_2.json
peft_config:
  peft_type: LORA
  task_type: CAUSAL_LM
  r: 8
  lora_alpha: 32
  lora_dropout: 0.1
  target_modules: ["query_key_value"]
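For reference, the batch-size settings above combine with the number of GPUs to give the effective global batch size. This is a plain arithmetic sketch, not project code; the GPU count of 4 is an assumption based on the `[rank3]` prefix in the traceback:

```python
# Effective global batch size under the YAML above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
num_gpus = 4  # assumption: the traceback mentions rank 3, so at least 4 processes

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_gpus)
print(effective_batch_size)  # 64
```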
Expected behavior / 期待表现
I hope to be able to run LoRA fine-tuning successfully.