2 changes: 2 additions & 0 deletions docs/source/Instruction/命令行参数.md
@@ -673,6 +673,8 @@ App arguments inherit from [deployment arguments](#部署参数), [Web-UI arguments](#Web-UI参数)
## Specific Model Arguments
Specific model arguments can be set via `--model_kwargs` or environment variables, for example: `--model_kwargs '{"fps_max_frames": 12}'` or `FPS_MAX_FRAMES=12`.

The meanings of the parameters listed below can be found in the corresponding model's official repository or in its inference code.

### qwen2_vl, qvq, qwen2_5_vl, mimo_vl, keye_vl
The parameter meanings are the same as in the `qwen_vl_utils` or `qwen_omni_utils` library; see [here](https://github.com/QwenLM/Qwen2.5-VL/blob/main/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L24).

2 changes: 2 additions & 0 deletions docs/source_en/Instruction/Command-line-parameters.md
@@ -692,6 +692,8 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge argum

Specific model arguments can be set using `--model_kwargs` or environment variables, for example: `--model_kwargs '{"fps_max_frames": 12}'` or `FPS_MAX_FRAMES=12`.

The definitions of the parameters listed below can be found in each model’s official repository or in its inference code.

### qwen2_vl, qvq, qwen2_5_vl, mimo_vl, keye_vl
The parameter meanings are the same as in the `qwen_vl_utils` or `qwen_omni_utils` library; see [here](https://github.com/QwenLM/Qwen2.5-VL/blob/main/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L24).
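
As an aside (not part of the diff), a minimal sketch of the two equivalent ways of passing such an argument on the command line, following the `fps_max_frames` example above; the model ID and dataset are placeholders, not prescribed by this PR:

# Hypothetical sketch: pass the model-specific argument via an environment variable.
FPS_MAX_FRAMES=12 \
swift sft \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#2000' \
    --train_type lora

# The same setting passed as JSON via --model_kwargs.
swift sft \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#2000' \
    --train_type lora \
    --model_kwargs '{"fps_max_frames": 12}'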

28 changes: 28 additions & 0 deletions examples/models/deepseek_vl2/train.sh
@@ -0,0 +1,28 @@
# 9GiB
pip install "transformers==4.41.*" "peft==0.11.*"
# pip uninstall autoawq

CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model deepseek-ai/deepseek-vl2-tiny \
--dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#20000' \
--split_dataset_ratio 0.01 \
--train_type lora \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--freeze_vit true \
--gradient_accumulation_steps 16 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 4096 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4
27 changes: 27 additions & 0 deletions examples/models/internvl3/train.sh
@@ -0,0 +1,27 @@
# 24GiB
pip install "transformers==4.51.*"

CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model OpenGVLab/InternVL3-8B \
--dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#20000' \
--split_dataset_ratio 0.01 \
--train_type lora \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--freeze_vit true \
--gradient_accumulation_steps 16 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 4096 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4
1 change: 0 additions & 1 deletion examples/models/minicpmv/train.sh
@@ -1,6 +1,5 @@
# 10.5GiB
CUDA_VISIBLE_DEVICES=0 \
MAX_PIXELS=1003520 \
swift sft \
--model OpenBMB/MiniCPM-V-4 \
--dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#20000' \
28 changes: 28 additions & 0 deletions examples/models/ovis2/train.sh
@@ -0,0 +1,28 @@
# 28GiB

pip install "transformers==4.51.*"

CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model AIDC-AI/Ovis2-8B \
--dataset 'modelscope/coco_2014_caption:validation#20000' \
--split_dataset_ratio 0.01 \
--train_type lora \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--freeze_vit true \
--gradient_accumulation_steps 16 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 4096 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4
2 changes: 1 addition & 1 deletion examples/train/multimodal/ocr.sh
@@ -20,7 +20,7 @@ swift sft \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
-    --max_length 2048 \
+    --max_length 4096 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4
7 changes: 4 additions & 3 deletions swift/trainers/mixin.py
@@ -32,8 +32,8 @@
from transformers.integrations import is_deepspeed_zero3_enabled
from transformers.modeling_utils import unwrap_model
from transformers.trainer import (OPTIMIZER_NAME, PREFIX_CHECKPOINT_DIR, SCHEDULER_NAME, TRAINER_STATE_NAME,
-                                  DeepSpeedSchedulerWrapper, ParallelMode, TrainerCallback, reissue_pt_warnings)
-from transformers.trainer_utils import EvalPrediction, IntervalStrategy, SaveStrategy
+                                  ParallelMode, TrainerCallback, reissue_pt_warnings)
+from transformers.trainer_utils import EvalPrediction, IntervalStrategy
from transformers.utils import is_torch_npu_available

from swift.hub import get_hub
@@ -459,7 +459,8 @@ def _save_checkpoint(self, *args, **kwargs):
        return result

def _save_flash_checkpoint(self, model, trial, metrics=None):

        from transformers.trainer import DeepSpeedSchedulerWrapper
        from transformers.trainer_utils import SaveStrategy
Comment on lines +462 to +463
critical

The local imports for `DeepSpeedSchedulerWrapper` and `SaveStrategy` will fail with newer versions of transformers (e.g., 4.51.* as used in the new shell scripts), as they have been removed. This will cause a runtime `ImportError` when `use_flash_ckpt` is enabled.

To ensure compatibility across transformers versions, these imports should be handled with `try...except` blocks. Additionally, the usage of `DeepSpeedSchedulerWrapper` needs to be guarded.

Here is the suggested change for the imports:

Suggested change
-        from transformers.trainer import DeepSpeedSchedulerWrapper
-        from transformers.trainer_utils import SaveStrategy
+        try:
+            from transformers.trainer import DeepSpeedSchedulerWrapper
+        except ImportError:
+            DeepSpeedSchedulerWrapper = None  # For transformers>=4.43
+        try:
+            from transformers.trainer_utils import SaveStrategy
+        except ImportError:
+            from transformers.trainer_utils import IntervalStrategy as SaveStrategy  # For transformers>=4.42

And at line 498, the check should be updated to:

is_deepspeed_custom_scheduler = (
    self.is_deepspeed_enabled
    and DeepSpeedSchedulerWrapper is not None
    and not isinstance(self.lr_scheduler, DeepSpeedSchedulerWrapper)
)

        from dlrover.trainer.torch.flash_checkpoint.hf_trainer import HfDdpCheckpointer, HfDeepSpeedCheckpointer
        run_dir = self._get_output_dir(trial=trial)
