RoPE scaling configuration not applied when using mcore_adapter for training #9589

@sijyy

Description

System Info

When training a model with the MCore Adapter, the RoPE scaling configuration does not take effect correctly. This affects both the rope_scaling settings in the model's config.json and those provided via ModelArguments.

As a result, the RoPE scaling parameters are never passed to Megatron-LM, so training runs with unscaled rotary position embeddings, which can lead to incorrect attention behavior at long context lengths.
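To illustrate why a dropped setting matters, here is a minimal sketch of linear RoPE scaling (position interpolation) in plain Python. It is not the mcore_adapter or Megatron-LM implementation. The function name `rope_angles`, the tiny `head_dim`, and the lengths `original_max_len` and `target_len` are hypothetical values chosen only for illustration.

```python
def rope_angles(position, head_dim=8, base=10000.0, scaling_factor=1.0):
    """Rotary angles for one token position.

    Linear scaling divides the position index by scaling_factor before
    the angles are computed; scaling_factor=1.0 means no scaling.
    """
    scaled_pos = position / scaling_factor
    return [scaled_pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


# Hypothetical lengths for illustration, not read from any real model config.
original_max_len = 32768
target_len = 65536
factor = target_len / original_max_len

last = target_len - 1
unscaled = rope_angles(last)                        # what runs if scaling is dropped
scaled = rope_angles(last, scaling_factor=factor)   # what linear scaling should give

print("unscaled first angle:", unscaled[0])
print("scaled first angle:  ", scaled[0])
```

With scaling applied, the effective position of the last token stays below `original_max_len`. Without it, the model sees positions well beyond the range it was trained on, which is the situation this issue describes.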

Reproduction

# Model loading path (Python):
from mcore_adapter.models import AutoModel
model = AutoModel.from_pretrained(model_args.model_name_or_path, training_args)

# Launch command (shell):
export DISTRIBUTED_ARGS="
    --nproc_per_node 8 \
    --nnodes 4 \
    --node_rank $node_rank \
    --master_addr $master_addr \
    --master_port $master_port
"

USE_MCA=1 torchrun $DISTRIBUTED_ARGS src/train.py \
    --model_name_or_path Qwen3-32B \
    --do_train \
    --stage sft \
    --finetuning_type full \
    --dataset <any long context dataset> \
    --preprocessing_num_workers 8 \
    --cutoff_len 65536 \
    --rope_scaling linear \
    --template qwen3 \
    --output_dir saves/mca/qwen3_32b_65536_scaling \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --num_train_epochs 2 \
    --learning_rate 3e-6 \
    --logging_steps 1 \
    --max_steps 3 \
    --save_steps 100 \
    --lr_scheduler_type cosine \
    --bf16 \
    --tensor_model_parallel_size 8 \
    --sequence_parallel true \
    --pipeline_model_parallel_size 4 \
    --bias_activation_fusion true \
    --apply_rope_fusion true \
    --overlap_grad_reduce true \
    --use_distributed_optimizer true \
    --overlap_param_gather true \
    --recompute_granularity full
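When reproducing, it may help to first confirm what the checkpoint's config.json declares, so there is a known value to compare against whatever Megatron-LM ends up using. The helper below is a hypothetical sketch using only the standard library. `read_rope_scaling` is not part of mcore_adapter or LLaMA-Factory, and it assumes a Hugging Face style `config.json` in the model directory.

```python
import json
from pathlib import Path


def read_rope_scaling(model_dir):
    """Return the rope_scaling entry from <model_dir>/config.json, or None if absent."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # A missing key or an explicit null both come back as None.
    return config.get("rope_scaling")
```

Calling `read_rope_scaling` on the local checkpoint directory (for example, the path given to `--model_name_or_path`) returns the dictionary stored under `rope_scaling`, or `None` when the config does not set one.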

Others

Related Link: alibaba/ROLL#287

Metadata

Labels

bug (Something isn't working), pending (This problem is yet to be addressed)
