fix: provide explicit scheduler params for LinearWarmupCosineAnnealingLR in train.py #16
The scheduler config only specified {"type": "LinearWarmupCosineAnnealingLR"}
without warmup_steps or max_steps. The auto-inference in
stable_pretraining relies on trainer.estimated_stepping_batches, which
is not yet available when configure_optimizers runs, causing a TypeError.
Compute max_steps and warmup_steps explicitly from the dataloader length
and max_epochs, and change the scheduler interval from "epoch" to "step"
to match the step-level scheduling.
Made-with: Cursor
Hi! Thank you for the PR! I think the problem might come from your side, as we don't see any issue with the code. Do you have the latest version of stable-pretraining?
Thanks for looking into this! I went ahead and verified from a completely clean setup:
# Fresh clone of the original repo
git clone https://github.com/lucas-maes/le-wm.git
cd le-wm
# Fresh venv + install
uv venv --python=3.10
source .venv/bin/activate
uv pip install stable-worldmodel[train,env]
# Confirm latest versions
python -c "import importlib.metadata; print(importlib.metadata.version('stable-pretraining'))"
# → 0.1.6 (latest on PyPI)
# Run training
python train.py data=pusht
This crashes with:
Full traceback:
The issue is that stable_pretraining's
This is reproducible on a clean clone with stable-pretraining==0.1.6. Could you let me know which version/setup you're testing with? Happy to adjust the fix if needed.
Hi! A small fix to the scheduler params.
Summary
The LinearWarmupCosineAnnealingLR scheduler config in train.py only specifies {"type": "LinearWarmupCosineAnnealingLR"} without the required warmup_steps and max_steps arguments.
This causes training to crash immediately for all tasks (not just PushT) with:
TypeError: LinearWarmupCosineAnnealingLR.__init__() missing 2 required positional arguments: 'warmup_steps' and 'max_steps'
Root Cause
stable_pretraining's create_scheduler has a fallback that attempts to auto-infer these parameters from module.trainer.estimated_stepping_batches.
However, at the time configure_optimizers is called by Lightning, the trainer has not yet estimated stepping batches (the dataloader iterator hasn't been created), so estimated_stepping_batches returns None.
The auto-inference code filters out None values, which means max_steps is silently dropped, and the scheduler constructor fails with the missing argument error.
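The failure mode described above can be reproduced in isolation. The sketch below is a minimal illustration and not the actual stable_pretraining code: DummyScheduler, infer_kwargs, and build are hypothetical stand-ins, and the way warmup_steps is derived from the estimate is an assumption made only so that both arguments go missing, as in the reported error.

```python
class DummyScheduler:
    """Hypothetical stand-in for a scheduler with two required arguments."""

    def __init__(self, optimizer, warmup_steps, max_steps):
        self.optimizer = optimizer
        self.warmup_steps = warmup_steps
        self.max_steps = max_steps


def infer_kwargs(estimated_steps):
    """Hypothetical auto-inference: derive kwargs, then drop None values."""
    inferred = {
        "max_steps": estimated_steps,
        # Assumption: warmup is derived from the same estimate (1% here).
        "warmup_steps": None if estimated_steps is None else estimated_steps // 100,
    }
    # Filtering out None silently removes the required arguments.
    return {key: value for key, value in inferred.items() if value is not None}


def build(estimated_steps):
    """Construct the scheduler from whatever the inference produced."""
    return DummyScheduler(optimizer=None, **infer_kwargs(estimated_steps))


# With a usable estimate, construction succeeds.
scheduler = build(1000)

# With no estimate available, both kwargs are dropped and __init__ raises.
try:
    build(None)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

The point of the sketch is that the None filter turns "value not available yet" into "argument not passed", so the error surfaces in the constructor rather than at the place where the estimate was missing.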
Full traceback:
Changes
In train.py, compute max_steps and warmup_steps explicitly from the dataloader length and max_epochs, then pass them directly to the scheduler config:
- max_steps = len(train_dataloader) * max_epochs (total training steps)
- warmup_steps = 1% of max_steps (linear warmup phase)
- warmup_start_lr=0.0 and eta_min=0.0 for completeness
- scheduler interval changed from "epoch" to "step" to match the step-level scheduling
This removes the dependency on the auto-inference path entirely, making it robust across stable_pretraining versions.
Testing
python train.py data=pusht launches training successfully and progresses through epochs (tested on macOS with MPS backend, stable_pretraining==0.1.6, stable_worldmodel==0.0.6)
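The step computation described under Changes could be sketched roughly as follows. This is an illustration rather than the diff itself: the helper names, the nesting of the returned dict, and the floor of one warmup step are assumptions, and gradient accumulation is ignored (one optimizer step per batch).

```python
def compute_schedule_steps(batches_per_epoch, max_epochs, warmup_percent=1):
    """Return (warmup_steps, max_steps) for a step-level schedule.

    Assumes one optimizer step per batch (no gradient accumulation).
    """
    max_steps = batches_per_epoch * max_epochs
    # Integer arithmetic avoids float rounding; the floor of 1 is an assumption
    # added here so very short runs still get a warmup step.
    warmup_steps = max(1, max_steps * warmup_percent // 100)
    return warmup_steps, max_steps


def build_scheduler_config(batches_per_epoch, max_epochs):
    """Build an explicit scheduler config (the dict layout is hypothetical)."""
    warmup_steps, max_steps = compute_schedule_steps(batches_per_epoch, max_epochs)
    return {
        "scheduler": {
            "type": "LinearWarmupCosineAnnealingLR",
            "warmup_steps": warmup_steps,
            "max_steps": max_steps,
            "warmup_start_lr": 0.0,
            "eta_min": 0.0,
        },
        # Step-level counts require stepping the scheduler every batch.
        "interval": "step",
    }


# Example: 200 batches per epoch (e.g. len(train_dataloader)) for 10 epochs.
config = build_scheduler_config(batches_per_epoch=200, max_epochs=10)
```

Because both counts are plain integers computed before the trainer starts, nothing here depends on trainer.estimated_stepping_batches being available when configure_optimizers runs.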