
fix: provide explicit scheduler params for LinearWarmupCosineAnnealingLR in train.py#16

Open
kev-hanwen-yang wants to merge 1 commit into lucas-maes:main from kev-hanwen-yang:fix/scheduler-params


@kev-hanwen-yang


Hi! A small fix to the scheduler params.

Summary

The LinearWarmupCosineAnnealingLR scheduler config in train.py only specifies {"type": "LinearWarmupCosineAnnealingLR"} without the required warmup_steps and max_steps arguments.

This causes training to crash immediately for all tasks (not just PushT) with:

TypeError: LinearWarmupCosineAnnealingLR.__init__() missing 2 required positional arguments: 'warmup_steps' and 'max_steps'

Root Cause

stable_pretraining's create_scheduler has a fallback that attempts to auto-infer these parameters from module.trainer.estimated_stepping_batches.

However, at the time configure_optimizers is called by Lightning, the trainer has not yet estimated stepping batches (the dataloader iterator hasn't been created), so estimated_stepping_batches returns None.

The auto-inference code filters out None values, which means max_steps is silently dropped, and the scheduler constructor fails with the missing argument error.

Full traceback:

File ".../stable_pretraining/module.py", line 634, in configure_optimizers
    scheduler = create_scheduler(opt, sched_config, module=self)
File ".../stable_pretraining/optim/lr_scheduler.py", line 193, in create_scheduler
    return fn(optimizer, **params)
TypeError: LinearWarmupCosineAnnealingLR.__init__() missing 2 required positional arguments: 'warmup_steps' and 'max_steps'
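The None-filtering failure mode described above can be modeled in a few lines. This is a simplified, hypothetical stand-in, not stable_pretraining's code: infer_params and build_scheduler are illustrative names, the scheduler class only mimics the two required arguments from the traceback, and the 1% warmup default is an assumption.

```python
# Simplified, hypothetical model of the None-filtering fallback; it is not
# stable_pretraining's actual implementation.


class LinearWarmupCosineAnnealingLR:
    """Stand-in with the same two required arguments as in the traceback."""

    def __init__(self, optimizer, warmup_steps, max_steps, warmup_start_lr=0.0, eta_min=0.0):
        self.optimizer = optimizer
        self.warmup_steps = warmup_steps
        self.max_steps = max_steps


def infer_params(estimated_stepping_batches):
    total = estimated_stepping_batches
    candidates = {
        "max_steps": total,
        # Assumed 1% warmup, only computable when the estimate exists.
        "warmup_steps": None if total is None else max(1, total // 100),
    }
    # Filtering out None values silently drops both required arguments.
    return {key: value for key, value in candidates.items() if value is not None}


def build_scheduler(optimizer, estimated_stepping_batches):
    return LinearWarmupCosineAnnealingLR(optimizer, **infer_params(estimated_stepping_batches))
```

With an estimate of 10,000 steps the constructor receives both arguments; with None, infer_params returns an empty dict and the constructor raises the same kind of TypeError seen in the traceback.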

Changes

In train.py, compute max_steps and warmup_steps explicitly from the dataloader length and max_epochs, then pass them directly to the scheduler config:

  • max_steps = len(train_dataloader) * max_epochs (total training steps)
  • warmup_steps = 1% of max_steps (linear warmup phase)
  • Passes warmup_start_lr=0.0 and eta_min=0.0 explicitly for completeness
  • Changes the scheduler interval from "epoch" to "step" to match the step-level scheduling
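The computation in the list above can be sketched as a small helper. The function name explicit_scheduler_config and the nested dict layout (a "scheduler" entry plus an "interval" key) are hypothetical; the keys mirror the parameter names in this description, not necessarily the exact shape used in train.py. It also assumes one optimizer step per batch, i.e. no gradient accumulation; with accumulation, the number of optimizer steps per epoch would be smaller than len(train_dataloader).

```python
# Hypothetical helper mirroring the described change; the returned layout is
# an assumption, not stable_pretraining's documented config schema.


def explicit_scheduler_config(steps_per_epoch: int, max_epochs: int, warmup_fraction: float = 0.01) -> dict:
    if steps_per_epoch <= 0 or max_epochs <= 0:
        raise ValueError("steps_per_epoch and max_epochs must be positive")
    max_steps = steps_per_epoch * max_epochs         # total steps, one per batch
    warmup_steps = int(max_steps * warmup_fraction)  # 1% linear warmup by default
    return {
        "scheduler": {
            "type": "LinearWarmupCosineAnnealingLR",
            "warmup_steps": warmup_steps,
            "max_steps": max_steps,
            "warmup_start_lr": 0.0,
            "eta_min": 0.0,
        },
        "interval": "step",  # step the scheduler every batch, not every epoch
    }
```

For example, 250 batches per epoch over 40 epochs gives max_steps=10000 and warmup_steps=100. Note that int() truncates, so runs shorter than 100 steps would get a warmup of 0.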

This removes the dependency on the auto-inference path entirely, making it robust across stable_pretraining versions.

Testing

  • Verified that python train.py data=pusht launches training successfully and progresses through epochs (tested on macOS with MPS backend, stable_pretraining==0.1.6, stable_worldmodel==0.0.6)
  • The scheduler initializes without error and learning rate warmup + cosine annealing behaves as expected
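To check the "warmup + cosine annealing behaves as expected" claim numerically, the standard linear-warmup-then-cosine schedule can be written out directly. This is a reference sketch of the usual formula, not stable_pretraining's implementation; its off-by-one conventions (for example, whether warmup ends exactly at step warmup_steps) may differ.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, max_steps, warmup_start_lr=0.0, eta_min=0.0):
    """Learning rate at `step` under linear warmup followed by cosine annealing."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from warmup_start_lr up to base_lr
        return warmup_start_lr + (base_lr - warmup_start_lr) * step / warmup_steps
    # Cosine decay from base_lr (at warmup_steps) down to eta_min (at max_steps)
    decay_steps = max(1, max_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * progress))
```

With base_lr=1e-3, warmup_steps=100 and max_steps=10000, the rate starts at 0, reaches 1e-3 at step 100, is about 5e-4 halfway through the decay (step 5050), and returns to eta_min at step 10000.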

The scheduler config only specified {"type": "LinearWarmupCosineAnnealingLR"}
without warmup_steps or max_steps. The auto-inference in
stable_pretraining relies on trainer.estimated_stepping_batches, which
is not yet available when configure_optimizers runs, causing a TypeError.

Compute max_steps and warmup_steps explicitly from the dataloader length
and max_epochs, and change the scheduler interval from "epoch" to "step"
to match the step-level scheduling.

Made-with: Cursor
@lucas-maes

Owner

Hi! Thank you for the PR! I think the problem might come from your side, as we don't see any issue with the code. Do you have the latest version of stable-pretraining?

@kev-hanwen-yang

Author

Thanks for looking into this! I went ahead and verified from a completely clean setup:

# Fresh clone of the original repo
git clone https://github.com/lucas-maes/le-wm.git
cd le-wm

# Fresh venv + install
uv venv --python=3.10
source .venv/bin/activate
uv pip install stable-worldmodel[train,env]

# Confirm latest versions
python -c "import importlib.metadata; print(importlib.metadata.version('stable-pretraining'))"
# → 0.1.6 (latest on PyPI)

# Run training
python train.py data=pusht

This crashes with:
TypeError: LinearWarmupCosineAnnealingLR.__init__() missing 2 required positional arguments: 'warmup_steps' and 'max_steps'

Full traceback:

File ".../stable_pretraining/module.py", line 634, in configure_optimizers
    scheduler = create_scheduler(opt, sched_config, module=self)
File ".../stable_pretraining/optim/lr_scheduler.py", line 193, in create_scheduler
    return fn(optimizer, **params)
TypeError: LinearWarmupCosineAnnealingLR.__init__() missing 2 required positional arguments: 'warmup_steps' and 'max_steps'

The issue is that stable_pretraining's create_scheduler enters the dict path and pops "type", leaving params = {}, so it falls into the auto-inference fallback. The _build_default_params factory then accesses trainer.estimated_stepping_batches, but at configure_optimizers time this either returns None or raises. Either way, the try/except at lines 187-190 silently swallows the error and leaves params = {}, so the constructor is called without warmup_steps or max_steps and fails.
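The control flow described above (pop "type", fall back to inferred defaults, swallow any error) can be sketched as follows. Every name here (TrainerBeforeFit, build_default_params, create_scheduler_sketch) is a hypothetical placeholder rather than stable_pretraining's real API, and the sketch assumes the fallback runs only when nothing beyond "type" was supplied. It also shows why passing explicit values sidesteps the problem.

```python
# Hypothetical, simplified control flow; these names are placeholders,
# not stable_pretraining's real code.


class TrainerBeforeFit:
    """Stand-in trainer whose step estimate is not available yet."""

    estimated_stepping_batches = None


class LinearWarmupCosineAnnealingLR:
    """Stand-in with the two required arguments from the traceback."""

    def __init__(self, optimizer, warmup_steps, max_steps, warmup_start_lr=0.0, eta_min=0.0):
        self.optimizer = optimizer
        self.warmup_steps = warmup_steps
        self.max_steps = max_steps


SCHEDULERS = {"LinearWarmupCosineAnnealingLR": LinearWarmupCosineAnnealingLR}


def build_default_params(trainer):
    total = trainer.estimated_stepping_batches
    # With total=None, `total // 100` raises TypeError.
    return {"warmup_steps": max(1, total // 100), "max_steps": total}


def create_scheduler_sketch(optimizer, config, trainer):
    params = dict(config)                # copy so the caller's dict is not mutated
    fn = SCHEDULERS[params.pop("type")]  # dict path: resolve the class by name
    if not params:                       # assumed: fallback only when nothing else was given
        try:
            params = build_default_params(trainer)
        except Exception:
            params = {}                  # error swallowed, params stays empty
    return fn(optimizer, **params)       # TypeError if required args are missing
```

Calling it with only {"type": "LinearWarmupCosineAnnealingLR"} reproduces the missing-arguments TypeError, while a config that already contains warmup_steps and max_steps never reaches the fallback.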

This is reproducible on a clean clone with stable-pretraining==0.1.6. Could you let me know which version/setup you're testing with? Happy to adjust the fix if needed.
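To compare setups, the installed versions of the relevant packages can be printed with a small helper built on importlib.metadata, extending the one-liner used earlier. Which distributions matter is an assumption here: Lightning may be installed as either "lightning" or "pytorch-lightning", so both are listed and a missing one is reported as not installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed set of relevant distributions; adjust as needed.
DEFAULT_PACKAGES = ("stable-pretraining", "stable-worldmodel", "lightning", "pytorch-lightning", "torch")


def report_versions(packages=DEFAULT_PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```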
