Description
I am using my own dataloader, which produces data on the fly. The problem is that after the first epoch, only about 1/8 of the dataloader (4 of 32 batches) is used in each epoch:
```
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:46<00:00, 0.69it/s, v_num=0]
Epoch 0, global step 32: 'train/nll' reached 9.21892 (best 9.21892), saving model to '/home/akasaei/shayan/other/duo/outputs/openwebtext/2025.09.15/150638/checkpoints/best.ckpt' as top 1
Epoch 1: 12%|███████████▌ | 4/32 [00:14<01:44, 0.27it/s, v_num=0]
Epoch 1, global step 36: 'train/nll' reached 9.15495 (best 9.15495), saving model to '/home/akasaei/shayan/other/duo/outputs/openwebtext/2025.09.15/150638/checkpoints/best.ckpt' as top 1
Epoch 2: 12%|███████████▌ | 4/32 [00:12<01:29, 0.31it/s, v_num=0]
Epoch 2, global step 40: 'train/nll' reached 9.13006 (best 9.13006), saving model to '/home/akasaei/shayan/other/duo/outputs/openwebtext/2025.09.15/150638/checkpoints/best.ckpt' as top 1
Epoch 3: 12%|███████████▌ | 4/32 [00:13<01:32, 0.30it/s, v_num=0]
Epoch 3, global step 44: 'train/nll' reached 9.10235 (best 9.10235), saving model to '/home/akasaei/shayan/other/duo/outputs/openwebtext/2025.09.15/150638/checkpoints/best.ckpt' as top 1
Epoch 4: 12%|███████████▌ | 4/32 [00:12<01:30, 0.31it/s, v_num=0]
Epoch 4, global step 48: 'train/nll' reached 9.07143 (best 9.07143), saving model to '/home/akasaei/shayan/other/duo/outputs/openwebtext/2025.09.15/150638/checkpoints/best.ckpt' as top 1
```
The only change I made to the config file was setting `limit_val_batches: 0`. I also tried other dataset sizes, but still only 1/8 of the data was used per epoch. I am using the AR parameterization, and I previously tested my dataloader with smiles-mdlm without running into this issue. Any idea why this happens?
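One way to narrow this down is to check, outside the trainer, whether the dataloader itself yields the same number of batches every epoch. The sketch below assumes only that the loader is re-iterable (as a `torch.utils.data.DataLoader` is); the helper names `count_batches_per_epoch` and `check_loader` are hypothetical, not part of the repo.

```python
from typing import Iterable, List


def count_batches_per_epoch(loader: Iterable, num_epochs: int = 3) -> List[int]:
    """Iterate `loader` once per simulated epoch and record how many batches it yields.

    Works for any re-iterable object, e.g. a torch.utils.data.DataLoader.
    A healthy loader should yield the same count every epoch.
    """
    counts = []
    for _ in range(num_epochs):
        n = 0
        # A `for` loop calls iter(loader) again, so a re-iterable loader starts fresh.
        for _batch in loader:
            n += 1
        counts.append(n)
    return counts


def check_loader(loader, num_epochs: int = 3) -> None:
    """Print per-epoch batch counts and flag epochs shorter than expected."""
    counts = count_batches_per_epoch(loader, num_epochs)
    # Use len(loader) when available, otherwise compare against the first epoch.
    expected = len(loader) if hasattr(loader, "__len__") else counts[0]
    for epoch, n in enumerate(counts):
        status = "ok" if n == expected else "SHORT"
        print(f"epoch {epoch}: {n}/{expected} batches [{status}]")
```

Usage would be something like `check_loader(train_loader)`, where `train_loader` stands in for your own loader. If it reports 32/32 every epoch on its own, the truncation is more likely introduced by the training setup (for example Trainer settings such as `limit_train_batches`, or a stateful sampler wrapped around the loader) than by the data generation itself.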