Hello, thanks for releasing the code!
Would it be possible to also release the exact hyperparameters you used to obtain the results in Table 2 (the C4 dataset experiments)?
I'm especially interested in the optimal learning rate you found for each model/method configuration through the tuning described in Appendix C.1:
"For all methods on each size of models (from 60M to 1B), we tune their favorite learning rate from a set of {0.01, 0.005, 0.001, 0.0005, 0.0001}, and the best learning rate is chosen based on the validation perplexity."
Releasing those hyperparameters would be a great help, as I'm trying to replicate the results of your paper.
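For reference, the selection rule quoted above amounts to a per-configuration grid search. You train each (model size, method) pair once per learning rate in the grid and keep the rate with the lowest validation perplexity. Below is a minimal Python sketch of that loop. It assumes a hypothetical `eval_val_ppl` callback that trains one run and returns its validation perplexity. None of the function names, and none of the method labels in the demo, come from the repository.

```python
import math
from typing import Callable, Dict, Iterable, Tuple

# Learning-rate grid quoted from Appendix C.1.
LR_GRID = (0.01, 0.005, 0.001, 0.0005, 0.0001)


def perplexity(mean_nll: float) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, natural log)."""
    return math.exp(mean_nll)


def tune_learning_rates(
    model_sizes: Iterable[str],
    methods: Iterable[str],
    eval_val_ppl: Callable[[str, str, float], float],
    lr_grid: Iterable[float] = LR_GRID,
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(size, method): (best_lr, best_val_ppl)}.

    eval_val_ppl is a hypothetical callback: it trains one run with the
    given size/method/lr and returns validation perplexity (lower is better).
    """
    lr_grid = tuple(lr_grid)
    methods = tuple(methods)  # materialize so it can be reused for every size
    best: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for size in model_sizes:
        for method in methods:
            # One run per learning rate in the grid.
            scores = {lr: eval_val_ppl(size, method, lr) for lr in lr_grid}
            best_lr = min(scores, key=scores.get)
            best[(size, method)] = (best_lr, scores[best_lr])
    return best


if __name__ == "__main__":
    # Toy stand-in for real training runs (NOT real results): pretend
    # validation perplexity is lowest at lr = 0.001.
    def toy_eval(size: str, method: str, lr: float) -> float:
        return 20.0 + abs(math.log10(lr) + 3.0)

    for key, (lr, ppl) in tune_learning_rates(["60M", "1B"], ["method-A"], toy_eval).items():
        print(key, lr, round(ppl, 3))
```

In practice each callback invocation is a full training run, so this grid costs five runs per (model size, method) configuration. Knowing the chosen learning rates would let a replication skip the sweep entirely.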
Wenhua-Hu