Table 2 hyperparameters #68

Hello, thanks for releasing the code! 

Would it be possible for you to also release the exact hyperparameters you used to obtain the results in Table 2 (the C4 dataset experiments)?

I'm especially interested in the optimal learning rates you found for each model/method configuration through the tuning described in Appendix C.1:

"For all methods on each size of models (from 60M to 1B), we tune their favorite learning rate from a set of {0.01, 0.005, 0.001, 0.0005, 0.0001}, and the best learning rate is chosen based on the validation perplexity." 
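To make sure I'm reading this correctly, here is a minimal sketch of the selection procedure as I understand it. It assumes one full training run per (model size, method, learning rate) triple, and that the learning rate with the lowest validation perplexity wins. `pick_best_lr`, `tune_all`, and the `run` callable are hypothetical names of my own, not functions from this repository.

```python
from typing import Callable, Dict, Iterable, Tuple

# Learning-rate grid quoted from Appendix C.1.
LR_GRID: Tuple[float, ...] = (0.01, 0.005, 0.001, 0.0005, 0.0001)


def pick_best_lr(
    val_perplexity: Callable[[float], float],
    grid: Iterable[float] = LR_GRID,
) -> Tuple[float, Dict[float, float]]:
    """Score every learning rate in `grid` and return the one with the
    lowest validation perplexity, plus all scores for reporting."""
    scores = {lr: val_perplexity(lr) for lr in grid}
    best_lr = min(scores, key=scores.get)  # lower perplexity is better
    return best_lr, scores


def tune_all(
    configs: Iterable[Tuple[str, str]],
    run: Callable[[str, str, float], float],
) -> Dict[Tuple[str, str], float]:
    """Hypothetical driver: `run(model_size, method, lr)` is assumed to train
    one model and return its validation perplexity. Returns the chosen
    learning rate for each (model_size, method) configuration."""
    best: Dict[Tuple[str, str], float] = {}
    for model_size, method in configs:
        # The lambda is evaluated immediately inside pick_best_lr,
        # so it always sees the current loop values.
        chosen, _ = pick_best_lr(lambda x: run(model_size, method, x))
        best[(model_size, method)] = chosen
    return best
```

The per-configuration output of `tune_all` (one learning rate for each model size and method) is exactly the table I'm hoping you could share.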

Releasing those hyperparameters would be a great help, as I'm trying to replicate the results of your paper.
