Hi, thank you for sharing this great work! I’m curious about the pretraining process of this model. Could you please share some details on the amount of compute used (number and type of GPUs) and the total training time for pretraining? This information would be very helpful for understanding the scale of the training. Thanks in advance!