|
att = F.softmax(att, dim=-1) |
I accidentally changed the softmax dimension to -2 instead of -1 and got incredibly low losses on both the training and validation set when using the tiny_shakespeare dataset. However, when generating from the model, I get very low-quality result. What is the explanation ?
My guess is that I'm somehow leaking information when taking the softmax in the wrong dimension, which may explain why the training loss is very low. However, I don't quite get why validation loss would also be low.

@karpathy Any idea why this is the case?
minGPT/mingpt/model.py
Line 64 in 37baab7
I accidentally changed the softmax dimension to -2 instead of -1 and got incredibly low losses on both the training and validation set when using the tiny_shakespeare dataset. However, when generating from the model, I get very low-quality result. What is the explanation ?
My guess is that I'm somehow leaking information when taking the softmax in the wrong dimension, which may explain why the training loss is very low. However, I don't quite get why validation loss would also be low.
@karpathy Any idea why this is the case?