Strange model behavior when taking the softmax in the wrong dimension

https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/model.py#L64

I accidentally changed the softmax dimension from -1 to -2 and got extremely low losses on both the training and validation sets of the [tiny_shakespeare](https://www.tensorflow.org/datasets/catalog/tiny_shakespeare) dataset. However, when generating from the model, I get very low-quality results. What explains this?

My guess is that I'm somehow leaking information when taking the softmax over the wrong dimension, which may explain why the training loss is very low. However, I don't quite get why the validation loss would also be low.
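One way to test the leakage guess is a small, dependency-free sketch of masked attention (plain Python rather than the actual minGPT code; the names `causal_attention`, `over_keys`, and `first_row_shift` are made up for illustration). It assumes the usual layout where rows of the score matrix index queries and columns index keys, with the causal mask applied before the softmax. Normalizing each column instead of each row keeps the masked entries at zero, but the denominator for an early position then sums over later queries, so changing a later token can change an earlier output.

```python
import math
import random

def softmax(xs):
    """Softmax of a list that may contain -inf (masked) entries."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v, over_keys=True):
    """Single-head causal attention on lists of vectors.

    over_keys=True  normalizes each row (like softmax with dim=-1).
    over_keys=False normalizes each column (like softmax with dim=-2).
    """
    T, d = len(q), len(q[0])
    # scores[i][j]: query i against key j; future keys (j > i) are masked.
    scores = [[sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
               if j <= i else -math.inf
               for j in range(T)] for i in range(T)]
    if over_keys:
        att = [softmax(row) for row in scores]
    else:
        cols = [softmax([scores[i][j] for i in range(T)]) for j in range(T)]
        att = [[cols[j][i] for j in range(T)] for i in range(T)]
    return [[sum(att[i][j] * v[j][c] for j in range(T)) for c in range(d)]
            for i in range(T)]

def random_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
T, d = 4, 3
q, k, v = random_matrix(T, d), random_matrix(T, d), random_matrix(T, d)

# Perturb only the query of the LAST position, i.e. a "future" token.
q_future = [row[:] for row in q]
q_future[-1] = [x + 1.0 for x in q_future[-1]]

def first_row_shift(over_keys):
    """Largest change in the output at position 0 after the perturbation."""
    a = causal_attention(q, k, v, over_keys)[0]
    b = causal_attention(q_future, k, v, over_keys)[0]
    return max(abs(x - y) for x, y in zip(a, b))

print("softmax over keys (dim=-1):   ", first_row_shift(True))
print("softmax over queries (dim=-2):", first_row_shift(False))
```

If this sketch reflects what happens in the real model, it would also be consistent with the low validation loss: validation is computed with the full sequence in a single forward pass, so the same future positions are available there, whereas during generation they do not exist yet.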

![image](https://github.com/karpathy/minGPT/assets/41118094/907e6c53-8fef-4d13-bd8f-f865f1160635)

@karpathy  Any idea why this is the case?
