gpt padding.py exercises forward() on a small subset of the dataset. The shape of the output still needs to be checked.
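One way to look at the output shape is to compare it against the shape of the padded input batch. The sketch below is only an illustration and does not use the real model: `toy_forward`, `pad_batch`, `shape_of`, `PAD_ID`, the sample token ids, and `vocab_size` are all hypothetical stand-ins. It also assumes that forward() returns one score vector per token position, i.e. an output of shape (batch, seq_len, vocab_size), which the note above does not confirm.

```python
import random

PAD_ID = 0  # assumed padding token id (hypothetical)


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token-id lists to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


def toy_forward(batch, vocab_size):
    """Stand-in for the model's forward(): one score vector per token position."""
    rng = random.Random(0)
    return [[[rng.random() for _ in range(vocab_size)] for _ in row] for row in batch]


def shape_of(nested):
    """Return the shape of a rectangular, non-empty nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# A small subset of made-up tokenized examples with different lengths.
subset = [[5, 8, 2], [7, 1], [3, 9, 4, 6, 2]]
vocab_size = 11  # hypothetical vocabulary size

batch = pad_batch(subset)
logits = toy_forward(batch, vocab_size)

# Expected: (number of examples, longest sequence length, vocabulary size).
expected = (len(subset), max(len(s) for s in subset), vocab_size)
assert shape_of(batch) == expected[:2]
assert shape_of(logits) == expected
print(shape_of(logits))  # (3, 5, 11)
```

With a real model the same comparison would apply to the tensor returned by forward(): the first two dimensions should match the padded batch, and the last one depends on what the model actually emits.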