When I fine-tune on my own dataset, grad_norm becomes extremely large, reaching values as high as 1e6. Has anyone else run into this?