@jonb377 jonb377 commented Aug 10, 2023

Initializing tensors directly on the XLA device appears to affect the steady-state HLO and increase memory usage. This change initializes them on CPU first, then moves them to the XLA device.
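A minimal sketch of the CPU-first pattern described above, not the actual diff from this PR. The helper names `pick_device` and `build_on_cpu_then_move` and the small `nn.Linear` model are hypothetical, chosen only for illustration. The sketch falls back to CPU when `torch_xla` is not importable so it can run anywhere.

```python
import torch
import torch.nn as nn


def pick_device() -> torch.device:
    """Return the XLA device if available, else CPU (fallback for illustration only)."""
    try:
        import torch_xla.core.xla_model as xm
        return xm.xla_device()
    except (ImportError, RuntimeError):
        return torch.device("cpu")


def build_on_cpu_then_move(in_features: int, out_features: int,
                           device: torch.device) -> nn.Module:
    """Hypothetical helper: initialize parameters on CPU, then transfer them."""
    # Step 1: construct and initialize on CPU, so the initialization ops
    # run eagerly on the host instead of being issued against the XLA device.
    model = nn.Linear(in_features, out_features, device="cpu")
    # Step 2: move the already-initialized tensors to the target device.
    return model.to(device)


device = pick_device()
model = build_on_cpu_then_move(16, 4, device)
```

The contrasting pattern, which this PR moves away from, would be to pass the XLA device directly at construction time, for example `nn.Linear(16, 4, device=device)`.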

@jonb377 jonb377 requested a review from alanwaketan August 10, 2023 00:56
@jonb377 jonb377 self-assigned this Aug 10, 2023
@jonb377 jonb377 requested a review from JackCaoG August 10, 2023 00:57
Collaborator

@alanwaketan alanwaketan left a comment

LGTM.

Author

jonb377 commented Aug 10, 2023

Verified with a 70B run; we see improvements in both memory utilization and MFU.

@jonb377 jonb377 merged commit e169167 into llama2-google-next-training Aug 10, 2023
@jonb377 jonb377 deleted the jonbolin-cpu-init branch August 10, 2023 02:22
3 participants