Issues: NVIDIA/Megatron-LM
[QUESTION] Is there any restriction on using allgather with moe_expert_capacity_factor? #1277, opened Nov 7, 2024 by Louis-J
[BUG] TP-comm-overlap bug when replacing TELayerNormColumnParallelLinear with TEColumnParallelLinear. #1275, opened Nov 6, 2024 by wplf
[BUG] The cached_loss_mask may be modified unexpectedly in GPTDataset? #1269, opened Nov 1, 2024 by shmily326 (a repo-independent sketch of this failure mode follows the list below)
[QUESTION] How to use loader_mcore and why it requires torch distributed #1266, opened Oct 29, 2024 by KookHoiKim
[ENHANCEMENT] Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining #1263, opened Oct 28, 2024 by dhia680
[QUESTION] NVIDIA Megatron Core 0.9.0 does not have shared_experts.py #1257, opened Oct 25, 2024 by clarence-lee-sheng
[ENHANCEMENT] Add layer name in a layer to improve code debugging #1198, opened Oct 4, 2024 by rybakov
[BUG] "ValueError: optimizer got an empty parameter list" under pipeline parallel
#1166
opened Oct 2, 2024 by
takuya576
[QUESTION] Why are not all SMs active when NCCL kernel and compute kernel overlap? #1161, opened Sep 27, 2024 by yu-depend
[QUESTION] Do we really need to call np.arange every time we restart the task? #1159, opened Sep 26, 2024 by zyksir
[BUG] Some checkpoint shards don't save / hang on multi-node setups, since v0.7 #1154, opened Sep 23, 2024 by chotzen
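
For issue #1269 above, the failure mode the title points at, a cached array being mutated in place by the code that consumes it, can be shown with a short, repo-independent NumPy sketch. The TinyDataset class below is hypothetical and is not Megatron-LM's actual GPTDataset; it only illustrates why returning a cached buffer without copying lets callers corrupt it.

```python
import numpy as np

class TinyDataset:
    """Hypothetical dataset that caches a loss mask shared across samples."""

    def __init__(self, length: int):
        # Built once and reused for every sample.
        self.cached_loss_mask = np.ones(length, dtype=np.float32)

    def get_sample(self, copy: bool) -> np.ndarray:
        # Without a copy, the caller receives the cached buffer itself, so any
        # in-place edit (e.g. zeroing padded positions) also hits the cache.
        return self.cached_loss_mask.copy() if copy else self.cached_loss_mask


ds = TinyDataset(4)

mask = ds.get_sample(copy=False)
mask[2:] = 0.0                 # caller masks out the last two positions
print(ds.cached_loss_mask)     # [1. 1. 0. 0.] -- cache modified unexpectedly

ds.cached_loss_mask[:] = 1.0   # reset the cache
mask = ds.get_sample(copy=True)
mask[2:] = 0.0
print(ds.cached_loss_mask)     # [1. 1. 1. 1.] -- cache untouched
```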
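
For issue #1166 above, the quoted ValueError is the standard message a PyTorch optimizer raises when it is constructed with an empty parameter list, which can happen under pipeline parallelism when a stage ends up owning no trainable parameters in a given group. A minimal reproduction in plain PyTorch (not Megatron-LM's optimizer setup), plus one common guard:

```python
import torch

# Stand-in for a pipeline stage that owns no trainable parameters
# (e.g. every parameter it holds is frozen).
stage = torch.nn.Linear(8, 8)
for p in stage.parameters():
    p.requires_grad = False

trainable = [p for p in stage.parameters() if p.requires_grad]

try:
    torch.optim.Adam(trainable)  # empty list -> the error quoted in the issue title
except ValueError as err:
    print(err)                   # optimizer got an empty parameter list

# A common guard: only build an optimizer when the stage has trainable parameters.
optimizer = torch.optim.Adam(trainable) if trainable else None
```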