Issues: NVIDIA/Megatron-LM
[QUESTION] Is there any restriction on using allgather with moe_expert_capacity_factor? #1277, opened Nov 7, 2024 by Louis-J
[BUG] TP-comm-overlap bug when replacing TELayerNormColumnParallelLinear with TEColumnParallelLinear. #1275, opened Nov 6, 2024 by wplf
[BUG] The cached_loss_mask may be modified unexpectedly in GPTDataset? #1269, opened Nov 1, 2024 by shmily326 (a repo-independent sketch of this failure mode follows the list below)
[QUESTION] How to use loader_mcore and why it requires torch distributed #1266, opened Oct 29, 2024 by KookHoiKim
[ENHANCEMENT] Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining #1263, opened Oct 28, 2024 by dhia680
[QUESTION] NVIDIA Megatron Core 0.9.0 does not have shared_experts.py #1257, opened Oct 25, 2024 by clarence-lee-sheng
[ENHANCEMENT] Add layer name in a layer to improve code debugging #1198, opened Oct 4, 2024 by rybakov
[BUG] "ValueError: optimizer got an empty parameter list" under pipeline parallel
#1166
opened Oct 2, 2024 by
takuya576
[QUESTION] Why are not all SMs active when NCCL kernel and compute kernel overlap? #1161, opened Sep 27, 2024 by yu-depend
[QUESTION] Do we really need to call np.arange every time we restart the task? #1159, opened Sep 26, 2024 by zyksir
[BUG] Some checkpoint shards don't save / hang on multi-node setups, since v0.7 #1154, opened Sep 23, 2024 by chotzen
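
For issue #1269 above, the failure mode the title points at, a cached array being mutated in place by the code that consumes it, can be shown with a short, repo-independent NumPy sketch. The TinyDataset class below is hypothetical and is not Megatron-LM's actual GPTDataset; it only illustrates why returning a cached buffer without copying lets callers corrupt it.

```python
import numpy as np

class TinyDataset:
    """Hypothetical dataset that caches a loss mask shared across samples."""

    def __init__(self, length: int):
        # Built once and reused for every sample.
        self.cached_loss_mask = np.ones(length, dtype=np.float32)

    def get_sample(self, copy: bool) -> np.ndarray:
        # Without a copy, the caller receives the cached buffer itself, so any
        # in-place edit (e.g. zeroing padded positions) also hits the cache.
        return self.cached_loss_mask.copy() if copy else self.cached_loss_mask


ds = TinyDataset(4)

mask = ds.get_sample(copy=False)
mask[2:] = 0.0                 # caller masks out the last two positions
print(ds.cached_loss_mask)     # [1. 1. 0. 0.] -- cache modified unexpectedly

ds.cached_loss_mask[:] = 1.0   # reset the cache
mask = ds.get_sample(copy=True)
mask[2:] = 0.0
print(ds.cached_loss_mask)     # [1. 1. 1. 1.] -- cache untouched
```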
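
For issue #1166 above, the quoted ValueError is the standard message a PyTorch optimizer raises when it is constructed with an empty parameter list, which can happen under pipeline parallelism when a stage ends up owning no trainable parameters in a given group. A minimal reproduction in plain PyTorch (not Megatron-LM's optimizer setup), plus one common guard:

```python
import torch

# Stand-in for a pipeline stage that owns no trainable parameters
# (e.g. every parameter it holds is frozen).
stage = torch.nn.Linear(8, 8)
for p in stage.parameters():
    p.requires_grad = False

trainable = [p for p in stage.parameters() if p.requires_grad]

try:
    torch.optim.Adam(trainable)  # empty list -> the error quoted in the issue title
except ValueError as err:
    print(err)                   # optimizer got an empty parameter list

# A common guard: only build an optimizer when the stage has trainable parameters.
optimizer = torch.optim.Adam(trainable) if trainable else None
```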