
Narrow Precision Training

Distributed & Parallel

  • Megatron, Transformed! A Hands-on Megatron-LM Tutorial on Replicating Empirical Trends in Distributed Training and Model Parallelism.
  • A Quick Visual Rundown of MLPerf Training v5.1, covering only the new Llama3.1-8B and Flux.1 benchmarks.

Model Optimization for Efficient Inference

Perhaps useful: dlbp, dockerhub, HuggingFace

Pinned

  1. fp4-training (Python)

     mxfp8/nvfp4 training - from concept to implementation (cuBLASLt + Microxcaling).

  2. megatron-tutorials (Shell)

     Hands-on Megatron-LM tutorials on ablating parallelism and scaling trends. DP → ZeRO → TP → SP → CP → PP → VPP → EP

  3. nemo-perf-nvfp4 (Dockerfile)

     Local NVFP4 Integration and Benchmark

  4. mlperf-t5.1-rundown

     A Quick Rundown of MLPerf Training v5.1 on the New Llama3.1-8B and Flux.1 Models

  5. TransformerEngine (Python, forked from NVIDIA/TransformerEngine)

     A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

  6. faster-qat (Dockerfile)

     Revisiting QAT: QAT vs. native NVFP4/MXFP8 fine-tuning.