martin-kukla

Pinned

  1. tritex Public

    Pre-training LLMs in Triton from first principles. It replicates GPT2 (1.6B) with 57.5% MFU on an A100 SXM.

    Python 9

  2. pre-tjax Public

    Transformers written from first principles in JAX/Torch.Func/Triton; a comparison of their training efficiency on 1 GPU

    Python 2

  3. rm-for-rank-torchtune Public

    TorchTune recipes for ranking using RM: ORPO recipe (single GPU + DDP) + DDP for DPO (to avoid an existing bug in FSDP) + ranking evals

    Python 3

  4. distributed-llm-code-samples Public

    Code samples showing how to distribute LLM training across GPUs/nodes

    Python