- Mila, Université de Montréal
- Montreal, QC, Canada
- UTC -05:00
- https://hiroki11x.github.io/
- @_hiroki11x
- in/hiroki11x
Stars
A collection of optimization problems in mathematics
A theory of optimal learning rate schedules in SGD from optimal control theory
Scalable Computing for Advanced Library and Environment
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training (TMLR2025)
Implementation for paper: A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias
CellViT: Vision Transformers for Precise Cell Segmentation and Classification
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Open-source framework for the research and development of foundation models.
The official implementation of MARS: Unleashing the Power of Variance Reduction for Training Large Models
fmchisel: Efficient Compression and Training Algorithms for Foundation Models
Benchmarking Optimizers for LLM Pretraining
[ICLR 2025] How Does Critical Batch Size Scale in Pre-training?
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
Minimal reference implementations for per-example gradient norm methods for computing GNS
Data Uniformity Improves Training Efficiency and More, with a Convergence Framework Beyond the NTK Regime