Unofficial C and CUDA LeetArxiv implementation of the paper Online Normalizer Calculation for Softmax (Milakov & Gimelshein, 2018)
Complete writeup and coding guide available here
Online Normalizer Calculation for Softmax (Milakov & Gimelshein, 2018) addresses two shortcomings of the original softmax:
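- The naive softmax suffers from underflow and overflow when inputs are extreme (Tianlong, 2025).
- The safer version of the naive softmax needs a separate pass over the input to find the maximum before it can start summing, which adds memory traffic and makes it harder to run efficiently in parallel on GPU (Wangkuiyi, 2025).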
The authors use a pretty clever trick to calculate the online normalizer in one loop (Tianlong, 2025).
Instead of first finding the maximum, the authors propose rescaling the accumulated sum whenever a new max is encountered.
You can run the Jupyter Notebook locally or online in this Google Colab notebook.
Follow the free writeup here
The C version runs with:
gcc Softmax.c -lm -o m.o && ./m.o
Feel free to reach out on Twitter @murage_kibicho or via kibicho.murage@gmail.com