Online-Softmax-Paper in CUDA and C

Unofficial C and CUDA LeetArxiv implementation of the paper Online Normalizer Calculation for Softmax (Milakov & Gimelshein, 2018)

Complete writeup and coding guide available here

Paper Summary

The paper Online Normalizer Calculation for Softmax (Milakov & Gimelshein, 2018) addresses two shortcomings of the standard softmax:

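The naive softmax overflows or underflows when inputs are extreme (Tianlong, 2025).
The safer version of the naive softmax needs a separate pass to find the maximum before it can start summing, which adds memory accesses and makes it harder to run in parallel on a GPU (Wangkuiyi, 2025).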

The authors use a clever trick to compute the normalizer online, in a single loop over the input (Tianlong, 2025).

Instead of first finding the maximum, the authors propose rescaling the accumulated sum whenever a new max is encountered.

You can run the Jupyter Notebook locally or online in this Google Colab notebook.

Follow the free writeup here

The C version runs with:

gcc Softmax.c -lm -o m.o && ./m.o

Feel free to reach out on Twitter @murage_kibicho or via kibicho.murage@gmail.com
