Add floating point variable types: Float16, Float8, BFloat16, TensorFloat32, Float6, Float4 (ESROGUE-730) by ruck314 · Pull Request #1172 · slaclab/rogue

ruck314 · 2026-04-04T17:39:23Z

Summary

Extends Rogue's variable model system with six new floating point types for NVIDIA GPU interoperability.

New types

Type	Format	Bits	Value Range	NVIDIA Support
`pr.Float16` / `pr.Float16BE`	IEEE 754 half-precision, 1s/5e/10m	16	±65504.0	All
`pr.Float8` / `pr.Float8BE`	E4M3, NaN=0x7F, no Inf	8	±448.0	Hopper, Blackwell
`pr.BFloat16` / `pr.BFloat16BE`	1s/8e/7m, IEEE exponent range	16	±3.39e38	Ampere, Hopper, Blackwell
`pr.TensorFloat32` / `pr.TensorFloat32BE`	1s/8e/10m, 19 bits in 4 bytes	19 (32 storage)	±3.40e38	Ampere, Hopper, Blackwell
`pr.Float6` / `pr.Float6BE`	E3M2, no NaN/Inf	6	±28.0	Blackwell
`pr.Float4` / `pr.Float4BE`	E2M1, no NaN/Inf	4	±6.0	Blackwell
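As a cross-check of the Value Range column, the maxima can be derived from each format's field widths. The Python sketch below assumes the conventional bias of 2^(e-1) - 1 for every format and encodes the three special-value conventions listed under "Special-value handling". `max_finite` is an illustrative helper, not part of pyrogue; the printed maxima should match the table (65504, 448, ~3.39e38, ~3.40e38, 28, 6).

```python
def max_finite(exp_bits: int, man_bits: int, reserve: str) -> float:
    """Largest finite magnitude of a sign/exponent/mantissa minifloat.

    reserve selects which codes are taken by special values:
      "ieee" -> the all-ones exponent is Inf/NaN (Float16, BFloat16, TF32)
      "e4m3" -> only all-ones exponent + mantissa is NaN (Float8)
      "none" -> every code is finite (Float6, Float4)
    """
    bias = 2 ** (exp_bits - 1) - 1       # assumed conventional bias
    top_exp = 2 ** exp_bits - 1          # all-ones exponent field
    top_man = 2 ** man_bits - 1          # all-ones mantissa field
    if reserve == "ieee":
        top_exp -= 1                     # all-ones exponent is reserved
    elif reserve == "e4m3":
        top_man -= 1                     # S.1111.111 is reserved for NaN
    return (1 + top_man / 2 ** man_bits) * 2.0 ** (top_exp - bias)


FORMATS = {
    "Float16": (5, 10, "ieee"),
    "Float8": (4, 3, "e4m3"),
    "BFloat16": (8, 7, "ieee"),
    "TensorFloat32": (8, 10, "ieee"),
    "Float6": (3, 2, "none"),
    "Float4": (2, 1, "none"),
}

for name, (e, m, rule) in FORMATS.items():
    print(f"{name:14s} +/-{max_finite(e, m, rule):.6g}")
```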

Implementation

Each type follows the same full-stack pattern:

C++ model ID constant in Constants.h
Block get/set methods with inline bit-manipulation converters in Block.cpp
Variable dispatch in Variable.cpp
Python Model class with toBytes/fromBytes/fromString/minValue/maxValue
Sphinx API documentation (per-type page + consolidated summary reference)

Special-value handling per spec:

Float16: full IEEE 754 NaN and ±Inf
Float8: NaN = 0x7F, no infinity (clamps to max finite)
BFloat16/TF32: full IEEE 754 NaN and ±Inf
Float6/Float4: no NaN or Inf encodings (on encode, NaN clamps to +max finite and ±Inf clamps to ±max finite)
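A minimal sketch of the Float4 clamping behavior in plain Python, written against an explicit table of the eight E2M1 magnitudes rather than the bit manipulation the PR uses. It follows the refinement made later in this PR (NaN clamps to +max, ±Inf keeps its sign). `encode_float4`/`decode_float4` are hypothetical names, the code is assumed to sit in the low nibble with the sign in bit 3, and ties round to the lower code, which may differ from Rogue's rounding.

```python
import math

# Non-negative E2M1 magnitudes indexed by the 3-bit (exponent << 1 | mantissa)
# code; bias = 1, code 1 is the single subnormal (0.5).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def encode_float4(value: float) -> int:
    """Encode to a 4-bit E2M1 code (sign in bit 3), clamping out-of-range inputs."""
    if math.isnan(value):
        return 0b0111                       # NaN -> +max finite (+6.0)
    sign = 0b1000 if math.copysign(1.0, value) < 0 else 0
    mag = min(abs(value), 6.0)              # +/-Inf and overflow -> +/-6.0
    # Nearest representable magnitude; ties go to the lower code in this sketch.
    code = min(range(8), key=lambda c: abs(E2M1_MAGNITUDES[c] - mag))
    return sign | code


def decode_float4(code: int) -> float:
    """Decode a 4-bit E2M1 code; every pattern is a finite number."""
    mag = E2M1_MAGNITUDES[code & 0b0111]
    return -mag if code & 0b1000 else mag
```

For example, `encode_float4(float("-inf"))` gives `0b1111` (-6.0) and `encode_float4(2.4)` gives `0b0100` (2.0). A lookup table is practical here only because E2M1 has just 16 codes.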

Documentation

Added docs/src/pyrogue_tree/core/float_types_summary.rst — a consolidated quick-reference page with format details table, special-value handling, NVIDIA architecture support matrix, and usage example.

Test plan

pytest tests/core/test_float16.py: all boundary values, round-trip, metadata, remote variable integration
pytest tests/core/test_float8.py: all boundary values, round-trip, metadata, remote variable integration
pytest tests/core/test_bfloat16.py: all boundary values, round-trip, metadata, remote variable integration
pytest tests/core/test_tensorfloat32.py: all boundary values, round-trip, metadata, remote variable integration
pytest tests/core/test_float6.py: all 64 six-bit patterns, boundary values, NaN/Inf clamping, remote variable integration
pytest tests/core/test_float4.py: all 16 four-bit patterns, boundary values, NaN/Inf clamping, remote variable integration
Full regression: 207 passed
flake8 + cpplint clean

Resolves ESROGUE-730.

Add Float16 model type so users can declare 16-bit half-precision floating point variables with base=pr.Float16. Implements the full C++/Python stack: model ID constant, Block get/set methods with inline IEEE 754 half-float converters, Variable dispatch, Python Model classes (Float16, Float16BE), Sphinx API documentation, and comprehensive tests including boundary values, NaN/Inf, subnormals, and cache key behavior. Resolves ESROGUE-730.

…et/set methods
- Add Float8 = 0x0A constant to Constants.h between Float16 and Custom
- Declare setFloat8Py/getFloat8Py/setFloat8/getFloat8 in Block.h
- Implement floatToFloat8/float8ToFloat E4M3 converters in anonymous namespace
- Implement all four Block methods following Float16 pattern
- setFloat8Py/getFloat8Py use NPY_FLOAT (no native numpy Float8 dtype)

…nding
- Add setFloat8_/getFloat8_ function pointer members to Variable.h
- Add public setFloat8/getFloat8 accessor declarations to Variable.h
- Add case rim::Float8 to all four switch blocks in Variable.cpp
- Implement setFloat8/getFloat8 C++ accessor wrappers in Variable.cpp
- Expose Float8 constant to Python in module.cpp

- Add Float8(Model) with E4M3 encoding: modelId=rim.Float8, ndType=float32
- Manual bit manipulation toBytes/fromBytes (no struct.pack format for 8-bit float)
- NaN encodes as 0x7F, infinity clamps to max finite (sign|0x7E)
- minValue=-448.0, maxValue=448.0
- Float8BE subclass with endianness='big'
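For illustration, here is a table-lookup version of the E4M3 rules listed above (bias 7, NaN = 0x7F, infinity clamped to sign|0x7E). It is not the PR's `toBytes`/`fromBytes` bit manipulation; `decode_e4m3`/`encode_e4m3` are hypothetical helpers, and ties resolve to the lower code rather than to even.

```python
import math


def decode_e4m3(code: int) -> float:
    """Decode an 8-bit E4M3 code (1 sign, 4 exponent, 3 mantissa bits, bias 7)."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0x0F
    man = code & 0x07
    if exp == 0x0F and man == 0x07:
        return math.nan                     # 0x7F / 0xFF: the only NaN codes, no Inf
    if exp == 0:
        return sign * man * 2.0 ** -9       # subnormal: 0.mmm x 2^(1 - 7)
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)


# Magnitudes of the positive finite codes 0x00..0x7E.
_E4M3_POSITIVE = [decode_e4m3(c) for c in range(0x7F)]


def encode_e4m3(value: float) -> int:
    """Nearest E4M3 code; NaN -> 0x7F, infinity/overflow -> sign | 0x7E (+/-448)."""
    if math.isnan(value):
        return 0x7F
    sign = 0x80 if math.copysign(1.0, value) < 0 else 0x00
    mag = min(abs(value), 448.0)
    code = min(range(0x7F), key=lambda c: abs(_E4M3_POSITIVE[c] - mag))
    return sign | code
```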

- Create tests/core/test_float8.py with 18 tests covering metadata, boundary values round-trip, NaN encoding (0x7F), no-infinity clamping, known bit patterns, and RemoteVariable integration
- Add Float8Var (offset=0x30, bitSize=8) to ModelVariableDevice in test_model_variables.py and exercise it in round-trip test
- All 22 tests pass

- Create docs/src/api/python/pyrogue/float8.rst with autoclass directives
- Add float8 to Models toctree in index.rst (after float16)
- Add Float8/Float8BE rows to Built-In Model Types table in model.rst
- Update floating-point family list to include Float8 and Float8BE

…thods
- Add BFloat16 = 0x0B constant to Constants.h after Float8
- Add setBFloat16Py/getBFloat16Py/setBFloat16/getBFloat16 declarations to Block.h
- Add floatToBFloat16/bfloat16ToFloat converter functions (upper 16 bits of float32)
- Implement all four BFloat16 Block methods using NPY_FLOAT (no native numpy dtype)

…binding
- Add setBFloat16_/getBFloat16_ function pointer members to Variable.h
- Add setBFloat16/getBFloat16 public accessor declarations to Variable.h
- Add four case rim::BFloat16 dispatch blocks in Variable.cpp constructor
- Add setBFloat16/getBFloat16 C++ accessor wrapper methods in Variable.cpp
- Initialize setBFloat16_/getBFloat16_ to NULL in constructor
- Expose rim.BFloat16 = 0x0B constant via module.cpp Python binding

- BFloat16(Model): 1s/8e/7m format, upper 16 bits of float32 bit pattern
- modelId = rim.BFloat16, ndType = np.float32, bitSize must be 16
- toBytes/fromBytes use integer bit manipulation (no struct format code for BF16)
- Supports infinity and NaN (unlike Float8, which has no infinity encoding and clamps it to max finite)
- BFloat16BE subclass with endianness = 'big'
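The BFloat16 rule (keep the upper 16 bits of the float32 pattern) plus the byte-order choice fits in a few lines of standard-library Python. This sketch uses `struct` for the float32 reinterpretation and `int.to_bytes` for endianness; `bf16_to_bytes`/`bf16_from_bytes` are hypothetical helpers rather than the Model methods, and they truncate without rounding and omit the NaN-payload guard discussed later in the thread.

```python
import struct


def bf16_to_bytes(value: float, endianness: str = "little") -> bytes:
    """BFloat16 bytes: upper 16 bits of the float32 pattern (truncation, no rounding)."""
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits32 >> 16).to_bytes(2, endianness)


def bf16_from_bytes(data: bytes, endianness: str = "little") -> float:
    """Widen a 2-byte BFloat16 back to float by zero-filling the low 16 bits."""
    bits16 = int.from_bytes(data, endianness)
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]
```

With `endianness="big"`, 1.0 encodes as the bytes 0x3F 0x80; with the little-endian default the same two bytes come out swapped, which is exactly the difference a BE variant has to preserve.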

- Create tests/core/test_bfloat16.py with 17 tests covering: metadata, wrong bitsize rejection, instance caching, boundary value round-trips, NaN, infinity, known bit patterns, BE endianness, and remote variable integration through C++ Block layer
- Update test_model_variables.py: add BFloat16Var at offset=0x32 and include it in round-trip test parametrization

- Create docs/src/api/python/pyrogue/bfloat16.rst with autoclass directives
- Add bfloat16 to Models toctree in index.rst after float8
- Add BFloat16 and BFloat16BE rows to Built-In Model Types table in model.rst
- Update floating-point family list to include BFloat16 and BFloat16BE

Copilot

Pull request overview

Adds IEEE 754 half-precision (Float16) as a first-class model/variable type across the C++/Python memory stack, with docs and tests.

Changes:

Introduces Float16/Float16BE model types (Python) and a new model ID constant (Float16 = 0x09).
Adds C++ Block/Variable dispatch and Python bindings for Float16 get/set, including half-float conversion helpers.
Updates Sphinx/Doxygen docs and expands tests to cover Float16 metadata, caching, and RemoteVariable integration.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

File	Description
tests/core/test_model.py	Adds Float16 model metadata, caching, and boundary/NaN round-trip tests (Python struct-based).
tests/core/test_model_variables.py	Adds a Float16 RemoteVariable and round-trip/boundary tests through the remote/block path.
src/rogue/interfaces/memory/Variable.cpp	Wires Float16 modelId dispatch to Block get/set and adds Variable::get/setFloat16 methods.
src/rogue/interfaces/memory/module.cpp	Exposes `Float16` constant into the Python `rogue.interfaces.memory` module.
src/rogue/interfaces/memory/Block.cpp	Implements Float16 Python/C++ get/set plus half-float conversion helpers.
python/pyrogue/_Model.py	Adds `pyrogue.Float16` and `pyrogue.Float16BE` Model implementations.
include/rogue/interfaces/memory/Variable.h	Declares Float16 function pointers and public get/setFloat16 APIs.
include/rogue/interfaces/memory/Constants.h	Adds the `Float16` model ID constant.
include/rogue/interfaces/memory/Block.h	Declares Float16 Block APIs for C++ and Python.
docs/src/api/python/pyrogue/index.rst	Adds `float16` to the Models docs toctree.
docs/src/api/python/pyrogue/float16.rst	New API doc page for `pyrogue.Float16` / `pyrogue.Float16BE`.
docs/src/api/cpp/interfaces/memory/model.rst	Links Float16/Float16BE from the C++ interfaces memory model docs.
docs/src/api/cpp/interfaces/memory/constants.rst	Adds Float16 to the constants reference list.

…nd Block methods
- Add TensorFloat32 = 0x0C constant to Constants.h
- Add floatToTensorFloat32/tensorFloat32ToFloat converter pair (4-byte, mask 0xFFFFE000)
- Add setTensorFloat32/getTensorFloat32 C++ and Python Block methods using uint32_t storage

…nding
- Add setTensorFloat32_/getTensorFloat32_ function pointers and public accessors to Variable.h
- Add NULL init, 4 switch-case dispatches, and C++ wrapper methods to Variable.cpp
- Expose TensorFloat32 constant to Python via module.cpp
- rim.TensorFloat32 == 12 verified from Python

- TensorFloat32 class with 4-byte storage, 0xFFFFE000 mask, rim.TensorFloat32 modelId
- TensorFloat32BE big-endian variant
- bitSize=32 assertion, ndType=np.float32, min/max ~3.40e38
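TF32 keeps a full 4-byte float32 word but clears the low 13 mantissa bits with `0xFFFFE000`, leaving 1 sign, 8 exponent and 10 mantissa bits. A sketch of that masking, assuming a caller-chosen byte order; `tf32_to_bytes`/`tf32_from_bytes` are hypothetical helpers, not the pyrogue Model methods, and the NaN guard added later in the PR is left out.

```python
import struct

TF32_MASK = 0xFFFFE000  # sign + 8 exponent bits + top 10 mantissa bits


def tf32_to_bytes(value: float, endianness: str = "little") -> bytes:
    """float32 pattern with the low 13 mantissa bits cleared, stored in 4 bytes."""
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits32 & TF32_MASK).to_bytes(4, endianness)


def tf32_from_bytes(data: bytes, endianness: str = "little") -> float:
    """Reinterpret 4 stored bytes as float32, re-applying the TF32 mask."""
    bits32 = int.from_bytes(data, endianness) & TF32_MASK
    return struct.unpack("<f", struct.pack("<I", bits32))[0]
```

Storing the masked word rather than packing 19 bits keeps the on-wire value directly readable as a float32, at the cost of 13 always-zero bits.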

…tion
- 17 tests: metadata, wrong bitsize, cache, boundary round-trip, NaN, infinity, fromString, bit patterns, BE endianness, remote variable round-trip, remote variable boundary values
- TF32Var added to model variables integration device at offset=0x34, bitSize=32
- All 35 existing BFloat16/Float8 tests pass with no regressions

- New tensorfloat32.rst with autoclass directives for TensorFloat32 and TensorFloat32BE
- Added tensorfloat32 to Models toctree in pyrogue index.rst
- Added TF32 rows to Built-In Model Types table and floating-point family list in model.rst

… get/set methods
- Add Float6 = 0x0D constant in Constants.h
- Add setFloat6/getFloat6/setFloat6Py/getFloat6Py declarations in Block.h
- Implement floatToFloat6/float6ToFloat converters with E3M2 semantics (1s/3e/2m, bias=3, no NaN/Inf)
- Implement all four Block methods with NPY_FLOAT numpy array support

…nding
- Add setFloat6_/getFloat6_ function pointer members in Variable.h
- Add public setFloat6/getFloat6 accessor declarations in Variable.h
- Add 4 case rim::Float6 dispatch blocks in Variable.cpp
- Add NULL initialization for Float6 function pointers
- Add C++ accessor wrapper implementations
- Expose Float6 constant to Python in module.cpp

- Float6 E3M2 model with toBytes/fromBytes bit manipulation
- NaN/Inf inputs clamp to max finite (+/-28.0)
- Float6BE big-endian variant
- bitSize=8, ndType=float32, modelId=rim.Float6

- Create test_float6.py with metadata, boundary, bit pattern, NaN/Inf, remote variable tests
- Add Float6Var to test_model_variables.py integration test
- Fix float6ToFloat C++ subnormal decode using direct formula instead of normalization loop
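The direct-formula decode mentioned in the last bullet can be sketched as follows: with E3M2 and bias 3, a subnormal is simply `man * 2**-4`, so no normalization loop is needed. The sketch assumes the six format bits sit in the low bits of the 8-bit storage byte with the sign at bit 5; `float6_to_float` is an illustrative name, not the C++ function.

```python
def float6_to_float(code: int) -> float:
    """Decode a 6-bit E3M2 pattern (1 sign, 3 exponent, 2 mantissa bits, bias 3)."""
    sign = -1.0 if code & 0x20 else 1.0
    exp = (code >> 2) & 0x07
    man = code & 0x03
    if exp == 0:
        # Subnormal: 0.mm x 2^(1 - 3) == man x 2^-4, computed directly (no loop)
        return sign * man * 2.0 ** -4
    return sign * (1 + man / 4) * 2.0 ** (exp - 3)


# E3M2 has no NaN/Inf, so all 64 patterns decode to finite values.
ALL_FLOAT6 = [float6_to_float(c) for c in range(64)]
```

Walking all 64 patterns, as test_float6.py does, is cheap enough that exhaustive checking is the natural test strategy for this format.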

- Create float6.rst with autoclass directives for Float6/Float6BE
- Add float6 to API index toctree
- Add Float6/Float6BE rows to model type table and family list

… methods
- Add Float4 = 0x0E constant in Constants.h
- Add setFloat4/getFloat4/setFloat4Py/getFloat4Py declarations in Block.h
- Implement floatToFloat4/float4ToFloat converters with E2M1 semantics (1s/2e/1m, bias=1, no NaN/Inf)
- Implement Block set/get methods for Float4 with NPY_FLOAT numpy array support

…nding
- Add setFloat4_/getFloat4_ function pointer members in Variable.h
- Add public setFloat4/getFloat4 accessor declarations in Variable.h
- Add 4 case rim::Float4: blocks in Variable.cpp switch statements
- Add NULL initialization for Float4 function pointers
- Add C++ accessor wrapper implementations for Float4
- Expose Float4 constant to Python in module.cpp

- Float4 E2M1 model with manual bit manipulation encode/decode
- NaN/Inf inputs clamp to max finite (+/-6.0), no special encodings
- Float4BE big-endian variant
- ndType=float32, bitSize assertion enforces 8-bit storage

- Initialize setFloat16_/getFloat16_ and setFloat8_/getFloat8_ to NULL alongside all other function pointers in Variable constructor; missing initialization left indeterminate values that made NULL checks unreliable
- Fix floatToHalf() subnormal encoding: remove redundant >> 13 after mantissa >>= shift, which flushed all half-precision subnormals to zero
- Fix floatToHalf() NaN encoding: ensure the half mantissa payload is non-zero when the float32 NaN payload lies entirely in the low 13 bits, preventing mis-encoding as infinity (0x7C00)
- Fix halfToFloat() subnormal decoding: use int32_t for the working exponent in the normalization loop so it can go negative without wrapping to UINT32_MAX and producing invalid float results
- Add true half-precision subnormal (2**-24) to the Float16 integration test to cover the floatToHalf/halfToFloat denormal path
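The subnormal and NaN fixes above can be seen in a bit-level sketch of a truncating float32-to-half conversion. It is written in Python for brevity and is not a transcription of `floatToHalf()`; the quiet bit `0x200` used to keep a NaN from collapsing into `0x7C00`, and overflow to Inf, are assumptions. Python integers cannot wrap, so the `int32_t` issue on the decode side is not modeled.

```python
import struct


def f32_bits(x: float) -> int:
    """Raw binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float32_bits_to_half(bits: int) -> int:
    """Truncating binary32 -> binary16 conversion on raw bit patterns (sketch)."""
    sign = (bits >> 16) & 0x8000
    exp = (bits >> 23) & 0xFF
    man = bits & 0x7FFFFF
    if exp == 0xFF:                        # Inf or NaN
        half_man = man >> 13
        if man and not half_man:
            half_man = 0x200               # payload lost in the shift: keep it a NaN
        return sign | 0x7C00 | half_man
    e = exp - 127 + 15                     # re-bias; may go negative
    if e >= 0x1F:
        return sign | 0x7C00               # too large for binary16: Inf (assumed)
    if e <= 0:                             # binary16 subnormal, or underflow to 0
        if e < -10:
            return sign
        man |= 0x800000                    # restore the implicit leading 1
        return sign | (man >> (14 - e))    # a single shift, no extra >> 13
    return sign | (e << 10) | (man >> 13)
```

For 2**-24 this yields `0x0001`, the same pattern Python's `struct` `'e'` format produces, whereas an extra `>> 13` after the subnormal shift would have returned zero.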

…type

The normalization loop in float8ToFloat checked bit 2 (0x04) for the implicit leading 1, but with Float8's three stored mantissa bits (0x07) the leading 1 belongs at bit 3 (0x08). Stopping one shift early produced a 2x error for every subnormal value (e.g. 0x01 decoded to 2^-8 instead of 2^-9). Fix by checking bit 3 (0x08) and stripping with 0x07 (not 0x03), and use int32_t for the working exponent so the counter can go negative without wrapping. Add a RemoteVariable subnormal test (2^-9) to cover the C++ Block conversion path.
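The off-by-one in the normalization loop is easy to reproduce. This Python sketch mirrors the loop as described in the commit message for a nonzero E4M3 subnormal; passing `lead_bit=0x04` reproduces the old termination test (and the 0x03 strip mask), and every result then comes out exactly 2x too large. The function is illustrative only, not the C++ `float8ToFloat`.

```python
def e4m3_subnormal_to_float(code: int, lead_bit: int = 0x08) -> float:
    """Decode a nonzero E4M3 subnormal by normalizing its 3-bit mantissa.

    lead_bit is where the loop expects the implicit leading 1 to end up:
    0x08 (bit 3) is correct; 0x04 reproduces the old, buggy check.
    """
    man = code & 0x07
    if code & 0x78 or man == 0:
        raise ValueError("not a nonzero E4M3 subnormal")
    exp = 1 - 7                        # subnormal exponent 1 - bias; goes negative
    while not (man & lead_bit):        # shift until the leading 1 reaches lead_bit
        man <<= 1
        exp -= 1
    frac = (man & (lead_bit - 1)) / lead_bit   # strip the leading 1
    value = (1 + frac) * 2.0 ** exp
    return -value if code & 0x80 else value
```

Stopping one shift early leaves the exponent one step too high while the fraction is unchanged, which is why the error is a uniform factor of two rather than noise.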

ruck314 · 2026-04-05T05:48:12Z

Follow-up to the Copilot review fixes (commit 19691ae): audited all other new floating point types for the same 5 bug categories.

Bug 4 (uint32_t underflow in normalization loop) also affected float8ToFloat — and had a related secondary bug: the loop terminated when bit 2 was set (0x04) instead of bit 3 (0x08), stopping one shift early. The combination caused every Float8 subnormal to decode at 2× the correct value (e.g. 0x01 → 2^-8 instead of 2^-9). Fixed in commit d42516f.

Float6 and Float4 are immune to both issues — their ToFloat decoders use direct arithmetic instead of a normalization loop.

BFloat16 and TensorFloat32 have an analogous NaN-payload-truncation edge case (Bug 3): a float32 NaN with payload bits only in the lower 16/13 bits will round-trip as Inf rather than NaN. This is an inherent property of the truncation-based encoding (both formats are defined as bit-truncations of float32), not a coding error.

Bug 1 (NULL init) for Float8 pointers was already caught and fixed in 19691ae alongside the Float16 fix.

cpplint requires controlled statements inside while clause brackets to be on separate lines (whitespace/newline rule).

Copilot

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 12 comments.

C++:
- floatToBFloat16(): detect NaN whose payload is entirely in the lower 16 truncated bits and force a NaN mantissa bit so it cannot become Inf
- floatToTensorFloat32(): same fix for the lower 13 masked mantissa bits

Python:
- BFloat16.toBytes/fromBytes: use self.endianness (big vs little) when packing/unpacking the uint16; BFloat16BE was silently producing little-endian bytes
- BFloat16.toBytes: same NaN-preservation logic as C++ fix above
- TensorFloat32.toBytes/fromBytes: same endianness and NaN fixes

Tests:
- test_bfloat16_be_endianness: add known bit-pattern assertions for 1.0, +Inf, -Inf encoded in big-endian byte order
- test_tensorfloat32_be_endianness: add known bit-pattern assertion for 1.0 encoded in big-endian byte order

Docs:
- float_types_summary.rst: fix Float16 model-class column (was "Float")
- pyrogue_tree/core/model.rst: add Float16/Float16BE to floating-point model list
- api/cpp/constants.rst: add Float8, BFloat16, TF32, Float6, Float4
- api/cpp/model.rst: add anchor sections and Python doc links for all new model types (Float8, BFloat16, TF32, Float6, Float4)
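The NaN-preservation rule for the truncating formats can be expressed on raw float32 bit patterns. In this sketch the forced bit is the top (quiet) mantissa bit of each format; the commit only says "a NaN mantissa bit", so that choice is an assumption, and the function names are hypothetical.

```python
F32_EXP = 0x7F800000
F32_MAN = 0x007FFFFF


def _is_nan32(bits: int) -> bool:
    """True if a 32-bit pattern is a float32 NaN (all-ones exponent, nonzero mantissa)."""
    return (bits & F32_EXP) == F32_EXP and (bits & F32_MAN) != 0


def f32_bits_to_bf16(bits: int) -> int:
    """Upper 16 bits of a float32 pattern, keeping NaNs as NaNs."""
    out = bits >> 16
    if _is_nan32(bits) and (out & 0x007F) == 0:
        out |= 0x0040             # assumed choice: set the BFloat16 quiet-NaN bit
    return out


def f32_bits_to_tf32(bits: int) -> int:
    """float32 pattern masked to TF32 precision, keeping NaNs as NaNs."""
    out = bits & 0xFFFFE000
    if _is_nan32(bits) and (out & F32_MAN) == 0:
        out |= 0x00400000         # assumed choice: set the float32 quiet-NaN bit
    return out
```

Without the guard, the float32 NaN `0x7F800001` truncates to `0x7F80`, which is the BFloat16 encoding of +Inf.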

Copilot

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.

…ble rows
- floatToFloat6/floatToFloat4: NaN has no meaningful sign so always clamp to positive max; infinity preserves sign to match Python model behavior
- Float16.toBytes(): document that struct.pack uses round-to-nearest-even while the C++ floatToHalf uses truncation (difference is at most 1 ULP)
- model.rst: add missing Float16/Float16BE rows to Built-In Model Types table
- Add remote variable NaN clamping tests for Float6/Float4 through C++ path
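To see the 1 ULP difference documented for `Float16.toBytes()`, compare Python's `struct` half-precision packing (round-to-nearest-even) with a truncating conversion. `half_truncate` is a simplified stand-in for the C++ `floatToHalf`, not its actual code, and handles only values in the binary16 normal range.

```python
import struct


def f32_bits(x: float) -> int:
    """Raw binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def half_truncate(x: float) -> int:
    """Truncating float32 -> binary16, binary16 normal range only (sketch)."""
    bits = f32_bits(x)
    sign = (bits >> 16) & 0x8000
    exp = ((bits >> 23) & 0xFF) - 127 + 15
    if not 0 < exp < 0x1F:
        raise ValueError("sketch only handles binary16 normal magnitudes")
    return sign | (exp << 10) | ((bits & 0x7FFFFF) >> 13)


def half_rne(x: float) -> int:
    """binary16 pattern from struct's 'e' format (round-to-nearest-even)."""
    return int.from_bytes(struct.pack("<e", x), "little")


x = 1 + 3 * 2 ** -12                  # 0.75 ULP above 1.0 in binary16
print(hex(half_truncate(x)), hex(half_rne(x)))   # 0x3c00 0x3c01
```

Truncation always rounds the magnitude toward zero, so for a positive input the two encodings differ by either 0 or exactly 1 in the integer bit pattern.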

Copilot

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.

- Fix "C++ filed point" → "C++ fixed point" typo in Variable.h section header
- Add missing Float16BE to big-endian variant list in float_types_summary.rst

Copilot

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.

The pre-release merge brought the wide signed Fixed32/40/64 test variables into test_model_variables.py alongside the Float16/8/BFloat16/ TF32/Float6/Float4 variables already on this branch. Both sets used offsets 0x28/0x30/0x38, so Block::addVariables rejected the device with a bit-overlap GeneralError and every test in the file failed at construction time. Move the Fixed* entries into the free region after UFixedListVar (0x68/0x70/0x78), which is well within the 0x4000 emulated memory window.
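The failure in this commit is a plain interval overlap. A minimal sketch, assuming each variable occupies the half-open bit range `[offset*8, offset*8 + bitSize)`; this is not the logic of `Block::addVariables`, and the 40-bit Fixed variable and its offsets are hypothetical stand-ins for the Fixed* entries (only Float8Var's offset and size come from the commits above).

```python
def bit_span(offset: int, bit_size: int) -> tuple:
    """Half-open absolute bit range of a variable starting at byte `offset`."""
    start = offset * 8
    return start, start + bit_size


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two (offset, bit_size) variables share at least one bit."""
    (a0, a1), (b0, b1) = bit_span(*a), bit_span(*b)
    return a0 < b1 and b0 < a1


float8_var = (0x30, 8)       # Float8Var, as added earlier in this PR
fixed40_old = (0x30, 40)     # hypothetical 40-bit Fixed variable at its old offset
fixed40_new = (0x70, 40)     # the same hypothetical variable after the move

print(overlaps(float8_var, fixed40_old), overlaps(float8_var, fixed40_new))  # True False
```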

…del.rst

Convert the new Float8/BFloat16/TensorFloat32/Float6/Float4 rows (and BE variants) in the Built-In Model Types table to use :ref: cross-references consistent with every other row. Correct the Bit Size column for Float6 and Float4 from 8-bits (the storage size) to 6-bits and 4-bits (the format width). Widen columns so BFloat16/TF32 notes and the tensorfloat32be ref fit within their grid cells.

ruck314 changed the title ~~Add IEEE 754 half-precision (Float16) variable support~~ Add IEEE 754 half-precision (Float16) variable support (ESROGUE-730) Apr 4, 2026

ruck314 requested review from bengineerd and slacrherbst April 4, 2026 17:46

ruck314 added 9 commits April 4, 2026 17:15

bengineerd requested a review from Copilot April 5, 2026 01:09

Copilot started reviewing on behalf of bengineerd April 5, 2026 01:10

Copilot AI reviewed Apr 5, 2026

ruck314 added 13 commits April 4, 2026 18:40

feat(03-02): add TensorFloat32 and TensorFloat32BE Python model classes

4784631

feat(04-02): add Float6 and Float6BE Python model classes

e8e17ec

docs(04-02): add Float6 Sphinx documentation and model table entries

c1c1cc4

feat(05-02): add Float4 and Float4BE Python model classes

a1801b0

ruck314 changed the title ~~Add IEEE 754 half-precision (Float16) variable support (ESROGUE-730)~~ Add floating point variable types: Float8, BFloat16, TensorFloat32, Float6, Float4 (ESROGUE-730) Apr 5, 2026

ruck314 added 2 commits April 4, 2026 22:38

Fix cpplint: expand single-line while body in float8ToFloat

267f549

ruck314 requested a review from Copilot April 5, 2026 05:56

Copilot started reviewing on behalf of ruck314 April 5, 2026 05:57

Copilot AI reviewed Apr 5, 2026

ruck314 changed the title ~~Add floating point variable types: Float8, BFloat16, TensorFloat32, Float6, Float4 (ESROGUE-730)~~ Add floating point variable types: Float16, Float8, BFloat16, TensorFloat32, Float6, Float4 (ESROGUE-730) Apr 5, 2026

ruck314 requested a review from Copilot April 5, 2026 06:11

Copilot started reviewing on behalf of ruck314 April 5, 2026 06:11

Copilot AI reviewed Apr 5, 2026

Comment thread src/rogue/interfaces/memory/Block.cpp Outdated

Comment thread src/rogue/interfaces/memory/Block.cpp Outdated

Comment thread python/pyrogue/_Model.py

Comment thread docs/src/pyrogue_tree/core/model.rst Outdated

Comment thread tests/core/test_tensorfloat32.py

ruck314 requested a review from Copilot April 5, 2026 06:27

Copilot started reviewing on behalf of ruck314 April 5, 2026 06:28

Copilot AI reviewed Apr 5, 2026

Comment thread include/rogue/interfaces/memory/Variable.h Outdated

Comment thread python/pyrogue/_Model.py

Comment thread docs/src/pyrogue_tree/core/float_types_summary.rst Outdated

Fix Variable.h comment typo and add Float16BE to docs summary

88ef930

ruck314 requested a review from Copilot April 5, 2026 14:06

Copilot started reviewing on behalf of ruck314 April 5, 2026 14:06

Copilot AI reviewed Apr 5, 2026

ruck314 added 6 commits April 7, 2026 17:09

Shift new float type constants by 1 to reserve 0x09 for UFixed (PR #1166)

ded2696

Merge branch 'pre-release' into ESROGUE-730

7b53db1

Merge branch 'pre-release' into ESROGUE-730

682e861

Merge branch 'pre-release' into ESROGUE-730

85e01be

slacrherbst approved these changes Apr 13, 2026

ruck314 merged commit 6c470c0 into pre-release Apr 13, 2026
13 of 14 checks passed

ruck314 deleted the ESROGUE-730 branch April 13, 2026 16:50

3 participants
