Add floating point variable types: Float16, Float8, BFloat16, TensorFloat32, Float6, Float4 (ESROGUE-730) #1172
Add Float16 model type so users can declare 16-bit half-precision floating point variables with base=pr.Float16. Implements the full C++/Python stack: model ID constant, Block get/set methods with inline IEEE 754 half-float converters, Variable dispatch, Python Model classes (Float16, Float16BE), Sphinx API documentation, and comprehensive tests including boundary values, NaN/Inf, subnormals, and cache key behavior. Resolves ESROGUE-730.
…et/set methods
- Add Float8 = 0x0A constant to Constants.h between Float16 and Custom
- Declare setFloat8Py/getFloat8Py/setFloat8/getFloat8 in Block.h
- Implement floatToFloat8/float8ToFloat E4M3 converters in anonymous namespace
- Implement all four Block methods following Float16 pattern
- setFloat8Py/getFloat8Py use NPY_FLOAT (no native numpy Float8 dtype)

…nding
- Add setFloat8_/getFloat8_ function pointer members to Variable.h
- Add public setFloat8/getFloat8 accessor declarations to Variable.h
- Add case rim::Float8 to all four switch blocks in Variable.cpp
- Implement setFloat8/getFloat8 C++ accessor wrappers in Variable.cpp
- Expose Float8 constant to Python in module.cpp

- Add Float8(Model) with E4M3 encoding: modelId=rim.Float8, ndType=float32
- Manual bit manipulation toBytes/fromBytes (no struct.pack format for 8-bit float)
- NaN encodes as 0x7F, infinity clamps to max finite (sign|0x7E)
- minValue=-448.0, maxValue=448.0
- Float8BE subclass with endianness='big'
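The encoding rules listed in this commit can be illustrated in plain Python. This is a minimal sketch, not the code in python/pyrogue/_Model.py: the function names are hypothetical, the exponent bias of 7 is the usual E4M3 value rather than something stated in the commit, and the encoder picks the nearest representable magnitude by table search instead of the bit manipulation the commit describes (ties resolve to the lower code, which may differ from the real rounding).

```python
import math


def float8_e4m3_decode(byte):
    """Decode one E4M3 byte (1 sign, 4 exponent, 3 mantissa bits; bias 7 assumed)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return math.nan                      # 0x7F / 0xFF: the only NaN patterns
    if exp == 0:
        return sign * man * 2.0 ** -9        # subnormal: (man / 8) * 2**-6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


# Magnitudes of the 127 finite non-negative codes 0x00..0x7E.
_FINITE = [float8_e4m3_decode(code) for code in range(0x7F)]


def float8_e4m3_encode(value):
    """Encode a Python float; NaN -> 0x7F, infinity clamps to sign|0x7E."""
    if math.isnan(value):
        return 0x7F
    sign = 0x80 if math.copysign(1.0, value) < 0 else 0x00
    mag = min(abs(value), 448.0)             # clamp, since E4M3 has no infinity
    code = min(range(0x7F), key=lambda c: abs(_FINITE[c] - mag))
    return sign | code
```

With these definitions, `float8_e4m3_decode(0x7E)` is the 448.0 maximum quoted above, and `float8_e4m3_encode(float("-inf"))` yields 0xFE.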
- Create tests/core/test_float8.py with 18 tests covering metadata, boundary values round-trip, NaN encoding (0x7F), no-infinity clamping, known bit patterns, and RemoteVariable integration
- Add Float8Var (offset=0x30, bitSize=8) to ModelVariableDevice in test_model_variables.py and exercise it in round-trip test
- All 22 tests pass

- Create docs/src/api/python/pyrogue/float8.rst with autoclass directives
- Add float8 to Models toctree in index.rst (after float16)
- Add Float8/Float8BE rows to Built-In Model Types table in model.rst
- Update floating-point family list to include Float8 and Float8BE

…thods
- Add BFloat16 = 0x0B constant to Constants.h after Float8
- Add setBFloat16Py/getBFloat16Py/setBFloat16/getBFloat16 declarations to Block.h
- Add floatToBFloat16/bfloat16ToFloat converter functions (upper 16 bits of float32)
- Implement all four BFloat16 Block methods using NPY_FLOAT (no native numpy dtype)

…binding
- Add setBFloat16_/getBFloat16_ function pointer members to Variable.h
- Add setBFloat16/getBFloat16 public accessor declarations to Variable.h
- Add four case rim::BFloat16 dispatch blocks in Variable.cpp constructor
- Add setBFloat16/getBFloat16 C++ accessor wrapper methods in Variable.cpp
- Initialize setBFloat16_/getBFloat16_ to NULL in constructor
- Expose rim.BFloat16 = 0x0B constant via module.cpp Python binding

- BFloat16(Model): 1s/8e/7m format, upper 16 bits of float32 bit pattern
- modelId = rim.BFloat16, ndType = np.float32, bitSize must be 16
- toBytes/fromBytes use integer bit manipulation (no struct format code for BF16)
- Supports infinity and NaN (unlike Float8 which clamps)
- BFloat16BE subclass with endianness = 'big'
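The "upper 16 bits of float32" rule is compact enough to sketch with the standard `struct` module. The helper names and the `endianness` string argument below are illustrative assumptions, not the signatures of the pyrogue model class, and the sketch truncates without any special NaN handling.

```python
import struct


def bfloat16_to_bytes(value, endianness="little"):
    """Truncate a float32 bit pattern to its upper 16 bits and pack it."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    bits16 = bits32 >> 16                    # keep sign, 8 exponent, 7 mantissa bits
    fmt = "<H" if endianness == "little" else ">H"
    return struct.pack(fmt, bits16)


def bfloat16_from_bytes(data, endianness="little"):
    """Widen a stored BFloat16 back to float32 by zero-filling the low 16 bits."""
    fmt = "<H" if endianness == "little" else ">H"
    (bits16,) = struct.unpack(fmt, data)
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value
```

For example, 1.0 has the float32 pattern 0x3F800000, so its BFloat16 pattern is 0x3F80: `b"\x80\x3f"` little-endian and `b"\x3f\x80"` big-endian.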
- Create tests/core/test_bfloat16.py with 17 tests covering: metadata, wrong bitsize rejection, instance caching, boundary value round-trips, NaN, infinity, known bit patterns, BE endianness, and remote variable integration through C++ Block layer
- Update test_model_variables.py: add BFloat16Var at offset=0x32 and include it in round-trip test parametrization

- Create docs/src/api/python/pyrogue/bfloat16.rst with autoclass directives
- Add bfloat16 to Models toctree in index.rst after float8
- Add BFloat16 and BFloat16BE rows to Built-In Model Types table in model.rst
- Update floating-point family list to include BFloat16 and BFloat16BE
Pull request overview
Adds IEEE 754 half-precision (Float16) as a first-class model/variable type across the C++/Python memory stack, with docs and tests.
Changes:
- Introduces Float16/Float16BE model types (Python) and a new model ID constant (Float16 = 0x09).
- Adds C++ Block/Variable dispatch and Python bindings for Float16 get/set, including half-float conversion helpers.
- Updates Sphinx/Doxygen docs and expands tests to cover Float16 metadata, caching, and RemoteVariable integration.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/core/test_model.py | Adds Float16 model metadata, caching, and boundary/NaN round-trip tests (Python struct-based). |
| tests/core/test_model_variables.py | Adds a Float16 RemoteVariable and round-trip/boundary tests through the remote/block path. |
| src/rogue/interfaces/memory/Variable.cpp | Wires Float16 modelId dispatch to Block get/set and adds Variable::get/setFloat16 methods. |
| src/rogue/interfaces/memory/module.cpp | Exposes Float16 constant into the Python rogue.interfaces.memory module. |
| src/rogue/interfaces/memory/Block.cpp | Implements Float16 Python/C++ get/set plus half-float conversion helpers. |
| python/pyrogue/_Model.py | Adds pyrogue.Float16 and pyrogue.Float16BE Model implementations. |
| include/rogue/interfaces/memory/Variable.h | Declares Float16 function pointers and public get/setFloat16 APIs. |
| include/rogue/interfaces/memory/Constants.h | Adds the Float16 model ID constant. |
| include/rogue/interfaces/memory/Block.h | Declares Float16 Block APIs for C++ and Python. |
| docs/src/api/python/pyrogue/index.rst | Adds float16 to the Models docs toctree. |
| docs/src/api/python/pyrogue/float16.rst | New API doc page for pyrogue.Float16 / pyrogue.Float16BE. |
| docs/src/api/cpp/interfaces/memory/model.rst | Links Float16/Float16BE from the C++ interfaces memory model docs. |
| docs/src/api/cpp/interfaces/memory/constants.rst | Adds Float16 to the constants reference list. |
…nd Block methods
- Add TensorFloat32 = 0x0C constant to Constants.h
- Add floatToTensorFloat32/tensorFloat32ToFloat converter pair (4-byte, mask 0xFFFFE000)
- Add setTensorFloat32/getTensorFloat32 C++ and Python Block methods using uint32_t storage

…nding
- Add setTensorFloat32_/getTensorFloat32_ function pointers and public accessors to Variable.h
- Add NULL init, 4 switch-case dispatches, and C++ wrapper methods to Variable.cpp
- Expose TensorFloat32 constant to Python via module.cpp
- rim.TensorFloat32 == 12 verified from Python

- TensorFloat32 class with 4-byte storage, 0xFFFFE000 mask, rim.TensorFloat32 modelId
- TensorFloat32BE big-endian variant
- bitSize=32 assertion, ndType=np.float32, min/max ~3.40e38
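The 0xFFFFE000 mask keeps the sign, the 8 exponent bits, and the top 10 mantissa bits of a float32, and zeroes the lower 13. A rough sketch of that masking follows; the function names and the `endianness` argument are hypothetical, and NaN handling is deliberately left out.

```python
import struct

TF32_MASK = 0xFFFFE000  # sign + 8 exponent bits + top 10 mantissa bits


def tensorfloat32_to_bytes(value, endianness="little"):
    """Mask a float32 bit pattern down to TF32 precision, stored in 4 bytes."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    fmt = "<I" if endianness == "little" else ">I"
    return struct.pack(fmt, bits & TF32_MASK)


def tensorfloat32_from_bytes(data, endianness="little"):
    """Reinterpret the stored 32-bit word as a float32."""
    fmt = "<I" if endianness == "little" else ">I"
    (bits,) = struct.unpack(fmt, data)
    (value,) = struct.unpack("<f", struct.pack("<I", bits))
    return value
```

A value such as 1 + 2**-10 sits on the lowest kept mantissa bit and survives the round trip, while 1 + 2**-11 falls in the masked region and comes back as 1.0.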
…tion
- 17 tests: metadata, wrong bitsize, cache, boundary round-trip, NaN, infinity, fromString, bit patterns, BE endianness, remote variable round-trip, remote variable boundary values
- TF32Var added to model variables integration device at offset=0x34, bitSize=32
- All 35 existing BFloat16/Float8 tests pass with no regressions

- New tensorfloat32.rst with autoclass directives for TensorFloat32 and TensorFloat32BE
- Added tensorfloat32 to Models toctree in pyrogue index.rst
- Added TF32 rows to Built-In Model Types table and floating-point family list in model.rst

… get/set methods
- Add Float6 = 0x0D constant in Constants.h
- Add setFloat6/getFloat6/setFloat6Py/getFloat6Py declarations in Block.h
- Implement floatToFloat6/float6ToFloat converters with E3M2 semantics (1s/3e/2m, bias=3, no NaN/Inf)
- Implement all four Block methods with NPY_FLOAT numpy array support

…nding
- Add setFloat6_/getFloat6_ function pointer members in Variable.h
- Add public setFloat6/getFloat6 accessor declarations in Variable.h
- Add 4 case rim::Float6 dispatch blocks in Variable.cpp
- Add NULL initialization for Float6 function pointers
- Add C++ accessor wrapper implementations
- Expose Float6 constant to Python in module.cpp

- Float6 E3M2 model with toBytes/fromBytes bit manipulation
- NaN/Inf inputs clamp to max finite (+/-28.0)
- Float6BE big-endian variant
- bitSize=8, ndType=float32, modelId=rim.Float6
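As an illustration of the E3M2 layout (1s/3e/2m, bias=3, no NaN/Inf), the decoder below maps every 6-bit code to a finite value. It is a sketch under two assumptions: the 6 bits occupy the low bits of the storage byte, and the function name is hypothetical.

```python
def float6_e3m2_decode(code):
    """Decode a 6-bit E3M2 code held in the low bits of a byte (assumed layout)."""
    code &= 0x3F
    sign = -1.0 if code & 0x20 else 1.0
    exp = (code >> 2) & 0x07
    man = code & 0x03
    if exp == 0:
        # Subnormal: (man / 4) * 2**(1 - bias) = man * 2**-4, no loop needed.
        return sign * man * 2.0 ** -4
    # Normal: every exponent value, including 7, encodes a finite number.
    return sign * (1.0 + man / 4.0) * 2.0 ** (exp - 3)
```

The largest code, 0x1F, decodes to 1.75 * 2**4 = 28.0, which matches the clamp value quoted in this commit.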
- Create test_float6.py with metadata, boundary, bit pattern, NaN/Inf, remote variable tests
- Add Float6Var to test_model_variables.py integration test
- Fix float6ToFloat C++ subnormal decode using direct formula instead of normalization loop

- Create float6.rst with autoclass directives for Float6/Float6BE
- Add float6 to API index toctree
- Add Float6/Float6BE rows to model type table and family list

… methods
- Add Float4 = 0x0E constant in Constants.h
- Add setFloat4/getFloat4/setFloat4Py/getFloat4Py declarations in Block.h
- Implement floatToFloat4/float4ToFloat converters with E2M1 semantics (1s/2e/1m, bias=1, no NaN/Inf)
- Implement Block set/get methods for Float4 with NPY_FLOAT numpy array support

…nding
- Add setFloat4_/getFloat4_ function pointer members in Variable.h
- Add public setFloat4/getFloat4 accessor declarations in Variable.h
- Add 4 case rim::Float4: blocks in Variable.cpp switch statements
- Add NULL initialization for Float4 function pointers
- Add C++ accessor wrapper implementations for Float4
- Expose Float4 constant to Python in module.cpp

- Float4 E2M1 model with manual bit manipulation encode/decode
- NaN/Inf inputs clamp to max finite (+/-6.0), no special encodings
- Float4BE big-endian variant
- ndType=float32, bitSize assertion enforces 8-bit storage
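With only eight magnitudes, E2M1 (1s/2e/1m, bias=1) can be shown as a lookup table. The sketch below is illustrative rather than the model's actual encode/decode: the names are hypothetical, the 4 bits are assumed to sit in the low nibble, nearest-value search stands in for bit manipulation, and NaN is mapped to the positive maximum, as a later commit in this PR describes for the C++ converters.

```python
import math

# Magnitudes for codes 0..7: exponent field 0 is subnormal (0, 0.5), then 1..3.
FLOAT4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def float4_e2m1_decode(code):
    """Decode a 4-bit E2M1 code held in the low nibble (assumed layout)."""
    mag = FLOAT4_MAGNITUDES[code & 0x07]
    return -mag if code & 0x08 else mag


def float4_e2m1_encode(value):
    """Encode to the nearest representable value; NaN/Inf clamp to max finite."""
    if math.isnan(value):
        return 0x07                          # NaN -> +6.0 (assumed sign choice)
    sign = 0x08 if math.copysign(1.0, value) < 0 else 0x00
    mag = min(abs(value), 6.0)               # infinity keeps its sign, clamps to 6.0
    code = min(range(8), key=lambda c: abs(FLOAT4_MAGNITUDES[c] - mag))
    return sign | code
```

Every one of the 16 codes round-trips through `float4_e2m1_encode(float4_e2m1_decode(code))`, including negative zero (code 0x08).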
- Initialize setFloat16_/getFloat16_ and setFloat8_/getFloat8_ to NULL alongside all other function pointers in Variable constructor; missing initialization left indeterminate values that made NULL checks unreliable
- Fix floatToHalf() subnormal encoding: remove redundant >> 13 after mantissa >>= shift which flushed all half-precision subnormals to zero
- Fix floatToHalf() NaN encoding: ensure half mantissa payload is non-zero when the float32 NaN payload fits in fewer than 13 bits, preventing mis-encoding as infinity (0x7C00)
- Fix halfToFloat() subnormal decoding: use int32_t for the working exponent in the normalization loop so it can go negative without wrapping to UINT32_MAX and producing invalid float results
- Add true half-precision subnormal (2**-24) to the Float16 integration test to cover the floatToHalf/halfToFloat denormal path
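The expected bit patterns behind these fixes can be checked independently with Python's `struct` module, whose `e` format code is IEEE 754 half precision. The helper names below are illustrative; this is a reference for the values involved, not the C++ floatToHalf/halfToFloat code.

```python
import math
import struct


def half_bits(value):
    """Return the 16-bit half-precision pattern for a Python float."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits


def half_from_bits(bits):
    """Return the Python float for a 16-bit half-precision pattern."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


smallest_subnormal = half_bits(2.0 ** -24)   # must not flush to zero
nan_pattern = half_bits(math.nan)            # exponent all ones, mantissa non-zero
inf_pattern = half_bits(math.inf)            # 0x7C00, mantissa zero
```

A correct converter maps 2**-24 to pattern 0x0001 and back, and never produces 0x7C00 for a NaN input.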
…type
The normalization loop in float8ToFloat checked bit 2 (0x04) for the implicit leading 1, but Float8's 3-bit stored mantissa has its leading bit at position 3 (0x08). Stopping one shift early produced a 2x error for every subnormal value (e.g. 0x01 decoded to 2^-8 instead of 2^-9). Fix by checking bit 3 (0x08) and stripping with 0x07 (not 0x03), and use int32_t for the working exponent so the counter can go negative without wrapping. Add a RemoteVariable subnormal test (2^-9) to cover the C++ Block conversion path.
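The corrected loop can be mirrored in Python to check it against the closed-form subnormal value man * 2**-9. This is a sketch of the technique described in the commit message, not a transcription of the C++ source; the function name and the starting exponent (121, the float32 biased exponent for 2**-6, which assumes an E4M3 bias of 7) are assumptions.

```python
import struct


def float8_subnormal_to_float(man):
    """Normalize a Float8 subnormal mantissa (1..7) into a float32 value."""
    exp = 121                    # float32 biased exponent for 2**-6 (127 - 6)
    while not (man & 0x08):      # shift until the implicit leading 1 reaches bit 3
        man <<= 1
        exp -= 1                 # a signed counter, so it may keep decreasing
    man &= 0x07                  # strip the now-implicit leading 1
    bits = (exp << 23) | (man << 20)   # 3 mantissa bits go to the top of 23
    (value,) = struct.unpack("<f", struct.pack("<I", bits))
    return value
```

For mantissa 1 the loop shifts three times, giving exponent 118 and the value 2**-9; checking bit 2 instead would stop after two shifts and give 2**-8, which is the 2x error the commit describes.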
Follow-up to the Copilot review fixes (commit 19691ae): audited all other new floating point types for the same 5 bug categories. Bug 4 (uint32_t underflow in normalization loop) also affected […] Float6 and Float4 are immune to both issues: their […] BFloat16 and TensorFloat32 have an analogous NaN-payload-truncation edge case (Bug 3): a float32 NaN with payload bits only in the lower 16/13 bits will round-trip as Inf rather than NaN. This is an inherent property of the truncation-based encoding (both formats are defined as bit-truncations of float32), not a coding error. Bug 1 (NULL init) for Float8 pointers was already caught and fixed in 19691ae alongside the Float16 fix.
cpplint requires controlled statements inside while clause brackets to be on separate lines (whitespace/newline rule).
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 12 comments.
C++:
- floatToBFloat16(): detect NaN whose payload is entirely in the lower 16 truncated bits and force a NaN mantissa bit so it cannot become Inf
- floatToTensorFloat32(): same fix for the lower 13 masked mantissa bits
Python:
- BFloat16.toBytes/fromBytes: use self.endianness (big vs little) when packing/unpacking the uint16; BFloat16BE was silently producing little-endian bytes
- BFloat16.toBytes: same NaN-preservation logic as C++ fix above
- TensorFloat32.toBytes/fromBytes: same endianness and NaN fixes
Tests:
- test_bfloat16_be_endianness: add known bit-pattern assertions for 1.0, +Inf, -Inf encoded in big-endian byte order
- test_tensorfloat32_be_endianness: add known bit-pattern assertion for 1.0 encoded in big-endian byte order
Docs:
- float_types_summary.rst: fix Float16 model-class column (was "Float")
- pyrogue_tree/core/model.rst: add Float16/Float16BE to floating-point model list
- api/cpp/constants.rst: add Float8, BFloat16, TF32, Float6, Float4
- api/cpp/model.rst: add anchor sections and Python doc links for all new model types (Float8, BFloat16, TF32, Float6, Float4)
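The NaN-preservation idea can be demonstrated directly on 32-bit patterns. In this sketch the function names are hypothetical, and forcing the top mantissa bit (0x00400000, the quiet bit) is one possible choice; the commit only says that a NaN mantissa bit is forced, not which one.

```python
BF16_MASK = 0xFFFF0000   # BFloat16 keeps the upper 16 bits of float32
TF32_MASK = 0xFFFFE000   # TensorFloat32 drops the lower 13 mantissa bits

EXP_MASK = 0x7F800000
MAN_MASK = 0x007FFFFF


def naive_truncate(bits32, mask):
    """Plain truncation: a NaN whose payload is only in dropped bits becomes Inf."""
    return bits32 & mask


def nan_safe_truncate(bits32, mask):
    """Truncate, but keep NaN a NaN by forcing a surviving mantissa bit."""
    out = bits32 & mask
    is_nan = (bits32 & EXP_MASK) == EXP_MASK and (bits32 & MAN_MASK) != 0
    if is_nan and (out & MAN_MASK) == 0:
        out |= 0x00400000        # assumed choice: set the top mantissa bit
    return out
```

The pattern 0x7F800001 is a NaN whose payload lives only in the lowest bit: plain truncation with `BF16_MASK` yields 0x7F800000 (+Inf), while the guarded version yields 0x7FC00000. Working on integer patterns avoids depending on how a particular platform converts NaN payloads between float widths.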
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.
…ble rows
- floatToFloat6/floatToFloat4: NaN has no meaningful sign so always clamp to positive max; infinity preserves sign to match Python model behavior
- Float16.toBytes(): document that struct.pack uses round-to-nearest-even while the C++ floatToHalf uses truncation (difference is at most 1 ULP)
- model.rst: add missing Float16/Float16BE rows to Built-In Model Types table
- Add remote variable NaN clamping tests for Float6/Float4 through C++ path
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.
- Fix "C++ filed point" → "C++ fixed point" typo in Variable.h section header
- Add missing Float16BE to big-endian variant list in float_types_summary.rst
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.
The pre-release merge brought the wide signed Fixed32/40/64 test variables into test_model_variables.py alongside the Float16/8/BFloat16/TF32/Float6/Float4 variables already on this branch. Both sets used offsets 0x28/0x30/0x38, so Block::addVariables rejected the device with a bit-overlap GeneralError and every test in the file failed at construction time. Move the Fixed* entries into the free region after UFixedListVar (0x68/0x70/0x78), which is well within the 0x4000 emulated memory window.
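The kind of collision described here is easy to reproduce with a small bit-range check. The sketch below is only an illustration of the overlap condition, not the logic in Block::addVariables. The Float8Var, BFloat16Var, and TF32Var offsets come from the commits above; the Fixed32Var/Fixed40Var/Fixed64Var names and their pairing with individual offsets are assumptions made for the example.

```python
from itertools import combinations


def find_overlaps(variables):
    """Return name pairs whose bit ranges intersect.

    variables: iterable of (name, byte_offset, bit_size) tuples.
    """
    overlaps = []
    for (n1, o1, b1), (n2, o2, b2) in combinations(variables, 2):
        start1, end1 = o1 * 8, o1 * 8 + b1
        start2, end2 = o2 * 8, o2 * 8 + b2
        if start1 < end2 and start2 < end1:   # half-open ranges intersect
            overlaps.append((n1, n2))
    return overlaps


float_vars = [("Float8Var", 0x30, 8), ("BFloat16Var", 0x32, 16), ("TF32Var", 0x34, 32)]

# Hypothetical names and offset assignment for the merged Fixed* variables.
fixed_before = [("Fixed32Var", 0x28, 32), ("Fixed40Var", 0x30, 40), ("Fixed64Var", 0x38, 64)]
fixed_after = [("Fixed32Var", 0x68, 32), ("Fixed40Var", 0x70, 40), ("Fixed64Var", 0x78, 64)]

conflicts_before = find_overlaps(float_vars + fixed_before)
conflicts_after = find_overlaps(float_vars + fixed_after)
```

Under these assumptions a 40-bit variable at 0x30 covers bytes 0x30 through 0x34 and collides with all three float variables, while the relocated layout produces no conflicts.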
…del.rst
Convert the new Float8/BFloat16/TensorFloat32/Float6/Float4 rows (and BE variants) in the Built-In Model Types table to use :ref: cross-references consistent with every other row. Correct the Bit Size column for Float6 and Float4 from 8-bits (the storage size) to 6-bits and 4-bits (the format width). Widen columns so BFloat16/TF32 notes and the tensorfloat32be ref fit within their grid cells.
Summary
Extends Rogue's variable model system with six new floating point types for NVIDIA GPU interoperability.
New types
- pr.Float16 / pr.Float16BE
- pr.Float8 / pr.Float8BE
- pr.BFloat16 / pr.BFloat16BE
- pr.TensorFloat32 / pr.TensorFloat32BE
- pr.Float6 / pr.Float6BE
- pr.Float4 / pr.Float4BE
Implementation
Each type follows the same full-stack pattern:
- Constants.h
- Block.cpp
- Variable.cpp
- Model class with toBytes/fromBytes/fromString/minValue/maxValue
Special-value handling per spec:
- 0x7F, no infinity (clamps to max finite)
Documentation
Added docs/src/pyrogue_tree/core/float_types_summary.rst, a consolidated quick-reference page with format details table, special-value handling, NVIDIA architecture support matrix, and usage example.
Test plan
- pytest tests/core/test_float16.py: all boundary values, round-trip, metadata, remote variable integration
- pytest tests/core/test_float8.py: all boundary values, round-trip, metadata, remote variable integration
- pytest tests/core/test_bfloat16.py: all boundary values, round-trip, metadata, remote variable integration
- pytest tests/core/test_tensorfloat32.py: all boundary values, round-trip, metadata, remote variable integration
- pytest tests/core/test_float6.py: all 64 bit patterns, boundary values, NaN/Inf clamping, remote variable integration
- pytest tests/core/test_float4.py: all 16 bit patterns, boundary values, NaN/Inf clamping, remote variable integration
Resolves ESROGUE-730.