Skip to content

Conversation

@Ravenwater
Copy link
Contributor

@Ravenwater Ravenwater commented Aug 22, 2025

bug fixes and completing bfloat16 math library support

Summary by CodeRabbit

  • New Features

    • Bessel function family; extensive bfloat16 math APIs and constants; mixed-precision Conv2D benchmarking apps; bfloat16 command-line tooling.
  • Tests

    • New bfloat16 and Bessel test suites, BFLOAT16 performance benchmarks, expanded configurable math verification harness; some test runners simplified to always succeed.
  • Documentation

    • Added cross-distribution Linux packaging guide and Bessel function overview.
  • Chores

    • Version bumped to v3.83; CI now reruns failed tests and uploads test logs.

Ravenwater and others added 30 commits August 10, 2025 13:22
Fixes odr linking issues with multiple TU
…nt the hypot intermediate so we need to go through double
@Ravenwater Ravenwater self-assigned this Aug 22, 2025
@coderabbitai
Copy link

coderabbitai bot commented Aug 22, 2025

Walkthrough

Minor version bumped to 3.83 across CI/CMake/Docker; BFLOAT → bfloat16 support added widely (types, constants, math wrappers, tests, benchmarks); new mixed‑precision Conv2D framework and apps added; Bessel functions and cfloat long-double constants introduced; verification/test harness refactored and many test/tools migrated.

Changes

Cohort / File(s) Summary
CI/CD & Versioning
CMakeLists.txt, .github/workflows/cmake.yml, docker/build_release_container.sh, docker/build_test_container.sh
Default minor version bumped to 83; GitHub Actions: added ctest run, conditional rerun-failed, upload of LastTest.log; Docker image tags updated to v3.83; BFLOAT build path in CMake switched to static/bfloat16.
bfloat16 core & API
include/sw/universal/number/bfloat16/* (many new/modified headers), .../bfloat16.hpp, .../bfloat16_impl.hpp, .../bfloat16_fwd.hpp, .../numeric_limits.hpp
Introduces/renames bfloat16 headers and forward declarations; adds inline/free helpers (parse, to_binary, abs, sqrt, floor), changes member signatures (setbits, isinteger, ispos), refines NaN/Inf/classification logic and inlines many functions.
bfloat16 math functions
include/sw/universal/number/bfloat16/math/functions/*
Adds many inline math wrappers for bfloat16: classify, exponent, logarithm, pow, sqrt, trigonometry, hyperbolic, hypot, fractional (fmod/remainder/frac), truncate (trunc/round/floor/ceil), min/max, nextafter, error/gamma (erf/erfc/tgamma).
bfloat16 constants
include/sw/universal/number/bfloat16/math/constants/constants.hpp
Adds 17 inline bfloat16 math constants (pi variants, reciprocals, sqrt/log/e constants) exposed in sw::universal.
cfloat / long-double constants
include/sw/universal/number/cfloat/math/constants/constants.hpp, include/sw/math/constants/longdouble_constants.hpp, include/sw/universal/number/cfloat/math/functions/sqrt.hpp
Adds cfloat long-double precision constants header; expands long-double constants suite; repaths cfloat sqrt include.
Mixed‑precision Conv2D frameworks & apps
applications/mixed-precision/dnn/conv2d-1.hpp, .../conv2d-1.cpp, conv2d-2.hpp, conv2d-2.cpp
Adds MixedPrecisionConfig, Tensor4D, DirectConv2D/Conv2D implementations (direct + tiled + placeholder), ConvolutionBenchmark/BenchmarkRunner utilities, and two standalone benchmark apps exercising multiple numeric types.
Benchmarks: bfloat16 & arithmetic
benchmark/performance/arithmetic/CMakeLists.txt, benchmark/performance/arithmetic/bfloat16/*, benchmark/performance/arithmetic/bfloat16/performance.cpp
Adds BFLOAT16 benchmark target, CMake rules and a bfloat16 performance executable covering copy, decode, arithmetic workloads (with optional normalize/convert stubs).
Tests, tooling & migration to bfloat16
static/bfloat16/*, static/appenv/multifile/*, tools/cmd/bfloat.cpp, static/bfloat16/math/*.cpp, static/bfloat16/api/*, static/bfloat16/api/constants.cpp
New and migrated test/runners and examples targeting bfloat16: constants tests, mathlib/pow tests, compilation tests, an appenv polynomial example; updates to includes and small test assertions (e.g., isone).
Verification & test-harness refactor
include/sw/universal/verification/test_suite_mathlib.hpp, include/sw/universal/verification/test_suite.hpp, include/sw/universal/benchmark/performance_runner.hpp
Generalizes Verify* functions to accept RefType and maxSamples, adds TEST_SUITE_MATHLIB_MAX_ERRORS cap, uniform NaN handling and sample caps; explicit typed comparisons (e.g., Scalar(0)) and an explicit cast in conversions.
Bessel functions, docs & tests
include/sw/math/functions.hpp, include/sw/math/functions/bessel.hpp, include/sw/math/functions/bessel.md, numeric/functions/bessel.cpp
Adds comprehensive Bessel family implementations (J, Y, I, K), documents algorithms in bessel.md, exposes bessel via top-level math/functions.hpp, and adds a numeric test harness.
Performance/test flow simplifications
benchmark/performance/arithmetic/posit/performance.cpp, c_api/shim/test/posit/performance.c, mixedprecision/roots/rpoly.cpp
Removes STRESS_TESTING paths and failure-count-based conditional exits; centralizes startup tag printing; decode diagnostics now print hex with type labels; several mains now return EXIT_SUCCESS unconditionally on normal completion.
Math & numerics fixes
include/sw/blas/solvers/qr.hpp, include/sw/universal/number/dd/math/constants/dd_constants.hpp, numeric/functions/factorial.cpp, numeric/utils/lookup_arithmetic.cpp
Simplified redundant bounds check in QR solver; corrected dd sqrt constants (sqrt3, sqrt5); updated license headers/SPDX metadata.
Constexpr & demos
static/cfloat/api/constexpr.cpp, static/cfloat/api/...
Adds constexpr constants demo for cfloat types and test invocations; LONG_DOUBLE_CONSTEXPR_SUPPORT conditional.
Misc small changes
tools/*, static/*, other headers
Various include-path reorganizations, header inlining, and formatting/newline fixes across many files.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant App as conv2d-1/conv2d-2 Main
  participant Bench as ConvolutionBenchmark / BenchmarkRunner
  participant Conv as DirectConv2D / Conv2D
  participant Worker as Parallel Workers

  User->>App: start benchmark
  App->>Bench: benchmark_direct_conv(params)
  Bench->>Bench: allocate/init tensors, warmup
  Bench->>Conv: Conv::forward(input, weights, output, params)
  alt params.use_parallel && N>1
    Conv->>Worker: spawn per-batch tasks (futures/threads)
    Worker-->>Conv: partial outputs
  else
    Conv->>Conv: single-threaded MAC loops
  end
  Conv-->>Bench: output ready
  Bench->>Bench: compute time, GFLOPS, memory
  Bench-->>App: BenchmarkResult
  App-->>User: print results
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

I twitched my whiskers — v3.83,
Bessels bloom and BF16 hops free.
Convs crunch, benchmarks drum a beat,
Constants gleam, tests hum on repeat.
A rabbit signs the patch with glee. 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch v3.83

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 32

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (12)
numeric/utils/lookup_arithmetic.cpp (1)

140-143: operator+= should return a reference

The signature promises a reference but the function doesn’t return one. This will trigger undefined behavior when used.

Apply:

-			posit& operator+=(const posit& b) {
+			posit& operator+=(const posit& b) {
 				uint16_t index = (bits << 8) & b.bits;
 				std::cout << index << std::endl;
+				return *this;
 			}
include/sw/universal/number/dd/math/constants/dd_constants.hpp (2)

18-19: Fix incorrect dd_pi_3 constant to true π/3 value

The dd_pi_3 constant was mistakenly defined as π/2. Update its hi/lo pair to the correct split for π/3 as computed with 80-digit precision.

• File: include/sw/universal/number/dd/math/constants/dd_constants.hpp
Line 18

- constexpr dd dd_pi_3     (1.570796326794896558e+00, 6.123233995736766036e-17); // pi/2
+ constexpr dd dd_pi_3     (1.047197551196597853e+00, -1.072081766451090985e-16); // pi/3

No other occurrences of dd_pi_3 were found in the codebase—this change restores correct behavior for any computations relying on π/3.


37-39: Fix incorrect dd_sqrt3 and dd_sqrt5 definitions
The current definitions for dd_sqrt3 and dd_sqrt5 in
include/sw/universal/number/dd/math/constants/dd_constants.hpp
(around lines 37–39) mistakenly duplicate the hi/lo for √2. They must be updated to the correct double-double splits:

Proposed diff:

--- a/include/sw/universal/number/dd/math/constants/dd_constants.hpp
+++ b/include/sw/universal/number/dd/math/constants/dd_constants.hpp
@@ -34,7 +34,7 @@
 constexpr dd dd_sqrt2    (1.41421356237309510e+00, -9.6672933134529122e-17); // sqrt(2)
-constexpr dd dd_sqrt3    (1.41421356237309510e+00, -9.6672933134529122e-17); // sqrt(3)
-constexpr dd dd_sqrt5    (1.41421356237309510e+00, -9.6672933134529122e-17); // sqrt(5)
+constexpr dd dd_sqrt3    (1.73205080756887720e+00,  1.0035084221806903e-16); // sqrt(3)
+constexpr dd dd_sqrt5    (2.23606797749979000e+00, -1.0864230407365012e-16); // sqrt(5)

These hi/lo values were computed via high-precision splitting of √3 and √5.
Please apply these critical fixes before release.

CMakeLists.txt (1)

285-294: SSE3 flag variable mismatch prevents enabling SSE3 on GCC/Clang.

You check COMPILER_HAS_SSE_FLAG but only define COMPILER_HAS_SSE3_FLAG; the condition never succeeds.

Apply:

- check_cxx_compiler_flag("-msse3" COMPILER_HAS_SSE3_FLAG)
...
- if (UNIVRSL_USE_SSE3 AND COMPILER_HAS_SSE_FLAG)
+ check_cxx_compiler_flag("-msse3" COMPILER_HAS_SSE3_FLAG)
...
+ if (UNIVRSL_USE_SSE3 AND COMPILER_HAS_SSE3_FLAG)
mixedprecision/roots/rpoly.cpp (1)

35-43: Tail() has two correctness bugs: size_t underflow and out-of-bounds writes.

  • Underflow: vector.size() - size wraps when size > vector.size().
  • OOB: writing tail[i] while i ranges up to vector.size()-1 exceeds tail's last index (size-1) when startIdx > 0.

Fix:

-template <typename Real>
-sw::numeric::containers::vector<Real> Tail(const sw::numeric::containers::vector<Real>& vector, size_t size){
-    sw::numeric::containers::vector<Real> tail(size);
-    size_t startIdx = vector.size() - size > 0 ? vector.size() - size : 0;
-    for (size_t i = startIdx; i < vector.size(); ++i){
-        tail[i] = vector[i - startIdx];
-    }
-    return tail;
-}
+template <typename Real>
+sw::numeric::containers::vector<Real> Tail(const sw::numeric::containers::vector<Real>& vector, size_t size) {
+    sw::numeric::containers::vector<Real> tail(size); // default-initialized to 0s
+    const size_t startIdx = (size >= vector.size()) ? 0 : (vector.size() - size);
+    for (size_t i = startIdx; i < vector.size(); ++i) {
+        tail[i - startIdx] = vector[i];
+    }
+    return tail;
+}
docs/linux-packages.md (1)

1-303: Backslashes are escaping Markdown and breaking rendering throughout this page

Many headings, emphasis, and identifiers are prefixed with backslashes (e.g., “#”, “##”, “**…**”, “_”), so GitHub will render them literally instead of as headers/emphasis. This significantly degrades readability.

Here’s a representative cleanup for the top section (apply similarly throughout):

- \# Linux Package engineering
+# Linux Package engineering

- Creating Linux packages for your Universal Number Library involves several interconnected processes. Lets break down the comprehensive workflow for packaging, building, testing, and distributing your header-only C++ library across major Linux distributions.
+ Creating Linux packages for your Universal Number Library involves several interconnected processes. Let's break down the comprehensive workflow for packaging, building, testing, and distributing your header-only C++ library across major Linux distributions.

- \## Package Format Selection
+ ## Package Format Selection

- \- \*\*DEB packages\*\* (Debian, Ubuntu, derivatives)
+ - **DEB packages** (Debian, Ubuntu, derivatives)

Also remove escapes before underscores in code fragments:

- set(CPACK\_PACKAGE\_VERSION "${PROJECT\_VERSION}")
+ set(CPACK_PACKAGE_VERSION "${PROJECT_VERSION}")

If helpful, I can provide a quick script to sanitize the whole file with safe regexes.

include/sw/universal/number/bfloat/bfloat16_fwd.hpp (1)

1-14: Forward header uses std::string but doesn’t include

Reliance on transitive includes makes this fragile. Add the include to keep the forward header self-contained.

 #pragma once
@@
-namespace sw { namespace universal {
+#include <string>
+
+namespace sw { namespace universal {
static/bfloat/math/pow.cpp (1)

14-16: Fix SFINAE: current enable_if usage won’t compile

The second template parameter is malformed. You’re passing std::is_floating_point<Ty>::type as the boolean and then accessing ::value on enable_if. Use enable_if_t (or enable_if<..., Ty>::type) with the .value from the trait.

Apply:

-template<typename Ty,
-    typename = typename std::enable_if<std::is_floating_point<Ty>::type, Ty>::value
->
+template<typename Ty,
+         typename = std::enable_if_t<std::is_floating_point<Ty>::value, Ty>
+>
include/sw/universal/number/bfloat/mathlib.hpp (1)

30-46: ipow() mishandles negative exponents (casts to uint64_t) → wrong results

Casting a negative integer exponent to uint64_t wraps to a huge value, producing nonsense or overflow. Implement exponentiation-by-squaring with signed exponent handling; return 1/(a^|b|) for negative b. Use wider intermediates for stability.

-        inline bfloat16 ipow(bfloat16 a, bfloat16 b) {
-            // precondition
-            if (!a.isinteger() || !b.isinteger()) return bfloat16(0);
-
-			//using ComputeType = bfloat16;  bfloat16 does have enough precision to handle the intermediate results
-            using ComputeType = float;
-            ComputeType result{ 1 };
-            ComputeType base = ComputeType(a);
-            uint64_t exp = uint64_t(b);
-            for (;;) {
-                if (exp & 0x1) result *= base;
-                exp >>= 1;
-                if (exp == 0) break;
-                base *= base;
-            }
-            return bfloat16(result);
-        }
+        inline bfloat16 ipow(bfloat16 a, bfloat16 b) {
+            // precondition
+            if (!a.isinteger() || !b.isinteger()) return bfloat16(0);
+
+            using ComputeType = double;  // more headroom for intermediates
+            ComputeType base = static_cast<ComputeType>(a);
+            int64_t exp = static_cast<int64_t>(b);
+            bool neg = (exp < 0);
+            uint64_t uexp = neg ? static_cast<uint64_t>(-exp) : static_cast<uint64_t>(exp);
+
+            ComputeType result{1};
+            while (uexp) {
+                if (uexp & 1u) result *= base;
+                uexp >>= 1;
+                if (uexp == 0) break;
+                base *= base;
+            }
+
+            if (neg) {
+                if (static_cast<ComputeType>(a) == ComputeType(0)) {
+                    return bfloat16(std::numeric_limits<ComputeType>::infinity());
+                }
+                result = ComputeType(1) / result;
+            }
+            return bfloat16(result);
+        }
include/sw/universal/number/bfloat/bfloat16_impl.hpp (2)

444-448: parse() is a stub — operator>> always fails. Implement minimal parsing.

Returning false unconditionally breaks formatted input (operator>> prints an error on every read). Provide at least a basic implementation that supports decimal/hexfloat strings via std::stof and the existing 0b… bitstring via assign().

Apply:

-inline bool parse(const std::string& number, bfloat16& value) {
-	bool bSuccess = false;
-	value.zero();
-	return bSuccess;
-}
+inline bool parse(const std::string& number, bfloat16& value) {
+	value.zero();
+	// Support binary bitstring form used elsewhere in the library: 0b<sign>.<exp>.<frac>
+	if (number.size() >= 2 && number[0] == '0' && number[1] == 'b') {
+		// assign() reports to std::cerr on malformed input and resets to 0
+		value.assign(number);
+		return true; // cannot reliably detect assign() failure, but 0 is valid; leave diagnostics to assign()
+	}
+	// Fallback: decimal / scientific / inf / nan via std::stof
+	try {
+		float f = std::stof(number);
+		value = bfloat16(f);
+		return true;
+	} catch (...) {
+		return false;
+	}
+}

467-472: Bug: to_binary uses “0x0.” prefix for zero case. Should be “0b0.”

Mixed binary/hex prefixes are confusing and inconsistent with the non-zero case.

Apply:

-	s << (bits & mask ? "0b1." : "0x0.");
+	s << (bits & mask ? "0b1." : "0b0.");
include/sw/universal/verification/test_suite_mathlib.hpp (1)

279-301: exp2 verifier uses std::exp for dref filter

The filter should be consistent with exp2. Use std::exp2 for the reference underflow check.

Apply this diff:

-            RefType dref = std::exp(da);
+            RefType dref = std::exp2(da);
🧹 Nitpick comments (82)
include/sw/universal/number/cfloat/math/functions/sqrt.hpp (2)

8-10: Be explicit: include for std::sqrt; optionally gate iostream usage under a trace macro.
Avoid relying on transitive includes for C math and keep iostream out of headers unless needed.

Apply this small header tweak:

+#include <cmath>   // for std::sqrt
 #include <universal/native/ieee754.hpp>
 #include <universal/number/cfloat/math/functions/sqrt_tables.hpp>

If you want to keep the debug trace below without hard-depending on iostream, gate it:

+#if defined(CFLOAT_SQRT_TRACE)
+#include <iostream>
+#endif

37-42: Remove or compile-time gate the debug print inside the Babylonian method.
This is in a header, so unconditional std::cout adds I/O and iostream dependency to all TUs.

Either remove it:

-           std::cout << " x_n+1: " << x_next << " x_n: " << x_n << " diff " << diff << std::endl;

Or gate it behind a macro (together with the conditional include from the previous comment):

+#if defined(CFLOAT_SQRT_TRACE)
            std::cout << " x_n+1: " << x_next << " x_n: " << x_n << " diff " << diff << std::endl;
+#endif
include/sw/universal/number/cfloat/mathlib.hpp (2)

9-22: Consider providing deprecation shims at the old include paths to ease downstream migration.
This keeps external consumers from breaking immediately and allows a clean deprecation window.

Example shim (at include/sw/universal/number/cfloat/math/sqrt.hpp):

#pragma once
// Deprecated: use <universal/number/cfloat/math/functions/sqrt.hpp> instead.
#include <universal/number/cfloat/math/functions/sqrt.hpp>

If acceptable, similar shims can be added for the other moved headers listed in this block.


30-33: Minor naming consistency nit: hasSubnormal(s) vs hasSubnormals.
Template parameter names differ between files (singular vs plural). Not functionally wrong, but harmonizing reduces cognitive load.

Suggestion: standardize on either hasSubnormal/hasSupernormal or hasSubnormals/hasSupernormals across cfloat headers for consistency.

Also applies to: 51-56

numeric/utils/lookup_arithmetic.cpp (1)

95-97: Typo: “uniary” → “unary”

Minor documentation fix to keep comments polished.

Apply:

-// generate a look-up table for uniary operators
+// generate a look-up table for unary operators
numeric/functions/factorial.cpp (1)

1-1: BOM present at start of file

There is a UTF-8 BOM character before the first comment. Some toolchains/scripts may choke on it. Safe to remove.

Apply:

-// factorial.cpp: evaluation of factorials in the posit number systems
+// factorial.cpp: evaluation of factorials in the posit number systems
docker/build_release_container.sh (1)

11-18: Harden the script and de-duplicate version source

  • Add strict bash flags to fail fast.
  • Consider sourcing version from a single file (e.g., scripts/version.sh or CMake cache) to avoid manual bumps in multiple places.

Apply:

-#!/usr/bin/env bash
+#!/usr/bin/env bash
+set -Eeuo pipefail

 MAJOR=v3
 MINOR=83
 VERSION="$MAJOR.$MINOR"

Optionally, centralize version:

- MAJOR=v3
- MINOR=83
- VERSION="$MAJOR.$MINOR"
+. ./scripts/version.sh  # exports VERSION=v3.83
include/sw/blas/solvers/qr.hpp (1)

105-105: Bound check simplification is safe as written, but make it future-proof

Given n is currently set to m, removing the n-bound is correct. To guard against future edits, derive n from Q’s shape and keep a single bound on n.

Apply:

-                size_t n = m;
-                if (j > m - 1) { std::cerr << "Index out of bounds\n"; return false; }
+                size_t n = num_cols(Q);
+                if (j >= n) { std::cerr << "Index out of bounds\n"; return false; }
include/sw/universal/number/dd/math/constants/dd_constants.hpp (1)

49-50: Nit: fix comment wording/parenthesis

Minor doc cleanup.

Apply:

-// 2 / sqrt(pi) = 1 / (sqrt(pi/4)
+// 2 / sqrt(pi) = 1 / sqrt(pi/4)
CMakeLists.txt (1)

102-104: Duplicate option UNIVRSL_BUILD_TYPE_HIERARCHY.

The option is declared twice; keep a single authoritative definition to avoid user confusion.

Also applies to: 112-114

mixedprecision/roots/rpoly.cpp (1)

12-21: Missing include; relies on transitive includes.

This TU uses std::vector (e.g., quadratic roots, convergence checks). Add the header to avoid brittle builds.

 #include <iostream>
 #include <iomanip>
 
 #include <complex>
+#include <vector>
include/sw/universal/number/bfloat/math/functions/minmax.hpp (1)

11-17: Consider IEEE-style NaN-aware min/max and noexcept.

std::min/max do not implement IEEE fmin/fmax semantics for NaNs. If the BF16 math layer aims to mirror standard math behavior, prefer std::fminf/std::fmaxf on float and mark as noexcept.

-inline bfloat16 min(bfloat16 x, bfloat16 y) {
-	return bfloat16(std::min(float(x), float(y)));
+inline bfloat16 min(bfloat16 x, bfloat16 y) noexcept {
+	return bfloat16(std::fminf(float(x), float(y)));
 }
 
-inline bfloat16 max(bfloat16 x, bfloat16 y) {
-	return bfloat16(std::max(float(x), float(y)));
+inline bfloat16 max(bfloat16 x, bfloat16 y) noexcept {
+	return bfloat16(std::fmaxf(float(x), float(y)));
 }

Note: requires #include <cmath>.

include/sw/universal/number/bfloat/math/functions/next.hpp (1)

27-27: inline on nextafter is correct for a header; consider noexcept.

Marking the function as noexcept clarifies intent and may enable better codegen.

-inline bfloat16 nextafter(bfloat16 x, bfloat16 target) {
+inline bfloat16 nextafter(bfloat16 x, bfloat16 target) noexcept {
include/sw/universal/number/bfloat/math/functions/logarithm.hpp (2)

12-14: Unify conversion precision (float vs double) across BF16 math wrappers

pow(bfloat16, …) uses float conversions, while these log wrappers use double. After rounding back to bfloat16, the extra precision rarely changes results but can cost cycles on some targets and is inconsistent. Consider standardizing on float here for consistency and potential perf wins.

Apply this diff if you prefer float like pow.hpp:

-inline bfloat16 log(bfloat16 x) {
-	return bfloat16(std::log(double(x)));
+inline bfloat16 log(bfloat16 x) {
+	return bfloat16(std::log(float(x)));
 }
 
-inline bfloat16 log2(bfloat16 x) {
-	return bfloat16(std::log2(double(x)));
+inline bfloat16 log2(bfloat16 x) {
+	return bfloat16(std::log2(float(x)));
 }
 
-inline bfloat16 log10(bfloat16 x) {
-	return bfloat16(std::log10(double(x)));
+inline bfloat16 log10(bfloat16 x) {
+	return bfloat16(std::log10(float(x)));
 }
 
-inline bfloat16 log1p(bfloat16 x) {
-	return bfloat16(std::log1p(double(x)));
+inline bfloat16 log1p(bfloat16 x) {
+	return bfloat16(std::log1p(float(x)));
 }

Also applies to: 17-19, 22-24, 27-29


12-14: Verify edge-case behavior and add tests (±0, negatives, subnormals, NaN/Inf)

Please ensure log(0) -> -inf, log(negative) and log1p(x≤-1) -> NaN/-inf propagate correctly through bfloat16, including signed zeros and subnormals. Add/extend tests in verification/test_suite_mathlib.hpp accordingly.

I can add targeted cases to VerifyLog/VerifyLog2/VerifyLog10/VerifyLog1p with ULP bounds appropriate for bfloat16. Want me to draft those?

Also applies to: 27-29

c_api/shim/test/posit/performance.c (2)

109-109: Prefer %zu for size_t when printing loop index

The cast to (int) is unnecessary and can truncate if NR_OPS grows. Print size_t directly.

Apply this diff at each occurrence:

-                printf("decode failed for %d\n", (int)i);
+                printf("decode failed for %zu\n", i);

Also applies to: 128-128, 147-147, 166-166


431-447: Unconditional EXIT_SUCCESS may hide regressions in automated environments

At present, main() returns EXIT_SUCCESS regardless of the “COPY/DECODE FAIL” flags printed inside workloads. If this binary ever gates CI, consider an opt-in failure mode (env var or compile-time flag) that exits non-zero on any detected FAIL.

Minimal approach (no signature changes): accumulate a global flag from the workloads and honor it in main() when, e.g., getenv("UNIVERSAL_PERF_STRICT") is set. I can sketch the patch if you want it wired under such an env flag.

include/sw/universal/number/bfloat/math/functions/pow.hpp (1)

12-25: LGTM: inline BF16 pow overloads; consider aligning precision policy with logarithm.hpp

The wrappers are clean and consistent. To avoid mixed policies across BF16 math, decide whether all wrappers should use float or double internally and apply uniformly.

If you standardize on float, no changes here. If on double, mirror the conversions used in logarithm.hpp.

include/sw/math/constants/longdouble_constants.hpp (1)

17-29: Naming consistency: “a_b reads a over b” vs products like ld_3pi/ld_4pi

The header documents the a_b convention (a over b). New product constants (ld_3pi, ld_4pi) don’t follow that convention literally (though ld_3pi_4 already existed). Consider documenting the exception for “k*pi” forms or adding aliases (e.g., ld_3_pi_over_1) to keep naming predictable for users discovering APIs by pattern.

If you want, I can add brief comments or aliases to clarify the pattern without breaking existing names.

include/sw/universal/number/bfloat/math/constants/bfloat_constants.hpp (1)

1-9: Make the header self-sufficient (ensure bfloat16 is visible).

This header assumes bfloat16 is already declared. To prevent include-order hazards, include the defining header here.

Apply this diff near the top:

 #pragma once
 // bfloat_constants.hpp: definition of math constants in bfloat16 precision
 //
 // Copyright (C) 2017 Stillwater Supercomputing, Inc.
 // SPDX-License-Identifier: MIT
 //
 // This file is part of the universal numbers project, which is released under an MIT Open Source license.
 
+#include <universal/number/bfloat/bfloat16.hpp>
+
 namespace sw { namespace universal {
static/cfloat/api/constexpr.cpp (3)

111-114: Remove duplicate 2/pi emission.

2/pi is printed twice (two_pi and two_pi2). Keep a single occurrence for clarity.

Apply this diff:

-	constexpr Cfloat two_pi2 = sw::universal::cf_2_pi;
-	std::cout << "2/pi        : " << to_binary(two_pi2) << " : " << two_pi2 << '\n';
+	// (2/pi) printed once above

And in the double-cast branch:

-	constexpr Cfloat two_pi2 = double(sw::universal::cf_2_pi);
-	std::cout << "2/pi        : " << to_binary(two_pi2) << " : " << two_pi2 << '\n';
+	// (2/pi) printed once above

Also applies to: 164-166, 187-191


100-115: Centralize LONG_DOUBLE_CONSTEXPR_SUPPORT at the TU top.

Defining a preprocessor switch inside a function is unconventional and can surprise readers. Move it next to the other config defines at the top of the file.

Apply this diff:

 #define CFLOAT_THROW_ARITHMETIC_EXCEPTION 1
 #include <universal/number/cfloat/cfloat.hpp>
+#define LONG_DOUBLE_CONSTEXPR_SUPPORT 0
 #include <universal/number/cfloat/math/constants/constants.hpp>

And remove the in-function define:

-#define LONG_DOUBLE_CONSTEXPR_SUPPORT 0

Also applies to: 15-17


164-214: Consider BIT_CAST_CONSTEXPR for portability of constexpr conversions.

Some compilers may not accept constexpr construction of Cfloat from double in all modes. Using the project’s BIT_CAST_CONSTEXPR (already used above) can improve portability.

Example pattern:

-constexpr Cfloat pi = double(sw::universal::cf_pi);
+BIT_CAST_CONSTEXPR Cfloat pi = double(sw::universal::cf_pi);

If you prefer to keep it as-is, at least we now gate long double via LONG_DOUBLE_CONSTEXPR_SUPPORT.

benchmark/performance/arithmetic/bfloat16/CMakeLists.txt (1)

1-3: Use CONFIGURE_DEPENDS with GLOB to pick up new sources without a manual reconfigure.

Plain GLOB can miss newly added files until CMake is rerun manually.

Apply this diff:

-file (GLOB SOURCES "./*.cpp")
+file(GLOB SOURCES CONFIGURE_DEPENDS "./*.cpp")
 
 compile_all("true" "bfloat16" "Benchmarks/Performance/Arithmetic/bfloat16" "${SOURCES}")
.github/workflows/cmake.yml (1)

42-55: Avoid double-running the entire test suite; consolidate into one step or clearly separate responsibilities

You currently:

  • Run CTest once (“Test” with -C).
  • Then run “Run all tests” again (now corrected per the other comment).

This doubles CI time. Either remove the first “Test” step, or fold the rerun into a single step. Two alternatives:

A) Keep separate steps; remove the initial “Test” step:

-      - name: Test
-        working-directory: ${{github.workspace}}/build
-        run: ctest -C ${{env.BUILD_TYPE}}

B) Single-step Bash (needs shell: bash on Windows runners):

-      - name: Test
-        working-directory: ${{github.workspace}}/build
-        run: ctest -C ${{env.BUILD_TYPE}}
+      - name: Test (with rerun-on-failure)
+        shell: bash
+        working-directory: ${{github.workspace}}/build
+        run: |
+          ctest -C "${{env.BUILD_TYPE}}" --output-on-failure || \
+          ctest -C "${{env.BUILD_TYPE}}" --rerun-failed --output-on-failure

Pick one to cut CI time and reduce complexity.

docker/build_test_container.sh (1)

1-29: Harden the script and make tagging behavior consistent across branches

  • Add strict bash flags to fail fast on errors and unset vars.
  • Optional: in the “else” branch also tag :latest for parity with the default branch (if desired policy-wise).

Proposed patch:

 #!/usr/bin/env bash
+set -euo pipefail

@@
 if [[ $# == 0 ]]; then
 	# default is to build with Clang 15
 	docker build --force-rm -t "stillwater/universal:$VERSION" -t stillwater/universal:latest -f Dockerfile.clang15 ..
 else 
@@
-	docker build --force-rm -t "stillwater/universal:$VERSION" --build-arg "target=$TARGET" -f "Dockerfile.$COMPILER" ..
+	docker build --force-rm -t "stillwater/universal:$VERSION" --build-arg "target=$TARGET" -f "Dockerfile.$COMPILER" ..
+  # Uncomment if you want :latest here too:
+  # docker tag "stillwater/universal:$VERSION" stillwater/universal:latest
include/sw/universal/number/bfloat/math/functions/sqrt.hpp (1)

12-15: Minor cleanup: rely on using-import or explicit namespace, not both

Either:

  • Keep using std::sqrt; and call sqrt(...), or
  • Drop the using and keep std::sqrt(...).

Also prefer explicit cast style consistency.

-inline bfloat16 sqrt(bfloat16 x) {
-	using std::sqrt;
-	return bfloat16(std::sqrt(float(x)));
-}
+inline bfloat16 sqrt(bfloat16 x) {
+  using std::sqrt;
+  return bfloat16(sqrt(static_cast<float>(x)));
+}
include/sw/universal/number/bfloat/manipulators.hpp (2)

99-135: Nibble marker for exponent never triggers due to incorrect condition

The condition uses (i - es) > 0 while i ranges [es-1..0], so it’s always false. This prevents nibble markers from ever appearing in the exponent field.

Fix:

-    for (int i = int(es) - 1; i >= 0; --i) {
-      s << cyan << (e.test(static_cast<size_t>(i)) ? '1' : '0');
-      if ((i - es) > 0 && ((i - es) % 4) == 0 && nibbleMarker) s << yellow << '\'';
-    }
+    for (int i = int(es) - 1; i >= 0; --i) {
+      s << cyan << (e.test(static_cast<size_t>(i)) ? '1' : '0');
+      if (i > 0 && (i % 4) == 0 && nibbleMarker) s << yellow << '\'';
+    }

70-76: Typo in comment: odd “¿” character for exponent range

Replace “¿126/¿127” with “-126/-127” for clarity and ASCII cleanliness.

-    // Exponents range from ¿126 to +127 because exponents of ¿127 (all 0s) and +128 (all 1s) are reserved for special numbers.
+    // Exponents range from -126 to +127 because exponents of -127 (all 0s) and +128 (all 1s) are reserved for special numbers.
static/bfloat/math/pow.cpp (2)

7-11: Prefer explicitly including standard headers used in this TU

std::cout, std::setw/std::setprecision, and std::pow are used. Relying on transitive includes from test headers is brittle.

Add minimal direct includes:

 #include <universal/utility/directives.hpp>
 #include <universal/number/bfloat/bfloat.hpp>
 #include <universal/verification/test_suite.hpp>
 #include <universal/verification/test_suite_mathlib.hpp>
+#include <type_traits>
+#include <iostream>
+#include <iomanip>
+#include <cmath>

51-51: Qualify iostream symbols in MANUAL_TESTING block

cout/endl are unqualified; if MANUAL_TESTING is enabled this won’t compile.

-    cout << endl;
+    std::cout << std::endl;
benchmark/performance/arithmetic/CMakeLists.txt (2)

15-15: Guard compile_all against empty GLOB to avoid configuration surprises

If the directory is absent or empty on some consumers’ clones, unguarded compile_all may cause a confusing target with no sources.

-compile_all("true" "benchmark_bfloat"  "Benchmarks/Performance/Arithmetic/bfloat16" "${BFLOAT16_SRC}")
+if (BFLOAT16_SRC)
+  compile_all("true" "benchmark_bfloat"  "Benchmarks/Performance/Arithmetic/bfloat16" "${BFLOAT16_SRC}")
+else()
+  message(WARNING "No bfloat16 arithmetic benchmark sources found under ${CMAKE_CURRENT_LIST_DIR}/bfloat16")
+endif()

15-15: Naming consistency: consider target name ‘benchmark_bfloat16’

Other directories use names aligned with the folder (e.g., cfloat -> benchmark_cfloat). Optional, but it improves discoverability.

include/sw/universal/number/bfloat/math/functions/error_and_gamma.hpp (2)

11-24: Use double for intermediate to improve rounding; also include explicitly

Casting to float before calling std functions slightly reduces available precision before rounding back to bf16. Using double is cheap and can improve final rounding. Also ensure <cmath> is included (don’t rely on transitive includes).

-// Compute the error function erf(x) = 2 over sqrt(PI) times Integral from 0 to x of e ^ (-t)^2 dt
+// Compute the error function erf(x) = 2/sqrt(pi) * ∫[0..x] e^(-t^2) dt
 inline bfloat16 erf(bfloat16 x) {
-    return bfloat16(std::erf(float(x)));
+    return bfloat16(std::erf(double(x)));
 }
 
-// Compute the complementary error function: 1 - erf(x)
+// Compute the complementary error function: 1 - erf(x)
 inline bfloat16 erfc(bfloat16 x) {
-    return bfloat16(std::erfc(float(x)));
+    return bfloat16(std::erfc(double(x)));
 }
 
 // Compute the gamma function:
 inline bfloat16 tgamma(bfloat16 x) {
-    return bfloat16(std::tgamma(float(x)));
+    return bfloat16(std::tgamma(double(x)));
 }

And at the top:

 #pragma once
 // error_gamma.hpp: error and gamma functions for Google Brain floats
+// local includes and dependencies
+#include <cmath>
+#include <universal/number/bfloat/bfloat.hpp>

2-2: Header title nit: file name mismatch

Comment says error_gamma.hpp but file is error_and_gamma.hpp. Rename in the comment to avoid confusion in searches.

applications/mixed-precision/dnn/conv2d-1.cpp (2)

7-15: Include (and optionally ) for the printing

Avoid relying on transitive includes from the header. This TU does direct std::cout logging.

 #include <chrono>
+#include <iostream>

21-26: Trim unused aliases/configs or annotate intended use

Posit80, CFloatMixedConfig, and HybridConfig aren’t used in this TU. Consider removing to reduce reader load, or leave a brief comment indicating they are illustrative placeholders for future comparisons.

Also applies to: 30-35

include/sw/universal/number/bfloat/math/functions/hypot.hpp (2)

1-9: Add missing includes for correctness and self-containment

This header uses std::hypot and bfloat16 but doesn’t include their declarations. Make the header self-sufficient.

 #pragma once
 // hypot.hpp: hypotenuse functions for Google Brain floats
 //
 // Copyright (C) 2017 Stillwater Supercomputing, Inc.
 // SPDX-License-Identifier: MIT
 //
 // This file is part of the universal numbers project, which is released under an MIT Open Source license.
 
+#include <cmath>
+#include <universal/number/bfloat/bfloat.hpp>

44-65: Consolidate hypot variants or gate long double support

Optional: You can reduce duplication by implementing hypotf and hypotl in terms of hypot, or detect long double support and call std::hypotl when available.

Example:

 inline bfloat16 hypot(bfloat16 x, bfloat16 y) {
-    double result = std::hypot(double(x), double(y));
-    return bfloat16(result);
+    return bfloat16(std::hypot(double(x), double(y)));
 }
 
 inline bfloat16 hypotf(bfloat16 x, bfloat16 y) {
-    double result = std::hypot(double(x), double(y));
-    return bfloat16(result);
+    return hypot(x, y);
 }
 
-// GCC doesn't support hypotl, so we use the double version
-//...
 inline bfloat16 hypotl(bfloat16 x, bfloat16 y) {
-    return bfloat16(std::hypot(double(x), double(y)));
+    // prefer std::hypotl if distinct and available
+    return bfloat16(std::hypot(double(x), double(y)));
 }
applications/mixed-precision/dnn/conv2d-2.cpp (3)

1-2: Header filename comment is incorrect

The file banner says "conv2d-1.cpp" but this is conv2d-2.cpp. Fix to avoid confusion in grep-based discovery and tooling.

-// conv2d-1.cpp: mixed-precision convolution benchmark
+// conv2d-2.cpp: mixed-precision convolution benchmark

41-47: Prefer brace-initialization for designated initializers (C++20)

Using parentheses around a braced designated initializer is unconventional and may be rejected by some compilers. Use direct list-initialization with braces only.

-    ConvolutionParameters2D conv2dParams({
+    ConvolutionParameters2D conv2dParams{
         .batch_size = 1, .in_channels = 64, .out_channels = 128,
         .in_height = 28, .in_width = 28,
         .kernel_height = 3, .kernel_width = 3,
         .stride_h = 1, .stride_w = 1, .pad_h = 1, .pad_w = 1,
         .dilation_h = 1, .dilation_w = 1
-        });
+    };

49-51: Remove unused local variable

Conv2D_float conv(conv2dParams); is not used. Drop it to avoid unnecessary construction work.

-    Conv2D_float conv(conv2dParams);
     benchmark.conv2D<Conv2D_float>(conv2dParams);
include/sw/universal/number/bfloat/math/functions/exponent.hpp (1)

12-29: Wrappers are correct; consider constexpr/noexcept where feasible

The convert–compute–cast pattern is consistent. If bfloat16 conversions are constexpr-friendly and you want these usable in constant-evaluation contexts, consider marking these constexpr and noexcept.

include/sw/universal/number/bfloat/math/functions/fractional.hpp (2)

1-3: Header comment mismatches the type

The comment says "classic floating-point cfloats" but this file implements bfloat16 helpers. Update comment to avoid confusion.

-// fractional.hpp: fractional functions for classic floating-point cfloats
+// fractional.hpp: fractional functions for Google Brain bfloat16

16-19: Docstring for remainder is incorrect

std::remainder uses rounding to nearest (ties to even), not truncation. The code is correct; the comment isn't.

-// remainder returns x - n*y, where n = trunc(x/y)
+// remainder returns x - n*y, where n = round-to-nearest-even(x/y)
include/sw/universal/number/bfloat/math/functions/truncate.hpp (1)

31-89: Non-compilable reference block in a public header

Even under #ifdef NOW, carrying a large, non-compilable reference algorithm blob in a public header is distracting and may trigger header scanners or static analyzers. Consider moving it to developer docs or replacing with a short commented URL/reference.

include/sw/universal/number/bfloat/mathlib.hpp (2)

8-22: Consider adding if you’ll return infinities

The ipow fix below uses std::numeric_limits<...>::infinity(). Add the header explicitly.

-#include <cmath>
+#include <cmath>
+#include <limits>

48-55: copysign() is fine; consider delegating to std::copysign for parity with -0 and NaNs

If bfloat16 models IEEE sign-bit semantics (±0, NaNs), delegating to a tested implementation may improve edge-case behavior. Otherwise, current sign-bit flip is acceptable.

include/sw/universal/number/bfloat/math/functions/hyperbolic.hpp (1)

1-9: Make header self-contained (missing includes).

These wrappers use std:: and the bfloat16 type but the header doesn’t include or the bfloat16 definition. If this file is included directly, it risks ODR/compile errors. Include the complete bfloat16 type (not just a fwd decl) because the functions are defined inline and return by value.

Apply:

 #pragma once
 // hyperbolic.hpp: hyperbolic functions for Google brain floats
 //
 // Copyright (C) 2017 Stillwater Supercomputing, Inc.
 // SPDX-License-Identifier: MIT
 //
 // This file is part of the universal numbers project, which is released under an MIT Open Source license.
 
+// required deps for inline definitions returning bfloat16 and calling std::<func>
+#include <universal/number/bfloat/bfloat16.hpp>
+#include <cmath>
 
 namespace sw { namespace universal {
include/sw/universal/number/bfloat/bfloat16_impl.hpp (1)

8-13: Add for std::memcpy usage.

This header calls std::memcpy in multiple places but doesn’t include . Relying on transitive includes is brittle.

Apply:

 #include <string>
 #include <sstream>
 #include <iostream>
 #include <iomanip>
 #include <regex>
+#include <cstring>  // for std::memcpy
include/sw/universal/number/bfloat/math/functions/classify.hpp (1)

1-9: Add required includes.

This header uses std::fpclassify and FP_* macros but doesn’t include . Add it to avoid depending on transitive includes.

Apply:

 #pragma once
 // classify.hpp: classification functions for Google Brain floats
 //
 // Copyright (C) 2017 Stillwater Supercomputing, Inc.
 // SPDX-License-Identifier: MIT
 //
 // This file is part of the universal numbers project, which is released under an MIT Open Source license.
 
+#include <cmath>
include/sw/universal/number/bfloat/math/functions/trigonometry.hpp (1)

51-64: Consider numerically safer cot/sec/csc forms (optional).

For some inputs, 1/std::tan and 1/std::sin may be more straightforward and avoid subtractive cancellation (pi/2 - x) around x ≈ pi/2. Not required, but worth considering.

Potential alternative:

-inline bfloat16 cot(bfloat16 x) {
-	return bfloat16(std::tan(f_pi_2 - float(x)));
-}
+inline bfloat16 cot(bfloat16 x) { return bfloat16(1.0f / std::tan(float(x))); }
static/bfloat/math/mathlib.cpp (2)

7-11: Include for std::setw and std::setprecision.

This TU uses formatting manipulators but doesn’t include .

Apply:

 #include <universal/verification/test_suite_mathlib.hpp>
+#include <iomanip>

161-169: Nit: incorrect diagnostic message references 2.0f while base is 71.0f.

Log line says ipow(2.0f, i) though ‘a’ is set to 71. Use ‘a’ for clarity.

Apply:

-				if (true) std::cerr << "bfloat16 ipow(2.0f, " << i << ") " << to_binary(b) << " != " << to_binary(ref) << '\n';
+				if (true) std::cerr << "bfloat16 ipow(" << float(a) << ", " << i << ") " << to_binary(b) << " != " << to_binary(ref) << '\n';
benchmark/performance/arithmetic/bfloat16/performance.cpp (3)

20-27: Avoid undefined/implementation-defined setbits ranges for large NR_OPS

When NR_OPS exceeds the representable pattern count of Bfloat (e.g., 2^16 for bfloat16), a.setbits(i) and c.setbits(j) will repeatedly wrap or rely on masking inside setbits. Make the masking explicit to exercise all patterns deterministically.

Apply this diff:

-    for (size_t i = 0; i < NR_OPS; ++i,++j) {
-        a.setbits(i);
+    constexpr size_t bitcount = Bfloat::nbits;
+    const auto mask = (bitcount >= 64 ? ~size_t{0} : ((size_t{1} << bitcount) - 1));
+    for (size_t i = 0; i < NR_OPS; ++i,++j) {
+        a.setbits(i & mask);
         b = a;
-        c.setbits(j);
+        c.setbits(j & mask);

Also applies to: 39-42


31-42: Minor: tag and NR_OPS are fine; consider documenting that CopyWorkload validates sign only

Currently COPY FAIL checks only sign(b) vs sign(c). Clarify in the banner or expand the check (equality of all bits) if the goal is full-copy validation.


203-223: Embedded benchmark log has a future date

Comment block says "Date run : 8/23/2025" while today is 2025-08-22. If this is intentional, ignore; otherwise adjust to avoid confusion.

applications/mixed-precision/dnn/conv2d-2.hpp (7)

2-2: Header/file banner mismatch

Comment says "conv2d-1.cpp" but this is conv2d-2.hpp. Update to avoid confusion during grep and doc generation.

Apply this diff:

- // conv2d-1.cpp: mixed-precision convolution benchmark
+ // conv2d-2.hpp: mixed-precision convolution benchmark

34-40: Make size helpers const noexcept

input_size() and weight_size() are accessors; mark them const (and noexcept) for broader usability.

Apply this diff:

-    size_t input_size() {
+    size_t input_size() const noexcept {
         return batch_size * in_channels * in_height * in_width;
     }
-    size_t weight_size() {
+    size_t weight_size() const noexcept {
         return out_channels * in_channels * kernel_height * kernel_width;
     }

82-143: forward_direct lacks input/weight size validation

We compute indices assuming input.size() == input_size() and weights.size() == weight_size(). Add precondition checks to catch misuse, especially since weights/bias are owned by the instance and user-supplied.

Proposed guard:

     void forward_direct(
         const std::vector<InputType>& input,
         std::vector<OutputType>& output,
         bool use_bias = true
     ) {
+        const size_t expected_input = params.input_size();
+        const size_t expected_weights = params.weight_size();
+        if (input.size() != expected_input) {
+            throw std::invalid_argument("forward_direct: input size mismatch");
+        }
+        if (weights.size() != expected_weights) {
+            throw std::invalid_argument("forward_direct: weights size mismatch");
+        }

244-257: Depthwise separable path is a stub

The public API exposes forward_depthwise_separable but it’s unimplemented. Either implement or guard behind a compile-time flag to avoid accidental use.

Apply this diff to make it explicit:

-    void forward_depthwise_separable(
+    [[nodiscard]] bool forward_depthwise_separable(
         const std::vector<InputType>& input,
         std::vector<OutputType>& output,
         const std::vector<WeightType>& pointwise_weights,
         bool use_bias = true
     ) {
-        // This would implement depthwise separable convolution
+        // TODO: implement depthwise separable convolution
         ...
-        // Implementation left as exercise - follows similar pattern
-        // but processes channels independently first
+        (void)input; (void)output; (void)pointwise_weights; (void)use_bias;
+        return false; // not implemented
     }

360-379: Energy model counts unique reads, not total memory traffic

estimate_energy_direct() returns input + weight element counts, ignoring reuse across output positions. With "assuming no caching," memory_reads should scale with total_ops not just unique elements.

Consider: memory_reads ≈ total_outputs * ops_per_output (inputs) + total_outputs * ops_per_output (weights), or document that the metric reflects footprint, not traffic.


410-416: type_tag dependency not explicitly included

This relies on type_tag being available via transitive includes. Add the correct include to make the header self-contained.

Proposed include near the top:

+#include <universal/utility/type_tag.hpp>

491-537: Tile size sweep lacks output validation

The tile-size micro-benchmark measures time only. Add a quick sanity check (NaN/Inf) like in benchmark_conv_method to catch broken kernels.

applications/mixed-precision/dnn/conv2d-1.hpp (4)

117-123: ConvParams: consider noexcept and documenting units

The fields are fine; marking the struct trivially copyable/noexcept and adding brief comments for stride/pad/dilation units (pixels) can help downstream.


137-143: Output shape formula OK for stride/pad/dilation; add guards for invalid configs

For negative or inconsistent combinations (e.g., kernel larger than padded input), H_out/W_out underflow can occur before being used. Add input validation and throw with a clear message.

Proposed guard:

+            if (K_h == 0 || K_w == 0 || params.stride_h == 0 || params.stride_w == 0) {
+                throw std::invalid_argument("Invalid convolution parameters (zero kernel/stride).");
+            }
+            if (H_out <= 0 || W_out <= 0) {
+                throw std::invalid_argument("Output dimension is non-positive; check pad/stride/dilation.");
+            }

64-106: Tensor4D supports CHW/HWC in enum but not in compute_index

compute_index throws for layouts other than NCHW/NHWC while the enum exposes CHW/HWC. Either implement those layouts or document they are unsupported.

If unsupported for now, consider restricting the enum to the supported layouts to avoid runtime surprises.


368-371: type_tag dependency visibility

Like conv2d-2.hpp, get_type_name uses type_tag(T()). Ensure the appropriate include is present in files including this header, or include it here.

include/sw/universal/verification/test_suite_mathlib.hpp (4)

51-69: Sqrt verifier skips 0 and negative patterns by starting at i=1

Starting at i=1 misses zero (and any encoding-dependent negatives that should map to NaN). If intentional, add a comment. Otherwise, start from i=0.

Apply this diff:

-    for (unsigned i = 1; i < NR_TEST_CASES; i++) {
+    for (unsigned i = 0; i < NR_TEST_CASES; i++) {

228-247: exp verifier: rounding filter OK; consider documenting minpos behavior

The softening logic that filters 0.0 and NaN differences is pragmatic. Add a brief comment stating that certain universal backends round underflow to minpos and quiet sNaN, and why we ignore those.


1250-1319: VerifyNextafter: Good quadrant coverage; add NaN/Inf edge cases

Consider adding explicit tests when x or y are NaN/Inf to match the documented behavior in the comment.


85-99: NaN equivalence checks assume TestType::isnan()

Each verifier calls result.isnan() and ref.isnan() where ref is TestType. This is fine for Universal types but will not compile for native floats. If native types are in-scope for these verifiers, guard with std::isnan for arithmetic types.

Sketch:

auto is_nan = [](const auto& v) {
    if constexpr (std::is_arithmetic_v<std::decay_t<decltype(v)>>) return std::isnan(static_cast<double>(v));
    else return v.isnan();
};

Then use is_nan(result) && is_nan(ref).

Also applies to: 120-133, 151-165, 196-205, 249-260, 300-311, 343-356, 373-389, 441-452, 472-486, 506-520, 540-553, 573-586, 604-617, 638-652, 674-688, 705-721, 742-755, 769-788, 801-812, 845-856, 888-900, 926-938, 966-978, 1008-1016, 1067-1075, 1123-1131, 1163-1171, 1195-1203, 1227-1235

include/sw/math/functions/bessel.md (1)

167-175: Fix table formatting for LaTeX expressions.

The table contains escaped LaTeX notation that should be cleaned up for consistency.

-| Function | Behavior | Singular at 0 | Oscillatory? | Implementation |
-
-|---------|----------|----------------|---------------|----------------|
-
-| \( Y_n(x) \) | Diverges at 0, oscillates | Yes | Yes | Recurrence or series |
-
-| \( I_n(x) \) | Exponential growth | No | No | Series or recurrence |
-
-| \( K_n(x) \) | Exponential decay | Yes | No | Recurrence or asymptotics |
+| Function | Behavior | Singular at 0 | Oscillatory? | Implementation |
+|----------|----------|---------------|--------------|----------------|
+| Y_n(x) | Diverges at 0, oscillates | Yes | Yes | Recurrence or series |
+| I_n(x) | Exponential growth | No | No | Series or recurrence |
+| K_n(x) | Exponential decay | Yes | No | Recurrence or asymptotics |
numeric/functions/bessel.cpp (2)

16-21: Duplicate computation of J_0 in test output.

The test computes J_0(x) twice - once with the specialized bessel_j0 function (line 16) and again in the loop with the general bessel_j function when n=0 (line 19). This creates redundant output.

 // Bessel functions of the first kind J_n(x)
 Scalar j0 = bessel_j0<Scalar>(x);
 std::cout << "  J_" << 0 << "(x) = " << j0 << "\n";
-for (unsigned int n = 0; n <= max_order; ++n) {
+for (unsigned int n = 1; n <= max_order; ++n) {
     Scalar jn = bessel_j<Scalar>(n, x);
     std::cout << "  J_" << n << "(x) = " << jn << "\n";
 }

58-68: Hardcoded test value limits test coverage.

The test only exercises x=3.0. Consider parameterizing or adding multiple test values to validate behavior at boundary conditions (near zero, large values).

Consider expanding the test coverage:

-constexpr double x = 3.0;
+// Test multiple values to cover different behavior regions
+const std::vector<double> test_values = {0.1, 1.0, 3.0, 10.0};
 
-// Test with float
-TestBesselFunctions<float>("float", x);
+for (double x : test_values) {
+    std::cout << "\n========== Testing with x = " << x << " ==========\n";
+    
+    // Test with float
+    TestBesselFunctions<float>("float", static_cast<float>(x));
 
-// Test with double
-TestBesselFunctions<double>("double", x);
+    // Test with double
+    TestBesselFunctions<double>("double", x);
 
-// Test with posit<32,2>
-using Posit = posit<32, 2>;
-TestBesselFunctions<Posit>("posit<32,2>", Posit(x));
+    // Test with posit<32,2>
+    using Posit = posit<32, 2>;
+    TestBesselFunctions<Posit>("posit<32,2>", Posit(x));
+}
include/sw/math/functions/bessel.hpp (7)

130-132: Remove commented-out dead code.

There are commented-out variable declarations that serve no purpose and clutter the implementation.

-//T ln_term = -log(x / T(2)) * bessel_i0(x);
 T series = T(0);
-//T x2 = x * x;

138-139: Remove unused commented variable.

 T y = x / T(2);
-//T y2 = y * y;
 series = -log(y) * bessel_i0(x) + T(0.5772156649);  // Euler-Mascheroni γ

139-142: Define mathematical constants centrally.

The code uses hardcoded values for π (3.14159265358979323846) and Euler-Mascheroni constant (0.5772156649) in multiple places. Consider defining these as named constants for better maintainability and precision control.

Add at the beginning of the namespace:

 namespace sw::math::function {
+
+// Mathematical constants
+template<typename T>
+constexpr T pi() { return T(3.14159265358979323846264338327950288419716939937510); }
+
+template<typename T>
+constexpr T euler_gamma() { return T(0.5772156649015328606065120900824024310421593359399235); }

Then update usages:

-series = -log(y) * bessel_i0(x) + T(0.5772156649);  // Euler-Mascheroni γ
+series = -log(y) * bessel_i0(x) + euler_gamma<T>();
-series = sqrt(T(3.14159265358979323846) / (T(2) * x)) * exp(-x);
+series = sqrt(pi<T>() / (T(2) * x)) * exp(-x);

215-215: Uninitialized variable warning.

Variable yn is declared but only conditionally initialized in the loop. While the loop always executes when n >= 2, initializing it would prevent potential static analysis warnings.

-T yn;
+T yn = T(0);

233-233: Initialize variable to prevent warnings.

-T in;
+T in = T(0);

252-252: Initialize variable to prevent warnings.

-T kn;
+T kn = T(0);

33-34: Potential precision loss in factorial computation.

Computing factorial<T>(k + n) could overflow for moderate values of k and n. Consider using logarithmic factorial or gamma function for better numerical stability.

For improved numerical stability with large n values, consider using logarithms:

-T numerator = pow(x / T(2), T(2 * k + n));
-T denominator = factorial<T>(k) * factorial<T>(k + n);
-T term = sign * numerator / denominator;
+// Use log-space computation to avoid overflow
+T log_term = (T(2 * k + n)) * log(abs(x / T(2))) 
+           - log_factorial<T>(k) - log_factorial<T>(k + n);
+T term = sign * exp(log_term);

You would need to add a log_factorial helper function or use lgamma if available.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
static/bfloat/math/mathlib.cpp (1)

14-17: Fix SFINAE: malformed enable_if causes compilation failure

The second template parameter is declared as a type but given a non-type default value (= 0). Use a proper enable_if form (type-parameter or non-type-parameter), or drop the constraint.

Apply one of these fixes (C++14+ first; C++11-friendly alternative second):

-template<typename Ty,
-         typename std::enable_if<std::is_floating_point<Ty>::value, Ty>::type = 0
->
+template<typename Ty,
+         typename = std::enable_if_t<std::is_floating_point<Ty>::value>
+>

or

-template<typename Ty,
-         typename std::enable_if<std::is_floating_point<Ty>::value, Ty>::type = 0
->
+template<typename Ty,
+         typename std::enable_if<std::is_floating_point<Ty>::value>::type* = nullptr
+>
🧹 Nitpick comments (4)
static/bfloat/math/mathlib.cpp (4)

7-11: Include what you use: add missing standard headers

This TU relies on std::type_traits, std::pow, iostream and iomanip but doesn’t include them explicitly. Make the file self-contained to avoid accidental transitive include breakages.

 #include <universal/utility/directives.hpp>
 #include <universal/number/bfloat/bfloat.hpp>
 #include <universal/verification/test_suite.hpp>
 #include <universal/verification/test_suite_mathlib.hpp>
 
+// standard headers used in this file
+#include <type_traits>
+#include <cmath>
+#include <iostream>
+#include <iomanip>

161-174: Correct diagnostic string and remove no-op if(true)

The message says ipow(2.0f, i) while the base is 71.0f. Also, if (true) is redundant.

-				if (true) std::cerr << "bfloat16 ipow(2.0f, " << i << ") " << to_binary(b) << " != " << to_binary(ref) << '\n';
+				std::cerr << "bfloat16 ipow(" << float(a) << ", " << i << ") "
+				          << to_binary(b) << " != " << to_binary(ref) << '\n';

229-236: Clean up stray quote/semicolon in comments

These look like accidental leftovers: the trailing "\n"; is inside a comment. Harmless to compile but noisy.

-	// bfloat16 fractional function validation\n";
+	// bfloat16 fractional function validation
@@
-	// bfloat16 error function validation\n";
+	// bfloat16 error function validation

26-31: Restore std::cout precision after temporary change

Avoid leaving std::cout in a different state for callers. Capture and restore the previous precision.

-	std::cout << std::setprecision(nbits - 2);
+	auto oldPrec = std::cout.precision();
+	std::cout << std::setprecision(nbits - 2);
@@
-	std::cout << std::setprecision(5);
+	std::cout << std::setprecision(oldPrec);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between acecf8c and 041ac9f.

📒 Files selected for processing (1)
  • static/bfloat/math/mathlib.cpp (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T18:06:33.627Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/verification/test_suite_mathlib.hpp:44-45
Timestamp: 2025-08-22T18:06:33.627Z
Learning: The test suite in include/sw/universal/verification/test_suite_mathlib.hpp is designed for number systems with nbits <= 12, so the expression (1 << nbits) for NR_TEST_CASES will never cause integer overflow in practice.

Applied to files:

  • static/bfloat/math/mathlib.cpp
🧬 Code graph analysis (1)
static/bfloat/math/mathlib.cpp (5)
include/sw/universal/number/bfloat/math/functions/pow.hpp (6)
  • pow (12-15)
  • pow (12-12)
  • pow (17-20)
  • pow (17-17)
  • pow (22-25)
  • pow (22-22)
include/sw/universal/number/bfloat/bfloat16_impl.hpp (21)
  • to_binary (467-490)
  • to_binary (467-467)
  • sqrt (24-24)
  • i (251-259)
  • i (251-251)
  • bfloat16 (82-82)
  • bfloat16 (84-84)
  • bfloat16 (84-84)
  • bfloat16 (85-85)
  • bfloat16 (85-85)
  • bfloat16 (91-123)
  • bfloat16 (91-91)
  • bfloat16 (126-126)
  • bfloat16 (126-126)
  • bfloat16 (127-127)
  • bfloat16 (127-127)
  • bfloat16 (128-128)
  • bfloat16 (128-128)
  • bfloat16 (129-129)
  • bfloat16 (129-129)
  • bfloat16 (130-130)
include/sw/universal/verification/test_suite_mathlib.hpp (6)
  • VerifyTgamma (1215-1244)
  • VerifyTgamma (1215-1215)
  • VerifySqrt (42-71)
  • VerifySqrt (42-42)
  • VerifyPow (369-416)
  • VerifyPow (369-369)
include/sw/universal/number/bfloat/mathlib.hpp (3)
  • ipow (30-46)
  • ipow (30-30)
  • c (52-52)
include/sw/universal/number/bfloat/math/functions/classify.hpp (2)
  • fpclassify (12-14)
  • fpclassify (12-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for ubuntu-latest
  • GitHub Check: Run for windows-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for windows-latest
  • GitHub Check: Run for ubuntu-latest
🔇 Additional comments (2)
static/bfloat/math/mathlib.cpp (2)

191-238: Broad test coverage looks great

Coverage across trig, hyperbolic (with float reference where needed), logs/exp, rounding, fractional ops, and tgamma is comprehensive and matches the bfloat16 precision characteristics you called out. Good balance between deterministic checks and randomized sample caps.


41-41: Remove unused variable test_tag in static/bfloat/math/mathlib.cpp

Verified: the repository search shows only the local declaration in that TU and no references to that symbol — safe to remove.

  • static/bfloat/math/mathlib.cpp — line 41: remove the unused declaration.
-	std::string test_tag    = "mathlib";

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
static/bfloat/math/mathlib.cpp (1)

14-17: Fix SFINAE: second template parameter is declared as a type but given a non-type default; this won’t compile

The template currently declares a type parameter and tries to default it to 0. Use a proper enable_if form (type SFINAE) or drop the constraint entirely.

Apply one of the following:

Option A (type SFINAE, C++14+):

-template<typename Ty,
-         typename std::enable_if<std::is_floating_point<Ty>::value, Ty>::type = 0
->
-void GenerateTestCase(Ty fa, Ty fb) {
+template<typename Ty,
+         typename = std::enable_if_t<std::is_floating_point<Ty>::value>>
+void GenerateTestCase(Ty fa, Ty fb) {

Option B (non-type SFINAE, C++14+):

-template<typename Ty,
-         typename std::enable_if<std::is_floating_point<Ty>::value, Ty>::type = 0
->
-void GenerateTestCase(Ty fa, Ty fb) {
+template<typename Ty,
+         std::enable_if_t<std::is_floating_point<Ty>::value, int> = 0>
+void GenerateTestCase(Ty fa, Ty fb) {

Option C (simplest): remove the constraint if not strictly needed.

This issue mirrors a previously raised concern; it appears to have resurfaced in this file.

🧹 Nitpick comments (7)
static/bfloat/math/mathlib.cpp (7)

7-11: Include what you use (iostream, iomanip, cmath, cstdlib, type_traits)

These headers are used directly (std::cout, std::setw/setprecision, std::pow, EXIT_* macros, std::is_floating_point/enable_if). Avoid relying on transitive includes to keep the TU robust across platforms/compilers.

+#include <iostream>
+#include <iomanip>
+#include <cmath>
+#include <cstdlib>
+#include <type_traits>
 #include <universal/utility/directives.hpp>
 #include <universal/number/bfloat/bfloat.hpp>
 #include <universal/verification/test_suite.hpp>
 #include <universal/verification/test_suite_mathlib.hpp>

26-31: Preserve and restore stream precision instead of forcing 5 globally

Use the prior precision to avoid surprising callers if this TU grows shared code.

-	std::cout << std::setprecision(nbits - 2);
+	auto oldPrec = std::cout.precision();
+	std::cout << std::setprecision(nbits - 2);
 	std::cout << std::setw(nbits) << " -> pow(" << fa << "," << fb << ") = " << std::setw(nbits) << fref << std::endl;
 	std::cout << " -> pow( " << a << "," << b << ") = " << to_binary(power) << " (reference: " << to_binary(ref) << ")   " ;
 	std::cout << (ref == power ? "PASS" : "FAIL") << std::endl << std::endl;
-	std::cout << std::setprecision(5);
+	std::cout << std::setprecision(oldPrec);

41-41: Remove unused variable ‘test_tag’

The local variable is never used and may trigger -Wunused warnings.

-	std::string test_tag    = "mathlib";

53-104: Manual testing block has an early return; subsequent checks in this block are unreachable

Inside MANUAL_TESTING you return at Line 53, making the following ad-hoc checks dead code. Either move the return to the end or conditionally compile only the section you want to run.

-	return 0;
+	// return 0; // move this to the end of the MANUAL_TESTING section if you only want the basic cases

166-172: ipow test: fix unconditional logging and wrong base in message

  • Logging is unconditional via if (true).
  • The message hardcodes 2.0f but the base is a = 71.0f.
-			if (b != ref) {
-				if (true) std::cerr << "bfloat16 ipow(2.0f, " << i << ") " << to_binary(b) << " != " << to_binary(ref) << '\n';
-				failures++;
-			}
+			if (b != ref) {
+				if (reportTestCases) {
+					std::cerr << "bfloat16 ipow(" << a << ", " << i << ") "
+					          << to_binary(b) << " != " << to_binary(ref) << '\n';
+				}
+				++failures;
+			}

230-230: Stray “\n";” in comments

These look like copy/paste artifacts and can confuse readers.

-	// bfloat16 fractional function validation\n";
+	// bfloat16 fractional function validation
-	// bfloat16 error function validation\n";
+	// bfloat16 error function validation

Also applies to: 234-234


34-34: Remove unused STRESS_TESTING define (currently not referenced)

If stress mode will be added later, keep it; otherwise delete to reduce noise.

-#define STRESS_TESTING 0
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fdc6df8 and 9ecd62c.

📒 Files selected for processing (1)
  • static/bfloat/math/mathlib.cpp (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-22T18:07:22.707Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/number/bfloat/math/functions/truncate.hpp:1-9
Timestamp: 2025-08-22T18:07:22.707Z
Learning: In the Universal Numbers Library bfloat16 mathlib, individual function headers under include/sw/universal/number/bfloat/math/functions/ (like truncate.hpp) are designed to be included only through the main mathlib aggregator header, not standalone. Dependencies like <cmath> and bfloat16 headers are intentionally satisfied at the mathlib level rather than in individual function headers. This is an architectural design choice to maintain a clean modular structure.

Applied to files:

  • static/bfloat/math/mathlib.cpp
📚 Learning: 2025-08-22T18:06:33.627Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/verification/test_suite_mathlib.hpp:44-45
Timestamp: 2025-08-22T18:06:33.627Z
Learning: The test suite in include/sw/universal/verification/test_suite_mathlib.hpp is designed for number systems with nbits <= 12, so the expression (1 << nbits) for NR_TEST_CASES will never cause integer overflow in practice.

Applied to files:

  • static/bfloat/math/mathlib.cpp
🧬 Code graph analysis (1)
static/bfloat/math/mathlib.cpp (5)
include/sw/universal/number/bfloat/math/functions/pow.hpp (6)
  • pow (12-15)
  • pow (12-12)
  • pow (17-20)
  • pow (17-17)
  • pow (22-25)
  • pow (22-22)
include/sw/universal/number/bfloat/bfloat16_impl.hpp (21)
  • to_binary (467-490)
  • to_binary (467-467)
  • sqrt (24-24)
  • i (251-259)
  • i (251-251)
  • bfloat16 (82-82)
  • bfloat16 (84-84)
  • bfloat16 (84-84)
  • bfloat16 (85-85)
  • bfloat16 (85-85)
  • bfloat16 (91-123)
  • bfloat16 (91-91)
  • bfloat16 (126-126)
  • bfloat16 (126-126)
  • bfloat16 (127-127)
  • bfloat16 (127-127)
  • bfloat16 (128-128)
  • bfloat16 (128-128)
  • bfloat16 (129-129)
  • bfloat16 (129-129)
  • bfloat16 (130-130)
include/sw/universal/verification/test_suite_mathlib.hpp (8)
  • VerifyTgamma (1215-1244)
  • VerifyTgamma (1215-1215)
  • VerifySqrt (42-71)
  • VerifySqrt (42-42)
  • VerifyPow (369-416)
  • VerifyPow (369-369)
  • VerifyHypot (829-865)
  • VerifyHypot (829-829)
include/sw/universal/number/bfloat/mathlib.hpp (3)
  • ipow (30-46)
  • ipow (30-30)
  • c (52-52)
include/sw/universal/number/bfloat/math/functions/classify.hpp (2)
  • fpclassify (12-14)
  • fpclassify (12-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Run for windows-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for ubuntu-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for windows-latest
  • GitHub Check: Run for ubuntu-latest

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
include/sw/universal/number/bfloat16/math/functions/next.hpp (1)

27-46: Align nextafter with IEEE/STD semantics: propagate NaN and simplify direction logic

Right now, target.isnan() drives a special-case that nudges x by one ULP based on the sign of x. That diverges from std::nextafter semantics, where if either argument is NaN, the result is NaN. Also, you can collapse the direction logic without the inner NaN branch. Suggest:

  • Propagate NaN if either operand is NaN.
  • Keep the single-step ULP move toward target.
-inline bfloat16 nextafter(bfloat16 x, bfloat16 target) {
-	if (x == target) return target;
-	if (target.isnan()) {
-		if (x.isneg()) {
-			--x;
-		}
-		else {
-			++x;
-		}
-	}
-	else {
-		if (x > target) {
-			--x;
-		}
-		else {
-			++x;
-		}
-	}
-	return x;
-}
+inline bfloat16 nextafter(bfloat16 x, bfloat16 target) {
+	// std::nextafter semantics: if either arg is NaN, result is NaN
+	if (x.isnan() || target.isnan()) {
+		return std::numeric_limits<bfloat16>::quiet_NaN();
+	}
+	if (x == target) return target;
+	if (x > target) {
+		--x;
+	} else {
+		++x;
+	}
+	return x;
+}

Note: If you intend a different policy than std (e.g., "nudge when target is NaN"), please document it explicitly in the header and tests. Otherwise, matching std behavior reduces surprise for users.

include/sw/universal/number/bfloat16/manipulators.hpp (1)

99-135: Nibble marker in exponent never prints due to incorrect condition.

In color_print(), the exponent loop uses (i - es) > 0, which is always false for i in [es-1..0]. Result: no nibble separators in exponent. Replace with the same pattern used for the fraction loop.

Apply this diff:

-        for (int i = int(es) - 1; i >= 0; --i) {
-            s << cyan << (e.test(static_cast<size_t>(i)) ? '1' : '0');
-            if ((i - es) > 0 && ((i - es) % 4) == 0 && nibbleMarker) s << yellow << '\'';
-        }
+        for (int i = int(es) - 1; i >= 0; --i) {
+            s << cyan << (e.test(static_cast<size_t>(i)) ? '1' : '0');
+            if (i > 0 && (i % 4) == 0 && nibbleMarker) s << yellow << '\'';
+        }
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (2)

444-448: parse() is a stub that always fails; operator>> is effectively broken

parse() returns false unconditionally, so streaming into bfloat16 will always fail and print an error. This is a regression if input streaming previously worked.

Apply this minimal functional implementation that accepts decimal inputs via float and preserves the existing binary bitstring path for later:

-inline bool parse(const std::string& number, bfloat16& value) {
-	bool bSuccess = false;
-	value.zero();
-	return bSuccess;
-}
+inline bool parse(const std::string& number, bfloat16& value) {
+    // Accept "0b{sign}.{exp}.{frac}" bitstrings (existing assign) or decimal floats
+    if (number.size() > 2 && number[0] == '0' && number[1] == 'b') {
+        value.assign(number);
+        return true;
+    }
+    try {
+        // Use float parse to keep behavior aligned with other wrappers
+        float f = std::stof(number);
+        value = bfloat16(f);
+        return true;
+    } catch (...) {
+        value.zero();
+        return false;
+    }
+}

471-472: to_binary() prefix typo

For the zero sign case, the prefix is "0x0." instead of "0b0.".

Apply this diff:

-    s << (bits & mask ? "0b1." : "0x0.");
+    s << (bits & mask ? "0b1." : "0b0.");
♻️ Duplicate comments (1)
applications/mixed-precision/dnn/conv2d-1.cpp (1)

46-57: Fix BenchmarkResult field name: use memory_footprint (not memory_footprint_bytes)

This currently won’t compile against conv2d-1.hpp’s BenchmarkResult, which exposes memory_footprint. This was already flagged earlier; repeating here for the current hunk.

Apply:

-    std::cout << "FP32 Config - GFLOPS: " << fp32_result.gflops
-        << ", Memory: " << fp32_result.memory_footprint_bytes << " bytes\n";
+    std::cout << "FP32 Config - GFLOPS: " << fp32_result.gflops
+        << ", Memory: " << fp32_result.memory_footprint << " bytes\n";

-    std::cout << "Bfloat16 Config - GFLOPS: " << bfloat_result.gflops 
-              << ", Memory: " << bfloat_result.memory_footprint_bytes << " bytes\n";
+    std::cout << "Bfloat16 Config - GFLOPS: " << bfloat_result.gflops 
+              << ", Memory: " << bfloat_result.memory_footprint << " bytes\n";

-    std::cout << "Posit8 Config - GFLOPS: " << posit_result.gflops
-        << ", Memory: " << posit_result.memory_footprint_bytes << " bytes\n";
+    std::cout << "Posit8 Config - GFLOPS: " << posit_result.gflops
+        << ", Memory: " << posit_result.memory_footprint << " bytes\n";

You can verify other occurrences with:

#!/bin/bash
rg -nP '\.memory_footprint_bytes\b' -C2
🧹 Nitpick comments (37)
include/sw/universal/number/bfloat16/bfloat16_fwd.hpp (1)

11-13: Avoid include-order hazards by ensuring is included before this fwd header.

This forward-decl references std::string but the header doesn’t include <string>. If any TU includes this header before <string>, it will fail. If keeping fwd headers ultra-lean is intentional, at least guarantee that the aggregator always includes <string> first; otherwise, include it here.

If you prefer to harden this header, apply:

 #pragma once
 // bfloat16_fwd.hpp: forward definitions of the Google Brain Float number system
@@
-// forward references
+// forward references
+// Requires std::string for the parse signature
+#include <string>
 class bfloat16;
 inline bool parse(const std::string& number, bfloat16& v);
include/sw/universal/number/bfloat16/math/functions/pow.hpp (3)

12-25: Integer exponent overload should use exponentiation-by-squaring instead of std::pow for accuracy/perf.

std::pow(float, float) for integral exponents is slower and can introduce extra rounding. Implement fast integer power and keep the float/bfloat conversions localized.

Apply only to the (bfloat16, int) overload:

-inline bfloat16 pow(bfloat16 x, int y) {
-	using std::pow;
-	return bfloat16(std::pow(float(x), float(y)));
-}
+inline bfloat16 pow(bfloat16 x, int y) {
+	// exponentiation by squaring in float domain, then round to bfloat16
+	if (y == 0) return bfloat16(1.0f);
+	bool neg = y < 0;
+	unsigned int n = neg ? static_cast<unsigned int>(-int64_t(y)) : static_cast<unsigned int>(y);
+	float base = float(x);
+	float result = 1.0f;
+	while (n) {
+		if (n & 1u) result *= base;
+		base *= base;
+		n >>= 1;
+	}
+	return neg ? bfloat16(1.0f / result) : bfloat16(result);
+}

12-15: Minor: using std::pow; inside the body is redundant.

You’re explicitly qualifying std::pow already. Drop the using lines to reduce noise.

-inline bfloat16 pow(bfloat16 x, bfloat16 y) {
-	using std::pow;
-	return bfloat16(std::pow(float(x), float(y)));
-}
+inline bfloat16 pow(bfloat16 x, bfloat16 y) {
+	return bfloat16(std::pow(float(x), float(y)));
+}
@@
-inline bfloat16 pow(bfloat16 x, float y) {
-	using std::pow;
-	return bfloat16(std::pow(float(x), y));
-}
+inline bfloat16 pow(bfloat16 x, float y) {
+	return bfloat16(std::pow(float(x), y));
+}

Also applies to: 17-20, 22-25


8-8: Align with the mathlib aggregator’s include policy.

Per the project’s bf16 mathlib design, individual function headers typically do not include <cmath>. If the aggregator guarantees <cmath>, consider removing it here to be consistent; otherwise, keep it.

You can remove it if confirmed safe:

-#include <cmath>
include/sw/universal/number/bfloat16/math/functions/truncate.hpp (1)

31-89: Remove or harden the disabled #ifdef NOW block to avoid accidental breakage.

The block contains non-compilable text and risky macros (__HI/__LO). If NOW is ever defined (even by a build flag), this header will fail hard. Replace with #if 0 and convert the prose to comments, or move it to docs.

Apply:

-#ifdef NOW
-by design, non compilable text;
-this is the libc algorithm for floor(double)
+#if 0  // Reference: libc algorithm for floor(double) (kept for historical context)
+// Non-compilable reference text preserved intentionally.
@@
-#endif //NOW
+#endif // 0

Alternatively, move the entire reference into a developer doc or a unit test comment.

static/bfloat16/api/traits.cpp (1)

57-58: duble alias verified

The identifier duble is indeed a custom type alias (defined in include/sw/universal/number/cfloat/cfloat.hpp at lines 94–95 via

using duble = cfloat<64,11,uint32_t,true,false,false>;
using fp64  = duble;

), so the calls

std::cout << minmax_range< duble >() << '\n';
std::cout << symmetry_range< duble >() << '\n';

are correct and will compile as intended.

Optional readability improvement:

  • You may prefer the fp64 alias (also defined in the same header) for consistency with other code that uses “fpXX” naming conventions.
  • Alternatively, add a brief inline comment in static/bfloat16/api/traits.cpp clarifying that duble is a cfloat-based 64-bit floating‐point type.
static/bfloat16/api/attributes.cpp (1)

45-61: Optional: exercise NumericalLimits for bf16 to increase coverage.

You’ve got commented-out invocations of NumericalLimits<bfloat_t>() and NumericalLimits<bfloat16>(). Enabling at least the bfloat16 one would increase coverage for the newly wired bf16 path.

-//		NumericalLimits<bfloat16>();
+		NumericalLimits<bfloat16>();

I can add assertions to validate epsilon, minpos, and maxpos against known constants if helpful.

tools/cmd/bfloat.cpp (1)

33-33: Align CLI messaging and input parsing with bfloat16

Two small polish items:

  • The help/usage text still says “bfloat”, which can confuse users now that this is bfloat16.
  • Prefer std::strtof/std::stof over atof for better error handling and locale safety.

Apply these diffs:

-	using Scalar = bfloat16;
+	using Scalar = bfloat16;

Additional edits outside the changed lines to align wording and parsing:

-		std::cerr << "bfloat : components of a Google Brain floating-point: 16 bits with 8 exponent bits\n";
-		std::cerr << "Show the sign/scale/fraction components of a Google Brain floating-point.\n";
-		std::cerr << "Usage: bfloat value\n";
-		std::cerr << "Example: bfloat 0.03124999\n";
+		std::cerr << "bfloat16 : components of a Google Brain floating-point (16 bits, 8 exponent bits)\n";
+		std::cerr << "Show the sign/scale/fraction components of a bfloat16 value.\n";
+		std::cerr << "Usage: bfloat16 value\n";
+		std::cerr << "Example: bfloat16 0.03124999\n";
-	Scalar bf = Scalar(atof(argv[1]));
+	char* endp = nullptr;
+	float v = std::strtof(argv[1], &endp);
+	if (endp == argv[1]) {
+		std::cerr << "Could not parse input as a floating-point number: '" << argv[1] << "'\n";
+		return EXIT_FAILURE;
+	}
+	Scalar bf = Scalar(v);
static/appenv/multifile/main.cpp (1)

47-48: Type alias addition is fine; ensure headers cover std::vector for downstream prototype

Adding Bfloat16 is good. Since new prototypes use std::vector below, verify this TU includes explicitly; it’s not present among current includes.

Minimal fix near the other standard headers:

 #include <string>
 #include <cmath>
 #include <limits>
+#include <vector>
include/sw/universal/number/bfloat16/math/functions/sqrt.hpp (2)

8-8: Follow the mathlib pattern: avoid per-header includes

Per the library’s mathlib architecture, individual function headers shouldn’t pull in ; dependencies are provided by the mathlib aggregator. Drop this include to keep headers lean and consistent.

-#include <cmath>

12-15: Tidy sqrt wrapper in sqrt.hpp
The <cmath> header is already pulled in by include/sw/universal/number/bfloat16/mathlib.hpp (line 8), so an unqualified call to sqrt will resolve to std::sqrt. Mark the function noexcept and rely on ADL:

• File: include/sw/universal/number/bfloat16/math/functions/sqrt.hpp

-inline bfloat16 sqrt(bfloat16 x) {
-    using std::sqrt;
-    return bfloat16(std::sqrt(float(x)));
+inline bfloat16 sqrt(bfloat16 x) noexcept {
+    using std::sqrt;
+    return bfloat16(sqrt(float(x)));
 }
include/sw/universal/number/bfloat16/math/functions/next.hpp (1)

48-71: Fix TODO nexttoward prototype comments and invalid double API usage

The commented nexttoward sketch uses non-existent double methods (isnan(), isneg()). If/when you implement, rely on <cmath> helpers and reuse the ULP-step logic:

-/* TODO
-bfloat16 nexttoward(bfloat16 x, double target) {
-	double _x(x);
-	if (_x == target) return x;
-	if (target.isnan()) {
-		if (_x.isneg()) {
-			--_x;
-		}
-		else {
-			++_x;
-		}
-	}
-	else {
-		if (_x > target) {
-			--_x;
-		}
-		else {
-			++_x;
-		}
-	}
-	x = _x;
-	return x;
-}
-*/
+/* TODO
+inline bfloat16 nexttoward(bfloat16 x, double target) {
+	// Propagate NaN if either operand is NaN
+	const double dx = static_cast<double>(x);
+	if (std::isnan(dx) || std::isnan(target)) {
+		return std::numeric_limits<bfloat16>::quiet_NaN();
+	}
+	if (dx == target) return x;
+	// Move one bfloat16 ULP toward target
+	if (dx > target) {
+		--x;
+	} else {
+		++x;
+	}
+	return x;
+}
+*/

If you want, I can wire this up behind a feature macro and add unit tests mirroring std::nexttoward edge cases (NaNs, zeros, infinities).

include/sw/universal/number/bfloat16/math/functions/error_and_gamma.hpp (1)

2-2: Nit: header title comment mismatch with filename

The file is error_and_gamma.hpp but the title line says error_gamma.hpp. Align the comment for future greps.

-// error_gamma.hpp: error and gamma functions for Google Brain floats
+// error_and_gamma.hpp: error and gamma functions for Google Brain floats
static/bfloat16/math/pow.cpp (1)

51-51: Nit: qualify std symbols in manual-testing branch

cout/endl are unqualified; this will fail if MANUAL_TESTING is enabled.

-	cout << endl;
+	std::cout << std::endl;
include/sw/universal/number/bfloat16/manipulators.hpp (2)

36-42: Comment mismatch: not “binary string”; it’s hex.

to_hex() emits hex digits (with optional 0x). Update the comment to avoid confusion.

-   // generate a binary string for bfloat16
+   // generate a hexadecimal string for bfloat16

64-96: Exponent printing: subnormal/special handling and a couple typos.

  • For exponent==0 (subnormal), scale should be -126 (1-bias), not -127.
  • For exponent==0xFF (specials), printing a numeric “scale” is misleading; consider flag-only.
  • Typo/encoding: “¿126”, “¿127” should be “-126”, “-127”.

Apply this minimal fix:

-        if (exponent == 0u) {
-            s << "exp=0,";
-        }
-        else if (exponent == 0xFFu) {
-            s << "exp=1, ";
-        }
-        int scale = int(exponent) - 127;
-        s << scale << ",0b";
+        int scale = 0;
+        if (exponent == 0u) {
+            s << "exp=0,";
+            scale = -126; // subnormal effective exponent
+            s << scale << ",0b";
+        } else if (exponent == 0xFFu) {
+            s << "exp=1, ";
+            // specials: skip scale print; fraction bits will still be shown
+            s << "0b";
+        } else {
+            scale = int(exponent) - 127;
+            s << scale << ",0b";
+        }

And fix the comment text:

- // Exponents range from ¿126 to +127 because exponents of ¿127 (all 0s) and +128 (all 1s) are reserved for special numbers.
+ // Exponents range from -126 to +127 because exponent fields of 0 (all 0s) and 255 (all 1s) are reserved for subnormals/specials.
include/sw/universal/number/bfloat16/math/functions/fractional.hpp (1)

16-20: Comment for remainder() is incorrect.

std::remainder uses the nearest-integer quotient (ties to even), not truncation. The code is correct; the comment is not.

-// remainder returns x - n*y, where n = trunc(x/y)
+// remainder returns x - n*y, where n = round(x/y) to nearest integer (ties to even)
include/sw/universal/number/bfloat16/mathlib.hpp (1)

50-55: copysign() is correct; consider tests for signed zero.

Behavior matches std::copysign including -0 handling via negation. Add unit tests for ±0 and NaN payloads to lock this in.

include/sw/universal/number/bfloat16/math/functions/hypot.hpp (2)

9-40: Documentation typos: “+8/±8” should be +∞/±∞; minor spelling.

The long comment block appears to have “∞” mis-encoded as “8” and “suport” -> “support”. Consider tightening the comment or linking to a reference.

- if one of the arguments is ±8, hypot returns +8 even if the other argument is NaN
+ if one of the arguments is ±∞, hypot returns +∞ even if the other argument is NaN
...
- hypot(INFINITY, NAN) returns +8, but sqrt(INFINITY*INFINITY+NAN*NAN) returns NaN.
+ hypot(INFINITY, NAN) returns +∞, but sqrt(INFINITY*INFINITY+NAN*NAN) returns NaN.
...
- // GCC doesn't suport hypotl, so we use the double version
+ // GCC doesn't support hypotl, so we use the double version

44-65: Wrappers look fine; consider API consistency.

Providing hypot/hypotf/hypotl for bf16 is fine, but other bf16 wrappers (e.g., logarithms) don’t expose *f/*l variants. Aligning naming (either only hypot(), or consistently providing f/l forms across the board) would improve API predictability.

include/sw/universal/number/bfloat16/math/functions/exponent.hpp (1)

12-29: Avoid double-rounding: use float intermediates for bfloat16 to match the rest of the math wrappers

Other bfloat16 wrappers (e.g., hyperbolic, pow.hpp) compute via float to avoid double-rounding and improve reproducibility across compilers. These functions currently promote to double. Switch to float and float literals for consistency and to minimize 1 ULP mismatches.

Apply:

 inline bfloat16 exp(bfloat16 x) {
-	return bfloat16(std::exp(double(x)));
+	return bfloat16(std::exp(float(x)));
 }
 
 // Base-2 exponential function
 inline bfloat16 exp2(bfloat16 x) {
-	return bfloat16(std::exp2(double(x)));
+	return bfloat16(std::exp2(float(x)));
 }
 
 // Base-10 exponential function
 inline bfloat16 exp10(bfloat16 x) {
-	return bfloat16(std::pow(10.0, double(x)));
+	return bfloat16(std::pow(10.0f, float(x)));
 }
 		
 // Base-e exponential function exp(x)-1
 inline bfloat16 expm1(bfloat16 x) {
-	return bfloat16(std::expm1(double(x)));
+	return bfloat16(std::expm1(float(x)));
 }

If any unit tests were disabled for exp10 due to sensitivity, please re-run them after this change. I can help adjust the harness if needed.

include/sw/universal/number/bfloat16/math/functions/hyperbolic.hpp (2)

11-13: Radian/angle commentary is misleading for hyperbolic functions

Hyperbolic functions operate on real values; they are not “angles in radians.” Consider removing or rewording to avoid confusion.

Apply:

-// value representing an angle expressed in radians
-// One radian is equivalent to 180/PI degrees
+// Hyperbolic functions operate on real-valued arguments.

14-42: Fix incorrect doc comments: atanh/acosh/asinh are inverse hyperbolic functions, not cotangent/cosecant/secant

The code is correct; only the comments are wrong and could mislead readers.

Apply:

-// hyperbolic sine of an angle of x radians
+// hyperbolic sine of x

-// hyperbolic cosine of an angle of x radians
+// hyperbolic cosine of x

-// hyperbolic tangent of an angle of x radians
+// hyperbolic tangent of x

-// hyperbolic cotangent of an angle of x radians
+// inverse hyperbolic tangent of x

-// hyperbolic cosecant of an angle of x radians
+// inverse hyperbolic cosine of x

-// hyperbolic secant of an angle of x radians
+// inverse hyperbolic sine of x
applications/mixed-precision/dnn/conv2d-1.cpp (2)

7-15: Add missing iostream include (and keep includes self-sufficient)

Transitive inclusion may mask it, but is directly required for std::cout.

Apply:

 #include <chrono>
+#include <iostream>
 
 #include <universal/native/ieee754.hpp>

21-26: Remove unused aliases/config to reduce warnings and keep the example lean

Posit80, CFloat16, CFloat32, FixedPoint16, and HybridConfig aren’t used here.

Apply:

-    using Posit80 = posit<8, 0>;
     using Posit82 = posit<8, 2>;
-    using CFloat16 = cfloat<16, 5>;
-    using CFloat32 = cfloat<32, 8>;
-    using FixedPoint16 = fixpnt<16, 8>;
@@
-    using CFloatMixedConfig = MixedPrecisionConfig<CFloat16, CFloat16, CFloat32, CFloat16>;
-    using HybridConfig = MixedPrecisionConfig<half, half, float, half>;
+    // Additional configs can be added here as needed (e.g., CFloat/Hybrid).

Also applies to: 34-35

include/sw/universal/number/bfloat16/math/functions/classify.hpp (1)

21-24: Typo in comment: “cfloative” → “positive”

Doc-only fix.

Apply:

-// Determines if the given floating point number arg is a cfloative or negative infinity.
+// Determines if the given floating point number arg is a positive or negative infinity.
static/bfloat16/math/mathlib.cpp (3)

27-31: Optional: include and explicitly for i/o manipulators

These headers are used directly (std::cout, std::setw, std::setprecision); include them to avoid reliance on transitive includes.

Apply:

 #include <universal/verification/test_suite_mathlib.hpp>
+
+#include <iostream>
+#include <iomanip>

169-171: Fix diagnostic message in ipow loop (shows 2.0f while testing 71.0f)

Minor correctness/readability nit.

Apply:

-				if (true) std::cerr << "bfloat16 ipow(2.0f, " << i << ") " << to_binary(b) << " != " << to_binary(ref) << '\n';
+				if (true) std::cerr << "bfloat16 ipow(" << float(a) << ", " << i << ") " << to_binary(b) << " != " << to_binary(ref) << '\n';

230-238: Stray quote in comments

There are trailing quote characters in two comment headers; harmless but noisy—clean them up.

Apply:

-	// bfloat16 fractional function validation\n";
+	// bfloat16 fractional function validation
@@
-	// bfloat16 error function validation\n";
+	// bfloat16 error function validation
include/sw/universal/number/bfloat16/math/functions/trigonometry.hpp (3)

2-2: Typo in file header comment

The header comment says "trigonometric.hpp" but the file name is "trigonometry.hpp". Please align to avoid confusion in search/refs.


51-54: Use bfloat16 constant and consistent typing in cot()

cot(x) currently computes tan(f_pi_2 - float(x)), mixing float-constant naming and raw float subtraction. Prefer using the bfloat16 constant you introduced and only cast once at the std call boundary.

Apply this diff:

-inline bfloat16 cot(bfloat16 x) {
-	return bfloat16(std::tan(f_pi_2 - float(x)));
-}
+inline bfloat16 cot(bfloat16 x) {
+	// tan(pi/2 - x) == cot(x); cast to float only for the std:: call
+	return bfloat16(std::tan(float(bf_pi_2 - x)));
+}

56-64: Consider domain checks / NaN propagation near singularities

sec(x) and csc(x) blow up when cos(x) or sin(x) ≈ 0. If practical, consider short-circuiting to Inf/NaN when |cos| or |sin| underflows in float to reduce spurious finite bfloat16 results after rounding. If you keep the current light wrappers, add a brief note in the header that these functions rely on float-domain behavior.

include/sw/universal/number/bfloat16/bfloat16_impl.hpp (3)

22-25: Forward declarations without guaranteed includes

You forward-declare abs/sqrt/floor(bfloat16) to use them later (e.g., isinteger relies on floor). Please ensure the aggregator that includes this header also includes the corresponding bfloat16 math headers defining these functions for all TU’s; otherwise, ODR/link errors may surface when this header is used standalone.

I can scan the repo to verify include order in the aggregator(s) if you want.


358-358: isinteger() runtime dependency on floor()

isinteger() now relies on floor(bfloat16). Given isinteger() is frequently called in classification paths, consider a bit-level check to avoid conversion + external call, or at least document the dependency since this moved from constexpr to runtime.

I can propose an inlined bit-check if you want to keep this hot.


159-159: Char conversion parity

Switching to (char)(float(*this)) is fine. Note that char signedness is implementation-defined; if API consumers relied on a particular sign, consider documenting or providing explicit signed char/unsigned char conversions (already present).

include/sw/universal/number/bfloat16/math/constants/constants.hpp (2)

1-1: UTF-8 BOM detected

There’s a BOM before #pragma once. It’s harmless for most compilers but can cause tooling diffs and some build hiccups. Recommend removing.


9-37: Missing dependency include (contextual)

This header defines bfloat16 objects but does not include a bfloat16 declaration. If this header can be included directly (outside the mathlib aggregator), it will not compile. If the design mandates inclusion only via the aggregator, please add a comment stating that requirement at the top.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9ecd62c and 0391a0a.

📒 Files selected for processing (34)
  • .github/workflows/cmake.yml (2 hunks)
  • CMakeLists.txt (2 hunks)
  • applications/mixed-precision/dnn/conv2d-1.cpp (1 hunks)
  • applications/mixed-precision/dnn/conv2d-2.hpp (1 hunks)
  • benchmark/performance/arithmetic/bfloat16/performance.cpp (1 hunks)
  • include/sw/universal/number/bfloat16/bfloat16.hpp (2 hunks)
  • include/sw/universal/number/bfloat16/bfloat16_fwd.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/bfloat16_impl.hpp (6 hunks)
  • include/sw/universal/number/bfloat16/manipulators.hpp (2 hunks)
  • include/sw/universal/number/bfloat16/math/constants/constants.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/classify.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/error_and_gamma.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/exponent.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/fractional.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/hyperbolic.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/hypot.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/logarithm.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/minmax.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/next.hpp (2 hunks)
  • include/sw/universal/number/bfloat16/math/functions/pow.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/sqrt.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/trigonometry.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/truncate.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/mathlib.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/numeric_limits.hpp (1 hunks)
  • static/appenv/multifile/bfloat16.cpp (1 hunks)
  • static/appenv/multifile/main.cpp (4 hunks)
  • static/bfloat16/api/api.cpp (1 hunks)
  • static/bfloat16/api/attributes.cpp (1 hunks)
  • static/bfloat16/api/traits.cpp (1 hunks)
  • static/bfloat16/arithmetic/arithmetic.cpp (1 hunks)
  • static/bfloat16/math/mathlib.cpp (1 hunks)
  • static/bfloat16/math/pow.cpp (2 hunks)
  • tools/cmd/bfloat.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • static/bfloat16/arithmetic/arithmetic.cpp
  • include/sw/universal/number/bfloat16/numeric_limits.hpp
🚧 Files skipped from review as they are similar to previous changes (4)
  • CMakeLists.txt
  • applications/mixed-precision/dnn/conv2d-2.hpp
  • benchmark/performance/arithmetic/bfloat16/performance.cpp
  • .github/workflows/cmake.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-22T18:07:22.707Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/number/bfloat/math/functions/truncate.hpp:1-9
Timestamp: 2025-08-22T18:07:22.707Z
Learning: In the Universal Numbers Library bfloat16 mathlib, individual function headers under include/sw/universal/number/bfloat/math/functions/ (like truncate.hpp) are designed to be included only through the main mathlib aggregator header, not standalone. Dependencies like <cmath> and bfloat16 headers are intentionally satisfied at the mathlib level rather than in individual function headers. This is an architectural design choice to maintain a clean modular structure.

Applied to files:

  • include/sw/universal/number/bfloat16/math/functions/sqrt.hpp
  • include/sw/universal/number/bfloat16/math/functions/minmax.hpp
  • static/bfloat16/api/attributes.cpp
  • include/sw/universal/number/bfloat16/math/functions/trigonometry.hpp
  • include/sw/universal/number/bfloat16/bfloat16.hpp
  • static/bfloat16/api/api.cpp
  • static/appenv/multifile/main.cpp
  • include/sw/universal/number/bfloat16/bfloat16_fwd.hpp
  • include/sw/universal/number/bfloat16/math/functions/exponent.hpp
  • include/sw/universal/number/bfloat16/manipulators.hpp
  • include/sw/universal/number/bfloat16/math/constants/constants.hpp
  • static/bfloat16/api/traits.cpp
  • include/sw/universal/number/bfloat16/mathlib.hpp
  • include/sw/universal/number/bfloat16/math/functions/logarithm.hpp
  • include/sw/universal/number/bfloat16/math/functions/error_and_gamma.hpp
  • include/sw/universal/number/bfloat16/math/functions/pow.hpp
  • tools/cmd/bfloat.cpp
  • include/sw/universal/number/bfloat16/math/functions/hyperbolic.hpp
  • include/sw/universal/number/bfloat16/math/functions/classify.hpp
  • include/sw/universal/number/bfloat16/math/functions/fractional.hpp
  • include/sw/universal/number/bfloat16/math/functions/hypot.hpp
  • include/sw/universal/number/bfloat16/bfloat16_impl.hpp
  • include/sw/universal/number/bfloat16/math/functions/truncate.hpp
  • static/bfloat16/math/mathlib.cpp
  • static/bfloat16/math/pow.cpp
  • include/sw/universal/number/bfloat16/math/functions/next.hpp
📚 Learning: 2025-08-22T18:30:36.656Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/number/bfloat/math/functions/trigonometry.hpp:8-9
Timestamp: 2025-08-22T18:30:36.656Z
Learning: In the Universal Numbers Library, when reviewing individual math function headers under include/sw/universal/number/bfloat/math/functions/, relative include paths like <math/constants/float_constants.hpp> are correct and individual headers should not include <cmath> or the main type header as these dependencies are satisfied at the mathlib aggregator level. This is an intentional architectural design choice.

Applied to files:

  • include/sw/universal/number/bfloat16/math/functions/sqrt.hpp
  • include/sw/universal/number/bfloat16/math/functions/minmax.hpp
  • static/bfloat16/api/attributes.cpp
  • include/sw/universal/number/bfloat16/math/functions/trigonometry.hpp
  • include/sw/universal/number/bfloat16/bfloat16.hpp
  • static/bfloat16/api/api.cpp
  • static/appenv/multifile/main.cpp
  • include/sw/universal/number/bfloat16/math/functions/exponent.hpp
  • include/sw/universal/number/bfloat16/manipulators.hpp
  • include/sw/universal/number/bfloat16/math/constants/constants.hpp
  • static/bfloat16/api/traits.cpp
  • include/sw/universal/number/bfloat16/mathlib.hpp
  • include/sw/universal/number/bfloat16/math/functions/logarithm.hpp
  • include/sw/universal/number/bfloat16/math/functions/error_and_gamma.hpp
  • tools/cmd/bfloat.cpp
  • include/sw/universal/number/bfloat16/math/functions/hyperbolic.hpp
  • include/sw/universal/number/bfloat16/math/functions/classify.hpp
  • include/sw/universal/number/bfloat16/math/functions/fractional.hpp
  • include/sw/universal/number/bfloat16/math/functions/hypot.hpp
  • include/sw/universal/number/bfloat16/bfloat16_impl.hpp
  • include/sw/universal/number/bfloat16/math/functions/truncate.hpp
  • static/bfloat16/math/mathlib.cpp
  • static/bfloat16/math/pow.cpp
📚 Learning: 2025-08-22T18:06:33.627Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/verification/test_suite_mathlib.hpp:44-45
Timestamp: 2025-08-22T18:06:33.627Z
Learning: The test suite in include/sw/universal/verification/test_suite_mathlib.hpp is designed for number systems with nbits <= 12, so the expression (1 << nbits) for NR_TEST_CASES will never cause integer overflow in practice.

Applied to files:

  • static/bfloat16/api/attributes.cpp
  • static/appenv/multifile/main.cpp
  • static/bfloat16/api/traits.cpp
  • tools/cmd/bfloat.cpp
  • static/bfloat16/math/mathlib.cpp
  • static/bfloat16/math/pow.cpp
🧬 Code graph analysis (9)
include/sw/universal/number/bfloat16/math/functions/sqrt.hpp (1)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (17)
  • sqrt (24-24)
  • bfloat16 (82-82)
  • bfloat16 (84-84)
  • bfloat16 (84-84)
  • bfloat16 (85-85)
  • bfloat16 (85-85)
  • bfloat16 (91-123)
  • bfloat16 (91-91)
  • bfloat16 (126-126)
  • bfloat16 (126-126)
  • bfloat16 (127-127)
  • bfloat16 (127-127)
  • bfloat16 (128-128)
  • bfloat16 (128-128)
  • bfloat16 (129-129)
  • bfloat16 (129-129)
  • bfloat16 (130-130)
include/sw/universal/number/bfloat16/math/functions/trigonometry.hpp (1)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (16)
  • bfloat16 (82-82)
  • bfloat16 (84-84)
  • bfloat16 (84-84)
  • bfloat16 (85-85)
  • bfloat16 (85-85)
  • bfloat16 (91-123)
  • bfloat16 (91-91)
  • bfloat16 (126-126)
  • bfloat16 (126-126)
  • bfloat16 (127-127)
  • bfloat16 (127-127)
  • bfloat16 (128-128)
  • bfloat16 (128-128)
  • bfloat16 (129-129)
  • bfloat16 (129-129)
  • bfloat16 (130-130)
static/appenv/multifile/main.cpp (1)
static/appenv/multifile/bfloat16.cpp (2)
  • bfloat16Polynomial (13-25)
  • bfloat16Polynomial (13-13)
include/sw/universal/number/bfloat16/math/functions/exponent.hpp (1)
include/sw/universal/number/bfloat16/math/functions/pow.hpp (6)
  • pow (12-15)
  • pow (12-12)
  • pow (17-20)
  • pow (17-17)
  • pow (22-25)
  • pow (22-22)
include/sw/universal/number/bfloat16/mathlib.hpp (2)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (16)
  • bfloat16 (82-82)
  • bfloat16 (84-84)
  • bfloat16 (84-84)
  • bfloat16 (85-85)
  • bfloat16 (85-85)
  • bfloat16 (91-123)
  • bfloat16 (91-91)
  • bfloat16 (126-126)
  • bfloat16 (126-126)
  • bfloat16 (127-127)
  • bfloat16 (127-127)
  • bfloat16 (128-128)
  • bfloat16 (128-128)
  • bfloat16 (129-129)
  • bfloat16 (129-129)
  • bfloat16 (130-130)
include/sw/universal/number/bfloat16/math/functions/exponent.hpp (2)
  • exp (12-14)
  • exp (12-12)
include/sw/universal/number/bfloat16/math/functions/fractional.hpp (1)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (19)
  • bfloat16 (82-82)
  • bfloat16 (84-84)
  • bfloat16 (84-84)
  • bfloat16 (85-85)
  • bfloat16 (85-85)
  • bfloat16 (91-123)
  • bfloat16 (91-91)
  • bfloat16 (126-126)
  • bfloat16 (126-126)
  • bfloat16 (127-127)
  • bfloat16 (127-127)
  • bfloat16 (128-128)
  • bfloat16 (128-128)
  • bfloat16 (129-129)
  • bfloat16 (129-129)
  • bfloat16 (130-130)
  • abs (23-23)
  • abs (436-438)
  • abs (436-436)
applications/mixed-precision/dnn/conv2d-1.cpp (1)
applications/mixed-precision/dnn/conv2d-1.hpp (2)
  • N (297-348)
  • N (297-300)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (2)
include/sw/universal/number/bfloat16/math/functions/truncate.hpp (5)
  • floor (22-24)
  • floor (22-22)
  • floor (33-33)
  • floor (41-88)
  • floor (41-41)
include/sw/universal/number/bfloat16/numeric_limits.hpp (1)
  • bf (16-20)
include/sw/universal/number/bfloat16/math/functions/truncate.hpp (1)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (20)
  • bfloat16 (82-82)
  • bfloat16 (84-84)
  • bfloat16 (84-84)
  • bfloat16 (85-85)
  • bfloat16 (85-85)
  • bfloat16 (91-123)
  • bfloat16 (91-91)
  • bfloat16 (126-126)
  • bfloat16 (126-126)
  • bfloat16 (127-127)
  • bfloat16 (127-127)
  • bfloat16 (128-128)
  • bfloat16 (128-128)
  • bfloat16 (129-129)
  • bfloat16 (129-129)
  • bfloat16 (130-130)
  • floor (25-25)
  • floor (358-358)
  • i (251-259)
  • i (251-251)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Run for ubuntu-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for windows-latest
  • GitHub Check: Run for ubuntu-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for windows-latest
🔇 Additional comments (23)
include/sw/universal/number/bfloat16/bfloat16_fwd.hpp (1)

13-13: Ensure the parse definition is marked inline and token-identical
I see the forward declaration in

  • include/sw/universal/number/bfloat16/bfloat16_fwd.hpp:13
    but no matching inline bool parse(const std::string& number, bfloat16& v) { … } definition was found. To avoid One Definition Rule (ODR) violations, please:

  • In include/sw/universal/number/bfloat16/bfloat16_impl.hpp, confirm there is a definition for

    inline bool parse(const std::string& number, bfloat16& v) {
        //
    }
  • Ensure that the signature and inline keyword are token-identical in every translation unit.

include/sw/universal/number/bfloat16/math/functions/truncate.hpp (3)

11-29: LGTM: concise wrappers with correct semantics.

The bf16 wrappers for trunc/round/floor/ceil match the std semantics and are properly inline. Nice and consistent with the rest of the math wrappers.


1-29: Confirmed <cmath> inclusion in bfloat16 mathlib aggregator

The header include/sw/universal/number/bfloat16/mathlib.hpp includes <cmath> at line 8 before importing truncate.hpp, satisfying the dependency for std::trunc/round/floor/ceil. No changes to the individual truncate.hpp file are required under the established mathlib inclusion pattern.


11-29: No duplicated floor(bfloat16) definitions; the declaration in bfloat16_impl.hpp is intentional

I’ve verified that in

  • bfloat16_impl.hpp (line 25) there is only a forward declaration
  • math/functions/truncate.hpp contains the inline definition of
    inline bfloat16 floor(bfloat16 x) { ... }

This matches the library’s design of declaring overloads in the core bfloat16_impl.hpp and defining them in the math‐functions headers (included via the mathlib aggregator). There is no ODR risk or duplicate definitions—please disregard the original duplication concern.

Likely an incorrect or invalid review comment.

static/bfloat16/api/traits.cpp (1)

8-8: Header include updated to bfloat16.hpp — good alignment with the bf16 migration.

This matches the broader shift to bf16 headers across the static tests.

static/bfloat16/api/attributes.cpp (1)

16-16: Header include updated to bfloat16.hpp — consistent with the rest of the suite.

Looks good and consistent with the bf16-focused tests.

static/bfloat16/api/api.cpp (1)

13-13: bfloat16 exception macro usage confirmed consistent
I’ve verified that:

  • There are no remaining includes of the old <universal/number/bfloat/bfloat.hpp> header.
  • Only BFLOAT_THROW_ARITHMETIC_EXCEPTION is defined and used (no BFLOAT16_THROW_ARITHMETIC_EXCEPTION variants), with definitions in
    • static/bfloat16/api/attributes.cpp (line 11)
    • static/bfloat16/api/api.cpp (line 12)
    • Default fallback in include/sw/universal/number/bfloat16/bfloat16.hpp (lines 36–38)

All checks pass—no further action required.

tools/cmd/bfloat.cpp (1)

8-8: LGTM: include migrated to bfloat16

Header swap to the bfloat16 path is consistent with the broader migration. No issues.

static/appenv/multifile/main.cpp (2)

18-18: LGTM: added include for bfloat16

Consistent with adding Bfloat16 to the appenv demo. No concerns.


75-76: Good: integrate bfloat16 into the evaluation printout

Wires the new polynomial into the demo output path. Once the implementation is fixed as above, output will be correct.

include/sw/universal/number/bfloat16/bfloat16.hpp (2)

47-62: Header path migration to bfloat16 looks consistent

Swapping the legacy bfloat includes for bfloat16 variants is coherent and aligns with the PR direction. Keeping traits separate is fine. No functional concerns here.


49-62: All legacy bfloat/ includes removed
I ran a comprehensive search for any remaining references to universal/number/bfloat/ (excluding the bfloat16 path) and found none. There are no straggler includes or usages of the old bfloat/ directory in the codebase. You’re clear to merge.

include/sw/universal/number/bfloat16/math/functions/error_and_gamma.hpp (1)

11-24: Wrappers are minimal and correct for bfloat16 via float backend

Casting through float is appropriate for bfloat16 (same exponent range, lower precision). This ensures you hit the std::erf(float), std::erfc(float), and std::tgamma(float) overloads. Looks good.

static/bfloat16/math/pow.cpp (3)

8-11: Header and harness updates are consistent with the bfloat16 switch

Including <universal/number/bfloat16/bfloat16.hpp> and switching to test_suite_mathlib.hpp aligns with the PR’s mathlib consolidation. No issues.


70-70: Guard block closure is fine

#endif // LATER matches the #ifdef LATER above and keeps the legacy integer-power demo out of builds. LGTM.


73-73: VerifyPow is defined and properly included – no action required

  • The template
    template<typename TestType, typename RefType = double> int VerifyPow(bool reportTestCases, unsigned int maxSamples = 100)
    is present in include/sw/universal/verification/test_suite_mathlib.hpp (around lines 368–374).
  • Both static/bfloat16/math/pow.cpp and static/bfloat16/math/mathlib.cpp include <universal/verification/test_suite_mathlib.hpp> and use using namespace sw::universal;, so instantiating VerifyPow<bfloat16> will link correctly.

No further changes needed.

include/sw/universal/number/bfloat16/manipulators.hpp (2)

11-11: Include now targets bfloat16_fwd.hpp — verify translation unit inclusion order.

These inline functions call member APIs (sign(), exponent(), fraction(), nibble()). A forward-declare header alone is insufficient unless this header is always included after the full bfloat16 definition via an aggregator. If any TU includes manipulators.hpp directly, you’ll get ODR/undefined member errors. Either:

  • Guarantee inclusion order via aggregator (document it), or
  • Switch to including bfloat16.hpp here.

Would you like me to scan the repo to verify no TU includes manipulators.hpp before bfloat16.hpp?


18-20: Inlining type_tag is fine and reduces ODR risk.

No behavior change; minimal compile-time win.

include/sw/universal/number/bfloat16/math/functions/fractional.hpp (1)

11-15: fmod wrapper looks good.

Using std::fmod via promotion to float/double and returning bfloat16 is consistent with the mathlib pattern.

include/sw/universal/number/bfloat16/math/functions/logarithm.hpp (1)

11-29: Log wrappers are consistent and correct.

Promotion to double matches the rest of the math wrappers and improves rounding when downcasting to bf16. No issues spotted.

include/sw/universal/number/bfloat16/mathlib.hpp (1)

9-22: Including at the aggregator level aligns with the library’s design.

Matches the documented architecture: individual function headers omit ; the aggregator provides it.

include/sw/universal/number/bfloat16/math/functions/trigonometry.hpp (1)

8-8: Include path likely incorrect for bfloat16 context; prefer bfloat16 constants header

This bfloat16 math header pulls in float constants via <math/constants/float_constants.hpp>. Given this file lives under bfloat16/ and you added a dedicated bfloat16 constants header, include that instead to avoid mixing float-based constants with bfloat16 API.

Apply this diff:

-#include <math/constants/float_constants.hpp>
+#include <universal/number/bfloat16/math/constants/constants.hpp>
⛔ Skipped due to learnings
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/number/bfloat/math/functions/trigonometry.hpp:8-9
Timestamp: 2025-08-22T18:30:36.656Z
Learning: In the Universal Numbers Library, when reviewing individual math function headers under include/sw/universal/number/bfloat/math/functions/, relative include paths like <math/constants/float_constants.hpp> are correct and individual headers should not include <cmath> or the main type header as these dependencies are satisfied at the mathlib aggregator level. This is an intentional architectural design choice.
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/number/bfloat/math/functions/truncate.hpp:1-9
Timestamp: 2025-08-22T18:07:22.707Z
Learning: In the Universal Numbers Library bfloat16 mathlib, individual function headers under include/sw/universal/number/bfloat/math/functions/ (like truncate.hpp) are designed to be included only through the main mathlib aggregator header, not standalone. Dependencies like <cmath> and bfloat16 headers are intentionally satisfied at the mathlib level rather than in individual function headers. This is an architectural design choice to maintain a clean modular structure.
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (1)

248-260: API tightening: setbits argument widened to unsigned; good, but audit callers

Changing setbits(unsigned short) to setbits(unsigned) with masking is safer. Please verify all external callers compile cleanly and no narrowing warnings creep in when passing wider ints.

I can generate a repo scan to list all setbits( call sites if helpful.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
include/sw/universal/number/dd/math/constants/dd_constants.hpp (1)

27-35: Fix misdefined dd_log2 and clarify dd_log10; resolves verification failures.

  • dd_log2 currently holds ln(2) with a misleading comment “log2(e)”. The test table expects log10(2) = 0.30102999566398…, and runtime output shows the mismatch.
  • dd_log10 is numerically ln(10) and duplicates dd_ln10; the comment “log10(e)” is wrong.

Apply this diff to align semantics and comments without breaking existing names:

-constexpr dd dd_log2     (6.931471805599452862e-01, 2.319046813846299558e-17); // log2(e)
-constexpr dd dd_log10    (2.302585092994045901e+00, -2.170756223382249351e-16);// log10(e)
+// log10(2)
+constexpr dd dd_log2     (3.010299956639812000e-01, -2.803728127785170000e-18); // log10(2)
+// ln(10) — deprecated alias of dd_ln10; keep for compatibility, prefer dd_ln10
+constexpr dd dd_log10    (2.302585092994045901e+00, -2.170756223382249351e-16);// ln(10) [alias]

Follow-up:

  • If you want a base-10 log of e constant, dd_loge already provides that (0.43429448…).
  • Consider deprecating dd_log10 in headers/comments to guide users toward dd_ln10/dd_loge/dd_lge.
🧹 Nitpick comments (6)
static/dd/api/constants.cpp (3)

282-285: Commented long double references: consider moving to header or removing.

These constexpr long double literals are handy references but risk drifting from the dd values. Either (a) move them into dd_constants.hpp as clearly marked documentation-only references or (b) remove to avoid confusion. Non-blocking.


315-338: Reduce noise; also highlights a dd_log2 mismatch upstream.

  • The pasted, machine-specific output block adds a lot of noise; consider relocating to docs or gating behind a verbose flag.
  • It also surfaces a real mismatch: for dd_log2, the “digits” are 0.3010299… (log10(2)) while the compiled constant prints ~0.693147… (ln(2)). That inconsistency stems from dd_constants.hpp; see my separate fix there.

If you want to retain this sample output, wrap it under an #if VERBOSE_SAMPLES block so CI logs stay clean.


166-179: Default MANUAL_TESTING to off.

Hardcoding MANUAL_TESTING 1 in the file risks entering the verbose/manual path in automated workflows.

Apply this small tweak so CMake controls it while keeping a safe default:

-#define MANUAL_TESTING 1
+#ifndef MANUAL_TESTING
+#define MANUAL_TESTING 0
+#endif
include/sw/universal/number/dd/math/constants/dd_constants.hpp (3)

18-21: dd_pi_3 equals pi/2; likely a naming/constant mix-up.

dd_pi_3 currently duplicates dd_pi_2 (value ≈ 1.5707963267… and comment “pi/2”). If the intent is pi/3 (≈ 1.0471975512…), correct the value (and provide an accurate dd pair), or drop dd_pi_3 to avoid confusion.

I can derive a consistent dd pair for pi/3 from MPFR/QD if you want to include it.


1-1: Remove UTF-8 BOM.

The file starts with a BOM character before #pragma once. Some toolchains/scripts choke on this; safer to save as UTF-8 without BOM.


49-50: Minor comment typo: unmatched parenthesis.

“2 / sqrt(pi) = 1 / (sqrt(pi/4)” is missing a closing parenthesis.

Apply:

-constexpr dd dd_2_sqrtpi (1.12837916709551257e+00, 1.5335459652770156e-17);  // 2 / sqrt(pi) = 1 / (sqrt(pi/4)
+constexpr dd dd_2_sqrtpi (1.12837916709551257e+00, 1.5335459652770156e-17);  // 2 / sqrt(pi) = 1 / (sqrt(pi/4))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0391a0a and 899375e.

📒 Files selected for processing (2)
  • include/sw/universal/number/dd/math/constants/dd_constants.hpp (2 hunks)
  • static/dd/api/constants.cpp (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Run for ubuntu-latest
  • GitHub Check: Run for windows-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for ubuntu-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for windows-latest
🔇 Additional comments (4)
static/dd/api/constants.cpp (2)

273-275: LGTM: Added sqrt(3) and sqrt(5) to the symbol table.

The additions for dd_sqrt3 and dd_sqrt5 look consistent with the new dd constants; good to see these verified alongside dd_sqrt2.


349-351: Manual mode currently ignores failures; confirm intent.

In MANUAL_TESTING path, the runner unconditionally returns EXIT_SUCCESS. That’s fine for local experiments but can mask regressions if MANUAL_TESTING accidentally leaks into CI.

Consider guarding with a runtime flag or add a conspicuous banner to reduce the chance of committing with MANUAL_TESTING enabled.

include/sw/universal/number/dd/math/constants/dd_constants.hpp (2)

38-40: LGTM: Correct double-double values for sqrt(3) and sqrt(5).

These pairs align with the added verifications. Nice catch correcting prior copy/paste of sqrt(2).


9-12: Include order for dd_constants.hpp is correct

A repo-wide search shows dd_constants.hpp is only ever included in dd.hpp (line 74), and there are no other direct includes of dd_constants.hpp. This guarantees that dd.hpp is always processed before the constants header, so no include-order or incomplete-type issues will arise. No changes required.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (2)

444-448: parse() is a stub that always fails — operator>> will never succeed.

This is a functional regression: operator>> relies on parse() and will emit an error for every input.

Apply this minimal, exception-based implementation to accept both the binary triplet form and decimal text:

-inline bool parse(const std::string& number, bfloat16& value) {
-  bool bSuccess = false;
-  value.zero();
-  return bSuccess;
-}
+inline bool parse(const std::string& number, bfloat16& value) {
+  value.zero();
+  // Accept binary triplet: 0bS.EEEEEEEE.FFFFFFF
+  if (number.size() > 2 && number[0] == '0' && number[1] == 'b') {
+    value.assign(number);
+    return true;
+  }
+  // Fallback: decimal/hex real via float
+  try {
+    float f = std::stof(number);
+    value = bfloat16(f);
+    return true;
+  } catch (...) {
+    return false;
+  }
+}

If you prefer no exceptions in headers, I can switch this to std::from_chars (C++17) with .


467-490: to_binary() prints the wrong prefix for negative sign bit path ("0x0." instead of "0b0.").

This typo breaks the documented binary format and will confuse downstream tooling/parsers.

Apply this one-liner fix:

-  s << (bits & mask ? "0b1." : "0x0.");
+  s << (bits & mask ? "0b1." : "0b0.");
♻️ Duplicate comments (3)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (2)

355-355: Fix for isone() is now correct and bit-precise.

Comparing to 0x3F80u ensures only +1.0 is recognized.


369-375: NaN classification now follows IEEE-754 MSB-of-fraction rule.

Detecting qNaN by the top fraction bit and sNaN otherwise is correct and addresses the prior over-restrictive logic.

static/bfloat16/math/mathlib.cpp (1)

14-17: SFINAE misuse: non-type template parameter of floating type is ill-formed

This template declares a non-type template parameter of type Ty (a floating type), which is not allowed and will not compile. Constrain via a dummy type parameter or a C++20 concept.

Apply:

-template<typename Ty,
-         typename std::enable_if<std::is_floating_point<Ty>::value, Ty>::type = 0
->
+template<typename Ty,
+         typename = std::enable_if_t<std::is_floating_point<Ty>::value>>

Optionally, if C++20 concepts are available:

- template<typename Ty, typename = std::enable_if_t<std::is_floating_point<Ty>::value>>
+ template<std::floating_point Ty>
🧹 Nitpick comments (9)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (2)

22-25: Forward-declaring abs/sqrt/floor here couples bfloat16 core to mathlib; verify link-time availability.

Given the library design (math functions provided via the mathlib aggregator), calling floor() from this header relies on the aggregator being linked in every TU that uses isinteger(). Please confirm that all targets which include this header also pull in the mathlib definitions, otherwise you’ll face undefined references.

If you want to remove this coupling while keeping behavior, consider guarding isinteger() with finite checks (see below) and keeping the floor() call, or provide a tiny local fallback floor using float inside this header as a last resort. Let me know if you want me to prototype that fallback.


159-159: Avoid platform-dependent char conversion; prefer explicit signed/unsigned char overloads.

operator char() is implementation-defined w.r.t. signedness and can surprise on platforms where char is signed. You already provide operator signed char(); consider removing operator char() or adding operator unsigned char() to make intent explicit.

Would you like me to add an explicit operator unsigned char() and deprecate operator char()?

include/sw/universal/verification/test_suite.hpp (1)

120-139: Make cast style consistent and avoid char signedness pitfalls in u8 conversion

The new explicit cast for u8 is clearer, but routing through char(a) can still be implementation-defined due to char signedness. The subsequent lines also mix C-style, functional, and width-variant casts (unsigned long vs fixed-width). Prefer direct fixed-width targets with static_cast to avoid platform-dependent narrowing (e.g., LP64 vs LLP64) and to keep style consistent.

Apply:

-			uint8_t u8 = uint8_t(char(a)); // conversion to uint8_t
+			uint8_t u8 = static_cast<uint8_t>(a); // conversion to uint8_t

-			uint16_t u16 = (unsigned short)(a); // conversion to uint16_t
+			uint16_t u16 = static_cast<uint16_t>(a); // conversion to uint16_t

-			uint32_t u32 = (unsigned long)(a); // conversion to uint32_t
+			uint32_t u32 = static_cast<uint32_t>(a); // conversion to uint32_t

-			uint64_t u64 = (unsigned long long)(a); // conversion to uint64_t
+			uint64_t u64 = static_cast<uint64_t>(a); // conversion to uint64_t

-			int8_t i8 = (signed char)(a); // conversion to int8_t
+			int8_t i8 = static_cast<int8_t>(a); // conversion to int8_t

-			int16_t i16 = short(a); // conversion to int16_t
+			int16_t i16 = static_cast<int16_t>(a); // conversion to int16_t

-			int32_t i32 = long(a); // conversion to int32_t
+			int32_t i32 = static_cast<int32_t>(a); // conversion to int32_t

-			int64_t i64 = (long long)(a); // conversion to int64_t
+			int64_t i64 = static_cast<int64_t>(a); // conversion to int64_t
static/bfloat16/api/constants.cpp (2)

53-66: Add at least one automated check for bf16 constants in non-manual path

The regression path currently performs no validations. Add a minimal check that bf_1_pi matches the bf16-rounded value of 1/pi to catch wiring or definition regressions.

Example insertion:

 #if REGRESSION_LEVEL_1
-
+{
+    int failures{0};
+    bfloat16 expected = bfloat16(1.0f / 3.14159265358979323846f);
+    if (bf_1_pi != expected) {
+        std::cerr << "bf_1_pi mismatch: " << to_binary(bf_1_pi)
+                  << " != expected " << to_binary(expected) << '\n';
+        ++failures;
+    }
+    nrOfFailedTestCases += ReportTestResult(failures, "bfloat16", "bf_1_pi");
+}
 #endif

If you prefer, I can extend this to cover a small set of core constants.


7-11: Trim unused include

<limits> isn’t used here.

Apply:

-#include <limits>
static/bfloat16/math/mathlib.cpp (4)

7-11: Add missing standard headers instead of relying on transitive includes

std::pow, std::setw, and std::setprecision are used; add the direct headers to avoid fragile transitive dependencies.

Apply:

 #include <universal/utility/directives.hpp>
 #include <universal/number/bfloat16/bfloat16.hpp>
 #include <universal/verification/test_suite.hpp>
 #include <universal/verification/test_suite_mathlib.hpp>
+#include <type_traits>
+#include <cmath>
+#include <iomanip>

33-35: Let build system control MANUAL/STRESS flags; default them off

Hard-defining these macros here prevents CMake/tooling overrides. Guard with #ifndef and default to 0 for automated runs.

Apply:

-#define MANUAL_TESTING 0
-#define STRESS_TESTING 0
+#ifndef MANUAL_TESTING
+#define MANUAL_TESTING 0
+#endif
+#ifndef STRESS_TESTING
+#define STRESS_TESTING 0
+#endif

161-175: Nit: mismatch in diagnostic string for ipow loop

The message says ipow(2.0f, i) but a is initialized to 71.0f. Align the text to aid debugging.

Apply:

-                if (true) std::cerr << "bfloat16 ipow(2.0f, " << i << ") " << to_binary(b) << " != " << to_binary(ref) << '\n';
+                if (true) std::cerr << "bfloat16 ipow(71.0f, " << i << ") " << to_binary(b) << " != " << to_binary(ref) << '\n';

178-190: Qualify min/max calls to target the intended overloads

Unqualified calls to min(a, b) and max(a, b) in static/bfloat16/math/mathlib.cpp (lines 178–190) will bind via ADL to whichever min/max is found in the namespaces of bfloat16 or any brought-in scopes. We verified that there are sw::universal::min(bfloat16, bfloat16) and sw::universal::max(bfloat16, bfloat16) definitions in
include/sw/universal/number/bfloat16/math/functions/minmax.hpp, but unqualified lookup could instead pick std::min/std::max if <algorithm> is ever included or if namespace imports change.

To ensure the intended overload is always selected and avoid future surprises, please qualify these calls explicitly. For example:

-   if (min(a, b) != a) {
+   if (sw::universal::min(a, b) != a) {
       std::cerr << "min(1.0f, 2.0f) failed\n";
       nrOfFailedTestCases++;
   }
-   if (max(a, b) != b) {
+   if (sw::universal::max(a, b) != b) {
       std::cerr << "max(1.0f, 2.0f) failed\n";
       nrOfFailedTestCases++;
   }

This makes the code’s intentions clear and robust against changes to namespace imports.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 899375e and 94f8a57.

📒 Files selected for processing (8)
  • include/sw/universal/number/bfloat16/bfloat16_impl.hpp (6 hunks)
  • include/sw/universal/number/bfloat16/math/constants/constants.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/classify.hpp (1 hunks)
  • include/sw/universal/number/bfloat16/math/functions/fractional.hpp (1 hunks)
  • include/sw/universal/verification/test_suite.hpp (1 hunks)
  • static/bfloat16/api/api.cpp (2 hunks)
  • static/bfloat16/api/constants.cpp (1 hunks)
  • static/bfloat16/math/mathlib.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • include/sw/universal/number/bfloat16/math/constants/constants.hpp
  • include/sw/universal/number/bfloat16/math/functions/fractional.hpp
  • static/bfloat16/api/api.cpp
  • include/sw/universal/number/bfloat16/math/functions/classify.hpp
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-22T18:06:33.627Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/verification/test_suite_mathlib.hpp:44-45
Timestamp: 2025-08-22T18:06:33.627Z
Learning: The test suite in include/sw/universal/verification/test_suite_mathlib.hpp is designed for number systems with nbits <= 12, so the expression (1 << nbits) for NR_TEST_CASES will never cause integer overflow in practice.

Applied to files:

  • static/bfloat16/api/constants.cpp
  • static/bfloat16/math/mathlib.cpp
📚 Learning: 2025-08-22T18:07:22.707Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/number/bfloat/math/functions/truncate.hpp:1-9
Timestamp: 2025-08-22T18:07:22.707Z
Learning: In the Universal Numbers Library bfloat16 mathlib, individual function headers under include/sw/universal/number/bfloat/math/functions/ (like truncate.hpp) are designed to be included only through the main mathlib aggregator header, not standalone. Dependencies like <cmath> and bfloat16 headers are intentionally satisfied at the mathlib level rather than in individual function headers. This is an architectural design choice to maintain a clean modular structure.

Applied to files:

  • static/bfloat16/math/mathlib.cpp
  • include/sw/universal/number/bfloat16/bfloat16_impl.hpp
📚 Learning: 2025-08-22T18:30:36.656Z
Learnt from: Ravenwater
PR: stillwater-sc/universal#475
File: include/sw/universal/number/bfloat/math/functions/trigonometry.hpp:8-9
Timestamp: 2025-08-22T18:30:36.656Z
Learning: In the Universal Numbers Library, when reviewing individual math function headers under include/sw/universal/number/bfloat/math/functions/, relative include paths like <math/constants/float_constants.hpp> are correct and individual headers should not include <cmath> or the main type header as these dependencies are satisfied at the mathlib aggregator level. This is an intentional architectural design choice.

Applied to files:

  • static/bfloat16/math/mathlib.cpp
  • include/sw/universal/number/bfloat16/bfloat16_impl.hpp
🧬 Code graph analysis (2)
static/bfloat16/math/mathlib.cpp (3)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (21)
  • to_binary (467-490)
  • to_binary (467-467)
  • i (251-259)
  • i (251-251)
  • bfloat16 (82-82)
  • bfloat16 (84-84)
  • bfloat16 (84-84)
  • bfloat16 (85-85)
  • bfloat16 (85-85)
  • bfloat16 (91-123)
  • bfloat16 (91-91)
  • bfloat16 (126-126)
  • bfloat16 (126-126)
  • bfloat16 (127-127)
  • bfloat16 (127-127)
  • bfloat16 (128-128)
  • bfloat16 (128-128)
  • bfloat16 (129-129)
  • bfloat16 (129-129)
  • bfloat16 (130-130)
  • f (413-416)
include/sw/universal/number/bfloat16/math/functions/classify.hpp (2)
  • fpclassify (12-14)
  • fpclassify (12-12)
include/sw/universal/number/bfloat16/math/functions/fractional.hpp (2)
  • frac (24-28)
  • frac (24-24)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (2)
include/sw/universal/number/bfloat16/math/functions/truncate.hpp (5)
  • floor (22-24)
  • floor (22-22)
  • floor (33-33)
  • floor (41-88)
  • floor (41-41)
include/sw/universal/number/bfloat16/numeric_limits.hpp (1)
  • bf (16-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Run for ubuntu-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for windows-latest
  • GitHub Check: Run for ubuntu-latest
  • GitHub Check: Run for macos-latest
  • GitHub Check: Run for windows-latest
🔇 Additional comments (4)
include/sw/universal/number/bfloat16/bfloat16_impl.hpp (4)

17-18: Include path swap to bfloat16-specific headers looks correct.

Migration from the generic bfloat headers to the bfloat16-specific forward and exceptions headers is appropriate and aligns with the file’s scope.


260-260: setbits widened to unsigned with mask — good defensive change.

Accepts wider inputs while safely constraining to 16 bits.


359-359: ispos() delegating to isneg() simplifies intent.

This reads cleaner and stays bit-accurate.


380-380: Inf detection logic is correct and clear.

Exponent all ones with zero fraction properly identifies infinities; sign is handled afterwards.

@Ravenwater Ravenwater merged commit 2e00efe into main Aug 22, 2025
9 of 10 checks passed
@Ravenwater Ravenwater deleted the v3.83 branch August 22, 2025 21:32
This was referenced Aug 24, 2025
@coderabbitai coderabbitai bot mentioned this pull request Sep 20, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 26, 2025
@coderabbitai coderabbitai bot mentioned this pull request Nov 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants