_linear_regression_rows_nd silently returns NaN when degrees of freedom < 1 #15438

@lazza442233

What happened?

Bug: _linear_regression_rows_nd Does Not Validate Degrees of Freedom

Summary

_linear_regression_rows_nd does not validate that the degrees of freedom d = n - k - 1 are at least 1 before computing standard errors, t-statistics, and p-values. When n ≤ k + 1 (complete samples ≤ covariates + 1), d < 1 and the function silently returns NaN/Inf instead of raising an error.


Background

Scala/Spark backend — correctly guarded

Both LinearRegressionRowsSingle and LinearRegressionRowsChained in LinearRegression.scala check:

if (d < 1) fatal(s"$n samples and ${k + 1} ... implies $d degrees of freedom.")

Poisson regression _nd path — correctly guarded

statgen.py (~line 1790):

.when(mt.n - k - 1 >= 1, ...)
.or_error("insufficient degrees of freedom: n=%s, k=%s")

Linear regression _nd path — not guarded

statgen.py (~line 621) computes degrees of freedom without any guard:

ds = ht.ns.map(lambda n: n - k - 1)

And later divides by ds unconditionally:

se = ((1.0 / ht.ds[idx]) * (...)).map(lambda entry: hl.sqrt(entry))

Failure Modes

| Condition | IEEE 754 Behaviour | Result |
| --- | --- | --- |
| ds = 0 | Division by zero → Inf, propagates through sqrt | NaN in SE, t-stat, p-value |
| ds < 0 | sqrt of a negative number | NaN in SE, t-stat, p-value |

No error is raised and no warning is emitted in either case.
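
The arithmetic behind these failure modes can be reproduced outside Hail. The following is a minimal NumPy sketch, not the Hail code path: `standard_error` and `scaled_rss` are hypothetical names standing in for the unguarded `sqrt((1.0 / ds) * (...))` expression. It also shows that for ds = 0 the result is Inf or NaN depending on whether the multiplied term is zero.

```python
import numpy as np


def standard_error(scaled_rss, ds):
    """Compute sqrt((1 / ds) * scaled_rss) with no guard on ds (IEEE 754 doubles)."""
    scaled_rss = np.float64(scaled_rss)
    ds = np.float64(ds)
    # Silence NumPy's runtime warnings to mimic a pipeline that reports nothing.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sqrt((np.float64(1.0) / ds) * scaled_rss)


se_ok = standard_error(12.0, 3)          # healthy case: sqrt(12 / 3)
se_zero = standard_error(12.0, 0)        # 1/0 -> inf, inf * 12 -> inf, sqrt(inf) -> inf
se_zero_fit = standard_error(0.0, 0)     # 1/0 -> inf, inf * 0 -> nan
se_negative = standard_error(12.0, -1)   # sqrt of a negative number -> nan

print(se_ok, se_zero, se_zero_fit, se_negative)
```

None of the three invalid calls raises, which matches the silent behaviour described above.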


Who Is Affected

  • All users on the service backend (non-Spark) — the Scala guard does not apply to this path.
  • All users of weighted linear regression on any backend — the weights argument forces the _nd path.
  • Chained regressions where per-group sample counts vary — some groups may silently produce invalid results even when others succeed.

Expected Behaviour

An informative error consistent with the Scala backend, e.g.:

FatalError: n samples and k covariates implies d degrees of freedom.
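
As a rough illustration of the check being requested, here is a plain-Python sketch of the guard logic. It is not Hail code and not a proposed patch: `check_degrees_of_freedom` is a hypothetical helper, `ns` stands in for the per-group sample counts behind `ht.ns`, and `ValueError` stands in for whatever error type the real fix would raise.

```python
def check_degrees_of_freedom(ns, k):
    """Return [n - k - 1 for n in ns], raising if any entry is below 1.

    ns: complete-sample counts, one per regression group (hypothetical input).
    k:  number of covariates, as in ds = n - k - 1.
    """
    ds = [n - k - 1 for n in ns]
    for group, (n, d) in enumerate(zip(ns, ds)):
        if d < 1:
            # Fail loudly instead of letting NaN/Inf flow into the results.
            raise ValueError(
                f"group {group}: {n} samples and {k} covariates "
                f"implies {d} degrees of freedom."
            )
    return ds


# Every group has at least one degree of freedom: the ds values are returned.
print(check_degrees_of_freedom([10, 4], k=2))

# One group falls short: the whole call fails with an informative message.
try:
    check_degrees_of_freedom([10, 4, 3], k=2)
except ValueError as err:
    print(err)
```

Checking each group separately matters for chained regressions, where one group can fall below the threshold while the others are fine.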

Actual Behaviour

Silent NaN in standard_error, t_stat, and p_value — no error, no warning.


Version

0.2.138 (main at 44bbc63)
