What happened?
Bug: _linear_regression_rows_nd Does Not Validate Degrees of Freedom
Summary
_linear_regression_rows_nd does not validate that degrees of freedom ≥ 1 before computing
standard errors, t-statistics, and p-values. When n ≤ k + 1 (complete samples ≤ covariates + 1),
the function silently returns NaN/Inf instead of raising an error.
Background
Scala/Spark backend — correctly guarded
Both LinearRegressionRowsSingle and LinearRegressionRowsChained in LinearRegression.scala
check:
if (d < 1) fatal(s"$n samples and ${k + 1} ... implies $d degrees of freedom.")
Poisson regression _nd path — correctly guarded
statgen.py (~line 1790):
.when(mt.n - k - 1 >= 1, ...)
.or_error("insufficient degrees of freedom: n=%s, k=%s")
Linear regression _nd path — not guarded
statgen.py (~line 621) computes degrees of freedom without any guard:
ds = ht.ns.map(lambda n: n - k - 1)
And later divides by ds unconditionally:
se = ((1.0 / ht.ds[idx]) * (...)).map(lambda entry: hl.sqrt(entry))
Failure Modes
| Condition | IEEE 754 Behaviour | Result |
|---|---|---|
| ds = 0 | Division by zero → Inf, propagates through sqrt | NaN in SE, t-stat, p-value |
| ds < 0 | sqrt of a negative number | NaN in SE, t-stat, p-value |
No error is raised and no warning is emitted in either case.
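The arithmetic behind these two cases can be illustrated in plain Python. This is only a sketch: `ieee_div`, `ieee_sqrt`, and `standard_error` are hypothetical helpers, not Hail functions, and `residual_term` is a stand-in for the factor elided as `(...)` in the snippet above. Plain Python raises on `1.0 / 0` and on `math.sqrt` of a negative number, so the helpers mimic IEEE 754 semantics, which is assumed to be what the compiled Hail expression follows.

```python
import math


def ieee_div(a: float, b: float) -> float:
    """Divide like an IEEE 754 float: x/0 gives +/-inf (nan for 0/0), no exception."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)


def ieee_sqrt(x: float) -> float:
    """Square root like an IEEE 754 float: negative or nan input gives nan."""
    return math.sqrt(x) if x >= 0 else math.nan


def standard_error(residual_term: float, d: int) -> float:
    """Mirror the shape sqrt((1 / ds) * (...)) with no guard on d."""
    return ieee_sqrt(ieee_div(1.0, d) * residual_term)


print(standard_error(16.0, 4))   # valid case, d >= 1
print(standard_error(0.0, 0))    # d = 0 with a zero factor: inf * 0
print(standard_error(2.5, 0))    # d = 0 with a non-zero factor
print(standard_error(2.5, -1))   # d < 0: sqrt of a negative number
```

Under these assumptions, `d = 0` yields either NaN or Inf depending on the multiplied term, and `d < 0` yields NaN. None of the calls raises, which is the silent behaviour described above.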
Who Is Affected
- All users on the service backend (non-Spark) — the Scala guard does not apply to this path.
- All users of weighted linear regression on any backend: the weights argument forces the _nd path.
- Chained regressions where per-group sample counts vary — some groups may silently produce invalid results even when others succeed.
Expected Behaviour
An informative error consistent with the Scala backend, e.g.:
FatalError: n samples and k covariates implies d degrees of freedom.
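The requested check can be sketched in plain Python. This is not Hail expression code: `guarded_degrees_of_freedom` is a hypothetical name, the error wording is only modelled on the message above, and a real fix in statgen.py would presumably follow the `.when(...).or_error(...)` pattern quoted for the Poisson path. The function takes a list of per-group sample counts, mirroring `ht.ns.map(lambda n: n - k - 1)`, so that a single under-determined group in a chained regression fails loudly.

```python
from typing import List


def guarded_degrees_of_freedom(ns: List[int], k: int) -> List[int]:
    """Return n - k - 1 for each group, raising if any group has fewer than 1.

    ns: complete-sample counts, one per group (hypothetical plain-Python input).
    k:  number of covariates.
    """
    ds = []
    for group, n in enumerate(ns):
        d = n - k - 1
        if d < 1:
            # Fail before any standard error, t-statistic, or p-value is computed.
            raise ValueError(
                f"group {group}: {n} samples and {k} covariates "
                f"implies {d} degrees of freedom."
            )
        ds.append(d)
    return ds


print(guarded_degrees_of_freedom([10, 8], k=3))

try:
    guarded_degrees_of_freedom([10, 4], k=3)
except ValueError as err:
    print(err)
```

Checking every group before any division keeps the behaviour consistent across groups, rather than letting valid groups succeed while others return NaN.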
Actual Behaviour
Silent NaN in standard_error, t_stat, and p_value — no error, no warning.
Version
0.2.138 (main at 44bbc63)
Relevant log output