marginal-slope (binary + survival): severe slowdown / survival hang in gamfit 0.1.189 #979

@SauersML

Summary

On gamfit 0.1.189 (PyPI), marginal-slope fits regressed badly versus an earlier version (~0.1.156) that completed the same fits in the same pipeline:

  1. bernoulli-marginal-slope is now extremely slow: fits that used to take seconds now spend tens of seconds in the outer "continuation pre-warm" alone, and at centers=20 they do not finish within a 20–33 min window.
  2. survival_likelihood="marginal-slope" now HANGS (runs to a hard timeout with no output) rather than completing or raising. A hang is worse than the old crash (gam#751) because try/except cannot catch it, so it also blocks the Cox baselines that run in the same script.

Minimal repro (binary slowdown)

import time
import pandas as pd
import gamfit

d = pd.read_parquet("serial1d_phenoA_realpt_s1.parquet")     # n~2500, K=2 PCs, real P+T score
# Keep only the rows assigned to the fit split.
fit = d[d["split"].astype(str).str.contains("fit")]
data = {"z": fit["PGS_z"].to_numpy(float), "PC1": fit["PC1"].to_numpy(float),
        "PC2": fit["PC2"].to_numpy(float), "event": fit["y_binary"].to_numpy(float)}
# Time the fit call only (data loading excluded).
t = time.time()
gamfit.fit(data, formula="event ~ matern(PC1, PC2, centers=4)",
           family="bernoulli-marginal-slope", link="probit",
           z_column="z", logslope_formula="matern(PC1, PC2, centers=4)")
print("secs", round(time.time() - t, 1))

Even at centers=4 the fit progresses but is slow; the outer log shows:

[28s] [PIRLS/joint-Newton terminal] converged=true terminator=KKT/certificate-converged cycles=17/21 ...
[36s] [OUTER] custom family: continuation pre-warm seed 0 steps=4 elapsed=34.415s

i.e. ~35 s elapsed just for "continuation pre-warm seed 0" at centers=4. At centers=20 (a normal setting) the fit did not complete within 20–33 min in a 24-way parallel batch.
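
To compare pre-warm cost across versions or centers settings, the elapsed values can be pulled out of captured logs. The following is a minimal sketch, assuming the pre-warm line always has the shape of the single line quoted above; the regex and the `prewarm_records` helper are illustrative and not part of gamfit.

```python
import re

# Pattern inferred from the one pre-warm line quoted in this issue;
# other gamfit versions may format the line differently.
PREWARM = re.compile(
    r"^\[(?P<wall>\d+)s\] \[OUTER\] .*?continuation pre-warm "
    r"seed (?P<seed>\d+) steps=(?P<steps>\d+) elapsed=(?P<elapsed>\d+(?:\.\d+)?)s"
)


def prewarm_records(log_text):
    """Return one dict per 'continuation pre-warm' line found in log_text."""
    records = []
    for line in log_text.splitlines():
        m = PREWARM.match(line.strip())
        if m:
            records.append({
                "wall_s": int(m["wall"]),          # wall-clock stamp in brackets
                "seed": int(m["seed"]),
                "steps": int(m["steps"]),
                "elapsed_s": float(m["elapsed"]),  # time reported for the pre-warm
            })
    return records


# The two log lines quoted above, used as sample input.
LOG = """\
[28s] [PIRLS/joint-Newton terminal] converged=true terminator=KKT/certificate-converged cycles=17/21 ...
[36s] [OUTER] custom family: continuation pre-warm seed 0 steps=4 elapsed=34.415s
"""

records = prewarm_records(LOG)
print(records)
```

Lines that do not match the pattern, such as the PIRLS terminal line, are skipped.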

Survival hang

Same data, gamfit.fit(..., family/survival_likelihood="marginal-slope", Surv(time,event), matern(PCs, centers=12)): every cell ran to the 2000 s timeout (rc=124) and produced no output. This is a hang, not a raise.
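
Until the hang is fixed, one way to keep it from blocking the rest of a script is to run each fit in its own interpreter with a hard wall-clock limit, which turns the hang into a result the caller can handle. This is a minimal sketch using only the standard library; `run_isolated` is a hypothetical helper, and the two code strings are stand-ins for a real fit script, not gamfit calls.

```python
import subprocess
import sys


def run_isolated(code, timeout_s):
    """Run `code` in a fresh Python interpreter with a hard time limit."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising,
        # so the hung process does not linger.
        return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}


# Stand-ins: a script that never returns in time, and one that finishes.
HANGING = "import time; time.sleep(60)"
QUICK = "print('done')"

hung = run_isolated(HANGING, timeout_s=1.0)
quick = run_isolated(QUICK, timeout_s=30.0)
print(hung["status"], quick["status"])
```

With this pattern the Cox baselines can run in the parent process regardless of whether the marginal-slope child finishes.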

Impact

A multi-seed real-PGS cross-ancestry calibration study that completed end-to-end on ~0.1.156 is impractical on 0.1.189: binary centers=20 fits don't finish in reasonable time and survival fits hang.

Environment

gamfit 0.1.189 (PyPI wheel), Python 3.11, Linux. matern over 2–3 PCs, bernoulli-marginal-slope (probit) and survival marginal-slope, no link/score/time wiggle, centers ∈ {4, 12, 20}. The heavy time is in the marginal-slope inner solve / continuation pre-warm (the machinery added across 0.1.18x for #808/#787/#729/#691).
