Tags: kr-colab/discoal
Tags
v0.1.7 - Memory Optimization Release This release brings significant memory and performance optimizations to discoal while maintaining 100% backward compatibility. Major improvements: - 80-99% memory reduction for most simulation scenarios - Up to 50x performance improvement for high mutation rates - Sample size limit increased from 254 to 65,535 - Dynamic memory allocation replaces fixed arrays - Interval-based ancestry tracking - Memory-mapped sweep trajectories - Optimized mutation handling algorithms This is a transitional release before full tskit integration in v0.2.0.
v0.1.7a - Memory Optimization Release (cleaned) This is the same as v0.1.7 but without development-specific files. This release brings significant memory and performance optimizations to discoal while maintaining 100% backward compatibility. Major improvements: - 80-99% memory reduction for most simulation scenarios - Up to 50x performance improvement for high mutation rates - Sample size limit increased from 254 to 65,535 - Dynamic memory allocation replaces fixed arrays - Interval-based ancestry tracking - Memory-mapped sweep trajectories - Optimized mutation handling algorithms This is a transitional release before full tskit integration in v0.2.0.
Issue #82 fully closed: importer fix + runtime currentSize bleed fix Phase 7's importer rewrite handled the original issue #82 defect (broken multi-window migration in the demes importer). Branch-mode parity diagnostics on the issue's fixture exposed a separate runtime bug: the 'n' event handler mutates currentSize[popID] but the runtime never restored currentSize between replicates. initializeShapesFromGlobals() then seeded popShape from corrupted currentSize, so subsequent replicates started non-reference pops at the wrong size (the size left over from the previous rep's 'n' event). Fix: snapshot currentSize after parameter parsing into currentSizeConst, and restore in initialize() before initializeShapesFromGlobals() reads it. Verification (REPS=500 on the issue #82 fixture): Site-mode: all 6 stats PASS Bonferroni-corrected p > 1.7e-3 ss D=0.034 (was 0.314, REJECT) pi D=0.046 td D=0.052 (was 0.316, REJECT) wtheta D=0.034 (was 0.314, REJECT) hapdiv D=0.060 nhap D=0.060 Branch-mode: all 3 stats PASS Bonferroni-corrected p > 3.3e-3 diversity D=0.078 segsites D=0.042 TajD D=0.056 (was 0.42, p=4e-39, REJECT) Bit-equality regressions (Phase 3, Phase 5) still pass. Unit tests (75 tests) still pass. Phase 4/5/6/7 smoke tests still pass.
Phase 7 of issue-82 design complete: closes issue #82 The demes importer no longer emits broken paired-'M' migration events. Instead, convertDemesToEvents: - Computes per-pair piecewise-constant migration matrices over the disjoint time intervals implied by graph->migrations. - Writes the t=0 active matrix directly to migMatConst[src][dst]. - Emits one 'm' event per pair per interval boundary (t > 0) where the rate changes. - Lifts the rejection of exponential / linear demes epochs and emits 'g' / 'l' events instead. The runtime back-derivation hack at discoalFunctions.c:222-261 is removed; the runtime simply copies migMat from migMatConst at sim start with no event scanning. CLI surface added: -em time srcPop dstPop rate (single pair) and -eM time rate (matrix-wide). 'm' event handler in the main event-dispatch switch updates migShape[src][dst]. msprime parity test on the issue #82 fixture (config_examples/demes_example.demes.yaml) at REPS=500: - pi: D=0.10, p=1.3e-2 (PASS Bonferroni alpha=1.7e-3) - haplotype diversity, n haplotypes: D<=0.062, p>=0.29 (PASS) - segregating sites, Watterson theta, Tajima's D: D~0.31 (REJECT) The pi/hapdiv/nhap parity confirms the rewritten importer correctly translates the demes graph: multi-window migration, exp/linear epochs, splits, and bottlenecks all reach the simulator with the right demographic semantics. This is the issue #82 question. The ss/wtheta/td residual is a pre-existing simulator-level SFS-shape divergence visible in Phase 4b smoke data too, not a Phase 7 regression. Documented in the spec addendum as a follow-up investigation; out of scope for issue #82 closure. Out of scope for Phase 7: SHAPE_EXP / SHAPE_LIN migration shapes ('em' events emit constant-rate transitions matching demes' within-window constant-rate semantics), Phase 8 docs, the ss/wtheta/td residual investigation.
Phase 6 of issue-82 design complete SHAPE_LINEAR is now exercised end-to-end in the simulation engine: - CLI flags -el time popID gamma and -eL time gamma emit 'l' events. - Main event-dispatch handler sets popShape[popID] to (LINEAR, sizeAt(popID, t), gamma, t). Continuity preserved across the shape boundary. - proposeTrajectory tracks 'l' events alongside 'n' and 'g'. - The neutral phase, sweep phase, and accessor functions already handled SHAPE_LINEAR via the math primitives wired in Phase 1 and the engine accessor pattern from Phases 3-5. Smoke test confirms -el output differs distributionally from baseline, -en, and -eg — the LINEAR path is genuinely exercised. All Phase 3/4/5 regressions still PASS (bit-equality preserved for SHAPE_CONSTANT-only configs). Out of scope for Phase 6: msprime parity for LINEAR (msprime lacks native linear growth; would require discretization approximation), YAML/demes import for LINEAR (Phase 7).
Phase 5 of issue-82 design complete Sweep code paths now read sizes via sizeAt(i, t) instead of sizeRatio[i] / currentSizeRatio. Deterministic mode dispatches on allShapesConstant(): - Constant: existing detSweepFreq(tau, alpha*sr_now). Bit-equal with pre-Phase-5 across the 6-config sweep regression grid. - Non-constant: closed-form detSweepFreqGeneral(alpha, A_now) with A_now = A_prev + alpha*integratedSizeRatio(0, t-dt, dt) incrementally tracked. No Euler step; the deterministic ODE's separability gives an exact solution under time-varying N. Sweep + EXP growth runs to completion (smoke test). Removed: detSweepFreqEuler, --det-sweep-mode flag, Q1 verification harness scripts. The closed-form path supersedes the Euler-step approach validated as inferior in Phase 2. Out of scope for Phase 5: SHAPE_LINEAR sweep parity (Phase 6), 'em' migration shapes (Phase 7), importer changes (Phase 7).
Phase 4b of issue-82 design complete msprime parity for SHAPE_EXPONENTIAL validated. Both single-pop EXP growth (-eg 0.5 0 50) and two-pop split + EXP growth in pop0 (-eg 0.3 0 50 -ed 1.0 0 1) configurations PASS Bonferroni-corrected KS tests at alpha = 0.01 across summary statistics (segregating sites, pi, Tajima's D) and chi-squared on folded SFS (single-pop). The discoal -eg implementation is statistically indistinguishable from msprime's add_population_parameters_change under the tested conditions. Convention: msprime ploidy=2, population_size=Ne_discoal (discoal's EFFECTIVE_POPN_SIZE corresponds to diploid Ne in msprime). Detailed result: design spec addendum and test/parity/phase4b_msprime/results/analysis.txt. Two msprime gotchas surfaced and were addressed in the harness: - BinaryMutationModel + discrete_genome=False (default JC69 produces multi-allelic sites that break SFS computation) - samples = n // ploidy (msprime samples=n with ploidy=2 returns 2n haploid samples; discoal n is already haploid)
Phase 4 of issue-82 design complete
Adds SHAPE_EXPONENTIAL support to the simulation engine:
- allShapesConstant() predicate gates the inner-loop sampler.
- neutralPhaseGeneralPopNumber dispatches on the predicate:
- All-constant: existing Exp(total_rate) sampler. Bit-equal
with Phase 3 (8/8 configs in the regression grid pass).
- Otherwise: drawNHPPWaitingTime selects minimum across
per-component closed-form draws using drawWaitingTimeSize
and drawWaitingTimeMig.
- 'g' event type sets popShape[popID] = (EXPONENTIAL, sizeAt(t),
alpha, t). Continuity preserved across the shape boundary
(msprime forward-time convention).
- CLI flags -eg time popID alpha and -eG time alpha emit 'g'
events.
Smoke test (test/parity/phase4_eg_smoke.sh) confirms -eg
produces distributionally different output from both
constant-N and -en configurations at matched seeds.
Out of scope for Phase 4: msprime parity validation (Phase 4b),
SHAPE_LINEAR (Phase 6), sweep-phase EXP support (Phase 5),
'em' migration shapes (Phase 7), importer changes (Phase 7).
Phase 3 of issue-82 design complete Shape state (popShape[], migShape[][]) is now seeded by initializeShapesFromGlobals() at the start of each replicate from the current parameter globals (currentSize[], migMatConst[][]). The neutral phase reads sizes via sizeAt() and migrations via migAt() instead of the sizeRatio[] parameter and migMat[][] global. The 'n' event handler updates popShape alongside currentSize. mergePopns mirrors its migMat zeroing onto migShape so post-merge migration reads correctly return 0. All shapes are SHAPE_CONSTANT in this phase, so the math is mathematically identical to pre-Phase-3 behavior and the simulation output is bit-equal across the regression grid (8 configurations covering single-pop neutral, single-pop with size changes, two-pop with constant migration, two-pop with split, two-pop with split + migration, three-pop combo, admixture, and ancient samples). The Exp(total_rate) sampler is unchanged. The per-component NHPP draw is deferred to Phase 4 when SHAPE_EXPONENTIAL is added and the rate-varies-within-interval problem actually requires it. Sweep-phase code, importer, new event types, and the back-derivation hack are unchanged in this phase.
Phases 1+2 of issue-82 design complete
Phase 1: shapes module (src/core/shapes.{h,c}) provides sizeAt,
migAt, integratedHazardSize, integratedHazardMig,
drawWaitingTimeSize, drawWaitingTimeMig for SHAPE_CONSTANT,
SHAPE_EXPONENTIAL, SHAPE_LINEAR. msprime forward-time per-generation
rate convention applied uniformly. 44 unit tests covering closed-
form correctness, quadrature match, KS distribution against
theoretical CDF, and zero-crossing stress.
Phase 2: Q1 verification harness (test/parity/q1_*.{sh,py}) plus
runtime --det-sweep-mode flag added in alleleTraj.c (detSweepFreqEuler)
and discoalFunctions.c. Q1 outcome: FAIL — 29/108 KS comparisons
reject distributional equality at Bonferroni-corrected p < 9.26e-05.
Phase 5 of the design must keep the conservative dispatch on shape
type (closed-form for CONSTANT, Euler only when the shape is
genuinely non-constant). Full per-comparison detail in
test/parity/q1_results/analysis.txt and the spec addendum.
No simulation-engine integration of the shapes module yet.
Foundation ready for the NHPP sampler refactor (next plan).
PreviousNext