Skip to content

Tags: kr-colab/discoal

Tags

v0.1.7

Toggle v0.1.7's commit message
v0.1.7 - Memory Optimization Release

This release brings significant memory and performance optimizations to discoal while maintaining 100% backward compatibility.

Major improvements:
- 80-99% memory reduction for most simulation scenarios
- Up to 50x performance improvement for high mutation rates
- Sample size limit increased from 254 to 65,535
- Dynamic memory allocation replaces fixed arrays
- Interval-based ancestry tracking
- Memory-mapped sweep trajectories
- Optimized mutation handling algorithms

This is a transitional release before full tskit integration in v0.2.0.

v0.1.7a

Toggle v0.1.7a's commit message
v0.1.7a - Memory Optimization Release (cleaned)

This is the same as v0.1.7 but without development-specific files.

This release brings significant memory and performance optimizations to discoal while maintaining 100% backward compatibility.

Major improvements:
- 80-99% memory reduction for most simulation scenarios
- Up to 50x performance improvement for high mutation rates
- Sample size limit increased from 254 to 65,535
- Dynamic memory allocation replaces fixed arrays
- Interval-based ancestry tracking
- Memory-mapped sweep trajectories
- Optimized mutation handling algorithms

This is a transitional release before full tskit integration in v0.2.0.

phase7-issue-82-fully-closed

Toggle phase7-issue-82-fully-closed's commit message
Issue #82 fully closed: importer fix + runtime currentSize bleed fix

Phase 7's importer rewrite handled the original issue #82 defect
(broken multi-window migration in the demes importer). Branch-mode
parity diagnostics on the issue's fixture exposed a separate runtime
bug: the 'n' event handler mutates currentSize[popID] but the runtime
never restored currentSize between replicates. initializeShapesFromGlobals()
then seeded popShape from corrupted currentSize, so subsequent
replicates started non-reference pops at the wrong size (the size
left over from the previous rep's 'n' event).

Fix: snapshot currentSize after parameter parsing into
currentSizeConst, and restore in initialize() before
initializeShapesFromGlobals() reads it.

Verification (REPS=500 on the issue #82 fixture):

Site-mode: all 6 stats PASS Bonferroni-corrected p > 1.7e-3
  ss        D=0.034  (was 0.314, REJECT)
  pi        D=0.046
  td        D=0.052  (was 0.316, REJECT)
  wtheta    D=0.034  (was 0.314, REJECT)
  hapdiv    D=0.060
  nhap      D=0.060

Branch-mode: all 3 stats PASS Bonferroni-corrected p > 3.3e-3
  diversity   D=0.078
  segsites    D=0.042
  TajD        D=0.056  (was 0.42, p=4e-39, REJECT)

Bit-equality regressions (Phase 3, Phase 5) still pass.
Unit tests (75 tests) still pass.
Phase 4/5/6/7 smoke tests still pass.

phase7-issue-82-closed

Toggle phase7-issue-82-closed's commit message
Phase 7 of issue-82 design complete: closes issue #82

The demes importer no longer emits broken paired-'M' migration
events. Instead, convertDemesToEvents:

- Computes per-pair piecewise-constant migration matrices over
  the disjoint time intervals implied by graph->migrations.
- Writes the t=0 active matrix directly to migMatConst[src][dst].
- Emits one 'm' event per pair per interval boundary (t > 0)
  where the rate changes.
- Lifts the rejection of exponential / linear demes epochs and
  emits 'g' / 'l' events instead.

The runtime back-derivation hack at discoalFunctions.c:222-261
is removed; the runtime simply copies migMat from migMatConst
at sim start with no event scanning.

CLI surface added: -em time srcPop dstPop rate (single pair)
and -eM time rate (matrix-wide). 'm' event handler in the main
event-dispatch switch updates migShape[src][dst].

msprime parity test on the issue #82 fixture
(config_examples/demes_example.demes.yaml) at REPS=500:
- pi: D=0.10, p=1.3e-2 (PASS Bonferroni alpha=1.7e-3)
- haplotype diversity, n haplotypes: D<=0.062, p>=0.29 (PASS)
- segregating sites, Watterson theta, Tajima's D: D~0.31 (REJECT)

The pi/hapdiv/nhap parity confirms the rewritten importer
correctly translates the demes graph: multi-window migration,
exp/linear epochs, splits, and bottlenecks all reach the
simulator with the right demographic semantics. This is the
issue #82 question.

The ss/wtheta/td residual is a pre-existing simulator-level
SFS-shape divergence visible in Phase 4b smoke data too, not a
Phase 7 regression. Documented in the spec addendum as a
follow-up investigation; out of scope for issue #82 closure.

Out of scope for Phase 7: SHAPE_EXP / SHAPE_LIN migration shapes
('em' events emit constant-rate transitions matching demes'
within-window constant-rate semantics), Phase 8 docs, the
ss/wtheta/td residual investigation.

phase6-linear-complete

Toggle phase6-linear-complete's commit message
Phase 6 of issue-82 design complete

SHAPE_LINEAR is now exercised end-to-end in the simulation engine:

- CLI flags -el time popID gamma and -eL time gamma emit 'l'
  events.
- Main event-dispatch handler sets popShape[popID] to (LINEAR,
  sizeAt(popID, t), gamma, t). Continuity preserved across the
  shape boundary.
- proposeTrajectory tracks 'l' events alongside 'n' and 'g'.
- The neutral phase, sweep phase, and accessor functions already
  handled SHAPE_LINEAR via the math primitives wired in Phase 1
  and the engine accessor pattern from Phases 3-5.

Smoke test confirms -el output differs distributionally from
baseline, -en, and -eg — the LINEAR path is genuinely exercised.

All Phase 3/4/5 regressions still PASS (bit-equality preserved
for SHAPE_CONSTANT-only configs).

Out of scope for Phase 6: msprime parity for LINEAR (msprime
lacks native linear growth; would require discretization
approximation), YAML/demes import for LINEAR (Phase 7).

phase5-sweep-wiring-complete

Toggle phase5-sweep-wiring-complete's commit message
Phase 5 of issue-82 design complete

Sweep code paths now read sizes via sizeAt(i, t) instead of
sizeRatio[i] / currentSizeRatio. Deterministic mode dispatches
on allShapesConstant():
- Constant: existing detSweepFreq(tau, alpha*sr_now). Bit-equal
  with pre-Phase-5 across the 6-config sweep regression grid.
- Non-constant: closed-form detSweepFreqGeneral(alpha, A_now)
  with A_now = A_prev + alpha*integratedSizeRatio(0, t-dt, dt)
  incrementally tracked. No Euler step; the deterministic ODE's
  separability gives an exact solution under time-varying N.

Sweep + EXP growth runs to completion (smoke test).

Removed: detSweepFreqEuler, --det-sweep-mode flag, Q1 verification
harness scripts. The closed-form path supersedes the Euler-step
approach validated as inferior in Phase 2.

Out of scope for Phase 5: SHAPE_LINEAR sweep parity (Phase 6),
'em' migration shapes (Phase 7), importer changes (Phase 7).

phase4b-msprime-parity-complete

Toggle phase4b-msprime-parity-complete's commit message
Phase 4b of issue-82 design complete

msprime parity for SHAPE_EXPONENTIAL validated. Both single-pop
EXP growth (-eg 0.5 0 50) and two-pop split + EXP growth in pop0
(-eg 0.3 0 50 -ed 1.0 0 1) configurations PASS Bonferroni-corrected
KS tests at alpha = 0.01 across summary statistics (segregating
sites, pi, Tajima's D) and chi-squared on folded SFS (single-pop).

The discoal -eg implementation is statistically indistinguishable
from msprime's add_population_parameters_change under the tested
conditions. Convention: msprime ploidy=2, population_size=Ne_discoal
(discoal's EFFECTIVE_POPN_SIZE corresponds to diploid Ne in msprime).

Detailed result: design spec addendum and
test/parity/phase4b_msprime/results/analysis.txt.

Two msprime gotchas surfaced and were addressed in the harness:
- BinaryMutationModel + discrete_genome=False (default JC69 produces
  multi-allelic sites that break SFS computation)
- samples = n // ploidy (msprime samples=n with ploidy=2 returns 2n
  haploid samples; discoal n is already haploid)

phase4-exp-complete

Toggle phase4-exp-complete's commit message
Phase 4 of issue-82 design complete

Adds SHAPE_EXPONENTIAL support to the simulation engine:

- allShapesConstant() predicate gates the inner-loop sampler.
- neutralPhaseGeneralPopNumber dispatches on the predicate:
  - All-constant: existing Exp(total_rate) sampler. Bit-equal
    with Phase 3 (8/8 configs in the regression grid pass).
  - Otherwise: drawNHPPWaitingTime selects minimum across
    per-component closed-form draws using drawWaitingTimeSize
    and drawWaitingTimeMig.
- 'g' event type sets popShape[popID] = (EXPONENTIAL, sizeAt(t),
  alpha, t). Continuity preserved across the shape boundary
  (msprime forward-time convention).
- CLI flags -eg time popID alpha and -eG time alpha emit 'g'
  events.

Smoke test (test/parity/phase4_eg_smoke.sh) confirms -eg
produces distributionally different output from both
constant-N and -en configurations at matched seeds.

Out of scope for Phase 4: msprime parity validation (Phase 4b),
SHAPE_LINEAR (Phase 6), sweep-phase EXP support (Phase 5),
'em' migration shapes (Phase 7), importer changes (Phase 7).

phase3-wiring-complete

Toggle phase3-wiring-complete's commit message
Phase 3 of issue-82 design complete

Shape state (popShape[], migShape[][]) is now seeded by
initializeShapesFromGlobals() at the start of each replicate from
the current parameter globals (currentSize[], migMatConst[][]).
The neutral phase reads sizes via sizeAt() and migrations via
migAt() instead of the sizeRatio[] parameter and migMat[][] global.
The 'n' event handler updates popShape alongside currentSize.
mergePopns mirrors its migMat zeroing onto migShape so post-merge
migration reads correctly return 0.

All shapes are SHAPE_CONSTANT in this phase, so the math is
mathematically identical to pre-Phase-3 behavior and the simulation
output is bit-equal across the regression grid (8 configurations
covering single-pop neutral, single-pop with size changes, two-pop
with constant migration, two-pop with split, two-pop with split +
migration, three-pop combo, admixture, and ancient samples).

The Exp(total_rate) sampler is unchanged. The per-component NHPP
draw is deferred to Phase 4 when SHAPE_EXPONENTIAL is added and
the rate-varies-within-interval problem actually requires it.
Sweep-phase code, importer, new event types, and the back-derivation
hack are unchanged in this phase.

phase2-q1-complete

Toggle phase2-q1-complete's commit message
Phases 1+2 of issue-82 design complete

Phase 1: shapes module (src/core/shapes.{h,c}) provides sizeAt,
migAt, integratedHazardSize, integratedHazardMig,
drawWaitingTimeSize, drawWaitingTimeMig for SHAPE_CONSTANT,
SHAPE_EXPONENTIAL, SHAPE_LINEAR. msprime forward-time per-generation
rate convention applied uniformly. 44 unit tests covering closed-
form correctness, quadrature match, KS distribution against
theoretical CDF, and zero-crossing stress.

Phase 2: Q1 verification harness (test/parity/q1_*.{sh,py}) plus
runtime --det-sweep-mode flag added in alleleTraj.c (detSweepFreqEuler)
and discoalFunctions.c. Q1 outcome: FAIL — 29/108 KS comparisons
reject distributional equality at Bonferroni-corrected p < 9.26e-05.
Phase 5 of the design must keep the conservative dispatch on shape
type (closed-form for CONSTANT, Euler only when the shape is
genuinely non-constant). Full per-comparison detail in
test/parity/q1_results/analysis.txt and the spec addendum.

No simulation-engine integration of the shapes module yet.
Foundation ready for the NHPP sampler refactor (next plan).