Multivariate normal CDF, correlation matrix repair, and Gaussian copulas for Go.
Go's scientific computing stack (gonum) has solid univariate distributions and multivariate normal PDF/sampling, but no multivariate normal CDF. It's been an open issue since 2019.
If you're doing quant finance, risk modeling, or ML in Go, you've probably hit this and had to pick from three bad options:
- Shell out to Python (150ms+ subprocess overhead per call)
- Use cgo bindings to C/Fortran (build headaches, cross-compilation pain)
- Just assume independence (mathematically wrong)
mvstat is a fourth option: pure Go, matches scipy's accuracy, nanosecond-scale latency, zero cgo.
go get github.com/tgcohce/mvstat// Bivariate CDF: P(X1 <= 1, X2 <= 1) with correlation 0.5
p := mvstat.BvnCDF(1.0, 1.0, 0.5) // 0.7452...
// 5-dimensional CDF
mu := []float64{0, 0, 0, 0, 0}
upper := []float64{1, 1, 1, 1, 1}
sigma := mat.NewSymDense(5, covData)
result, err := mvstat.MvnCDF(mu, upper, sigma, nil)// Digital basket option: P(all 3 assets above strike at expiry)
// Under risk-neutral GBM, this is just a multivariate normal CDF
result, _ := mvstat.MvnCDFStd(thresholds, corrMatrix, nil)
digitalPrice := discount * result.ProbMonte Carlo validation over 1M paths confirms agreement within 1 standard error. Full example
// P(all 5 assets lose more than 2% in one day)
result, _ := mvstat.MvnCDF(dailyMu, lossThresholds, covMatrix, opts)
// Result: 4.9e-3 (correlated) vs 1.7e-5 (independent), about 290x higherCorrelation increases joint tail risk by two orders of magnitude. Full example
// Matrix from market data with gaps, not PSD, Cholesky fails
fixed, iters, _ := mvstat.NearestCorr(nil, broken, nil)
// Now PSD with unit diagonal, minimal change from originalFinds the nearest valid correlation matrix in the Frobenius norm (Higham 2002). Full example
// Gaussian copula: joint default probability of 3 bonds
// P(all default) with correlation is 20x the independence assumption
thresholds := []float64{phiInv(p1), phiInv(p2), phiInv(p3)}
jointDefault, _ := mvstat.TvnCDF(thresholds[0], thresholds[1], thresholds[2], R)AMD Ryzen 7 7735HS, Go 1.24, Windows 11:
| Operation | mvstat | Python subprocess | Speedup |
|---|---|---|---|
| BvnCDF | 640 ns | ~194 ms | 303,000x |
| BvnCDF x 1000 (batch) | 651 us | ~175 ms | 269x |
| MvnCDF d=5 | 2.3 ms | ~181 ms | 78x |
| NearestCorr 50x50 | 16 ms | ~207 ms | 13x |
The single-call speedup is mostly Python startup overhead. The batch comparison is more fair: mvstat computes 1000 bivariate CDFs in 651us while Python takes 175ms for the same work.
Internal benchmarks:
| Benchmark | Time/op | Allocs/op |
|---|---|---|
| BvnCDF | 640 ns | 0 |
| TvnCDF | 41 us | 0 |
| MvnCDF d=5 (default) | 2.3 ms | 25 |
| MvnCDF d=10 (default) | 5.1 ms | 26 |
| NearestPSD 50x50 | 360 us | 15 |
| NearestCorr 50x50 | 16 ms | 534 |
| GaussianCopulaPDF d=5 | 3.2 us | 10 |
Validated against scipy multivariate_normal.cdf across 2,000+ test cases:
| Function | Max error vs scipy | Method | Notes |
|---|---|---|---|
| BvnCDF | 7.4e-16 | vs scipy (1,575 cases) | Machine precision |
| TvnCDF | ~2e-7 | vs QMC reference | More accurate than scipy (details below) |
| MvnCDF d=4-8 | 5.1e-6 | vs scipy (MaxEval=100k) | Improves with MaxEval |
TvnCDF cross-validation. The largest disagreement with scipy (2.4e-6) was independently verified by running a 1M-evaluation QMC integrator. TvnCDF matched the QMC reference to within ~2e-7 in all cases. TvnCDF was closer in 5/5 cases, scipy in 0/5. The residual is scipy's error, not ours.
The Abserr field tracks actual error to within one order of magnitude in 99% of
cases (validated across 1,000 test cases at multiple MaxEval settings).
For d >= 4, accuracy improves with MaxEval. Typical convergence:
MaxEval=1,000 -> ~1e-3 accuracy
MaxEval=10,000 -> ~1e-4 accuracy
MaxEval=100,000 -> ~1e-6 accuracy
Set opts.Src to a rand.Source for randomized QMC shifts (tighter error
estimates). Without Src, results are deterministic but Abserr is zero since
there are no shifts to estimate from.
| Function | What it does | Algorithm |
|---|---|---|
BvnCDF(x1, x2, rho) |
Bivariate normal CDF | Drezner-Wesolowsky, 20-pt Gauss-Legendre |
TvnCDF(x1, x2, x3, R) |
Trivariate normal CDF | Genz (2004), adaptive quadrature |
MvnCDF(mu, upper, sigma, opts) |
General d-dimensional CDF | Genz-Bretz quasi-Monte Carlo |
MvnCDFStd(upper, R, opts) |
Standardized CDF (R = correlation) | Same, skips standardization |
CDF(upper, mu, sigma, opts) |
Convenience wrapper | Dispatches by dimension |
| Function | What it does |
|---|---|
BvnCDFRect(lo1, hi1, lo2, hi2, rho) |
P(lo <= X <= hi) for d=2 |
TvnCDFRect(lo, hi, R) |
P(lo <= X <= hi) for d=3 |
MvnCDFRect(mu, lo, hi, sigma, opts) |
P(lo <= X <= hi) for general d |
MvnCDFRectStd(lo, hi, R, opts) |
Standardized version |
| Function | What it does |
|---|---|
BvnLogCDF(x1, x2, rho) |
Log-space CDF, avoids underflow at extreme tails |
BvnSurvival(x1, x2, rho) |
Upper tail P(X1 > x1, X2 > x2) |
| Function | What it does |
|---|---|
NearestPSD(dst, A, minEig) |
Nearest PSD matrix via eigenvalue clipping |
NearestCorr(dst, A, opts) |
Nearest correlation matrix (Higham 2002) |
IsPSD(A, tol) |
Check positive semi-definiteness |
| Function | What it does |
|---|---|
GaussianCopulaCDF(u, R, opts) |
Gaussian copula CDF |
GaussianCopulaPDF(u, R) |
Gaussian copula density |
| Function / Type | What it does |
|---|---|
FromDistmvNormal(dist, upper, opts) |
Compute CDF from a gonum distmv.Normal |
Normal |
Pre-decomposed distribution for repeated CDF calls |
NewNormal(mu, sigma) |
Constructor, validates and decomposes once |
(*Normal).CDF(upper, opts) |
CDF without re-decomposing sigma |
(*Normal).CDFRect(lo, hi, opts) |
Rectangular CDF |
(*Normal).LogCDF(upper) |
Log CDF (d=2 only) |
| scipy | mvstat |
|---|---|
multivariate_normal.cdf(upper, mean=mu, cov=sigma) |
MvnCDF(mu, upper, sigma, nil) |
multivariate_normal.cdf(upper, mean=zeros, cov=R) |
MvnCDFStd(upper, R, nil) |
| No direct equivalent | MvnCDFRect(mu, lo, hi, sigma, nil) |
norm.cdf(x1) * norm.cdf(x2) (independent only) |
BvnCDF(x1, x2, 0) |
| No direct equivalent | BvnLogCDF(x1, x2, rho) |
gaussian_copula.cdf(u) |
GaussianCopulaCDF(u, R, nil) |
cov_nearest(A) (statsmodels) |
NearestCorr(nil, A, nil) |
One thing to watch: CDF(upper, mu, ...) takes upper first then mu, which is
the reverse of MvnCDF(mu, upper, ...).
MvnCDF handles covariance matrices with condition number up to ~1e8 reliably.
Above 1e10, consider regularizing with NearestPSD first. The library won't
panic on valid input regardless of conditioning.
- Drezner, Z. and Wesolowsky, G.O. (1990). "On the computation of the bivariate normal integral." J. Statist. Comput. Simul. 35:101-107.
- Genz, A. (2004). "Numerical computation of rectangular bivariate and trivariate normal and t probabilities." Statistics and Computing 14.3:251-260.
- Genz, A. (1992). "Numerical computation of multivariate normal probabilities." J. Comput. Graph. Statist. 1:141-150.
- Genz, A. and Bretz, F. (2002). "Comparison of methods for the computation of multivariate t-probabilities." J. Comput. Graph. Statist. 11:950-971.
- Higham, N.J. (2002). "Computing the nearest correlation matrix - a problem from finance." IMA J. Numer. Anal. 22(3):329-343.
- Cranley, R. and Patterson, T.N.L. (1976). "Randomization of number theoretic methods for multiple integration." SIAM J. Numer. Anal. 13:904-914.
MIT