Skip to content

Tgcohce/mvstat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mvstat

Go Reference MIT License

Multivariate normal CDF, correlation matrix repair, and Gaussian copulas for Go.

Why this exists

Go's scientific computing stack (gonum) has solid univariate distributions and multivariate normal PDF/sampling, but no multivariate normal CDF. It's been an open issue since 2019.

If you're doing quant finance, risk modeling, or ML in Go, you've probably hit this and had to pick from three bad options:

  1. Shell out to Python (150ms+ subprocess overhead per call)
  2. Use cgo bindings to C/Fortran (build headaches, cross-compilation pain)
  3. Just assume independence (mathematically wrong)

mvstat is a fourth option: pure Go, matches scipy's accuracy, nanosecond-scale latency, zero cgo.

Quick Start

go get github.com/tgcohce/mvstat
// Bivariate CDF: P(X1 <= 1, X2 <= 1) with correlation 0.5
p := mvstat.BvnCDF(1.0, 1.0, 0.5) // 0.7452...

// 5-dimensional CDF
mu := []float64{0, 0, 0, 0, 0}
upper := []float64{1, 1, 1, 1, 1}
sigma := mat.NewSymDense(5, covData)
result, err := mvstat.MvnCDF(mu, upper, sigma, nil)

What you can do with it

Price multi-asset derivatives

// Digital basket option: P(all 3 assets above strike at expiry)
// Under risk-neutral GBM, this is just a multivariate normal CDF
result, _ := mvstat.MvnCDFStd(thresholds, corrMatrix, nil)
digitalPrice := discount * result.Prob

Monte Carlo validation over 1M paths confirms agreement within 1 standard error. Full example

Compute portfolio tail risk

// P(all 5 assets lose more than 2% in one day)
result, _ := mvstat.MvnCDF(dailyMu, lossThresholds, covMatrix, opts)
// Result: 4.9e-3 (correlated) vs 1.7e-5 (independent), about 290x higher

Correlation increases joint tail risk by two orders of magnitude. Full example

Clean broken correlation matrices

// Matrix from market data with gaps, not PSD, Cholesky fails
fixed, iters, _ := mvstat.NearestCorr(nil, broken, nil)
// Now PSD with unit diagonal, minimal change from original

Finds the nearest valid correlation matrix in the Frobenius norm (Higham 2002). Full example

Model dependent defaults

// Gaussian copula: joint default probability of 3 bonds
// P(all default) with correlation is 20x the independence assumption
thresholds := []float64{phiInv(p1), phiInv(p2), phiInv(p3)}
jointDefault, _ := mvstat.TvnCDF(thresholds[0], thresholds[1], thresholds[2], R)

Full example

Performance

AMD Ryzen 7 7735HS, Go 1.24, Windows 11:

Operation mvstat Python subprocess Speedup
BvnCDF 640 ns ~194 ms 303,000x
BvnCDF x 1000 (batch) 651 us ~175 ms 269x
MvnCDF d=5 2.3 ms ~181 ms 78x
NearestCorr 50x50 16 ms ~207 ms 13x

The single-call speedup is mostly Python startup overhead. The batch comparison is more fair: mvstat computes 1000 bivariate CDFs in 651us while Python takes 175ms for the same work.

Internal benchmarks:

Benchmark Time/op Allocs/op
BvnCDF 640 ns 0
TvnCDF 41 us 0
MvnCDF d=5 (default) 2.3 ms 25
MvnCDF d=10 (default) 5.1 ms 26
NearestPSD 50x50 360 us 15
NearestCorr 50x50 16 ms 534
GaussianCopulaPDF d=5 3.2 us 10

Accuracy

Validated against scipy multivariate_normal.cdf across 2,000+ test cases:

Function Max error vs scipy Method Notes
BvnCDF 7.4e-16 vs scipy (1,575 cases) Machine precision
TvnCDF ~2e-7 vs QMC reference More accurate than scipy (details below)
MvnCDF d=4-8 5.1e-6 vs scipy (MaxEval=100k) Improves with MaxEval

TvnCDF cross-validation. The largest disagreement with scipy (2.4e-6) was independently verified by running a 1M-evaluation QMC integrator. TvnCDF matched the QMC reference to within ~2e-7 in all cases. TvnCDF was closer in 5/5 cases, scipy in 0/5. The residual is scipy's error, not ours.

The Abserr field tracks actual error to within one order of magnitude in 99% of cases (validated across 1,000 test cases at multiple MaxEval settings).

Convergence

For d >= 4, accuracy improves with MaxEval. Typical convergence:

MaxEval=1,000   -> ~1e-3 accuracy
MaxEval=10,000  -> ~1e-4 accuracy
MaxEval=100,000 -> ~1e-6 accuracy

Set opts.Src to a rand.Source for randomized QMC shifts (tighter error estimates). Without Src, results are deterministic but Abserr is zero since there are no shifts to estimate from.

API

CDF functions

Function What it does Algorithm
BvnCDF(x1, x2, rho) Bivariate normal CDF Drezner-Wesolowsky, 20-pt Gauss-Legendre
TvnCDF(x1, x2, x3, R) Trivariate normal CDF Genz (2004), adaptive quadrature
MvnCDF(mu, upper, sigma, opts) General d-dimensional CDF Genz-Bretz quasi-Monte Carlo
MvnCDFStd(upper, R, opts) Standardized CDF (R = correlation) Same, skips standardization
CDF(upper, mu, sigma, opts) Convenience wrapper Dispatches by dimension

Rectangular regions

Function What it does
BvnCDFRect(lo1, hi1, lo2, hi2, rho) P(lo <= X <= hi) for d=2
TvnCDFRect(lo, hi, R) P(lo <= X <= hi) for d=3
MvnCDFRect(mu, lo, hi, sigma, opts) P(lo <= X <= hi) for general d
MvnCDFRectStd(lo, hi, R, opts) Standardized version

Tail functions

Function What it does
BvnLogCDF(x1, x2, rho) Log-space CDF, avoids underflow at extreme tails
BvnSurvival(x1, x2, rho) Upper tail P(X1 > x1, X2 > x2)

Matrix utilities

Function What it does
NearestPSD(dst, A, minEig) Nearest PSD matrix via eigenvalue clipping
NearestCorr(dst, A, opts) Nearest correlation matrix (Higham 2002)
IsPSD(A, tol) Check positive semi-definiteness

Copulas

Function What it does
GaussianCopulaCDF(u, R, opts) Gaussian copula CDF
GaussianCopulaPDF(u, R) Gaussian copula density

gonum integration

Function / Type What it does
FromDistmvNormal(dist, upper, opts) Compute CDF from a gonum distmv.Normal
Normal Pre-decomposed distribution for repeated CDF calls
NewNormal(mu, sigma) Constructor, validates and decomposes once
(*Normal).CDF(upper, opts) CDF without re-decomposing sigma
(*Normal).CDFRect(lo, hi, opts) Rectangular CDF
(*Normal).LogCDF(upper) Log CDF (d=2 only)

scipy equivalents

scipy mvstat
multivariate_normal.cdf(upper, mean=mu, cov=sigma) MvnCDF(mu, upper, sigma, nil)
multivariate_normal.cdf(upper, mean=zeros, cov=R) MvnCDFStd(upper, R, nil)
No direct equivalent MvnCDFRect(mu, lo, hi, sigma, nil)
norm.cdf(x1) * norm.cdf(x2) (independent only) BvnCDF(x1, x2, 0)
No direct equivalent BvnLogCDF(x1, x2, rho)
gaussian_copula.cdf(u) GaussianCopulaCDF(u, R, nil)
cov_nearest(A) (statsmodels) NearestCorr(nil, A, nil)

One thing to watch: CDF(upper, mu, ...) takes upper first then mu, which is the reverse of MvnCDF(mu, upper, ...).

Ill-conditioned inputs

MvnCDF handles covariance matrices with condition number up to ~1e8 reliably. Above 1e10, consider regularizing with NearestPSD first. The library won't panic on valid input regardless of conditioning.

References

  1. Drezner, Z. and Wesolowsky, G.O. (1990). "On the computation of the bivariate normal integral." J. Statist. Comput. Simul. 35:101-107.
  2. Genz, A. (2004). "Numerical computation of rectangular bivariate and trivariate normal and t probabilities." Statistics and Computing 14.3:251-260.
  3. Genz, A. (1992). "Numerical computation of multivariate normal probabilities." J. Comput. Graph. Statist. 1:141-150.
  4. Genz, A. and Bretz, F. (2002). "Comparison of methods for the computation of multivariate t-probabilities." J. Comput. Graph. Statist. 11:950-971.
  5. Higham, N.J. (2002). "Computing the nearest correlation matrix - a problem from finance." IMA J. Numer. Anal. 22(3):329-343.
  6. Cranley, R. and Patterson, T.N.L. (1976). "Randomization of number theoretic methods for multiple integration." SIAM J. Numer. Anal. 13:904-914.

License

MIT

About

Multivariate normal CDF, nearest correlation matrix, and Gaussian copulas for Go

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages