FIM 548, Lecture #12
Quasi Monte Carlo
Chapter 5 in Glasserman and Chapter 15 of Owen
        Lecture and codes by Andrew Papanicolaou;
        slides originally by Y. Kitapbayev
                 Feb 26, 2023
Integration using Monte Carlo
    ►
        So far we have focused on computing
                        θ = ∫_0^1 ψ(x) dx .
    ►
        Monte Carlo integration can be especially useful
        for estimating high-dimensional integrals.
    ►
        Suppose we want to estimate
                        θ = ∫_0^1 ∫_0^1 ψ(x, y) dx dy .
    ►
        Monte Carlo is easy because we just need to sample
        uniformly on [0, 1) × [0, 1),
                        θ̂ = (1/N) Σ_{i=1}^N ψ(X_i, Y_i) .
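    ►
        As a minimal sketch (the integrand below is just a hypothetical
        example), the two-dimensional estimator is a few lines of MATLAB:

   %% Basic 2-d Monte Carlo integration (hypothetical integrand)
   N = 1e5;                        % number of samples
   X = rand(1,N);                  % uniform on [0,1)
   Y = rand(1,N);                  % uniform on [0,1)
   psi = @(x,y) cos(x.*y);         % hypothetical integrand psi(x,y)
   theta_hat = mean(psi(X,Y));     % MC estimate of the double integral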
Quasi Monte Carlo
   ►
        We already saw that standard MC methods have a
        convergence rate of O(1/√N).
   ►
        To accelerate convergence, one can apply the so-called quasi
        Monte Carlo (QMC) method, also known as the low-discrepancy
        method.
   ►
       These methods seek to increase accuracy by generating points
       that are too evenly distributed to be random.
   ►
        Due to their deterministic nature, standard statistical error
        analysis (e.g., CLT-based confidence intervals) does not apply
        to QMC methods.
   ►
        In this lecture we focus on d-dimensional integrals,
                        θ = E ψ(U) ,
        where U ∼ uniform[0,1)^d. Using inverse CDFs we can extend this
        to a great many other d-dimensional distributions.
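    ►
        As a brief sketch (the normal and exponential targets here are
        just illustrations), the inverse-CDF transform maps uniforms to
        other distributions:

   %% Inverse-CDF transform of uniforms (illustrative targets)
   U = rand(1,1e5);        % uniforms; QMC points could be used in their place
   X = norminv(U);         % standard normal via the inverse CDF
   lambda = 2;             % hypothetical rate parameter
   E = -log(1-U)/lambda;   % exponential(lambda) via its inverse CDF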
Integration Lattices
    ►
        We are already familiar with basic QMC.
    ►
        The congruential method for pseudo-random numbers,
                        X_{i+1} = mod(a X_i, m) ,
                        U_{i+1} = X_{i+1}/m ,
        with m = 2^31 and a = 16807, generates a scalar sequence of
        pseudo-randoms on (0, 1).
    ►
        A similar idea, but with points that are perhaps more uniformly
        separated, is the rank-1 lattice,
                        X_i = mod(i V_1, 1) ,
        where V_1 ∈ R^d. For example, the Fibonacci lattice in d = 2
        takes V_1 = (1, (1+√5)/2).
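    ►
        A minimal sketch of both generators (the seed X_1 = 1 is an
        arbitrary choice; the lattice vector matches the QMC example
        code later in this lecture):

   %% Congruential sequence vs. rank-1 (Fibonacci) lattice
   n = 2^8;
   m = 2^31; a = 16807;            % congruential parameters from above
   X = zeros(1,n); X(1) = 1;       % arbitrary seed
   for i = 1:n-1
       X(i+1) = mod(a*X(i), m);
   end
   U_cong = X/m;                   % scalar pseudo-randoms on (0,1)
   phi = (1+sqrt(5))/2;
   v1 = [1/(n+1); phi];            % Fibonacci lattice vector in d = 2
   U_lat = mod(v1*(1:n), 1);       % 2-by-n rank-1 lattice points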
Lattices
   [Figure: six 2-d point sets on [0,1)^2 — congruential, grid, rank-1
   lattice, Hammersley, Latin Hypercube, and stratified random.]
A Simple QMC Example
   ►
        Once we have our QMC points in [0,1)^d, we only need to
        plug them into our random number generator in place of
        our pseudo-random uniforms.
   ►
       For example, we can use an integration lattice in d = 2 to
       estimate a function of a jointly-normal pair.
   ►
        For example,
                E cos(‖X‖) ≈ (1/N) Σ_{i=1}^N cos(√(X_i^2 + Y_i^2)) ,
        where (X_i, Y_i) are given by Box-Muller with points from
        an integration lattice.
Example: θ = (1/(2π)) ∫_{R^2} cos(‖x‖) e^{−‖x‖^2/2} dx
   %% Simple QMC Example
   n = 2^8;
   phi = (1+sqrt(5))/2;            % Fibonacci lattice
   v1 = [1/(n+1); phi];
   U_qmc = mod(v1*[1:n], 1);
   X = sqrt(-2*log(U_qmc(1,:))).*cos(2*pi*U_qmc(2,:));
   Y = sqrt(-2*log(U_qmc(1,:))).*sin(2*pi*U_qmc(2,:));
   theta_qmc = mean(cos(sqrt(X.^2+Y.^2)));
   theta_mc = mean(cos(sqrt(sum(randn(2,n).^2,1))));
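   As a sanity check (not from the slides), the same θ can be computed
   by deterministic 2-d quadrature:

   %% Reference value by quadrature (tails beyond |x| = 8 are negligible)
   integrand = @(x,y) cos(sqrt(x.^2+y.^2)).*exp(-(x.^2+y.^2)/2)/(2*pi);
   theta_ref = integral2(integrand, -8, 8, -8, 8);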
Discrepancy
    ►
         Let us assume that d is the dimension of a problem, and
         we want to estimate the integral of ψ over [0,1)^d.
    ►
        Now we aim to fill the cube uniformly using
        some deterministic rule.
    ►
         There are a couple of definitions of discrepancy.
    ►
         We define the discrepancy of a point set {x_1, ..., x_n} as
             D(x_1, ..., x_n) = sup_{a ∈ [0,1)^d} | #{x_i ∈ [0,a)}/n − vol([0,a)) | ,
         where [0,a) = [0,a_1) × ··· × [0,a_d), where #{x_i ∈ [0,a)}
         denotes the number of x_i contained in [0,a), and vol([0,a))
         denotes the measure, or volume, of the interval.
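     ►
         The supremum is expensive to compute exactly, but a crude
         lower bound can be estimated by sampling anchor points a
         (a minimal sketch, not from the slides):

   %% Crude lower bound on the discrepancy of a point set in d = 2
   n = 2^8; d = 2;
   x = rand(d,n);                        % point set to evaluate (pseudo-random here)
   D = 0;
   for trial = 1:1e4
       a = rand(d,1);                    % random anchored box [0,a)
       frac = mean(all(x < a, 1));       % fraction of the points inside the box
       D = max(D, abs(frac - prod(a)));  % compare with the box volume
   end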
Low Discrepancy
   ►
        For U_i ∼ uniform[0,1)^d it is known that
            limsup_{n→∞} √(2n) D(U_1, U_2, ..., U_n) / √(log log n) = 1 .
   ►
        It is known that deterministic low-discrepancy sets can have
        D(u^1, u^2, ..., u^n) that is O(log(n)^d / n).
   ►
       Integration error when using a low-discrepancy set is
       quantified with the Koksma-Hlawka inequality,
                |θ̂ − θ| ≤ V_HK(ψ) D(u^1, u^2, ..., u^n) ,
        where V_HK(ψ) is the Hardy-Krause total variation.
   ►
        Examples of sequences with low discrepancy include the
        Halton sequence, Sobol sequence, Faure sequence, and
        Niederreiter sequence.
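    ►
        MATLAB's Statistics Toolbox provides constructors for two of
        these sequences; a brief usage sketch:

   %% Generating low-discrepancy point sets (Statistics Toolbox)
   d = 2; n = 2^8;
   P_halton = net(haltonset(d), n);   % first n points of the Halton sequence
   P_sobol = net(sobolset(d), n);     % first n points of the Sobol sequence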
Digital Nets With Low-Discrepancy
   [Figure: four 2-d point sets — Sobol, scrambled Sobol, Halton, and
   scrambled Halton.]
  Figure: Digital nets don’t have the periodicity of lattices. Integration
  lattices are good for integrating smooth functions with smooth periodic
  features.
Randomized QMC
   ►
       Randomized digital nets offer a way to take advantage of
       low-discrepancy sets and avoid some of the pitfalls in
       basic QMC.
   ►
       The Matousek-Owen scramble is commonly used with
       the Sobol set.
   ►
       Reverse-Radix scrambling is used for the Halton set.
Example: Scrambled Sobol Set to Generate Brownian Motion
   %% QMC Sim Brownian Motion
   T = 3/12;
   dt = 1/365;
   N = round(T/dt);
   Rho = -.7;
   dW = zeros(N,2);
   Sbl = sobolset(2,'Skip',1e3,'Leap',1e2);
   p = scramble(Sbl,'MatousekAffineOwen');
   p = net(p,N)';                    % 2-by-N matrix of scrambled Sobol points
   U1 = p(1,:);                      % first coordinate of each point
   U2 = p(2,:);                      % second coordinate of each point
   dW(:,1) = sqrt(-2*log(U1)).*cos(2*pi*U2)*sqrt(dt);
   dW(:,2) = sqrt(-2*log(U1)).*sin(2*pi*U2)*sqrt(dt);
   dW(:,2) = Rho*dW(:,1) + sqrt(1-Rho^2)*dW(:,2);
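   The increments can then be accumulated into the correlated Brownian
   paths, for example:

   %% Build the correlated Brownian paths from the increments
   W = [zeros(1,2); cumsum(dW,1)];   % N+1 time points, starting at W = 0
   t = (0:N)*dt;
   plot(t, W)                        % plot both components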
Example: Heston Model Call
   Figure: Using QMC and RQMC to estimate the Heston call price, and
   then comparing implied vols (T = 3 months; curves for MC, QMC, RQMC,
   and quadrature).
Copulas
   ►
        Copulas are the CDFs of marginally uniform[0, 1) random
        variables.
   ►
         For X ∈ R^d, let X_l denote the l-th element.
   ►
         Let F_l be the (marginal) CDF of X_l.
   ►
         We have X_l = F_l^{-1}(U_l), where U_l is a marginally
         uniform[0,1) random variable, and is the l-th element of
         U ∈ [0,1)^d.
   ►
         The CDF of U is the copula C.
   ►
        Gaussian copula:
             C(u_1, ..., u_d) = √((2π)^{−d}/|ρ|) ∫_{−∞}^{Φ^{−1}(u_1)} ··· ∫_{−∞}^{Φ^{−1}(u_d)} e^{−(1/2) x* ρ^{−1} x} dx ,
         where ρ ∈ R^{d×d} is a correlation matrix and |ρ| = det(ρ).
   ►
         Archimedean copula:
             C(u_1, ..., u_d) = ϕ^{−1}(ϕ(u_1) + ··· + ϕ(u_d)) ,
         where ϕ is the copula generator.
Example: Large Portfolio of Co-Dependent Claims
    ►
        Let us consider an insurance example.
    ►
         Individual losses have a standard normal distribution
         (i.e., each X_l is marginally standard normal).
    ►
         U_l = Φ(X_l) are jointly dependent through a Clayton copula,
         with generator
             ϕ(t) = (t^{−ν} − 1)/ν ,
         where ν > 0 is the parameter.
    ►
         A claim is made if X_l ≥ 3.
    ►
         We let the portfolio have l = 1, 2, ..., 100 (i.e., 100 policies).
    ►
        We want to gain an understanding of the magnitude of
        the total claims given that at least one claim is made.
    ►
        For example, think about fire insurance for homeowners in
        California.
Marshall-Olkin Algorithm
    ►
         We can sample from the copula using the Marshall-Olkin
         algorithm:
          1. V ∼ LS_ϕ^{−1}
          2. Ũ ∼ uniform[0,1)^d
          3. Return U, where U_l = ϕ^{−1}(−log(Ũ_l)/V).
     ►
         LS_ϕ(v) = ∫_0^∞ ϕ(t) e^{−vt} dt is the Laplace-Stieltjes
         transform of ϕ, which for the Clayton copula has known inverse,
             LS_ϕ^{−1}(v) ∝ v^{1/ν−1} e^{−v} ,
         i.e., V is gamma distributed with shape 1/ν and scale 1.
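     ►
         A minimal sketch of the algorithm for the Clayton copula (the
         parameter value ν and sample size are hypothetical choices;
         step 3 is implemented with ψ(s) = (1+s)^{−1/ν}, the
         Laplace-Stieltjes transform of the gamma variable, which
         generates the same Clayton copula):

   %% Sample the Clayton copula via Marshall-Olkin (hypothetical parameters)
   nu = 0.5;                             % Clayton parameter (assumed value)
   d = 100; n = 1e4;                     % 100 policies, 1e4 samples
   V = gamrnd(1/nu,1,1,n);               % step 1: V ~ Gamma(shape 1/nu, scale 1)
   Utilde = rand(d,n);                   % step 2: independent uniforms
   U = (1 - log(Utilde)./V).^(-1/nu);    % step 3: psi(-log(Utilde)/V), elementwise
   X = norminv(U);                       % marginally standard normal losses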
Example: Large Portfolio of Co-Dependent
Insurance
    ►
        We can run this 100-dimensional example with Monte Carlo,
        QMC and RQMC, and we’ll see generally comparable results.
    ►
         We can also run the example with a normal approximation
         to the data, which requires some shrinkage of the
         covariance matrix's eigenvalues.
    ►
         The reduced error from QMC/RQMC is not immediately apparent
         in this example, or in most other examples.
    ►
        To see error reduction we will need to run many trials, save
        the MC/QMC/RQMC estimators, and then compare
        statistics of the aggregate distributions to see which method
        has the least variation.
    ►
         We estimate θ = the probability of at least 1 claim, M = the
         expected number of claims given at least 1 claim, and the
         expectation of the peaks-over-threshold (POT) distribution
         given at least 1 claim, as sketched below.
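     ►
         Given simulated losses X (a d-by-n matrix from the copula
         sampler above), these quantities can be estimated along the
         following lines. This is a sketch; in particular, reading the
         POT quantity as the total loss above the threshold is an
         assumption.

   %% Estimators from simulated losses X (d-by-n), claim threshold 3
   claims = (X >= 3);                    % claim indicators per policy and sample
   numClaims = sum(claims,1);            % number of claims in each sample
   atLeastOne = (numClaims >= 1);
   theta_hat = mean(atLeastOne);         % P(at least one claim)
   M_hat = mean(numClaims(atLeastOne));  % E[# claims | at least one]
   totals = sum(X.*claims,1);            % total of the claiming losses per sample
   pot_hat = mean(totals(atLeastOne));   % assumed reading of the POT expectation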
Example: Large Portfolio of Co-Dependent Insurance
   Figure: 500 trials of MC and RQMC estimations for the insurance
   example using the Clayton copula (histograms of θ_mc vs. θ_qmc,
   M_mc vs. M_qmc, and pot_mc vs. pot_qmc).
Example: Large Portfolio of Co-Dependent Insurance
   Based on the 500 trials from the previous slide, we have
  the following statistics:
             E[θ_mc]    = 0.0409,    sd(θ_mc)    = 0.0032
             E[θ_qmc]   = 0.0409,    sd(θ_qmc)   = 0.0017
             E[M_mc]    = 3.3259,    sd(M_mc)    = 0.2896
             E[M_qmc]   = 3.3202,    sd(M_qmc)   = 0.1458
             E[pot_mc]  = 10.9189,   sd(pot_mc)  = 0.9520
             E[pot_qmc] = 10.9020,   sd(pot_qmc) = 0.4778
   As we can see from this table, the standard deviation of the RQMC
   estimators is about half that of the MC estimators.
Summary
  ►
      QMC is good for some problems in multiple dimensions.
  ►
      Implementation is more complicated than standard MC.
  ►
      Issues such as selection of a lattice/net, parameters for it,
      and the absence of statistical analysis of error (i.e., no CLT)
      make it less straightforward.
  ►
       In 1 and 2 dimensions there are quadrature rules that
       are better (e.g., Gauss-Lobatto quadrature).
  ►
       The Koksma-Hlawka bound and the possible O(log(n)^d/n) error
       are appealing, but they are only theoretical, and in practice
       it requires some know-how to see reduced error.