Conformal Prediction: a tutorial
Margaux Zaffran
Useful resources on Conformal Prediction (non-exhaustive)
[Figure: three panels of example regression data, Y against X]
Reminder about quantiles
The mean absolute error is MAE(Y, f(X)) = |Y − f(X)|, and its minimizer is the conditional median: f*(X) = Q_{Y|X}(0.5).

[Figure: the MAE loss as a function of Y − f(X) (left); data with the fitted conditional median, Y against X (right)]
Generalization: Quantile regression

Minimizing the pinball loss ℓ_β(Y, f(X)) = β(Y − f(X))₊ + (1 − β)(f(X) − Y)₊ yields the conditional quantile of level β:

    ⇒ f*(X) = Q_{Y|X}(β)

[Figure: the pinball loss ℓ_β(Y, f(X)) as a function of Y − f(X), for β ∈ {0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95}]
Quantile regression: foundations

The first-order condition of minimizing the expected pinball loss q ↦ E[ℓ_β(Y, q)] gives:

    0 = (β − 1) F_Y(q*) + β (1 − F_Y(q*))
    ⇔ (1 − β) F_Y(q*) = β (1 − F_Y(q*))
    ⇔ F_Y(q*) = β
    ⇔ q* = F_Y⁻¹(β)
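As a quick numerical sanity check (a sketch, not from the slides), the minimizer of the average pinball loss over a sample indeed matches the empirical β-quantile:

```python
import numpy as np

def pinball_loss(y, pred, beta):
    # beta * (y - pred) if y >= pred, (1 - beta) * (pred - y) otherwise
    diff = y - pred
    return np.mean(np.maximum(beta * diff, (beta - 1) * diff))

rng = np.random.default_rng(0)
y = rng.normal(size=100_000)
beta = 0.9

# Minimize the average pinball loss over a grid of candidate constants.
grid = np.linspace(-3, 3, 2001)
q_hat = grid[np.argmin([pinball_loss(y, c, beta) for c in grid])]

print(q_hat, np.quantile(y, beta))  # both close to the N(0,1) 0.9-quantile, about 1.28
```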
Quantile regression: visualisation

[Figure: pinball losses ℓ_β(Y, f(X)) for β ∈ {0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95} (left); fitted conditional quantiles on data, Y against X (right)]

    Warning
    No theoretical guarantee with a finite sample!

        P[ Y ∈ [Q̂_{Y|X}(β/2); Q̂_{Y|X}(1 − β/2)] ] ≠ 1 − β
Quantifying predictive uncertainty

  • (X, Y) ∈ R^d × R random variables
  • n training samples (X_i, Y_i)_{i=1}^n
  • Goal: predict an unseen point Y_{n+1} at X_{n+1} with confidence
  • How? Given a miscoverage level α ∈ [0, 1], build a predictive set C_α such that:

        P{Y_{n+1} ∈ C_α(X_{n+1})} ≥ 1 − α,                    (1)

    and C_α should be as small as possible, in order to be informative.
    For example: take α = 0.1 to obtain a 90% coverage interval.
  • Construction of the predictive intervals should be
      ◦ agnostic to the model
      ◦ agnostic to the data distribution
      ◦ valid in finite samples
Split Conformal Prediction (SCP)¹,²,³: toy example

[Figure: toy regression data set, Y against X]

    ¹ Vovk et al. (2005), Algorithmic Learning in a Random World
    ² Papadopoulos et al. (2002), Inductive Confidence Machines for Regression, ECML
    ³ Lei et al. (2018), Distribution-Free Predictive Inference for Regression, JASA
Split Conformal Prediction (SCP)¹,²,³: training step

[Figure: µ̂ fitted on the proper training set, Y against X]
Split Conformal Prediction (SCP)¹,²,³: calibration step

[Figure: calibration points and their residuals around µ̂, Y against X]

  • Predict with µ̂
  • Get the |residuals|, a.k.a. conformity scores
Split Conformal Prediction (SCP)¹,²,³: prediction step

[Figure: predictive band around µ̂ on test inputs, Y against X]

  • Predict with µ̂
SCP: implementation details

  1. Randomly split the training data into a proper training set (size #Tr) and a calibration set (size #Cal)
  2. Get µ̂ by training the algorithm A on the proper training set
  3. On the calibration set, get prediction values with µ̂
  4. Obtain the set of #Cal conformity scores:

         S = {S_i = |µ̂(X_i) − Y_i|, i ∈ Cal}

     (equivalently, the #Cal + 1 scores S ∪ {+∞}, the +∞ accounting for the worst-case scenario)
  5. Compute the (1 − α)(1 + 1/#Cal) empirical quantile of these scores, noted q_{1−α}(S)
  6. For a new point X_{n+1}, return

         Ĉ_α(X_{n+1}) = [µ̂(X_{n+1}) − q_{1−α}(S); µ̂(X_{n+1}) + q_{1−α}(S)]
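These steps translate directly into a short script; here is a minimal sketch (the 50/50 split and the random forest base learner are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def split_conformal(X, y, X_new, alpha=0.1, seed=0):
    """Split Conformal Prediction with absolute-residual conformity scores."""
    # 1. Proper training / calibration split.
    X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=seed)
    # 2. Fit the base regressor mu_hat on the proper training set.
    mu_hat = RandomForestRegressor(random_state=seed).fit(X_tr, y_tr)
    # 3.-4. Conformity scores on the calibration set.
    scores = np.abs(mu_hat.predict(X_cal) - y_cal)
    # 5. (1 - alpha)(1 + 1/#Cal) empirical quantile of the scores.
    level = (1 - alpha) * (1 + 1 / len(scores))
    q = np.inf if level >= 1 else np.quantile(scores, level, method="higher")
    # 6. Symmetric interval around the point prediction.
    pred = mu_hat.predict(X_new)
    return pred - q, pred + q
```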
SCP: theoretical foundation

    Definition (Exchangeability)
    (X_i, Y_i)_{i=1}^n are exchangeable if, for any permutation σ of ⟦1, n⟧:

        L((X_1, Y_1), …, (X_n, Y_n)) = L((X_{σ(1)}, Y_{σ(1)}), …, (X_{σ(n)}, Y_{σ(n)}))

  • Example: the components of a Gaussian vector N((m, …, m)ᵀ, Σ), with σ² on the diagonal of Σ and γ² everywhere off the diagonal, are exchangeable.
SCP: theoretical guarantees

SCP enjoys finite sample guarantees, proved in Vovk et al. (2005); Lei et al. (2018).

    Theorem
    Suppose (X_i, Y_i)_{i=1}^{n+1} are exchangeable⁴. SCP applied on (X_i, Y_i)_{i=1}^n outputs Ĉ_α(·) such that:

        P{Y_{n+1} ∈ Ĉ_α(X_{n+1})} ≥ 1 − α.

    ⁴ Only the calibration and test data need to be exchangeable.
Proof architecture of SCP guarantees
Proof of the quantile lemma
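The proofs were given on the board and are not in this export. For reference, one standard formulation of the quantile lemma behind the guarantee (an assumed reconstruction, not the slide's exact statement):

```latex
% Quantile lemma (standard formulation; assumed, not copied from the slides).
\begin{lemma}
Let $S_1, \dots, S_{n+1}$ be exchangeable random variables, and let
$q_{1-\alpha}$ denote the $\lceil (1-\alpha)(n+1) \rceil$-th smallest value
among $S_1, \dots, S_n, +\infty$. Then
\[
  \mathbb{P}\left( S_{n+1} \le q_{1-\alpha} \right) \ge 1 - \alpha .
\]
\end{lemma}
```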
Conditional coverage implies adaptiveness

  • Marginal coverage: P{Y_{n+1} ∈ Ĉ_α(X_{n+1})} ≥ 1 − α: the errors may differ across regions of the input space (i.e. non-adaptive)
  • Conditional coverage: P{Y_{n+1} ∈ Ĉ_α(X_{n+1}) | X_{n+1}} ≥ 1 − α: errors are evenly distributed (i.e. fully adaptive)
  • Conditional coverage is stronger than marginal coverage

[Figure: three predictive bands on the same data, Y against X, illustrating marginal vs. conditional coverage]
Standard mean-regression SCP is not adaptive

[Figure: constant-width predictive band around µ̂, Y against X]

  • Predict with µ̂
Informative conditional coverage as such is impossible

  • Impossibility results
    ↪ Vovk (2012); Lei and Wasserman (2014); Barber et al. (2021a)
Conformalized Quantile Regression (CQR)⁵

[Figure: toy regression data set, Y against X]

    ⁵ Romano et al. (2019), Conformalized Quantile Regression, NeurIPS
Conformalized Quantile Regression (CQR)⁵: training step

[Figure: Q̂R_lower and Q̂R_upper fitted on the proper training set, Y against X]
Conformalized Quantile Regression (CQR)⁵: calibration step

[Figure: calibration points with signed distances to the fitted quantile band]

  • Predict with Q̂R_lower and Q̂R_upper
  • Get the scores S = {S_i}_{i∈Cal} ∪ {+∞}

        ↪ S_i := max{Q̂R_lower(X_i) − Y_i, Y_i − Q̂R_upper(X_i)}

  • Compute the (1 − α) empirical quantile of S, noted q_{1−α}(S)
Conformalized Quantile Regression (CQR)⁵: prediction step

[Figure: corrected quantile band on test inputs, Y against X]

  • Predict with Q̂R_lower and Q̂R_upper
  • Build

        Ĉ_α(x) = [Q̂R_lower(x) − q_{1−α}(S); Q̂R_upper(x) + q_{1−α}(S)]
CQR: implementation details

  1. Randomly split the training data into a proper training set (size #Tr) and a calibration set (size #Cal)
  2. Get Q̂R_lower and Q̂R_upper by training the algorithm A on the proper training set
  3. Obtain the set of #Cal conformity scores:

         S = {S_i = max{Q̂R_lower(X_i) − Y_i, Y_i − Q̂R_upper(X_i)}, i ∈ Cal}

     (equivalently, the #Cal + 1 scores S ∪ {+∞}, the +∞ accounting for the worst-case scenario)
  4. Compute the (1 − α)(1 + 1/#Cal) empirical quantile of these scores, noted q_{1−α}(S)
  5. For a new point X_{n+1}, return

         Ĉ_α(X_{n+1}) = [Q̂R_lower(X_{n+1}) − q_{1−α}(S); Q̂R_upper(X_{n+1}) + q_{1−α}(S)]
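A sketch mirroring these steps (the gradient-boosted quantile regressors are an illustrative choice; any quantile learner with the same interface would do):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

def cqr(X, y, X_new, alpha=0.1, seed=0):
    """Conformalized Quantile Regression, following the steps above."""
    X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=seed)
    # Quantile regressors at levels alpha/2 and 1 - alpha/2.
    qr_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2,
                                      random_state=seed).fit(X_tr, y_tr)
    qr_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2,
                                      random_state=seed).fit(X_tr, y_tr)
    # CQR scores: positive when Y_i falls outside the estimated band.
    scores = np.maximum(qr_lo.predict(X_cal) - y_cal, y_cal - qr_hi.predict(X_cal))
    # Inflated empirical quantile of the scores.
    level = (1 - alpha) * (1 + 1 / len(scores))
    q = np.inf if level >= 1 else np.quantile(scores, level, method="higher")
    # Band corrected by q on new points.
    return qr_lo.predict(X_new) - q, qr_hi.predict(X_new) + q
```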
CQR: theoretical guarantees

This procedure enjoys the finite sample guarantee proposed and proved in Romano et al. (2019).

    Theorem
    Suppose (X_i, Y_i)_{i=1}^{n+1} are exchangeable⁶. CQR applied on (X_i, Y_i)_{i=1}^n outputs Ĉ_α(·) such that:

        P{Y_{n+1} ∈ Ĉ_α(X_{n+1})} ≥ 1 − α.

    If, in addition, the scores {S_i}_{i∈Cal} ∪ {S_{n+1}} are almost surely distinct, then

        P{Y_{n+1} ∈ Ĉ_α(X_{n+1})} ≤ 1 − α + 1/(#Cal + 1).

    ⁶ Only the calibration and test data need to be exchangeable.
SCP is defined by the conformity score function

  1. Randomly split the training data into a proper training set (size #Tr) and a calibration set (size #Cal)
  2. Get Â by training the algorithm A on the proper training set
  3. On the calibration set, obtain the #Cal + 1 conformity scores

         S = {S_i = s(Â(X_i), Y_i), i ∈ Cal} ∪ {+∞}

     Ex 1: s(Â(X_i), Y_i) := |µ̂(X_i) − Y_i| in regression with standard scores
     Ex 2: s(Â(X_i), Y_i) := max{Q̂R_lower(X_i) − Y_i, Y_i − Q̂R_upper(X_i)} in CQR
  4. Compute the (1 − α) empirical quantile of these scores, noted q_{1−α}(S)
  5. For a new point X_{n+1}, return

         Ĉ_α(X_{n+1}) = {y such that s(Â(X_{n+1}), y) ≤ q_{1−α}(S)}

↪ The definition of the conformity scores is crucial, as they incorporate almost all the information: data + underlying model
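As a complement (not from the slides), this abstraction maps directly to code: between methods, only the score function and the way it is inverted into a set change. A minimal sketch with illustrative names:

```python
import numpy as np

def conformal_quantile(scores, alpha):
    # (1 - alpha)(1 + 1/#Cal) quantile, equivalent to the +infinity augmentation.
    level = (1 - alpha) * (1 + 1 / len(scores))
    return np.inf if level >= 1 else np.quantile(scores, level, method="higher")

# Ex 1: standard regression score and the interval it inverts to.
std_score = lambda mu_pred, y: np.abs(mu_pred - y)
std_set = lambda mu_pred, q: (mu_pred - q, mu_pred + q)

# Ex 2: CQR score, where pred = (lower, upper) quantile predictions.
cqr_score = lambda pred, y: np.maximum(pred[0] - y, y - pred[1])
cqr_set = lambda pred, q: (pred[0] - q, pred[1] + q)
```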
SCP: theoretical guarantees

This procedure enjoys the finite sample guarantee proposed and proved in Vovk et al. (2005).

    Theorem
    Suppose (X_i, Y_i)_{i=1}^{n+1} are exchangeable⁷. SCP applied on (X_i, Y_i)_{i=1}^n outputs Ĉ_α(·) such that:

        P{Y_{n+1} ∈ Ĉ_α(X_{n+1})} ≥ 1 − α.

    If, in addition, the scores {S_i}_{i∈Cal} ∪ {S_{n+1}} are almost surely distinct, then

        P{Y_{n+1} ∈ Ĉ_α(X_{n+1})} ≤ 1 − α + 1/(#Cal + 1).

    ⁷ Only the calibration and test data need to be exchangeable.
SCP: what choices for the regression scores?

        Ĉ_α(X_{n+1}) = {y such that s(Â(X_{n+1}), y) ≤ q_{1−α}(S)}

                | Standard SCP         | Locally weighted SCP       | CQR
                | Vovk et al. (2005)   | Lei et al. (2018)          | Romano et al. (2019)
  --------------|----------------------|----------------------------|------------------------------------------------------
  s(Â(X), Y)    | |µ̂(X) − Y|          | |µ̂(X) − Y| / ρ̂(X)         | max{Q̂R_lower(X) − Y, Y − Q̂R_upper(X)}
  Ĉ_α(x)        | [µ̂(x) ± q_{1−α}(S)] | [µ̂(x) ± q_{1−α}(S) ρ̂(x)]  | [Q̂R_lower(x) − q_{1−α}(S); Q̂R_upper(x) + q_{1−α}(S)]

[Figure: the three resulting predictive bands on the same data, Y against X]
SCP: standard classification case

  • Y ∈ {1, …, C}                                  (C classes)
  • Â(X) = (p̂_1(X), …, p̂_C(X))                   (estimated probabilities)
  • s(Â(X), Y) := 1 − (Â(X))_Y
  • For a new point X_{n+1}, return

        Ĉ_α(X_{n+1}) = {y such that s(Â(X_{n+1}), y) ≤ q_{1−α}(S)}
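A sketch of this score for any classifier exposing estimated probabilities (assuming labels 0, …, C − 1, and `q` computed on calibration scores as before):

```python
import numpy as np

def classif_scores(probs_cal, y_cal):
    # Standard score: 1 - estimated probability of the true class.
    return 1.0 - probs_cal[np.arange(len(y_cal)), y_cal]

def classif_set(probs_new, q):
    # Prediction set for one point: labels y with 1 - p_hat_y <= q.
    return np.where(1.0 - probs_new <= q)[0]
```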
SCP: standard classification in practice
SCP: standard classification in practice, cont’d
SCP: limits of the standard classification case
SCP: classification with Adaptive Prediction Sets⁸

[Figure: estimated probabilities (left) and cumulative estimated probabilities (right) for the classes Cat, Dog, Tiger]

    ⁸ Romano et al. (2020b), Classification with Valid and Adaptive Coverage, NeurIPS
      Figure highly inspired by Angelopoulos and Bates (2023).
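In words, the lost figure contrasts raw estimated probabilities with their cumulative sums: APS ranks classes by decreasing probability and accumulates mass until the true class is reached. A sketch of the resulting score and set, in its non-randomized variant (Romano et al. (2020b) additionally randomize at the boundary):

```python
import numpy as np

def aps_score(probs, y):
    # Total estimated mass of classes ranked at least as likely as the true class.
    order = np.argsort(-probs)           # classes by decreasing probability
    cumsum = np.cumsum(probs[order])
    rank = np.where(order == y)[0][0]    # position of the true class
    return cumsum[rank]

def aps_set(probs, q):
    # Smallest set of top-ranked classes whose cumulative mass reaches q.
    order = np.argsort(-probs)
    k = np.searchsorted(np.cumsum(probs[order]), q) + 1
    return order[:k]
```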
SCP: classification with Adaptive Prediction Sets in practice
Split Conformal Prediction: summary
Challenges: open questions (non-exhaustive!)
Splitting the data might not be desired:

  • lower statistical efficiency (lower model accuracy and larger predictive sets)
  • higher statistical variability
The naive idea does not enjoy valid coverage (even empirically)

  • A naive idea:
      ◦ Get Â by training the algorithm A on {(X_1, Y_1), …, (X_n, Y_n)}.
      ◦ Compute the empirical quantile q_{1−α}(S) of the set of scores

            S = {s(Â(X_i), Y_i)}_{i=1}^n ∪ {+∞}.

      ◦ Output the set {y such that s(Â(X_{n+1}), y) ≤ q_{1−α}(S)}.

  ✗ Â has been obtained using the training set {(X_1, Y_1), …, (X_n, Y_n)}, but did not use X_{n+1}.
    ⇒ s(Â(X_{n+1}), y) stochastically dominates any element of {s(Â(X_i), Y_i)}_{i=1}^n: the scores are not exchangeable.
Full Conformal Prediction⁹ does not discard training points!

    ⁹ Vovk et al. (2005), Algorithmic Learning in a Random World
Full Conformal Prediction (CP): recovering exchangeability
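The construction was presented on the board; a brute-force sketch over a grid of candidate labels (deliberately expensive: one refit per candidate; the ridge learner and grid are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge

def full_cp(X, y, x_new, alpha=0.1, n_grid=200):
    """Full CP: retrain on the augmented sample for each candidate value y."""
    n = len(y)
    kept = []
    for y_cand in np.linspace(y.min(), y.max(), n_grid):
        # Augment the sample with the candidate point (x_new, y_cand).
        X_aug = np.vstack([X, x_new.reshape(1, -1)])
        y_aug = np.append(y, y_cand)
        mu = Ridge().fit(X_aug, y_aug)   # the model now "sees" the test point
        scores = np.abs(mu.predict(X_aug) - y_aug)
        # Keep y_cand if its score is among the ceil((1-alpha)(n+1)) smallest.
        if np.sum(scores <= scores[-1]) <= np.ceil((1 - alpha) * (n + 1)):
            kept.append(y_cand)
    return np.array(kept)
```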
Full CP: theoretical foundation
Full CP: theoretical guarantees

    Theorem
    Full CP applied on (X_i, Y_i)_{i=1}^n ∪ {X_{n+1}} outputs Ĉ_α(·) such that:

        P{Y_{n+1} ∈ Ĉ_α(X_{n+1})} ≥ 1 − α.

    Additionally, if the scores are a.s. distinct:

        P{Y_{n+1} ∈ Ĉ_α(X_{n+1})} ≤ 1 − α + 1/(n + 1).

  ✗ Marginal coverage only: conditional coverage P{Y_{n+1} ∈ Ĉ_α(X_{n+1}) | X_{n+1} = x} ≥ 1 − α is not guaranteed.
Interpolation regime
Jackknife: the naive idea does not enjoy valid coverage

    Warning
    No guarantee on the prediction of Â with scores based on (Â_{−i})_i, without assuming a form of stability on A.
Jackknife+¹⁰
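The slide's content and footnote are missing from this export; the lost footnote presumably cites Barber et al. (2021), who introduced the jackknife+. A leave-one-out sketch of that construction (NumPy arrays; assumes α(n + 1) ≥ 1 so the order statistics below exist):

```python
import numpy as np
from sklearn.linear_model import Ridge

def jackknife_plus(X, y, X_new, alpha=0.1):
    """Jackknife+ intervals from leave-one-out residuals (n model fits)."""
    n = len(y)
    loo_preds = np.empty((n, len(X_new)))
    residuals = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        mu_i = Ridge().fit(X[mask], y[mask])   # model trained without point i
        residuals[i] = abs(y[i] - mu_i.predict(X[i:i + 1])[0])
        loo_preds[i] = mu_i.predict(X_new)
    k = int(np.ceil((1 - alpha) * (n + 1)))
    # Endpoints: order statistics of mu_{-i}(x) -/+ R_i over i = 1..n.
    lower = np.sort(loo_preds - residuals[:, None], axis=0)[n - k]
    upper = np.sort(loo_preds + residuals[:, None], axis=0)[k - 1]
    return lower, upper
```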
Statistical efficiency
Computational efficiency
Beyond exchangeability
Exchangeability does not hold in many practical applications
Covariate shift (Tibshirani et al., 2019)¹²

  • Setting:
      ◦ (X_1, Y_1), …, (X_n, Y_n) i.i.d. ∼ P_X × P_{Y|X}
      ◦ (X_{n+1}, Y_{n+1}) ∼ P̃_X × P_{Y|X}
  • Idea: give more importance to calibration points that are closer in distribution to the test point
  • In practice:
      1. estimate the likelihood ratio w(X_i) = dP̃_X(X_i) / dP_X(X_i)
      2. normalize the weights, i.e. ω_i = ω(X_i) = w(X_i) / Σ_{j=1}^{n+1} w(X_j)
      3. output Ĉ_α(X_{n+1}) = {y : s(Â(X_{n+1}), y) ≤ q_{1−α}(Σ_{i∈Cal} ω_i δ_{S_i} + ω_{n+1} δ_{+∞})},
         where q_{1−α} is now the (1 − α) quantile of the weighted empirical distribution of the scores

    ¹² Tibshirani et al. (2019), Conformal Prediction Under Covariate Shift, NeurIPS
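A sketch of the weighted quantile step (the likelihood-ratio estimates ŵ are assumed given, e.g. from a classifier trained to distinguish calibration from test inputs):

```python
import numpy as np

def weighted_quantile(scores, weights, level):
    """level-quantile of sum_i weights[i] * delta_{scores[i]}; any missing
    mass (weights summing to < 1) is treated as sitting at +infinity."""
    order = np.argsort(scores)
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, level)
    return np.inf if idx >= len(scores) else scores[order][idx]

def covariate_shift_interval(pred, scores, w_cal, w_test, alpha=0.1):
    """Weighted SCP with absolute-residual scores, as in Tibshirani et al. (2019)."""
    total = np.sum(w_cal) + w_test   # the test weight carries the +infinity mass
    q = weighted_quantile(scores, w_cal / total, 1 - alpha)
    return pred - q, pred + q
```

The label shift construction on the next slide reuses the same weighted quantile, with weights w(Y_i) on the calibration scores and a candidate-dependent test weight w(y).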
Label shift (Podkopaev and Ramdas, 2021)¹³

  • Setting:
      ◦ (X_1, Y_1), …, (X_n, Y_n) i.i.d. ∼ P_{X|Y} × P_Y
      ◦ (X_{n+1}, Y_{n+1}) ∼ P_{X|Y} × P̃_Y
      ◦ Classification
  • Idea: give more importance to calibration points that are closer in distribution to the test point
  • Trouble: the actual test labels are unknown
  • In practice:
      1. estimate the likelihood ratio w(Y_i) = dP̃_Y(Y_i) / dP_Y(Y_i) using algorithms from the existing label shift literature
      2. normalize the weights, i.e. ω_i^y = w(Y_i) / (Σ_{j=1}^n w(Y_j) + w(y))
      3. output Ĉ_α(X_{n+1}) = {y : s(Â(X_{n+1}), y) ≤ q_{1−α}(Σ_{i∈Cal} ω_i^y δ_{S_i} + ω_{n+1}^y δ_{+∞})},
         using for each candidate label y the (1 − α) quantile of the weighted empirical distribution of the scores

    ¹³ Podkopaev and Ramdas (2021), Distribution-free uncertainty quantification for classification under label shift, UAI
Generalizations

  • Arbitrary distribution shift: Cauchois et al. (2020) leverage ideas from the distributionally robust optimization literature
  • Two major general theoretical results beyond exchangeability:
      ◦ Chernozhukov et al. (2018)
        ↪ If the learnt model is accurate and the data noise is strongly mixing, then CP is valid asymptotically ✓
      ◦ Barber et al. (2022)
        ↪ Quantifies the coverage loss depending on the strength of the exchangeability violation:

            P(Y_{n+1} ∈ Ĉ_α(X_{n+1})) ≥ 1 − α − (average violation of exchangeability by each calibration point)

        ↪ Proposed algorithm: reweighting again!
          E.g., in a temporal setting, give higher weights to more recent points.
Online setting
Recent developments

References
   Bastani, O., Gupta, V., Jung, C., Noarov, G., Ramalingam, R., and Roth, A.
     (2022). Practical adversarial multivalid conformal prediction. In Advances in
     Neural Information Processing Systems. Curran Associates, Inc.
   Bhatnagar, A., Wang, H., Xiong, C., and Bai, Y. (2023). Improved online
     conformal prediction via strongly adaptive online learning. In Proceedings of the
     40th International Conference on Machine Learning. PMLR.
   Cauchois, M., Gupta, S., Ali, A., and Duchi, J. C. (2020). Robust Validation:
     Confident Predictions Even When Distributions Shift. arXiv: 2008.04267.
Chernozhukov, V., Wüthrich, K., and Zhu, Y. (2018). Exact and Robust
     Conformal Inference Methods for Predictive Machine Learning with Dependent
     Data. In Conference On Learning Theory. PMLR.
   Chernozhukov, V., Wüthrich, K., and Zhu, Y. (2021). Distributional conformal
     prediction. Proceedings of the National Academy of Sciences, 118(48).
   Lei, J. (2019). Fast exact conformalization of the lasso using piecewise linear
     homotopy. Biometrika, 106(4).
   Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., and Wasserman, L. (2018).
     Distribution-Free Predictive Inference for Regression. Journal of the American
     Statistical Association.
   Lei, J. and Wasserman, L. (2014). Distribution-free prediction bands for
     non-parametric regression. Journal of the Royal Statistical Society: Series B
     (Statistical Methodology), 76(1).
   Manokhin, V. (2022). Awesome conformal prediction.
    https://github.com/valeman/awesome-conformal-prediction.
   Ndiaye, E. (2022). Stable conformal prediction sets. In Proceedings of the 39th
     International Conference on Machine Learning. PMLR.
   Ndiaye, E. and Takeuchi, I. (2019). Computing full conformal prediction set with
     approximate homotopy. In Advances in Neural Information Processing Systems.
     Curran Associates, Inc.
   Ndiaye, E. and Takeuchi, I. (2022). Root-finding approaches for computing
     conformal prediction set. Machine Learning, 112(1).
   Nouretdinov, I., Melluish, T., and Vovk, V. (2001). Ridge regression confidence
     machine. In Proceedings of the 18th International Conference on Machine
     Learning.
   Papadopoulos, H., Proedrou, K., Vovk, V., and Gammerman, A. (2002). Inductive
     Confidence Machines for Regression. In Machine Learning: ECML. Springer.
   Podkopaev, A. and Ramdas, A. (2021). Distribution-free uncertainty quantification
     for classification under label shift. In Proceedings of the Thirty-Seventh
     Conference on Uncertainty in Artificial Intelligence. PMLR.
   Romano, Y., Barber, R. F., Sabatti, C., and Candès, E. (2020a). With Malice
     Toward None: Assessing Uncertainty via Equalized Coverage. Harvard Data
     Science Review, 2(2).
   Romano, Y., Patterson, E., and Candès, E. (2019). Conformalized Quantile
     Regression. In Advances in Neural Information Processing Systems. Curran
     Associates, Inc.
Romano, Y., Sesia, M., and Candès, E. (2020b). Classification with valid and
     adaptive coverage. In Advances in Neural Information Processing Systems.
     Curran Associates, Inc.
   Sesia, M. and Romano, Y. (2021). Conformal prediction using conditional
     histograms. In Advances in Neural Information Processing Systems. Curran
     Associates, Inc.
Tibshirani, R. J., Barber, R. F., Candès, E., and Ramdas, A. (2019). Conformal
     Prediction Under Covariate Shift. In Advances in Neural Information Processing
     Systems. Curran Associates, Inc.
   Vovk, V. (2012). Conditional Validity of Inductive Conformal Predictors. In Asian
     Conference on Machine Learning. PMLR.
   Vovk, V. (2015). Cross-conformal predictors. Annals of Mathematics and Artificial
     Intelligence, 74(1-2).
   Vovk, V., Gammerman, A., and Shafer, G. (2005). Algorithmic Learning in a
     Random World. Springer US.
   Zaffran, M., Féron, O., Goude, Y., Josse, J., and Dieuleveut, A. (2022). Adaptive
     conformal predictions for time series. In Proceedings of the 39th International
     Conference on Machine Learning. PMLR.