
Distribution-Free Uncertainty Quantification

A short introduction to Conformal Prediction

Margaux Zaffran
Useful resources on Conformal Prediction (non-exhaustive)

• Book reference: Vovk et al. (2005) (new edition in 2022)


• A gentle tutorial:
◦ Angelopoulos and Bates (2023)
◦ Videos playlist
• Another tutorial: Fontana et al. (2023)
• Ryan Tibshirani's introductory lecture notes
• GitHub repository with plenty of links: Manokhin (2022)

All material is freely accessible on this webpage, including sources.


Feel free to reuse it for presentations or teaching! If you do, please credit this
version, and include a link to this webpage.

This tutorial by Margaux Zaffran (2023) is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
https://conformalpredictionintro.github.io
1 / 55
On the importance of quantifying uncertainty

[Figure: three panels of toy data, Y against X, all with the same point predictions]

↪ Same predictions, yet 3 distinct underlying phenomena!

⇒ Quantifying uncertainty conveys this information.

2 / 55
Quantile Regression

Split Conformal Prediction (SCP)

Avoiding data splitting: full conformal and out-of-bag approaches

Beyond exchangeability
Reminder about quantiles

• Quantile level β ∈ [0, 1]


• QX(β) := inf{x ∈ R, P(X ≤ x) ≥ β} = inf{x ∈ R, FX(x) ≥ β}
• Empirical quantile qβ(X1, . . . , Xn) := the ⌈β × n⌉-th smallest value of (X1, . . . , Xn)

Example of quantile: the median


β = 0.5
↪ q0.5(X1, . . . , Xn) is the empirical median of (X1, . . . , Xn);
↪ QX(0.5) represents the median of the distribution of X.

Similarly, let qβ,inf(X1, . . . , Xn) := the ⌊β × n⌋-th smallest value of (X1, . . . , Xn)
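As a quick illustration (my own sketch, not part of the slides), these two empirical quantiles can be computed directly from their rank definitions:

```python
import numpy as np

def q_beta(values, beta):
    """Empirical quantile q_beta: the ceil(beta * n)-th smallest value."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(np.ceil(beta * len(v)))
    return v[min(max(k, 1), len(v)) - 1]

def q_beta_inf(values, beta):
    """q_{beta,inf}: the floor(beta * n)-th smallest value."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(np.floor(beta * len(v)))
    return v[min(max(k, 1), len(v)) - 1]

print(q_beta([3, 1, 4, 1, 5], 0.5))  # 3.0: the empirical median (3rd smallest of 5 values)
```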


3 / 55
Median regression

• The Bayes predictor depends on the chosen loss function.


↪ Bayes predictor f⋆ ∈ argmin_f Riskℓ(f) := argmin_f E[ℓ(Y, f(X))]
• Mean Absolute Error (MAE): ℓ(Y, Y′) = |Y − Y′|. Associated risk:
Riskℓ(f) = E[|Y − f(X)|]
⇒ f⋆(X) = median[Y|X] = QY|X(0.5)
[Figure: left, the MAE loss MAE(Y, f(X)) = |Y − f(X)| as a function of Y − f(X); right, toy data (Y vs X)]
4 / 55
Generalization: Quantile regression

• Quantile level β ∈ [0, 1]


• Pinball loss
ℓβ(Y, Y′) = β|Y − Y′| 1{Y − Y′ ≥ 0} + (1 − β)|Y − Y′| 1{Y − Y′ ≤ 0}
Associated risk: Riskℓβ(f) = E[ℓβ(Y, f(X))]
Bayes predictor: f⋆ ∈ argmin_f Riskℓβ(f)
⇒ f⋆(X) = QY|X(β)
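A small sketch (mine, not from the slides) of the pinball loss; numerically minimizing its empirical average over a constant prediction q recovers, approximately, the empirical β-quantile.

```python
import numpy as np

def pinball_loss(y, y_pred, beta):
    """Pinball loss: beta * (y - y_pred) if y >= y_pred, (1 - beta) * (y_pred - y) otherwise."""
    diff = y - y_pred
    return np.where(diff >= 0, beta * diff, (beta - 1) * diff)

# Sanity check: the constant minimizing the average pinball loss is close to the beta-quantile.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
grid = np.linspace(-3, 3, 601)
avg_loss = [pinball_loss(y, q, beta=0.9).mean() for q in grid]
print(grid[int(np.argmin(avg_loss))], np.quantile(y, 0.9))  # both close to the 0.9 quantile
```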

[Figure: the pinball loss ℓβ(Y, f(X)) as a function of Y − f(X), for β ∈ {0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95}]
5 / 55
Quantile regression: foundations

• Link between the pinball loss and the quantiles?


Set q⋆ ∈ argmin_q E[ℓβ(Y − q)]. Then,
0 = ∫_{−∞}^{+∞} ℓ′β(y − q⋆) dFY(y)
  = (β − 1) ∫_{−∞}^{q⋆} dFY(y) + β ∫_{q⋆}^{+∞} dFY(y)
0 = (β − 1) FY(q⋆) + β (1 − FY(q⋆))
(1 − β) FY(q⋆) = β (1 − FY(q⋆))
β = FY(q⋆)
⇔ q⋆ = FY⁻¹(β)

6 / 55
Quantile regression: visualisation

[Figure: left, the pinball losses ℓβ(Y, f(X)) for β ∈ {0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95}; right, the corresponding estimated conditional quantiles on toy data (Y vs X)]
Warning
No theoretical guarantee with a finite sample!
P(Y ∈ [Q̂Y|X(β/2); Q̂Y|X(1 − β/2)]) ≠ 1 − β

7 / 55
Quantile Regression

Split Conformal Prediction (SCP)


Standard regression case
Conformalized Quantile Regression (CQR)
Generalization of SCP: going beyond regression

Avoiding data splitting: full conformal and out-of-bag approaches

Beyond exchangeability
Quantifying predictive uncertainty

• (X , Y ) ∈ Rd × R random variables
• n training samples (Xi , Yi )ni=1
• Goal: predict an unseen point Yn+1 at Xn+1 with confidence
• How? Given a miscoverage level α ∈ [0, 1], build a predictive set Cα such that:
P {Yn+1 ∈ Cα (Xn+1 )} ≥ 1 − α, (1)
and Cα should be as small as possible, in order to be informative
For example: with α = 0.1, obtain a 90% coverage interval
• Construction of the predictive intervals should be
◦ agnostic to the model
◦ agnostic to the data distribution
◦ valid in finite samples

8 / 55
Quantile Regression

Split Conformal Prediction (SCP)


Standard regression case
Conformalized Quantile Regression (CQR)
Generalization of SCP: going beyond regression

Avoiding data splitting: full conformal and out-of-bag approaches

Beyond exchangeability
Split Conformal Prediction (SCP)1,2,3 : toy example

[Figure: toy data (Y vs X) split into Train, Cal and Test points]
1
Vovk et al. (2005), Algorithmic Learning in a Random World
2
Papadopoulos et al. (2002), Inductive Confidence Machines for Regression, ECML
3
Lei et al. (2018), Distribution-Free Predictive Inference for Regression, JRSS B
9 / 55
Split Conformal Prediction (SCP)1,2,3 : training step

▶ Learn (or get) µ̂
[Figure: the model µ̂ fitted on the proper training data (Y vs X)]
1
Vovk et al. (2005), Algorithmic Learning in a Random World
2
Papadopoulos et al. (2002), Inductive Confidence Machines for Regression, ECML
3
Lei et al. (2018), Distribution-Free Predictive Inference for Regression, JRSS B
10 / 55
Split Conformal Prediction (SCP)1,2,3 : calibration step

▶ Predict with µ̂
▶ Get the |residuals|, a.k.a. conformity scores
▶ Compute the (1 − α) empirical quantile of S = {|residuals|}Cal ∪ {+∞}, noted q1−α(S)
[Figure: calibration points (Y vs X) with µ̂ and the absolute residuals]
1
Vovk et al. (2005), Algorithmic Learning in a Random World
2
Papadopoulos et al. (2002), Inductive Confidence Machines for Regression, ECML
3
Lei et al. (2018), Distribution-Free Predictive Inference for Regression, JRSS B
11 / 55
Split Conformal Prediction (SCP)1,2,3 : prediction step

▶ Predict with µ̂
▶ Build Ĉα(x): [µ̂(x) ± q1−α(S)]
[Figure: test points with the prediction µ̂(x) and the interval µ̂(x) ± q1−α(S) (Y vs X)]
1
Vovk et al. (2005), Algorithmic Learning in a Random World
2
Papadopoulos et al. (2002), Inductive Confidence Machines for Regression, ECML
3
Lei et al. (2018), Distribution-Free Predictive Inference for Regression, JRSS B
12 / 55
SCP: implementation details


1. Randomly split the training data into a proper training set (size #Tr) and a
calibration set (size #Cal)
2. Get µ̂ by training the algorithm A on the proper training set
3. On the calibration set, get prediction values with µ̂
4. Obtain a set of #Cal + 1 conformity scores:
S = {Si = |µ̂(Xi) − Yi|, i ∈ Cal} ∪ {+∞}
(the +∞ accounts for the worst-case scenario)
5. Compute the (1 − α) quantile of these scores, noted q1−α(S)
6. For a new point Xn+1, return
Ĉα(Xn+1) = [µ̂(Xn+1) − q1−α(S); µ̂(Xn+1) + q1−α(S)]

13 / 55
SCP: implementation details


1. Randomly split the training data into a proper training set (size #Tr) and a
calibration set (size #Cal)
2. Get µ̂ by training the algorithm A on the proper training set
3. On the calibration set, get prediction values with µ̂
4. Obtain a set of #Cal conformity scores:
S = {Si = |µ̂(Xi) − Yi|, i ∈ Cal}
5. Compute the (1 − α)(1 + 1/#Cal) quantile of these scores, noted q1−α(S)
6. For a new point Xn+1, return
Ĉα(Xn+1) = [µ̂(Xn+1) − q1−α(S); µ̂(Xn+1) + q1−α(S)]
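A minimal end-to-end sketch of this recipe (my own illustration, not code from the slides), with a scikit-learn random forest as the black-box µ̂; the model class and the 1/3 calibration split are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def split_conformal(X, y, X_new, alpha=0.1):
    """Split Conformal Prediction with the absolute-residual score (sketch)."""
    # 1. random split into a proper training set and a calibration set
    X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=1/3, random_state=0)
    # 2. fit mu_hat on the proper training set
    mu_hat = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
    # 3.-4. conformity scores on the calibration set
    scores = np.abs(mu_hat.predict(X_cal) - y_cal)
    # 5. the (1 - alpha)(1 + 1/#Cal) empirical quantile, i.e. the
    #    ceil((1 - alpha)(#Cal + 1))-th smallest score (+inf if this rank exceeds #Cal)
    n_cal = len(scores)
    k = int(np.ceil((1 - alpha) * (n_cal + 1)))
    q = np.sort(scores)[k - 1] if k <= n_cal else np.inf
    # 6. constant-width interval around the predictions at the new points
    pred = mu_hat.predict(X_new)
    return pred - q, pred + q
```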

13 / 55
SCP: theoretical foundation

Definition (Exchangeability)
(X1, Y1), . . . , (Xn, Yn) are exchangeable if, for any permutation σ of ⟦1, n⟧:
L((X1, Y1), . . . , (Xn, Yn)) = L((Xσ(1), Yσ(1)), . . . , (Xσ(n), Yσ(n))),

where L designates the joint distribution.

Examples of exchangeable sequences


• i.i.d. samples
• The components of a Gaussian vector N((m, . . . , m)ᵀ, Σ), where Σ has σ² on its diagonal and γ² everywhere off the diagonal

14 / 55
SCP: theoretical guarantees

SCP enjoys finite sample guarantees proved in Vovk et al. (2005); Lei et al. (2018).
Theorem
Suppose (X1, Y1), . . . , (Xn+1, Yn+1) are exchangeable⁴. SCP applied on (X1, Y1), . . . , (Xn, Yn) outputs Ĉα(·) such that:
P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α.

Additionally, if the scores {Si}i∈Cal ∪ {Sn+1} are a.s. distinct:
P{Yn+1 ∈ Ĉα(Xn+1)} ≤ 1 − α + 1/(#Cal + 1).

4
Only the calibration and test data need to be exchangeable.
15 / 55
Proof architecture of SCP guarantees

Lemma (Quantile lemma)


If (U1, . . . , Un, Un+1) are exchangeable, then for any β ∈ ]0, 1[:
P(Un+1 ≤ qβ(U1, . . . , Un, +∞)) ≥ β.

Additionally, if U1, . . . , Un, Un+1 are almost surely distinct, then:
P(Un+1 ≤ qβ(U1, . . . , Un, +∞)) ≤ β + 1/(n + 1).

When (X1, Y1), . . . , (Xn+1, Yn+1) are exchangeable, the scores {Si}i∈Cal ∪ {Sn+1} are exchangeable.
↪ Applying the quantile lemma to the scores concludes the proof.

16 / 55
Proof of the quantile lemma

First note that Un+1 ≤ qβ(U1, . . . , Un, +∞) ⟺ Un+1 ≤ qβ(U1, . . . , Un, Un+1).

Then, by definition of qβ:
Un+1 ≤ qβ(U1, . . . , Un, Un+1) ⟺ rank(Un+1) ≤ ⌈β(n + 1)⌉.

By exchangeability, rank(Un+1) ∼ U{1, . . . , n + 1}. Thus:
P(rank(Un+1) ≤ ⌈β(n + 1)⌉) ≥ ⌈β(n + 1)⌉ / (n + 1) ≥ β.

If U1, . . . , Un, Un+1 are almost surely distinct (without ties):
P(rank(Un+1) ≤ ⌈β(n + 1)⌉) = ⌈β(n + 1)⌉ / (n + 1) ≤ (1 + β(n + 1)) / (n + 1) = β + 1/(n + 1).
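A quick Monte Carlo sanity check of the lemma (my own sketch): with i.i.d., hence exchangeable, scores and no ties, the estimated probability should land between β and β + 1/(n + 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, n_trials = 20, 0.9, 20_000
hits = 0
for _ in range(n_trials):
    u = rng.exponential(size=n + 1)              # U_1, ..., U_n, U_{n+1}
    k = int(np.ceil(beta * (n + 1)))             # rank defining q_beta(U_1, ..., U_n, +inf)
    q = np.sort(np.append(u[:n], np.inf))[k - 1]
    hits += (u[n] <= q)
print(hits / n_trials)  # ≈ 0.905, indeed in [beta, beta + 1/(n + 1)] = [0.90, 0.948]
```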

17 / 55
SCP: theoretical guarantees

SCP enjoys finite sample guarantees proved in Vovk et al. (2005); Lei et al. (2018).
Theorem
Suppose (X1, Y1), . . . , (Xn+1, Yn+1) are exchangeable⁴. SCP applied on (X1, Y1), . . . , (Xn, Yn) outputs Ĉα(·) such that:
P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α.

Additionally, if the scores {Si}i∈Cal ∪ {Sn+1} are a.s. distinct:
P{Yn+1 ∈ Ĉα(Xn+1)} ≤ 1 − α + 1/(#Cal + 1).
✗ Marginal coverage: P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α, without conditioning on Xn+1 = x (no conditional guarantee).

4
Only the calibration and test data need to be exchangeable.
18 / 55
Conditional coverage implies adaptiveness

• Marginal coverage, P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α: the errors may differ across regions of the input space (i.e. non-adaptive)
• Conditional coverage, P{Yn+1 ∈ Ĉα(Xn+1) | Xn+1} ≥ 1 − α: errors are evenly distributed (i.e. fully adaptive)
• Conditional coverage is stronger than marginal coverage

[Figure: three panels (Y vs X) illustrating no coverage, marginal coverage, and conditional coverage]

19 / 55
Standard mean-regression SCP is not adaptive

▶ Predict with µ̂
▶ Build Ĉα(x): [µ̂(x) ± q1−α(S)]
[Figure: the SCP interval µ̂(x) ± q1−α(S), whose width is constant in x (Y vs X)]

20 / 55
Informative conditional coverage as such is impossible

• Impossibility results
↪ Vovk (2012); Lei and Wasserman (2014); Barber et al. (2021a)
Without distribution assumption, in finite sample, a perfectly conditionally valid Ĉα is such that P{mes(Ĉα(x)) = ∞} ≥ 1 − α for any non-atomic x.

• Approximate conditional coverage
↪ Romano et al. (2020a); Guan (2022); Jung et al. (2023); Gibbs et al. (2023)
Target P(Yn+1 ∈ Ĉα(Xn+1) | Xn+1 ∈ R(x)) ≥ 1 − α
• Asymptotic (with the sample size) conditional coverage
↪ Romano et al. (2019); Kivaranovic et al. (2020); Chernozhukov et al. (2021); Sesia and Romano (2021); Izbicki et al. (2022)
Non-exhaustive references.
21 / 55
Quantile Regression

Split Conformal Prediction (SCP)


Standard regression case
Conformalized Quantile Regression (CQR)
Generalization of SCP: going beyond regression

Avoiding data splitting: full conformal and out-of-bag approaches

Beyond exchangeability
Conformalized Quantile Regression (CQR)5

[Figure: toy data (Y vs X) split into Train, Cal and Test points]
5
Romano et al. (2019), Conformalized Quantile Regression, NeurIPS
22 / 55
Conformalized Quantile Regression (CQR)5 : training step

▶ Learn (or get) Q̂R_lower and Q̂R_upper
[Figure: the fitted lower and upper quantile regressions on the proper training data (Y vs X)]

5
Romano et al. (2019), Conformalized Quantile Regression, NeurIPS
23 / 55
Conformalized Quantile Regression (CQR)5 : calibration step

▶ Predict with Q̂R_lower and Q̂R_upper
▶ Get the scores S = {Si}Cal ∪ {+∞}
▶ Compute the (1 − α) empirical quantile of S, noted q1−α(S)
↪ Si := max(Q̂R_lower(Xi) − Yi, Yi − Q̂R_upper(Xi))
[Figure: calibration points with the signs (+/−) of their conformity scores relative to the quantile regression band]

5
Romano et al. (2019), Conformalized Quantile Regression, NeurIPS
24 / 55
Conformalized Quantile Regression (CQR)5 : prediction step

▶ Predict with Q̂R_lower and Q̂R_upper
▶ Build Ĉα(x) = [Q̂R_lower(x) − q1−α(S); Q̂R_upper(x) + q1−α(S)]
[Figure: the CQR interval on test data, i.e. the quantile regression band enlarged by q1−α(S) on each side (Y vs X)]

5
Romano et al. (2019), Conformalized Quantile Regression, NeurIPS
25 / 55
CQR: implementation details


1. Randomly split the training data into a proper training set (size #Tr) and a
calibration set (size #Cal)
2. Get Q̂R_lower and Q̂R_upper by training the algorithm A on the proper training set
3. Obtain a set of #Cal + 1 conformity scores S:
S = {Si = max(Q̂R_lower(Xi) − Yi, Yi − Q̂R_upper(Xi)), i ∈ Cal} ∪ {+∞}
4. Compute the (1 − α) quantile of these scores, noted q1−α(S)
5. For a new point Xn+1, return
Ĉα(Xn+1) = [Q̂R_lower(Xn+1) − q1−α(S); Q̂R_upper(Xn+1) + q1−α(S)]

26 / 55
CQR: implementation details

1. Randomly split the training data into a proper training set (size #Tr) and a
calibration set (size #Cal)
2. Get Q̂R_lower and Q̂R_upper by training the algorithm A on the proper training set
3. Obtain a set of #Cal conformity scores S:
S = {Si = max(Q̂R_lower(Xi) − Yi, Yi − Q̂R_upper(Xi)), i ∈ Cal}
4. Compute the (1 − α)(1 + 1/#Cal) quantile of these scores, noted q1−α(S)
5. For a new point Xn+1, return
Ĉα(Xn+1) = [Q̂R_lower(Xn+1) − q1−α(S); Q̂R_upper(Xn+1) + q1−α(S)]
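A minimal CQR sketch along these lines (my own illustration, not code from the slides); the gradient-boosting quantile learner and the nominal levels α/2 and 1 − α/2 are common choices, not prescribed by the slides.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

def cqr(X, y, X_new, alpha=0.1):
    """Conformalized Quantile Regression (sketch)."""
    # 1. random split into a proper training set and a calibration set
    X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=1/3, random_state=0)
    # 2. fit lower and upper quantile regressions on the proper training set
    qr_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
    qr_up = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)
    # 3. conformity scores on the calibration set
    scores = np.maximum(qr_lo.predict(X_cal) - y_cal, y_cal - qr_up.predict(X_cal))
    # 4. the (1 - alpha)(1 + 1/#Cal) empirical quantile of the scores
    n_cal = len(scores)
    k = int(np.ceil((1 - alpha) * (n_cal + 1)))
    q = np.sort(scores)[k - 1] if k <= n_cal else np.inf
    # 5. enlarge the quantile regression band by q on each side
    return qr_lo.predict(X_new) - q, qr_up.predict(X_new) + q
```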

26 / 55
CQR: theoretical guarantees

This procedure enjoys the finite sample guarantee proposed and proved in Romano
et al. (2019).
Theorem
Suppose (X1, Y1), . . . , (Xn+1, Yn+1) are exchangeable⁶. CQR applied on (X1, Y1), . . . , (Xn, Yn) outputs Ĉα(·) such that:
P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α.

If, in addition, the scores {Si}i∈Cal ∪ {Sn+1} are almost surely distinct, then
P{Yn+1 ∈ Ĉα(Xn+1)} ≤ 1 − α + 1/(#Cal + 1).
Proof: application of the quantile lemma.


✗ Marginal coverage: P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α, without conditioning on Xn+1 = x (no conditional guarantee).

6
Only the calibration and test data need to be exchangeable.
27 / 55
Quantile Regression

Split Conformal Prediction (SCP)


Standard regression case
Conformalized Quantile Regression (CQR)
Generalization of SCP: going beyond regression

Avoiding data splitting: full conformal and out-of-bag approaches

Beyond exchangeability
SCP is defined by the conformity score function


1. Randomly split the training data into a proper training set (size #Tr) and a
calibration set (size #Cal)
2. Get Â by training the algorithm A on the proper training set
3. On the calibration set, obtain #Cal + 1 conformity scores
S = {Si = s(Â(Xi), Yi), i ∈ Cal} ∪ {+∞}
Ex 1: s(Â(Xi), Yi) := |µ̂(Xi) − Yi| in regression with standard scores
Ex 2: s(Â(Xi), Yi) := max(Q̂R_lower(Xi) − Yi, Yi − Q̂R_upper(Xi)) in CQR
4. Compute the (1 − α) quantile of these scores, noted q1−α(S)
5. For a new point Xn+1, return
Ĉα(Xn+1) = {y such that s(Â(Xn+1), y) ≤ q1−α(S)}
↪ The definition of the conformity scores is crucial, as they incorporate almost all the information: data + underlying model
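To make the role of the score function explicit, here is a small generic sketch (mine, not from the slides): the quantile step is shared, and only the score s changes between methods (the two lambdas mirror Ex 1 and Ex 2 above).

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """q_{1-alpha} of S = {calibration scores} U {+inf}: the ceil((1-alpha)(#Cal+1))-th smallest."""
    s = np.sort(np.append(cal_scores, np.inf))
    k = int(np.ceil((1 - alpha) * len(s)))
    return s[k - 1]

def prediction_set(candidate_scores, q):
    """Generic SCP output: keep every candidate y with s(A_hat(X_{n+1}), y) <= q."""
    return np.asarray(candidate_scores) <= q      # boolean mask over the candidates

# Ex 1 (standard regression score): s(A_hat(X), Y) = |mu_hat(X) - Y|
score_standard = lambda mu_hat_x, y: np.abs(mu_hat_x - y)
# Ex 2 (CQR score): s(A_hat(X), Y) = max(QR_lower(X) - Y, Y - QR_upper(X))
score_cqr = lambda qr_lo_x, qr_up_x, y: np.maximum(qr_lo_x - y, y - qr_up_x)
```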
28 / 55
SCP: theoretical guarantees

This procedure enjoys the finite sample guarantee proposed and proved in Vovk
et al. (2005).
Theorem
Suppose (X1, Y1), . . . , (Xn+1, Yn+1) are exchangeable⁷. SCP applied on (X1, Y1), . . . , (Xn, Yn) outputs Ĉα(·) such that:
P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α.

If, in addition, the scores {Si}i∈Cal ∪ {Sn+1} are almost surely distinct, then
P{Yn+1 ∈ Ĉα(Xn+1)} ≤ 1 − α + 1/(#Cal + 1).

Proof: application of the quantile lemma.


✗ Marginal coverage: P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α, without conditioning on Xn+1 = x (no conditional guarantee).

7
Only the calibration and test data need to be exchangeable.
29 / 55
SCP: what choices for the regression scores?
Ĉα(Xn+1) = {y such that s(Â(Xn+1), y) ≤ q1−α(S)}

Standard SCP (Vovk et al., 2005):
  s(Â(X), Y) = |µ̂(X) − Y|
  Ĉα(x) = [µ̂(x) ± q1−α(S)]
  ✓ black-box around a “usable” prediction
  ✗ not adaptive

Locally weighted SCP (Lei et al., 2018):
  s(Â(X), Y) = |µ̂(X) − Y| / ρ̂(X)
  Ĉα(x) = [µ̂(x) ± q1−α(S) ρ̂(x)]
  ✓ black-box around a “usable” prediction
  ✗ limited adaptiveness

CQR (Romano et al., 2019):
  s(Â(X), Y) = max(Q̂R_lower(X) − Y, Y − Q̂R_upper(X))
  Ĉα(x) = [Q̂R_lower(x) − q1−α(S); Q̂R_upper(x) + q1−α(S)]
  ✓ adaptive
  ✗ no black-box around a “usable” prediction

[Figure: the three resulting prediction bands on toy data (Y vs X)]
30 / 55
SCP: standard classification

• Y ∈ {1, . . . , C } (C classes)
• Â(X ) = (p̂1 (X ), . . . , p̂C (X )) (estimated probabilities)
• s(Â(X), Y) := 1 − (Â(X))Y = 1 − p̂Y(X)
• For a new point Xn+1, return
Ĉα(Xn+1) = {y such that s(Â(Xn+1), y) ≤ q1−α(S)}

31 / 55
SCP: standard classification in practice

Ex: Yi ∈ {“dog”, “tiger”, “cat”}, with α = 0.1

• Scores on the calibration set


Cali “dog” “dog” “dog” “tiger” “tiger” “tiger” “tiger” “cat” “cat” “cat”
p̂dog (Xi ) 0.95 0.90 0.85 0.15 0.15 0.20 0.15 0.15 0.25 0.20
p̂tiger (Xi ) 0.02 0.05 0.10 0.60 0.55 0.50 0.45 0.40 0.35 0.45
p̂cat (Xi ) 0.03 0.05 0.05 0.25 0.30 0.30 0.40 0.45 0.40 0.35
Si 0.05 0.1 0.15 0.40 0.45 0.50 0.55 0.55 0.6 0.65

• q1−α (S) = 0.65


• Â(Xn+1 ) = (0.05, 0.60, 0.35)
↪ s(Â(Xn+1), “dog”) = 0.95 > q1−α(S), so “dog” ∉ Ĉα(Xn+1)
↪ s(Â(Xn+1), “tiger”) = 0.40 ≤ q1−α(S), so “tiger” ∈ Ĉα(Xn+1)
↪ s(Â(Xn+1), “cat”) = 0.65 ≤ q1−α(S), so “cat” ∈ Ĉα(Xn+1)
• Ĉα(Xn+1) = {“tiger”, “cat”}
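A few lines (my own sketch) reproducing this toy computation with the score s(Â(X), Y) = 1 − p̂Y(X):

```python
import numpy as np

# Estimated probability of the TRUE class for each of the 10 calibration points (table above)
p_true = np.array([0.95, 0.90, 0.85, 0.60, 0.55, 0.50, 0.45, 0.45, 0.40, 0.35])
cal_scores = 1 - p_true                         # S_i = 1 - p_hat_{Y_i}(X_i)

alpha = 0.1
s = np.sort(np.append(cal_scores, np.inf))      # augment with +inf (worst case)
q = s[int(np.ceil((1 - alpha) * len(s))) - 1]
print(q)                                        # 0.65

classes = ["dog", "tiger", "cat"]
new_scores = 1 - np.array([0.05, 0.60, 0.35])   # scores of each candidate label at X_{n+1}
print([c for c, sc in zip(classes, new_scores) if sc <= q])  # ['tiger', 'cat']
```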

32 / 55
SCP: standard classification in practice, cont’d

Ex: Y ∈ {“dog”, “tiger”, “cat”}, with α = 0.1

• Scores on the calibration set


Cali “dog” “dog” “dog” “tiger” “tiger” “tiger” “tiger” “cat” “cat” “cat”
p̂dog (Xi ) 0.95 0.90 0.85 0.05 0.05 0.05 0.05 0.10 0.10 0.15
p̂tiger (Xi ) 0.02 0.05 0.10 0.85 0.80 0.75 0.70 0.25 0.30 0.30
p̂cat (Xi ) 0.03 0.05 0.05 0.10 0.15 0.20 0.25 0.65 0.60 0.55
Si 0.05 0.1 0.15 0.15 0.20 0.25 0.30 0.35 0.40 0.45
• q1−α (S) = 0.45
• Â(Xn+1 ) = (0.05, 0.60, 0.35)
↪ s(Â(Xn+1), “dog”) = 0.95 > q1−α(S), so “dog” ∉ Ĉα(Xn+1)
↪ s(Â(Xn+1), “tiger”) = 0.40 ≤ q1−α(S), so “tiger” ∈ Ĉα(Xn+1)
↪ s(Â(Xn+1), “cat”) = 0.65 > q1−α(S), so “cat” ∉ Ĉα(Xn+1)
• Ĉα(Xn+1) = {“tiger”}

33 / 55
SCP: limits of the standard classification case

The standard classification conformity score function leads to:

✓ smallest prediction sets on average
✗ undercovering hard subgroups (and overcovering easy ones)
(similar to the standard mean regression case!)

⇒ Other score functions can be built to improve adaptiveness


(as in regression with localized scores)

34 / 55
SCP: classification with Adaptive Prediction Sets8

1. Sort the estimated probabilities in decreasing order: p̂σ(1)(X) ≥ . . . ≥ p̂σ(C)(X)
2. s(Â(X), Y) := Σ_{k=1}^{σ⁻¹(Y)} p̂σ(k)(X) (sum of the estimated probabilities associated to classes at least as large as that of the true class Y)
3. Return the set of classes {σn+1(1), . . . , σn+1(r⋆)}, where
r⋆ = argmax_{1 ≤ r ≤ C} { Σ_{k=1}^{r} p̂σn+1(k)(Xn+1) < q1−α(S) } + 1
[Figure: bar plots of the estimated probabilities and of the cumulative estimated probabilities for the classes Dog, Tiger, Cat]
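A sketch of this Adaptive Prediction Sets rule (my own illustration, with integer-indexed classes; the ties/randomization refinements of the original paper are omitted).

```python
import numpy as np

def aps_score(p_hat, y):
    """APS conformity score: cumulative estimated probability down to (and including) the true class y."""
    order = np.argsort(-p_hat)                       # classes sorted by decreasing probability
    rank_of_y = int(np.where(order == y)[0][0])      # position of the true class in that ranking
    return float(np.sum(p_hat[order[:rank_of_y + 1]]))

def aps_set(p_hat_new, q):
    """Return sigma_{n+1}(1), ..., sigma_{n+1}(r*): the smallest prefix of the ranking
    whose cumulative probability reaches q."""
    order = np.argsort(-p_hat_new)
    cumsum = np.cumsum(p_hat_new[order])
    r_star = int(np.searchsorted(cumsum, q, side="left")) + 1
    return order[:min(r_star, len(order))]

# Toy check against the next slide: q = 0.95, A_hat(X_{n+1}) = (0.03, 0.95, 0.02) -> r* = 1
print(aps_set(np.array([0.03, 0.95, 0.02]), 0.95))   # [1], i.e. {"tiger"}
```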
8
Romano et al. (2020b), Classification with Valid and Adaptive Coverage, NeurIPS
Figure highly inspired by Angelopoulos and Bates (2023). 35 / 55
SCP: classification with Adaptive Prediction Sets in practice

Ex: Y ∈ {“dog”, “tiger”, “cat”}, with α = 0.1

• Scores on the calibration set


Cali “dog” “dog” “dog” “tiger” “tiger” “tiger” “tiger” “cat” “cat” “cat”
p̂dog (Xi ) 0.95 0.90 0.85 0.05 0.05 0.05 0.10 0.25 0.10 0.15
p̂tiger (Xi ) 0.02 0.05 0.10 0.85 0.80 0.75 0.75 0.40 0.30 0.30
p̂cat (Xi ) 0.03 0.05 0.05 0.10 0.15 0.20 0.15 0.35 0.60 0.55
Si 0.95 0.90 0.85 0.85 0.80 0.75 0.75 0.75 0.60 0.55
• q1−α (S) = 0.95
↪ Ex 1: Â(Xn+1) = (0.05, 0.45, 0.5), r⋆ = 2
Ĉα(Xn+1) = {“tiger”, “cat”}
↪ Ex 2: Â(Xn+1) = (0.03, 0.95, 0.02), r⋆ = 1
Ĉα(Xn+1) = {“tiger”}

36 / 55
Split Conformal Prediction: summary

• Simple procedure which quantifies the uncertainty of any predictive model Â


by returning predictive regions
• Finite-sample guarantees
• Distribution-free as long as the data are exchangeable (and so are the scores)
• Marginal theoretical guarantee over the joint (X, Y) distribution, and not conditional, i.e., no guarantee that for any x:
P{Yn+1 ∈ Ĉα(Xn+1) | Xn+1 = x} ≥ 1 − α.
↪ Marginal also over the whole calibration set and the test point!

37 / 55
Challenges: open questions (non-exhaustive!)

• Conditional coverage (∼ Previous Section)


• Computational cost vs statistical power (Next Section)
• Exchangeability (Last Section)

38 / 55
Quantile Regression

Split Conformal Prediction (SCP)

Avoiding data splitting: full conformal and out-of-bag approaches


Full Conformal Prediction
Jackknife+

Beyond exchangeability
Quantile Regression

Split Conformal Prediction (SCP)

Avoiding data splitting: full conformal and out-of-bag approaches


Full Conformal Prediction
Jackknife+

Beyond exchangeability
Splitting the data might not be desired

SCP suffers from data splitting:

• lower statistical efficiency (lower model accuracy and higher predictive set size)
• higher statistical variability

Can we avoid splitting the data set?

39 / 55
The naive idea does not enjoy valid coverage (even empirically)

• A naive idea:
◦ Get Â by training the algorithm A on {(X1, Y1), . . . , (Xn, Yn)}.
◦ Compute the empirical quantile q1−α(S) of the set of scores S = {s(Â(Xi), Yi)}_{i=1}^{n} ∪ {+∞}.
◦ Output the set {y such that s(Â(Xn+1), y) ≤ q1−α(S)}.

✗ Â has been obtained using the training set {(X1, Y1), . . . , (Xn, Yn)} but did not use Xn+1.
⇒ s(Â(Xn+1), Yn+1) stochastically dominates any element of {s(Â(Xi), Yi)}_{i=1}^{n}.

40 / 55
Full Conformal Prediction9 does not discard training points!

• Full (or transductive) Conformal Prediction


◦ avoids data splitting
◦ at the cost of many more model fits
• Idea: the most probable labels Yn+1 live in Y, and have a low enough conformity
score. By looping over all possible y ∈ Y, the ones leading to the smallest
conformity scores will be found.

9
Vovk et al. (2005), Algorithmic Learning in a Random World
41 / 55
Full Conformal Prediction (CP): recovering exchangeability

For any candidate (Xn+1, y):

1. Get Ây by training A on {(X1, Y1), . . . , (Xn, Yn)} ∪ {(Xn+1, y)}
2. Obtain a set of training scores
Sy^(train) = {s(Ây(Xi), Yi)}_{i=1}^{n} ∪ {s(Ây(Xn+1), y)}
and compute their (1 − α) empirical quantile q1−α(Sy^(train))

Output the set {y such that s(Ây(Xn+1), y) ≤ q1−α(Sy^(train))}.

✓ Test point treated in the same way as the training points
✗ Computationally costly
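A sketch of full conformal prediction for regression (my own illustration): the candidate labels y are taken on a finite grid, which is an approximation, and the ridge learner is an arbitrary choice; exact and accelerated variants are referenced later.

```python
import numpy as np
from sklearn.linear_model import Ridge

def full_conformal(X, y, x_new, alpha=0.1, grid_size=200):
    """Full (transductive) CP with the absolute-residual score, over a grid of candidate labels."""
    y_grid = np.linspace(y.min() - 3 * y.std(), y.max() + 3 * y.std(), grid_size)
    X_aug = np.vstack([X, x_new.reshape(1, -1)])
    kept = []
    for y_cand in y_grid:
        y_aug = np.append(y, y_cand)
        model = Ridge().fit(X_aug, y_aug)              # refit on the augmented data set
        scores = np.abs(model.predict(X_aug) - y_aug)  # n + 1 scores, candidate point included
        k = int(np.ceil((1 - alpha) * len(scores)))    # rank of the (1 - alpha) empirical quantile
        if scores[-1] <= np.sort(scores)[k - 1]:       # keep y_cand if its score is small enough
            kept.append(y_cand)
    return np.array(kept)                              # retained candidates (often an interval)
```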

42 / 55
Full CP: theoretical foundation

Definition (Symmetrical algorithm)


A deterministic algorithm A : (U1, . . . , Un) ↦ Â is symmetric if for any permutation σ of ⟦1, n⟧:
A(U1, . . . , Un) = A(Uσ(1), . . . , Uσ(n)) almost surely.

43 / 55
Full CP: theoretical guarantees

Full CP enjoys finite sample guarantees proved in Vovk et al. (2005).


Theorem
Suppose that

(i) (X1, Y1), . . . , (Xn+1, Yn+1) are exchangeable,
(ii) the algorithm A is symmetric.

Full CP applied on {(X1, Y1), . . . , (Xn, Yn)} ∪ {Xn+1} outputs Ĉα(·) such that:
P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α.

Additionally, if the scores are a.s. distinct:
P{Yn+1 ∈ Ĉα(Xn+1)} ≤ 1 − α + 1/(n + 1).

✗ Marginal coverage: P{Yn+1 ∈ Ĉα(Xn+1)} ≥ 1 − α, without conditioning on Xn+1 = x (no conditional guarantee).

44 / 55
Interpolation regime

FCP sets with an interpolating algorithm


Assume A interpolates:

• Â = A((x1, y1), . . . , (xn+1, yn+1))
• Â(xk) − yk = 0 for any k ∈ ⟦1, n + 1⟧

⇒ Full Conformal Prediction (with standard score functions) outputs Y (the whole label space) for any new test point!

45 / 55
Quantile Regression

Split Conformal Prediction (SCP)

Avoiding data splitting: full conformal and out-of-bag approaches


Full Conformal Prediction
Jackknife+

Beyond exchangeability
Jackknife: the naive idea does not enjoy valid coverage

• Based on leave-one-out (LOO) residuals

• Dn = {(X1 , Y1 ), . . . , (Xn , Yn )} training data


• Get Â−i by training A on Dn \ {(Xi, Yi)}
• LOO scores S = {|Â−i(Xi) − Yi|}i ∪ {+∞} (in standard mean regression)
• Get Â by training A on Dn
• Build the predictive interval: [Â(Xn+1) ± q1−α(S)]

Warning
No guarantee on the prediction of Â with scores based on (Â−i)i, without assuming a form of stability on A.
46 / 55
Jackknife+10

• Based on leave-one-out (LOO) residuals

• Dn = {(X1 , Y1 ), . . . , (Xn , Yn )} training data


• Get Â−i by training A on Dn \ {(Xi, Yi)}
• LOO predictions / predictive intervals:
Sup = {Â−i(Xn+1) + |Â−i(Xi) − Yi|}i ∪ {+∞},  Sdown = {Â−i(Xn+1) − |Â−i(Xi) − Yi|}i ∪ {−∞}
(in standard mean regression)
• Build the predictive interval: [qα,inf(Sdown); q1−α(Sup)]
Theorem
If Dn ∪ {(Xn+1, Yn+1)} are exchangeable and A is symmetric: P(Yn+1 ∈ Ĉα(Xn+1)) ≥ 1 − 2α.
10
Barber et al. (2021b), Predictive Inference with the jackknife+, The Annals of Statistics
Recall qβ,inf(X1, . . . , Xn) := the ⌊β × n⌋-th smallest value of (X1, . . . , Xn)
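A sketch of jackknife+ for mean regression (mine, not from the slides); it requires n leave-one-out refits, so it is only practical for cheap learners (here a ridge regression, an arbitrary choice).

```python
import numpy as np
from sklearn.linear_model import Ridge

def jackknife_plus(X, y, X_new, alpha=0.1):
    """Jackknife+ intervals with the absolute-residual score (sketch)."""
    n, m = len(y), len(X_new)
    up, down = [], []
    for i in range(n):
        mask = np.arange(n) != i
        model = Ridge().fit(X[mask], y[mask])          # leave-one-out fit A_{-i}
        resid = abs(model.predict(X[i:i + 1])[0] - y[i])
        pred = model.predict(X_new)
        up.append(pred + resid)
        down.append(pred - resid)
    up = np.vstack(up + [np.full(m, np.inf)])          # S_up, augmented with +inf
    down = np.vstack(down + [np.full(m, -np.inf)])     # S_down, augmented with -inf
    k_up = int(np.ceil((1 - alpha) * (n + 1)))         # rank for q_{1-alpha}(S_up)
    k_down = max(int(np.floor(alpha * (n + 1))), 1)    # rank for q_{alpha,inf}(S_down)
    return np.sort(down, axis=0)[k_down - 1], np.sort(up, axis=0)[k_up - 1]
```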
47 / 55
CV+11 (see also cross-conformal predictors: Vovk, 2015)

• Based on cross-validation residuals


• Dn = {(X1 , Y1 ), . . . , (Xn , Yn )} training data
• Split Dn into K folds F1 , . . . , FK
• Get Â−Fk by training A on Dn \ Fk
• Cross-val predictions / predictive intervals:
Sup = {Â−Fk(Xn+1) + |Â−Fk(Xi) − Yi|, i ∈ Fk, k ∈ ⟦1, K⟧} ∪ {+∞},
Sdown = {Â−Fk(Xn+1) − |Â−Fk(Xi) − Yi|, i ∈ Fk, k ∈ ⟦1, K⟧} ∪ {−∞}
(in standard mean regression)
• Build the predictive interval: [qα,inf(Sdown); q1−α(Sup)]
Theorem
If Dn ∪ {(Xn+1, Yn+1)} are exchangeable and A is symmetric:
P(Yn+1 ∈ Ĉα(Xn+1)) ≥ 1 − 2α − min( 2(1 − 1/K)/(n/K + 1), (1 − K/n)/(K + 1) ) ≥ 1 − 2α − √(2/n).
11
Barber et al. (2021b), Predictive Inference with the jackknife+, The Annals of Statistics
Recall qβ,inf(X1, . . . , Xn) := the ⌊β × n⌋-th smallest value of (X1, . . . , Xn)
48 / 55
General overview

[Diagram: SCP, CV+, Jackknife+ and FCP placed along a statistical efficiency vs computational efficiency trade-off]
Nested Conformal Prediction

• Generalized framework encapsulating out-of-sample methods: Nested CP


(Gupta et al., 2022)
• Accelerating FCP: Nouretdinov et al. (2001); Lei (2019); Ndiaye and Takeuchi
(2019); Cherubin et al. (2021); Ndiaye and Takeuchi (2022); Ndiaye (2022)
Non-exhaustive references.
49 / 55
Quantile Regression

Split Conformal Prediction (SCP)

Avoiding data splitting: full conformal and out-of-bag approaches

Beyond exchangeability
Exchangeability does not hold in many practical applications

• CP requires exchangeable data points to ensure validity


✗ Covariate shift, i.e. LX changes but LY|X stays constant
✗ Label shift, i.e. LY changes but LX|Y stays constant
✗ Arbitrary distribution shift
✗ Possibly many shifts, not only one

50 / 55
Covariate shift (Tibshirani et al., 2019)12

• Setting:
◦ (X1, Y1), . . . , (Xn, Yn) i.i.d. ∼ PX × PY|X
◦ (Xn+1, Yn+1) ∼ P̃X × PY|X
• Idea: give more importance to calibration points that are closer in distribution to the test point
• In practice:
1. estimate the likelihood ratio w(Xi) = dP̃X(Xi) / dPX(Xi)
2. normalize the weights, i.e. ωi = ω(Xi) = w(Xi) / Σ_{j=1}^{n+1} w(Xj)
3. output Ĉα(Xn+1) = {y : s(Â(Xn+1), y) ≤ q1−α(Σ_{i∈Cal} ωi δSi + ωn+1 δ+∞)},
i.e. the quantile of the weighted empirical distribution of the calibration scores
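A sketch of the resulting weighted quantile step (my own illustration); it assumes the likelihood-ratio values w_cal (calibration points) and w_new (test point) have already been estimated.

```python
import numpy as np

def weighted_conformal_quantile(cal_scores, w_cal, w_new, alpha=0.1):
    """(1 - alpha) quantile of the weighted distribution
    sum_i omega_i * delta_{S_i} + omega_{n+1} * delta_{+inf}."""
    scores = np.append(cal_scores, np.inf)
    weights = np.append(w_cal, w_new)
    weights = weights / weights.sum()                        # normalized omega_i
    order = np.argsort(scores)
    cum = np.cumsum(weights[order])
    idx = int(np.searchsorted(cum, 1 - alpha, side="left"))  # first score reaching mass 1 - alpha
    return scores[order][idx]

# The prediction set is then {y : s(A_hat(X_new), y) <= weighted_conformal_quantile(...)}.
```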

12
Tibshirani et al. (2019), Conformal Prediction Under Covariate Shift, NeurIPS
51 / 55
Label shift (Podkopaev and Ramdas, 2021)13
• Setting:
◦ (X1, Y1), . . . , (Xn, Yn) i.i.d. ∼ PX|Y × PY
◦ (Xn+1, Yn+1) ∼ PX|Y × P̃Y
◦ Classification
• Idea: give more importance to calibration points that are closer in distribution to the test point
• Trouble: the actual test labels are unknown
• In practice:
1. estimate the likelihood ratio w(Yi) = dP̃Y(Yi) / dPY(Yi) using algorithms from the existing label shift literature
2. normalize the weights, i.e. ωi^y = ω^y(Xi) = w(Yi) / (Σ_{j=1}^{n} w(Yj) + w(y))
3. output Ĉα(Xn+1) = {y : s(Â(Xn+1), y) ≤ q1−α(Σ_{i∈Cal} ωi^y δSi + ωn+1^y δ+∞)}
13
Podkopaev and Ramdas (2021), Distribution-free uncertainty quantification for classification under label
shift, UAI 52 / 55
Generalizations

• Arbitrary distribution shift: Cauchois et al. (2020) leverages ideas from the
distributionally robust optimization literature
• Two major general theoretical results beyond exchangeability:
◦ Chernozhukov et al. (2018)
↪ If the learnt model is accurate and the data noise is strongly mixing, then CP is valid asymptotically ✓
◦ Barber et al. (2022)
↪ Quantifies the coverage loss depending on the strength of the exchangeability violation:
P(Yn+1 ∈ Ĉα(Xn+1)) ≥ 1 − α − (average violation of exchangeability by each calibration point)
↪ Proposed algorithm: reweighting again!
e.g., in a temporal setting, give higher weights to more recent points.

53 / 55
Online setting

• Data: T0 random variables (X1 , Y1 ), . . . , (XT0 , YT0 ) in Rd × R


• Aim: predict the response values as well as predictive intervals for T1 subsequent observations XT0+1, . . . , XT0+T1 sequentially: at any prediction step t ∈ ⟦T0 + 1, T0 + T1⟧, Yt−T0, . . . , Yt−1 have been revealed
• Build the smallest interval Ĉα^t such that:
P{Yt ∈ Ĉα^t(Xt)} ≥ 1 − α, for t ∈ ⟦T0 + 1, T0 + T1⟧,

often simplified into:
(1/T1) Σ_{t=T0+1}^{T0+T1} 1{Yt ∈ Ĉα^t(Xt)} ≈ 1 − α.

54 / 55
Recent developments

• Consider splitting strategies that respect the temporal structure


• Gibbs and Candès (2021) propose a method which reacts faster to temporal evolution
◦ Idea: track the previous coverages of the predictive intervals (1{Yt ∈ Ĉα(Xt)})
◦ Tool: update the empirical quantile level with a learning rate γ
◦ Asymptotic guarantee (on average) for any distribution (even adversarial)
• Zaffran et al. (2022) studies the influence of this learning rate γ and proposes, along with Gibbs and Candès (2022), a method that does not require choosing γ
• Bhatnagar et al. (2023) enjoys an anytime regret bound, by leveraging tools from the strongly adaptive regret minimization literature
• Bastani et al. (2022) proposes an algorithm achieving stronger coverage guar-
antees (conditional on specified overlapping subsets, and threshold calibrated)
without hold-out set
• Angelopoulos et al. (2023) combines CP ideas with control theory ones, to
adaptively improve the predictive intervals depending on the errors structure
Non-exhaustive references. 55 / 55
References i

Angelopoulos, A. N. and Bates, S. (2023). Conformal prediction: A gentle


introduction. Foundations and Trends® in Machine Learning, 16(4).
Angelopoulos, A. N., Candès, E. J., and Tibshirani, R. J. (2023). Conformal pid
control for time series prediction. arXiv: 2307.16895.
Barber, R. F., Candès, E. J., Ramdas, A., and Tibshirani, R. J. (2021a). The limits
of distribution-free conditional predictive inference. Information and Inference: A
Journal of the IMA, 10(2).
Barber, R. F., Candès, E. J., Ramdas, A., and Tibshirani, R. J. (2021b). Predictive
inference with the jackknife+. The Annals of Statistics, 49(1).
Barber, R. F., Candès, E. J., Ramdas, A., and Tibshirani, R. J. (2022). Conformal
prediction beyond exchangeability. To appear in Annals of Statistics (2023).
References ii

Bastani, O., Gupta, V., Jung, C., Noarov, G., Ramalingam, R., and Roth, A.
(2022). Practical adversarial multivalid conformal prediction. In Advances in
Neural Information Processing Systems. Curran Associates, Inc.
Bhatnagar, A., Wang, H., Xiong, C., and Bai, Y. (2023). Improved online
conformal prediction via strongly adaptive online learning. In Proceedings of the
40th International Conference on Machine Learning. PMLR.
Cauchois, M., Gupta, S., Ali, A., and Duchi, J. C. (2020). Robust Validation:
Confident Predictions Even When Distributions Shift. arXiv: 2008.04267.
Chernozhukov, V., Wüthrich, K., and Yinchu, Z. (2018). Exact and Robust
Conformal Inference Methods for Predictive Machine Learning with Dependent
Data. In Conference On Learning Theory. PMLR.
Chernozhukov, V., Wüthrich, K., and Zhu, Y. (2021). Distributional conformal
prediction. Proceedings of the National Academy of Sciences, 118(48).
References iii

Cherubin, G., Chatzikokolakis, K., and Jaggi, M. (2021). Exact optimization of


conformal predictors via incremental and decremental learning. In Proceedings
of the 38th International Conference on Machine Learning. PMLR.
Fontana, M., Zeni, G., and Vantini, S. (2023). Conformal prediction: A unified
review of theory and new challenges. Bernoulli, 29(1).
Gibbs, I. and Candès, E. (2021). Adaptive conformal inference under distribution
shift. In Advances in Neural Information Processing Systems. Curran Associates,
Inc.
Gibbs, I. and Candès, E. (2022). Conformal inference for online prediction with
arbitrary distribution shifts. arXiv: 2208.08401.
Gibbs, I., Cherian, J. J., and Candès, E. J. (2023). Conformal prediction with
conditional guarantees. arXiv: 2305.12616.
References iv

Guan, L. (2022). Localized conformal prediction: a generalized inference


framework for conformal prediction. Biometrika, 110(1).
Gupta, C., Kuchibhotla, A. K., and Ramdas, A. (2022). Nested conformal
prediction and quantile out-of-bag ensemble methods. Pattern Recognition, 127.
Izbicki, R., Shimizu, G., and Stern, R. B. (2022). CD-split and HPD-split: Efficient
conformal regions in high dimensions. Journal of Machine Learning Research,
23(87).
Jung, C., Noarov, G., Ramalingam, R., and Roth, A. (2023). Batch multivalid
conformal prediction. In International Conference on Learning Representations.
Kivaranovic, D., Johnson, K. D., and Leeb, H. (2020). Adaptive, Distribution-Free
Prediction Intervals for Deep Networks. In International Conference on Artificial
Intelligence and Statistics. PMLR.
References v

Lei, J. (2019). Fast exact conformalization of the lasso using piecewise linear
homotopy. Biometrika, 106(4).
Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., and Wasserman, L. (2018).
Distribution-Free Predictive Inference for Regression. Journal of the American
Statistical Association.
Lei, J. and Wasserman, L. (2014). Distribution-free prediction bands for
non-parametric regression. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 76(1).
Manokhin, V. (2022). Awesome conformal prediction.
https://github.com/valeman/awesome-conformal-prediction.
Ndiaye, E. (2022). Stable conformal prediction sets. In Proceedings of the 39th
International Conference on Machine Learning. PMLR.
References vi

Ndiaye, E. and Takeuchi, I. (2019). Computing full conformal prediction set with
approximate homotopy. In Advances in Neural Information Processing Systems.
Curran Associates, Inc.
Ndiaye, E. and Takeuchi, I. (2022). Root-finding approaches for computing
conformal prediction set. Machine Learning, 112(1).
Nouretdinov, I., Melluish, T., and Vovk, V. (2001). Ridge regression confidence
machine. In Proceedings of the 18th International Conference on Machine
Learning.
Papadopoulos, H., Proedrou, K., Vovk, V., and Gammerman, A. (2002). Inductive
Confidence Machines for Regression. In Machine Learning: ECML. Springer.
Podkopaev, A. and Ramdas, A. (2021). Distribution-free uncertainty quantification
for classification under label shift. In Proceedings of the Thirty-Seventh
Conference on Uncertainty in Artificial Intelligence. PMLR.
References vii

Romano, Y., Barber, R. F., Sabatti, C., and Candès, E. (2020a). With Malice
Toward None: Assessing Uncertainty via Equalized Coverage. Harvard Data
Science Review, 2(2).
Romano, Y., Patterson, E., and Candès, E. (2019). Conformalized Quantile
Regression. In Advances in Neural Information Processing Systems. Curran
Associates, Inc.
Romano, Y., Sesia, M., and Candes, E. (2020b). Classification with valid and
adaptive coverage. In Advances in Neural Information Processing Systems.
Curran Associates, Inc.
Sesia, M. and Romano, Y. (2021). Conformal prediction using conditional
histograms. In Advances in Neural Information Processing Systems. Curran
Associates, Inc.
References viii

Tibshirani, R. J., Barber, R. F., Candes, E., and Ramdas, A. (2019). Conformal
Prediction Under Covariate Shift. In Advances in Neural Information Processing
Systems. Curran Associates, Inc.
Vovk, V. (2012). Conditional Validity of Inductive Conformal Predictors. In Asian
Conference on Machine Learning. PMLR.
Vovk, V. (2015). Cross-conformal predictors. Annals of Mathematics and Artificial
Intelligence, 74(1-2).
Vovk, V., Gammerman, A., and Shafer, G. (2005). Algorithmic Learning in a
Random World. Springer US.
Zaffran, M., Féron, O., Goude, Y., Josse, J., and Dieuleveut, A. (2022). Adaptive
conformal predictions for time series. In Proceedings of the 39th International
Conference on Machine Learning. PMLR.
References ix

This tutorial by Margaux Zaffran (2023) is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
https://conformalpredictionintro.github.io
