Lecture 7
Model Selection
Semester 1 AY 2024/2025
Model Selection
• We touched on this topic briefly before.
• How should we pick an order for, say, an AR(p) model for GDP?
• There are sets of tools and methods that are widely used, but there is no universally agreed-upon methodology.
• Which method is superior depends on the set of models considered, the application at hand, the sample size, and the true model.
Model Selection Tradeoffs
• The fundamental tradeoff in model selection: estimation error vs. model misspecification.
• More variables = more parameters = more estimation error.
• Fewer variables = less estimation error, but a greater chance of missing important predictors (misspecification).
• Because of this, better in-sample fit does not in general translate into better out-of-sample forecast performance.
Selection Based on Fit?
• Why don’t we just pick the model with the smallest SSR or largest R²?
• Both mechanically improve whenever extra variables are added, so we could argue for the SER and adjusted R² instead.
• The latter penalize model complexity, but not enough to yield useful selection criteria (see the formulas below).
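For reference, the standard formulas for these two measures (a sketch using this lecture’s notation, where k is the number of estimated coefficients and SST is the total sum of squares):

  SER = √( SSR/(T − k) )
  adjusted R² = 1 − [ SSR/(T − k) ] / [ SST/(T − 1) ]

The complexity penalty enters only through the T − k divisor: adding a regressor raises adjusted R² whenever its t-statistic exceeds 1 in absolute value – a much weaker hurdle than the criteria developed below.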
Selection Based on Testing?
• We could test whether the coefficients on some variables are zero.
• If we can reject the null, keep them in the model; if we cannot, remove them.
• Can use either sequential t-tests or sequential F-tests (something we used to select polynomial order in EC3303).
• Popular with some applied researchers, but not designed to select the best forecasting model, and can perform badly.
Example: Real GDP Growth (F-test)
Data: US annualized GDP growth rate, 1947q1 to 2017q2
• Lags 3-4 jointly insignificant.
• Lags 2-4 jointly insignificant.
• Lags 1-4 jointly significant.
• Thus, the F-test picks AR(1).
Example: Real GDP Growth (t-test)
Thus, the t-test also picks AR(1).
Sequential Test Summary
• Intuitive; makes sense to applied researchers.
• F-tests are preferred to t-tests in the presence of high correlation among regressors. See Kozbur (2020, ECMA).
• The search across models is not comprehensive, and the outcome is often path-dependent (e.g., it may differ if you start with a smaller model and expand vs. “shrink” a larger model).
• Frequently ends up with models containing variables not present in the true model (overparameterization).
Bayesian Criterion
• Let M1 be model 1 and M2 be model 2. Denote the data by D.
• Bayes’ theorem:
  P(M1 | D) = P(D | M1) P(M1) / [ P(D | M1) P(M1) + P(D | M2) P(M2) ]
  – P(M1) and P(M2) are priors, i.e., beliefs held by the user.
  – P(D | M1) and P(D | M2) come from probabilistic models.
  – P(M1 | D) is the posterior probability (beliefs updated by the data).
Bayesian Criterion for AR(p)
• Assume an AR(p) with normal errors and uniform priors; then:
  P(M1 | D) ∝ exp(−BIC/2)
  BIC = T ln(SSR/T) + k ln(T)
  where k is the number of estimated coefficients and T is the sample size.
• This is the famous Schwarz or Bayesian information criterion (BIC or SIC for short).
• The model with the smallest BIC has the highest posterior probability. (A by-hand computation is sketched below.)
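As a sanity check, here is a minimal sketch of computing this version of BIC by hand in Stata, using the e() results that reg leaves behind (gdpgr and the sample restriction come from the GDP example later in this lecture):

qui reg gdpgr L.gdpgr if time >= tq(1948q2)
scalar s_ssr = e(rss)    /*sum of squared residuals*/
scalar s_T   = e(N)      /*sample size T*/
scalar s_k   = e(rank)   /*number of estimated coefficients, incl. constant*/
display "simplified BIC = " s_T*ln(s_ssr/s_T) + s_k*ln(s_T)

This differs from the BIC reported by estimates stats only by the constant T(ln(2π) + 1) discussed on the next slide, so the two rank models identically.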
Alternative Versions of BIC
• The more rigorous formula (reported by Stata) is:
  BIC = −2L + k ln(T)
  2L = −T(ln(2π) + 1) − T ln(SSR/T)
  where L is the Gaussian log-likelihood.
• The difference between this formula and the simplification on the previous slide is just a constant – it does not change across models.
• Another frequently used definition (reported by R):
  BIC = ln(SSR/T) + k ln(T)/T
• All definitions select the same model ordering, as the short derivation below shows.
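To see the equivalence, write BIC_s = T ln(SSR/T) + k ln(T) for the simplified version from the previous slide. Substituting 2L into Stata’s formula:

  BIC_Stata = −2L + k ln(T) = T(ln(2π) + 1) + T ln(SSR/T) + k ln(T) = BIC_s + T(ln(2π) + 1)
  BIC_R = ln(SSR/T) + k ln(T)/T = BIC_s / T

Adding a constant that is the same for every model, or dividing by T > 0 (the same T across models once the sample is held fixed), preserves the ordering, so all three versions select the same model.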
Tradeoff in BIC
• The larger AR(p) model will have a larger k and a smaller SSR:
  BIC = T ln(SSR/T) + k ln(T)
• The first term goes down, but the second goes up: a tradeoff between fit and model complexity.
• We typically compute BIC for all the considered models, say AR(1) to AR(4) for quarterly data, and pick the one with the lowest BIC.
Popular Mistake in Using BIC: Different Sample
• The larger AR(p) model will also reduce your sample size, since more observations are needed to construct lags.
• You need to make sure the models are estimated using the same sample; otherwise you are comparing apples with oranges.
• A convenient way to do this in Stata is to locate the time/date t_{p+1} of the (p + 1)-st observation (where p is the highest order considered) and run all regressions with an if time >= t_{p+1} condition, as sketched below.
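A minimal sketch of the wrong and right comparisons (variable names from the GDP example on the following slides, where p = 4 and t_{p+1} = 1948q2):

* Wrong: each model is estimated on a different sample
* (the AR(4) regression loses three more observations than the AR(1)):
reg gdpgr L.gdpgr
reg gdpgr L(1/4).gdpgr

* Right: restrict every model to the sample usable by the largest one:
reg gdpgr L.gdpgr if time >= tq(1948q2)
reg gdpgr L(1/4).gdpgr if time >= tq(1948q2)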
Example: Revisit GDP Growth
• Consider models AR(1) through AR(4).
• Growth data run from 1947:Q2 to 2017:Q2 – 281 observations.
  – AR(1) would use 1947:Q3 onwards;
  – AR(2) – 1947:Q4 onwards;
  – AR(3) – 1948:Q1 onwards;
  – AR(4) – 1948:Q2 onwards.
• Thus, for the purpose of comparing BIC, we use 1948:Q2 as the start for all models. Add if time >= tq(1948q2) when regressing.
Example: GDP Growth AR(1)
. reg gdpgr L.gdpgr if time>=tq(1948q2), r
Linear regression Number of obs = 277
F(1, 275) = 29.48
Prob > F = 0.0000
R-squared = 0.1293
Root MSE = 3.5293
------------------------------------------------------------------------------
| Robust
gdpgr | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
gdpgr |
L1. | .3591947 .0661568 5.43 0.000 .2289567 .4894328
|
_cons | 1.998005 .3231902 6.18 0.000 1.361764 2.634247
------------------------------------------------------------------------------
. estimates store ar1
Note: the estimates store command is useful for displaying the BIC of all 4 models together later using estimates stats.
Example: GDP Growth AR(2)
. reg gdpgr L(1/2).gdpgr if time>=tq(1948q2), r
Linear regression Number of obs = 277
F(2, 274) = 17.30
Prob > F = 0.0000
R-squared = 0.1416
Root MSE = 3.5106
------------------------------------------------------------------------------
| Robust
gdpgr | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
gdpgr |
L1. | .3162249 .0740875 4.27 0.000 .1703719 .4620779
L2. | .1189849 .0725869 1.64 0.102 -.0239141 .2618839
|
_cons | 1.757554 .3555542 4.94 0.000 1.057589 2.457519
------------------------------------------------------------------------------
. estimates store ar2
Example: GDP Growth AR(3)
. reg gdpgr L(1/3).gdpgr if time>=tq(1948q2), r
Linear regression Number of obs = 277
F(3, 273) = 12.21
Prob > F = 0.0000
R-squared = 0.1529
Root MSE = 3.4939
------------------------------------------------------------------------------
| Robust
gdpgr | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
gdpgr |
L1. | .3294918 .0741839 4.44 0.000 .1834466 .4755371
L2. | .1548351 .079526 1.95 0.053 -.0017271 .3113972
L3. | -.1138022 .0689545 -1.65 0.100 -.2495525 .021948
|
_cons | 1.960562 .368287 5.32 0.000 1.235518 2.685605
------------------------------------------------------------------------------
. estimates store ar3
Example: GDP Growth AR(4)
. reg gdpgr L(1/4).gdpgr if time>=tq(1948q2), r
Linear regression Number of obs = 277
F(4, 272) = 9.75
Prob > F = 0.0000
R-squared = 0.1577
Root MSE = 3.4903
------------------------------------------------------------------------------
| Robust
gdpgr | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
gdpgr |
L1. | .3207103 .0742694 4.32 0.000 .1744943 .4669262
L2. | .166058 .082436 2.01 0.045 .0037643 .3283517
L3. | -.0888071 .0692083 -1.28 0.201 -.2250591 .047445
L4. | -.0749689 .0738667 -1.01 0.311 -.220392 .0704542
|
_cons | 2.108767 .4110886 5.13 0.000 1.299447 2.918087
------------------------------------------------------------------------------
. estimates store ar4
Example: BIC for All Models
• First things first: check the sample sizes! All models should have the same (277 here). If not, we need to redo the whole exercise on a common sample.
• BIC also picks AR(1) in this example.
. estimates stats ar1 ar2 ar3 ar4 /*show AIC/BIC for all models at once*/
Akaike’s information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | N ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
ar1 | 277 -760.5395 -741.3694 2 1486.739 1493.987
ar2 | 277 -760.5395 -739.389 3 1484.778 1495.65
ar3 | 277 -760.5395 -737.5635 4 1483.127 1497.623
ar4 | 277 -760.5395 -736.7721 5 1483.544 1501.664
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
Issue With BIC
• This is more or less the theory behind BIC.
• If one of the models is true and the others are false, BIC will pick the model most likely to be true (or the best approximation, if all are false) – the consistency property.
• However, BIC selection is not specifically designed to produce a good forecast! We are not interested in the true model; we want good forecasting performance.
Selection To Minimize MSFE
• Our goal is to minimize forecast risk, or MSFE:
  R(Ŷ) = E(Y − Ŷ)²
• If we had a good estimate of MSFE, we could simply pick the model that minimizes it.
• SSR is a bad estimate:
  1. Biased (in-sample overfitting).
  2. Decreases as more variables are added – selects the largest model.
The Bias of SSR
• It can be shown that, approximately:
  E(SSR) = E(MSFE) − 2σ²k and E(MSFE) = Tσ²
• Shibata (1980) suggested a bias correction:
  Sk = SSR (1 + 2k/T)
• This is known as the Shibata criterion.
From Shibata to Akaike
• If you take Shibata’s formula, divide by T, take the log, and multiply by T, you obtain:
  T ln(Sk/T) = T ln(SSR/T) + T ln(1 + 2k/T)
             ≈ T ln(SSR/T) + 2k
  using ln(1 + x) ≈ x for small x.
• The second line is the Akaike information criterion (AIC).
• It looks similar to BIC, but with ln(T) replaced by 2.
  – BIC puts a harsher penalty on model size: ln(T) > 2 for T > 7.
  – E.g., for T = 277 in our example, ln(T) = 5.62 – almost three times larger.
Motivation Behind AIC
• AIC is an approximately unbiased estimate of:
  1. the MSFE, and
  2. the Kullback-Leibler information criterion (KLIC).
• KLIC is a loss function for a density forecast. Suppose f(Y) is the forecast density and g(Y) is the true density; then:
  KLIC(f, g) = E ln( g(Y)/f(Y) )
• Minimizing AIC = minimizing estimated KLIC.
Example: AIC for All Models
AIC picked AR(3). BIC picked AR(1).
. estimates stats ar1 ar2 ar3 ar4 /*show AIC/BIC for all models at once*/
Akaike’s information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | N ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
ar1 | 277 -760.5395 -741.3694 2 1486.739 1493.987
ar2 | 277 -760.5395 -739.389 3 1484.778 1495.65
ar3 | 277 -760.5395 -737.5635 4 1483.127 1497.623
ar4 | 277 -760.5395 -736.7721 5 1483.544 1501.664
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
Akaike’s Result
• Akaike showed that, in a normal AR(p), AIC is an approximately unbiased estimator of the KLIC.
• So unlike testing or BIC, AIC is designed to find models with low forecast risk.
• AIC will often select a larger model than BIC.
  – Mechanically, because the penalty is lower.
  – Conceptually, because instead of trying to find a true model (as BIC is designed to), AIC treats every model as an approximation and tries to find the one that makes the best forecast. It includes extra lags if they help forecast better.
AIC Asymptotic Properties
• Unlike BIC, AIC is not consistent.
• It is designed under a different premise: all considered models are treated as approximations.
• AIC is asymptotically efficient: if the true model is not contained in the considered set, and Model k has the lowest risk, then
  Risk(AIC selection)/Risk(k) → 1 as T → ∞.
• That is, AIC will asymptotically pick the best forecasting model when the true model is not in the set considered by the forecaster.
Selection Based on Prediction Errors
• Why not compute true out-of-sample forecasts and the associated forecast errors, and pick the model with the smallest value of the loss function applied directly to those errors?
• This approach is called Predictive Least Squares (PLS).
• Diebold calls it recursive cross-validation in Ch. 10.
• Originated with Rissanen (1986), a Finnish information theorist.
The PLS Procedure
• You have T observations. Select a “holdout sample” of M observations. Then make recursive one-step-ahead pseudo out-of-sample forecasts for P = T − M periods.
• E.g., for the AR(1) model you compute:
  Ŷt = α̂t−1 + β̂t−1 Yt−1
• Coefficients are estimated using data from [1, . . . , t − 1].
• t goes from M + 1 to T: a total of P recursive estimates.
The PLS Procedure (ctd.)
• The out-of-sample forecast errors are:
  ẽt = Yt − Ŷt
• Unlike an in-sample residual, this is a true forecast error.
• The PLS criterion is the square root of the estimated out-of-sample MSFE:
  PLS = √( (1/P) Σ_{t=M+1}^{T} ẽt² )
• Select the model with the smallest PLS. A very popular approach in applied forecasting.
PLS Attractive Features
The advantages are:
• Doesn’t depend on approximations or any distribution theory.
• Can be computed for any forecast method (even for published forecast surveys) without knowing how the forecast was obtained.
• Possibly robust to moderate structural breaks.
• A common measure of “empirical performance” in applied
forecasting.
• Provides σ̂ for forecast intervals.
PLS Disadvantages
The disadvantages are:
• Tends to overestimate the true MSFE.
• Tends to be over-parsimonious.
• VERY sensitive to the choice of P; there is no generally accepted theory for choosing P.
PLS in Stata
• Could be a bit tricky. Either use a manual loop or the rolling command.
• Thankfully, we can just recycle the loop we built for 1-step-ahead recursive window forecasting in Lecture 6!
• This time we estimate all 4 models and produce 1-step forecasts at each iteration (here P = 100).
• Then we construct the 1-step forecast errors, square them, and average to get the PLS criterion value (see the sketch after the loop).
PLS in Stata
* Create empty storage variables for the 1-step forecasts
* (needed by the replace commands inside the loop):
gen yhat1 = .
gen yhat2 = .
gen yhat3 = .
gen yhat4 = .
forvalues p=182/281 {
    *AR(1)
    qui reg gdpgr L.gdpgr if t>=6 & t<=`p' /*note quiet execution of regression*/
    predict fit1, xb
    *AR(2)
    qui reg gdpgr L(1/2).gdpgr if t>=6 & t<=`p'
    predict fit2, xb
    *AR(3)
    qui reg gdpgr L(1/3).gdpgr if t>=6 & t<=`p'
    predict fit3, xb
    *AR(4)
    qui reg gdpgr L(1/4).gdpgr if t>=6 & t<=`p'
    predict fit4, xb
    *Store 1-step forecasts for all 4 models:
    replace yhat1=fit1 if t==(`p'+1)
    replace yhat2=fit2 if t==(`p'+1)
    replace yhat3=fit3 if t==(`p'+1)
    replace yhat4=fit4 if t==(`p'+1)
    *Clean up for the next iteration (easy to drop multiple variables with similar names):
    drop fit*
}
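A minimal sketch of the final step, computing the PLS criterion for each model from the yhat1–yhat4 variables filled by the loop above:

forvalues m=1/4 {
    gen esq`m' = (gdpgr - yhat`m')^2   /*squared 1-step forecast errors*/
    qui sum esq`m'
    display "AR(`m') PLS = " sqrt(r(mean))   /*root of the average squared error*/
}

sum ignores the missing values outside the forecast window, so r(mean) averages exactly the P = 100 squared forecast errors.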
PLS in Stata: Results for GDP Growth
• PLS picks AR(2).
• Again, if I change P, this ranking may change.
Model PLS Criterion
AR(1) 2.2424549
AR(2) 2.1929934
AR(3) 2.2205646
AR(4) 2.2404814
Model Selection Summary
• Selection based on measures of fit or on testing is inappropriate.
• Feasible criteria: AIC, BIC, PLS.
• Hold the sample constant for valid comparisons.
• All methods except PLS assume conditional homoskedasticity.
• PLS is sensitive to the choice of P.
Before you leave...
• Diebold, Francis X., “Elements of Forecasting,” 4th edition, Chapter 12.
  – In particular, PLS on p. 212.
Appendix
Useful Stata Commands
• Store estimates after estimation:
  estimates store name
• Display AIC/BIC of stored estimates (can show several at once):
  estimates stats names