Variable Selection - v2

The document discusses various Bayesian variable selection methods, including Gibbs Variable Selection (GVS) and Stochastic Search Variable Selection (SSVS), highlighting their development and theoretical foundations. It emphasizes the importance of prior distributions and tuning parameters in improving model performance and mixing. Additionally, it provides R code examples for implementing these methods in practical applications, such as analyzing low birth weight data in Georgia.

Bayesian variable selection

Gibbs VS versus Spike and slab VS


• Gibbs VS (GVS)
• Developed by Kuo and Mallick (KM) in the paper:
  L. Kuo and B. Mallick (1998), Variable Selection for Regression Models, Sankhya B, 60, 65-81
• See also:
  P. Dellaportas, J. Forster and I. Ntzoufras (2002), On Bayesian Model and Variable Selection using MCMC, Statistics and Computing, 12, 27-36
  R. B. O'Hara and M. J. Sillanpää (2009), A Review of Bayesian Variable Selection Methods: What, How, and Which, Bayesian Analysis, 4, 85-118
Kuo and Mallick Selection
• Basic model:
  $y_i = \sum_{j=1}^{p} \beta_j x_{ij} + e_i$
  $\beta_j \sim N(0, \tau_\beta^{-1}), \qquad e_i \sim N(0, \tau_y^{-1})$

• A special case of GVS:
• Introduce an indicator $I_j$
• And formulate the model as $y_i = \sum_{j=1}^{p} \beta_j I_j x_{ij} + e_i$
• Assume that $\Pr(I_j, \beta_j) = \Pr(I_j)\,\Pr(\beta_j)$
Notes
• Independent priors on the indicator and the regression parameter
• When $I_j = 0$ the full conditional distribution of $\beta_j$ is just its prior, so $\beta_j$ is sampled from that prior
• If the prior is too vague then mixing will be poor
• Usually the prior for the indicator is $I_j \sim \mathrm{Bern}(p_j)$
• Should we have a hyperprior for $p_j$? (A single Gibbs step for $I_j$ is sketched below.)
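To make the sampling step concrete, here is a minimal R sketch (not from the lecture; the object names y, X, beta, I, tau_y and p_j are assumptions) of the Gibbs update for a single Kuo and Mallick indicator in the Gaussian linear model above.

update_indicator <- function(j, y, X, beta, I, tau_y, p_j) {
  ## candidate states: variable j included / excluded, other indicators unchanged
  I1 <- I; I1[j] <- 1
  I0 <- I; I0[j] <- 0
  loglik <- function(ind) sum(dnorm(y, X %*% (beta * ind), sqrt(1 / tau_y), log = TRUE))
  ## full conditional: Pr(I_j = 1 | rest) is proportional to p_j * likelihood with x_j included
  log_odds <- log(p_j) - log(1 - p_j) + loglik(I1) - loglik(I0)
  rbinom(1, 1, plogis(log_odds))
}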
Gibbs Variable Selection (GVS)
• Avoids the problem of poor mixing with vague priors by assigning a pseudo-prior
• Here it is assumed that $\Pr(I_j, \beta_j) = \Pr(\beta_j \mid I_j)\,\Pr(I_j)$
• And $\Pr(\beta_j \mid I_j) = (1 - I_j)\,N(\mu, S) + I_j\,N(0, \tau_\beta^{-1})$
• where $\mu$ and $S$ are tuning constants and $\tau_\beta^{-1}$ is a fixed variance
• Tuning has to be done to improve mixing (see the sketch below)
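A minimal R sketch (illustrative only; mu, S and tau_beta are assumed names) of drawing beta_j from the GVS conditional prior above. The pseudo-prior branch is only used when the variable is excluded, so it never enters the likelihood; it is typically tuned, for example from a pilot MCMC run, so that excluded-state proposals land near the posterior of beta_j.

rbeta_gvs <- function(I_j, mu, S, tau_beta) {
  if (I_j == 0) {
    rnorm(1, mu, sqrt(S))            # pseudo-prior N(mu, S): affects mixing only
  } else {
    rnorm(1, 0, sqrt(1 / tau_beta))  # actual prior N(0, 1/tau_beta) used in the model
  }
}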
Stochastic Search Variable Selection (SSVS)
• Assume a mixture of spike and slab:
  $\Pr(\beta_j \mid I_j) = (1 - I_j)\,N(0, \tau_\beta^{-1}) + I_j\,N(0, g\tau_\beta^{-1})$
• where the spike variance $\tau_\beta^{-1}$ is small
• Have to tune $g$ and the variance, and these choices affect the posterior estimation (the indicator update is sketched below)
• A random effects version could be assumed where $\tau_\beta^{-1}$ also has to be estimated (with $g$ fixed)
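Because beta_j stays in the likelihood under both states, the SSVS full conditional for I_j depends only on beta_j. A minimal R sketch (p_j, tau_beta and g are assumed names, not from the slides):

update_ssvs_indicator <- function(beta_j, p_j, tau_beta, g) {
  slab  <- p_j       * dnorm(beta_j, 0, sqrt(g / tau_beta))  # slab N(0, g/tau_beta)
  spike <- (1 - p_j) * dnorm(beta_j, 0, sqrt(1 / tau_beta))  # spike N(0, 1/tau_beta)
  rbinom(1, 1, slab / (slab + spike))                        # Bernoulli full conditional
}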
Adaptive Shrinkage
• Specify a prior directly on $\beta_j$
• But with prior control:
  $\beta_j \mid \tau_j^2 \sim N(0, \tau_j^2)$ and a prior $P(\tau_j^2)$
• Could use the Jeffreys prior $P(\tau_j^2) \propto 1/\tau_j^2$
• which leads to an improper posterior:
• unless?.............
• No tuning parameter, however.
Laplacian shrinkage and others
• Could also use an exponential prior for $\tau_j^2$ with mean $\mu$
• If you integrate over the variance components you get a double exponential (Laplace) prior $P(\beta_j \mid \mu)$ (see the simulation check below)
• The random effects variant of the method, where $\mu$ has a prior, is the Bayesian Lasso
• Reversible Jump MCMC (RJMCMC) can also be used, of course
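A quick R check (a sketch assuming the exponential prior is parameterised by its mean mu) that the normal scale mixture really gives a Laplace marginal: if tau_j^2 ~ Exponential(mean mu) and beta_j | tau_j^2 ~ N(0, tau_j^2), the marginal of beta_j is double exponential with scale sqrt(mu/2), so E|beta_j| = sqrt(mu/2).

set.seed(1)
mu   <- 2
tau2 <- rexp(1e5, rate = 1 / mu)       # variance components with mean mu
beta <- rnorm(1e5, 0, sqrt(tau2))      # conditional normal draws given tau2
c(monte_carlo = mean(abs(beta)),       # empirical E|beta_j|
  laplace     = sqrt(mu / 2))          # theoretical E|beta_j| under Laplace(0, sqrt(mu/2))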
Comparison of methods
(from O’Hara and Sillanpää)
Some Code
• The R code for ABC-MC would be easy to implement, and would probably need M-H steps to help, of course
• WinBUGS code here for Kuo and Mallick is simple (var_select_simple.odc)
• Data example in VAR_SELECTexample.odc
Kuo and Mallick WinBUGS code: 5 predictors and y1 outcome, logistic regression

model{
  for(i in 1:N){
    # centre and scale each predictor
    x1c[i] <- (x1[i]-mean(x1[]))/sd(x1[])
    x2c[i] <- (x2[i]-mean(x2[]))/sd(x2[])
    x3c[i] <- (x3[i]-mean(x3[]))/sd(x3[])
    x4c[i] <- (x4[i]-mean(x4[]))/sd(x4[])
    x5c[i] <- (x5[i]-mean(x5[]))/sd(x5[])
    # binomial outcome with total births n[i] as denominator
    y1[i] ~ dbin(p1[i], n[i])
    # each coefficient b[j] is multiplied by its inclusion indicator psi[j]
    logit(p1[i]) <- b0 + b[1]*psi[1]*x1c[i] + b[2]*psi[2]*x2c[i] + b[3]*psi[3]*x3c[i] + b[4]*psi[4]*x4c[i] + b[5]*psi[5]*x5c[i] + v[i]
    v[i] ~ dnorm(0, tauV)      # county-level random effect
  }
  for(j in 1:5){
    psi[j] ~ dbern(p[j])       # inclusion indicator
    p[j] ~ dbeta(0.5, 0.5)     # hyperprior on the inclusion probability
  }
  b0 ~ dnorm(0, taub0)
  for(j in 1:5){
    b[j] ~ dnorm(0, taub[j])
    ....}
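As a rough illustration (not part of the original .odc files), the model above could be run from R with the R2WinBUGS package along these lines; the data object names, the choice to pass tauV, taub0 and taub as fixed data rather than give them priors, and the iteration counts are all assumptions.

library(R2WinBUGS)
## georgia_data is an assumed list holding the county-level variables
georgia_data <- list(N = 159, n = n, y1 = y1,
                     x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5,
                     tauV = 1, taub0 = 0.001, taub = rep(0.001, 5))
fit <- bugs(data = georgia_data,
            inits = NULL,                           # let WinBUGS generate initial values
            parameters.to.save = c("psi", "b", "b0"),
            model.file = "var_select_simple.txt",   # the model block saved as plain text
            n.chains = 1,
            n.iter = 55080, n.burnin = 30000, n.thin = 10)  # roughly matches the run reported below
fit$summary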
Example
• Code in VAR_SELECTexample.odc
• Low birth weight in Georgia counties (159) in 2007
• Binomial example with total births as denominator
• Predictors:
  • Population density (x1)
  • Black proportion (x2)
  • Median income/1000 (x3)
  • % below poverty (x4)
  • Unemployment rate (x5)
Model run results
• Burn-in: 30,000; thin: 10; sample size: 2508
• Clear evidence that x2 and x4 are really important:
• Continually selected in the converged sample

node    mean     sd      MC_error  val2.5pc  median  val97.5pc  start  sample
psi[1]  0.03907  0.1938  0.009205  0         0       1          30001  2508
psi[2]  1        0       2.00E-12  1         1       1          30001  2508
psi[3]  0.08533  0.2794  0.009163  0         0       1          30001  2508
psi[4]  1        0       2.00E-12  1         1       1          30001  2508
psi[5]  0.05104  0.2201  0.00444   0         0       1          30001  2508
