PRI workshop
Multilevel modeling: A comparison of software packages
Juan Leon Jara Almonte
Pennsylvania State University
814 Oswald
jjl292@psu.edu
March 2009
I. Multilevel Modeling
Multilevel models are statistical models used to analyze data that have a hierarchical or
nested structure, for example:
-
Students nested in schools
Children nested in households
Patients nested in hospitals
Criminals nested in jails
Repeated measures nested in individuals (e.g.: students, patients)
Thus, to address this issues, using a multilevel approach will be a more accurate model
than a ordinary least squares regression (OLS). The names received for this model in the
different fields or literature around this topic is: random coefficient model, variance
component model, and hierarchical linear model.
The basic two level model:
Level 1
Yij
0j + 1jXij + rij
- equation 1
00 + 01Zj + u0j
10
- equation 2
- equation 3
Level 2
0j
1j
=
=
Substitution of ( 2 ) and ( 3 ) in ( 1 )
Yij
00 + 10Xij + 01Zj + rij + u0j
- equation 4
Where
i
j
0j
00
Xij
10
rij and u0j
: subjects
: groups
: the mean of Yij for group j
: the grand mean of Yij
: the predictor variable for the subject i in the group j
: the fixed effect of the predictor variable Xij across groups
: random components normally distributed and independent of each other
We could add more predictors at level 1 and we could add random components at level 2.
However, the basic idea of multilevel modeling is to decompose the component variance
in two non-observables: one at a subject level and one at a group level.
To estimate this model, Maximum Likelihood estimators (MLE) are generally used.
MLEs require an iterative procedure until the model converges, allowing us to obtain
unbiased and efficient estimators.
Assumptions of OLS and Multilevel modeling
OLS estimation
Multilevel estimation
Linearity: Function
Linearity: at each level
Normality: Residuals
Normality: at each level
Homoscedasticity: Constant variance
Homoscedasticity: At level 1
Independence: between observations
Independence:
- No correlation between residuals at
level 1 and level 2.
- Observations in the highest level are
independent of each other.
Mean indicators for multilevel modeling
Proportion of variance:
At level 1: rij / (rij + u0j )
At level 2: u0j / (rij + u0j)
intraclass correlation coefficient
Variance explained (Proxy for an R-square):
Take as a base the null model
At level 1: ( rij(null) - rij(estimated))/ rij(null)
At level 2: ( u0j(null) - u0j(estimated))/ u0j(null)
Datasets that will be employed:
Student_level.dta/sav :
School_level.dta/sav :
Student level variables for Sixth grade students
Classroom and school level variables for each student
Variables in the dataset for the exercises:
idschool
idclass
idstudent
math
age
gender
nsibbl
nuclear
ses_i
indig
gestion
tipo
area
prog_ebi
:
:
:
:
:
:
:
:
:
:
:
:
:
:
bserv
dtotal
:
:
Schools ID
Classroom code
Students code
Math achievement in 2004
Age (years)
1 Female 0 Male
Number of siblings
The student lives with both parents
Socioeconomic index
The student has as a mother tongue an indigenous language
The school is public
Full grade school
1 Urban 0 Rural
Schools that participate in the government-headed bilingual
program
Number of basic services at school
Days of class during the year
II. Mixed models STATA
*There is two commands to run a multilevel model in STATA
*xtmixed
*gllamm
*XTMIXED
*********
*1st - Identified the dependent variable: math
*2nd - Number of levels: 2
*3rd - Identified level two id: idschool
*4th - With or without random coefficients: Without
*5th - Identified predictors at level 1: age gender nsibbl nuclear ses_i
indig
*6th - Identified predictors at level 2: gestion tipo area bserv dtotal
prog_ebi
*Syntax*
********
*Note: the id for level two has to be numeric
destring
idschool, replace
*****************************************a two level model
*xtmixed [dependent variable] [independent variables] || [id level 2]:,
[type of estimation: reml mle] [random component: var sd]
*****************************************a three level model
*xtmixed [dependent variable] [independent variables] || [id level 3]:
|| [id level 2]:, [type of estimation: reml mle] [random component: var
sd]
*****************************************a two level model with random
coefficients
*xtmixed [dependent variable] [independent variables] || [id level 2]:
[variable with random coefficient], [type of estimation: reml mle]
[random component: var sd]
******************************************************Random components
*var(_cons)
: Level 2 variance
*var(Residual)
: Level 1 variance
*Examples*
**********
*Null Model
xtmixed
math || idschool:, var
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:
Iteration 1:
log restricted-likelihood = -55799.507
log restricted-likelihood = -55799.507
Computing standard errors:
Mixed-effects REML regression
Group variable: idschool
Log restricted-likelihood = -55799.507
Number of obs
Number of groups
=
=
10327
564
Obs per group: min =
avg =
max =
1
18.3
30
Wald chi2(0)
Prob > chi2
=
=
.
.
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------_cons |
291.5814
2.171078
134.30
0.000
287.3262
295.8366
----------------------------------------------------------------------------------------------------------------------------------------------------------Random-effects Parameters |
Estimate
Std. Err.
[95% Conf. Interval]
-----------------------------+-----------------------------------------------idschool: Identity
|
var(_cons) |
2422.165
160.494
2127.172
2758.067
-----------------------------+-----------------------------------------------var(Residual) |
2489.893
35.66315
2420.967
2560.782
-----------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 4803.15 Prob >= chibar2 = 0.0000
*Model 1 - Student variables
xtmixed
math age gender nsibbl nuclear ses_i indig ||
idschool:, var
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:
Iteration 1:
log restricted-likelihood = -54213.956
log restricted-likelihood = -54213.956
Computing standard errors:
Mixed-effects REML regression
Group variable: idschool
Log restricted-likelihood = -54213.956
Number of obs
Number of groups
=
=
10084
564
Obs per group: min =
avg =
max =
1
17.9
30
Wald chi2(6)
Prob > chi2
=
=
581.48
0.0000
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age | -8.197397
.5646148
-14.52
0.000
-9.304021
-7.090772
gender | -10.04743
1.053241
-9.54
0.000
-12.11175
-7.983119
nsibbl | -1.418298
.2599372
-5.46
0.000
-1.927766
-.9088307
nuclear | -4.110831
1.19185
-3.45
0.001
-6.446815
-1.774847
ses_i |
.4424885
.0532977
8.30
0.000
.338027
.54695
indig | -16.40169
2.547377
-6.44
0.000
-21.39446
-11.40892
_cons |
360.9887
9.142899
39.48
0.000
343.069
378.9085
----------------------------------------------------------------------------------------------------------------------------------------------------------Random-effects Parameters |
Estimate
Std. Err.
[95% Conf. Interval]
-----------------------------+-----------------------------------------------idschool: Identity
|
var(_cons) |
1418.413
111.9487
1215.126
1655.709
-----------------------------+-----------------------------------------------var(Residual) |
2415.155
35.33061
2346.892
2485.404
-----------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 1760.38 Prob >= chibar2 = 0.0000
*Model 2 - School variables
xtmixed
math age gender nsibbl nuclear ses_i indig gestion
tipo area bserv dtotal prog_ebi || idschool:, var
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:
Iteration 1:
log restricted-likelihood = -54042.506
log restricted-likelihood = -54042.506
Computing standard errors:
Mixed-effects REML regression
Group variable: idschool
Log restricted-likelihood = -54042.506
Number of obs
Number of groups
=
=
10084
564
Obs per group: min =
avg =
max =
1
17.9
30
Wald chi2(12)
Prob > chi2
=
=
1170.77
0.0000
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age | -7.704743
.5607968
-13.74
0.000
-8.803884
-6.605601
gender | -10.23747
1.045682
-9.79
0.000
-12.28697
-8.187969
nsibbl | -1.104413
.2590295
-4.26
0.000
-1.612102
-.5967245
nuclear | -3.823928
1.185488
-3.23
0.001
-6.147442
-1.500415
ses_i |
.1828045
.0559183
3.27
0.001
.0732067
.2924024
indig | -13.38815
2.510379
-5.33
0.000
-18.3084
-8.467896
gestion | -39.19536
4.042729
-9.70
0.000
-47.11896
-31.27175
tipo |
11.72788
3.866162
3.03
0.002
4.150338
19.30541
area |
4.210408
4.328804
0.97
0.331
-4.273892
12.69471
bserv |
10.77618
1.946826
5.54
0.000
6.960472
14.59189
dtotal |
.404681
.1593815
2.54
0.011
.092299
.7170629
prog_ebi | -16.20831
4.441248
-3.65
0.000
-24.91299
-7.503619
_cons |
270.5271
44.33491
6.10
0.000
183.6323
357.4219
----------------------------------------------------------------------------------------------------------------------------------------------------------Random-effects Parameters |
Estimate
Std. Err.
[95% Conf. Interval]
-----------------------------+-----------------------------------------------idschool: Identity
|
var(_cons) |
817.257
63.49015
701.8292
951.6688
-----------------------------+-----------------------------------------------var(Residual) |
2400.291
34.86962
2332.912
2469.617
-----------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 1479.13 Prob >= chibar2 = 0.0000
*GLLAMM*
********
*1st - Identified the dependent variable: math
*2nd - Number of levels: 2
*3rd - Identified level two id: idschool
*4th - With or without random coefficients: Without
*5th - Identified predictors at level 1: age gender nsibbl nuclear ses_i
indig
*6th - Identified predictors at level 2: gestion tipo area bserv dtotal
prog_ebi
*Syntax
******************************a two level model
*Define equation
*gen
cons=1
*eq int
: cons
/* Intercept equation */
*gllamm
[dependent variable] [independent variables], i([id
level variable]) eqs([equations]) nrf([number of random components])
[other options]
******************************a three level model
*Define equation
*gen
cons=1
*eq int
: cons
/* Intercept equation */
*gllamm
[dependent variable] [independent variables], i([level
2 level 3]) eqs([equations]) nrf([number of random components]) [other
options]
******************************a two level model with random coefficients
*Define equation
*gen
cons=1
*eq int
: cons
/* Intercept equation */
*eq slope
: [variable with random coefficient]
*gllamm
[dependent variable] [independent variables], i([id
level variable]) eqs([equations]) nrf([number of random components])
[other options]
*Examples*
**********
*Null model
gen
eq int
gllamm
cons=1
: cons
math, i(idschool) eqs(int) nrf(1) adapt
Running adaptive quadrature
Iteration 0:
log likelihood
Iteration 1:
log likelihood
Iteration 2:
log likelihood
Iteration 3:
log likelihood
Iteration 4:
log likelihood
= -58198.98
= -56186.657
= -55826.109
= -55801.265
= -55801.26
Adaptive quadrature has converged, running Newton-Raphson
Iteration 0:
log likelihood = -55801.26
Iteration 1:
log likelihood = -55801.237
Iteration 2:
log likelihood =
-55801.2
Iteration 3:
log likelihood =
-55801.2
number of level 1 units = 10327
number of level 2 units = 564
Condition Number = 303.60003
gllamm model
log likelihood = -55801.2
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------_cons |
291.5841
2.171009
134.31
0.000
287.329
295.8392
-----------------------------------------------------------------------------Variance at level 1
-----------------------------------------------------------------------------2489.9024 (35.663352)
Variances and covariances of random effects
-----------------------------------------------------------------------------***level 2 (idschool)
var(1): 2417.2909 (160.0545)
------------------------------------------------------------------------------
10
*Model 1 - Student variables
gllamm
math age gender nsibbl nuclear ses_i indig,
i(idschool) eqs(int) nrf(1) adapt
Running adaptive quadrature
Iteration 0:
log likelihood
Iteration 1:
log likelihood
Iteration 2:
log likelihood
Iteration 3:
log likelihood
= -55094.61
= -54598.877
= -54489.213
= -54489.213
Adaptive quadrature has converged, running Newton-Raphson
Iteration 0:
log likelihood = -54489.213
Iteration 1:
log likelihood = -54236.507
Iteration 2:
log likelihood = -54217.441
Iteration 3:
log likelihood = -54217.18
Iteration 4:
log likelihood = -54217.18
number of level 1 units = 10084
number of level 2 units = 564
Condition Number = 1283.6933
gllamm model
log likelihood = -54217.18
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age | -8.198358
.5652368
-14.50
0.000
-9.306201
-7.090514
gender | -10.04702
1.053157
-9.54
0.000
-12.11117
-7.982871
nsibbl | -1.419342
.2613114
-5.43
0.000
-1.931503
-.9071808
nuclear |
-4.11179
1.191921
-3.45
0.001
-6.447912
-1.775668
ses_i |
.4432147
.0580544
7.63
0.000
.3294303
.5569991
indig | -16.41488
2.601117
-6.31
0.000
-21.51298
-11.31679
_cons |
360.9406
9.283431
38.88
0.000
342.7454
379.1358
-----------------------------------------------------------------------------Variance at level 1
-----------------------------------------------------------------------------2413.8148 (35.301464)
Variances and covariances of random effects
-----------------------------------------------------------------------------***level 2 (idschool)
var(1): 1413.3884 (111.42859)
------------------------------------------------------------------------------
11
*Model 2 - School variables
gllamm
math age gender nsibbl nuclear ses_i indig gestion
tipo area bserv dtotal prog_ebi, i(idschool) eqs(int) nrf(1) adapt
Running adaptive quadrature
Iteration 0:
log likelihood
Iteration 1:
log likelihood
Iteration 2:
log likelihood
Iteration 3:
log likelihood
Iteration 4:
log likelihood
=
=
=
=
=
-54787.036
-54495.168
-54276.704
-54069.796
-54069.796
Adaptive quadrature has converged, running Newton-Raphson
Iteration 0:
log likelihood = -54069.796 (not concave)
Iteration 1:
log likelihood = -54055.649
Iteration 2:
log likelihood = -54054.891
Iteration 3:
log likelihood = -54054.886
number of level 1 units = 10084
number of level 2 units = 564
Condition Number = 9373.0729
gllamm model
log likelihood = -54054.886
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age | -7.705684
.5605493
-13.75
0.000
-8.80434
-6.607027
gender | -10.23462
1.045321
-9.79
0.000
-12.28341
-8.185827
nsibbl | -1.106948
.2591939
-4.27
0.000
-1.614959
-.5989372
nuclear | -3.827367
1.185166
-3.23
0.001
-6.15025
-1.504485
ses_i |
.1848875
.0567318
3.26
0.001
.0736953
.2960798
indig | -13.46722
2.534508
-5.31
0.000
-18.43476
-8.499676
gestion | -39.15573
4.01795
-9.75
0.000
-47.03076
-31.28069
tipo |
11.7064
3.839827
3.05
0.002
4.180478
19.23232
area |
4.186199
4.298805
0.97
0.330
-4.239304
12.6117
bserv |
10.77714
1.933419
5.57
0.000
6.987705
14.56657
dtotal |
.4046668
.1582621
2.56
0.011
.0944787
.7148548
prog_ebi | -16.14732
4.418169
-3.65
0.000
-24.80677
-7.487869
_cons |
270.3501
44.04093
6.14
0.000
184.0315
356.6687
-----------------------------------------------------------------------------Variance at level 1
-----------------------------------------------------------------------------2399.0027 (34.841723)
Variances and covariances of random effects
-----------------------------------------------------------------------------***level 2 (idschool)
var(1): 803.02789 (62.165406)
------------------------------------------------------------------------------
*Main problem: Time to estimate the models*
*******************************************
12
III. Mixed models SPSS
*To run the multilevel model we have to use the command MIXED
***********************************************a two level model
*MIXED [dependent variable] WITH [independent variables]
*/ PRINT = SOLUTION TESTCOV
: To print the model and the test for the matrix of variance covariance
*/ FIXED = INTERCEPT [independent variables]
: Indicate the fixed part in the model
*/ RANDOM = INTERCEPT | SUBJECT([id level variable])
: Indicate the random part and the grouping variable for the data
***********************************************a three level model
*MIXED [dependent variable] WITH [independent variables]
*/ PRINT = SOLUTION TESTCOV
: To print the model and the test for the matrix of variance covariance
*/ FIXED = INTERCEPT [independent variables]
: Indicate the fixed part in the model
*/ RANDOM = INTERCEPT | SUBJECT([id level variable])
: Indicate the random part and the grouping variable for level 2
*/ RANDOM = INTERCEPT | SUBJECT([id level variable 2]*[id level variable 3])
: Indicate the random part and the grouping variable for level 2 and level
3
*********************************************** a two level model with random
coefficient
*MIXED [dependent variable] WITH [independent variables]
*/ PRINT = SOLUTION TESTCOV
: To print the model and the test for the matrix of variance covariance
*/ FIXED = INTERCEPT [independent variables]
: Indicate the fixed part in the model
*/ RANDOM = INTERCEPT [other random variables] | SUBJECT([id level variable])
COVTYPE(UN)
: Indicate the random part and the grouping variable for the data
and the type of matrix
*
covariance for the level 2 (unestructured)
*Examples*
**********
*Null Model
mixed math
/ print = solution testcov
/ fixed = intercept
/ random = intercept | subject(idschool).
13
Estimates of Fixed Effects(a)
95% Confidence Interval
Parameter
Intercept
Estimate
Std. Error
291.58139
2.171077
1
a Dependent Variable: MAth.
df
539.227
Sig.
134.303
Lower Bound
.000
Upper Bound
287.316585
295.846196
Estimates of Covariance Parameters(a)
95% Confidence Interval
Parameter
Residual
Intercept [subject = Variance
idschool]
a Dependent Variable: MAth.
Estimate
2489.8932
07
2422.1624
26
Std. Error
Wald Z
Sig.
Lower Bound
Upper Bound
35.663150
69.817
.000
2420.966727
2560.782067
160.49373
8
15.092
.000
2127.170040
2758.063864
*Model 1 - Student variables
mixed math with age gender nsibbl nuclear ses_i indig
/ print = solution testcov
/ fixed = intercept age gender nsibbl nuclear ses_i indig
/ random = intercept | subject(idschool).
Estimates of Fixed Effects(a)
95% Confidence Interval
Parameter
Intercept
Nsibbl
Estimate
360.98869
8
-8.197397
10.047433
-1.418299
nuclear
-4.110831
Age
gender
Std. Error
df
Sig.
Lower Bound
Upper Bound
9.142899
9964.925
39.483
.000
343.066768
378.910628
.564615
9830.408
-14.519
.000
-9.304158
-7.090636
1.053241
9675.797
-9.540
.000
-12.112006
-7.982860
.259937
9791.842
-5.456
.000
-1.927829
-.908768
1.191850
9579.555
-3.449
.001
-6.447110
-1.774552
.442489
.053298
2.547377
16.401693
a Dependent Variable: MAth.
8310.954
8.302
.000
.338012
.546965
8871.112
-6.439
.000
-21.395141
-11.408245
ses_i
Indigo
Estimates of Covariance Parameters(a)
95% Confidence Interval
Parameter
Residual
Intercept [subject = Variance
idschool]
a Dependent Variable: MAth.
Estimate
2415.1550
55
1418.4124
24
Std. Error
Wald Z
Sig.
Lower Bound
Upper Bound
35.330614
68.359
.000
2346.891618
2485.404054
111.94861
1
12.670
.000
1215.125679
1655.708409
*Model 2 - School variables
mixed math with age gender nsibbl nuclear ses_i indig gestion tipo area bserv dtotal prog_ebi
14
/ print = solution testcov
/ fixed = intercept age gender nsibbl nuclear ses_i indig gestion tipo area bserv dtotal prog_ebi
/ random = intercept | subject(idschool).
Estimates of Fixed Effects(a)
95% Confidence Interval
Parameter
Intercept
Nsibbl
Estimate
270.52710
2
-7.704743
10.237467
-1.104413
nuclear
-3.823929
1.185488
9732.357
-3.226
.001
-6.147731
-1.500126
ses_i
.055918
10022.105
3.269
.001
.073193
.292416
2.510379
8566.070
-5.333
.000
-18.309099
-8.467202
4.042729
502.568
-9.695
.000
-47.138085
-31.252623
Tipo
.182805
13.388150
39.195354
11.727876
3.866161
506.687
3.033
.003
4.132195
19.323557
Area
4.210408
4.328804
489.480
.973
.331
-4.294922
12.715738
Bserv
10.776182
1.946826
547.469
5.535
.000
6.952018
14.600345
.404681
.159381
prog_ebi
4.441248
16.208304
a Dependent Variable: MAth.
514.081
2.539
.011
.091562
.717800
549.555
-3.649
.000
-24.932203
-7.484406
Age
gender
Indigo
gestion
Std. Error
df
Sig.
Lower Bound
Upper Bound
44.334909
547.503
6.102
.000
183.439762
357.614443
.560797
9941.506
-13.739
.000
-8.804018
-6.605467
1.045682
9863.263
-9.790
.000
-12.287217
-8.187717
.259030
9865.989
-4.264
.000
-1.612164
-.596662
Dtotal
Estimates of Covariance Parameters(a)
95% Confidence Interval
Parameter
Residual
Intercept [subject = Variance
idschool]
a Dependent Variable: MAth.
Estimate
2400.2913
25
817.25668
9
Std. Error
Wald Z
Sig.
Lower Bound
Upper Bound
34.869622
68.836
.000
2332.911916
2469.616793
63.490120
12.872
.000
701.828974
951.668455
15
IV. Hierarchical Linear Model Program - HLM
1. Select the datasets for level 1 and level 2
2. Select the variables that will be used for the analysis at each level
16
3. Save the template file (.mdmt) and the mdm file (.mdm) with names that you prefer.
4. Press Make MDM to create the data file with the information of both levels. In the
case of a error, a message will appear on the screen.
5. Press Check Stats and save the descriptive statistics of your dataset at each level
(mean, standard deviation, minimum and maximum).
6. Press Done and you will have the following window
7. Once you have obtained the window with the data at both levels, we could start
running the different multilevel models.
17
8. Running a null model: Do a click on Math and select the option Outcome variable
Thus, we could see that we are running a simple anova with random effects by school
18
Select Run Analysis from the menu and then under option, choose Run the model
shown
Then go to File from your menu options and select View Output and a txt file will
appear with the regression analysis developed by the program
Program:
Authors:
Publisher:
HLM 6 Hierarchical Linear and Nonlinear Modeling
Stephen Raudenbush, Tony Bryk, & Richard Congdon
Scientific Software International, Inc. (c) 2000
techsupport@ssicentral.com
www.ssicentral.com
------------------------------------------------------------------------------Module:
HLM2.EXE (6.04.2754.2)
Date:
30 March 2009, Monday
Time:
3:35:36
------------------------------------------------------------------------------SPECIFICATIONS FOR THIS HLM2 RUN
Problem Title: no title
The data source for this run = C:\seminar\datasets\hlm\sixthgrade.mdm
The command file for this run = whlmtemp.hlm
Output file name
= C:\seminar\datasets\hlm\hlm2.txt
The maximum number of level-1 units = 10327
The maximum number of level-2 units = 564
The maximum number of iterations = 100
Method of estimation: restricted maximum likelihood
Weighting Specification
-----------------------
Level 1
Level 2
Precision
Weighting?
no
no
no
Weight
Variable
Name
The outcome variable is
Normalized?
MATH
The model specified for the fixed effects was:
---------------------------------------------------Level-1
Coefficients
----------------------
Level-2
Predictors
---------------
19
INTRCPT1, B0
INTRCPT2, G00
The model specified for the covariance components was:
--------------------------------------------------------Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1
Summary of the model specified (in equation format)
--------------------------------------------------Level-1 Model
Y = B0 + R
Level-2 Model
B0 = G00 + U0
Iterations stopped due to small change in likelihood function
******* ITERATION 5 *******
Sigma_squared =
Tau
INTRCPT1,B0
2489.89329
2422.15990
Tau (as correlations)
INTRCPT1,B0 1.000
---------------------------------------------------Random level-1 coefficient
Reliability estimate
---------------------------------------------------INTRCPT1, B0
0.911
---------------------------------------------------The value of the likelihood function at iteration 5 = -5.579951E+004
The outcome variable is
MATH
Final estimation of fixed effects:
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
291.581398
2.171072
134.303
563
0.000
---------------------------------------------------------------------------The outcome variable is
MATH
Final estimation of fixed effects
(with robust standard errors)
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
291.581398
2.169134
134.423
563
0.000
---------------------------------------------------------------------------Final estimation of variance components:
----------------------------------------------------------------------------Random Effect
Standard
Variance
df
Chi-square P-value
Deviation
Component
-----------------------------------------------------------------------------
20
INTRCPT1,
U0
49.21544
2422.15990
563
9762.13458
0.000
level-1,
R
49.89883
2489.89329
----------------------------------------------------------------------------Statistics for current covariance components model
-------------------------------------------------Deviance
= 111599.013039
Number of estimated parameters = 2
21
9. Running the model 1 Student variables only
We add the level one variable by clicking on each variable and selecting the option add
variable uncentered. There are other options like: group mean centered and grand mean
centered. However, for this exercise we will run all the variables uncentered.
Program:
Authors:
Publisher:
HLM 6 Hierarchical Linear and Nonlinear Modeling
Stephen Raudenbush, Tony Bryk, & Richard Congdon
Scientific Software International, Inc. (c) 2000
techsupport@ssicentral.com
www.ssicentral.com
------------------------------------------------------------------------------Module:
HLM2.EXE (6.04.2754.2)
Date:
30 March 2009, Monday
Time:
3:38: 7
------------------------------------------------------------------------------SPECIFICATIONS FOR THIS HLM2 RUN
Problem Title: no title
The data source for this run = C:\seminar\datasets\hlm\sixthgrade.mdm
The command file for this run = whlmtemp.hlm
Output file name
= C:\seminar\datasets\hlm\hlm2.txt
The maximum number of level-1 units = 10327
The maximum number of level-2 units = 564
The maximum number of iterations = 100
Method of estimation: restricted maximum likelihood
Weighting Specification
-----------------------
Level 1
Weighting?
no
Weight
Variable
Name
Normalized?
22
Level 2
Precision
no
no
The outcome variable is
MATH
The model specified for the fixed effects was:
---------------------------------------------------Level-1
Coefficients
---------------------INTRCPT1, B0
#
AGE slope, B1
#
GENDER slope, B2
#
NSIBBL slope, B3
#
NUCLEAR slope, B4
#
SES_I slope, B5
#
INDIG slope, B6
Level-2
Predictors
--------------INTRCPT2, G00
INTRCPT2, G10
INTRCPT2, G20
INTRCPT2, G30
INTRCPT2, G40
INTRCPT2, G50
INTRCPT2, G60
'#' - The residual parameter variance for this level-1 coefficient has been set
to zero.
The model specified for the covariance components was:
--------------------------------------------------------Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1
Summary of the model specified (in equation format)
--------------------------------------------------Level-1 Model
Y = B0 + B1*(AGE) + B2*(GENDER) + B3*(NSIBBL) + B4*(NUCLEAR) + B5*(SES_I) +
B6*(INDIG) + R
Level-2 Model
B0 = G00 + U0
B1 = G10
B2 = G20
B3 = G30
B4 = G40
B5 = G50
B6 = G60
Run-time deletion has reduced the number of level-1 records to 10084
Iterations stopped due to small change in likelihood function
******* ITERATION 6 *******
Sigma_squared =
Tau
INTRCPT1,B0
2415.14524
1418.53123
Tau (as correlations)
INTRCPT1,B0 1.000
---------------------------------------------------Random level-1 coefficient
Reliability estimate
---------------------------------------------------INTRCPT1, B0
0.865
---------------------------------------------------The value of the likelihood function at iteration 6 = -5.421304E+004
23
The outcome variable is
MATH
Final estimation of fixed effects:
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
360.992707
9.142942
39.483
563
0.000
For
AGE slope, B1
INTRCPT2, G10
-8.197324
0.564615
-14.518
10077
0.000
For
GENDER slope, B2
INTRCPT2, G20
-10.047485
1.053240
-9.540
10077
0.000
For
NSIBBL slope, B3
INTRCPT2, G30
-1.418230
0.259937
-5.456
10077
0.000
For NUCLEAR slope, B4
INTRCPT2, G40
-4.110753
1.191849
-3.449
10077
0.001
For
SES_I slope, B5
INTRCPT2, G50
0.442432
0.053298
8.301
10077
0.000
For
INDIG slope, B6
INTRCPT2, G60
-16.400386
2.547400
-6.438
10077
0.000
---------------------------------------------------------------------------The outcome variable is
MATH
Final estimation of fixed effects
(with robust standard errors)
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
360.992707
9.682294
37.284
563
0.000
For
AGE slope, B1
INTRCPT2, G10
-8.197324
0.586866
-13.968
10077
0.000
For
GENDER slope, B2
INTRCPT2, G20
-10.047485
1.131407
-8.881
10077
0.000
For
NSIBBL slope, B3
INTRCPT2, G30
-1.418230
0.249420
-5.686
10077
0.000
For NUCLEAR slope, B4
INTRCPT2, G40
-4.110753
1.231744
-3.337
10077
0.001
For
SES_I slope, B5
INTRCPT2, G50
0.442432
0.057503
7.694
10077
0.000
For
INDIG slope, B6
INTRCPT2, G60
-16.400386
2.673658
-6.134
10077
0.000
---------------------------------------------------------------------------Final estimation of variance components:
----------------------------------------------------------------------------Random Effect
Standard
Variance
df
Chi-square P-value
Deviation
Component
----------------------------------------------------------------------------INTRCPT1,
U0
37.66339
1418.53123
563
6158.23620
0.000
level-1,
R
49.14413
2415.14524
----------------------------------------------------------------------------Statistics for current covariance components model
-------------------------------------------------Deviance
= 108426.073337
Number of estimated parameters = 2
24
10. To run the model 2 School variables, we need to click on the level 2 button and
select where we want to add the level two variables: intercept or coefficients; in other
words if we want to add predictors for the intercept or cross-level interactions. In this
case for simplicity we only add predictors to the intercept.
Program:
Authors:
Publisher:
HLM 6 Hierarchical Linear and Nonlinear Modeling
Stephen Raudenbush, Tony Bryk, & Richard Congdon
Scientific Software International, Inc. (c) 2000
techsupport@ssicentral.com
www.ssicentral.com
------------------------------------------------------------------------------Module:
HLM2.EXE (6.04.2754.2)
Date:
30 March 2009, Monday
Time:
3:40:15
------------------------------------------------------------------------------SPECIFICATIONS FOR THIS HLM2 RUN
Problem Title: no title
The data source for this run = C:\seminar\datasets\hlm\sixthgrade.mdm
The command file for this run = whlmtemp.hlm
Output file name
= C:\seminar\datasets\hlm\hlm2.txt
The maximum number of level-1 units = 10327
The maximum number of level-2 units = 564
The maximum number of iterations = 100
Method of estimation: restricted maximum likelihood
Weighting Specification
-----------------------
Level 1
Weighting?
no
Weight
Variable
Name
Normalized?
25
Level 2
Precision
no
no
The outcome variable is
MATH
The model specified for the fixed effects was:
---------------------------------------------------Level-1
Coefficients
---------------------INTRCPT1, B0
#
#
#
#
#
#
AGE
GENDER
NSIBBL
NUCLEAR
SES_I
INDIG
slope,
slope,
slope,
slope,
slope,
slope,
B1
B2
B3
B4
B5
B6
Level-2
Predictors
--------------INTRCPT2, G00
GESTION, G01
TIPO, G02
AREA, G03
PROG_EBI, G04
BSERV, G05
DTOTAL, G06
INTRCPT2, G10
INTRCPT2, G20
INTRCPT2, G30
INTRCPT2, G40
INTRCPT2, G50
INTRCPT2, G60
'#' - The residual parameter variance for this level-1 coefficient has been set
to zero.
The model specified for the covariance components was:
--------------------------------------------------------Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1
Summary of the model specified (in equation format)
--------------------------------------------------Level-1 Model
Y = B0 + B1*(AGE) + B2*(GENDER) + B3*(NSIBBL) + B4*(NUCLEAR) + B5*(SES_I) +
B6*(INDIG) + R
Level-2 Model
B0 = G00 + G01*(GESTION) + G02*(TIPO) + G03*(AREA) + G04*(PROG_EBI)
+ G05*(BSERV) + G06*(DTOTAL) + U0
B1 = G10
B2 = G20
B3 = G30
B4 = G40
B5 = G50
B6 = G60
Run-time deletion has reduced the number of level-1 records to 10084
Iterations stopped due to small change in likelihood function
******* ITERATION 6 *******
Sigma_squared =
Tau
INTRCPT1,B0
2400.29539
817.21163
Tau (as correlations)
INTRCPT1,B0 1.000
---------------------------------------------------Random level-1 coefficient
Reliability estimate
----------------------------------------------------
26
INTRCPT1, B0
0.799
---------------------------------------------------The value of the likelihood function at iteration 6 = -5.404159E+004
The outcome variable is
MATH
Final estimation of fixed effects:
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
270.525672 44.332588
6.102
557
0.000
GESTION, G01
-39.195039
4.042508
-9.696
557
0.000
TIPO, G02
11.727700
3.865951
3.034
557
0.003
AREA, G03
4.210208
4.328564
0.973
557
0.332
PROG_EBI, G04
-16.207826
4.441016
-3.650
557
0.001
BSERV, G05
10.776193
1.946724
5.536
557
0.000
DTOTAL, G06
0.404681
0.159373
2.539
557
0.012
For
AGE slope, B1
INTRCPT2, G10
-7.704750
0.560796
-13.739
10071
0.000
For
GENDER slope, B2
INTRCPT2, G20
-10.237444
1.045681
-9.790
10071
0.000
For
NSIBBL slope, B3
INTRCPT2, G30
-1.104433
0.259029
-4.264
10071
0.000
For NUCLEAR slope, B4
INTRCPT2, G40
-3.823956
1.185488
-3.226
10071
0.002
For
SES_I slope, B5
INTRCPT2, G50
0.182821
0.055918
3.269
10071
0.001
For
INDIG slope, B6
INTRCPT2, G60
-13.388784
2.510361
-5.333
10071
0.000
---------------------------------------------------------------------------The outcome variable is
MATH
Final estimation of fixed effects
(with robust standard errors)
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
270.525672 48.248823
5.607
557
0.000
GESTION, G01
-39.195039
4.046229
-9.687
557
0.000
TIPO, G02
11.727700
3.976412
2.949
557
0.004
AREA, G03
4.210208
4.054383
1.038
557
0.300
PROG_EBI, G04
-16.207826
4.640755
-3.492
557
0.001
BSERV, G05
10.776193
2.026056
5.319
557
0.000
DTOTAL, G06
0.404681
0.174626
2.317
557
0.021
For
AGE slope, B1
INTRCPT2, G10
-7.704750
0.585244
-13.165
10071
0.000
For
GENDER slope, B2
INTRCPT2, G20
-10.237444
1.119275
-9.146
10071
0.000
For
NSIBBL slope, B3
INTRCPT2, G30
-1.104433
0.245914
-4.491
10071
0.000
For NUCLEAR slope, B4
INTRCPT2, G40
-3.823956
1.219071
-3.137
10071
0.002
For
SES_I slope, B5
INTRCPT2, G50
0.182821
0.056285
3.248
10071
0.002
For
INDIG slope, B6
INTRCPT2, G60
-13.388784
2.500433
-5.355
10071
0.000
---------------------------------------------------------------------------Final estimation of variance components:
----------------------------------------------------------------------------Random Effect
Standard
Variance
df
Chi-square P-value
Deviation
Component
----------------------------------------------------------------------------INTRCPT1,
U0
28.58691
817.21163
557
3813.89027
0.000
27
level-1,
R
48.99281
2400.29539
----------------------------------------------------------------------------Statistics for current covariance components model
-------------------------------------------------Deviance
= 108083.173301
Number of estimated parameters = 2
11. If we want to have random coefficients, we need to click next to the coefficient for
each variable at Level 2 Model, and then add a random component for this variable
across schools.
In this case, we are adding a random component for the coefficient of age; this means that
this coefficient varies randomly across schools.
28
V. Multilevel in SAS
The command to run a multilevel model in SAS is PROC MIXED and the syntax is:
PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / SUBJECT=[level 2 id]
RUN;
COVTEST
METHOD
CLASS
MODEL
SOLUTION
RANDOM
SUBJECT
RUN
: option calls the significance test of random components (Wald test).
: method that will be used to estimate the model.
: indicates which variable is categorical.
: to define the model that will be estimated.
: requires to SAS to print the results of the estimation in the output
: indicates the component or variables that will be random in the model
: indicates the grouping variable in the model
: to run the program.
*********************************************a two level model
PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / SUBJECT=[level 2 id];
RUN;
*********************************************a three level model
PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id and level 3 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / TYPE=UN SUBJECT=[level 3 id];
RANDOM [random variables] / TYPE=UN SUBJECT=[level 2 id([level 3 id])];
RUN;
Note: TYPE = UN indicates that will be estimated the unstructured variance for the
random effects.
*********************************************a two level model with random
coefficients
PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / TYPE=UN SUBJECT=[level 2 id];
29
RUN;
*Example*
*********
*Model Null
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;
*Model 1 Student variables
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = age gender nsibbl nuclear ses_i indig / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;
*Model 2 School variables
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = age gender nsibbl nuclear ses_i indig gestion tipo area bserv dtotal
prog_ebi / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;
30
VI. Summary
In sum, the three programs give us the same results in terms of significance, coefficients and standard errors estimated, as we could
see in the following table:
Table 1. Math Achievement in Math for Sixth Grade in Peru (2004)
Age (years)
Female
Number of siblings
Nuclear family
Socioeconomic status
Indigenous
Intercept
Level 1 variance
Level 2 variance
GLLAMM
se ( )
-8.20
0.57
-10.05
1.05
-1.42
0.26
-4.11
1.19
0.44
0.06
-16.41
2.60
360.94
9.28
2414
1413
63
37
***
***
***
**
***
***
***
***
***
XTMIXED
se ( )
-8.20
0.56
-10.05
1.05
-1.42
0.26
-4.11
1.19
0.44
0.05
-16.40
2.55
360.99
9.14
2415
1418
63
37
***
***
***
**
***
***
***
-8.20
-10.05
-1.42
-4.11
0.44
-16.40
360.99
***
***
2415
1418
MIXED
se ( )
0.56
1.05
0.26
1.19
0.05
2.55
9.14
63
37
***
***
***
**
***
***
***
-8.20
-10.05
-1.42
-4.11
0.44
-16.40
360.99
***
***
2415
1419
HLM
se ( )
0.59
1.13
0.25
1.23
0.06
2.67
9.68
63
37
***
***
***
**
***
***
***
***
***
***p<.001, **p<.01, *p<.05, +p<.10
The main difference between the different programs relies on the type of dependent variable that we want to model. Thus, in the case
of HLM and GLLAMM, it is possible to model two level or three level models with dependent variables with discrete or binary
outcomes; while the XTMIXED and MIXED in STATA and SPSS respectively could be employed with continuous variables. Also,
one of the disadvantages of STATA is that XTMIXED and GLLAMM could not be estimated in combination with sampling weights,
while SPSS and HLM allow the user to use sampling weights. Also, in the case of HLM, one advantage is that it is possible to use
sampling weights at each level.
31
Useful links for multilevel modeling
UCLA: Statistical Computing web page
http://www.ats.ucla.edu/
Urbana Champaign Illinois: Multilevel Analysis Seminar
http://www.ed.uiuc.edu/courses/EdPsy490CK/
University of Texas at Austin: Division of Statistics + Scientific computation
http://ssc.utexas.edu/
Indiana University: Stat/Math Center
http://www.indiana.edu/~statmath/contact/consult.html
32