Lecture 5: Multiple Linear Regression

Objectives

At the end of this session participants should:

• Be able to interpret the equation of a multiple linear regression model

• Understand some of the criteria and principles used in choosing which variables to include in a multiple linear regression model

• Understand the relationship between ANOVA models and linear regression models

In multiple regression we investigate how a single response variable Y is related to two or more
explanatory variables x1, x2, etc.

e.g. rhknow might depend on education level as well as age

The model used for multiple regression is

Yi = β0 + β1 x1i + β2 x2i + εi

where, as before, the deviations εi are assumed to be independent and normally distributed with mean zero and constant variance σ².

As before we have to estimate the parameters β0, β1 and β2 using sample statistics b0, b1 and b2, and this is done, as for simple linear regression, using the method of least squares, i.e. by minimizing

D = ∑ εi² = ∑ {yi − (β0 + β1 x1i + β2 x2i)}²

with respect to β0, β1 and β2.

The calculations involved are very time-consuming and in practice are carried out using a
statistics package.
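
In Stata, for instance, such a model is fitted with the regress command, listing the response variable first followed by the explanatory variables. A generic sketch, with y, x1 and x2 standing in as placeholder variable names, is:

. regress y x1 x2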

Note that often the question of interest is “Which of the x-variables are most important for
predicting y?”

Ex: We can carry out a multiple regression analysis to see whether rhknow depends on age
(x1), education (x2), or both.

. regress rhknow age educ

Source | SS df MS Number of obs = 203

-------------+------------------------------ F( 2, 200) = 6.14

Model | 68.0241289 2 34.0120645 Prob > F = 0.0026

Residual | 1108.62612 200 5.54313059 R-squared = 0.0578

-------------+------------------------------ Adj R-squared = 0.0484

Total | 1176.65025 202 5.82500122 Root MSE = 2.3544

------------------------------------------------------------------------------

rhknow | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | .0587707 .0268695 2.19 0.030 .0057868 .1117545

educ | .5905042 .266571 2.22 0.028 .0648539 1.116155

_cons | 6.25016 .6579523 9.50 0.000 4.952746 7.547574

Thus the overall regression is highly significant (F = 6.14; P = 0.0026), so we can reject the null
hypothesis that rhknow is related to neither age nor education and conclude that rhknow
depends on at least one of them. Note that the R² has increased only slightly, to 5.78%, so most
of the variation in rhknow is still unexplained.

The question remains as to whether we need both age and education in our model. We can
check this using the post-estimation test command in Stata, which tests whether a particular
coefficient is significantly different from zero; if we cannot reject the null hypothesis that a
coefficient is zero, we can omit the corresponding variable from the model (i.e. we can omit
variables which are “not significant”).

. test age

( 1) age = 0

F( 1, 200) = 4.78

Prob > F = 0.0299



Thus in this case we reject the null hypothesis that the coefficient for age is zero and we need to
keep age in our model.

. test educ

( 1) educ = 0

F( 1, 200) = 4.91

Prob > F = 0.0279

Thus we also reject the null hypothesis that the coefficient for education is zero and we need to
keep education in our model.
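
We can also test the two coefficients jointly; a sketch of the command, using the variables from our example, is:

. test age educ

Since age and educ are the only explanatory variables in the model, this joint test of both coefficients being zero simply reproduces the overall F-test (F = 6.14) from the regression output.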

So in this case the best model is the one with both age and education as explanatory
variables. Note that education is actually an ordinal variable, which is acceptable as an
explanatory variable. The effect can be interpreted as follows: for an increase of one level in
education, rhknow increases by 0.59 on average, adjusting for the effect of age. Similarly, for a
1-year increase in age, rhknow increases by 0.059 on average, adjusting for the effect of
education. Thus adjusting for education reduces the magnitude of the age effect, even though
age remains statistically significant.
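
To illustrate how the fitted equation is used for prediction, consider a hypothetical respondent aged 30 with education level 2 (values chosen purely for illustration). The predicted rhknow is 6.25 + 0.0588 × 30 + 0.591 × 2 ≈ 9.19, which can be computed in Stata with the display command:

. display 6.25016 + 0.0587707*30 + 0.5905042*2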

Model selection

In the case of two x variables we are in fact choosing between 4 models

(a) no x variable is important


(b) x1 alone is important
(c) x2 alone is important
(d) x1 and x2 are both important (as in our example above)

In the case of three x variables there are 8 models: (a) no x variable, (b) x1 alone, (c) x2, (d) x3,
(e) x1 & x2, (f) x1 & x3, (g) x2 & x3, (h) x1, x2 & x3.

In the case of 4 x variables there are 16 models, in the case of 5 x-variables 32 models, and in
general with k candidate x-variables there are 2^k possible models. Thus the number of potential
models increases rapidly and we need some strategies to search for a suitable model. In general
there are a number of criteria that are used and this is still an area of on-going research. However
one strategy that is easy to implement is stepwise regression. There are two basic versions of this
strategy: in forward selection we choose the variable that gives the biggest reduction in the
residual sum of squares at each stage and add it to the model, stopping when no variable outside
the model significantly improves the fit. In backward elimination we start by fitting the model with
all potential regressors and successively remove the term which leads to the smallest increase in
the residual sum of squares, stopping when the term to be removed would lead to a significantly
worse model.

Note that there are a number of theoretical criticisms of these procedures:

(1) The derived model will give an over-optimistic impression – the P-values for the selected
variables will be too small, confidence intervals will be too narrow and the proportion of
variance explained (R2) will be too high. This is because these quantities do not reflect
the fact that the model was selected using a stepwise procedure.
(2) The regression coefficients will tend to be too large (i.e. too far from their null values), so
the performance of the model in predicting future values of the outcome will be worse
than we expect.
(3) Computer simulations have shown that minor changes in the data may lead to important
changes in the variables selected for the final model.
(4) There are some cases where the two procedures can lead to different models.
(5) Stepwise procedures should never be used as a substitute for thinking about the
problem. We should include variables known from previous work to be associated with
the outcome, and exclude variables for which an association is implausible.

Note that the larger the original number of exposure variables from which the model was
selected, the higher the probability of selecting variables with chance associations, and thus the
worse these problems will be.

When implementing this strategy we use a slightly liberal p-value for inclusion / exclusion (say
P=0.15 or even P=0.20) so as not to lose any variable that could have predictive power. It is
also useful to carry out both procedures – and we can have more confidence in our final model
if it is chosen by both.

In some cases there is a variable that is of intrinsic interest and we can fix it in all models using
the lockterm1 option in Stata.
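
For example, if we wanted to force age to remain in every model considered, a sketch of the command (with age listed first so that it is the locked term) would be:

. sw regress rhknow age educ income , pe(0.10) lockterm1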

Both methods are implemented in Stata using the sw prefix. For forward selection we specify
the option pe to give the nominal significance level for a term to enter the model; for backward
elimination we specify the option pr to give the nominal significance level for a term to be
dropped from the model.
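
Note that in more recent versions of Stata the sw command has been superseded by the stepwise prefix; sketches of the equivalent forward selection and backward elimination commands for our example are:

. stepwise, pe(0.10): regress rhknow age educ income

. stepwise, pr(0.10): regress rhknow age educ income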

Ex: We can find the “best” model for predicting rhknow using age, education and income level
(also an ordinal variable) as candidate regressors.

. sw regress rhknow age educ income , pe(0.10)

begin with empty model

p = 0.0073 < 0.1000 adding educ

p = 0.0299 < 0.1000 adding age

Source | SS df MS Number of obs = 203

-------------+------------------------------ F( 2, 200) = 6.14

Model | 68.0241289 2 34.0120645 Prob > F = 0.0026

Residual | 1108.62612 200 5.54313059 R-squared = 0.0578

-------------+------------------------------ Adj R-squared = 0.0484

Total | 1176.65025 202 5.82500122 Root MSE = 2.3544

------------------------------------------------------------------------------

rhknow | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

educ | .5905042 .266571 2.22 0.028 .0648539 1.116155

age | .0587707 .0268695 2.19 0.030 .0057868 .1117545

_cons | 6.25016 .6579523 9.50 0.000 4.952746 7.547574

------------------------------------------------------------------------------

. sw regress rhknow age educ income , pr(0.10)

begin with full model

p = 0.1481 >= 0.1000 removing income

Source | SS df MS Number of obs = 203

-------------+------------------------------ F( 2, 200) = 6.14

Model | 68.0241289 2 34.0120645 Prob > F = 0.0026

Residual | 1108.62612 200 5.54313059 R-squared = 0.0578

-------------+------------------------------ Adj R-squared = 0.0484

Total | 1176.65025 202 5.82500122 Root MSE = 2.3544

------------------------------------------------------------------------------

rhknow | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | .0587707 .0268695 2.19 0.030 .0057868 .1117545

educ | .5905042 .266571 2.22 0.028 .0648539 1.116155

_cons | 6.25016 .6579523 9.50 0.000 4.952746 7.547574

------------------------------------------------------------------------------

Thus we see that in this case both procedures lead to the same model (the one with age
and education), and given these terms income does not significantly improve the prediction of rhknow.

Categorical Explanatory factors

Often we are interested in comparing three or more groups e.g. in the regression model we
assumed that the effect of education was linear i.e. that the effect of moving from level 1 to level
2 is the same as the effect of moving from level 2 to level 3. We could alternatively treat educ as
a 3 level categorical variable and look at whether the mean rhknow varies between the three
education groups.
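
As a sketch of how this can be done directly within the regression framework, Stata's factor-variable notation (available from Stata 11 onwards) fits a separate coefficient for each education level relative to the lowest level, rather than assuming a common effect per one-level increase:

. regress rhknow age i.educ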

Recall that if we want to only investigate the effect of a single categorical explanatory factor on
an outcome variable, then this can be done using a oneway analysis of variance (recall the
example from lecture 2 investigating the effect of two different drugs or a placebo on the
lymphocyte counts of mice).

. use mice , clear

. oneway lcount drug , tab

| Summary of lcount

drug | Mean Std. Dev. Freq.

------------+------------------------------------

1 | 68 9.3005376 5

2 | 60 6.363961 5

3 | 55 3.5355339 5

------------+------------------------------------

Total | 61 8.4006802 15

Analysis of Variance

Source SS df MS F Prob > F

------------------------------------------------------------------------

Between groups 430 2 215 4.62 0.0325

Within groups 558 12 46.5

------------------------------------------------------------------------

Total 988 14 70.5714286

Bartlett's test for equal variances: chi2(2) = 2.9923 Prob>chi2 = 0.224

Analysis of variance and regression

There is a very close connection between multiple regression models and analysis of
variance models. To see this we can measure the effects of drug B and the placebo C relative to
drug A, i.e. we set drug A to be the reference or baseline level.

We can define variables to measure the effect of drug B and drug C relative to the baseline as
follows:

. gen drugb = 0

. replace drugb=1 if drug==2

(5 real changes made)

. gen drugc = 0

. replace drugc=1 if drug==3

(5 real changes made)
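
As a sketch of an equivalent shortcut, the generate() option of tabulate creates one indicator variable per level of drug in a single step (with the stub drugdum, Stata names them drugdum1, drugdum2 and drugdum3):

. tabulate drug , generate(drugdum)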



. list drug drugb drugc , noobs

+----------------------+

| drug drugb drugc |

|----------------------|

| 1 0 0 |

| 1 0 0 |

| 1 0 0 |

| 1 0 0 |

| 1 0 0 |

|----------------------|

| 2 1 0 |

| 2 1 0 |

| 2 1 0 |

| 2 1 0 |

| 2 1 0 |

|----------------------|

| 3 0 1 |

| 3 0 1 |

| 3 0 1 |

| 3 0 1 |

| 3 0 1 |

+----------------------+

Thus the variables drugb and drugc jointly specify the drug received by any mouse:
If drugb=0 and drugc=0 then the mouse received drug A.
If drugb=1 and drugc=0 then the mouse received drug B.
If drugb=0 and drugc=1 then the mouse received placebo C.
We can now fit the multiple regression model of lcount on drugb and drugc.

. reg lcount drugb drugc

Source | SS df MS Number of obs = 15


-------------+------------------------------ F( 2, 12) = 4.62
Model | 430 2 215 Prob > F = 0.0325
Residual | 558 12 46.5 R-squared = 0.4352
-------------+------------------------------ Adj R-squared = 0.3411
Total | 988 14 70.5714286 Root MSE = 6.8191

------------------------------------------------------------------------------
lcount | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drugb | -8 4.312772 -1.85 0.088 -17.39672 1.396722
drugc | -13 4.312772 -3.01 0.011 -22.39672 -3.603278
_cons | 68 3.04959 22.30 0.000 61.35551 74.64449
------------------------------------------------------------------------------

We can see that the ANOVA table from the regression gives identical results (with an F-value of
4.62) to that obtained by carrying out the one-way analysis of variance.

We defined drugb and drugc in such a way that our baseline level is drug A, i.e. if drugb=0 and
drugc=0 then the mouse received drug A. For drug A the mean value of lcount is 68, which is
the value of the intercept (_cons).
The coefficient of drugb measures the effect of drug B, which is just the difference in mean
lcount between drug B and drug A,
i.e. -8 = 60 − 68.
Similarly the coefficient of drugc measures the effect of the placebo C (relative to the baseline),
which is just the difference in mean lcount between placebo C and drug A,
i.e. -13 = 55 − 68.
From the t-tests in the regression output we can see that the difference between the placebo C
and drug A is statistically significant, but the difference between drug B and drug A is not
statistically significant.

Thus we can use regression models to fit Analysis of Variance models.
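
The same model can also be fitted without constructing the dummy variables by hand. Sketches of two equivalent commands (the factor-variable version requires Stata 11 or later) are:

. anova lcount drug

. regress lcount i.drug

Both reproduce the results above, with drug level 1 (drug A) taken as the baseline in the regression version.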
