BES Tutorial Sample Solutions, S1/13
WEEK 11 TUTORIAL EXERCISES (To be discussed in the week starting May 20)
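1. Use a calculator to compute the sample least squares regression line for
the model \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\), given the following six observations.

y    2    8    6   12    9   11
x    1    4    3   10   10    8

\[ \bar{x} = \frac{1 + 4 + 3 + 10 + 10 + 8}{6} = 6; \qquad \bar{y} = \frac{2 + 8 + 6 + 12 + 9 + 11}{6} = 8 \]

\[ \sum_{i=1}^{6} (x_i - 6)^2 = 74 \]

\[ \sum_{i=1}^{6} (x_i - 6)(y_i - 8) = 62 \]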
TN: Dividing these two sums by \(n - 1 = 5\) gives the sample variance of x
(from the sum of squares, 74) and the sample covariance between x and y
(from the sum of cross-products, 62).
	
Thus the sample regression line is

\[ \hat{y} = 2.9732 + 0.8378x, \]

where

\[ b_1 = \frac{62}{74} = 0.8378, \qquad b_0 = \bar{y} - b_1\bar{x} = 8 - 6(0.8378) = 2.9732. \]
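The hand calculation for Question 1 can be checked with a few lines of Python. This is only a sketch using the six observations given in the question; the variable names are illustrative. Note that carrying the slope at full precision gives an intercept of about 2.9730, whereas the hand calculation with the slope rounded to 0.8378 gives 2.9732.

```python
# Data from Question 1 (six observations).
x = [1, 4, 3, 10, 10, 8]
y = [2, 8, 6, 12, 9, 11]

n = len(x)
x_bar = sum(x) / n  # sample mean of x
y_bar = sum(y) / n  # sample mean of y

# Sum of cross-products and sum of squares about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx         # least squares slope
b0 = y_bar - b1 * x_bar  # least squares intercept

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"S_xy = {s_xy}, S_xx = {s_xx}")
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```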
2. Suppose the relationship between the dependent variable weekly
household consumption expenditure in dollars (y) and the independent
variable weekly household income in dollars (x) is represented by the
simple regression model (i refers to the ith observation or household):

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

Suppose a sample of observations yields least squares estimates of
\(b_0 = -32\) and \(b_1 = 0.82\).
(a) What does \(\varepsilon_i\) represent in the model?
It is the random disturbance term. It includes any purely random factors or
errors and factors that have been left out of the model but whose influence is
considered minor.
(b) State the basic (classical) assumptions made about the \(\varepsilon_i\)s in this
model. Explain in words what the assumptions mean.
(i) \(E(\varepsilon_i \mid x_i) = 0\) for all observations. The conditional mean of the
disturbance does not depend on x and is normalized to zero. Note that this
differs from Keller, who mentions only the normalization to zero. That the
conditional mean of the disturbances does not depend on x ensures
unbiasedness of the OLS estimator, and so is the much more important
component of this assumption. Relating back to the previous part of the
question, it implies that omitted factors that might affect expenditure but
appear in the disturbance are assumed to be uncorrelated with x.
(ii) \((x_i, y_i)\) are drawn by simple random sampling and hence are iid.
(iii) The standard deviation of \(\varepsilon_i\) is constant for all observations. It is
denoted by \(\sigma_\varepsilon\) and we say the disturbances are homoskedastic. Here that
implies the variability in consumption expenditure does not depend on
income, which is possibly problematic in practice.
(iv) The disturbances for any two observations are independent. This implies,
in particular, that there is no correlation between disturbances associated
with different observations. In this example the factors in the disturbance
for household i are not correlated with those for household j.
(v) \(\varepsilon_i\) is normally distributed for all observations.
(c) Does the estimate of \(b_0 = -32\) make sense? If not, does this
necessarily invalidate the model? Explain your answer.

This indicates that if a household had a zero weekly income then on average
such a household would have negative consumption, which does not make
sense. However, this does not necessarily invalidate the model. It may be that
the linear model is only a reasonable approximation for some range of
household incomes, not including incomes near zero. In particular, the
relationship may be nonlinear for values of x near zero. The conclusion is
that we should be careful in interpreting the intercept term, as it may not be
very meaningful in some cases.
	
(d) Interpret both \(\beta_1\) and \(b_1\). What does the model predict would be the
change in y following a $10 increase in x from some initial level?
\(\beta_1\) is the (unknown) population change in the value of y resulting from a one
unit increase in x, whereas \(b_1 = 0.82\) is an estimate of \(\beta_1\). In this particular
example this is the marginal propensity to consume that would be discussed in
economics courses. The predicted change in y following a $10 increase in x
would be \(10 \times b_1 = 10 \times 0.82\) = $8.20.
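The arithmetic in parts (c) and (d) can be sketched in Python. The intercept is taken to be negative here, which is an assumption consistent with the discussion of negative predicted consumption in part (c); the function name and the two starting income levels (300 and 800 dollars) are purely illustrative.

```python
# Least squares estimates from Question 2.
# Assumption: the intercept is -32, in line with the part (c) discussion
# of negative predicted consumption at zero income.
b0 = -32.0
b1 = 0.82

def predicted_consumption(income):
    """Predicted weekly consumption (dollars) at a given weekly income (dollars)."""
    return b0 + b1 * income

# Part (c): the prediction at zero income is just the intercept.
at_zero_income = predicted_consumption(0)

# Part (d): a $10 increase in income changes the prediction by 10 * b1,
# whatever the (hypothetical) starting level.
change_from_300 = predicted_consumption(310) - predicted_consumption(300)
change_from_800 = predicted_consumption(810) - predicted_consumption(800)

print(f"Prediction at zero income: {at_zero_income:.2f}")
print(f"Change from 300: {change_from_300:.2f}, change from 800: {change_from_800:.2f}")
```

Because the model is linear, the predicted change depends only on the size of the income increase and not on the initial level.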
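(e) Suppose we measured y and x in cents rather than dollars. What
effect would this have on the estimated coefficient of x? What effect
would it have on the estimated intercept?
In this case $x becomes 100x cents and $y becomes 100y cents. The estimated
coefficient of \(x_i\) when the variables are measured in dollars is given by

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}. \]

If we let \(b_1^*\) be the estimated slope coefficient when the variables are measured
in cents, we have

\[ b_1^* = \frac{\sum (100x_i - 100\bar{x})(100y_i - 100\bar{y})}{\sum (100x_i - 100\bar{x})^2} = \frac{100^2 \sum (x_i - \bar{x})(y_i - \bar{y})}{100^2 \sum (x_i - \bar{x})^2} = b_1. \]

Also, denote by \(b_0^*\) the estimated intercept in this case; then we have

\[ b_0^* = 100\bar{y} - b_1^*(100\bar{x}) = 100(\bar{y} - b_1\bar{x}) = 100b_0. \]

Thus estimation of this model (with the same, but rescaled, data) would lead
to an unchanged \(b_1\), whilst the intercept term would become \(100b_0 = -3200\).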
(f) Suppose y were measured in dollars but x were measured in cents.
What effects would this have on the estimated coefficient of x?

Denote the estimated slope and intercept in this case by \(b_1^{**}\) and \(b_0^{**}\). Then

\[ b_1^{**} = \frac{\sum (100x_i - 100\bar{x})(y_i - \bar{y})}{\sum (100x_i - 100\bar{x})^2} = \frac{100 \sum (x_i - \bar{x})(y_i - \bar{y})}{100^2 \sum (x_i - \bar{x})^2} = \frac{b_1}{100}, \]

\[ b_0^{**} = \bar{y} - b_1^{**}(100\bar{x}) = \bar{y} - \frac{b_1}{100}(100\bar{x}) = b_0. \]

Now estimation of this model would lead to the estimated coefficient of the
income variable being 0.0082, and the estimated intercept would be unchanged.
This makes sense since:
• If income is measured in dollars, we predict expenditure (in dollars) will
increase by $0.82 if household income increases by one dollar.
• If income is measured in cents, we predict expenditure (in dollars) will
increase by $0.0082 if household income increases by one cent.
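The rescaling results in parts (e) and (f) can be checked numerically. The household data are not given, so the sketch below uses the six observations from Question 1 as stand-in data; the helper `ols` is a hypothetical function written for this illustration only.

```python
def ols(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Stand-in data (Question 1), treated here as if measured in dollars.
x_dollars = [1, 4, 3, 10, 10, 8]
y_dollars = [2, 8, 6, 12, 9, 11]

# The same data expressed in cents.
x_cents = [100 * v for v in x_dollars]
y_cents = [100 * v for v in y_dollars]

b0, b1 = ols(x_dollars, y_dollars)           # both in dollars
b0_both, b1_both = ols(x_cents, y_cents)     # part (e): both in cents
b0_x_only, b1_x_only = ols(x_cents, y_dollars)  # part (f): only x in cents

print(f"dollars/dollars: b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"cents/cents:     b0 = {b0_both:.4f}, b1 = {b1_both:.4f}")
print(f"x in cents only: b0 = {b0_x_only:.4f}, b1 = {b1_x_only:.6f}")
```

Rescaling both variables by 100 leaves the slope unchanged and multiplies the intercept by 100, while rescaling only x divides the slope by 100 and leaves the intercept unchanged.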
	
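(g) Distinguish between \(\varepsilon_i\) and \(e_i\) (the residual associated with
observation i). Illustrate your answer with a diagram.

\[ \varepsilon_i = y_i - \beta_0 - \beta_1 x_i \]

\[ e_i = y_i - b_0 - b_1 x_i = y_i - \hat{y}_i \]

We can think of \(e_i\) as an estimate of the true random disturbance
associated with observation i, \(\varepsilon_i\).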
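As a rough illustration of the distinction, the residuals can be computed from data because they use the estimated line, whereas the disturbances cannot because the population intercept and slope are unknown. The sketch below again borrows the Question 1 observations as stand-in data; all names are illustrative.

```python
# Stand-in data (Question 1).
x = [1, 4, 3, 10, 10, 8]
y = [2, 8, 6, 12, 9, 11]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Fitted values lie on the estimated line; residuals are the vertical
# distances from each observation to that line.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Least squares forces the residuals to sum to zero and to be
# uncorrelated with x in the sample (up to floating-point error).
sum_resid = sum(residuals)
sum_x_resid = sum(xi * ei for xi, ei in zip(x, residuals))

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"x = {xi:2d}, y = {yi:2d}, fitted = {fi:7.4f}, residual = {ei:7.4f}")
```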
	
	
	 				 														 	
	
	
3. Computing Exercise #4
Refer to the computing program and answer Discussion Questions 4.1
and 4.2 associated with simple linear regression.
Q4.1 Discussion:

Based on the information you obtained, describe the relationship
between the returns on the individual stock (Intel) and the returns on
the overall market (S&P).
As indicated below in the line fit plot produced for the second question, there
is a positive correlation between the returns on Intel stock and the overall
market return. However, there is considerable variation around the
superimposed linear relationship.
	
	
Q4.2 Discussion:

i) What is the sample regression line?
From the Excel regression output below:

\[ \widehat{\text{INTEL}} = 0.022 + 1.472\,\text{INDEX} \]
	
ii) Is there sufficient evidence to infer at the 5% significance level that
there is a linear relationship between the return on Intel
Corporation stock and the return on the total market?
The appropriate hypotheses to be tested are:

\[ H_0: \beta_1 = 0; \qquad H_1: \beta_1 \neq 0 \]

which, according to the Excel output, yields a p-value of 0.0069, and so for any
significance level greater than 0.0069 (which includes 5%) we would reject
the null and conclude there is evidence to suggest a linear relationship.
	
iii) Is there sufficient evidence to infer at the 5% significance level that
Intel Corporation stock is more sensitive than the average stock?
Now the appropriate hypotheses to be tested are:

\[ H_0: \beta_1 = 1; \qquad H_1: \beta_1 > 1 \]

The standardized test statistic for this hypothesis is:

\[ t = \frac{b_1 - 1}{s_{b_1}} = \frac{1.47163 - 1}{0.52052} = 0.9061 \]

Using a t critical value with 40 degrees of freedom (the actual degrees of
freedom are n - 2 = 46, but this value is not in the tables) yields a rejection
region of t > 1.684. Alternatively, with a relatively large sample size we can
invoke the CLT and use the 5% normal critical value of 1.645.

In either case the calculated test statistic falls well short of the rejection
region and we cannot reject the null hypothesis.
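The one-sided test in part iii) can be reproduced with the Python standard library. This is a sketch only: the coefficient and standard error are copied from the Excel output, the t critical value is the 40-degrees-of-freedom table value quoted in the solution (hard-coded, since the standard library has no t distribution), and the normal critical value is computed rather than read from a table.

```python
from statistics import NormalDist

# Slope estimate and its standard error from the Excel output.
b1 = 1.47163
se_b1 = 0.52052
beta1_null = 1.0  # value of the slope under the null hypothesis

# Standardized test statistic for H0: beta1 = 1 against H1: beta1 > 1.
t_stat = (b1 - beta1_null) / se_b1

# Upper-tail 5% critical values.
t_crit_40df = 1.684                  # table value for 40 df, as quoted in the solution
z_crit = NormalDist().inv_cdf(0.95)  # standard normal, roughly 1.645

reject_using_t = t_stat > t_crit_40df
reject_using_z = t_stat > z_crit

print(f"t statistic = {t_stat:.4f}")
print(f"normal critical value = {z_crit:.3f}")
print(f"reject H0 (t table): {reject_using_t}, reject H0 (normal): {reject_using_z}")
```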
	
iv) Discuss the significance of the findings.
While there is evidence of a statistically significant positive relationship
between the returns, the evidence of whether the Intel stock is more or less
sensitive to the market is weak. The point estimate of 1.472 indicates evidence
in favour of being more sensitive, but we cannot exclude the possibility that it
is in fact less sensitive. The 95% CI provided by Excel is (0.424, 2.519) and
hence includes values consistent with both possibilities.
	
	
v) Explain the meaning of the regression and residual sum of squares.
The total sum of squares (0.4446), representing the total variation in the
dependent variable (returns on Intel stock), can be decomposed into two
parts: a regression sum of squares (0.0658), representing that part explained
by the regression model, and the residual sum of squares (0.3788), representing
that part left over and unexplained by the model. In this case the latter is large
relative to the former, leading to an \(R^2\) of 0.148, indicating that only 14.8% of
the variation in Intel stock returns is being explained by the market model.

This is consistent with our initial observation from the scatter plot that there
was considerable variation around the trendline. See also the line fit plot that
overlays the estimated market model on the bivariate scatter.
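The decomposition of the total sum of squares can be verified from the figures reported in the Excel ANOVA table. The sketch below assumes only those reported sums of squares and the sample size of 48; the variable names are illustrative.

```python
# Sums of squares and sample size as reported in the Excel output.
ss_regression = 0.065822161
ss_residual = 0.378800255
n = 48

df_regression = 1      # one explanatory variable
df_residual = n - 2    # simple regression: n minus two estimated coefficients

# Total variation is the sum of the explained and unexplained parts.
ss_total = ss_regression + ss_residual

# Share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Mean squares, F statistic, and the standard error of the regression.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
std_error = ms_residual ** 0.5

print(f"SS total = {ss_total:.4f}")
print(f"R squared = {r_squared:.4f}")
print(f"F = {f_stat:.4f}, standard error = {std_error:.4f}")
```

In a simple regression the F statistic equals the square of the slope's t statistic, which is why the reported Significance F matches the p-value for INDEX.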
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.3848
R Square             0.1480
Adjusted R Square    0.1295
Standard Error       0.0907
Observations         48

ANOVA
             df   SS            MS         F          Significance F
Regression    1   0.065822161   0.065822   7.993182   0.0069287
Residual     46   0.378800255   0.008235
Total        47   0.444622416

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   0.02192        0.01508          1.45365   0.15283   -0.00843    0.05228     -0.00843      0.05228
INDEX       1.47163        0.52052          2.82722   0.00693    0.42387    2.51938      0.42387      2.51938
[Figure: INDEX Line Fit Plot. Vertical axis: INTEL; horizontal axis: INDEX; series: INTEL and Predicted INTEL.]