Introduction To Statistics
Chapter-One
Basic concepts, Method of data collection and presentation
1. Introduction
The word statistics comes from the Italian word “statista”, which means “statesman”. Although people had been
recording and using data for various purposes in the distant past, the term was first used in the early 18th
century in many parts of Europe to signify the application of recorded data to the political purposes of the
state. Nowadays statistical information plays a pivotal role in a wide range of fields, many of which influence and
affect our day-to-day activities. Statistics is used in almost all fields of human endeavor. In the recent past
statistics has become part of the natural sciences, social sciences, research, business, management, planning,
economics, industry, behavioral sciences, agriculture, engineering and many other experimental sciences.
2. Definition of Statistics
Before going further, we should introduce the two meanings of the word statistics:
Statistics in its plural sense
In this sense, statistics refers to a collection of numerical facts, figures or statistical data. This meaning of
the word is widely used when reference is made to facts and figures on employment or unemployment, rates of
traffic accidents, deaths, births, student enrollment at a university, etc.
Statistics in its singular sense
In this case statistics has its modern meaning and refers to the subject area that is concerned with the extraction
of relevant information from available data with the aim of making sound decisions. In this sense, it is a branch
of mathematics or applied research which is concerned with the development and application of methods and
techniques for collecting, organizing, presenting, analyzing, and interpreting quantitative data in such a way that
the reliability of conclusions based on the data may be evaluated objectively in terms of probability statements.
Classification of statistics
Statisticians commonly classify statistical techniques into two broad categories:
• Descriptive
• Inferential
Descriptive statistics: - this part of statistics deals only with describing some characteristics of the data
without going beyond the data. It encompasses any kind of data processing technique which is designed to
summarize or describe important features of the data. This comprises the first three stages of statistical
investigation, namely: collection, organization and presentation of data.
E.g. 20% of the students in my class are married
Inferential statistics: - this is concerned with drawing statistically valid conclusions about the
characteristics of the population based on information from a sample. It is part of statistics which is
concerned with generalizing from sample to population using probability, performing hypothesis testing,
determining relationship between variables and making prediction.
E.g. at least 5% of all killings during last year in city Y were due to terrorists
Stages in statistical investigation
Data collection: - this is the first stage of a statistical investigation. Gathering information is the basic
purpose of the investigation. If data are needed and are not readily available, they have to be collected. Data
may be collected by the investigator directly, using methods like interviews, questionnaires and observation, or
may be available from published or unpublished sources.
Data organization: - it is a stage where we edit our data. The collected data might involve irrelevant
figures, incorrect facts, omissions and mistakes. Errors that may have been included during data collection
will have to be edited. After editing, we may classify (arrange) data according to their common
characteristics, which is called organizing.
Data presentation: - this stage is presenting the organized data in the form of tables and diagrams.
Graphs may also be used to give the data a valid meaning and make the presentation attractive.
Data analysis: - this is the stage where we critically study the data in order to draw conclusions about them.
Analysis often involves sophisticated mathematical techniques.
Data interpretation: - this stage means drawing conclusions from the data, which form the basis for
decision making. This is the stage where we draw valid conclusions from the results obtained through data
analysis.
Definitions of some statistical terms
➢ Population: - is the complete collection of individuals, objects or measurements that have a
characteristic in common.
A population may be finite or infinite. If the population consists of a fixed number of values, it is
said to be finite. On the other hand, if a population consists of an endless succession of
values, then it is an infinite one. A population may also be quantitative or qualitative.
➢ Sample: - when a population is infinite, it is impossible to obtain all possible observations, for they are
infinitely many. If the nature of the study is destructive, we also cannot obtain information from each and
every member of the population. Therefore, we are forced to deal with a representative part of the
population in analyzing the data. Such a representative part of the population is called a sample.
➢ Census: - is the process of collecting data covering all the units in the population.
➢ Parameters: - are numerical characteristics of the population defined for each variable of interest; that is,
a parameter is a statistical measure obtained from a population.
➢ Statistic: - is a measure obtained from sample data, used to make statements about an unknown
parameter.
➢ Frame: - is a list of the elements covering the survey population; it serves as the base for sample selection.
➢ Data: - is a set of related observations from which conclusions may be drawn.
➢ Variable: - a characteristic or attribute associated with each unit in the population that can assume
different values.
➢ Elementary unit: - is an element or group of elements on which information is required.
➢ Sampling unit: - for the purpose of sample selection, the population is divided into a finite number of
distinct, non-overlapping and identifiable units, called sampling units.
Application and limitation of statistics
Application of statistics
Statistics is used in almost all fields of human activity and is used by governmental bodies, private business
firms and research agencies as an indispensable tool. In particular, it is used in the following areas.
➢ Design and analysis of experiments for testing new ideas and competing hypotheses.
➢ Short-term and long-term rational planning, decision making and control.
➢ To assess past trends and current status and to forecast future economic activities for a firm, an
industry or the economy as a whole.
➢ Determination of manpower requirements, personnel selection, market research, financial analysis,
distribution analysis and development.
➢ In public administration and in the social science like in the studies of poverty, population, voting
pattern, accidents etc.
➢ In communicating information, drawing conclusions and inference from data and guiding planning
and decision.
Limitation of statistics
Even though statistics is growing in popularity and is being successfully employed by the seekers of truth
in numerous fields of learning, it still has limitations. Some of them are: -
➢ It deals directly only with quantitative characteristics.
➢ It doesn’t deal with individual measurements; rather, it studies aggregates of facts.
➢ Its results are true only in general and on the average.
➢ Ignorant or wrongly motivated persons can misuse statistics.
In summary, statistics is a highly developed science with deep rooted mathematical base. It is
applicable to a large number of economic, social and business phenomena. It is a backbone of
industrial research, basic science research and planning.
Scale of measurement
➢ Nominal data: - as the name implies, it consists of “naming” observations, or classifying them into
various mutually exclusive and collectively exhaustive categories.
It indicates only that there is a qualitative difference among categories.
E.g. sex of an individual
The region numbers of Ethiopia
➢ Ordinal data: - they are nominal data whose categories also have a meaningful order. Measurements on
an ordinal scale are ordered in the sense that higher numbers represent higher values.
Inequalities between values are meaningful, but differences between values are not.
E.g. military ranks, comparing a 3 star general and a 4 star general
➢ Interval data: - they are ordinal data in which the differences between values have meaning.
There is no true zero; the zero point is arbitrary.
The ratios of different values are meaningless.
E.g. the temperature of town X is 30°C on Monday
➢ Ratio data: - they are interval data which also have a true zero, which shows the absence of
something. This makes it possible to state relations in terms of proportions or ratios.
E.g. income of a person
CHAPTER TWO
❖ An untrained interviewer may distort the meaning of the questions.
❖ Attributes of the interviewer may affect the responses given, due to:
a) bias of the interviewer and b) his/her social or ethnic characteristics.
❖ It costs much in terms of time and money (training and salaries of interviewers).
In many instances interviewers go from house to house in order to locate the respondents.
Employing a bilingual interviewer can also increase cost.
4. Telephone Interviews
Advantages
It is less expensive in time and money compared with face to face interviews. The interviewer is able to
help the respondent if he/she doesn’t understand the question (as seen with face to face interview)
- Broad representative samples can be obtained for those who have telephone lines.
- May assure the uniformity of interviewer.
Disadvantage
❖ Under representation of those groups which do not have telephones.
❖ Problem with unlisted telephone number in the directory.
❖ Respondent may be substituted by another.
❖ Depending on the time of day the phone calls are made, different types of persons are reached
which will create bias in the sample.
❖ Problem with questions with multiple options for answers and complicated questions.
❖ Repeated calls may be necessary.
5. Self administered questionnaire returned by mail (mailed questionnaire)
Here the questionnaire is mailed to respondents to be filled. Sometimes it is known as self enumeration.
Advantages
- These are the cheapest. There is no need for trained interviewer. There is no interviewer bias.
- Mailed questionnaire can be coordinated from one central location.
Disadvantage
- Low response rate
- Uncompleted questionnaires due to omission or invalid responses.
- No assurance that the questionnaire was answered by the right person
- Needs intense follow up to get a high response rate.
III. The use of documentary sources
Extracting information from existing sources (e.g. Hospital records) is much less expensive than the other
two methods. It can be an important source of data.
Limitation: It is difficult to get the information needed when records are compiled in a non-standardized manner.
1.2.2. Source and Types of Data
Data may be obtained from two sources, primary and secondary.
1. Primary sources: sources that can supply first hand information for immediate use.
Primary data: data originally collected for the purpose at hand.
Example: observe signs, measure characteristics, record symptoms and interview respondent, etc.
2. Secondary sources: sources from which data are obtained from records that have been
collected by persons other than the investigator, for another purpose.
Example: Hospital records, vital statistics and registers, etc.
Secondary data: the data obtained from secondary sources.
1.2.3. Method of Data Presentation
The data collected in a survey or other empirical inquiry are called raw data. Such unorganized data are not in a
form that can be easily assimilated. It is therefore necessary to reduce the data and present them with their
relevant features.
Tables
Tables involve the systematic arrangement of statistical data in columns and rows. Important features are:-
a. Tables should be simple and self explanatory.
b. Each row or column should be labeled concisely and clearly giving units of measurement for all
quantitative data.
c. The title should describe the content of the table and the scale should be understood without reference to
the text. A good title will answer the question: what? When? And where?
d. Percentages should add up to 100%.
e. Any necessary explanatory footnotes should be included at the bottom of the table.
1.2.3.1.Frequency distribution and tables
Frequency: - is the number of counts assigned to individuals having a particular characteristic.
Frequency distribution: the set of frequencies of all possibilities is called frequency distribution of the
variable.
Based on the type of data, we can have two types of frequency distribution tables.
a) Qualitative frequency tables (categorical frequency distribution)
Table 1 Data on smoking status by gender of a sample of health workers, Jimma Hospital 1986 E.C.
a. Information on each of the characteristics (gender and smoking status) is displayed for each health worker.
Observation      1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
Gender           M  F  M  M  F  F  F  M  M  M  F  F  F  F  M  F  M  F  M  M
Smoking status   Y  N  N  Y  N  N  Y  N  N  N  Y  N  N  Y  Y  Y  N  N  Y  N
b. Summary of what is presented in (a)
Characteristic     Tally            Frequency
Gender
  Male             //// ////        10
  Female           //// ////        10
Smoking status
  No               //// //// //     12
  Yes              //// ///         8
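The tallying in Table 1 can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is hypothetical, and the two lists are typed in from the observation rows of the table.

```python
from collections import Counter

# Raw observations copied from Table 1 (20 health workers).
gender = "M F M M F F F M M M F F F F M F M F M M".split()
smoking = "Y N N Y N N Y N N N Y N N Y Y Y N N Y N".split()


def frequency_table(observations):
    """Return {category: count} for a list of categorical observations."""
    return dict(Counter(observations))


gender_freq = frequency_table(gender)
smoking_freq = frequency_table(smoking)

# Print one small frequency table per characteristic.
for name, table in (("Gender", gender_freq), ("Smoking status", smoking_freq)):
    print(name)
    for category, count in sorted(table.items()):
        print(f"  {category}: {count}")
```

Counting with `Counter` mirrors the tally column: each observation adds one stroke to its category.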
Class limit      23-26  27-30  31-34  35-38  39-42
Frequency (fi)   3      4      3      5      5
Note: this is only an example.
The data are grouped into a set of non-overlapping intervals.
Class limits (CL): these are extreme values for each class. They are called lower and upper class limits and
are used for discrete values.
For our example:
Lower class limits (LCL): are 23, 27, 31, 35, and 39
Upper class limits (UCL): are 26, 30, 34, 38, and 42
Note: I. usually class intervals are ordered from smallest to largest.
II. The lower limit of the first class interval should be equal to or smaller than the smallest
measurement in the data.
III. The upper limit of the last class interval should be equal to or greater than the largest measurement.
Class Boundaries (CB)
With continuous data, values such as 26.5 will not fit any of the classes given above. It is therefore necessary
to set exact limits or true limits, which are known as class boundaries.
Exact limits refer to values of continuous measurement.
a) Lower class boundary (LCB): given a class limit, the LCB is obtained by subtracting half the unit of
measurement from the LCL of the class.
The unit of measurement is the gap between the UCL of the class and the LCL of the next higher class.
Thus $LCB_i = LCL_i - \frac{LCL_{i+1} - UCL_i}{2}$
b) Upper class boundary (UCB): the UCB is the average of the upper class limit of the class and the lower class
limit of the next higher class,
i.e. the UCB is obtained by adding half the unit of measurement to the UCL of the class.
Thus
$UCB_i = UCL_i + \frac{LCL_{i+1} - UCL_i}{2} = \frac{UCL_i + LCL_{i+1}}{2}$
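Note: $UCB_i = LCB_{i+1}$
Proof:
Consider:
$LCB_{i+1} = LCL_{i+1} - \frac{LCL_{i+2} - UCL_{i+1}}{2}$
But $UCL_{i+1} - UCL_i = LCL_{i+2} - LCL_{i+1} = w$
$\Rightarrow UCL_{i+1} = LCL_{i+2} + UCL_i - LCL_{i+1}$, so that $LCL_{i+2} - UCL_{i+1} = LCL_{i+1} - UCL_i$
Substituting this into the formula, we have
$LCB_{i+1} = LCL_{i+1} - \frac{LCL_{i+1} - UCL_i}{2} = \frac{LCL_{i+1} + UCL_i}{2} = UCB_i$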
Examples:
Convert the following class limits into class boundaries
a) 5-9        b) 44.5-49.4      c) 78.25-80.24
   10-14         49.5-54.4         80.25-82.24
   15-19         54.5-59.4         82.25-84.24
For (a): $LCB_1 = LCL_1 - (LCL_2 - UCL_1)/2 = 5 - (10 - 9)/2 = 4.5$
and $UCB_1 = (UCL_1 + LCL_2)/2 = (9 + 10)/2 = 9.5$
The class boundaries are:
a) 4.5-9.5    b) 44.45-49.45    c) 78.245-80.245
   9.5-14.5      49.45-54.45       80.245-82.245
   14.5-19.5     54.45-59.45       82.245-84.245
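The conversion from class limits to class boundaries can be sketched as follows. The function name `class_boundaries` is an assumption made for illustration, and the sketch assumes at least two classes, sorted in increasing order, with the same gap between every upper limit and the next lower limit.

```python
def class_boundaries(limits):
    """Convert [(LCL, UCL), ...] into [(LCB, UCB), ...].

    Assumes at least two classes in increasing order and a constant gap
    between each UCL and the LCL of the next class.
    """
    # Half the unit of measurement: (LCL of class 2 - UCL of class 1) / 2.
    half_gap = (limits[1][0] - limits[0][1]) / 2
    return [(lcl - half_gap, ucl + half_gap) for lcl, ucl in limits]


example_a = [(5, 9), (10, 14), (15, 19)]
for lcb, ucb in class_boundaries(example_a):
    print(f"{lcb}-{ucb}")
```

For limits with decimals, such as example (b), the results are floating-point numbers and may differ from the hand-computed values in the last decimal places.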
Class marks (mi): are the midpoints of the classes.
$m_i = (LCL_i + UCL_i)/2$ or $(LCB_i + UCB_i)/2$
e.g. (5+9)/2 = 7 or (4.5+9.5)/2 = 7
Note: $m_{i+1} = m_i + w$, where w is the class width
$m_2 = 7 + 5 = 12$; also (10+14)/2 = 12
Advantage of grouping
• Provides information about the range of the data.
• Gives an impression about the values that are frequent and infrequent.
• It provides data that can be easily used for graphical representation.
Disadvantage of grouping
• Information may be lost, since individual values are not displayed.
• Something that can be determined from the original data cannot be determined from grouped data.
Modified frequency distribution
I) The cumulative frequency distribution: is used when one is interested to know how often the
measurements fall below or above a certain level.
Less than cumulative frequency (LCF): the LCF of a value of a variable is the number of
individuals with values less than or equal to that value.
More than cumulative frequency (MCF): the MCF of a value of a variable is the number of cases
with values greater than or equal to that value.
Example:
Class limit   Frequency   Less than     LCF   More than     MCF
23-26         3           22.5 (<23)    0     22.5 (>22)    20
27-30         4           26.5 (<27)    3     26.5 (>26)    17
31-34         3           30.5 (<31)    7     30.5 (>30)    13
35-38         5           34.5 (<35)    10    34.5 (>34)    10
39-42         5           38.5 (<39)    15    38.5 (>38)    5
                          42.5 (<43)    20    42.5 (>42)    0
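The LCF and MCF columns above are running totals, which a short sketch makes explicit. The variable names are illustrative only; the frequencies and boundaries are those of the example table.

```python
from itertools import accumulate

frequencies = [3, 4, 3, 5, 5]                      # classes 23-26, ..., 39-42
boundaries = [22.5, 26.5, 30.5, 34.5, 38.5, 42.5]  # class boundaries

# "Less than" cumulative frequency at each boundary: running total from the left.
lcf = [0] + list(accumulate(frequencies))

# "More than" cumulative frequency at each boundary: what remains to the right.
total = sum(frequencies)
mcf = [total - below for below in lcf]

for b, less, more in zip(boundaries, lcf, mcf):
    print(f"{b:5}  LCF={less:2}  MCF={more:2}")
```

At every boundary the two values add up to the total number of observations, which is a quick check on a hand-built table.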
Relative frequency distribution: the proportion of individuals in each class, expressed as a percentage of the total.
$\text{Rel. freq.} = \frac{\text{frequency of each class}}{\text{total frequency}} \times 100$
$\Rightarrow R.f_i = \frac{f_i}{n} \times 100$
Example
Class limit   Freq.   Relative freq. (%)   Cumulative R.freq.
23-26         3       3/20x100=15          15
27-30         4       4/20x100=20          35
31-34         3       3/20x100=15          50
35-38         5       5/20x100=25          75
39-42         5       5/20x100=25          100
Total         20      100
Note:- about 75% of the women are in the age group 23-38 years.
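As a rough check on the table, the relative and cumulative relative frequencies can be recomputed from the class frequencies. The variable names below are illustrative.

```python
from itertools import accumulate

frequencies = [3, 4, 3, 5, 5]      # classes 23-26, 27-30, 31-34, 35-38, 39-42
n = sum(frequencies)               # total frequency

# Relative frequency of each class as a percentage of the total.
relative = [100 * f / n for f in frequencies]

# Cumulative relative frequency: running total of the percentages.
cumulative_relative = list(accumulate(relative))

for f, r, c in zip(frequencies, relative, cumulative_relative):
    print(f"{f:2}  {r:5.1f}%  {c:5.1f}%")
```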
1.2.3.2. DIAGRAMATIC PRESENTATION OF DATA
The essential advantages of these presentations lie in the fact that they facilitate comparisons.
Bar chart
There are three types of bar charts
I) Simple
II) Component
III) Multiple
I) Simple Bar chart: the bars may be vertical or horizontal, with their height or width representing the size
of the data. It helps to make simple comparisons between data.
- The bars do not overlap.
- The spaces between the bars must be equal and narrow.
- It shows changes in the totals of different categories.
Example Construct a simple bar diagram for the following table showing annual cases of HIV reported in
Ethiopia as of July 31, 1993.
Year of report 1986 1987 1988 1989 1990 1991 1992 1993
Cases 2 17 87 190 448 885 3256 2814
Percentage component Bar chart
Similar to the actual component bar chart, except that the components are expressed as percentages of the total.
- All bars are equal in height.
- Mostly used to compare relative variation between data.
Example: draw a percentage component bar chart for the vaccination data previously described.
Soln
          Male                      Female
DPT       250/550 x 100 = 45.5%     300/550 x 100 = 54.5%
POLIO     300/620 x 100 = 48.4%     320/620 x 100 = 51.6%
BCG       200/410 x 100 = 48.8%     210/410 x 100 = 51.2%
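The percentages that make up each bar can be recomputed with a small helper. This is a sketch: `percentage_components` is a hypothetical name, and the counts are the male and female figures that appear in the solution above.

```python
# (male, female) counts for each vaccine, taken from the solution above.
vaccinations = {"DPT": (250, 300), "POLIO": (300, 320), "BCG": (200, 210)}


def percentage_components(counts):
    """Express each component as a percentage of the bar total, to 1 decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


percentages = {name: percentage_components(c) for name, c in vaccinations.items()}
for name, (male, female) in percentages.items():
    print(f"{name}: male {male}%, female {female}%")
```

Because every bar is rescaled to 100%, only the split between components can be compared across bars, not the totals.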
III. Multiple Bar chart
These are used when two or more inter-related data are to be compared. Height of bars shows actual values of
each component. It is used to present more than one variable.
Example: draw a multiple bar chart for the vaccination data.
PIE CHART
Pie charts are used to show the partitioning of a total into its component parts using a circle. The circle should
be divided into sectors proportional in size to the frequencies of the categories they represent.
Steps in drawing a pie chart
1. Convert the frequency distribution into a percentage frequency distribution.
2. Draw a circle of any radius and note that the whole circle corresponds to an angle of 360°.
3. Convert the percentages into degree measures. Since the whole circle (360°) represents 100% of the
observations, 3.6° will represent 1%.
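The conversion in steps 1 to 3 can be sketched as below. The function name `pie_angles` is an assumption, and the sample input reuses the smoking-status counts (12 "No", 8 "Yes") from the health-worker data purely for illustration.

```python
def pie_angles(frequencies):
    """Map {category: frequency} to {category: central angle in degrees}.

    Equivalent to converting to percentages and multiplying by 3.6.
    """
    total = sum(frequencies.values())
    return {cat: 360 * f / total for cat, f in frequencies.items()}


# Sample input: smoking status of the 20 health workers.
angles = pie_angles({"No": 12, "Yes": 8})
for category, angle in angles.items():
    print(f"{category}: {angle} degrees")
```

The central angles always sum to 360°, so a sector table can be checked by adding up its last column.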
Example
Draw the pie chart for the following table. First construct a table providing the central angles.
Histogram
A histogram presents grouped frequency distribution of a continuous type. The real limits of the class make up
the horizontal axis, while the vertical axis has as its scale the frequency of occurrence. Comparison can be made
using the height or areas of the bar.
Method of constructing a histogram
I) Obtain a frequency distribution with class boundaries and class midpoints.
II) Construct bars on the horizontal axis with center at the class midpoint and width equal to the class
width.
III) The height of each bar should correspond to the respective class frequency.
Example: consider the following grouped age data
S.N   Class limit   Class boundaries   Mid point   Frequency
1.    15-19         14.5-19.5          17          2
2.    20-24         19.5-24.5          22          8
3.    25-29         24.5-29.5          27          6
4.    30-34         29.5-34.5          32          12
5.    35-39         34.5-39.5          37          7
6.    40-44         39.5-44.5          42          6
7.    45-49         44.5-49.5          47          4
8.    50-54         49.5-54.5          52          3
9.    55-59         54.5-59.5          57          1
10.   60-64         59.5-64.5          62          1
Note: each cell contains a certain proportion of the total area, depending on the frequency.
For example, the fourth cell contains 12/50 of the area, which is the relative frequency of occurrence of
values between 29.5 and 34.5.
[Figure: Histogram of the grouped age data]
Frequency polygon:- is a multi-sided figure where the frequency is plotted against the class midpoint. The
steps are:
I) Construct a histogram
II) Mark the midpoint on the top of each bar
III) Join these marks with straight lines
IV) Extend these lines at both ends so that they reach the horizontal axis at the midpoints of the adjacent
(empty) classes. This allows the total area to be enclosed.
Frequency distributions of age
Class limit 15-19 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64
Mid point 17 22 27 32 37 42 47 52 57 62
Frequency 2 8 6 12 7 6 4 3 1 1
Note: the total area under the frequency polygon is equal to the area under the histogram.
Ogives or cumulative frequency curve
Points are plotted in association with the exact values on the horizontal axis and the cumulative frequency
values on the vertical axis. Then connect the points with straight lines.
-the curves obtained are called the “less than” and “more than” curves.
Note: Cumulative frequencies are plotted at the class boundaries. Frequency polygons are plotted at class
marks. The sum of the frequencies of two or more classes is cumulative frequency.
Consider the age data
Class limit   Frequency   Less than     LCF   More than     MCF
<23           0           22.5 (<23)    0     22.5 (>22)    20
23-26         3           26.5 (<27)    3     26.5 (>26)    17
27-30         4           30.5 (<31)    7     30.5 (>30)    13
31-34         3           34.5 (<35)    10    34.5 (>34)    10
35-38         5           38.5 (<39)    15    38.5 (>38)    5
39-42         5           42.5 (<43)    20    42.5 (>42)    0
CHAPTER 3
SUMMARIZING DATA
MEASURE OF CENTRAL TENDENCY
INTRODUCTION
In this unit we shall discuss measures of central tendency. These are also known as measures of central location or
central values. The most important objective of statistical analysis is to determine a single value for the entire
mass of data, which describes the overall level of the group of observations and can be called a representative of
the whole set of data. It tells us where the center of the distribution of data is located on the scale that we are
using. There are several such measures, but here we shall discuss the most commonly used measures of central
tendency: the mean, median and mode.
Section one is concerned with four different types of means, namely the arithmetic mean, weighted mean,
geometric mean and harmonic mean.
Section two deals with the mode and median. We hope that you are familiar with the arithmetic mean, median and
mode for ungrouped data.
Do you know how to compute the mean, median and mode for grouped data? This is one of the main concerns of
this unit. There are also other measures of location (sometimes called measures of non-central location),
such as quartiles, deciles and percentiles. We shall discuss these in section three.
Section one: means
Among the types of means, we discuss four, each of which is suitable for a particular type of data. These are the
arithmetic mean, weighted mean, geometric mean and harmonic mean.
We shall discuss each of these one by one in this section. In chapter I we saw that the meaning of the
word “population” in statistics is quite different from that of our everyday language. What about the word
average? The word average occurs frequently in our everyday usage. We usually say “average family size”,
“average family income”, or “the average marks of students”, etc. In this sense, obviously, it refers to the
arithmetic mean. However, in statistics “average” in general means any measure of central location.
Finally, we make the following remark, which is going to be used throughout this course: if “mean” is mentioned,
it implies the arithmetic mean, while the other means are identified by their names.
Properties of summation
The symbol Σ is capital sigma, the Greek letter used for summation.
1. In our notation $\sum_{i=1}^{n} X_i$ stands for the sum of the X’s (i.e. $X_1 + X_2 + \dots + X_n$).
2. $\sum_{i=1}^{n} (Y_i + X_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$
3. $\sum_{i=1}^{n} Y_i X_i = X_1 Y_1 + X_2 Y_2 + \dots + X_n Y_n$
4. $\sum_{i=1}^{n} C = nC$
5. $1 + 2 + 3 + 4 + \dots + n = \frac{n(n+1)}{2}$
6. $\sum_{i=1}^{n} C X_i = C \sum_{i=1}^{n} X_i$
7. $\sum_{i=1}^{n} (X_i \pm C) = \sum_{i=1}^{n} X_i \pm nC$
8. $1^2 + 2^2 + 3^2 + 4^2 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6}$
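The summation properties listed above can be spot-checked numerically. The following is a minimal sketch; the function name `check_summation_properties` and the sample lists are made up for illustration, and a numerical check on one data set is of course not a proof.

```python
def check_summation_properties(X, Y, C):
    """Return True if the listed summation identities hold for the given data."""
    n = len(X)
    checks = [
        # Sum of a sum equals the sum of the sums.
        sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y),
        # A constant factor can be taken outside the summation.
        sum(C * x for x in X) == C * sum(X),
        # Summing a constant n times gives nC.
        sum(C for _ in range(n)) == n * C,
        # Adding or subtracting a constant to every term shifts the sum by nC.
        sum(x + C for x in X) == sum(X) + n * C,
        sum(x - C for x in X) == sum(X) - n * C,
        # Sum of the first n integers and of their squares.
        sum(range(1, n + 1)) == n * (n + 1) // 2,
        sum(i * i for i in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6,
    ]
    return all(checks)


X = [3, 5, 4, 2, 7, 6]   # sample values (integers, so comparisons are exact)
Y = [1, 0, 2, 5, 3, 4]
print(check_summation_properties(X, Y, C=10))
```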
Desirable properties of good measure of central tendency
1. It should be easy to calculate and understand.
2. It should be rigidly defined. It should have one and only one interpretation so that the personal prejudice or bias
of the investigator does not affect the value or its usefulness.
3. It should be representative of the data. If it is calculated from a sample, then the sample should be random
enough to represent the population accurately.
4. It should have sampling stability. It should not be much affected by sampling fluctuations. This means that if we
pick 10 different groups of college students at random and we compute the average of each group, then we
should expect to get approximately the same value from these groups.
5. It should not be affected much by extreme values. If a few very small or very large values are present in the
data, they will unduly influence the value of the average by shifting it to one side or the other, and hence the
average would not be really typical of the entire series. Hence, the average chosen should be such that it is not
unduly influenced by extreme values.
1.1 Arithmetic mean and its properties
The arithmetic mean of a sample is the sum of all the observations divided by the number of observations in the
sample,
i.e. Sample mean or arithmetic mean = (the sum of all values in the sample) / (number of values in the sample)
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$     1.1
Where $\bar{X}$ stands for the sample mean or arithmetic mean and is read as “X bar”.
Σ is the Greek capital letter sigma and indicates the operation of addition, so $\sum_{i=1}^{n} X_i$ stands for
the sum of all the Xi’s.
If we take an entire population, the population mean, denoted by µ, is given by
$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$
Activities 1.1 Find the arithmetic mean for the data given below.
Values(xi) 3 5 4 2 7 6
Frequency(fi) 2 1 3 2 1 1
Please try to answer before reading further!
Suppose the data are given in the form of a discrete frequency distribution with frequencies f1, f2, f3…fk associated
with the values of the variable x1, x2, x3…xk respectively.
As there are f1 items with value x1, f2 items with value x2, etc., the sum of all values equals
f1x1+f2x2+f3x3+…+fkxk and the total number of items is f1+f2+f3+…+fk = $\sum_{i=1}^{k} f_i$ = n
Thus by formula 1.1 the arithmetic mean is given by
$\bar{X} = \frac{f_1x_1 + f_2x_2 + f_3x_3 + \dots + f_kx_k}{f_1 + f_2 + f_3 + \dots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$     1.2
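Formula 1.2 applied to the data of Activity 1.1 can be sketched as follows; the function name `frequency_mean` is hypothetical.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a discrete frequency distribution (formula 1.2)."""
    total_of_values = sum(f * x for x, f in zip(values, freqs))  # sum of f_i * x_i
    number_of_items = sum(freqs)                                 # n = sum of f_i
    return total_of_values / number_of_items


values = [3, 5, 4, 2, 7, 6]   # x_i from Activity 1.1
freqs = [2, 1, 3, 2, 1, 1]    # f_i from Activity 1.1
mean = frequency_mean(values, freqs)
print(mean)  # 40 / 10 = 4.0
```

The same result is obtained by writing out all ten observations and applying formula 1.1 directly.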
Combined mean
If we have arithmetic means $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k$ of k groups having the same unit of measurement
of a variable, based on $n_1, n_2, \dots, n_k$ observations respectively, we can compute the combined mean of the
values of the groups taken together from the individual means by the formula
$\bar{X}_{com} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \dots + n_k\bar{X}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i}$     1.3
The advantage of formula 1.3 is that we don’t have to redo the entire calculation for the mean of the combined
set of observations if the mean of each group is known.
Exercise The mean weight of 150 students in a certain class is 60 kg. The mean weight of boys in the class is 70 kg
and that of girls is 55 kg. Find the number of boys and girls in the class.
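One possible way to work this exercise is to write formula 1.3 for two groups and solve for the unknown group sizes. The sketch below does this; both function names are assumptions introduced for illustration.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups (formula 1.3)."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)


def split_two_groups(total_n, overall_mean, mean_a, mean_b):
    """Solve n_a*mean_a + n_b*mean_b = total_n*overall_mean with n_a + n_b = total_n."""
    n_a = total_n * (overall_mean - mean_b) / (mean_a - mean_b)
    return n_a, total_n - n_a


# 150 students, overall mean 60 kg, boys' mean 70 kg, girls' mean 55 kg.
boys, girls = split_two_groups(150, 60, 70, 55)
print(boys, girls)

# Check: plugging the group sizes back into formula 1.3 recovers the overall mean.
print(combined_mean([70, 55], [boys, girls]))
```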
Do you recall how to calculate a class mark? If not please review chapter 2.
Exercise The net income of a sample of large importers of antiques was organized into the following table.
Net income 2-4 5-7 8-10 11-13 14-16
Number of importers 1 4 10 3 2
So $\sum_{i=1}^{4} (x_i - \bar{x})^2 = \sum_{i=1}^{4} (x_i - 7)^2 = 26$
If the deviations from some other value are squared, their sum will be larger than 26. For instance, take the
value 5 and consider the following table.
Values (xi)    5           9          4            10          sum
xi - c         5-5=0       9-5=4      4-5=-1       10-5=5      8
(xi - c)^2     0^2=0       4^2=16     (-1)^2=1     5^2=25      42
So $\sum_{i=1}^{4} (x_i - c)^2 = \sum_{i=1}^{4} (x_i - 5)^2 = 42 > 26$
Note: the importance of this property will be seen when we discuss standard deviation in unit 4.
3. Easy to calculate and understand
4. The mean is sensitive to extreme values.
e.g. 5, 9, 13, 12, and 16 has the mean 11 but if we have 100 instead of 5 the mean will be 30.
5. Uniqueness: the mean of any set of data is unique.
6. It can be used for further treatment.
- Comparison of means.
- Test on means.
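Two of the properties above can be checked numerically: the sum of squared deviations is smallest when taken about the mean, and the mean is sensitive to extreme values. The sketch below uses the values from the examples; the function name and the grid of candidate values are assumptions made for illustration.

```python
def sum_squared_deviations(values, c):
    """Sum of squared deviations of the values from a point c."""
    return sum((x - c) ** 2 for x in values)


values = [5, 9, 4, 10]
mean = sum(values) / len(values)                   # 7.0
at_mean = sum_squared_deviations(values, mean)     # 26.0
at_five = sum_squared_deviations(values, 5)        # 42

# Search a grid of candidate points 0.0, 0.1, ..., 15.0 for the smallest sum.
candidates = [c / 10 for c in range(0, 151)]
best = min(candidates, key=lambda c: sum_squared_deviations(values, c))
print(mean, at_mean, at_five, best)

# Sensitivity to extreme values: replacing 5 by 100 moves the mean from 11 to 30.
original = [5, 9, 13, 12, 16]
with_outlier = [100, 9, 13, 12, 16]
print(sum(original) / 5, sum(with_outlier) / 5)
```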
Advantages and disadvantages of the arithmetic mean
1. It is easy to understand and to compute.
2. All the values are included in computing the mean.
3. A set of data has only one mean, thus, it is unique.
4. Every set of interval level and ratio level data has a mean.
Weighted mean
In the computation of the arithmetic mean we gave equal importance to each observation. Sometimes the
individual values in the data may not be equally important. When this is the case, we assign to each value a weight
which is proportional to its relative importance and calculate the weighted mean.
The weighted mean of a set of values x1, x2, x3…xn with corresponding weights w1, w2…wn is denoted by $\bar{x}_w$
and computed by:
$\bar{x}_w = \frac{w_1x_1 + w_2x_2 + \dots + w_nx_n}{w_1 + w_2 + \dots + w_n}$
This may be shortened to: $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$     1.5
The calculation of cumulative grade point average (CGPA) in colleges and universities is a good example of
weighted mean.
Exercise If a student scores “A” in a 3 credit hour course, “B” in a 4 credit hour course, “C” in another 4
credit hour course and “D” in a 2 credit hour course, and the numerical values of the letter grades are A=4, B=3,
C=2, D=1, compute his/her GPA for the semester.
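The GPA exercise is a direct application of formula 1.5, with grade points as the values and credit hours as the weights. A minimal sketch, with the hypothetical function name `weighted_mean`:

```python
def weighted_mean(values, weights):
    """Weighted mean (formula 1.5): sum of w_i * x_i divided by sum of w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grade_points = [4, 3, 2, 1]   # A, B, C, D
credit_hours = [3, 4, 4, 2]   # weights

gpa = weighted_mean(grade_points, credit_hours)
print(round(gpa, 2))  # 34 / 13, about 2.62
```

The unweighted mean of the four grade points would be 2.5; the weighted value is higher because the better grades came in courses carrying at least as many credit hours as the worst one.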
Geometrical mean
In algebra the geometric mean is calculated in the case of a geometric progression, but in statistics we need not
bother about the progression. Here it is a particular type of data for which the geometric mean is of great
importance, because it gives a good mean value: if the observed values are measured as ratios, proportions or
percentages, the geometric mean gives a better measure of central tendency than the other means.
The geometric mean of n positive values is defined as the nth root of their product. That is, if all the given
observations x1, x2, x3…xn are positive, then
$G.M = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$     1.7
For instance, the G.M of 4, 8 and 16 is
$G.M = (4 \times 8 \times 16)^{1/3} = (512)^{1/3} = 8$
Can G.M be calculated if any one or more values are zero or negative? Why?
In case the observed values x1, x2, x3…xk have the corresponding frequencies f1, f2, f3…fk, then
$G.M = (x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_k^{f_k})^{1/n}$     1.8
Where $n = \sum_{i=1}^{k} f_i$
In the case of grouped data, the class marks of the class intervals are taken as xi and formula 1.8 can be used as
such:
$G.M = (m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k})^{1/n}$
Where $n = \sum_{i=1}^{k} f_i$
Exercise A man gets three annual raises in his salary. At the end of the first year he gets an increase of 4%, at
the end of the second year an increase of 6%, and at the end of the third year an increase of 9% of his
salary. What is the average percentage increase over the three periods?
Exercise Compute the Geometric mean of the following values.
2, 8, 6, 4, 10, 6, 8, 4
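Formulas 1.7 and 1.8 give the same result when the frequency form is built from the raw values, which the following sketch illustrates with the data of the exercise above. Both function names are assumptions; `math.prod` requires Python 3.8 or later.

```python
import math
from collections import Counter


def geometric_mean(values):
    """nth root of the product of n positive values (formula 1.7)."""
    if any(x <= 0 for x in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))


def geometric_mean_freq(freq):
    """Formula 1.8: freq maps each distinct value to its frequency."""
    n = sum(freq.values())
    product = math.prod(x ** f for x, f in freq.items())
    return product ** (1 / n)


data = [2, 8, 6, 4, 10, 6, 8, 4]
direct = geometric_mean(data)
from_frequencies = geometric_mean_freq(Counter(data))
print(direct, from_frequencies)
```

The `ValueError` reflects the restriction to positive values: a zero makes the product zero, and negative values can make the nth root undefined in the real numbers.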
We present below the method of computing the geometric mean using a logarithm table.
Though standard techniques are available to find square roots and cube roots, for large values of n the nth
root is not easy to compute. To overcome this difficulty, the geometric mean is computed through logarithms. Now,
hoping that you recall the properties of logarithms, we formulate (1.7) and (1.8) in terms of logarithms (with base ten).
Formula (1.7), when reduced to its logarithmic form, becomes
$\log(G.M) = \log (x_1 x_2 x_3 \cdots x_n)^{1/n} = \frac{1}{n} \log (x_1 x_2 x_3 \cdots x_n) = \frac{1}{n} (\log x_1 + \log x_2 + \dots + \log x_n)$
So $G.M = \text{antilog}\left( \frac{1}{n} (\log x_1 + \log x_2 + \dots + \log x_n) \right)$     1.9
For a frequency distribution with frequency fi corresponding to values xi, i = 1, 2, 3…k, formula (1.8) similarly gives
$G.M = \text{antilog}\left( \frac{1}{n} (f_1 \log x_1 + f_2 \log x_2 + f_3 \log x_3 + \dots + f_k \log x_k) \right) = \text{antilog}\left( \frac{1}{n} \sum_{i=1}^{k} f_i \log x_i \right)$     1.10
Where $n = \sum_{i=1}^{k} f_i$
Similarly, for grouped data,
$\log(G.M) = \frac{\sum_{i=1}^{k} f_i \log m_i}{\sum_{i=1}^{k} f_i} = \frac{1}{n} \sum_{i=1}^{k} f_i \log m_i$     1.11
Where $n = \sum_{i=1}^{k} f_i$, and mi and fi are the class mark and frequency of the ith class interval respectively.
Taking the antilog of both sides in (1.11) we obtain
$G.M = \text{antilog}\left( \frac{1}{n} \sum_{i=1}^{k} f_i \log m_i \right)$     1.12
Where $n = \sum_{i=1}^{k} f_i$
The following exercises illustrate how the geometric mean is computed through logarithms.
Exercise Find the geometric mean of 2, 4, 8, 12, 16, 24.
Exercise Compute the geometric mean of the following frequency distribution of grouped data.
CI 10-14 15-19 20-24 25-29 30-34 35-39 40-44
frequency 10 15 17 25 18 12 8
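As a rough sketch of the logarithmic method, the Python snippet below computes the geometric mean as the antilog of the mean of the base-ten logarithms. The function name `geometric_mean` and the sample values are illustrative assumptions, not part of these notes.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean via base-10 logarithms; freqs are optional frequencies."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data: each value counted once
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires strictly positive values")
    n = sum(freqs)                         # total number of observations
    mean_log = sum(f * math.log10(v) for v, f in zip(values, freqs)) / n
    return 10 ** mean_log                  # antilog of the mean logarithm

print(geometric_mean([4, 8, 16]))              # cube root of 512, approximately 8
print(geometric_mean([2, 4], freqs=[1, 2]))    # (2 * 4**2) ** (1/3)
```

For grouped data, the class marks would be passed as `values` and the class frequencies as `freqs`.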
Harmonic mean
Another important mean is the harmonic mean, which is a suitable measure of central tendency when the data pertain to speeds, rates and times.
Let $x_1, x_2, x_3, \dots, x_n$ be n variate values in a set of observations; then the harmonic mean is given by
$H.M = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$. This may be shortened to: $H.M = \frac{n}{\sum_{i=1}^{n} 1/x_i}$
The following is a good example in which the application of harmonic mean is appropriate.
Exercise: A motorist travels for three days at a rate of 480 km/day. On the first day he travels 10 hours at a rate of 48 km/h, on the second day 12 hours at a rate of 40 km/h, and on the third day 15 hours at a rate of 32 km/h. What is the average speed?
Note: Here the harmonic mean gives the correct average speed because the man travelled equal distances at the three speeds. If, however, he had travelled for equal time intervals, the arithmetic mean would have been the correct average.
If the data are arranged in the form of a frequency distribution in which an observation $x_i$ has frequency $f_i$ (i = 1, 2, 3, …, k), the harmonic mean is given by
$H.M = \frac{n}{f_1/x_1 + f_2/x_2 + \dots + f_k/x_k}$. This may be shortened to: $H.M = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} f_i/x_i}$
Finally, we would like to point out the relationship among the three means. If all the observations are positive, it is given as A.M ≥ G.M ≥ H.M.
The three means are equal if all the positive valued observations are equal.
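To make the comparison of the three means concrete, here is a minimal Python sketch. The function names and the three speeds (equal distances covered at 48, 40 and 32 km/h) are illustrative assumptions.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))      # nth root of the product

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)    # n divided by the sum of reciprocals

speeds = [48, 40, 32]   # km/h, assuming the same distance is covered at each speed
am = arithmetic_mean(speeds)
gm = geometric_mean(speeds)
hm = harmonic_mean(speeds)
print(am, round(gm, 2), round(hm, 2))
print(am >= gm >= hm)   # the ordering A.M >= G.M >= H.M for positive values
```

With equal distances, the harmonic mean agrees with total distance divided by total time (3 × 480 km over 37 hours), which is why it is the appropriate average for this kind of data.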
Median and mode
In this section we deal with two other measures of central tendency, namely the median and the mode. It has been pointed out that the arithmetic mean cannot be calculated for a frequency distribution with open-ended intervals. Also, the mean is affected to a great extent by the extreme values of the set of observations. Hence, in such cases the center of the data is better described using the median or the mode.
In fact there are a number of circumstances in which we use these instead of any other measure of central tendency. We now discuss them in detail.
Median
Suppose we sort all the observations in numerical order, ranging from smallest to largest or vice versa. The
median is the middle value in the sorted list.
Median of ungrouped data
The median is found by arranging the data in order of magnitude. The median is then the value of the middle
term. We denote it by x̃.
For example: Suppose the sales commission ($) of 15 representatives were as follows:
23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19
Placing the data in order of magnitude, we have 5, 6, 9, 14, 16, 19, 21, 23, 24, 27, 31, 32, 36, 77, and 155
The value of the middle term is the 8th value, that is 23 (there are seven values smaller than 23 and seven values larger than 23). The median is therefore 23.
In the above illustration the number of observations is an odd number, 15. In such a case there is always a single value in the middle of the list.
? How is the median determined for an even number of ungrouped data?
As before, we first order the observations. In this case there is no clearly defined middle observation. Instead we find two observations in the middle of the ordered list. The median is then taken as the mean of the two central observations. For instance, suppose there are six items with values
25, 29, 30, 32, 35, 65
The median is 31, obtained by determining the arithmetic mean of the two central observations 30 and 32.
$\tilde{x} = (30 + 32)/2 = 31$
We locate it by counting down to the 3.5th item. Notice that 31 is not among the given values.
We summarize the above discussion as follows.
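Let $x_1, x_2, x_3, \dots, x_n$ be n ordered observations. Then the median value is given by:
$\tilde{x} = x_{(n+1)/2}$ if n is odd,
$\tilde{x} = \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right)$ if n is even.
That is, the median is the $\left(\frac{n+1}{2}\right)$th observation if n is odd, or the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations if n is even.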
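The rule for odd and even n can be sketched in a few lines of Python. The function name `median` is an illustrative choice; the two data sets are the sales commissions and the six items used in the text.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n+1)/2)th observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of (n/2)th and (n/2+1)th

commissions = [23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19]
print(median(commissions))                 # 23
print(median([25, 29, 30, 32, 35, 65]))    # 31.0
```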
Exercise Find the median from the following data of the heights in inches of a group of 14 students.
61,62,63,64,64,60,65,61,63,64,65,66,64,63
Now consider the case where the data are arranged in the form of a frequency distribution. Suppose the ordered values $x_1, x_2, x_3, \dots, x_k$ have corresponding frequencies $f_1, f_2, \dots, f_k$. The median can be calculated in the following manner.
Construct the less than cumulative frequency distribution. This is because, as you know, the less than cumulative frequency tells us the number of values that lie at or below a specified value of the observations.
If $n = \sum_{i=1}^{k} f_i$ is odd, find $\frac{n+1}{2}$ and search for the smallest less than cumulative frequency which is greater than or equal to $\frac{n+1}{2}$. The variate value corresponding to this less than cumulative frequency is the median.
If n is even, find $\frac{n}{2}$ and $\frac{n}{2}+1$ and then search for the smallest less than cumulative frequencies that are greater than or equal to $\frac{n}{2}$ and $\frac{n}{2}+1$ respectively. The median is then the arithmetic mean of the variate values corresponding to these less than cumulative frequencies.
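The formula we use to compute the median of grouped data is
$\text{Median} = \tilde{x} = lcb_{\tilde{x}} + \frac{\left(\frac{n}{2} - cf\right) w}{f_m}$   (1.17)
where $lcb_{\tilde{x}}$ is the lower class boundary of the median class, cf is the less than cumulative frequency of the class preceding the median class, w is the width of the median class, and $f_m$ is the frequency of the median class.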
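A minimal Python sketch of formula (1.17) follows. The function name, its argument layout (class limits rather than boundaries), and the age table used as input are illustrative assumptions; class boundaries are taken halfway between adjacent class limits.

```python
def grouped_median(lower_limits, upper_limits, freqs):
    """Median of grouped data: lcb + (n/2 - cf) * w / f_m."""
    gap = lower_limits[1] - upper_limits[0]    # gap between adjacent class limits
    n = sum(freqs)
    cf = 0                                     # cumulative frequency before the class
    for lo, hi, f in zip(lower_limits, upper_limits, freqs):
        if cf + f >= n / 2:                    # first class whose lcf reaches n/2
            lcb = lo - gap / 2                 # lower class boundary
            width = (hi + gap / 2) - lcb       # class width
            return lcb + (n / 2 - cf) * width / f
        cf += f
    raise ValueError("empty frequency table")

# Classes 18-20, 21-23, 24-26, 27-29, 30-32 with illustrative frequencies
print(grouped_median([18, 21, 24, 27, 30], [20, 23, 26, 29, 32], [4, 8, 11, 20, 7]))
```

Here n = 50, the median class is 27-29 (boundaries 26.5 to 29.5), and the cumulative frequency before it is 23, so the result is 26.5 + (25 − 23)(3)/20.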
The mode
In every day speech, something is “in the mode” if it is fashionable or popular. In statistics this “popularity”
refers to frequency of observations, and the most frequently observed value in a collection of observations is
therefore called the mode.
The modal wage, for example, is the wage received by more individuals than any other wage.
Mode: is the value of the observations that appears most frequently.
For a given set of data, mode may or may not exist and even if it exists may not be unique. To illustrate this
consider the following three sets of data.
Set A: 10, 10, 9, 8, 5, 4, 5, 12, 10 mode=10
Set B: 10, 10, 9, 9, 8, 12, 15, 5 mode=9 &10
Set C: 4, 6, 7, 15, 12, 9 no mode
Thus it is possible for a frequency distribution to have more than one mode.
A distribution with one mode is called unimodal, one with two modes bimodal, and one with more than two modes multimodal.
In the above illustration the distribution described in set A is unimodal and that in set B is bimodal.
Remark: If, in a set of observed values, all values occur once or an equal number of times, there is no mode. (See set C above.)
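The three cases (one mode, several modes, no mode) can be checked with a short Python sketch. The function name `modes` is an illustrative assumption; it returns an empty list when every value occurs equally often, following the remark above.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list if all values are equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                                  # no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([10, 10, 9, 8, 5, 4, 5, 12, 10]))   # set A: [10]
print(modes([10, 10, 9, 9, 8, 12, 15, 5]))      # set B: [9, 10]
print(modes([4, 6, 7, 15, 12, 9]))              # set C: []
```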
What about for a grouped data?
If the data are grouped such that we are given a frequency distribution of finite sized class intervals, we do not know the value of every item, but we can easily determine the class with the highest frequency. The mode of the distribution lies in this class. We call the class with the highest frequency the modal class.
In this case, the problem of determining the value of the mode is not as straightforward as in the ungrouped case. Having located the modal class of the data, the next problem is to interpolate the value of the mode within this modal class.
This is done by the use of the interpolation formula given below.
To compute the modal value of grouped data we use the formula:
$\text{Mode} = \hat{X} = lcb_{\hat{X}} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$   (1.18)
where $\Delta_1 = f_m - f_p$ and $\Delta_2 = f_m - f_s$, and
$lcb_{\hat{X}}$ is the lower class boundary (lcb) of the modal class (that is, the class with the highest frequency),
$f_m$ is the frequency of the modal class,
$f_p$ is the frequency of the class preceding the modal class,
$f_s$ is the frequency of the class succeeding the modal class,
W is the width of the modal class.
Now we discuss how to apply formula (1.18) to find the modal value of a grouped data with the help of the
following example.
Exercise The ages of newly hired, unskilled employees were grouped into the following distribution. Compute the modal age.
Ages 18-20 21-23 24-26 27-29 30-32
number 4 8 11 20 7
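As a hedged sketch of the interpolation formula (1.18), the Python function below locates the modal class and interpolates within it. The function name and argument layout are illustrative assumptions; class boundaries are taken halfway between adjacent class limits, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_limits, upper_limits, freqs):
    """Mode of grouped data: lcb + d1 / (d1 + d2) * W."""
    gap = lower_limits[1] - upper_limits[0]        # gap between adjacent class limits
    i = freqs.index(max(freqs))                    # first class with the highest frequency
    f_m = freqs[i]
    f_p = freqs[i - 1] if i > 0 else 0             # preceding class frequency
    f_s = freqs[i + 1] if i + 1 < len(freqs) else 0  # succeeding class frequency
    d1, d2 = f_m - f_p, f_m - f_s
    lcb = lower_limits[i] - gap / 2                # lower class boundary of modal class
    width = upper_limits[i] - lower_limits[i] + gap
    return lcb + d1 / (d1 + d2) * width            # undefined if d1 + d2 == 0

ages_lower = [18, 21, 24, 27, 30]
ages_upper = [20, 23, 26, 29, 32]
counts = [4, 8, 11, 20, 7]
print(round(grouped_mode(ages_lower, ages_upper, counts), 2))
```

For this table the modal class is 27-29, so the sketch evaluates 26.5 + (9/(9 + 13))(3).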
Exercise The following table shows the distribution of a group of families according to their expenditure per
week.
The median and the mode of the following distribution are known to be 25.50Birr and 24.50 Birr respectively.
Two frequency values are however missing from the table. Calculate the missing frequencies.
Class interval 1-10 11-20 21-30 31-40 41-50
frequency 14 a 27 b 15
Properties of mode
1. It is not affected by extreme values of a set of observations.
2. It can be calculated for distributions with open-ended classes.
3. It can be computed for all levels of data: nominal, ordinal, interval and ratio.
4. The main drawback of mode is that often it does not exist.
5. Often its values are not unique.
Measures of non-central location (Quantiles)
There are three types of quantiles. These are:
1. Quartiles
The quartiles are the three points which divide a given ordered data set into four equal parts. These are Q1, Q2 and Q3.
Q1 is the value corresponding to the $\left(\frac{n+1}{4}\right)$th ordered observation.
Q2 is the value corresponding to the $2\left(\frac{n+1}{4}\right)$th ordered observation.
Q3 is the value corresponding to the $3\left(\frac{n+1}{4}\right)$th ordered observation.
E.g. Consider the age data given below and calculate Q1, Q2, and Q3
19, 20, 22, 22, 17, 22, 20, 23, 17, 18
Solution: First arrange the data in ascending order, n = 10:
17, 17, 18, 19, 20, 20, 22, 22, 22, 23
$Q_1 = \left(\frac{n+1}{4}\right)$th $= \left(\frac{10+1}{4}\right)$th = (2.75)th observation = 2nd observation + 0.75(3rd − 2nd) observation = 17 + 0.75(18 − 17) = 17.75
Therefore 25% of the observations are below 17.75.
$Q_2 = 2\left(\frac{n+1}{4}\right)$th $= 2\left(\frac{10+1}{4}\right)$th = (5.5)th observation = 5th + 0.5(6th − 5th) = 20 + 0.5(20 − 20) = 20
$Q_3 = 3\left(\frac{n+1}{4}\right)$th $= 3\left(\frac{10+1}{4}\right)$th = (8.25)th observation = 8th + 0.25(9th − 8th) = 22 + 0.25(22 − 22) = 22
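The positional rule with linear interpolation can be sketched in Python as follows. The function name `positional_quantile` is an illustrative assumption, and clamping positions that fall outside the data to the smallest or largest value is a choice made here, not something stated in these notes.

```python
def positional_quantile(values, k, parts):
    """kth quantile (out of `parts`) at position k(n+1)/parts, with interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    pos = k * (n + 1) / parts          # 1-based position, possibly fractional
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return ordered[0]              # assumption: clamp below the first observation
    if whole >= n:
        return ordered[-1]             # assumption: clamp above the last observation
    return ordered[whole - 1] + frac * (ordered[whole] - ordered[whole - 1])

ages = [19, 20, 22, 22, 17, 22, 20, 23, 17, 18]
print([positional_quantile(ages, k, 4) for k in (1, 2, 3)])   # [17.75, 20.0, 22.0]
```

Calling the same function with `parts=10` or `parts=100` gives deciles and percentiles under the same positional convention.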
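For grouped data, the ith quartile is computed by
$Q_i = lcb_{q_i} + \frac{\left(\frac{in}{4} - lcf_p\right) w}{f_i}$,   i = 1, 2, 3
where $lcb_{q_i}$ is the lower class boundary of the ith quartile class,
$f_i$ is the frequency of the ith quartile class,
$lcf_p$ is the less than cumulative frequency (lcf) of the class immediately preceding the ith quartile class, and w is the width of the ith quartile class.
Note: Q2 = median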
2. Percentiles (P)
Percentiles are symbolized by P1, P2, …, P99 and divide the ordered distribution into 100 groups.
The percentile corresponding to a given value x is computed by
$\text{Percentile } (P) = \frac{(\text{number of values below } x) + 0.5}{\text{total number of observations } (n)} \times 100\% = \frac{c + 0.5}{n} \times 100\%$
where c is the number of values below x.
Exercise A teacher gives a 20 point test to 10 students. The scores are shown below. Find the percentile rank of
a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
For finding a data value corresponding to a given percentile
1. Arrange the data in increasing order.
2. Find c, where $c = \frac{np}{100}$.
Where: lcbpm, fpm and lcfppm will have a similar interpretation as in quartiles.
3. Deciles (D)
Deciles are the nine points which divide the given ordered data into 10 equal parts.
For grouped data, the nine deciles can be computed as follows:
Calculate kn/10 and search for the minimum lcf which is ≥ kn/10, k = 1, 2, 3, …, 9.
The class corresponding to this lcf is called the kth decile class. This is the class where $D_k$ lies.
The unique value of the kth decile ($D_k$) is then calculated by the formula
$D_k = lcb_{d_k} + \frac{\left(\frac{kn}{10} - lcf_p\right) w}{f_{d_k}}$,   k = 1, 2, 3, …, 9
where $lcb_{d_k}$, $f_{d_k}$ and $lcf_p$ have a similar interpretation as in quartiles and percentiles.
Note that median = Q2 = D5 = P50; D1, D2, D3, …, D9 correspond to P10, P20, P30, …, P90; and Q1, Q2, Q3 correspond to P25, P50, P75.
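Because quartiles, deciles and percentiles of grouped data share one interpolation pattern, a single helper can illustrate all three. This is a sketch under stated assumptions: the function name `grouped_quantile`, the use of class boundaries as input, and the small frequency table are all hypothetical.

```python
def grouped_quantile(boundaries, freqs, k, parts):
    """kth quantile of grouped data: lcb + (k*n/parts - lcf_p) * w / f."""
    n = sum(freqs)
    target = k * n / parts
    lcf_prev = 0                       # lcf of the class preceding the quantile class
    for i, f in enumerate(freqs):
        if lcf_prev + f >= target:     # minimum lcf that is >= k*n/parts
            lcb = boundaries[i]
            width = boundaries[i + 1] - lcb
            return lcb + (target - lcf_prev) * width / f
        lcf_prev += f
    raise ValueError("k must not exceed parts")

# Hypothetical table: class boundaries 0-10, 10-20, 20-30, 30-40
bounds = [0, 10, 20, 30, 40]
f = [5, 15, 20, 10]
q2 = grouped_quantile(bounds, f, 2, 4)
d5 = grouped_quantile(bounds, f, 5, 10)
p50 = grouped_quantile(bounds, f, 50, 100)
print(q2, d5, p50)                     # the same value: median = Q2 = D5 = P50
```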
Exercise for the following FD data, find
a) Q1, Q2,Q3 b) P25,P30,P50,P75 c) D1,D2,D3 and D5
interval 21-22 23-24 25-26 27-28 29-30
f 10 22 20 14 14
CHAPTER FOUR
Measures of central tendency locate the center of the distribution, but they do not tell how individual observations are scattered on either side of the center. The spread of the observations around the center is known as dispersion or variability.
- Small dispersion indicates high uniformity of the observations, while larger dispersion indicates less uniformity.
Objective of measure of dispersion
- To measure reliability of the average being used.
- To control variation in a product.
- To compare variability among two or more groups.
Measures of dispersion can be classified into absolute and relative forms.
Absolute measures of dispersion are expressed in concrete units, that is, the units in which the data have been expressed.
Example: centimeters, kilograms, etc.
A relative measure of dispersion is a quotient obtained by dividing the absolute measure by the quantity in respect to which the absolute deviation has been computed. It is a pure number and is usually expressed in percentage form. Relative measures are used for making comparisons between two or more distributions.
Range is a measure of absolute dispersion and as such cannot be used for comparing the variability of two distributions expressed in different units. Dispersion measured in kilograms is not comparable with dispersion measured in centimeters.
The solution is to use the relative range or any other relative measure of variation.
$\text{Relative range} = \frac{\text{range}}{X_L + X_S}$
where $X_L$ and $X_S$ are the largest and smallest observations respectively.
Example: Consider the weekly earnings of workers in two laboratories of the same type.
R.R for laboratory B $= \frac{\text{range}}{X_L + X_S} = \frac{15}{32 + 17} = 0.306$
3. It does not take into account the entire set of data.
4. It does not tell anything about the distribution of values in the series relative to a measure of central tendency.
Inter-quartile range = (Q3 − Q1) = the difference between the third and the first quartile. The larger the inter-quartile range, the larger the variability. It is not affected by extreme values. It is a good indicator of the absolute variability.
Quartile deviation (semi-inter-quartile range): is defined as half of the inter-quartile range.
Quartile deviation = ½ (Q3 − Q1)
Coefficient of quartile deviation: this is a relative measure of variation.
$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
If the quartile deviation is to be used for comparing the variability of two series, then it is necessary to convert the absolute measure to a coefficient of quartile deviation.
Characteristic of quartile deviation
1. The size of the quartile deviation gives an indication about uniformity. If Q.D is small, it denotes large uniformity. Thus a coefficient of quartile deviation is used for comparing uniformity or variation in different distributions.
2. Quartile deviation is not a measure of dispersion in the sense that it does not show the scatter around an average, but only a distance on a scale. Consequently quartile deviation is regarded as a measure of partition.
3. It can be computed when the distribution has open-ended classes.
E.g.: The weights (in kg) of a sample of six students from a class are given below:
53, 56, 57, 59, 63, and 66
i. What is the mean deviation from the mean?
ii. What is the mean deviation from the median?
Solution: $\bar{X} = \frac{53 + 56 + \dots + 66}{6} = 59$,   $\tilde{X} = 58$
$MD_{\bar{x}} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{|53 - 59| + |56 - 59| + \dots + |66 - 59|}{6} = 3.67$
$MD_{\tilde{x}} = \frac{\sum_{i=1}^{n} |X_i - \tilde{X}|}{n} = \frac{|53 - 58| + |56 - 58| + \dots + |66 - 58|}{6} = 3.67$
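For an ungrouped frequency distribution:
$MD_{\bar{x}} = \frac{\sum_{i=1}^{n} f_i |X_i - \bar{X}|}{\sum f_i}$   and   $MD_{\tilde{x}} = \frac{\sum_{i=1}^{n} f_i |X_i - \tilde{X}|}{\sum f_i}$
For a grouped frequency distribution:
$MD_{\bar{x}} = \frac{\sum_{i=1}^{n} f_i |m_i - \bar{X}|}{\sum f_i}$   and   $MD_{\tilde{x}} = \frac{\sum_{i=1}^{n} f_i |m_i - \tilde{X}|}{\sum f_i}$
The coefficients of mean deviation from the mean and from the median are given by
Coefficient of $MD_{\bar{x}} = \frac{MD_{\bar{x}}}{\bar{X}}$
Coefficient of $MD_{\tilde{x}} = \frac{MD_{\tilde{x}}}{\tilde{X}}$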
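E.g. Find the coefficient of mean deviation from the mean and from the median for the weights of the six students in the previous example.
Solution: Coefficient of $MD_{\bar{x}} = \frac{MD_{\bar{x}}}{\bar{X}} = \frac{3.67 \text{ kg}}{59 \text{ kg}} = 0.0622$
Coefficient of $MD_{\tilde{x}} = \frac{MD_{\tilde{x}}}{\tilde{X}} = \frac{3.67 \text{ kg}}{58 \text{ kg}} = 0.0633$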
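The mean deviation calculations for the six weights can be reproduced with a short Python sketch. The helper name `mean_deviation` is an illustrative assumption; `statistics.mean` and `statistics.median` come from the Python standard library.

```python
import statistics

def mean_deviation(values, center):
    """Average absolute deviation of the values from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)

weights = [53, 56, 57, 59, 63, 66]                 # kg
mean_w = statistics.mean(weights)                  # 59
median_w = statistics.median(weights)              # 58.0
md_mean = mean_deviation(weights, mean_w)
md_median = mean_deviation(weights, median_w)
print(round(md_mean, 2), round(md_median, 2))      # 3.67 3.67

# Coefficients of mean deviation (unit free)
print(md_mean / mean_w, md_median / median_w)
```

The coefficients printed here use the unrounded mean deviation 22/6, so they can differ in the last digit from values obtained with the rounded figure 3.67.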
The variance is a measure of dispersion. It tells us something about the scatter of scores around the mean. The variance uses the distances of the values from their mean. If the values are grouped near the mean, the variance will be small. The variance is usually reported without a unit of measurement; if a unit is given, it is the square of the unit of measure. It is defined as the mean squared deviation from the mean and is symbolized by a small sigma squared, $\sigma^2$. Its formula is as follows. Let $x_1, x_2, \dots, x_N$ be the values of the observations in a population of size N, then
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, i = 1, 2, …, N   or   $\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \left(\frac{\sum_{i=1}^{N} x_i}{N}\right)^2$
The standard deviation is the positive square root of the variance of the given observations. It is given by
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$, i = 1, 2, …, N
➢ If the data are given in the form of a frequency distribution in which the variate value $x_i$ has its corresponding frequency $f_i$ (i = 1, 2, 3, …, k), then the population variance is given by
$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N}$, where N is the total number of observations
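ii. The sample variance of a set $x_1, x_2, x_3, \dots, x_n$ of n observations is denoted by $s^2$ and is computed by
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$   or   $s^2 = \frac{\sum_{i=1}^{n} x_i^2}{n-1} - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$
The sample standard deviation is
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$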
➢ If the observation $x_i$ occurs $f_i$ times for i = 1, 2, 3, …, k, then the sample variance is computed by:
$s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n-1}$, where n is the total number of observations
➢ To compute the variance and standard deviation for grouped data we can use the same formulas as above, but in this case the $x_i$'s are the class marks of the distribution.
E.g. compute the variance and standard deviation for the following data
𝑥𝑖 : 3 6 5 3 4 3
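Since the example does not say whether the six values are a population or a sample, the following Python sketch computes both versions. The function names are illustrative assumptions; the only difference between the two is the divisor (N versus n − 1).

```python
import math

def population_variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)          # divide by N

def sample_variance(xs):
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)  # divide by n - 1

data = [3, 6, 5, 3, 4, 3]
pop_var = population_variance(data)
samp_var = sample_variance(data)
print(pop_var, math.sqrt(pop_var))      # population variance and standard deviation
print(samp_var, math.sqrt(samp_var))    # sample variance and standard deviation

# Shortcut form of the population variance: mean of squares minus squared mean
shortcut = sum(x * x for x in data) / len(data) - (sum(data) / len(data)) ** 2
```

Here the mean is 4 and the sum of squared deviations is 8, so the population variance is 8/6 and the sample variance is 8/5.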
E.g. Find the sample variance and standard deviation of the data given below.
Class      frequency
1 – 5      4
6 – 10     1
11 – 15    2
16 – 20    3
Properties of variance and standard deviation
1. If a constant value is added to or subtracted from each observation, the variance and standard deviation remain the same.
2. If a constant k multiplies each value in a given data set, then the new variance and standard deviation are obtained by multiplying the original variance and standard deviation by $k^2$ and $|k|$ respectively.
3. If each value of a distribution is divided by a constant k, then the new variance and standard deviation are obtained by dividing the original variance and standard deviation by $k^2$ and $|k|$ respectively.
The pooled or combined variance: this is used when we want to combine the variances of k different distributions, each of which has its own variance. It is given by:
where $d_k = \bar{x}_k - \bar{x}_c$ and $\bar{x}_c$ is the combined mean of the given groups.
E.g. The means and variances of the scores earned by two groups yielded the following results. Find the pooled variance.
$n_1 = 12$,   $\bar{x}_1 = 5.5$,   $S_1^2 = 25.5$
$n_2 = 15$,   $\bar{x}_2 = 8.5$,   $S_2^2 = 64.2$
The coefficient of variation is a relative measure of variation. It is a pure number independent of the units of measurement and thus is suitable for comparing the variability, homogeneity or uniformity of two or more distributions.
The coefficient of variation is also a useful measure for comparing the variability of two or more distributions that are measured in the same units but have unequal means. The formula is given by:
$CV = \frac{\sigma}{\mu} \times 100\%$   or   $CV = \frac{S}{\bar{x}} \times 100\%$
A set of observations with a smaller CV is considered more consistent or stable; the larger the CV, the greater the variability in the set of data.
E.g. two workers on the same job shows the following result over a long period of time
A B
CHAPTER FIVE
5. Elementary Probability
Introduction
Defn. Probability (p): is a numerical description of the chance of occurrence of a given phenomenon under certain conditions. It is used to measure the degree of certainty.
Definition of some probability terms
Random experiment: a process that leads to well defined results called outcomes.
Example: tossing a coin two times and observing the number of heads that appear.
An outcome: the result of a single trial of a random experiment.
Example: when a coin is tossed, there are two outcomes, i.e. H and T.
Sample space (S): the set of all possible outcomes of a random experiment.
Example: rolling a die, S = {1, 2, 3, 4, 5, 6}. For an experiment of n trials, each with the same number of possible outcomes, the number of elements of S = (number of outcomes per trial)$^n$.
Example: Find the sample space for the gender of the children if a family has 3 children. Use B for boy and G for girl.
Solution: n = 3, number of outcomes per child = 2 (B or G), so the number of elements of S = $2^3 = 8$.
S = {BBB, BBG, BGB, GBB, GGG, GGB, GBG, BGG}
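The sample space for three children can be enumerated mechanically. The sketch below uses `itertools.product` from the Python standard library; the variable names are illustrative.

```python
from itertools import product

# Each child is B or G; three children give 2**3 ordered outcomes
sample_space = ["".join(outcome) for outcome in product("BG", repeat=3)]
print(sample_space)
print(len(sample_space))   # 8

# An event is a subset of the sample space, e.g. "exactly two girls"
two_girls = [s for s in sample_space if s.count("G") == 2]
print(two_girls)
```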
Event: a subset of the sample space; it consists of one or more outcomes of a random experiment.
Example: getting an odd number in rolling a die.
Solution: Let A be the event of getting an odd number. A = {1, 3, 5}
Complement of an event: the set of outcomes in the sample space that are not included in the outcomes of the event. The complement of E is denoted by E'.
Example: a) Find the complement of the event of getting a 4 in rolling a die.
Solution: Let B be the event of getting a 4 in rolling a die.
B = {4}; B' = {1, 2, 3, 5, 6}
NB: B ∪ B' = S
b) Tossing two coins and getting all heads.
Soln. Let A be the event of getting all heads in tossing two coins.
A = {HH}, A' = {at least one tail} = {HT, TH, TT}, because A ∪ A' = S
Mutually exclusive events:- if two events cannot occur at the same time (i.e. they have no outcome in
common).
Example: The events of getting a 4 and getting a 6 when a single card is drawn from a deck are mutually exclusive, since a single card cannot be both a 4 and a 6.
The events of getting a 4 and getting a heart in a single draw are not mutually exclusive, since the 4 of hearts belongs to both.
Equally likely events: - events that have the same probability of occurring.
Example: when a single die is rolled, each outcome has the same probability (p) of 1/6.
Independent events: two events A and B are independent if the occurrence of A does not affect the occurrence of B.
Example: Rolling a die and getting a 6, and then rolling a second die and getting a 3.
Drawing a card from a deck and getting a queen, replacing it, and drawing a second card and getting a queen.
Dependent events: - when the occurrence of the 1st event affects the occurrence of the second event.
Example: Drawing a card from a deck, not replacing it, and then drawing a second card.
Principles of Counting
1. Addition Principle: if a task can be accomplished by k distinct procedures, where the ith procedure has $n_i$ alternatives, the total number of ways of accomplishing the task equals
$n_1 + n_2 + \dots + n_k$
Example 1: There are two means of transportation from city A to city B, either bus or train. There are 3 buses and 2 trains. How many ways of travelling from city A to city B are there?
Example 2: Suppose one wants to purchase a certain commodity, and this commodity is on sale in 5 government owned shops, 6 public shops and 10 private shops. How many alternatives are there for the person to purchase this commodity?
2. Multiplication Principle
Rule 1: If, in a sequence of n events, the first event has $k_1$ possibilities, the second event has $k_2$, the third event has $k_3$, and so forth, the total number of possibilities is:
$k_1 \cdot k_2 \cdots k_n$
Example 1: A paint manufacturer wishes to manufacture several different paints. The categories include 3 colors (red, white, blue), two types (latex and oil) and two uses (outdoor and indoor). How many different kinds of paint can be made if a person selects one color, one type and one use?
Example 2: A nurse has 3 patients to visit. In how many different ways can she make her rounds if she visits each patient only once?
Rule 2: If each event in a sequence of n events has k different possibilities, then the total number of possibilities of the sequence is
$k \cdot k \cdot k \cdots k = k^n$
Example 1: The digits 0, 1, 2, 3 and 4 are to be used in a 4-digit ID card. How many different cards are possible if (a) repetitions are permitted? (b) repetitions are not permitted?
Soln. a. 5C1 × 5C1 × 5C1 × 5C1 = 5 × 5 × 5 × 5 = $5^4$ = 625 cards
b. 5C1 × 4C1 × 3C1 × 2C1 = 5 × 4 × 3 × 2 = 120 cards
Example 3: (a) An urn contains 4 balls whose colors are red, blue, black and white. A ball is selected, its color is noted, and it is replaced; then a 2nd ball is selected and its color is noted. How many color schemes are possible?
(b) If the 1st ball is not replaced, how many different outcomes are there?
Soln. a) 4C1 × 4C1 = 4 × 4 = 16
b) 4C1 × 3C1 = 4 × 3 = 12
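The multiplication rule can be verified by brute-force enumeration. In this sketch, `itertools.product` lists sequences with repetition and `itertools.permutations` lists sequences without repetition; the variable names are illustrative.

```python
from itertools import permutations, product

digits = "01234"
cards_with_repetition = sum(1 for _ in product(digits, repeat=4))      # 5**4
cards_without_repetition = sum(1 for _ in permutations(digits, 4))     # 5*4*3*2
print(cards_with_repetition, cards_without_repetition)                 # 625 120

colors = ["red", "blue", "black", "white"]
with_replacement = list(product(colors, repeat=2))      # first ball is put back
without_replacement = list(permutations(colors, 2))     # first ball is kept out
print(len(with_replacement), len(without_replacement))  # 16 12
```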
3. Permutations
Definition: A permutation is an arrangement of n distinct objects in a specific order.
Permutation Rule 1: The number of arrangements of n objects taken r at a time is written as $_nP_r = \frac{n!}{(n-r)!}$
Example 1: In how many ways can the letters A, B and C be arranged taken two at a time?
$_3P_2 = \frac{3!}{(3-2)!} = \frac{3 \times 2 \times 1}{1!} = 6$, i.e. AB, AC, BA, BC, CA, CB
Example 2: In how many ways can a laboratory technician mount 10 specimens on 4 microscopes?
$_{10}P_4 = \frac{10!}{(10-4)!} = \frac{10 \times 9 \times 8 \times 7 \times 6!}{6!} = 5040$ ways
Permutation Rule 2: The number of permutations of n distinct objects taken all together is n!, or $_nP_n$.
Note: n! = n × (n−1) × (n−2) × … × 3 × 2 × 1
0! = 1 and 1! = 1
Example: In how many ways can a student arrange his/her 6 different books on a shelf?
Permutation Rule 3: The number of permutations of n distinct objects arranged in a circle is (n−1)!.
Note: 1 is subtracted because one object is fixed as the starting point.
Example: Consider arranging the letters A, B, C on a circle: (3−1)! = 2 ways
Permutation Rule 4: The number of permutations of n objects in which $k_1$ are alike, $k_2$ are alike, etc. is
$\frac{n!}{k_1! \times k_2! \times \dots \times k_p!}$
where $k_1 + k_2 + k_3 + \dots + k_p = n$
Example: How many different permutations can be made from the letters in the word MISSISSIPPI?
Soln. The number of M = 1, I = 4, S = 4 and P = 2
$\frac{11!}{1!\,4!\,4!\,2!} = 34650$ different arrangements.
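The four permutation rules can be evaluated with Python's standard library (`math.perm` and `math.factorial`, available from Python 3.8). The helper `arrangements_with_repeats` is a hypothetical name introduced for this sketch.

```python
from collections import Counter
from math import factorial, perm

def arrangements_with_repeats(word):
    """n! divided by the factorial of each letter's count (Rule 4)."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

print(perm(3, 2))                                # Rule 1: 3P2 = 6
print(perm(10, 4))                               # Rule 1: 10P4 = 5040
print(factorial(6))                              # Rule 2: 6 books on a shelf = 720
print(factorial(3 - 1))                          # Rule 3: A, B, C on a circle = 2
print(arrangements_with_repeats("MISSISSIPPI"))  # Rule 4: 34650
```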
4. COMBINATIONS
A combination is a selection of distinct objects without regard to order.
Combinations are used when the order of arrangement is not important, as in a selection process.
The number of combinations of r objects selected from n objects is denoted by $_nC_r = \frac{n!}{r!(n-r)!}$
Example: Given the letters A, B, C and D, list the permutations and combinations for selecting two letters.
Permutations: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC; 4P2 = 12
Combinations: AB, AC, AD, BC, BD, CD; 4C2 = 6
Example: In an English class the students are given the choice of 8 different essay topics. In how many ways can 4 students choose a topic
a. if no 2 students may choose the same topic? 8P4 = 1680 ways
b. if there is no restriction on the choice of topics? $8^4$ = 4096 ways of choosing topics
Example: A committee of 5 people must be selected from 5 men and 8 women. In how many ways can the selection be done if there are at least 3 women on the committee?
Soln.
The committee can consist of 3 women and 2 men, or 4 women and 1 man, or 5 women.
8C3 × 5C2 + 8C4 × 5C1 + 8C5 × 5C0 = 966
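The committee count can be checked with `math.comb` from the Python standard library. The variable names are illustrative; the loop simply adds the three cases (3, 4 or 5 women).

```python
from math import comb, perm

# At least 3 women on a committee of 5 drawn from 8 women and 5 men
committee_ways = sum(comb(8, w) * comb(5, 5 - w) for w in range(3, 6))
print(committee_ways)            # 966

# Order matters for permutations but not for combinations
print(perm(4, 2), comb(4, 2))    # 12 6

# Essay topics: no repeats allowed versus no restriction
print(perm(8, 4), 8 ** 4)        # 1680 4096
```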
Example: When a single die is rolled, what is the probability of getting a number less than 5?
Soln. Let A = the event of getting a number less than 5 in rolling a die.
S = sample space = {1, 2, …, 6}, A = {1, 2, 3, 4}
$P(A) = \frac{n(A)}{n(S)} = \frac{4}{6} = \frac{2}{3}$
Example 1: A box of 80 candles consists of 30 defective and 50 non defective candles. If 10 of these candles are selected at random without replacement, what is the probability that
a) All will be defective?
b) 6 will be non defective?
c) All will be non defective?
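One way to set up the candle example is to count favourable and total selections with combinations. The sketch below is an assumption about the intended method (classical probability with counting rules); the function name `prob_non_defective` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb

def prob_non_defective(k, good=50, bad=30, draws=10):
    """P(exactly k non defective candles among `draws` drawn without replacement)."""
    favourable = comb(good, k) * comb(bad, draws - k)
    return Fraction(favourable, comb(good + bad, draws))

p_all_defective = prob_non_defective(0)      # part a
p_six_good = prob_non_defective(6)           # part b
p_all_good = prob_non_defective(10)          # part c
print(float(p_all_defective), float(p_six_good), float(p_all_good))
```

The probabilities for k = 0, 1, …, 10 add up to exactly 1, which is a useful sanity check on the counting.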
2. The frequentist approach (empirical probability):
This approach to probability is based on relative frequencies.
Definition: Suppose we repeat a certain experiment n times, let A be an event of the experiment, and let k be the number of times that event A occurs.
Then the probability of the event A happening in the long run is given by:
$P(A) = \lim_{n \to \infty} \frac{k}{n}$
In a given frequency distribution, the probability of an event (E) being in a given class is
$P(E) = \frac{\text{frequency of the class}}{\text{total frequency in the distribution}}$
Example 2: The National Center for Health Statistics reported that of every 539 deaths in recent years, 24 resulted from automobile accidents, 182 from cancer, and 323 from other diseases. What is the probability that a particular death is due to an automobile accident?
3. Axiomatic approach
Let E be an experiment and S be a sample space associated with E. With each event A we associate a real number, called the probability of A, which satisfies the following properties, called the axioms of probability or postulates of probability.
1. P(A) ≥ 0
2. P(S) = 1, S is the sure event
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. P(A ∪ B) = P(A) + P(B)
4. P(A') = 1 − P(A)
5. 0 ≤ P(A) ≤ 1
6. P(ϕ) = 0, ϕ is the impossible event
Example 3: A fair die is thrown twice. Calculate the probability that the sum of the spots on the faces of the die that turn up is divisible by 2 or 3.
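The two-dice example can be worked by listing all 36 equally likely outcomes, which also shows why the overlap (sums divisible by both 2 and 3) must not be counted twice. The variable names in this Python sketch are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))        # 36 equally likely pairs
div_by_2 = {o for o in outcomes if sum(o) % 2 == 0}
div_by_3 = {o for o in outcomes if sum(o) % 3 == 0}

# P(A or B) = P(A) + P(B) - P(A and B), since A and B are not mutually exclusive
p_union = Fraction(len(div_by_2 | div_by_3), len(outcomes))
p_by_rule = (Fraction(len(div_by_2), 36) + Fraction(len(div_by_3), 36)
             - Fraction(len(div_by_2 & div_by_3), 36))
print(p_union, p_by_rule)   # 2/3 2/3
```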
$P(A/B) = \frac{P(A \cap B)}{P(B)}$, if P(B) ≠ 0
$P(B/A) = \frac{P(A \cap B)}{P(A)}$, if P(A) ≠ 0
$P(A \cap B) = P(A)\,P(B/A) = P(B)\,P(A/B)$
Chapter 6: Probability Distribution
6.1. Introduction
This unit introduces the concept of a probability distribution and shows how the various basic probability distributions (Binomial, Poisson, and Normal) are constructed.
All these probability distributions have immensely useful applications and explain a wide variety of real life
situations which call for computation of desired probabilities.
Specifically X(HHH)=3, X(HHT)=X(HTH)=X(THH)=2, X(HTT)=X(THT)=X(TTH)=1, X(TTT)=0
Discrete random variable: let x be a r.v. If the number of possible values of x is finite or countably infinite, we call x a discrete r.v.
- The possible values of x can be listed as $x_1, x_2, x_3, \dots, x_n$
- Let x be a discrete r.v. With each possible outcome $x_i$ we associate a number $P(x_i) = P(X = x_i)$, called the probability of $x_i$. The numbers $P(x_i)$ must satisfy the following
Requirements for probability distribution
1. The sum of the probabilities of all the events in the sample space must be equal to 1, i.e. $\sum P(x) = 1$.
2. The probability of each event in the sample space must be between 0 and 1 inclusive, i.e. $0 \le P(x_i) \le 1$.
Continuous random variable: x is continuous if it assumes all values in some interval (c, d), where c, d ∈ R, and there exists a function f, called the probability density function (pdf) of x, satisfying the following conditions.
a. $f(x) \ge 0$ for all x
b. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
c. For any a and b with −∞ < a < b < ∞, we have
$P(a < x < b) = \int_{a}^{b} f(x)\,dx$
Remark: a. P(x = a) = 0
b. P(a < x < b) = P(a ≤ x ≤ b) = P(a < x ≤ b) = P(a ≤ x < b)
a. x      0    1    2    3
   P(x)   ¼    ¼    ¼    ¼
b. x      0    1    2    3
   P(x)   −1   ½    ¼    ¼
c. x      1    2    3    4
   P(x)   ¼    ¼    ½    ¼
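Assuming the three tables above are meant to be tested against the two requirements for a probability distribution, the check can be sketched in Python as follows. The function name is an illustrative choice, and exact fractions are used so that the sum can be compared with 1 without rounding error.

```python
from fractions import Fraction as F

def is_probability_distribution(probs):
    """True if every probability lies in [0, 1] and the probabilities sum to 1."""
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

table_a = [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]
table_b = [F(-1), F(1, 2), F(1, 4), F(1, 4)]     # contains a negative value
table_c = [F(1, 4), F(1, 4), F(1, 2), F(1, 4)]   # sums to 5/4

for name, table in (("a", table_a), ("b", table_b), ("c", table_c)):
    print(name, is_probability_distribution(table))
```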
Exercise: construct a probability distribution for the number of girls a family with two
children has.
Defn: Let x be a discrete r.v with possible values $x_1, x_2, x_3, \dots, x_n, \dots$ with probabilities $P(x_1), P(x_2), \dots, P(x_n), \dots$ respectively. Then the expected value of x, or the mean value of x, denoted by E(x) or $\mu_x$, is defined as
$\mu_x = E(x) = \sum_{i=1}^{\infty} x_i P(x_i)$
- If x assumes a finite number of values,
$\mu_x = E(x) = \sum_{i=1}^{n} x_i P(x_i)$
- If all outcomes are equally likely,
$E(x) = \frac{1}{n}\sum_{i=1}^{n} x_i$
E.g.4: In a family with two children, find the mean of the number of children who will be girls.
E.g.5: One thousand tickets are sold at $1 each for a color television valued at $350. What is the expected value of the gain if a person purchases one ticket?
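The expected value formula can be applied directly to a table of values and probabilities. In this Python sketch the function name is illustrative, boys and girls are assumed equally likely, and the gain on a ticket is assumed to be the prize value minus the $1 ticket price for the winner and −$1 for everyone else.

```python
from fractions import Fraction as F

def expected_value(distribution):
    """Sum of x * P(x) over a dict mapping each value x to its probability."""
    assert sum(distribution.values()) == 1
    return sum(x * p for x, p in distribution.items())

# Number of girls in a two-child family (assuming P(girl) = 1/2 for each child)
girls = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
print(expected_value(girls))          # 1

# Gain from one of 1000 raffle tickets (assumed gain = prize value - ticket price)
gain = {349: F(1, 1000), -1: F(999, 1000)}
print(float(expected_value(gain)))    # -0.65
```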
- The standard deviation is the square root of the variance √var(x)
E.g.8: In a certain developing country 30% of the children are under nourished. In a random sample of 25
children from a country, what is the probability that the number of under nourished will be
a) Exactly 10?
b) Less than 5?
c) 5 or more?
d) Between 3 and 5 inclusive?
e) Less than 7, but more than 4?
E.g.9: Suppose that the numbers of female and male students in this class are 50 and 50 respectively. Suppose a random sample of 5 students is taken from this class. What is the probability that
a. All of them are male students?
b. 2 of them are female students?
c. At least 2 of them are female students?
The mean, variance and standard deviation of a binomial distribution can be found by using the following formulas:
Mean = μ = n·p
Variance = $\sigma^2$ = n·p·q
s.d = σ = $\sqrt{n \cdot p \cdot q}$
E.g.: A die is rolled 480 times. Find the mean, variance and s.d of the number of 2s that will be rolled.
Solution: This is a binomial situation, where getting a 2 is a success and not getting a 2 is a failure.
n = 480, p = 1/6, q = 5/6
μ = n·p = 480 × 1/6 = 80
$\sigma^2$ = n·p·q = 480 × 1/6 × 5/6 = 66.7
σ = $\sqrt{n \cdot p \cdot q}$ = $\sqrt{66.7}$ = 8.2
On average, there will be eighty 2s, with a s.d of 8.2.
2. The Poisson distribution
The Poisson distribution is used to model situations where the random variable x is the number of occurrences of a particular event over a given period of time (or space). Together with this property, the following conditions must also be fulfilled.
• Events are independent of each other
• Events occur singly
• Events occur at a constant rate (in other words, for a given time interval the mean number of occurrences is proportional to the length of the interval)
The Poisson distribution is used as a distribution of rare events. For example, the following have a Poisson distribution:
➢ Number of telephone calls made to a switch board in a given minute
➢ Number of misprints within a page
➢ Number of bacteria per slide
➢ Number of road accidents on a particular motorway in one day
➢ Number of natural hazards per year, etc.
The processes that give rise to such events are called Poisson processes.
The probability that the number of occurrences, x, over a given period of time is equal to k is
𝑒 −𝜆 𝜆𝑘
P(x=k) = , k=0, 1, 2 … e=2.7182818…
𝑘!
Where λ is the average or mean number of events that occur in that set interval
X~ poi(λ)
The mean and variance of a poison distribution is given by
μ=λ
σ2 =λ
E.g.10: If 1.6 accidents can be expected on a particular motorway on any given day, what is the probability that there will be 3 accidents on any given day?
E.g.11: If a bank receives on average 6 bad checks per day, what is the probability that it will receive 4 bad checks on any given day?
E.g.12: Suppose that bank customers arrive randomly and independently on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability that
a. Exactly 2 customers arrive in a 4-minute interval on a weekday afternoon?
b. At most 1 customer arrives in a one-minute interval?
c. Two customers will arrive in an 8-minute interval?
d. One or more customers will arrive in a 12-minute interval?
E.g.13: On a particular production line, the probability that an item is defective is 0.01. Using a suitable approximation, find the probability that in a batch of 200 items
a. There are no defective items.
b. There are exactly 5 defective items.
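The Poisson examples above can be sketched in Python as follows. The function name poisson_pmf is hypothetical. Two assumptions are made: for E.g.12 the rate is rescaled in proportion to the interval length (3.2 per 4 minutes gives 0.8 per minute), and for E.g.13 the "suitable approximation" is taken to be the Poisson approximation to the binomial with λ = n.P.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(x = k) = e^(-lam) * lam^k / k! for x ~ Poi(lam)."""
    return exp(-lam) * lam**k / factorial(k)

p_eg10 = poisson_pmf(3, 1.6)   # E.g.10: 3 accidents, lam = 1.6 per day
p_eg11 = poisson_pmf(4, 6)     # E.g.11: 4 bad checks, lam = 6 per day

# E.g.12: 3.2 customers per 4 minutes, i.e. 0.8 customers per minute.
rate_per_minute = 3.2 / 4
p12_a = poisson_pmf(2, rate_per_minute * 4)                          # exactly 2 in 4 min
p12_b = poisson_pmf(0, rate_per_minute) + poisson_pmf(1, rate_per_minute)  # at most 1 in 1 min
p12_c = poisson_pmf(2, rate_per_minute * 8)                          # exactly 2 in 8 min
p12_d = 1 - poisson_pmf(0, rate_per_minute * 12)                     # one or more in 12 min

# E.g.13: binomial with n = 200, P = 0.01 approximated by Poi(n.P) = Poi(2).
lam = 200 * 0.01
p13_a = poisson_pmf(0, lam)    # no defective items
p13_b = poisson_pmf(5, lam)    # exactly 5 defective items

print(round(p_eg10, 4), round(p13_a, 4))
```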
6.4. Common continuous distributions
Normal distribution: a continuous, symmetric, bell-shaped distribution of a variable.
The probability density function of a normal distribution with mean μ and variance σ² is given by
f(x) = [1/(σ√(2π))] e^(−(x−μ)²/(2σ²)), −∞ < x < ∞, −∞ < μ < ∞, 0 < σ² < ∞
All normally distributed variables can be transformed into the standard normally distributed variable by using the formula for the standard score:
Z = (value − mean)/s.d or Z = (X − μ)/σ
The probability that x lies between two values a and b is given by the corresponding area under the standard normal curve.
Procedure to find the area under the standard normal distribution curve
1. Between 0 and any Z value: look up the Z value in the table to get the area.
2. In any tail:
a. Look up the Z value in the table to get the area.
b. Subtract the area from 0.5.
3. Between two z values on the same side of the mean:
a. Look up both Z values to get the areas.
b. Subtract the smaller area from the larger area.
4. Between two z values on opposite sides of the mean:
a. Look up both Z values to get the areas.
b. Add the areas.
5. To the left of any z value, where z is greater than the mean:
a. Look up the Z value to get the area.
b. Add 0.5 to the area.
6. To the right of any z value, where z is less than the mean:
a. Look up the Z value to get the area.
b. Add 0.5 to the area.
Procedure
1. Draw the picture.
2. Shade the area desired.
3. Find the correct figure.
4. Follow the directions.
Note: The table gives the areas between 0 and any z value to the right of 0, and all areas are positive.
- Then calculate the value of Z using
Z = (x − μ)/σ, where Z ~ N(0, 1), i.e. Z has mean 0 and standard deviation 1
- Given a normally distributed r.v x with mean μ and standard deviation σ, the probability that x lies between two values a and b is given by
P(a < x < b) = P((a − μ)/σ < (x − μ)/σ < (b − μ)/σ)
= P((a − μ)/σ < Z < (b − μ)/σ)
E.g.14: Find the area under the standard normal distribution which lies
a. Between Z=0 and Z=0.96
P (0<Z<0.96) = 0.3315
b. Between Z=−1.45 and Z=0
P (−1.45<Z<0) = P (0<Z<1.45), because of symmetry
= 0.4265
E.g.15: Find the area under the standard normal curve which lies
a. Between Z=−0.67 and Z=0.75
P (−0.67<Z<0.75) = P (−0.67<Z<0) + P (0<Z<0.75)
= P (0<Z<0.67) + P (0<Z<0.75), since P (−0.67<Z<0) = P (0<Z<0.67) by symmetry
= 0.2486 + 0.2734
= 0.5220
b. Between z=2.13 and z=2.94
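The table look-ups in E.g.14 and E.g.15 can be cross-checked with Python's standard library, which provides the normal cumulative distribution through statistics.NormalDist. This is only a sketch: the helper names area_between and prob_between are hypothetical, and the computed areas can differ from the table in the fourth decimal place because the table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2 (z1 < z2)."""
    return Z.cdf(z2) - Z.cdf(z1)

def prob_between(a, b, mu, sigma):
    """P(a < x < b) for x ~ N(mu, sigma^2), by standardizing a and b."""
    return area_between((a - mu) / sigma, (b - mu) / sigma)

print(round(area_between(0, 0.96), 4))      # 0.3315  (E.g.14 a)
print(round(area_between(-1.45, 0), 4))     # 0.4265  (E.g.14 b)
print(round(area_between(-0.67, 0.75), 3))  # 0.522   (E.g.15 a)
print(round(area_between(2.13, 2.94), 4))   # E.g.15 b
```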
E.g.17: A random variable x has a normal distribution with mean 80 and standard deviation 4.8. What is the probability that it will take a value
a. Less than 87.2?
b.
c. Greater than 76.4?
d. Between 81.2 and 86.0?
E.g.18: A normal distribution has mean 62.4. Find its standard deviation if 20.05% of the area under the normal curve lies to the right of 72.9.
E.g.19: A random variable has a normal distribution with standard deviation 5. Find its mean if the probability that the random variable will assume a value less than 52.5 is 0.6915.
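A possible way to check E.g.17 to E.g.19 is sketched below. It uses statistics.NormalDist from the Python standard library; inv_cdf plays the role of reading the table backwards (from an area to a z value). The variable names are illustrative only.

```python
from statistics import NormalDist

# E.g.17: x ~ N(80, 4.8^2)
X = NormalDist(mu=80, sigma=4.8)
less_than_87_2 = X.cdf(87.2)               # z = 1.5
greater_than_76_4 = 1 - X.cdf(76.4)        # z = -0.75
between_81_2_and_86 = X.cdf(86.0) - X.cdf(81.2)   # z from 0.25 to 1.25

# E.g.18: mean 62.4 and P(x > 72.9) = 0.2005, so P(x < 72.9) = 0.7995.
# Solve z = (72.9 - 62.4) / sigma for sigma.
z18 = NormalDist().inv_cdf(1 - 0.2005)
sigma18 = (72.9 - 62.4) / z18

# E.g.19: s.d 5 and P(x < 52.5) = 0.6915.
# Solve z = (52.5 - mu) / 5 for mu.
z19 = NormalDist().inv_cdf(0.6915)
mu19 = 52.5 - z19 * 5

print(round(less_than_87_2, 4), round(sigma18, 1), round(mu19, 1))
```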
Jimma University Department of Statistics
CHAPTER SEVEN
Sampling in statistics is common and important. Some of the major reasons why sampling is necessary are: -
1. Sampling saves money, labor and time. The cost of obtaining information through a sample is much less than the cost of obtaining it through a census.
2. Sampling is the only option for study in some specialized areas. Highly trained personnel and specialized equipment are needed in medical sciences. Observation or experimentation can be destructive in nature in quality control, as in testing the average life of bulbs or the quality of wine, beer, etc. In such areas sampling is the only feasible option for the study.
3. If the population is too large (or infinite) to cover, sampling is the only way to study it.
Abiyot Negash
A sample is selected in such a way that each item or person in the population being studied has a known non-zero likelihood of being included in the sample. The main advantage of probability sampling is that one gets estimates that are unbiased and have a measurable precision. There are four methods of probability sampling.
i. Simple random sampling
ii. Stratified sampling
iii. Cluster sampling
iv. Systematic sampling
Simple random sampling:- is the sampling procedure in which each item or person in the population has the same chance of being included in the sample. The selection of the elements may be done using the lottery method or a random number table.
Systematic sampling:- is the sampling procedure which assumes numbering each subject of the population and selecting every Kth element. If we have N units in the population that are numbered 1 to N, the sampling procedure is as follows. To select a sample of n units, we take a unit at random from the first K units and every Kth unit thereafter. For instance, if K is 10 and the first unit drawn is number 8, the sample consists of units 8, 18, 28, 38, 48, 58 and so on. The constant K is usually approximated by N/n, where N is the population size and n is the sample size.
Example:- Let N=150 and n=10, where N and n are the population and the sample size respectively; then K can be taken as N/n = 150/10 = 15.
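The systematic sampling procedure can be sketched in a few lines of Python. The function name systematic_sample is hypothetical, K is taken as the whole-number part of N/n, and a fixed seed is used only so that the illustration is reproducible.

```python
import random

def systematic_sample(N, n, rng=random):
    """Pick a random start among the first K units, then every Kth unit."""
    K = N // n                      # sampling interval, K = N/n
    start = rng.randint(1, K)       # random unit from the first K units
    return [start + i * K for i in range(n)]

# Example above: N = 150, n = 10, so K = 15.
sample = systematic_sample(150, 10, random.Random(42))
print(sample)
```

With N = 150 and n = 10 the selected unit numbers always differ by 15 and never exceed 150.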
B. Non-probability sampling
It is a procedure that considers only convenience and personal judgment, so that not all elements have a chance of being included in the sample. The elements are selected based on the subjective knowledge of the researcher about them. The most common types of non-probability sampling are convenience, judgmental and quota sampling.
Convenience sampling: - members of the population are chosen based on their relative ease of access.
Example: - sampling friends
- sampling co-workers
Judgmental sampling or purposive sampling:- the researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited number of people who have expertise in the area being researched.
Quota sampling:- a quota is established (say 30% men) and researchers are free to choose any respondent they wish as long as the quota is met.
Before we give the definition of the sampling distribution of the mean, let us define the following terms.
Sampling with replacement: - the process of selecting items one by one, replacing the already selected item before the next selection. If we have a population of size N and a sample of size n, then there are N^n possible samples of size n.
Sampling without replacement:- the process of selecting items one by one without replacing the selected items. If we have a population of size N and a sample of size n, there are N combination n (NCn) different samples of size n from the population.
Sampling distribution of the sample mean is the probability distribution of all possible sample means obtained from all samples of the same size drawn from the same population.
The mean of the sampling distribution of the sample mean is equal to the population mean (µ). That is,
μx̄ = µ
The standard deviation of the sampling distribution of the sample mean is
σx̄ = σ/√n ……….. if sampling is with replacement
σx̄ = (σ/√n) √((N − n)/(N − 1)) …… if sampling is without replacement
Example:- Consider a population of size N=3 consisting of 1, 2, 6, and a sample of size n=2 drawn with replacement. Find
a. the sampling distribution of the sample mean
b. the mean of the population
c. the standard deviation of the population
d. the mean of the sampling distribution of sample means
e. the standard deviation of the sampling distribution of sample means
f. What can you say about b and d, as well as c and e?
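One way to explore this example is to list all N^n = 9 samples and compare the results with the formulas μx̄ = µ and σx̄ = σ/√n. The Python sketch below does this with the standard library; the variable names are illustrative, and pstdev is used because the population standard deviation (divisor N) is the one the formulas refer to.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [1, 2, 6]
n = 2

# All N^n ordered samples of size n drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# a. sampling distribution: each sample mean with its probability
distribution = {m: c / len(samples) for m, c in sorted(Counter(sample_means).items())}

mu = mean(population)               # b. population mean
sigma = pstdev(population)          # c. population standard deviation
mu_xbar = mean(sample_means)        # d. mean of the sample means
sigma_xbar = pstdev(sample_means)   # e. s.d of the sample means

print(distribution)
print(mu, mu_xbar)
print(round(sigma, 3), round(sigma_xbar, 3), round(sigma / sqrt(n), 3))
```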
Example:- In a class of 200 students, 130 are male. Find the proportion of female students.
Solution:
Let x = number of females = n − 130 = 70
n = number of students in the class = 200
p̂ = number of females / number of students = x/n = 70/200 = 0.35
A. The mean of the sampling distribution of the sample proportion
μp̂ = ∑ p̂i / N^n ……….. sampling with replacement
B. The standard deviation of the sampling distribution of the sample proportion
σp̂ = √(PQ/n) ……….. sampling with replacement
where P = population proportion, Q = 1 − P.
σp̂ = √(PQ/n) √((N − n)/(N − 1)) ……. sampling without replacement
Example:- Suppose that an urn contains three balls, of which two are red and the rest black. If we take a random sample of two balls with replacement, find
a. The sampling distribution of the sample proportion
b. The proportion of red balls
c. The mean of the sampling distribution of the sample proportion of red balls
d. The standard deviation of the sampling distribution of the sample proportion
If a random sample of size n is selected from a population of size N having population proportion P, then as n goes to infinity the distribution of p̂ will be approximately normal with mean P and variance PQ/n. That is, p̂ ~ N(P, PQ/n).
The standard normal variable of the sample proportion is calculated by Z = (p̂ − P)/√(PQ/n)
Example:- 40% of the students in a college are business majors. A random sample of 90 students was selected.
a. Calculate the mean and standard error of the sample proportion of business majors.
b. What is the probability that the sample proportion of business majors will be greater than 0.35?
Solution: a) Given n = 90,
P = 40% = 0.4, Q = 1 − P = 1 − 0.4 = 0.6
The population proportion and the mean of the sampling distribution of sample proportions are equal, so μp̂ = P = 0.4. The standard error is
σp̂ = √(PQ/n) = √(0.4 x 0.6/90) = 0.052
b) P(p̂ > 0.35) = P(Z > (0.35 − P)/√(PQ/n)) = P(Z > (0.35 − 0.4)/0.052) = P(Z > −0.96)
= 0.5 + P(0 < Z < 0.96) = 0.5 + 0.3315 = 0.8315
If a random sample of size n is selected from a population of size N having mean µ and variance σ², then, as n goes to infinity, the distribution of the sample mean will be normal with mean µ and variance σ²/n; that is, x̄ ~ N(µ, σ²/n).
Suppose that x assumes n values X1, X2, X3 … Xn. The ratio Z = (x̄ − μ)/(σ/√n) is called a standard normal variable.
σ/√n = standard deviation of x̄
Example:- The average age of a vehicle registered in the U.S. is 8 years, or 96 months. Assume that the standard deviation is 16 months. If a random sample of 36 cars is selected, find the probability that the mean of their ages is between 90 and 100 months.
Solution
n = 36, σ = 16, µ = 96, P(90 < x̄ < 100) = ?
P(90 < x̄ < 100) = P((90 − 96)/(16/√36) < (x̄ − µ)/(σ/√n) < (100 − 96)/(16/√36)) = P(−2.25 < Z < 1.5)
= P(0 < Z < 2.25) + P(0 < Z < 1.5) = 0.4878 + 0.4332 = 0.9210
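The two worked examples on the sampling distributions of p̂ and x̄ can be re-computed with the short Python sketch below, which uses statistics.NormalDist in place of the table. The variable names are illustrative. Because the code keeps the unrounded standard error, the proportion example gives z ≈ −0.97 rather than the −0.96 obtained from the rounded value 0.052, so the probability differs slightly in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Business majors: P = 0.4, n = 90, find P(p_hat > 0.35).
P, n_prop = 0.40, 90
Q = 1 - P
se_prop = sqrt(P * Q / n_prop)          # standard error of p_hat
z_prop = (0.35 - P) / se_prop
prob_prop = 1 - Z.cdf(z_prop)

# Vehicle age: mu = 96, sigma = 16, n = 36, find P(90 < x_bar < 100).
mu, sigma, n_mean = 96, 16, 36
se_mean = sigma / sqrt(n_mean)          # standard deviation of x_bar
z_low = (90 - mu) / se_mean             # -2.25
z_high = (100 - mu) / se_mean           # 1.5
prob_mean = Z.cdf(z_high) - Z.cdf(z_low)

print(round(se_prop, 3), round(z_prop, 2), round(prob_prop, 3))
print(round(z_low, 2), round(z_high, 2), round(prob_mean, 3))   # -2.25 1.5 0.921
```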
Statistical inference may be divided into two major areas:- estimation and hypothesis testing.
What is estimation? Statistical estimation is the procedure of using a sample statistic to estimate a population parameter. The statistic used is called an estimator, and the particular value taken by the estimator is called an estimate.
Statistical estimation has two components:- point estimation and interval estimation.
One important problem of statistical inference is the estimation of unknown population parameters from the corresponding sample statistics. Here, the parameter of interest is the population mean, µ, which is to be estimated. We take a simple random sample of size n and get observations x1, x2, ……, xn. Then the statistic
X̄ = (∑(i=1 to n) xi)/n
is an estimator of the population mean µ. The sample mean X̄ is an unbiased estimator since the mean of the sampling distribution of X̄ is equal to µ.
For the most part, the sample mean will be somewhat different from the population mean due to sampling error. Therefore we can ask the question “how good is a point estimate?” The answer is that there is no way of knowing how close the estimate is to the population mean. This places some doubt on the accuracy of point estimates, and to overcome this problem we deal with interval estimates; i.e. we can accompany a point estimate by an interval estimate.
An interval estimate of the population mean is a range of values used to estimate the population mean. When an interval estimate is made, a probability statement is attached to it. The confidence interval for the mean is a specific interval estimate of the population mean which is determined by using data obtained from a sample and a specific confidence level of the estimate.
The confidence level of an interval estimate of µ is the probability that the interval estimate will contain the parameter. That is, P(aL < µ < aU) = 1−α, where 1−α is the confidence level, and aL and aU are the lower and upper confidence limits respectively. The interval aL < µ < aU computed from the selected sample is called a (1−α)100% confidence interval. When we try to give an interval estimate for µ we need to consider several conditions on the sample size n and the population variance: whether the sample size is large or small (n≥30 or n<30), and whether the population variance is known or not.
Here the maximum error of estimate is E = Zα/2 σ/√n. That is,
µ = X̄ ± E, so −E ≤ X̄ − µ ≤ E, i.e. −Zα/2 σ/√n ≤ X̄ − µ ≤ Zα/2 σ/√n
X̄ − Zα/2 σ/√n ≤ µ ≤ X̄ + Zα/2 σ/√n
P{µ ∈ (X̄ ± Zα/2 σ/√n)} = 1−α, and (1−α) is known as the level of confidence.
X̄ − Zα/2 σ/√n and X̄ + Zα/2 σ/√n are known as the lower and upper confidence limits respectively. The interval (X̄ − Zα/2 σ/√n, X̄ + Zα/2 σ/√n) is called the (1−α)100% confidence interval (interval estimate) of µ.
The most commonly used values of α are 0.1, 0.05 and 0.01, so that (1−α)100% gives the 90%, 95% and 99% interval estimates, respectively.
P (0 ≤ Z ≤ Z0.025) = 0.5 − 0.025 = 0.475 ⇒ Z0.025 = 1.96 and −Z0.025 = −1.96
Example 1: The president of a certain university wishes to estimate the average age of the students currently enrolled. From past studies the standard deviation is known to be 2 years. A random sample of 50 students is selected and the mean is found to be 23.2 years. Find the 95% confidence interval for the population mean.
95% confidence interval ⇒ (1−α) = 0.95, α = 0.05, α/2 = 0.025 and Zα/2 = 1.96
X̄ ± Zα/2 σ/√n = 23.2 ± 1.96 x 2/√50 = 23.2 ± 0.55 = (22.65, 23.75) ⇒ 22.65 < µ < 23.75
Conclusion: we are 95% confident that the true average age of the students in this university is between 22.65 years and 23.75 years.
Example 2: In a certain study, the sample mean X̄ = 18.85, the sample size n = 80 and the standard deviation σ = 5.55. Construct the 90% confidence interval for µ.
For the 90% confidence interval for µ, we have (1−α) = 0.9, α = 0.1, α/2 = 0.05 and Zα/2 = 1.645
Then X̄ ± Zα/2 σ/√n = 18.85 ± 1.645 x 5.55/√80 = 18.85 ± 1.02 = (17.83, 19.87) ⇒ 17.83 < µ < 19.87
We are 90% confident that the true but unknown population mean is contained within the interval (17.83, 19.87).
Exercise: construct the 95% and 99% confidence intervals for the above example and compare the results.
In general, one should note that as the level of confidence increases the interval gets wider.
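The interval X̄ ± Zα/2 σ/√n can be sketched in Python as follows. The function name z_interval is hypothetical, and Zα/2 is obtained from statistics.NormalDist().inv_cdf instead of the table, so the critical values are slightly more precise than 1.96 and 1.645.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """(1 - alpha)100% confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    error = z * sigma / sqrt(n)               # maximum error of estimate E
    return xbar - error, xbar + error

# Example 1: X-bar = 23.2, sigma = 2, n = 50, 95% confidence.
low1, high1 = z_interval(23.2, 2, 50, 0.95)
print(round(low1, 2), round(high1, 2))        # 22.65 23.75

# Example 2 and the exercise: the interval widens as confidence increases.
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(18.85, 5.55, 80, level)
    print(level, round(low, 2), round(high, 2))
```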
The (1−α)100% confidence interval for µ, when σ is unknown and n≥30, is given by
X̄ ± Zα/2 s/√n, i.e., we substitute σ by s, the sample standard deviation.
Example: A random sample of 49 female shoppers showed that they spent an average of birr 23.45 per visit with standard deviation s = 2.80 birr.
a) Find the 90% confidence interval for the true average expenditure.
b) If they spent an average of 18.6 minutes per visit with a standard deviation of 5 minutes, find the 90% confidence interval for the true mean time a female spends in grocery shopping.
a) Let µ1 be the true average expenditure of a female shopper per visit.
The (1−α)100% confidence interval for µ1 is given by X̄ ± Zα/2 s/√n. For the 90% confidence interval, we have α = 0.1, α/2 = 0.05 ⟹ Zα/2 = 1.645, then X̄ ± Zα/2 s/√n = 23.45 ± 1.645 x 2.80/√49 = 23.45 ± 0.658 = (22.79, 24.11)
We are 90% confident that the true average expenditure of a female shopper per visit is contained in the interval (22.79, 24.11).
b) Let the true mean time a female spends in grocery shopping per visit be µ2. The respective point estimate is X̄ = 18.6 minutes. The 90% confidence interval will be
X̄ ± Zα/2 s/√n = 18.6 ± 1.645 x 5/√49 = 18.6 ± 1.175 = (17.42, 19.78).
We are 90% confident that the true mean time is contained within the interval (17.42, 19.78) minutes.
When the sample size is less than 30 the central limit theorem does not apply. But we can make one basic assumption: that the parent population is normal, which means we are sampling from a normal population. If this assumption is met, then the sampling distribution of the mean is normal, and the quantity (X̄ − μ)/(σ/√n) will still have the standard normal distribution.
If X ~ N(µ, σ²), σ² is known and n<30, then Z = (X̄ − μ)/(σ/√n) ~ N(0, 1)
Example: The pulse rate of 12 patients increased on average by 22.33 beats per minute. From a previous study it is known that σ for this population is 4.28. Construct the 99% confidence interval for the mean.
Solution: given: n = 12 < 30, σ = 4.28, X̄ = 22.33, α = 0.01 ⟹ α/2 = 0.005, Z0.005 = 2.575
X̄ ± Zα/2 σ/√n = 22.33 ± 2.575 x 4.28/√12 = 22.33 ± 3.18 ⟹ 19.15 < µ < 25.51
IV). For the small sample case when σ is unknown
If n<30 and we are sampling from a normal population whose variance is unknown, then the quantity (X̄ − μ)/(s/√n) will have a t-distribution with (n−1) degrees of freedom, and the (1−α)100% confidence interval for the population mean will be given by X̄ ± tα/2,n−1 s/√n.
Example: The IQs of 16 students from a certain class showed a mean of 107 with a standard deviation of 10. Construct the 90% confidence interval for the mean.
Given: n = 16, X̄ = 107, s = 10, α = 0.1 ⟹ α/2 = 0.05, tα/2,n−1 = t0.05,15 = 1.753
X̄ ± tα/2,n−1 s/√n = 107 ± 1.753 x 10/√16 = 107 ± 4.38 ⟹ 102.62 < µ < 111.38
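The small-sample interval X̄ ± tα/2,n−1 s/√n can be sketched in Python as below. The Python standard library has no t-distribution, so this sketch assumes that the critical value is read from a t table and passed in by hand; the function name t_interval is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean with sigma unknown and n < 30.

    t_crit is t_{alpha/2, n-1}, taken from a t table by the user.
    """
    error = t_crit * s / sqrt(n)
    return xbar - error, xbar + error

# IQ example: n = 16, X-bar = 107, s = 10, t_{0.05,15} = 1.753 (table value).
low, high = t_interval(107, 10, 16, 1.753)
print(round(low, 2), round(high, 2))   # 102.62 111.38
```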
8.2 Point and interval estimation of the proportion: large sample size
Sometimes the need is to estimate the population proportion or percentage. The sample proportion p̂ is a sample statistic, and it possesses a sampling distribution. We know that for large samples:
✓ The sampling distribution of p̂ is approximately normal.
✓ The mean µp̂ of the sampling distribution of p̂ is equal to the population proportion p.
✓ The standard deviation σp̂ of the sampling distribution of the sample proportion p̂ is given as √(pq/n), where q = 1−p.
The sample is considered to be large if np and nq are both greater than 5. When estimating the value of a population proportion, we do not know the values of p and q, so we cannot compute σp̂. We use the value of sp̂ as an estimate of σp̂, where sp̂ is calculated as
sp̂ = √(p̂q̂/n), with q̂ = 1 − p̂. The value of the sample proportion p̂ computed from a sample is a point estimate of the population proportion p.
Example 1: In a survey of 109 physicians, 87% of the physicians said that deception is acceptable on rare occasions to benefit their patients. Find a 95% confidence interval for the proportion of all physicians who hold this view.
Solution: let p be the population proportion and p̂ the sample proportion.
Given: n = 109, p̂ = 0.87, q̂ = 1 − p̂ = 1 − 0.87 = 0.13, confidence level is 95% or 0.95
Calculate sp̂ = √(p̂q̂/n) = √(0.87 x 0.13/109) = 0.032. The confidence interval for p is
p̂ ± Zα/2 sp̂ = 0.87 ± 1.96 x 0.032 = 0.87 ± 0.063 = (0.807, 0.933)
Example 2: A survey of voters was conducted and 52% said that they would prefer a candidate from party A. Assuming that the sample size for this study was 1500, construct a 99% confidence interval for the proportion of all voters who hold this view.
Solution: let p be the proportion of all voters who prefer a candidate from party A and p̂ the sample proportion. From the given information, n = 1500, p̂ = 0.52, q̂ = 0.48 and Zα/2 = 2.575.
Calculate sp̂ = √(p̂q̂/n) = √(0.52 x 0.48/1500) = 0.013. The confidence interval for p is
p̂ ± Zα/2 sp̂ = 0.52 ± 2.575 x 0.013 = 0.52 ± 0.033 = (0.487, 0.553)
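The large-sample interval for a proportion, taken here to be p̂ ± Zα/2 sp̂ with sp̂ = √(p̂q̂/n), can be sketched in Python as below. The function name proportion_interval is hypothetical, and the critical values 1.96 and 2.575 are the table values used in these notes.

```python
from math import sqrt

def proportion_interval(p_hat, n, z):
    """Large-sample confidence interval for a population proportion."""
    q_hat = 1 - p_hat
    s_p = sqrt(p_hat * q_hat / n)   # estimated standard deviation of p-hat
    error = z * s_p
    return p_hat - error, p_hat + error

# Example 1: n = 109, p-hat = 0.87, 95% confidence (Z = 1.96).
low1, high1 = proportion_interval(0.87, 109, 1.96)
print(round(low1, 3), round(high1, 3))   # 0.807 0.933

# Example 2: n = 1500, p-hat = 0.52, 99% confidence (Z = 2.575).
low2, high2 = proportion_interval(0.52, 1500, 2.575)
print(round(low2, 3), round(high2, 3))   # 0.487 0.553
```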
We have observed that E = Zα/2 σ/√n is the maximum error of estimate for µ. Suppose we predetermine the size of the maximum error and want to determine the size of the sample that will yield this maximum error. Given the confidence level and the standard deviation of the population, the sample size that will produce a predetermined maximum error E of the confidence interval estimate of µ is n = Z²σ²/E². If we do not know σ, we find the sample standard deviation s and substitute s for σ in the formula.
Example: A university dean wishes to estimate the average number of hours his part-time instructors teach per week. The standard deviation from a previous study is 2.6 hours. How large a sample must be selected if he wants to be 99% confident that the sample mean differs from the true mean by at most 1 hour?
Solution: Zα/2 = 2.58, E = 1, σ = 2.6
n = Z²σ²/E² = (2.58)²(2.6)²/1² = 44.997264 ≈ 45
The maximum error E of the interval estimate of the population proportion is E = Zσp̂ = Z√(pq/n). Given the confidence level and the values of p and q, the sample size that will produce a predetermined maximum error E of the confidence interval estimate of p is n = Z²pq/E². In most cases, the values of p and q are not known to us. In such a case we can do one of the following:
✓ Use the most conservative estimate of the sample size n by using p=0.5 and q=0.5, since the product of these two is greater than the product of any other pair of values for p and q.
✓ Use a preliminary sample and calculate p̂ and q̂ for this sample. Then we use these values of p̂ and q̂ to find n.
Example: The EZ Company wants to estimate the proportion of defective items produced by a machine to within 0.02 of the population proportion at a 95% confidence level. Suppose a preliminary sample of 200 items showed that 7 percent of the items produced on this machine are defective. How large a sample should the EZ Company select?
Solution: the value of Z for the 95% confidence level is 1.96, E = 0.02, p̂ = 0.07, q̂ = 1 − p̂ = 1 − 0.07 = 0.93
n = Z²p̂q̂/E² = (1.96)²(0.07)(0.93)/(0.02)² = 625.22 ≈ 626
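Both sample-size formulas, n = Z²σ²/E² and n = Z²pq/E², can be sketched in Python as follows. The function names are hypothetical; the result is always rounded up, since a fractional observation cannot be sampled.

```python
from math import ceil

def sample_size_mean(z, sigma, error):
    """Smallest n with z * sigma / sqrt(n) <= error."""
    return ceil((z * sigma / error) ** 2)

def sample_size_proportion(z, p, error):
    """Smallest n with z * sqrt(p * q / n) <= error, where q = 1 - p."""
    return ceil(z**2 * p * (1 - p) / error**2)

# Dean example: Z = 2.58, sigma = 2.6, E = 1.
n_dean = sample_size_mean(2.58, 2.6, 1)
print(n_dean)                                   # 45

# EZ Company example: Z = 1.96, p-hat = 0.07, E = 0.02.
n_ez = sample_size_proportion(1.96, 0.07, 0.02)
print(n_ez)                                     # 626

# Conservative choice p = q = 0.5 when no preliminary sample is available.
n_conservative = sample_size_proportion(1.96, 0.5, 0.02)
print(n_conservative)
```

The conservative choice leads to a much larger sample than the one based on the preliminary estimate, which is the price of knowing nothing about p.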
Introduction:
To establish and investigate the relationship between a sample measure and a population characteristic (parameter), we make use of hypothesis testing.
Definition: hypothesis testing is a rule or procedure for determining whether or not an assertion or statement about a population parameter (in this case the mean) is true.
Suppose we have a certain sample taken from the population and the sample mean is x̄. We set up an assertion that it came from a population with mean μ. This implies that the discrepancy between x̄ and μ is only due to chance: i.e. in the long run, repeated sampling will produce data which will result in a mean discrepancy between x̄ and μ of zero.
We can try to determine the probability of getting a discrepancy between x̄ and μ as large as or larger than the actual one. This can be done from the knowledge of the sampling distribution of x̄. This probability is referred to as the probability value (p-value), and it is compared with the level of significance. Then we can conclude that either the assertion (hypothesis) is true or the hypothesis is false.
Steps to perform hypothesis testing:
1. Write the original claim and identify whether it is the null hypothesis or the alternative hypothesis.
2. Write the null and alternative hypotheses. Use the alternative hypothesis to identify the type of test.
3. Write down all information from the problem.
4. Compute the test statistic
5. Find the critical value using the tables
6. Make a decision to reject or fail to reject the null hypothesis. A picture showing the critical value and
test statistic may be useful.
7. Write the conclusion.
P {type I error} = P {rejecting HO when it is true} = α, where α is known as the level of significance and 1−α is the probability of not rejecting HO when it is true.
P {type II error} = P {not rejecting HO when it is false} = β.
Probability value (p-value): the probability of getting results at least as extreme as those observed if the null hypothesis is true. If this probability is too small (smaller than the level of significance), then we reject the null hypothesis.
Types of tests concerning means:
a) Two-sided (two-tailed) tests:
HO: µ = µo H1: µ ≠ µo at level α
We want to accept HO when it is true with probability (1−α)
b) One-sided tests
i) HO: µ = µo H1: µ < µo at level α (left-tailed test)
ii) HO: µ = µo H1: µ > µo at level α (right-tailed test)
Generalization: In carrying out any test,
❖ We formulate HO so that P {Type I error} can be calculated.
❖ We formulate H1 in such a way that the rejection of HO is equivalent to the acceptance of H1.
❖ We specify α, the level of significance.
❖ We set a criterion for testing HO versus H1.
❖ We arrive at a decision (accept or reject HO) and a conclusion.
The test statistic is computed with µo given in HO. This statistic is then compared with the critical value, which depends on H1 and the level of significance.
i) Two-sided test:
HO: µ = µo H1: µ ≠ µo
Zcal = (x̄ − μo)/(σ/√n), and Zcritical = ±Zα/2 ⇒ P(Z > Zα/2) = α/2 ⇒ P{0 ≤ Z ≤ Zα/2} = 0.5 − α/2
Because of symmetry, P{Z < −Zα/2} = α/2. Therefore we find ±Zα/2 from the table, and then Zcal is compared with ±Zα/2 (Zcritical). If Zcal < −Zα/2, we reject HO. If Zcal > Zα/2, we reject HO.
If −Zα/2 < Zcal < Zα/2, we accept HO. This means that we accept HO if Zcal falls in the acceptance region; we reject HO if Zcal falls in the rejection regions.
If Zcal = Zcritical, we reserve judgment on accepting or rejecting HO: we have to increase the sample size in order to come to a conclusion.
When n ≥ 30 and σ is unknown, Z = (x̄ − μo)/(s/√n) ~ N(0, 1)
Therefore the calculated value of the test statistic will be compared with ±Zα/2, −Zα or Zα depending on whether we have a two-sided, left-tailed or right-tailed test respectively. We follow the same procedure as given above in (a).
Examples: 1) According to norms established for a reading comprehension test, 8th graders should have an average of 83.2 with a standard deviation of 8.6. If 36 randomly selected students from Tepi School averaged 88.7, test the null hypothesis that µ=83.2 against µ>83.2 at α=0.01, and thus check the directress's claim that her 8th grade students are above average.
2) A shopkeeper believes that the average age of customers who purchase a certain brand of jeans is 25 years. A random sample of 35 customers gave an average age of 24.6 years with a standard deviation of 1 year. Is the claim of the shopkeeper true at the 5% level of significance?
Given: n=35 (large sample case) and σ is unknown, in which case we substitute it by s=1; X̄ = 24.6, µ0 = 25 years (claim)
HO: µ = 25 H1: µ ≠ 25
Zcal = (X̄ − µ0)/(s/√n) = (24.6 − 25)/(1/√35) = −2.366, and Zcritical = ±Zα/2 = ±Z0.025 = ±1.96
Decision: since Zcal = −2.366 < −1.96, we reject HO.
Conclusion: the average age of the customers is different from 25 years; i.e., the shopkeeper’s claim is not true at the 5% level of significance.
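The two examples above can be worked through with the following Python sketch of a large-sample Z test. The function name z_test and its tail argument are hypothetical; the critical value comes from statistics.NormalDist().inv_cdf rather than the table, and the rule used is the one stated above (reject HO when Zcal falls in the rejection region).

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sd, n, alpha, tail="two"):
    """Return (z_cal, z_critical, reject) for a test of HO: mu = mu0.

    tail is "two" (H1: mu != mu0), "left" (H1: mu < mu0)
    or "right" (H1: mu > mu0).
    """
    z_cal = (xbar - mu0) / (sd / sqrt(n))
    if tail == "two":
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z_cal) > z_crit
    else:
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z_cal > z_crit if tail == "right" else z_cal < -z_crit
    return z_cal, z_crit, reject

# Example 1: reading test, H1: mu > 83.2, alpha = 0.01.
z1, crit1, reject1 = z_test(88.7, 83.2, 8.6, 36, 0.01, tail="right")
print(round(z1, 2), round(crit1, 2), reject1)

# Example 2: shopkeeper, H1: mu != 25, alpha = 0.05.
z2, crit2, reject2 = z_test(24.6, 25, 1, 35, 0.05, tail="two")
print(round(z2, 3), round(crit2, 2), reject2)   # -2.366 1.96 True
```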
We use the standard normal distribution (Z) as long as the variable is normally distributed and σ is known, which is similar to (a) above.
c) Small sample test (n<30) when σ is unknown:
When X ~ N(µ, σ²), n is small and σ² is unknown, in testing HO against any alternative the calculated value of the test statistic is Tcal = (X̄ − µ0)/(s/√n) ~ tn−1. Here, since we have estimated σ by s, the degrees of freedom will be (n−1). The critical values are ±tα/2,n−1; −tα,n−1; tα,n−1 depending on whether H1: µ ≠ µ0; H1: µ < µ0; H1: µ > µ0 respectively.
Example: A job placement director claims that the average starting salary for statistics graduates is birr 24000 (yearly). A random sample of 10 statistics graduates had a mean of birr 23450 and a standard deviation of birr 400. Test HO: µ = birr 24000 versus the alternative H1: µ ≠ 24000 at α=0.05.
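The test statistic is t with n−1 d.f., since n<30 and σ is unknown.
tcal = (X̄ − µ0)/(s/√n) = (23450 − 24000)/(400/√10) = −4.348
The critical values are ±tα/2,n−1 = ±t0.025,9 = ±2.262. Since tcal = −4.348 < −2.262, we reject HO.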
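The small-sample t test can be sketched in Python in the same spirit. The standard library has no t-distribution, so this sketch assumes the critical value t0.025,9 = 2.262 is read from a t table and supplied by hand; the function name t_statistic is hypothetical.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """t_cal = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))

# Starting salary example: n = 10, X-bar = 23450, s = 400, mu0 = 24000.
t_cal = t_statistic(23450, 24000, 400, 10)
t_crit = 2.262                      # t_{0.025, 9} from a t table
reject = abs(t_cal) > t_crit        # two-sided test at alpha = 0.05

print(round(t_cal, 3), reject)      # -4.348 True
```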
This section presents the procedure to test a hypothesis about the population proportion p for large samples. The procedure is similar in many respects to the one for the population mean (µ), and it includes the same steps as the test of the mean. Again the test can be two-tailed or one-tailed.
If the observations on various items or objects are categorized into two classes (binomial population), we often want to test the hypothesis whether the proportion of items in a particular class is Po or not. Thus for a binomial population the hypotheses are
Ho: P=Po versus H1: P≠Po or H1: P>Po or H1: P<Po
The value of the test statistic Z for the sample proportion p̂ is computed as
Z = (p̂ − p)/σp̂ ~ N(0, 1), where σp̂ = √(pq/n). The value of p used in this formula is the one used in the null hypothesis.
Solution: let p denote the population proportion of employees favoring the new bonus scheme.
Z = (p̂ − p)/σp̂ where σp̂ = √(pq/n) = √(0.6 x 0.4/150) = 0.04 and p̂ = 55/150 = 0.367
Z = (0.367 − 0.6)/0.04 = −5.825
The critical value is Zα/2 = Z0.005 = 2.575 (the table area is 0.5 − 0.005 = 0.495). Since |Z| = 5.825 > 2.575, we reject Ho.
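The proportion test can be sketched in Python as below, using the numbers that appear in the solution (n = 150, 55 in favor, Po = 0.6, critical value 2.575). The function name proportion_z is hypothetical, and a two-sided alternative is assumed because the critical value is quoted as Zα/2. The code keeps p̂ unrounded, so it gives about −5.83 instead of the −5.825 obtained with p̂ rounded to 0.367.

```python
from math import sqrt

def proportion_z(x, n, p0):
    """Z = (p_hat - p0) / sqrt(p0 * q0 / n), using p0 from the null hypothesis."""
    p_hat = x / n
    sigma_p = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / sigma_p

z_cal = proportion_z(55, 150, 0.6)
z_crit = 2.575                      # Z_{0.005}, table value used above
reject = abs(z_cal) > z_crit        # two-sided test (assumed)

print(round(z_cal, 2), reject)
```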
A \ B     B1     B2     . . .   Bc     Total
A1        n11    n12    . . .   n1c    n1.
A2        n21    n22    . . .   n2c    n2.
.         .      .      . . .   .      .
.         .      .      . . .   .      .
.         .      .      . . .   .      .
Ar        nr1    nr2    . . .   nrc    nr.
Total     n.1    n.2    . . .   n.c    n
The following notation will be used
eij = (ni. x n.j)/n is called the expected frequency of cell (i, j), i=1,2,…, r and j=1,2,3,…, c. The appropriate test statistic is
χ² = ∑(i=1 to r) ∑(j=1 to c) (nij − eij)²/eij
The value of χ² becomes small if the discrepancy between nij and eij is small, and it becomes large if the discrepancy is large. To test the hypothesis at the α level of significance, we compute the χ² value from the sample observations and compare it with χ²v(α), where v = (r−1)(c−1) is the degrees of freedom. The hypotheses are given by
Ho: the two variables are not associated with each other.
H1: the two variables are associated.
Then we reject Ho at the α level of significance if χ² > χ²v(α); otherwise we do not reject Ho.
Example: A researcher wishes to determine whether there is a relation between the gender of an individual and the amount of alcohol consumed. A sample of 68 people is selected and the following data are obtained.
          Alcohol consumption
Gender    Low    Moderate    High    Total
Male      10     9           8       27
Female    13     16          12      41
Total     23     25          20      68
At α=0.1, can the researcher conclude that alcohol consumption is related to gender?
Solution: state the hypotheses and identify the claim
Ho: The amount of alcohol that a person consumes is not associated with the individual's gender.
H1: The amount of alcohol that a person consumes is associated with the individual's gender (claim).
Critical value: χ²α(v) = χ²0.1(2) = 4.605, where v = (2−1)(3−1) = 2
Compute the test statistic. First compute the expected values.
e11 = 27x23/68 = 9.13     e12 = 27x25/68 = 9.93     e13 = 27x20/68 = 7.94
e21 = 41x23/68 = 13.87    e22 = 41x25/68 = 15.07    e23 = 41x20/68 = 12.06
χ² = ∑∑ (nij − eij)²/eij = (10−9.13)²/9.13 + (9−9.93)²/9.93 + ……. + (12−12.06)²/12.06 = 0.283
Make the decision: do not reject the null hypothesis, since χ² = 0.283 < χ²α(v) = 4.605
Summarize the result: there is not enough evidence to support the claim that the amount of alcohol a person consumes is associated with the individual’s gender.
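The chi-square computation for this example can be sketched in plain Python as follows. The variable names are illustrative, and the critical value 4.605 is the table value used in the solution. Because the code keeps the expected frequencies unrounded, it gives χ² ≈ 0.281, slightly below the hand value obtained from expected frequencies rounded to two decimals; the decision is the same.

```python
# Observed frequencies: rows = gender (male, female),
# columns = alcohol consumption (low, moderate, high).
observed = [
    [10, 9, 8],
    [13, 16, 12],
]

row_totals = [sum(row) for row in observed]            # n_i.
col_totals = [sum(col) for col in zip(*observed)]      # n_.j
n = sum(row_totals)

# Expected frequency of cell (i, j): e_ij = n_i. * n_.j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)      # v = (r-1)(c-1)

critical = 4.605            # chi-square table value for alpha = 0.1, v = 2
reject = chi2 > critical

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 3), df, reject)
```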
CHAPTER NINE
SIMPLE LINEAR REGRESSION AND CORRELATION
9.1. Introduction To Regression
Regression analysis is a statistical method that formulates an algebraic relationship between two or more variables, in the form of an equation, in order to estimate the value of a continuous dependent variable.
- The variable whose value is estimated using the algebraic equation is called the dependent (response) variable, and the variable whose value is used as the basis for the estimate is called the independent (predictor) variable.
- A linear algebraic equation that expresses the dependent variable in terms of the independent variable is called a linear regression equation.
- The linear relationship between the dependent variable Y and an independent variable X, expressed in terms of the population parameters $\beta_0$ and $\beta_1$, is given by the model
$$Y = \beta_0 + \beta_1 x + \varepsilon$$
where $\beta_0$ = the intercept, or the average value of Y when X is 0,
$\beta_1$ = the slope, or the change in Y for a unit change in X, and
$\varepsilon$ = the random (error) term.
The random term is included in the model because the relationship between the dependent and independent variables is not exact. There may be other factors that affect Y in addition to X but that we cannot measure, for lack of literature or knowledge about those variables. However many independent variables the model uses, the random term must be included, since there will still be factors that we cannot control but that could affect the dependent variable; human knowledge of natural and social phenomena is not exhaustive. In the social sciences especially, controlling human behavior is not possible. Thus, the inclusion of the random term in the regression model is justified.
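The role of the random term can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the parameter values, the normal distribution chosen for the error term, and the function name `simulate_responses` are hypothetical and do not come from the text.

```python
import random

def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Generate y = beta0 + beta1*x + e, with e drawn from Normal(0, sigma)."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [1, 2, 3, 4, 5]
beta0, beta1 = 2.0, 0.5                       # assumed "population" values
ys = simulate_responses(xs, beta0, beta1, sigma=1.0)

for x, y in zip(xs, ys):
    line = beta0 + beta1 * x                  # exact (non-random) part of the model
    print(x, round(line, 2), round(y, 2), round(y - line, 2))  # last column is e
```

Each simulated Y differs from the value on the line by its own error, which is why the observed points scatter around the line instead of falling exactly on it.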
The population parameters $\beta_0$ and $\beta_1$ generally cannot be calculated directly. Instead, sample data are used to compute $\hat{\beta}_0$ as an estimate of $\beta_0$ and $\hat{\beta}_1$ as an estimate of $\beta_1$, and the estimated regression line is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Parameters $\beta_0$ and $\beta_1$ of the simple linear regression model
The population regression equation $Y = \beta_0 + \beta_1 x + \varepsilon$ can be estimated by the sample regression line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, where $\hat{y}$ is the estimated average (mean) value of the dependent variable y for a given value of the independent variable x.
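The coefficients $\beta_0$ and $\beta_1$ are estimated by $\hat{\beta}_0$ and $\hat{\beta}_1$, which are given by
$$\hat{\beta}_1 = \frac{n \sum XY - (\sum X)(\sum Y)}{n \sum X^2 - (\sum X)^2}$$
$$\hat{\beta}_0 = \frac{\sum Y}{n} - \hat{\beta}_1 \left( \frac{\sum X}{n} \right) = \bar{Y} - \hat{\beta}_1 \bar{X}$$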
These values are obtained on the assumption that the error term ε has zero mean. The method used to obtain the
coefficients is called the method of ordinary least squares (OLS).
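The least squares formulas translate directly into code. A minimal Python sketch follows; the function name `fit_line` and the four data points are hypothetical, chosen so that the points lie exactly on y = 1 + 2x and the result is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Return (b0, b1), the OLS intercept and slope for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # slope: (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # intercept: mean(y) - b1 * mean(x)
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)   # 1.0 2.0
```

The denominator is zero when all X values are equal, so the sketch assumes at least two distinct values of X.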
E.g. A publisher wants to determine the relationship between the annual % increase in advertising expense (x) and the annual % increase in sales revenue (y) for different firms.
a. Fit the regression line.
b. Estimate the increase in sales revenue expected from an increase of 7.5% in advertising.
c. Compute the correlation coefficient and interpret it.
Firms   A   B   C   D   E   F   G   H
X       1   3   4   6   8   9  11  14
Y       1   2   2   4   6   8   8   9
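Solution:
$\sum x = 56$, $\sum y = 40$, $\sum xy = 373$, $\sum x^2 = 524$, $n = 8$
$\sum y^2 = 1 + 4 + 4 + 16 + 36 + 64 + 64 + 81 = 270$
$\bar{X} = \frac{\sum x}{n} = \frac{56}{8} = 7$, $\bar{Y} = \frac{\sum y}{n} = \frac{40}{8} = 5$, then
a. Dividing the numerator and denominator of the slope formula by n gives
$$\hat{\beta}_1 = \frac{n \sum XY - (\sum X)(\sum Y)}{n \sum X^2 - (\sum X)^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{373 - 8(7)(5)}{524 - 8(7^2)} = \frac{373 - 280}{524 - 392} = \frac{93}{132} = 0.7045$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 5 - 0.7045(7) = 0.07$$
⇒ Then the fitted model is
ŷ = 0.07 + 0.7045x
⇒ When the increase in advertising expense is zero, the expected increase in sales revenue is about 0.07%.
b. Since x and y are both measured in percent, an increase of 7.5% in advertising corresponds to x = 7.5:
ŷ = 0.07 + 0.7045(7.5)
= 5.35
⇒ When advertising expense increases by 7.5%, sales revenue is expected to increase by about 5.35%.
c. $$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{(\sum x^2 - n\bar{X}^2)(\sum y^2 - n\bar{Y}^2)}} = \frac{373 - 8(7)(5)}{\sqrt{(524 - 8(7^2))(270 - 8(5^2))}} = \frac{373 - 280}{\sqrt{(524 - 392)(270 - 200)}} = \frac{93}{\sqrt{9240}} = 0.97$$
⇒ The correlation coefficient of 0.97 implies a strong positive linear relationship between the increase in advertising expense and the increase in sales revenue.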
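The sums and estimates for this example can be recomputed directly from the table of X and Y values. The Python sketch below is one way to do so; the variable names are illustrative, and X and Y are kept in percent units as in the table.

```python
import math

x = [1, 3, 4, 6, 8, 9, 11, 14]   # annual % increase in advertising expense
y = [1, 2, 2, 4, 6, 8, 8, 9]     # annual % increase in sales revenue
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# OLS slope and intercept
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Predicted % increase in sales for a 7.5% increase in advertising
y_hat = b0 + b1 * 7.5

# Pearson correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)   # 56 40 373 524 270
print(round(b1, 4), round(b0, 4), round(y_hat, 2), round(r, 4))
```

Working from the raw data in this way guards against slips in the intermediate sums, since every later quantity depends on them.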