UNIT 2 Stat I

UNIT 2: DATA COLLECTION AND PRESENTATION

2.1. MEANING OF COLLECTION OF DATA

Collection of data implies a systematic and meaningful assembly of information
for the accomplishment of the objective of a statistical investigation. It refers to
the methods used in gathering the required information from the units under
investigation.

The quality of data greatly affects the final output of an investigation. Hence,
utmost care should be exercised in the data collection process, and every
possible precaution should be taken to ensure accuracy while collecting data.
Otherwise, with inaccurate or inadequate data, the whole analysis is likely to
be faulty and the decisions based on it will be misleading.

2.2. PRIMARY AND SECONDARY DATA


2.2.1. MEANING AND DISTINCTION BETWEEN PRIMARY AND
SECONDARY DATA
Statistical data may be obtained from either a primary or a secondary source. A
primary source is a source from which first-hand information is gathered. A
secondary source, on the other hand, is one that makes available data that
were collected by some other agency. Clearly, a source that is not primary is
necessarily a secondary source. Primary sources are original sources of data.

Data obtained from a primary source are called primary data. Likewise, data
gathered from a secondary source are known as secondary data. For example,
assume that a simple study is to be conducted to see the age distribution of
people living with HIV/AIDS. Clearly, the variable of study is age. Data about the
age of the patients may be obtained by interviewing them directly.
In this specific case, the patients are the primary
sources, and the data collected from them are primary data.
Alternatively, one may use the records of hospitals and other related agencies to
obtain the ages of the patients without the need to trace them
personally. The records of the hospitals, in this case, are secondary
sources and the data copied from such records are secondary data.

In most cases, secondary data are obtained from sources such as census and
survey reports, books, official records, reported experimental results, previous
research papers, bulletins, magazines, newspapers, web sites, and other
publications. Different organizations and government agencies publish
information (data) in the form of reports, periodicals, journals, etc. In the case
of Ethiopia, the Central Statistics Authority (CSA) is the first to be mentioned
among the publishers of such information (secondary data).

2.2.2. ADVANTAGES AND DISADVANTAGES OF PRIMARY AND
SECONDARY DATA
The following are the major advantages of primary data over secondary
data.

• Primary data give more reliable, accurate and adequate
information, which is suited to the objective and purpose of an
investigation.
• A primary source usually shows data in greater detail.
• Primary data are free from the errors that may arise from copying figures
from publications, which can happen with secondary data.

The disadvantages of primary data are:

• The process of collecting primary data is time consuming and costly.
• Primary data often give misleading information owing to a lack of
integrity on the part of investigators and the non-cooperation of respondents in
providing answers to certain delicate questions.

Advantages of secondary data:

• They are readily available and hence more convenient and much quicker to obtain
than primary data.
• They reduce time, cost and effort as compared to primary data.
• Secondary data may be available for subjects (cases) where it is
impossible to collect primary data, for example regions where
there is war.

Some of the disadvantages of secondary data are:

• The data obtained may not be sufficiently accurate.
• Data that exactly suit our purpose may not be found.
• Errors may be made while copying figures.

The choice between primary data and secondary data is determined by factors
such as the nature and scope of the enquiry, the availability of financial resources,
the availability of time, the degree of accuracy desired, and the collecting agency.
Often, primary data are used in situations where secondary data do not provide
an adequate basis for analysis. That is, when the secondary data do not suit a
specific investigation, we use primary data. Apart from such cases, most
statistical investigations rest upon secondary data, since this minimizes cost and
saves time. Nevertheless, the following points should be carefully considered
while using secondary data in an investigation.

• One should closely examine whether or not the data are suitable for the
intended study.
• The reliability of the source of the data should be assessed. If there is any
doubt about the reliability of the data, they should not be used.
• One should check that the data are not obsolete.
• In case the data are based on a sample, one should see whether the sample
is properly representative of the population.
• One should make sure that the primary data were handled carefully and
only by skilled persons.

Finally, it should be clear that primary data in the hands of one person might
be secondary in the hands of another. That is why it is often said, “the
difference between primary and secondary data is largely one of degree.”

2.3. METHODS OF COLLECTING PRIMARY DATA
After discussing the two sources of data, primary and secondary, it is logical to
say a few words about the methods employed in collecting data from its original
or primary source.
Many authors commonly state three methods of collecting primary data. These
are:
a. Personal Enquiry Method (Interview method)
b. Direct Observation
c. Questionnaire method

a) Personal Enquiry Method (Interview method)


In the personal enquiry method, a question sheet called a schedule is prepared.
The schedule contains all the questions that would extract a
complete report from a respondent. Usually, schedules are pre-tested so as to
remove discrepancies such as ambiguous or irrelevant
questions. This pre-testing process is called a pilot survey. It is worth
mentioning that the schedule is not given directly to the respondent. Rather, it
is the interviewer who asks the questions on the schedule and jots down the
interviewee's (respondent's) responses. Depending on the nature of the
interview, the personal enquiry method is further classified into two types.

• Direct Personal Interview: It is a type of personal enquiry where there
is face-to-face contact with the persons from whom the information is
to be obtained. In other words, the investigator contacts each respondent
personally, without the interference of a third party, asks the questions
given in the schedule one by one and notes down the respondent's replies on
the schedule.
• Indirect Personal Enquiry (Interview): It is the second type of personal
enquiry, where the investigator contacts third parties, called witnesses,
who are capable of supplying the necessary information. Here, the
information is not collected directly from the respondent but from a third
person who knows the respondent well. Such an approach is useful in
cases where the respondent is expected to conceal information about himself
or herself. For example, if an enquiry about the habit of using condoms is
conducted in a village, most of the villagers may not provide the correct
information. Thus, it would be wiser to get the required information from
other parties, like the nearby shop that sells condoms.
b) Direct Observation
In this approach, an investigator stays at the place of the survey and notes down
the observations himself or herself. There are no enquiries in the case of direct observation.
For example, an investigator making a study of the nutritional status of children
may directly (physically) measure the weight, height, and other required
parameters. Direct observation is more experimental and is
usually applied in scientific studies. It is time consuming and also costly.
c) Questionnaire Method
Under this method, a list of questions related to the survey is prepared and
sent to the various respondents by post, Web sites, e-mail, etc. However, this
method cannot be used if the respondent is illiterate. It is a method that is
often used in many statistical investigations.

The following are the major points that we need to take into account while
preparing a questionnaire.

• The number of questions should be small. Naturally, respondents are not
comfortable with lengthy questionnaires, which usually
bore them. Hence, fifteen to twenty-five questions in a questionnaire
are optimal. If a lengthy questionnaire is unavoidable, it should preferably
be divided into two or more parts.

• The questions should be short, clear, simple and unambiguous. Moreover,
the questions must be arranged in a logical order so that a natural and
spontaneous reply to each is induced. For instance, it is not appropriate to
ask a person how many packets of cigarettes he/she smokes before asking
whether he/she smokes or not.

• Questions of a sensitive nature should be avoided. Sensitive questions are
those that are too personal or pecuniary, like “Sources of
income”, “Drinking habit”, etc. The logic here is that respondents do not
willingly answer sensitive questions. Such information, if necessary, may be
gathered through interviews or through other indirect questions.

• Questions should be capable of objective answers. As much as possible,
avoid subjective questions and keep to questions of fact. To this end,
multiple-choice questions can be used.

• Mail questionnaires should be accompanied by a covering letter, which
should state the purpose of the questionnaire, promise confidentiality of
responses, etc.

Furthermore, the questions should preferably be designed in such a way that they can easily be
answered as yes/no.

2.4. LEVEL (SCALE) OF MEASUREMENT


There are four general levels of measurement: nominal, ordinal, interval and ratio.

I. NOMINAL LEVEL
The terms nominal level of measurement and nominal scale are commonly
used to refer to data that can only be classified into categories. In the strict
sense of the words, however, there are no measurements and no scales involved.
Instead, there are just counts.

Look at the information presented in the table below.

Religion reported by the population of the United States 14 years old and older

Religion                     Total
Protestant              78,952,000
Roman Catholic          30,669,000
Jewish                   3,868,000
Other religion           1,545,000
No religion              3,195,000
Religion not reported    1,104,000
Total                  119,333,000

In the above table, the arrangement of religions could have been changed. This
indicates that for nominal level of measurement, there is no particular order for
the groupings. Further, the categories are considered to be mutually exclusive.
The nominal level is considered the most primitive, the lowest or the most limited
type of measurement.
II. ORDINAL LEVEL
Look at the data below.
Ratings of the company commander

Rating      Number of nurses
Superior     6
Good        28
Average     25
Poor        17
Inferior     0

The table lists the ratings of a company commander by the nurses under her
command. This is an illustration of the ordinal level of measurement. One
category is higher than the next one; that is, “superior” is a higher rating than
“good”, “good” is higher than “average”, and so on.

If 1 is substituted for “superior”, 2 for “good” and so on, a 1 ranking
is obviously higher than a 2 ranking, and a 2 ranking is higher than a 3
ranking. However, it cannot be said that (as an example) a company
commander rated good is twice as competent as one rated average, or that a
company commander rated superior is twice as competent as one rated good. It
can only be said that a rating of superior is greater than a rating of good, and a
good rating is greater than an average rating.

The major difference between a nominal level and an ordinal level of
measurement is the “greater than” relationship between the ordinal-level
categories. Otherwise, the ordinal scale of measurement has the same
characteristics as the nominal scale; namely, the categories are mutually
exclusive and exhaustive.

III. INTERVAL LEVEL


The interval scale of measurement is the next higher level. It includes all the
characteristics of the ordinal scale, but in addition, the distance between
values is a constant size. If one observation is greater than another by a certain
amount, and the zero point is arbitrary, the measurement is on at least an
interval scale. For example, the difference between temperatures of 70 degrees
and 80 degrees is 10 degrees. Likewise, a temperature of 90 degrees is 10
degrees more than a temperature of 80 degrees, and so on. Scores on a
statistics or mathematics examination are also examples of the interval scale of
measurement.

IV. RATIO LEVEL


Ratio level is the highest level of measurement. This level has all the
characteristics of interval level. The distances between numbers are of a
known, constant size; the categories are mutually exclusive, and so on.

The major differences between the interval and ratio levels of measurement are
these: (1) ratio-level data have a meaningful zero point and (2) the ratio between
two numbers is meaningful. Money is a good illustration: having zero dollars
has meaning, namely that you have none! Weight is another ratio-level measurement.

If the dial on a scale reads zero, there is a complete absence of weight. Also, if you
earn $40,000 a year and John earns $10,000, you earn four times what he
does. Likewise, if you weigh 80 kg and John weighs 40 kg, you weigh twice as much as
John. Such ratio comparisons are not meaningful at the interval level of measurement.
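
The contrast between the two levels can be checked numerically. The sketch below is only an illustration: it assumes the temperatures are Fahrenheit readings (the text does not name a unit) and uses an approximate kilogram-to-pound factor. The ratio of two temperatures changes when the unit changes, because the zero point is arbitrary, while the ratio of two weights does not.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: two temperatures, assumed here to be in Fahrenheit.
t1_f, t2_f = 40.0, 80.0
t1_c, t2_c = fahrenheit_to_celsius(t1_f), fahrenheit_to_celsius(t2_f)

ratio_f = t2_f / t1_f   # ratio of the Fahrenheit readings
ratio_c = t2_c / t1_c   # ratio of the same temperatures in Celsius

# Ratio scale: the two weights from the text, in kilograms and in pounds.
KG_TO_LB = 2.20462      # approximate conversion factor (illustrative)
w1_kg, w2_kg = 40.0, 80.0
ratio_kg = w2_kg / w1_kg
ratio_lb = (w2_kg * KG_TO_LB) / (w1_kg * KG_TO_LB)

print("temperature ratio (F):", ratio_f)
print("temperature ratio (C):", round(ratio_c, 2))
print("weight ratio (kg):", ratio_kg)
print("weight ratio (lb):", round(ratio_lb, 2))
```

The two temperature ratios disagree, so "twice as hot" has no fixed meaning, whereas "twice as heavy" holds in either unit.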

2.5. CLASSIFICATION OF DATA

After collecting relevant information (data) for the purpose of a statistical
investigation, the next important task is the classification and presentation of these
data. It is difficult to grasp the meaning of any considerable volume of
numerical data unless their mass is somehow reduced to relatively few
convenient classes or categories and presented with the help of some kind of
visual aid.

This section discusses classification of data. Presentation of data using graphs
and charts will be seen in the next unit.

Classification: the process of arranging things in groups or classes
according to their resemblance.

Purposes of Classification:
• To eliminate unnecessary detail.
• To bring out clearly points of similarity and dissimilarity.
• To enable one to form mental pictures of objects or measurements.
• To enable one to make comparisons and draw inferences.
2.6. TYPES OF CLASSIFICATION
1. Geographical Classification: Data are arranged according to places
like continents, regions, and countries.
Example

Region          Dominant Language Spoken
East Africa     Amharic
West Africa     French
North Africa    Arabic
South Africa    English

2. Chronological Classification: Data are arranged according to time, like
year or month.
Example

Year (in EC)    Population (in million)
1974            30
1986            52
1991            60

3. Qualitative Classification: Data are arranged according to attributes like
color, religion, marital status, sex, educational background, etc.
Example 3.

Employees in Factory X
    Educated:     Male, Female
    Uneducated:   Male, Female

4. Quantitative Classification: In this type of classification, the statistical
data are classified according to some quantitative variable. The variable may
be either discrete or continuous.

Example 4.

Person    Height (X) in cm
A         160
B         182
C         175
D         178

A. Discrete Variables – are variables that are associated with enumeration or
counting.
Example
• Number of students in a class
• Number of children in a family, etc.

B. Continuous Variables – are variables associated with measurement.
Example
• Weights of 10 students
• The heights of 12 persons
• Distance covered by a car between two stations, etc.

2.7. FREQUENCY DISTRIBUTION


When the raw data have been collected, they should be put into an ordered
array, in ascending or descending order, so that they can be looked at more
objectively. The data must then be organized into a frequency distribution (FD), which simply lists
the values or classes with their corresponding frequencies in tabular form.
Here, frequency refers to the number of times a certain value occurs
in the data.
The tabular representation of the values of a variable together with the
corresponding frequencies is called a Frequency Distribution (FD).
Definition:

A frequency distribution is the organization of raw data in table form, using
classes and frequencies.

Types of Frequency distribution

A. Ungrouped Frequency Distribution (UFD)


Shows a distribution where the values of a variable are linked with their
respective frequencies.
Example 7. Consider the number of children in 15 families.
1 0 3 2 0
2 4 1 3 1
4 1 2 2 3
Construct an ungrouped FD for the above data.
Solution:

No. of children (values)    No. of families (tallies)    Frequency
0                           //                           2
1                           ////                         4
2                           ////                         4
3                           ///                          3
4                           //                           2
Total                                                    15
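
The tallying in Example 7 can also be sketched in a few lines of Python. This is only an illustration of the counting step; the function name `ungrouped_fd` is introduced here and is not part of the original notes.

```python
from collections import Counter

# Number of children in the 15 families of Example 7.
children = [1, 0, 3, 2, 0,
            2, 4, 1, 3, 1,
            4, 1, 2, 2, 3]

def ungrouped_fd(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())

fd = ungrouped_fd(children)

# Print value, tally marks and frequency, mirroring the table layout.
for value, freq in fd:
    print(f"{value:>5}  {'/' * freq:<6} {freq:>3}")
print(f"Total  {'':<6} {sum(f for _, f in fd):>3}")
```

The same function can be applied to the test scores in the exercise that follows.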

Exercise
Consider the following scores in a statistics test obtained by 20 students in a
given class.
10, 4, 4, 7, 5, 7, 7, 8, 5, 7, 8, 5, 10, 8, 7, 5, 7, 8, 7, 4
Prepare an ungrouped FD
B. Grouped Frequency Distribution (GFD)

If the mass of the data is very large, it is necessary to condense the data into
an appropriate number of classes or groups of values of a variable and indicate
the number of observed values that fall into each class. Therefore, a GFD is a
frequency distribution where the values of a variable are put into groups and
each group is shown with the number of observations it contains.
Example *

Values (xi)       1 – 25    26 – 50    51 – 75    76 – 100
Frequency (fi)    3         10         18         6

COMMON TERMINOLOGIES IN A GFD


i. Class: a group of values of a variable between two specified numbers called the
lower class limit (LCL) and the upper class limit (UCL).

In Example *, the GFD contains four classes: 1 – 25, 26 – 50, 51 – 75, and 76 – 100.
LCL1 = 1, UCL1 = 25 LCL3 = 51, UCL3 = 75
LCL2 = 26, UCL2 = 50 LCL4 = 76, UCL4 = 100
ii. Class Frequency (or simply Frequency): refers to the number of
observations corresponding to a class.

In Example *, the class frequencies of the 1st, 2nd, 3rd and 4th classes are
respectively 3, 10, 18 and 6.

iii. Class Boundaries: are boundaries obtained by subtracting half of the unit
of measurement (u) from the lower limits or by adding ½ (u) on the upper limits
of a class.
i.e UCBi = UCLi + ½ (u)
LCBi = LCLi - ½ (u)
Where UCBi = Upper Class Boundaries and
LCBi = Lower Class Boundaries
Remark: The unit of measurement (u) is the gap between any two successive
classes. i.e
u = lower limit of a class – upper limit of the preceding class.

In Example *, consider the 2nd class, 26 – 50. Since u = 26 – 25 = 1,

LCL2 = 26 UCL2 = 50
LCB2 = 26 – ½(1) = 25.5 UCB2 = 50 + ½(1) = 50.5
iv. Class Width (size of a class or class interval): it is the difference between
the upper and lower class limits or the difference between the upper and lower
class boundaries of any class.
Remarks:
1. If both the LCL and the UCL are included in a class, it is called an inclusive
class. For inclusive classes,
Class width (cw) = UCBi – LCBi

2. If the LCL is included and the UCL is not included in a class, it is called an
exclusive class. For exclusive classes,

cw = UCLi – LCLi

To be consistent, we use inclusive classes.


v. Class Mark (cm): it is the midpoint (center) of a class:

$$cm_i = \frac{UCB_i + LCB_i}{2}$$

Note: the difference between any two successive class marks is equal to
the width of a class.
vi. Range (R): is the difference between the largest (L) and the smallest
(S) values in a data set:

R = L – S
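
The terms above can be tied together in a short sketch. It assumes inclusive classes, as adopted in the text; the function name `class_summary` is hypothetical and is used only for illustration.

```python
def class_summary(lcl, ucl, u=1):
    """Boundaries, width and class mark of an inclusive class.

    lcl, ucl: lower and upper class limits
    u: unit of measurement (gap between successive classes)
    """
    lcb = lcl - u / 2          # lower class boundary
    ucb = ucl + u / 2          # upper class boundary
    return {
        "LCB": lcb,
        "UCB": ucb,
        "cw": ucb - lcb,       # class width for inclusive classes
        "cm": (ucb + lcb) / 2, # class mark (midpoint)
    }

# Second class (26 - 50) of Example *, where u = 26 - 25 = 1.
second = class_summary(26, 50, u=1)
print(second)
```

For the class 26 – 50 this reproduces the boundaries 25.5 and 50.5 found above, with a width of 25 and a class mark of 38.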

Exercise. Consider the following GFD.

Class      Frequency (f)
5 – 9      2
10 – 14    6
15 – 19    12
20 – 24    7
25 – 29    3
Total      30

a. What is the class frequency of the 3rd class?
b. How many observations (items) fall into the last class?
c. Find i. The LCL and UCL of the fourth class
ii. The UCB and LCB of the third class
iii. The class interval (class width) of the fifth class
iv. The class mark (midpoint) of the second class

RULES FOR FORMING A GROUPED FREQUENCY DISTRIBUTION


To construct a GFD, the following points should be considered:
1) The classes should be clearly defined. That is, each observation should
fall into one and only one class.
2) The number of classes should be neither too large nor
too small. Normally, 5 to 20 classes are recommended.
3) All the classes should be of the same width. An approximate suitable
class width can be obtained as:

$$cw \approx \frac{\text{Range}}{\text{Number of classes}}, \quad \text{i.e.} \quad cw \approx \frac{R}{n} = \frac{L - S}{n}$$

Example 8. Let $R/n = 6.8263$.
o If all the observations are whole numbers, cw = 7
o If all the observations are given to one decimal place, cw = 6.8
o If all the observations are given to two decimal places, cw = 6.83, etc.
Note that a suitable number of classes can be obtained by using the formula

$$n \approx 1 + 3.322 \log N$$

(logarithm to base 10), rounded up or down to the nearest whole number, where N is the total number of
observations.
Alternatively n can also be determined by formula

Where n=Number of Classes


N=Total number of observations

Remark: Unequal class intervals create problems in graphing and in computing
some statistical measures.

4) Determine the class limits.
i. Determine the lower class limit of the first class (LCL1); then
LCL2 = LCL1 + cw, LCL3 = LCL2 + cw, …, LCLi+1 = LCLi + cw
ii. Determine the upper class limit of the first class (UCL1), i.e.
UCL1 = LCL1 + cw – u, where u = the unit of measurement; then
UCL2 = UCL1 + cw, UCL3 = UCL2 + cw, …, UCLi+1 = UCLi + cw
5) Complete the GFD with the respective class frequencies.
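
Rules 2) to 4) can be sketched as small helper functions. This is a minimal illustration under the formulas stated above, not a definitive implementation; the function names and the sample inputs (N = 100, and the limits starting at 5 with cw = 5) are chosen here for illustration only.

```python
import math

def number_of_classes(N):
    """n = 1 + 3.322 log10(N), rounded to the nearest whole number."""
    return round(1 + 3.322 * math.log10(N))

def round_width(r_over_n, decimals=0):
    """Round R/n to the precision of the observations (as in Example 8)."""
    value = round(r_over_n, decimals)
    return int(value) if decimals == 0 else value

def class_limits(lcl1, cw, n, u=1):
    """Return [(LCL_i, UCL_i)] with LCL_{i+1} = LCL_i + cw, UCL_i = LCL_i + cw - u."""
    limits = []
    for i in range(n):
        lcl = lcl1 + i * cw
        ucl = lcl + cw - u
        limits.append((lcl, ucl))
    return limits

print(number_of_classes(100))       # suggested number of classes for N = 100
print(round_width(6.8263))          # whole-number data
print(round_width(6.8263, 1))       # data to one decimal place
print(round_width(6.8263, 2))       # data to two decimal places
print(class_limits(5, 5, 5))        # five classes of width 5 starting at 5
```

The last call reproduces the class limits 5 – 9, 10 – 14, ..., 25 – 29 used in the exercise above.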
Example 9. The number of customers on 30 consecutive days in a
supermarket was listed as follows:

20 48 65 25 48 49
35 25 72 42 22 58
53 42 23 57 65 37
18 65 37 16 39 42
49 68 69 63 29 67
a. Construct a GFD with a suitable number of classes
b. Complete the distribution obtained in (a) with class boundaries & class
marks
Solution: i. Range = Largest value – smallest value
= 72 – 16 = 56
ii. N = 30 (total number of observations)
Therefore, the number of classes is
n = 1 + 3.322 log 30
= 1 + 3.322 (1.4771)
= 5.9
Hence a suitable number of classes, n, is chosen to be 6.
iii. Class width:

$$cw = \frac{\text{Range}}{n} = \frac{56}{6} = 9.33$$

For the sake of convenience, take cw to be 10 (note that it is also
possible to choose the cw to be 9).

iv. Take the lower limit of the 1st class (LCL1) to be 16 and u = 1,
i.e. LCL1 = 16 and UCL1 = LCL1 + cw – u = 16 + 10 – 1 = 25
LCL2 = LCL1 + cw = 16 + 10 = 26 UCL2 = UCL1 + cw = 25 + 10 = 35
LCL3 = LCL2 + cw = 26 + 10 = 36 UCL3 = UCL2 + cw = 35 + 10 = 45
and so on for the remaining classes. Therefore, the GFD would be:
a)

Class (xi)    Frequency (fi)
16 – 25       7
26 – 35       2
36 – 45       6
46 – 55       5
56 – 65       6
66 – 75       4

b)

Class (xi)    Frequency (fi)    CBi            cmi
16 – 25       7                 15.5 – 25.5    20.5
26 – 35       2                 25.5 – 35.5    30.5
36 – 45       6                 35.5 – 45.5    40.5
46 – 55       5                 45.5 – 55.5    50.5
56 – 65       6                 55.5 – 65.5    60.5
66 – 75       4                 65.5 – 75.5    70.5
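
The construction in Example 9 can be reproduced with a short sketch. It assumes the choices made in the solution (LCL1 = 16, cw = 10, six classes, u = 1); the function name `grouped_fd` is hypothetical.

```python
# Daily customer counts from Example 9.
customers = [20, 48, 65, 25, 48, 49,
             35, 25, 72, 42, 22, 58,
             53, 42, 23, 57, 65, 37,
             18, 65, 37, 16, 39, 42,
             49, 68, 69, 63, 29, 67]

def grouped_fd(data, lcl1, cw, n, u=1):
    """Build rows (LCL, UCL, frequency, LCB, UCB, class mark) for inclusive classes."""
    rows = []
    for i in range(n):
        lcl = lcl1 + i * cw
        ucl = lcl + cw - u
        freq = sum(lcl <= x <= ucl for x in data)   # observations in this class
        lcb, ucb = lcl - u / 2, ucl + u / 2          # class boundaries
        rows.append((lcl, ucl, freq, lcb, ucb, (lcb + ucb) / 2))
    return rows

table = grouped_fd(customers, lcl1=16, cw=10, n=6)
for lcl, ucl, freq, lcb, ucb, cm in table:
    print(f"{lcl} - {ucl}   f = {freq}   CB = {lcb} - {ucb}   cm = {cm}")
```

Every observation falls into exactly one class, so the frequencies add up to 30.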

Exercise
Construct a grouped frequency distribution for the following ages of 50 persons
with 6 classes.
37 40 69 35 36 70 72 62 36 72
65 64 47 59 55 42 45 50 46 65
54 63 51 50 61 60 58 58 56 58

55 45 49 51 50 56 44 60 70 44
52 43 55 46 42 62 57 48 60 55
I. CUMULATIVE FREQUENCY DISTRIBUTION (CFD)

It is the collection of values of a variable above or below specified values in a
distribution. CFD is of two types.
a. ‘Less Than’ Cumulative Frequency Distribution (< CFD): shows the
number of cases lying below the upper class boundary of each class.

b. ‘More Than’ Cumulative Frequency Distribution (> CFD): shows the
number of cases lying above the lower class boundary of each class.

Remark: A frequency distribution does not tell us directly the number of
units above or below specified values of the classes. This can be determined
from a cumulative frequency distribution.

Example 11. Consider the following frequency distribution with a class width of 4 (UCB – LCB = 6.5 – 2.5).

Class (xi)    Frequency (fi)    Less than cumulative    More than cumulative
                                frequency (< cfi)       frequency (> cfi)
3 – 6         4                 4                       30
7 – 10        7                 11                      26
11 – 14       10                21                      19
15 – 18       6                 27                      9
19 – 22       3                 30                      3

This means that, from the “less than” cumulative frequency distribution, there are 4
observations less than 6.5, 11 observations below 10.5, etc., and from the “more
than” cumulative frequency distribution, 30 observations are above 2.5, 26
above 6.5, etc.
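
Both cumulative columns of Example 11 are running totals, which can be sketched as follows. The variable names are introduced here for illustration.

```python
from itertools import accumulate

# Class frequencies of Example 11 (classes 3-6, 7-10, 11-14, 15-18, 19-22).
freqs = [4, 7, 10, 6, 3]

# "Less than" cumulative frequencies: running total from the first class.
less_than = list(accumulate(freqs))

# "More than" cumulative frequencies: running total from the last class,
# then reversed back into class order.
more_than = list(accumulate(reversed(freqs)))[::-1]

print(less_than)
print(more_than)
```

The first list pairs with the upper class boundaries (6.5, 10.5, ...) and the second with the lower class boundaries (2.5, 6.5, ...).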
II. RELATIVE FREQUENCY DISTRIBUTION (RFD)

It enables the researcher to know the proportion or percentage of cases in each
class. Relative frequencies can be obtained by dividing the frequency of each
class by the total frequency. A relative frequency can be converted into a percentage frequency
by multiplying it by 100%, i.e.

$$Rf_i = \frac{f_i}{n}$$

Where Rfi – is the relative frequency of the ith class
fi – is the frequency of the ith class
n – is the total number of observations
Note: Pfi = Rfi × 100%
Where Pfi is the percentage frequency of the ith class.

Example 14: The relative and percentage frequency distribution of Example 11
is:

xi         fi    Rfi      %freq. (Pfi)
3 – 6      4     4/30     4/30 × 100
7 – 10     7     7/30     7/30 × 100
11 – 14    10    10/30    10/30 × 100
15 – 18    6     6/30     6/30 × 100
19 – 22    3     3/30     3/30 × 100
Total      30    1        100%
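
The fractions in the table can be evaluated with a short sketch of the two formulas Rfi = fi / n and Pfi = Rfi × 100%. The variable names are illustrative.

```python
# Class frequencies from the table above.
freqs = [4, 7, 10, 6, 3]
n = sum(freqs)                      # total number of observations

rel = [f / n for f in freqs]        # relative frequencies Rf_i
pct = [100 * r for r in rel]        # percentage frequencies Pf_i

for f, r, p in zip(freqs, rel, pct):
    print(f"f = {f:>2}   Rf = {r:.4f}   Pf = {p:.2f}%")
print(f"Total: {n}   {sum(rel):.4f}   {sum(pct):.2f}%")
```

The relative frequencies always sum to 1 and the percentage frequencies to 100%, up to rounding.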

QUESTIONS

1. Determine whether each statement is true or false.

a. A frequency distribution is the organization of raw data, in table
form, that lists values or classes with their corresponding
frequencies.
b. The midpoint of a class can be obtained by adding the upper and
lower limits, and dividing by 2.
c. If the gap between any two successive classes is one and the limits
of a class are 10-19, then the width of the class is 9.
d. If the limits of a class in a frequency distribution are 26-30, then
the boundaries are 25.5-30.5.
e. When data is first collected, it is called raw data.
f. A frequency distribution should contain between 50 and 100
classes.
g. It is not important to keep the width of each class the same in a
frequency distribution.

2. Classify each variable as discrete or continuous.
a) Number of cartons of milk manufactured each day.
b) Temperatures of airplane interiors at a given airport.
c) Lifetimes of transistors in a stereo set.
d) Weights of newborn calves.
3. The ages of 100 employees in a factory were surveyed. The results
were as follows.

32 21 28 31 35 46 48 49 49 48
36 37 22 31 28 34 20 45 44 48
38 33 33 23 28 29 33 26 36 30
43 42 32 36 24 27 27 32 45 45
39 39 38 32 33 25 30 28 37 36
42 43 38 40 35 34 20 30 36 32
40 38 38 40 46 36 35 21 31 35
41 42 39 40 46 44 32 37 22 27
41 39 40 38 44 45 48 36 32 23
40 41 40 44 49 49 49 49 37 33
Construct a Grouped Frequency Distribution (GFD) with five classes for the above data.

PRESENTATION OF DATA
INTRODUCTION
This unit deals with organizing a set of raw data into a Frequency Distribution (FD)
and describing the distribution graphically by a histogram, a frequency polygon and a cumulative
frequency curve (ogive). Other types of numerical information will be summarized and
presented in the form of a bar chart, pie chart or pictogram.
Definition:

Presentation is a statistical procedure of arranging and putting data in the form of tables,
graphs, charts and/or diagrams.

A. HISTOGRAM

After you complete a frequency distribution, your next step will be to construct a “picture” of
these data values using a histogram. A histogram is a graph consisting of a series of adjacent
rectangles whose bases are equal to the class widths of the corresponding classes and whose
heights are proportional to the corresponding class frequencies. Here, class boundaries are
marked along the horizontal axis (x-axis) and the class frequencies along the vertical axis
(y-axis) according to a suitable scale. A histogram describes the shape of the data. You can use it to answer
quickly such questions as: are the data symmetric? And where do most of the data values lie?

Example 1. Consider the following GFD and construct a histogram.

Class (xi)    Frequency (fi)
3 – 6         4
7 – 10        7
11 – 14       10
15 – 18       6
19 – 22       3
Total         30

Solution:
[Figure: Histogram for the above distribution. Horizontal axis: class boundaries (CBi), marked at 2.5, 6.5, 10.5, 14.5, 18.5 and 22.5; vertical axis: class frequency (fi).]
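
The rectangles of this histogram can be derived from the class limits as sketched below. The sketch only computes the bar geometry and prints a rough text picture; drawing it with a plotting library is left out, and the variable names are illustrative.

```python
# (LCL, UCL, frequency) for each class of Example 1.
classes = [(3, 6, 4), (7, 10, 7), (11, 14, 10), (15, 18, 6), (19, 22, 3)]
u = 1   # unit of measurement: gap between successive classes

# Each bar runs from the lower to the upper class boundary,
# and its height is the class frequency.
bars = [(lcl - u / 2, ucl + u / 2, f) for lcl, ucl, f in classes]

for left, right, height in bars:
    print(f"{left:>5} - {right:<5} | {'#' * height}")
```

Because class boundaries are used, the right edge of each bar coincides with the left edge of the next, so the rectangles are adjacent.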

CYP 1. Construct a histogram for the following distribution.

Class (xi)    Frequency (fi)
5 – 10        4
10 – 15       7
15 – 20       9
20 – 25       12
25 – 30       6
30 – 35       5

B. FREQUENCY POLYGON

It is a line graph of a frequency distribution. Although a histogram does demonstrate the shape of
the data, the shape can perhaps be illustrated more clearly by using a frequency polygon. Here,
you merely connect the centers of the tops of the histogram bars (located at the class midpoints)
with a series of straight lines. The resulting figure is a frequency polygon. The class marks
are plotted along the x-axis and the class frequencies along the y-axis. Empty classes are
included at each end so that the curve is anchored to the x-axis.
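
The vertices of a frequency polygon can be computed as sketched below, using the distribution of Example 1 above as sample input. The function name `polygon_points` is hypothetical, and equal class widths are assumed.

```python
def polygon_points(classes, u=1):
    """Return (class mark, frequency) vertices, with an empty class at each end.

    classes: list of (LCL, UCL, frequency) with equal class widths
    u: unit of measurement
    """
    marks = [((lcl - u / 2) + (ucl + u / 2)) / 2 for lcl, ucl, _ in classes]
    freqs = [f for _, _, f in classes]
    width = marks[1] - marks[0]                     # distance between class marks
    xs = [marks[0] - width] + marks + [marks[-1] + width]
    ys = [0] + freqs + [0]                          # empty classes have frequency 0
    return list(zip(xs, ys))

classes = [(3, 6, 4), (7, 10, 7), (11, 14, 10), (15, 18, 6), (19, 22, 3)]
points = polygon_points(classes)
print(points)
```

Joining these points in order with straight lines gives the polygon; the first and last points lie on the x-axis.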

Example 2. Construct a frequency polygon for the frequency distribution given in Example 9.
Solution:
[Figure: A frequency polygon for the distribution in Example 9. Horizontal axis: class marks (cmi); vertical axis: frequency (fi).]

CYP 2 Construct a frequency polygon for the frequency distribution given under CYP 1

C. CUMULATIVE FREQUENCY CURVE, (OGIVE)

It is the graphic representation of a cumulative frequency distribution. Ogives are of two kinds:
the “less than” ogive (< ogive) and the “more than” ogive (> ogive).
A) “Less than” ogive: here, the upper class boundaries are plotted against the “less than”
cumulative frequencies of the respective classes, and the points are joined by straight lines.
Example 3. Draw a “less than” ogive for the frequency distribution in Example 11.

Solution:
[Figure: A less than ogive showing the frequency distribution above. Horizontal axis: upper class boundary (UCBi), marked at 6.5, 10.5, 14.5, 18.5 and 22.5; vertical axis: less than cumulative frequency (< cfi).]
B) “More than” ogive: here, the lower class boundaries are plotted against the “more than”
cumulative frequencies of their respective classes, and the points are joined by straight lines.

Example 4. Draw a “more than” ogive for the frequency distribution in Example 11.
Solution:
[Figure: A more than ogive for the above frequency distribution. Horizontal axis: lower class boundaries (LCBi), marked at 2.5, 6.5, 10.5, 14.5 and 18.5; vertical axis: more than cumulative frequency (> cfi).]

D. LINE GRAPH

It represents the relationship between time (on the x-axis) and the values of a variable (on the y-axis).
The values are recorded with respect to the time of occurrence.

Example 5. Draw a line graph for the following time series.

Year 1986 1987 1988 1989 1991

Values 20 10 30 15 1

Solution:
[Figure: A line graph showing the above time series. Horizontal axis: year (1986 to 1991); vertical axis: values.]

E. VERTICAL LINE GRAPH

It is a graphical representation of discrete data (or characteristics expressed in whole numbers)
with respect to their frequencies. Vertical solid lines are used to indicate the frequencies.

Example 6. Draw a vertical line graph for the following data.

Family                A    B    C    D    E
Number of children    2    1    5    4    3

Solution:
[Figure: Vertical line graph showing the number of children in families A, B, C, D and E. Horizontal axis (X): family; vertical axis (Y): number of children.]
F. BAR CHART (BAR DIAGRAM)

Histograms, frequency polygons and ogives are used for data having an interval or ratio level of
measurement. Other ways of presenting statistical data, suitable for particular kinds of
situations, are bar charts, pie charts and pictographs.

A bar chart is a series of equally spaced bars of uniform width, where the height (length) of a bar
represents the amount (magnitude) or frequency corresponding to a category. Bars may be
drawn horizontally or vertically. Vertical bar graphs are generally preferred, as they make it easy to compare
one bar with another.

TYPES OF BAR CHARTS

A. SIMPLE BAR CHART:


It represents a single set of data (one variable) classified into different categories. One bar is
drawn for each category, with a height equal to its frequency.

Example 18: The revenue (in millions of Birr) of company X from 1980 to 1982 is given below.

Year Revenue
1980 50
1981 150
1982 200

Solution:
[Figure: A simple bar chart showing the revenues of company X from 1980 to 1982. Horizontal axis: Year (1980, 1981, 1982). Vertical axis: Revenue, scaled from 0 to 250.]
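
A simple bar chart can also be roughed out in plain text to check the relative bar lengths before drawing it. The sketch below is only an illustration: the function name text_bar_chart and the scale of one symbol per 10 million Birr are assumptions, and the bars are drawn horizontally for convenience.

```python
def text_bar_chart(data, unit=10, symbol="#"):
    """Return one text line per category; bar length = value / unit, rounded."""
    width = max(len(str(label)) for label in data)
    lines = []
    for label, value in data.items():
        bar = symbol * round(value / unit)
        lines.append(f"{str(label):<{width}} | {bar} {value}")
    return lines


# Revenue of company X in millions of Birr (Example 18).
revenue = {1980: 50, 1981: 150, 1982: 200}
chart = text_bar_chart(revenue)
print("\n".join(chart))
```

Because every bar uses the same scale and the same symbol width, the bar lengths stand in the ratio 50 : 150 : 200, as the definition of a bar chart requires.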

B. MULTIPLE BAR CHART:
Here two or more bars are grouped within each category to represent two or more interrelated
sets of data, each bar being drawn to the height of its frequency. The bars of related variables
are placed adjacent to each other for every set of values. These charts can be used when the
overall total is not required. Each bar is shaded or colored differently, and a key is given to
distinguish them.

Example 19: The following table shows the production of wheat and maize in hundreds of
quintals.

Year   Maize   Wheat
1980   40      80
1981   20      60
1982   60      100

Solution:
[Figure: Multiple bar chart of wheat and maize production from 1980 to 1982, with a maize bar and a wheat bar side by side for each year. Horizontal axis: Year. Vertical axis: Number of quintals, scaled from 0 to 100. Key: maize, wheat.]

C. SUBDIVIDED BAR CHART:


It is used to present data by subdividing a single bar in proportion to the frequencies of its
components. Each portion of the bar is then shaded or colored, and a key is given to distinguish them.

Example 20: The number of quintals of wheat and maize (in millions of quintals) produced by
country X in the indicated years is given below.

Year   Wheat   Maize
1980   150     150
1981   300     200
1982   350     100

Solution:
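[Figure: Subdivided bar chart of the number of quintals of wheat and maize produced by country X, one bar per year, each bar split into a wheat portion and a maize portion. Horizontal axis: Year (1980, 1981, 1982). Vertical axis: Number of quintals, scaled from 0 to 600. Key: Wheat, Maize.]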
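
To draw a subdivided bar, each component must start where the previous one ends. The sketch below computes the lower and upper edge of each portion for the data of Example 20. The function name stacked_segments is hypothetical, and placing wheat at the bottom of each bar is an assumption about the drawing order.

```python
def stacked_segments(rows, components):
    """For each row, return {component: (bottom, top)} for a subdivided bar."""
    result = {}
    for key, values in rows.items():
        bottom = 0
        segments = {}
        for name in components:
            top = bottom + values[name]   # this portion ends here
            segments[name] = (bottom, top)
            bottom = top                  # the next portion starts here
        result[key] = segments
    return result


# Production in millions of quintals (Example 20).
production = {
    1980: {"Wheat": 150, "Maize": 150},
    1981: {"Wheat": 300, "Maize": 200},
    1982: {"Wheat": 350, "Maize": 100},
}
segments = stacked_segments(production, ["Wheat", "Maize"])
for year, seg in segments.items():
    print(year, seg)
```

The upper edge of the last portion is the total for the year, which is why a subdivided bar chart, unlike a multiple bar chart, also shows the overall total.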

D. PERCENTAGE BAR CHART:


It is a subdivided bar chart in which percentages, rather than the actual frequencies, are used for
each component.
Example 21: Construct a percentage bar chart for the data in Example 20.

Solution:
Year   % of Wheat Production    % of Maize Production
1980   150/300 × 100 = 50       150/300 × 100 = 50
1981   300/500 × 100 = 60       200/500 × 100 = 40
1982   350/450 × 100 ≈ 78       100/450 × 100 ≈ 22

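[Figure: Percentage bar chart of wheat and maize production from 1980 to 1982, one bar per year, each bar reaching 100% and split into a wheat portion and a maize portion. Horizontal axis: Year. Vertical axis: Percentage produced, scaled from 0% to 100%. Key: wheat, maize.]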
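
The percentages in the table above can be checked with a few lines of Python. This is only a sketch: the function name percentage_shares is hypothetical, and the results are rounded to whole numbers as in the table, so in general the rounded shares of a bar need not add up to exactly 100.

```python
def percentage_shares(values):
    """Return each component as a whole-number percentage of the bar total."""
    total = sum(values.values())
    return {name: round(100 * v / total) for name, v in values.items()}


# Production in millions of quintals (Example 20).
production = {
    1980: {"Wheat": 150, "Maize": 150},
    1981: {"Wheat": 300, "Maize": 200},
    1982: {"Wheat": 350, "Maize": 100},
}
shares = {year: percentage_shares(v) for year, v in production.items()}
for year, pct in shares.items():
    print(year, pct)
```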

G. PIE CHART

A pie chart is a circle divided into sectors whose areas are proportional to the values of the
components they represent. It shows the components as percentages of the whole rather than in
absolute magnitude. The angle formed at the center by each sector must be proportional to the
value it represents.

Example 22: The monthly expenditure of a certain family is given below.

Items           Expenditure   % Proportion (Pfi)       Degrees (360° × Rfi)
Clothing        100           100/1000 × 100 = 10      100/1000 × 360° = 36°
Food            350           350/1000 × 100 = 35      350/1000 × 360° = 126°
House Rent      250           250/1000 × 100 = 25      250/1000 × 360° = 90°
Miscellaneous   300           300/1000 × 100 = 30      300/1000 × 360° = 108°
Total           1000          100%                     360°

Solution: The pie chart for the above expenditure is as follows.

[Figure: Pie chart with four sectors, each labelled with its expenditure: Food (350), Miscellaneous (300), House rent (250) and Clothing (100).]
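
The percentage and the central angle of each sector follow from the same ratio, value divided by total. A minimal Python sketch for the expenditure data of Example 22 is given below; the function name pie_sectors is hypothetical.

```python
def pie_sectors(values):
    """Return {item: (percentage, angle in degrees)} for a pie chart."""
    total = sum(values.values())
    return {
        name: (100 * v / total, 360 * v / total)
        for name, v in values.items()
    }


# Monthly expenditure of the family (Example 22).
expenditure = {
    "Clothing": 100,
    "Food": 350,
    "House Rent": 250,
    "Miscellaneous": 300,
}
sectors = pie_sectors(expenditure)
for item, (percent, degrees) in sectors.items():
    print(f"{item:<14} {percent:5.1f}%  {degrees:6.1f} degrees")
```

As a check, the percentages should add up to 100 and the angles to 360.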

H. PICTOGRAPH (PICTOGRAM)

A pictograph is a graph that uses symbols or pictures to represent data.

Example 23: To compare the population of a country from 1990 to 1992, we simply draw
pictures of people, where each picture represents 1,000,000 people.
[Pictograph: one row of person symbols for each of the years 1992, 1991 and 1990. Key: one symbol = 1,000,000 people.]

QUESTION

Direction: Answer each of the following questions.


1. Determine whether each statement is true or false.
a. The ogive uses cumulative frequencies.
b. A histogram can be drawn by using vertical or horizontal bars.
c. In the construction of a frequency polygon, the class limits are used for the x-axis.
d. Data collected over a period of time can be graphed by using a pie chart.
e. When the data is represented graphically by symbols or pictures, the graph is
called a frequency curve.
2. Construct a histogram, frequency polygon, and both ogives to represent the data shown
below.

Class Boundaries (CBi)   Frequency (fi)
5.5-10.5                 1
10.5-15.5                2
15.5-20.5                3
20.5-25.5                5
25.5-30.5                4
30.5-35.5                3
35.5-40.5                2

3. Construct a horizontal and a vertical bar chart for the areas (in square kilometers) of each of the
great lakes in Ethiopia.
Lake    Area (km²)
Tana    3600
Abaya   1160
Chamo   551
Ziway   434
Shala   409
