
Data Collection - Presentation

The document provides an overview of data collection methods, distinguishing between primary and secondary data, and describes various techniques for summarizing and presenting statistical data. It emphasizes the importance of graphical and tabular representations, detailing types of graphs like bar diagrams and pie charts, as well as the construction of frequency distribution tables. Additionally, it outlines the principles for preparing tables and the steps involved in creating frequency distribution tables.

Uploaded by

nick bd
Copyright
© © All Rights Reserved

Data collection

Definition of statistical data: The quantitative or qualitative information collected in
any enquiry is termed data. Data consist of information usually obtained by some process
of counting, measuring, or observation for some statistical investigation, e.g. the
marital status of the teachers in Asmara University, the daily temperature in Chittagong
City for one year, or the heights of students in the Premier University.

Methods of collecting data (primary and secondary data):

According to the source of collection, data can be classified as primary or secondary.
Data collected directly from the field for some statistical enquiry are called primary
data, while data collected from some existing source, such as a statistical office,
journals, periodicals, or newspapers, are called secondary data. For example, if
information about certain characteristics of the students taking course Sta-501 in
Premier University is collected by interviewing all the students individually, the data
are primary data, while if the same information is collected from an existing record in
the Students' Record Office of the University, the data are secondary data.
Primary data can be collected by any of the following methods:
a. Direct personal enquiry method
b. Indirect oral investigation
c. By filling of schedule
d. By mailed questionnaire
e. Collecting information from local agents and correspondents
f. By direct observation
Secondary data, in contrast, are collected from existing records, such as bureau records,
newspapers, journals, magazines, official publications, etc.
Condensation or Summarization of data
The collected statistical data can be summarized in two forms, viz.
i) graphical form and ii) tabular form (frequency distribution).

Graphical representation of statistical data
In addition to presenting statistical data in tabular form, one can present the same data
through visual aids, namely graphs and diagrams. This is one of the most convincing and
appealing ways in which statistical data may be presented. Such a presentation gives a
bird's eye view of the entire data, so the information presented in this form is easily
understood. When frequency distributions are constructed primarily to condense large sets
of data into an 'easy to digest' form, graphical and diagrammatic representations are
preferred.
The most common forms of graphs and diagrams are the i) bar diagram, ii) pie chart, iii)
line diagram, iv) scatter diagram, v) histogram, vi) frequency polygon and frequency curve,
and vii) cumulative frequency polygon and cumulative frequency curve, or ogive. The bar
diagram and pie chart are constructed for categorical data, while the remaining graphs and
diagrams are constructed for data measured on an interval scale, i.e. quantitative data.
Moreover, the histogram, frequency polygon and ogive are used for graphical presentation
of data already summarized in a frequency distribution table.

Bar diagram
A bar diagram, also known as a bar chart, consists of separate horizontal or vertical bars
whose lengths are proportional to the magnitudes they represent. The widths of the bars
have no significance and are chosen only to make the chart look attractive. In presenting
the bars, there is no need for a continuous scale. Bar diagrams may be of different types,
e.g. vertical, horizontal, component, and multiple.
To illustrate the types of bar diagram we consider different sets of data, because
different types of data are represented by different types of bar diagram. For example,
let us consider the following set of data.

Table. Population (in millions) of some country in the last five censuses

Year      Male population   Female population   Total population
Year 1    66.55             63.45               130
Year 2    60.87             59.13               120
Year 3    52.93             47.07               100
Year 4    38.66             36.34               75
Year 5    31.25             28.75               60

[Figure: Horizontal bar diagram. Vertical axis: census year; horizontal axis: total population (0 to 150).]

[Figure: Component bar diagram. Male and female population (0 to 150) by census year (1 to 5).]

[Figure: Multiple bar diagram. Male, female and total population (0 to 150) by census year (1 to 5).]
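
The idea that bar lengths are proportional to the magnitudes they represent can be sketched in a few lines of Python. This is only an illustration: the function name `bar_lengths` and the 40-character maximum width are assumptions made here, and the totals are the total population figures from the census table above.

```python
# Total population (in millions) per census year, from the census table.
totals = {"Year 1": 130, "Year 2": 120, "Year 3": 100, "Year 4": 75, "Year 5": 60}

def bar_lengths(values, max_width=40):
    """Scale each value to a bar length (in characters) proportional to its size.

    max_width is an arbitrary choice for this sketch: the largest value
    gets a bar of exactly max_width characters.
    """
    largest = max(values.values())
    return {label: round(value / largest * max_width)
            for label, value in values.items()}

lengths = bar_lengths(totals)

# Print a simple horizontal bar diagram, one bar per census year.
for label, length in lengths.items():
    print(f"{label} | {'#' * length} {totals[label]}")
```

Only the lengths carry information here; every bar has the same one-line thickness, which mirrors the remark that the width of a bar has no significance.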

Pie Chart:
Also known as the pie diagram, the pie chart is an effective way of presenting data in
circular form, especially categorical data, when the whole quantity is taken as 100
percent. Data other than categorical can also be used for constructing a pie chart after
suitable and meaningful classification or grouping of the data. The pie chart consists of
a circle subdivided into sectors whose areas are proportional to the various parts into
which the whole quantity is divided. The sectors may be shaded or colored differently to
show their individual contributions to the whole. As a circle consists of 360°, each part
of the whole quantity is expressed as a proportion of 360°. For example, suppose the whole
is divided into four sectors A, B, C and D with corresponding sizes N1, N2, N3 and N4, so
that N1 + N2 + N3 + N4 = N. Then the angle for A is (N1/N) × 360°, for B is (N2/N) × 360°,
for C is (N3/N) × 360° and for D is (N4/N) × 360°. For example, if we represent the total
population of the five years given in table 3 with the help of a pie chart, the whole is
130 + 120 + 100 + 75 + 60 = 485, so the angle allotted to year 1 is (130/485) × 360° =
96.49°, to year 2 is (120/485) × 360° = 89.07°, to year 3 is 74.23°, to year 4 is 55.67°
and finally to year 5 is 44.54°. Thus the pie chart is as follows:

[Figure: Pie chart of total population, with one sector for each census year (1 to 5).]
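
The angle rule (N_k / N) × 360° can be checked with a short Python sketch. The function name `pie_angles` is an assumption made for illustration; the totals are the total population figures of the five census years from the population table.

```python
# Total population (in millions) per census year, from the population table.
totals = {"Year 1": 130, "Year 2": 120, "Year 3": 100, "Year 4": 75, "Year 5": 60}

def pie_angles(values):
    """Return the sector angle in degrees for each part: (part / whole) * 360."""
    whole = sum(values.values())
    return {label: value / whole * 360 for label, value in values.items()}

angles = pie_angles(totals)

for label, angle in angles.items():
    print(f"{label}: {angle:.2f} degrees")

# The sector angles must add up to the full circle.
print(f"Sum of angles: {sum(angles.values()):.2f} degrees")
```

Checking that the angles add up to 360° is a quick way to catch an arithmetic slip in the whole N or in one of the sectors.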

Exercise: Population (in million) in six places of a country in a certain period are given
below:

Place: I II III IV V VI
Population: 34 21.9 27.5 13.2 7.8 7.1
Represent the data by bar diagram and Pie chart.

Tabular Presentation: Frequency distribution


Before dealing with the frequency distribution, or frequency distribution table, it is
necessary to be acquainted with the following concepts.
Frequency: The number of observations or values falling into each group or class is
called the class frequency, or simply the frequency. The frequency thus shows how many
times a particular value or observation is repeated. For example, if in a set of data the
value 10 occurs 6 times, then 6 is the frequency of 10.
Class, class interval and class width: In a frequency distribution, the raw data are
arranged or grouped into specified categories. Each category is termed a class, the
domain of a class is called the class interval, and the numerical difference between the
lower and upper limits of a class (the range of the class) is called the class width or
width of a class. Although there is a basic difference between class width and class
interval, the width of a class is also popularly known as the class interval. Moreover,
classes written in terms of class intervals are not continuous; when the class intervals
are made continuous, the resulting end points are called class boundaries.

Open and closed interval: An open interval is an interval with one of its limits (on
either side) indeterminate; for example, age less than 45 years and weight more than 45
pounds are open intervals. A closed interval is one in which both the upper and lower
limits are determined, for example 40-45 years, 35-45 pounds, etc.
Principles of preparing a table: Although there is no hard and fast rule for preparing
tables, certain general rules may be followed:
a) A table should have a title that explains the content of the table clearly.
b) A table should present information as simply as possible and avoid unnecessary
details.
c) Each row and column of the table should be labeled clearly and concisely.
d) Column and row totals should be provided, if necessary.
e) The units of the data and the sources of the data, if any, should be given in the table.
f) Any explanation of the data inside the table should be given beneath the table as a
footnote.

Different Columns of a frequency distribution table


The main columns in a frequency distribution table are:
a. Class interval or class boundaries
b. Tally marks
c. Frequencies
Sometimes, for convenience, three other columns are also computed in a frequency
distribution table, e.g.
d. Cumulative frequencies
e. Relative frequencies and
f. Relative cumulative frequencies

Steps in constructing a frequency distribution table
There is no hard and fast rule for the construction of a frequency distribution table; the
statistician constructs it to suit the conventions of his research. However, one can
follow the following steps in the construction of a frequency distribution table:
a) Determine the range of the data.
b) Determine the number of classes of the frequency distribution table. Usually, the
decision about the number of classes is taken on the basis of the range of the data
series, so that the class interval of each class is a multiple of 5. For this, the
range of the distribution is first divided by 5 and by 25; any suitable number
between the two results can then be taken as the number of classes. This is, however,
a very crude method of determining the number of classes. Sometimes the number of
classes is also computed using the formula i = 1 + 3.322 log10(n), where 'i' is the
number of classes and 'n' is the total number of observations under study. In any
case, the class interval is to be selected so that the number of classes is neither
too small nor too large.
c) The main columns of the frequency distribution table are the class interval, the
tally marks and the frequency. The lower limit of the first class is generally the
lowest value of the data, and successive classes are determined by maintaining the
class interval until the largest observation of the data is covered.
d) In the case of ungrouped data, put a tally mark in the tally marks column against the
class interval to which each observation belongs, and continue this process until the
last observation of the data set.
e) The total number of tally marks should be equal to the number of observations.

Let us clarify the above mentioned method of construction of a frequency table with the
following example.

Suppose the marks obtained by 100 students in a course in Statistics in a certain
examination are as follows:
54 32 38 44 48 41 30 43 46 41
47 32 26 25 41 33 51 43 45 32
51 50 34 38 44 38 54 32 39 41
42 38 41 25 45 36 40 50 52 30
41 32 27 30 40 42 52 48 46 37
48 39 26 54 47 49 38 26 27 49
47 49 32 51 49 33 47 55 25 28
37 36 44 53 48 54 29 37 39 40
50 30 55 48 36 34 27 53 28 55
48 36 46 48 32 29 54 49 47 54

In this example, the smallest observation is 25 and the largest observation is 55, so the
range is 55 - 25 = 30. Dividing 30 by 5 yields 6, and one further class is needed to
include the largest mark, 55, so we can construct the frequency distribution table using
a class interval of 5 in seven classes as follows:

The frequency distribution of the marks of 100 students in Statistics

Class interval   Tally marks               Frequency   Cumulative frequency
25-29            //// //// ///             13          13
30-34            //// //// ////            15          28
35-39            //// //// ////            15          43
40-44            //// //// //// /          16          59
45-49            //// //// //// //// //    22          81
50-54            //// //// //// //         17          98
55-59            //                        2           100

(Each complete group //// stands for five tally marks.)

The tally marks and frequency columns of the above table mean that 13 observations or
marks belong to the class interval 25-29, 15 observations or marks belong to the class
interval 30-34, 16 observations or marks belong to the class interval 40-44 and so on. It
may also be interpreted as that 13 students got marks between 25-29, 15 students got
marks between 30-34 and so on.
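
The counting behind such a table can be sketched in Python. This is a minimal illustration for integer data: the function name `frequency_table` and its parameters are assumptions, and the short `sample` list is hypothetical (it is not the list of 100 marks given above).

```python
def frequency_table(data, start, width, n_classes):
    """Count integer observations in classes start..start+width-1, and so on.

    Returns one (class interval, frequency, cumulative frequency) row per class.
    """
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1            # inclusive upper class limit
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append((f"{lower}-{upper}", freq, cumulative))
    return rows

# Hypothetical short sample of marks, used only to show the mechanics.
sample = [25, 27, 31, 34, 34, 38, 41, 44, 47, 49, 49, 52, 55]

table = frequency_table(sample, start=25, width=5, n_classes=7)

for interval, freq, cum in table:
    print(f"{interval:>7} {freq:>3} {cum:>3}")
```

As with tally marks, the last cumulative frequency should equal the number of observations, which gives a simple check on the counting.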

Since the marks are integers (discrete numbers), the class intervals are written as in the
above table. However, we could also write the class intervals as 25-30, 30-35, 35-40 and
so on. In that case a convention about the boundaries is needed to settle in which class
observations such as 30, 35 or 40 are counted. Generally, the upper class boundary is
excluded from each class interval. That means that the first class interval contains only
the observations less than 30, the second class interval contains the observations from
30 up to but not including 35, and so on.
For the interval 30-34, the smaller number, 30, is the lower class limit and the larger
number, 34, is the upper class limit; the number 29.5 is called the lower class boundary
and the number 34.5 is called the upper class boundary. The number of observations
falling in a particular class is called the class frequency and is denoted by the letter
'f'. The numerical difference between the upper and lower class boundaries is defined as
the class width. The class widths of the classes of a frequency distribution may be equal
or unequal. A frequency distribution with equal class widths is easier to handle; hence
equal class widths are generally preferred, except in some exceptional cases.
Again, if we consider the algebraic formula for determining the number of classes, it
gives i = 1 + 3.322 log10(100) = 1 + 3.322 × 2 = 7.644 (approx. 8); in such a case one can
choose the number of classes as 7 or 8.
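
The formula i = 1 + 3.322 log10(n) is easy to evaluate for any number of observations. The sketch below does so for n = 100; the function name `number_of_classes` is an assumption made for illustration.

```python
import math

def number_of_classes(n):
    """Suggested number of classes from the formula i = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

i = number_of_classes(100)

print(f"i = {i:.3f}")
# The formula rarely gives a whole number, so the neighbouring integers
# are the natural candidates for the number of classes.
print("candidates:", math.floor(i), "or", math.ceil(i))
```

The result is only a guide; the final choice should still give class intervals that are convenient to read.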

Problem # 1. The marks obtained by 100 students in Statistics are as follows:
54 32 38 44 48 41 30 43 46 43
47 32 26 25 41 33 51 43 45 32
51 50 34 48 44 48 54 32 39 41
42 38 41 25 45 36 40 5 52 30
41 32 27 30 40 42 52 48 49 37
48 39 26 54 47 39 38 26 27 49
47 49 32 51 49 33 33 47 55 25
28 37 36 44 53 48 54 29 37 39
40 50 30 55 49 36 34 28 53 28
52 47 46 48 32 29 54 49 47 53
Construct a frequency distribution table with the above data.

Problem # 2. A study of air pollution in a city yielded the following daily readings of the
concentration of sulfur dioxide (in parts per million)
0.01 0.14 0.17 0.11 0.18 0.20 0.13 0.17 0.10 0.07
0.15 0.09 0.09 0.10 0.06 0.12 0.27 0.10 0.12 0.05
0.06 0.08 0.15 0.05 0.09 0.11 0.13 0.07 0.10 0.12
0.05 0.14 0.09 0.17 0.10 0.08 0.14 0.02 0.08 0.11
0.08 0.01 0.18 0.11 0.13 0.16 0.19 0.12 0.12 0.01
0.05 0.11 0.04 0.13 0.08 0.06 0.07 0.14 0.03 0.19
Construct a frequency distribution table with the given data.

Histogram
The most common form of graphical presentation of a frequency distribution is the
histogram. A histogram is constructed by placing the class boundaries on the horizontal
axis of a graph and the frequencies on the vertical axis. Each class is shown on the graph
by drawing a rectangle whose base extends from the lower to the upper class boundary and
whose height is the corresponding frequency of the class.
A histogram is basically a set of adjacent rectangles whose areas represent the
frequencies, while in a bar diagram the spacing of the bars is quite arbitrary and the
heights of the bars represent the frequencies. A bar diagram is a one-dimensional figure,
since only the lengths of the bars matter, while a histogram is a two-dimensional figure.
Again, a histogram is constructed for numerical data from a continuous frequency
distribution, and a bar diagram is usually constructed for categorical data.
Again we consider the following frequency distribution table.

Table 1: The frequency distribution of the marks of 100 students in Statistics

Class interval   Class boundaries   Frequency   Cumulative frequency
25-29            24.5 – 29.5        13          13
30-34            29.5 – 34.5        15          28
35-39            34.5 – 39.5        15          43
40-44            39.5 – 44.5        16          59
45-49            44.5 – 49.5        22          81
50-54            49.5 – 54.5        17          98
55-59            54.5 – 59.5        2           100

Thus the histogram for the above mentioned data is as follows:

[Figure: Histogram for the data given in table 1. Horizontal axis: class boundaries; vertical axis: frequency (0 to 25).]
Frequency Polygon

A frequency polygon provides an alternative to a histogram as a way of graphically
presenting a frequency distribution. The presentation involves placing the mid-values
of the classes on the horizontal axis and the frequencies on the vertical axis. However,
instead of using rectangles, as is done in the case of a histogram, we find the class
mid-points on the horizontal axis and then plot points directly above the class mid-points
at heights corresponding to the frequencies of the classes. The frequency polygon is then
formed by connecting the points with straight lines. Classes of zero frequency are added
at each end, so that the frequency polygon touches the horizontal axis at both ends of the
graph. This closes the figure and makes it a polygon. Let us again consider table 1 of
the marks of 100 students.

Table 2: The frequency distribution of the marks of 100 students in Statistics

Class interval   Mid-values   Frequency
25-29            27           13
30-34            32           15
35-39            37           15
40-44            42           16
45-49            47           22
50-54            52           17
55-59            57           2
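
The mid-values and class boundaries used in these tables follow directly from the class limits. The Python sketch below shows the arithmetic; the function names and the `gap` parameter (the distance between the upper limit of one class and the lower limit of the next, which is 1 for integer marks) are assumptions made for illustration.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Extend each class limit by half the gap between adjacent classes."""
    half = gap / 2
    return lower_limit - half, upper_limit + half

def mid_value(lower_limit, upper_limit):
    """Mid-value of a class: the average of its lower and upper limits."""
    return (lower_limit + upper_limit) / 2

# Class limits of the marks example.
class_limits = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]

boundaries = [class_boundaries(lo, hi) for lo, hi in class_limits]
mid_values = [mid_value(lo, hi) for lo, hi in class_limits]

for (lo, hi), (lb, ub), mid in zip(class_limits, boundaries, mid_values):
    print(f"{lo}-{hi}   {lb} - {ub}   {mid}")
```

The mid-value is the same whether it is computed from the class limits or from the class boundaries, because the boundaries extend the limits by the same amount on both sides.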

[Figure: Frequency polygon. Horizontal axis: mid values (classes 1 to 7); vertical axis: frequency (0 to 25).]

The histogram and the frequency polygon are equally good techniques for presenting
continuous data. The histogram is more often preferred when single distributions are
presented, while the frequency polygon is largely used for comparison of two or more
distributions.

Frequency curve
When a frequency polygon is smoothed through a free hand curve, then the curve is
known as frequency curve.

[Figure: Frequency curve and frequency polygon. Horizontal axis: mid values (classes 1 to 7); vertical axis: frequency (0 to 25).]
Ogive or cumulative frequency curve
An ogive is based on a cumulative frequency distribution; the frequencies are cumulated
simply by summing the class frequencies. Two types of cumulative distribution are used to
draw an ogive, the 'less than type' and the 'more than type'. The 'less than type' of
ogive is more popularly used. Thus a curve of a cumulative frequency distribution is
called an ogive. To construct a less than type ogive, the upper class boundaries are put
on the horizontal axis and the cumulative frequencies are shown on the vertical axis. A
point is then plotted directly above each upper class boundary at a height corresponding
to the cumulative frequency at that upper class boundary. The points are then connected by
straight lines, taking one additional point at the lower class boundary of the first
class at a height of zero. If a free hand curve is then drawn following these lines, the
resulting graph is an ogive. An ogive looks like an elongated S.

Let us consider the following frequency distribution, which uses the same classes as
table 1, and draw the ogive for the data.

Class interval   Class boundaries   Frequency   Cumulative frequency
25-29            24.5 – 29.5        13          13
30-34            29.5 – 34.5        10          23
35-39            34.5 – 39.5        7           30
40-44            39.5 – 44.5        15          45
45-49            44.5 – 49.5        27          72
50-54            49.5 – 54.5        17          89
55-59            54.5 – 59.5        11          100
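
The cumulative frequencies in this table, and the points of a less than type ogive, can be produced with a short Python sketch. The variable names are assumptions made for illustration; the boundaries and frequencies are those of the table above.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from the table above.
upper_boundaries = [29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5]
frequencies = [13, 10, 7, 15, 27, 17, 11]

# Running totals of the class frequencies.
cumulative = list(accumulate(frequencies))

# Points of the less than type ogive: start at the lower boundary of the
# first class with height zero, then one point per upper class boundary.
first_lower_boundary = 24.5
points = [(first_lower_boundary, 0)] + list(zip(upper_boundaries, cumulative))

for x, y in points:
    print(f"({x}, {y})")
```

Joining these points with straight lines gives the cumulative frequency polygon, and smoothing the lines by hand gives the ogive.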

[Figure: Cumulative frequency polygon and ogive. Horizontal axis: upper class boundaries (29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5); vertical axis: cumulative frequencies (0 to 120).]

Problem: Construct an Ogive for the data given in the table 2.

Scatter Diagram:
Scatter diagrams are useful for displaying information on two quantitative variables that
are believed to be interrelated. Height and weight, age and height, and income and
expenditure are examples of pairs of variables that are assumed to be related to each
other and that can be displayed by a scatter diagram. The nature of the relationship
between the two variables can be judged from the shape of the scatter diagram.
Let us consider the following data:

Monthly income-expenditure data of ten families

Income (in '000)   Expenditure (in '000)
45                 41
38                 30
75                 50
48                 40
65                 55
50                 42
80                 62
41                 20
23                 18
60                 55

[Figure: Scatter diagram. Horizontal axis: income (0 to 100); vertical axis: expenditure (0 to 70).]

It is clear from the diagram that expenditure tends to increase with income. So there is
a positive relationship between the two variables.
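
For the income and expenditure table above, the direction of the relationship can also be checked numerically. The sketch below uses the sign of the sample covariance; this measure is not introduced in the text and is brought in here only as an assumed, simple check, with `sample_covariance` a name chosen for illustration.

```python
# Monthly income and expenditure (in '000) of the ten families.
income = [45, 38, 75, 48, 65, 50, 80, 41, 23, 60]
expenditure = [41, 30, 50, 40, 55, 42, 62, 20, 18, 55]

def sample_covariance(x, y):
    """Average product of deviations from the means, with divisor n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

cov = sample_covariance(income, expenditure)

# A positive sign means the two variables tend to move in the same direction.
print("positive relationship" if cov > 0 else "no positive relationship")
```

The sign only indicates direction; the scatter diagram remains the better tool for judging the shape of the relationship.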
Some notations used in statistics
If $x_1, x_2, \ldots, x_n$ are the $n$ observations of a study, then,
a) $x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$
b) $x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$
c) $ax_1 + ax_2 + \cdots + ax_n = a \sum_{i=1}^{n} x_i$
d) $a + a + \cdots + a$ ($n$ times) $= na$
e) $(ax_1 + b) + (ax_2 + b) + \cdots + (ax_n + b) = a \sum_{i=1}^{n} x_i + nb$
f) $(x_1 + y_1) + (x_2 + y_2) + \cdots + (x_n + y_n) = \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
g) $x_1 x_2 \cdots x_n = \prod_{i=1}^{n} x_i$
h) $x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^{n} x_i y_i$
