Introduction to Statistics
Econ-2041
Chapter One
Introduction
Outline
Definition of Statistics: what does statistics deal with?
Classification of Statistics
Definition of Statistical Terms
Applications, Uses and Limitations of Statistics
Types of Variables and Data
Statistics: a branch of mathematics that deals with the effective
management and analysis of data.
It deals with the methods of collection, organization, presentation,
analysis and interpretation of numerical data,
and thus leads to drawing valid conclusions and making reasonable
decisions about the variables under investigation.
Statistics can be divided into two main branches, depending
on how the data are used.
Descriptive statistics
Inferential statistics
Descriptive Statistics: the procedures used to organize and
summarize masses of data. It is concerned with the collection,
processing, summarizing and describing of the important features of the
data, without any attempt to draw conclusions (interpretations) about the
variables under investigation.
Examples: frequency computation, measures of central tendency
and measures of dispersion.
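To make the three descriptive tools named above concrete, here is a minimal sketch using Python's standard `collections` and `statistics` modules. The data (number of children in ten households) are hypothetical, invented only for illustration; the code only summarizes the data and draws no conclusion beyond them.

```python
from collections import Counter
import statistics

# Hypothetical data: number of children in 10 surveyed households.
children = [2, 3, 1, 4, 2, 2, 5, 3, 2, 1]

# Frequency computation: how often each value occurs.
freq = Counter(children)

# Measures of central tendency.
mean = statistics.mean(children)
median = statistics.median(children)
mode = statistics.mode(children)

# Measures of dispersion.
spread = max(children) - min(children)  # range
sd = statistics.stdev(children)         # sample standard deviation

print("Frequencies:", sorted(freq.items()))
print("Mean:", mean, "Median:", median, "Mode:", mode)
print("Range:", spread, "Standard deviation:", round(sd, 3))
```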
Inferential Statistics: methods used to generalize (draw
conclusions) about a large body of data (the population) by examining only
a small portion of it (a sample).
Inferential statistics involves making estimates or testing
hypotheses about the characteristics of the population.
Example: the average income of all families (population) in
Ethiopia can be estimated by taking a sample of a few hundred
families and drawing a conclusion about the average income.
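The income example can be sketched as a small simulation. This is an illustration only: the population below is generated at random (log-normal incomes with arbitrary parameters), not real Ethiopian data, and a sample of 300 stands in for "a few hundred families". In a real study the population average is unknown; the simulation computes it only so the estimate can be compared with it. `statistics.fmean` requires Python 3.8 or later.

```python
import random
import statistics

# Fixed seed so that the sketch gives the same result every time it runs.
rng = random.Random(42)

# Hypothetical population: simulated monthly incomes of 50,000 families
# (log-normal values chosen only for illustration, NOT real data).
population = [rng.lognormvariate(8, 0.5) for _ in range(50_000)]

# Inferential step: observe only a simple random sample of 300 families.
sample = rng.sample(population, 300)
estimate = statistics.fmean(sample)  # sample mean used to estimate the population mean

# Normally unknown; available here only because the population is simulated.
true_value = statistics.fmean(population)

print(f"Estimated average income: {estimate:,.1f}")
print(f"Actual population average: {true_value:,.1f}")
```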
Stages in Statistical Investigation
There are five stages/steps in any statistical investigation.
Data Collection: the process of measuring, counting, gathering and
assembling (ordering) the raw data upon which the statistical
investigation is to be based.
Data can be collected in a variety of ways, including a census (complete
counting), but a survey (taking a sample) is the most commonly used
method of data collection.
The most common methods of conducting a survey are:
Telephone survey
Mailed questionnaire
Organization: summarizing the data in a meaningful way,
e.g. in table form, charts, graphs, etc.
Presentation: the process of re-organizing, classifying,
compiling (re-gathering missed data) and summarizing the data to
present them in a meaningful form.
Analysis: the process of extracting relevant information from the
summarized data about the variables with which the investigation is concerned.
Inference: the interpretation and further examination of the various
statistical measures obtained through the analysis of the data, building on
the descriptive steps: collecting, organizing, presenting and analyzing the data.
Some Statistical Terms
Some statistical terms used in statistical investigation are:
Population: the complete set of possible measurements about
which inferences are to be made.
The population represents the target of an investigation: the
objective of the investigation is to draw conclusions about the
population, hence it is called the target population.
The population can be finite (numerically measurable), e.g. the
average income of the household, etc., or infinite (numerically
immeasurable), e.g. consumer utility, human wisdom, etc.
The two ways of investigation are census and sample survey.
Census: a complete enumeration (counting and measuring)
of the entire population.
But most real problems cannot be studied by a census, hence the
investigator takes a sample.
Sample: a set of measurements on a portion of the population taken in
the course of an investigation.
It should be selected using a sampling technique (probability or
non-probability) so that it represents the population
well.
E.g.; monthly production data of a certain factory over the past
10 years; a small portion of a finite population (average income of
the household, salary of the employee, etc.).
Parameter: a descriptive measure of the population that has a
fixed value.
E.g.; population mean, population variance, population standard deviation, etc.
Statistic: a descriptive measure of a sample (sample mean, sample
variance, sample standard deviation) that takes different values
in different samples.
Sampling: the method of selecting a sample from the population.
Sampling Unit: the unit of the target population to be sampled.
In a study of the socio-economic status of households, the
household will be the sampling unit.
In a study of the performance of freshman students in
a college, the student will be the sampling unit.
Sample Size: the number of observations to be included in the sample.
Sampling Frame: the list of all elements in a population.
In a household investigation, the list of all households is called the
sampling frame.
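The terms sampling frame, sampling unit and sample size fit together as in the minimal sketch below, which uses simple random sampling (one kind of probability sampling) from Python's standard `random` module. The household IDs, the frame size of 1,200 and the sample size of 60 are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical sampling frame: the list of all households in a town.
# Each household ID is one sampling unit.
sampling_frame = [f"HH-{i:04d}" for i in range(1, 1201)]  # 1,200 households

sample_size = 60  # number of observations to include in the sample

# Simple random sampling without replacement: every household in the
# frame has the same chance of being selected.
rng = random.Random(2024)  # fixed seed for reproducibility
sample = rng.sample(sampling_frame, sample_size)

print(len(sampling_frame), "units in the frame;", len(sample), "selected")
print(sample[:5])
```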
Errors: error in a sample survey refers to the difference between the
population value (parameter) and the corresponding sample value
(statistic).
It indicates that the sample does not perfectly represent the
population.
There are two types of errors in a sample survey.
Sampling Error: the discrepancy (difference in value) between
the population value and the sample value.
It arises because only part of the population is observed, and it may be
made larger by inappropriate sampling techniques applied in the sample survey.
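The link between parameter, statistic and sampling error can be shown with a short simulation. Everything here is hypothetical: the salaries are random integers, and the sampling error is computed as sample mean minus population mean (the sign convention is a choice; the text only speaks of the difference). The parameter is computed once and stays fixed, while the statistic and its sampling error change from sample to sample.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population: simulated monthly salaries of 2,000 employees.
population = [rng.randint(3_000, 15_000) for _ in range(2_000)]

# Parameter: computed from the whole population, so it has one fixed value.
mu = statistics.fmean(population)

# Statistic: the mean of each random sample of 50 employees.
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(5)]

# Sampling error of each sample, taken here as statistic minus parameter.
sampling_errors = [x_bar - mu for x_bar in sample_means]

print(f"population mean (parameter) = {mu:.1f}")
for x_bar, err in zip(sample_means, sampling_errors):
    print(f"sample mean = {x_bar:8.1f}   sampling error = {err:+7.1f}")
```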
Non-sampling Errors: errors arising from procedural bias during
the survey, such as:
Incorrect responses
Measurement errors
Errors at different stages in processing the data.
Variable: an item of interest to be included in the investigation.
Variables can be qualitative or quantitative.
Qualitative Variables: non-numeric variables which cannot be
measured in numerical terms, but can take on numerical values by
assigning numbers to them (coding).
E.g.; gender (male - 1 and female - 2), place of birth (Ethiopian - 1,
others (foreign) - 2), residence (rural - 1, urban - 2), and the like.
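Coding can be sketched as a simple lookup table. The codes below are the ones given in the example above; the three respondent records are hypothetical. The numbers are only labels, so arithmetic on them (such as averaging gender codes) has no meaning.

```python
# Codes taken from the example above; the numbers are labels, not quantities.
GENDER_CODES = {"male": 1, "female": 2}
RESIDENCE_CODES = {"rural": 1, "urban": 2}

# Hypothetical survey responses.
respondents = [
    {"gender": "female", "residence": "urban"},
    {"gender": "male", "residence": "rural"},
    {"gender": "female", "residence": "rural"},
]

# Replace each category with its numeric code.
coded = [
    (GENDER_CODES[r["gender"]], RESIDENCE_CODES[r["residence"]])
    for r in respondents
]

print(coded)
```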
Quantitative Variables: numerical variables which can be
measured in numerical terms.
E.g.; number of students in the class, temperature,
average income of the household, salary of the employee, etc.
Quantitative variables are either discrete or continuous.
Discrete Variables: variables that can take only certain values, usually
with “gaps" between the values (they cannot take fractional values), such
as the number of bedrooms in a house or the number of students in a class.
Continuous Variables: variables that can take any value within a
specific range (including fractional values), such as air pressure, height,
weight, etc.
Applications, Uses and Limitations of Statistics
Applications of Statistics
Almost all human beings in their daily life are subjected to
statistics and numerical facts.
E.g.; to fix the price of a commodity, the cost of raw materials,
demand for the product in the market, tax obligations, labor cost and sales
price should be analyzed.
Statistics is applicable in processes such as the invention of
certain drugs, assessing the extent of environmental pollution, etc.
Statistical data analysis renders valuable assistance in the proper
understanding of economic problems and in economic policy
formulation.
Economic problems almost always involve facts that are
capable of being expressed numerically.
Statistical methods are extensively used in all branches of
economics; some of them are:
Time Series: used for studying the behavior of economic variables
over time, such as the price, production and consumption of
commodities.
Index Numbers (cross-sectional): useful in economic planning as they
indicate the change over a specified period of time, e.g. the level of
coffee production over the last two years.
Demand Analysis: used to study the relationship between
interdependent economic variables, such as the price of a commodity and its
output.
Forecasting: based on existing data, statistical methods are used to
forecast economic variables (such as expected price, demand,
supply, etc.) and
economic problems (inflation, unemployment, etc.).
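Index numbers, listed above, can be illustrated with the simplest form, a simple (relative) index: the current-period value expressed as a percentage of the base-period value, I_t = (q_t / q_0) × 100. The coffee output figures below are hypothetical, not real production data, and the function name `simple_index` is introduced only for this sketch.

```python
def simple_index(current: float, base: float) -> float:
    """Simple (relative) index number: current value as a % of the base value."""
    if base == 0:
        raise ValueError("base-period value must be non-zero")
    return current / base * 100


# Hypothetical coffee output (thousand tonnes); year_0 is the base period.
output = {"year_0": 400.0, "year_1": 440.0, "year_2": 460.0}
base = output["year_0"]

indices = {year: simple_index(q, base) for year, q in output.items()}

for year, value in indices.items():
    print(f"{year}: {value:.1f}")
```

An index of 115.0 for year_2 would be read as output being 15% above the base year.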
Uses of Statistics
Mainly, statistics is useful for enlarging our knowledge of
complex phenomena; some uses of statistics are:
It presents facts in a definite and precise form.
Data reduction by using sample surveys.
Measuring the magnitude of variation in data.
Furnishes a technique of comparison.
Estimating unknown population characteristics.
Formulating and testing hypotheses and drawing conclusions.
Studying the relationship between two or more variables.
Forecasting future events.
Limitations of Statistics
As a science, statistics has its own limitations; some are:
It deals only with quantitative information.
It deals only with aggregates of facts, not with individual
data items.
Statistical data are only approximate, not
mathematically exact.
Statistics can easily be misused.
Types of Data
Data can be classified on the basis of:
Scale of measurement
Variable type
Source of data
Data Based on Scale of Measurement: There are four scales of
measurement.
Nominal Scale: it measures categorical variables by categorizing
the possible values of the variable.
The nominal scale does not possess order (comparison), distance
(difference between the categories) or a fixed zero (a measure of the
absence of the attribute); the numbers assigned to the
categories do not represent any real quantities.
E.g.; gender, religious affiliation, marital status, blood type,
etc.
Ordinal Scale: it measures data classified into distinct
categories that are ranked with respect to the values of the variable, with
proper ordering but without stating the difference between the categories,
and without measuring non-attributes (absence of the attribute).
In an ordinal scale, statements of comparison such as “greater than” or “less
than” are acceptable, without being able to state how much greater or less.
Example;
Student’s grade (A, B, C, D,…etc.)
Class rank of students (1, 2, 3,…etc.)
Rating scale of performance (Excellent, Very good, Good,
Fair, Poor)
Consumer preference (more or less preferable)
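The ordinal idea, order without measurable distance, can be sketched with the rating scale from the example above. Ranks 1 to 5 are assigned in order from "Poor" to "Excellent"; that assignment and the helper `better_than` are illustrative choices, not part of the original material. The ranks support "greater than" and "less than" comparisons and sorting, but the gap between consecutive ranks should not be read as an equal difference in performance.

```python
# Rating scale from the example, listed from lowest to highest.
RATINGS = ["Poor", "Fair", "Good", "Very good", "Excellent"]

# Rank codes 1..5: they express order only, not equal distances.
rank = {label: i for i, label in enumerate(RATINGS, start=1)}


def better_than(a: str, b: str) -> bool:
    """Ordinal comparison: is rating a ranked above rating b?"""
    return rank[a] > rank[b]


# Hypothetical performance ratings of four employees.
scores = ["Good", "Excellent", "Fair", "Very good"]
ordered = sorted(scores, key=lambda s: rank[s])

print(ordered)
print(better_than("Very good", "Good"))
```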
Interval Scale: it measures data by ordering them and stating the
difference between the measurements, but it cannot measure
non-attributes (absence of the characteristic).
The interval scale separates items by rank and measures the difference
between rank positions.
But it has no true zero; it does not have the capacity to
measure the complete absence of a trait (non-attributes).
E.g.; for temperature measured in degrees Celsius or Fahrenheit,
0° does not mean the complete absence of temperature.
Ratio Scale: it measures data that possess order (rank), distance
(difference between values) and an absolute zero of the variable.
The ratio scale represents the actual amounts of the variable
measured, so comparisons of ratios are meaningful.
E.g.; weight, height, number of students, time, age, etc.
In a ratio scale, an absolute zero of length, weight, time, etc.
can be conceived.
The zero point on a centimeter scale indicates the
complete absence of length or height.
But such an absolute zero is not obtainable for temperature
measured in degrees Celsius or Fahrenheit.
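The practical difference between interval and ratio scales can be checked numerically. In the sketch below, the same two temperatures give different ratios depending on the unit used, so "twice as hot" has no fixed meaning on the Celsius scale. Two weights, by contrast, keep the same ratio whether expressed in kilograms or pounds. The conversion constants are the standard ones (K = °C + 273.15, °F = °C × 9/5 + 32, 1 kg ≈ 2.20462 lb); the helper function names are introduced only for this illustration.

```python
def celsius_to_kelvin(c: float) -> float:
    return c + 273.15


def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


# Interval scale (Celsius): the ratio of two temperatures depends on the unit.
t1, t2 = 10.0, 20.0  # degrees Celsius
ratio_c = t2 / t1
ratio_f = celsius_to_fahrenheit(t2) / celsius_to_fahrenheit(t1)
ratio_k = celsius_to_kelvin(t2) / celsius_to_kelvin(t1)

# Ratio scale (weight): the ratio is the same in any unit.
KG_TO_LB = 2.20462
w1, w2 = 10.0, 20.0  # kilograms
ratio_kg = w2 / w1
ratio_lb = (w2 * KG_TO_LB) / (w1 * KG_TO_LB)

print(f"Temperature ratio: C {ratio_c:.3f}, F {ratio_f:.3f}, K {ratio_k:.3f}")
print(f"Weight ratio: kg {ratio_kg:.3f}, lb {ratio_lb:.3f}")
```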
Data Based on Types of Variables
Variables can be broadly classified into qualitative (categorical)
and quantitative (numerical) variables.
Qualitative Variables: variables whose characteristics are
studied and measured in non-numeric terms.
However, they can take numerical values only by categorizing them and
assigning numbers to the categories (coding).
E.g.; gender, religious affiliation, type of automobile owned,
place of birth, color, etc.
Quantitative Variables: variables that can be measured and
expressed numerically.
E.g.; balance in a checking account, number of children in a
family, time taken to finish an exam, etc.
Quantitative variables can be either discrete or continuous.
Discrete Variables: variables that can take only certain values, usually
with “gaps" between the values (they cannot take fractional values), such as
the number of bedrooms in a house or the number of students in a class.
Continuous Variables: variables that can take any value within a
specific range (including fractional values), such as air pressure, height,
weight, etc.
Types of Data Based on Source
Based on their source, data can be classified as primary and
secondary data.
Primary Data: data measured or collected by the investigator
or the user directly from the source.
They are collected for the first time from individuals and the
events around the source, and they are original in character.
There are two activities involved in primary data collection.
Planning
Measuring/collecting
Planning
Identify source and elements of the data.
Decide whether to use a sample or a census, based on the
population size.
If sampling is preferred, decide on the sample size, the selection
method (sampling technique), etc.
Decide on the measurement scale and procedure.
Set up the necessary organizational structure.
Measuring (collecting)
There are different methods of collecting primary data:
Focus Group
Mail Questionnaires
Door-to-Door Survey
Mall-Intercept (shopping place)
New Product Registration/ Innovation
Personal Interview
Experiments
Secondary Data: data gathered/compiled from published and
unpublished sources.
Researchers (investigators) do not obtain secondary data themselves
directly; rather, they gather them indirectly from published and unpublished
materials that have already been collected by individuals and/or
institutions and have already passed through the statistical
investigation process.
Unpublished sources include personal documents such as diaries,
letters, photographs, etc.
Published sources include books, manuscripts, records, newspapers,
magazines, census reports, etc.
When using secondary data, investigators should check:
The type and objective of the study situation.
The purpose for which the data were collected, and whether they are compatible
(related) with the present problem.
The nature and classification of the data, and their appropriateness to our
problem.
The unbiasedness and absence of misreporting in the published
data.