Chapter 1
Defining and Collecting
Data
Statistics
The science of collecting, analyzing, presenting
and interpreting data, to make decisions based
on such analysis.
In other words, the method that helps transform
data into useful information for decision makers.
Types of statistics:
Descriptive statistics:
consists of methods for organizing, displaying, and describing data by
using tables, graphs, and summary measures; that is, methods of
organizing, summarizing, and presenting data in an informative way.
Inferential statistics:
consists of methods that use sample results to help make decisions or
predictions about a population.
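As a small illustration of descriptive statistics, the sketch below organizes a data set into a frequency table and computes two summary measures. The grades and scores are made-up values used only for illustration, and Python's standard library is an assumed tool rather than anything these notes prescribe.

```python
from collections import Counter
import statistics

# Hypothetical data: letter grades and exam scores for 10 students.
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]
scores = [88, 75, 72, 64, 91, 78, 55, 61, 70, 95]

# Organizing: a frequency table of the grades.
frequency = Counter(grades)
for grade in sorted(frequency):
    print(f"{grade}: {frequency[grade]}")

# Summarizing: two summary measures of the scores.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
print(f"Mean score:   {mean_score}")
print(f"Median score: {median_score}")
```

Nothing here goes beyond the 10 students observed; using such results to say something about a larger group is the job of inferential statistics.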
In statistics, data is collected from either a population
or a sample.
The difference between a population
and a sample
A population is a collection of all possible
individuals, objects, or items about which you want to
draw a conclusion. The population is the “large group”.
A sample is a part of the population selected for
analysis. The sample is the “small group”.
A parameter is a descriptive measure of a population.
A statistic is a descriptive measure of a sample.
Population: all individuals
Sample: individuals selected from the population
Selecting a sample is less time-consuming, less complex, and
less costly than selecting every item in the population. Also, the
sample results can be used to estimate population results.
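The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below is simulated (hypothetical ages of 1,000 club members), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

# Hypothetical population: ages of all 1,000 members of a club (made-up data).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(18, 65) for _ in range(1000)]

# Parameter: a descriptive measure computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a part of the population selected for analysis.
sample = rng.sample(population, 50)

# Statistic: the same descriptive measure computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; it is an estimate obtained from 50 values instead of 1,000.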
What is a variable?
A variable is any characteristic, number, or
quantity that can be measured or counted.
Age, sex, business income and expenses,
capital expenditure, class grades, hair colour
and car type are examples of variables.
It is called a variable because the value may
vary between data units in a population, and
may change in value over time.
Types of Variables
Categorical (qualitative) variables have values that
can only be placed into categories, such as “yes” and
“no.”
These variables tend to be represented by a
non-numeric value.
Numerical (quantitative) variables have values
that represent a counted or measured quantity.
They can be classified as discrete or continuous.
Discrete variables arise from a counting process
Continuous variables arise from a measuring process
Summary of Types of Variables
Variables
  Qualitative (defined categories)
    Examples: Gender, Type of car, Brand of PC
  Quantitative
    Discrete (counted items)
      Examples: Number of children, Number of cars sold
    Continuous (measured characteristics)
      Examples: Weight of student, Amount of income
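The classification above can be written down as a small lookup, which is a convenient way to check your own answers. This is only a sketch: the `VariableType` enum and the `is_numerical` helper are hypothetical names introduced here, and the variables are the examples from the summary.

```python
from enum import Enum


class VariableType(Enum):
    CATEGORICAL = "qualitative (defined categories)"
    DISCRETE = "quantitative, discrete (counted items)"
    CONTINUOUS = "quantitative, continuous (measured characteristics)"


# The example variables from the summary, with their types.
VARIABLES = {
    "Gender": VariableType.CATEGORICAL,
    "Type of car": VariableType.CATEGORICAL,
    "Brand of PC": VariableType.CATEGORICAL,
    "Number of children": VariableType.DISCRETE,
    "Number of cars sold": VariableType.DISCRETE,
    "Weight of student": VariableType.CONTINUOUS,
    "Amount of income": VariableType.CONTINUOUS,
}


def is_numerical(variable_type: VariableType) -> bool:
    """Discrete and continuous variables are both numerical (quantitative)."""
    return variable_type is not VariableType.CATEGORICAL


for name, variable_type in VARIABLES.items():
    kind = "numerical" if is_numerical(variable_type) else "categorical"
    print(f"{name}: {kind}, {variable_type.value}")
```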
Measurement Scales for Variables
Variables can also be identified by the level of
measurement, or measurement scales.
Why know the level of measurement of data?
The level of measurement of the data determines the
calculations that can be done to summarize and present
the data.
It also determines the statistical tests that should be
performed on the data.
Nominal and Ordinal Scales
Values for a qualitative variable are measured on a
nominal or on an ordinal scale.
Nominal scale data is classified into distinct
categories in which no ranking is implied (e.g., eye
color, gender, car type, name of school).
Ordinal scale data is classified into distinct
categories in which ranking is implied (e.g., a measure
of consumers' satisfaction).
An ordinal scale example:
Categorical Variable              Ordered Categories
Attitude                          Strongly agree, agree, disagree, strongly disagree
Clothing size                     Small, medium, large, extra large
Standard & Poor’s bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student grades                    A, B, C, D, F
Interval and Ratio Scales
An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true
zero point (such as temperature on the Celsius scale).
A ratio scale is an ordered scale in which the
difference between the measurements is a
meaningful quantity and the measurements have a
true zero point (such as height in centimeters, age
in years or days, salary in pounds).
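To see why the level of measurement limits the calculations you can do, the sketch below applies one suitable summary to each of the four scales. All data values are made up for illustration, and the Kelvin conversion (Celsius + 273.15) is added here only to show that a ratio of Celsius readings has no physical meaning.

```python
import statistics

# Nominal scale: categories with no implied ranking.
# Only counting makes sense, so the mode is a suitable summary.
car_types = ["sedan", "SUV", "sedan", "truck", "sedan"]
most_common_car = statistics.mode(car_types)

# Ordinal scale: categories with an implied ranking.
# Ranking allows a median, but gaps between categories are not defined.
SIZE_ORDER = ["small", "medium", "large", "extra large"]
sizes = ["medium", "small", "large", "medium", "extra large"]
ranks = [SIZE_ORDER.index(size) for size in sizes]
median_size = SIZE_ORDER[statistics.median_low(ranks)]

# Interval scale: differences are meaningful, but there is no true zero.
temps_c = [10.0, 20.0]
diff_c = temps_c[1] - temps_c[0]  # a meaningful 10-degree difference
ratio_c = temps_c[1] / temps_c[0]  # 2.0, yet 20 C is not "twice as hot" as 10 C
ratio_k = (temps_c[1] + 273.15) / (temps_c[0] + 273.15)  # the same readings in Kelvin

# Ratio scale: a true zero exists, so ratios are meaningful too.
heights_cm = [90.0, 180.0]
height_ratio = heights_cm[1] / heights_cm[0]  # 180 cm is twice 90 cm

print("Nominal, mode:", most_common_car)
print("Ordinal, median category:", median_size)
print("Interval, difference in Celsius:", diff_c)
print(f"Interval, Celsius ratio {ratio_c:.3f} vs Kelvin ratio {ratio_k:.3f}")
print("Ratio, height ratio:", height_ratio)
```

The Celsius and Kelvin ratios disagree because the zero point of the Celsius scale is arbitrary, which is exactly what separates an interval scale from a ratio scale.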
Questions
Choose the correct answer:
1) In statistics, a population consists of:
A) All people living in a country
B) All people living in the area under study
C) All subjects or objects whose characteristics are being
studied
D) A selection of a limited number of elements
C) All subjects or objects whose characteristics are
being studied
2) A data set is a:
A) Set of decisions made about the population
B) Set of graphs and pictures
C) Collection of observations on one or more variables
D) Score collected from an element of the population
C) Collection of observations on one or more variables
3) Under inferential statistics, we study
A) The methods to make decisions about one or more
populations based on sample results
B) How to make decisions about a mean, median, or mode
C) How a sample is taken from a population
D) Tables composed of summary measures
A) The methods to make decisions about one or more
populations based on sample results
4) A Wall Street Journal poll asked 2,150 persons in
the U.S. a series of questions to find out their view
on the U.S. economy. The 2,150 persons make up:
A) The population B) The sample
C) The primary data source D) The secondary data source
B) The sample
5) The possible responses to the question "What is
your annual income rounded to the nearest
thousand?" result in
A) a nominal scale variable B) an ordinal scale variable
C) an interval scale variable D) a ratio scale variable
D) a ratio scale variable
6) Which of the following is a continuous numerical
variable?
A) The color of a student’s eyes
B) The number of employees of an insurance company
C) The amount of milk in a 2-liter carton
D) The number of gallons of milk sold at the local grocery
store yesterday
C) The amount of milk in a 2-liter carton
7) The possible responses to the question "How would you
rate the quality of your purchase experience with 1 =
excellent, 2 = good, 3 = decent, 4 = poor, 5 = terrible?"
result in
A) a nominal scale variable B) an ordinal scale variable
C) an interval scale variable D) a ratio scale variable
B) an ordinal scale variable
8) The possible responses to the question "Out of a
100 point score with 100 being the highest and 0
being the lowest, what is your satisfaction level with
the Blu-ray player that you purchased?" result in
A) a nominal scale variable B) an ordinal scale variable
C) an interval scale variable D) a ratio scale variable
C) an interval scale variable
9) A home delivery restaurant has segmented its
delivery into north, south, east and west zones. The
four zones are an example of:
A) Qualitative and ordinal B) Qualitative and nominal
C) Numerical and ratio scale D) Numerical and interval scale
B) Qualitative and nominal
10) The estimation of the population average student
expenditure on education based on the sample
average expenditure of 1000 students is an example
of:
A. Inferential statistics. B. Descriptive statistics.
C. A parameter D. A statistic
A. Inferential statistics
11) The possible responses to the question "In which
year were you born?" are values from a:
A. Discrete variable B. Continuous variable.
C. Qualitative variable D. None of the above
A. Discrete variable
Establishing A Business Objective
Focuses Data Collection
Examples Of Business Objectives:
A marketing research analyst needs to assess the
effectiveness of a new television advertisement.
A pharmaceutical manufacturer needs to determine
whether a new drug is more effective than those
currently in use.
An operations manager wants to monitor a
manufacturing process to find out whether the quality of
the product being manufactured is conforming to
company standards.
An auditor wants to review the financial transactions of
a company in order to determine whether the company is
in compliance with generally accepted accounting
principles.
Collecting Data
After defining the variables, we can proceed
with the data collection task.
Collecting data is a critical task because if we
collect data that are flawed by biases or other
types of errors, the results will be incorrect.
Data collection consists of:
identifying data sources
deciding whether the data were collected
from a population or a sample
cleaning your data.
a) Sources of Data
Primary data source ⇒ if you collect your own
data for analysis, you are using a primary data
source, such as:
data collected from a survey
data collected from an experiment
observed data
Secondary data source ⇒ if the person
performing the data analysis is not the data collector,
the data come from a secondary data source, such as:
analyzing census data
examining data from print journals or data
published on the internet.
Sources of data fall into the
following categories
Published sources
Experimentation
Surveying
Observation
Examples Of Published sources
Financial data on a company provided by
investment services.
Industry or market data from market research
firms and trade associations.
Stock prices, weather conditions, and sports
statistics in daily newspapers.
Examples of Data From A
Designed Experiment
Consumer testing of different versions of a
product to help determine which product should
be produced further.
Material testing to determine which supplier’s
material should be used in a product.
Market testing on alternative product
promotions to determine which promotion to
use more broadly.
Examples of Survey Data
A survey asking people which laundry detergent has the
best stain-removing abilities.
People being surveyed to determine their
satisfaction with a recent product or service
experience.
Examples of Data Collected
From Observation
Measuring the time it takes for customers to be
served in a fast food establishment.
Measuring the volume of traffic through an
intersection to determine if some form of
advertising at the intersection is justified.
b) Data Is Collected From A Population
or A Sample:
It has been covered above.
c) Data Cleaning
You will often find “irregularities” in the data, such as:
Typographical or data entry errors
Values that are impossible or undefined
Missing values
Outliers
Both Excel and Minitab can be used to
address these irregularities.
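The notes point to Excel and Minitab for this step; the same checks can also be sketched in Python, which is used here only as an alternative illustration. The ages below are made up, the valid range of 0 to 120 is an assumed rule for what counts as "impossible", and the 1.5 × IQR fence is one common convention for flagging outliers that these notes do not themselves specify.

```python
import statistics

# Hypothetical raw data: ages as entered; None marks a missing value.
raw_ages = [23, 31, None, 27, -4, 29, 250, 35, 30, 28, 95]

# Assumed plausibility rule: an age must lie between 0 and 120.
MIN_AGE, MAX_AGE = 0, 120

# Missing values: positions where nothing was recorded.
missing = [i for i, age in enumerate(raw_ages) if age is None]

# Impossible values: recorded, but outside the plausible range
# (often the result of a typographical or data entry error).
impossible = [
    i for i, age in enumerate(raw_ages)
    if age is not None and not (MIN_AGE <= age <= MAX_AGE)
]

# Keep only the values that passed both checks.
valid = [age for age in raw_ages if age is not None and MIN_AGE <= age <= MAX_AGE]

# Outliers: possible values that lie far from the rest.
# Assumed convention: more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(valid, n=4)
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [age for age in valid if age < lower_fence or age > upper_fence]

print("Missing at positions:", missing)
print("Impossible at positions:", impossible)
print("Outliers:", outliers)
```

Flagging is only the first half of cleaning. Whether a flagged value is corrected, removed, or kept depends on what the source records show.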
Cross-section versus Time-series Data
Cross-section data ⇒ Data are collected on different
elements at the same point in time or for the same
period of time (e.g., net income of 10 companies in
2024).
Time-series data ⇒ Data are collected on the same
element for the same variable at different points in
time or for different periods of time (e.g., annual net
income of company X from 2014 to 2024).
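One way to tell the two kinds of data apart is to count how many distinct elements and how many distinct periods a data set contains. The sketch below does this for records of the form (element, period, value); the company names and income figures are made up, and `classify` is a hypothetical helper written for this illustration.

```python
def classify(records):
    """Label a list of (element, period, value) records by its structure."""
    elements = {element for element, _, _ in records}
    periods = {period for _, period, _ in records}
    if len(elements) > 1 and len(periods) == 1:
        return "cross-section"
    if len(elements) == 1 and len(periods) > 1:
        return "time-series"
    return "neither (mixed or single observation)"


# Different companies, same year (made-up net income figures).
income_2024 = [
    ("Company A", 2024, 12.5),
    ("Company B", 2024, 8.1),
    ("Company C", 2024, 15.0),
]

# Same company, different years (made-up net income figures).
income_company_x = [
    ("Company X", 2022, 9.4),
    ("Company X", 2023, 10.2),
    ("Company X", 2024, 11.0),
]

print(classify(income_2024))       # cross-section
print(classify(income_company_x))  # time-series
```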
Questions
Choose the correct answer:
1) Sam was working on a project to look at global
warming and accessed an Internet site where he got
average global surface temperatures from 1866.
Which of the four methods of data collection was he
using?
A. Published sources B. Experimentation.
C. Surveying D. Observation
A. Published sources
2) The British Airways Internet site provides a
questionnaire instrument that can be answered
online. Which of the 4 methods of data collection is
involved when people complete the questionnaire?
A. Published sources B. Experimentation.
C. Surveying D. Observation
C. Surveying
3) Cross-section data are collected:
A) on the same element for the same variable at different
points in time.
B) on different elements at the same point in time
C) for a qualitative variable
D) on different elements for the same variable for different
periods of time
B) on different elements at the same point in time
4) Time series data are collected:
A) on the same element for the same variable at different
points in time.
B) on a variable that involves time e.g. minutes, hours,
weeks
C) for a qualitative variable
D) on different elements for the same period of time
A) on the same element for the same variable at different
points in time.