CHAPTER 1
INTRODUCTION
Prem Mann, Introductory Statistics, 7/E
Copyright © 2010 John Wiley & Sons. All rights reserved.
WHAT IS STATISTICS?
Definition
Data: Any observations that have been
collected
Statistics is a group of methods used to
collect, analyze, present, and interpret
data and to make decisions.
Key Terms
Population: A population consists of all
elements – individuals, items, or objects – whose
characteristics are being studied. The population
that is being studied is also called the target
population.
Alternatively, the population is the entire category under consideration, that is, the
complete set of elements being studied. The
population size is usually denoted by a capital N.
Examples: every lawyer in the United States;
all single women in the United States.
Key Terms
Sample: A portion of the population selected for study is called a sample.
Alternatively, a sample is that portion of the population that is available, or will be made
available, for analysis. A good sample is representative of the
population. We will learn about probability samples and how they
help ensure that a sample is representative. The
sample size is denoted by a lowercase n.
If your company manufactures one million laptops, it might test a
sample of, say, 500 of them for quality. The population size is N =
1,000,000 and the sample size is n = 500.
Census: A survey that includes every member of the population is called
a census. The technique of collecting information from a portion of the
population is called a sample survey.
Figure 1.1 Population and Sample
TYPES OF STATISTICS
Descriptive Statistics consists of methods for organizing,
displaying, and describing data by using tables, graphs, and
summary measures. These statistics summarize a set of
numerical data in terms of averages and other
measures for the purpose of description.
Descriptive statistics, as opposed to inferential statistics,
are not concerned with the theory and methodology for
drawing inferences that extend beyond the particular set
of data examined.
Thus, a teacher who gives an exam to a class of, say, 35 students
is interested in descriptive statistics to
assess the performance of the class: What was the class
average, the median grade, the standard deviation,
and so on? The teacher is not interested in making any
inferences about some larger population.
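The teacher's summary can be sketched with Python's standard `statistics` module. The actual scores of the 35 students are not given, so the five scores below are hypothetical, and the helper name `describe` is our own. Because the class is the whole group of interest here (no inference is made), this sketch assumes the population standard deviation (`pstdev`, dividing by N); `statistics.stdev` would give the sample version (dividing by n - 1).

```python
import statistics

def describe(scores):
    """Descriptive summary of one class's exam scores (no inference)."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # pstdev divides by N: the class is treated as the whole group of
        # interest. statistics.stdev (divides by n - 1) is the sample version.
        "std_dev": statistics.pstdev(scores),
    }

# Hypothetical scores for a five-student class (illustration only).
scores = [72, 85, 90, 64, 78]
summary = describe(scores)
print(f"mean={summary['mean']:.1f}, median={summary['median']}, "
      f"std_dev={summary['std_dev']:.2f}")
```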
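TYPES OF STATISTICS
Inferential Statistics consists of methods that use sample results to help make decisions or predictions about a population.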
Example of inferential statistics from quality control:
GE manufactures LED bulbs and wants to know how
many are defective. Suppose one million bulbs a year
are produced in its new plant in Staten Island. The
company might sample, say, 500 bulbs to estimate the
proportion of defectives.
N = 1,000,000 and n = 500
If 5 out of 500 bulbs tested are defective, the sample
proportion of defectives will be 1% (5/500). This statistic
may be used to estimate the true proportion of defective
bulbs (the population proportion).
In this case, the sample proportion is used to make
inferences about the population proportion.
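The bulb example can be sketched as a small simulation. This is only an illustration under a stated assumption: we invent a population of 1,000,000 bulbs with a 1% defect rate so there is something to estimate; in practice the population proportion is unknown. The helper `sample_proportion` is a hypothetical name, and `random.Random(...).sample` draws a simple random sample without replacement.

```python
import random

def sample_proportion(is_defective):
    """Fraction of sampled items flagged defective (True counts as 1)."""
    return sum(is_defective) / len(is_defective)

# The arithmetic from the example: 5 defectives among 500 bulbs tested.
print(sample_proportion([True] * 5 + [False] * 495))  # 0.01

# Simulation under an ASSUMED population: 1,000,000 bulbs with a 1% defect
# rate. In practice the population proportion is unknown.
N, n = 1_000_000, 500
rng = random.Random(2010)
population = [rng.random() < 0.01 for _ in range(N)]
sample = rng.sample(population, n)   # simple random sample, no replacement

p_hat = sample_proportion(sample)    # computed from the sample (the statistic)
p = sum(population) / N              # true population proportion
print(f"sample proportion {p_hat:.3f} vs population proportion {p:.4f}")
```

Rerunning with a different seed gives a different sample proportion each time, which is exactly why the sample result is an estimate of, not a substitute for, the population proportion.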
POPULATION VERSUS SAMPLE
A sample that represents the characteristics of
the population as closely as possible is called a
representative sample.
A sample drawn in such a way that each
element of the population has a chance of being
selected is called a random sample. If all
samples of the same size selected from a
population have the same chance of being
selected, we call it simple random sampling.
Such a sample is called a simple random
sample.
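Sampling with replacement: each selected element is returned to the population before the next draw, so it can be selected more than once.
Sampling without replacement: a selected element is not returned, so it can be selected at most once.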
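The two sampling schemes map directly onto two functions in Python's standard `random` module: `sample` draws without replacement (every subset of the requested size is equally likely, i.e., simple random sampling), while `choices` draws with replacement. The ten-element population below is hypothetical.

```python
import random

rng = random.Random(7)
population = list(range(1, 11))  # hypothetical population: elements labeled 1..10

# Without replacement: random.sample never returns the same element twice,
# and every subset of size 5 is equally likely (simple random sampling).
without_replacement = rng.sample(population, 5)

# With replacement: random.choices draws independently, returning each element
# to the pool, so repeats are possible.
with_replacement = rng.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```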
BASIC TERMS
An element or member of a sample or
population is a specific subject or object (for
example, a person, firm, item, state, or country)
about which the information is collected.
A variable is a characteristic under study that
assumes different values for different elements. In
contrast to a variable, the value of a constant is
fixed.
The value of a variable for an element is called an
observation or measurement.
A data set is a collection of observations on one
or more variables.
Table 1.1 Charitable Givings of Six Retailers in
2007
TYPES OF VARIABLES
Quantitative Variables
Discrete Variables
Continuous Variables
Qualitative or Categorical Variables
TYPES OF VARIABLES
Qualitative variables: A variable that
cannot assume a numerical value but can
be classified into two or more nonnumeric
categories is called a qualitative or categorical variable.
Qualitative variables result in categorical (nonnumeric)
responses; the resulting data are also called nominal or
categorical data.
Example: sex (male, female)
TYPES OF VARIABLES
Quantitative variables: A variable that can
be measured numerically is called a
quantitative variable.
The data collected on a quantitative variable
are called quantitative data.
Quantitative variables result in numerical responses and may be:
Discrete variables
Continuous variables
Quantitative Variables
Discrete variables: A variable whose values
are countable is called a discrete variable.
In other words, a discrete variable can
assume only certain values with no
intermediate values.
Example: How many courses have you
taken at this college? ____
Quantitative Variables
Continuous variables: A variable that can assume
any numerical value over a certain interval or intervals
is called a continuous variable.
Continuous variables arise from a measuring process.
Example: How much do you weigh? ____
One way to determine whether data are continuous is
to ask whether the answer could, at least in principle, be reported
with more decimal places.
For example, you may report a weight of 150 pounds but in
actuality weigh 150.23568924567 pounds.
On the other hand, if you have 2 children, you do
not have 2.3217638 children.
Figure 1.2 Types of Variables
Cross-Section Data
Definition
Data collected on different elements at
the same point in time or for the same
period of time are called cross-section
data.
Table 1.2 Charitable Givings of Six Retailers in
2007
Time-Series Data
Definition
Data collected on the same element for
the same variable at different points in
time or for different periods of time are
called time-series data.
Table 1.3 Number of Movie Screens
SOURCES OF DATA
Data may be obtained from
Internal Sources
External Sources
Surveys and Experiments
Primary vs. Secondary Data
Primary data. This is data that has been
compiled by the researcher using techniques such
as surveys, experiments, depth interviews,
observation, and focus groups.
Types of surveys. A lot of data is obtained
using surveys. Each survey type has advantages
and disadvantages.
Mail: lowest rate of response; usually the lowest cost
Personally administered: can “probe”; most costly;
interviewer effects (the interviewer might influence the
response)
Telephone: fastest
Web: fast and inexpensive
Primary vs. Secondary Data
Secondary data. This is data that has been
compiled or published elsewhere, e.g.,
census data.
The trick is to find data that is useful. The data was
probably collected for some purpose other than
helping to solve the researcher’s problem at hand.
Advantages: It can be gathered quickly and
inexpensively. It enables researchers to build on
past research.
Problems: Data may be outdated, terms may be defined
differently, units of measurement may differ, and the data
may not be accurate (e.g., census undercount).
SUMMATION NOTATION
A sample of prices of five literary books:
$75, $80, $35, $97, and $88
Let \(x\) denote the variable price of a book.
Price of the first book = \(x_1\) = $75
Price of the second book = \(x_2\) = $80
…
Adding the prices of all five books gives
\(\sum x = x_1 + x_2 + x_3 + x_4 + x_5 = 75 + 80 + 35 + 97 + 88 = 375\)
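Summation notation corresponds directly to adding up a list of observations. A minimal Python sketch using the five book prices from this slide (the variable name `x` simply mirrors the notation):

```python
# x_1, ..., x_5: prices (in dollars) of the five sampled books.
x = [75, 80, 35, 97, 88]

# Σx adds every observed value of the variable x.
sum_x = sum(x)
print(sum_x)  # 375

# Python lists are 0-indexed, so x_1 is x[0] and x_2 is x[1].
print(x[0], x[1])  # 75 80
```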
Example 1-1
Annual salaries (in thousands of dollars)
of four workers are 75, 90, 125, and 61,
respectively. Find
(a) \(\sum x\)   (b) \((\sum x)^2\)   (c) \(\sum x^2\)
Example 1-1: Solution
(a) \(\sum x = x_1 + x_2 + x_3 + x_4 = 75 + 90 + 125 + 61 = 351\), that is, $351,000
(b) \((\sum x)^2 = (351)^2 = 123{,}201\)
(c) \(\sum x^2 = (75)^2 + (90)^2 + (125)^2 + (61)^2 = 5{,}625 + 8{,}100 + 15{,}625 + 3{,}721 = 33{,}071\)
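The three quantities in Example 1-1 can be checked in a few lines of Python. The sketch makes the order of operations explicit: \((\sum x)^2\) adds first and then squares, while \(\sum x^2\) squares each value first and then adds, which is why the two results differ.

```python
# Annual salaries of the four workers, in thousands of dollars.
x = [75, 90, 125, 61]

sum_x = sum(x)                           # (a) Σx
square_of_sum = sum_x ** 2               # (b) (Σx)²: add first, then square
sum_of_squares = sum(v ** 2 for v in x)  # (c) Σx²: square first, then add

print(sum_x, square_of_sum, sum_of_squares)  # 351 123201 33071
```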
Example 1-2
The following table lists four pairs of m and
f values:
Compute the following:
(a) \(\sum m\)   (b) \(\sum f^2\)   (c) \(\sum mf\)   (d) \(\sum m^2 f\)
Example 1-2: Solution
Table 1.4
(a) (b) (c) (d)
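The table of m and f values for Example 1-2 is not reproduced in this text, so the sketch below only shows how the four sums are formed from paired values; the function name `paired_sums` and the numbers in the usage line are hypothetical placeholders, not the example's data. Note that \(\sum m^2 f\) squares each m before multiplying by its paired f.

```python
def paired_sums(m, f):
    """Return Σm, Σf², Σmf and Σm²f for paired values m_i, f_i."""
    if len(m) != len(f):
        raise ValueError("m and f must have the same number of values")
    pairs = list(zip(m, f))
    return {
        "sum_m": sum(m),
        "sum_f_squared": sum(fi ** 2 for fi in f),
        "sum_mf": sum(mi * fi for mi, fi in pairs),
        "sum_m_squared_f": sum(mi ** 2 * fi for mi, fi in pairs),
    }

# Placeholder values only; they are NOT the values from the example's table.
print(paired_sums([1, 2, 3, 4], [5, 6, 7, 8]))
```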