Stat 166 Part 1

The document outlines a course on Probability and Statistics, detailing topics such as data collection methods, descriptive and inferential statistics, and various types of variables and measurement scales. It emphasizes the importance of understanding key statistical concepts like population, sample, parameter, and statistic, as well as the distinction between primary and secondary data sources. Additionally, it highlights the significance of proper data collection methods to ensure valid statistical conclusions.

Uploaded by

ahweneehere
Copyright
© All Rights Reserved
PROBABILITY AND STATISTICS:

STAT 166

By
Jonathan Kwaku Afriyie

Department of Statistics and Actuarial Science


KNUST

archimedes09.jak@gmail.com
jonathan.afriyie@knust.edu.gh

May 29, 2025


COURSE OUTLINE
Introduction: Nature and Uses of Statistics; Some Basic Concepts of Statistics.
Data Collection: Types and Sources of Data, Data Collection Methods.
Descriptive Analysis of Data: Organisation and Presentation of Data; Measures of Central Tendency and Dispersion. Quartiles, Percentiles, Skewness and Kurtosis.
Elementary Probability Theory: Random Experiments, Definitions of Terms and Measure of Probability. Some Basic Laws of Probability involving Compound Events. Computation of Probabilities involving Simple Events, Application of Counting Techniques and to Decision Problems.
Random Variables and Probability Distributions: Concepts of Random Variables; Definition and Properties of Probability Distribution; Expectation, Median and Variance of Random Variables, Application to Decision Problems.
Special Probability Distributions: Binomial Distribution, Geometric Distribution, Poisson Distribution & Normal Distribution
STATISTICS

What is Statistics?
Statistics is the science concerned with developing and studying
methods for collecting, organizing, analyzing, interpreting and
presenting empirical data.
Statistics is the science of learning from data.

Notable Packages for Statistical Analysis

Types of Statistics
1 Descriptive statistics

Organizing, summarizing and describing the data


Uses numerical and graphical summaries to characterize
sample data
2 Inferential statistics
Uses sample data to make conclusions about a broader
range of individuals (population) than just those who are
observed (sample)

BASIC TERMS

IVPPSS
It is important that we understand the difference between
population and sample, and between parameter and statistic,
before we can understand and appreciate the process of
making statistical inferences.
Before identifying these items we must also identify the
individual and variable(s) of interest.
These six items (Individual, Variable, Population, Parameter,
Sample, Statistic) must be explicitly identified at the
beginning of any statistical analysis for that analysis to be
conducted properly.

DEFINITION OF IVPPSS

Individual
An individual is one of the items examined by the researcher.
An individual is not necessarily a person.

Variable
A variable is the characteristic of interest about each
individual.

Population
A population is the collection of all individuals of interest.

Parameter
A parameter is a summary of all individuals in the
population. It is a number computed from the population.

DEFINITION OF IVPPSS

Sample
A sample is a subset of the population examined by the
researcher.

Statistic
A statistic is a summary of the sample. It is a number
computed from the sample.

PERFORMING IVPPSS
Steps for statistical analysis
First, we determine what item we are actually going to
look at; those are the individuals.
Second, we decide what we are going to record when we look
at an individual; that is the variable.
Third, the population is simply ALL of the individuals.
Fourth, the parameter is the summary (e.g., mean or
proportion) of the variable recorded from all the
individuals in the population.
Fifth, we realize that we cannot see all the individuals in
the population, so we examine a few (those few are the
sample).
Finally, the summary of the individuals in the sample is
the statistic. The statistic has to be the same summary of
the sample as the parameter is of the population.
EXAMPLE

Example
My dad owns 60 acres of timber (mostly oak, walnut and
poplar) in Iowa. He wants to measure the mean
diameter at breast height (DBH) of the oak trees on his
property. He measures the DBH of 75 randomly selected
oak trees. Use this information to perform an IVPPSS.

SOLUTION

Solution
Individual = an oak tree on Dad’s property
Variable = diameter at breast height (DBH)
Population = all oak trees on Dad’s property
Parameter = mean DBH of all oak trees on Dad’s property
Sample = the 75 oak trees that Dad measured
Statistic = mean DBH of the 75 oak trees that Dad
measured
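
The distinction between the parameter and the statistic in this example can be sketched in Python. The population below is simulated: the number of oak trees (2000) and the DBH values (uniform between 20 and 80 cm) are assumptions made up for illustration, not data from the example. Only the sample size of 75 comes from the example itself.

```python
import random
import statistics

random.seed(166)  # fixed seed so the sketch gives the same result on every run

# Hypothetical population: the DBH (in cm) of every oak tree on the property.
# Both the size (2000) and the range (20 to 80 cm) are invented for illustration.
population = [random.uniform(20.0, 80.0) for _ in range(2000)]

# Parameter: the mean DBH computed from ALL individuals in the population.
parameter = statistics.mean(population)

# Sample: 75 oak trees selected at random, without replacement.
sample = random.sample(population, 75)

# Statistic: the same summary (the mean), computed from the sample only.
statistic = statistics.mean(sample)

print(f"Parameter (population mean DBH): {parameter:.2f} cm")
print(f"Statistic (sample mean DBH):     {statistic:.2f} cm")
```

In practice the population list is never available, which is why the statistic is used to estimate the parameter. The simulation only makes both visible side by side.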

VARIABLES

Definition
A variable is the characteristic of interest about each
individual, that is, the information that the researcher
records about each individual. Note that in most “real life”
studies the researcher will be interested in more than one variable.
Studies with one variable are called univariate studies,
studies with two variables are called bivariate studies, and
studies with more than two variables are called
multivariate studies.

TYPES OF VARIABLES
There are two main groups of variable types - quantitative and
qualitative variables.
QUANTITATIVE VARIABLE
Quantitative variables are variables with numerical values for
which it makes sense to do arithmetic operations (like adding or
averaging).
Examples: volume, height, price, number of students in a
class, etc.

QUALITATIVE VARIABLE
Qualitative variables are variables that record the group or
category to which an individual belongs. They assume non-numeric
values. Synonyms for qualitative are categorical and attribute.
Examples: religion, regions in Ghana, gender, etc.

Within each main type of variable are two subgroups.


QUANTITATIVE VARIABLES

TYPES OF QUANTITATIVE VARIABLES


The two types of quantitative variables are continuous and
discrete variables.
1 Continuous variables are quantitative variables that
have an uncountable number of possible values. In other words,
a potential value DOES exist between every pair of values of
a continuous variable, e.g. height, temperature, weight.
2 Discrete variables are quantitative variables that have
a countable number of possible values. Stated differently, a
potential value DOES NOT exist between every pair of values of
a discrete variable. Typically, but not always, discrete
variables are counts, e.g. number of children in
a family, number of heads or tails in a series of coin tosses.

QUALITATIVE VARIABLES

TYPES OF QUALITATIVE VARIABLES


The two types of qualitative variables are ordinal and
nominal.
1 Ordinal variables are qualitative variables where a
natural order or ranking exists among the categories.
2 Nominal variables are qualitative variables where NO
order or ranking exists among the categories.

ORDINAL AND NOMINAL EXPLAINED

Ordinal and Nominal


Ordinal and nominal variables are easily distinguished by
determining whether the order of the categories matters. For example,
suppose that a researcher recorded a subjective measure of
condition (i.e., poor, average, excellent) and the species of each
duck. Order matters with the condition variable, i.e., the
condition improves from the first category (poor) to the last
category (excellent), and some re-orderings of the categories
would not make sense, e.g., average, poor, excellent does not
make sense. Thus, condition is an ordinal variable. In contrast,
species (e.g., mallard, redhead, canvasback, and wood duck) is
nominal because there is no inherent order among the categories
(i.e., any reordering of the categories also “makes sense”).
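
The duck example can be sketched in Python to show which operations each kind of variable supports. The five duck records below are hypothetical values invented for illustration. The category names come from the example above.

```python
from collections import Counter

# Ordinal: the categories carry a rank, so the order is stored explicitly.
CONDITION_ORDER = ["poor", "average", "excellent"]
rank = {level: position for position, level in enumerate(CONDITION_ORDER)}

# Hypothetical records (species, condition) for five ducks.
ducks = [
    ("mallard", "average"),
    ("redhead", "excellent"),
    ("wood duck", "poor"),
    ("canvasback", "average"),
    ("mallard", "poor"),
]

# Sorting by condition is meaningful because the ranks are meaningful.
by_condition = sorted(ducks, key=lambda duck: rank[duck[1]])

# A median is defined for ordinal data: it is the middle value once ranked.
ranked_conditions = sorted((cond for _, cond in ducks), key=rank.get)
median_condition = ranked_conditions[len(ranked_conditions) // 2]

# Nominal: species has no inherent order, so only counting makes sense.
species_counts = Counter(species for species, _ in ducks)

print(by_condition)
print("Median condition:", median_condition)
print("Species counts:", dict(species_counts))
```

Sorting the species alphabetically would also run without error, but the resulting order would carry no information about the ducks, which is the practical meaning of "nominal".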

DATA VS VARIABLE
The list of observed values for a variable is data.
For example, gender is a variable; the observations male or
female are data.
Qualitative data are observations corresponding to a
qualitative variable.
Quantitative data are observations corresponding to a
quantitative variable.
Discrete data are observations corresponding to a discrete
variable
Continuous data are observations corresponding to a
continuous variable

MEASUREMENT

What is measurement?
Measurement is the application of mathematics to things or
events.
A system of measurement is a crucial component of research.
Simple example: How tall is Jane? More complex example:
How shy is Jane?

Scales of measurement
Nominal Scale
Data that represent categories or names or labels. There is
no implied order to the categories of nominal data.
Observations are classified into mutually exclusive
categories

Examples:
Colour of soil sample: brown, black, yellow
Gender: male or female
Marital status: single, married, divorced
Employment status: employed, unemployed
Religious affiliation
Eye color: blue, brown, green
Sometimes numbers are used to designate category membership.
Here, the numbers do not have numeric implications; they are
simply convenient labels.
Example: Country of Origin
Ghana = 1, Cameroon = 2, Nigeria = 3, Other = 4
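
A short Python sketch can show why such codes are labels rather than quantities. The nine coded responses below are hypothetical. The coding scheme is the one given above.

```python
import statistics

# Coding scheme from the example: the numbers are labels only.
COUNTRY_CODES = {1: "Ghana", 2: "Cameroon", 3: "Nigeria", 4: "Other"}

# Hypothetical coded responses from nine respondents.
responses = [1, 1, 3, 2, 1, 4, 3, 1, 3]

# The arithmetic mean can be computed, but it has no interpretation:
# the result (about 2.11) is not a valid code for any country.
meaningless_mean = statistics.mean(responses)

# The mode (the most frequent category) is a valid summary for nominal data.
most_common_code = statistics.mode(responses)
most_common_country = COUNTRY_CODES[most_common_code]

print(f"Mean of the codes: {meaningless_mean:.2f} (not interpretable)")
print(f"Most common country of origin: {most_common_country}")
```

Relabelling the countries (say Ghana = 4 and Other = 1) would change the mean but leave the most common country unchanged, which is why only counts and the mode are appropriate here.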
Scales of Measurement

Ordinal Scale:
This scale has a logical ordering or ranking of the
categories.
However, precise differences between the ranks do not
exist.

Severity of injury: fatal, serious, minor, no injury


Taste of food: Good, moderate, bad
Academic performance: Excellent, very good, poor
Socio-economic status: High, middle, low

Scales of measurement

Interval Scale
In this scale, data can be ranked and precise differences
between the ranks do exist; however, there is no meaningful
zero.
Example:
Temperature (in degrees Celsius or Fahrenheit): the difference
between temperatures of 10° and 50° indicates that one is
warmer than the other. However, a temperature of zero does
not mean that there is no temperature.
IQ of a person: the difference between IQs of 50 and 110
indicates that one person is more intelligent than the other.
However, an IQ of zero does not mean that the person has no
intelligence.
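
The lack of a meaningful zero has a practical consequence that can be checked with a little arithmetic: on an interval scale, differences are meaningful but ratios are not. The sketch below assumes two Celsius readings, 10 and 50, chosen purely for illustration, and converts them to Fahrenheit.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Two illustrative readings in degrees Celsius (assumed values).
low_c, high_c = 10.0, 50.0
low_f, high_f = celsius_to_fahrenheit(low_c), celsius_to_fahrenheit(high_c)

# Differences are meaningful: they convert by a fixed factor of 9/5.
diff_c = high_c - low_c   # 40 Celsius degrees
diff_f = high_f - low_f   # 72 Fahrenheit degrees, i.e. 40 * 9/5

# Ratios are not meaningful: they depend on where the arbitrary zero sits.
ratio_c = high_c / low_c  # 5.0 in Celsius
ratio_f = high_f / low_f  # 122 / 50 = 2.44 in Fahrenheit

print(f"Difference: {diff_c} C = {diff_f} F")
print(f"Ratio in Celsius: {ratio_c}, ratio in Fahrenheit: {ratio_f}")
```

Because the ratio changes with the unit, a statement such as "50 degrees is five times as hot as 10 degrees" is not valid on an interval scale.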

Scale of Measurement

Ratio Scale
The ratio scale of measurement possesses all the characteristics
of the interval scale and, in addition, has a meaningful zero.

Examples
Distance between two locations
Weight of soil sample
Area of a landfill
Volume of a landfill

MEASUREMENT

Summary of Measurement Scales


Measurement scales differ by order, equal intervals between
adjacent units and absolute zero point.
Nominal: None
Ordinal: Order
Interval: Order + Equal intervals
Ratio: Order + Equal intervals + True zero
Nominal or ordinal scaled data: use bar charts (simple,
multiple, compound, etc.) or pie charts.
Interval or ratio scaled data: use a histogram, frequency
polygon, ogive, etc.
Use a scatter plot to assess the association between two
quantitative variables.
Note: No inference drawn at this point.
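
The summary above can be written as a small lookup, sketched here in Python. The names `SCALE_PROPERTIES` and `suggested_charts` are hypothetical helpers introduced for illustration. The properties and chart recommendations are the ones listed in the summary.

```python
# Properties each measurement scale possesses, as listed in the summary.
SCALE_PROPERTIES = {
    "nominal": set(),
    "ordinal": {"order"},
    "interval": {"order", "equal intervals"},
    "ratio": {"order", "equal intervals", "true zero"},
}


def suggested_charts(scale: str) -> list[str]:
    """Return the chart types the summary recommends for a given scale."""
    properties = SCALE_PROPERTIES[scale.lower()]
    if "equal intervals" in properties:
        # Interval or ratio scaled data.
        return ["histogram", "frequency polygon", "ogive"]
    # Nominal or ordinal scaled data.
    return ["bar chart", "pie chart"]


for scale in SCALE_PROPERTIES:
    print(f"{scale:>8}: {suggested_charts(scale)}")
```

Each scale's property set contains the one before it, which mirrors the statement that every scale possesses all the characteristics of the scales below it.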

Summary

Data Collection

Data Collection
Data can be collected in a variety of ways. Data collection is
the most important part of the statistical procedure because
valid conclusions can only result from data that have been
collected properly.
If a proper procedure is used to collect the data, the sample
is far more likely to be representative of the population. The
use of wrong or faulty data collection methods results in wrong
conclusions, because no statistical tool can produce
good results from wrongly collected data.
There are two main types of data:
Primary data
Secondary data

Secondary Source of Data

Secondary Data
This refers to data that was collected by someone other than
the user. That is, data collected for other research purposes.

Sources of secondary data include


Government Records
Census data
population statistics
health records
educational institution records
Private Organisations/Companies
Published materials (e.g. books, journals, websites)

Secondary Source of Data

Advantages of using secondary data


It saves time
It saves money
It may be very accurate

Limitations of using secondary data


It may be incomplete
It may not be exactly what you need
It may not be consistent/reliable
It may be outdated

Primary Source of Data

Primary data
This refers to data that was collected by the user. That is, data
collected for the first time by the researcher for a defined
purpose.

Sources of primary data


There are several methods of collecting primary data, and
sometimes you have to design a particular method to
suit your research.
Three of the most popular methods are:
Experiment
Observation
Surveys

Primary Source of Data

Advantages of using primary data


It gives the research originality and does not carry the bias
or opinions of third parties
The content or the course of the study can be changed
whenever needed
What is needed is what is obtained (if the study is well designed)

Limitations of using primary data


It is difficult to design a suitable approach
It is costly
It is time consuming

THANK YOU.
