PROBABILITY AND STATISTICS:
STAT 166
                        By
            Jonathan Kwaku Afriyie
   Department of Statistics and Actuarial Science
                     KNUST
           archimedes09.jak@gmail.com
          jonathan.afriyie@knust.edu.gh
                  May 29, 2025
COURSE OUTLINE
 Introduction: Nature and Uses of Statistics; Some Basic Concepts of Statistics.
 Data Collection: Types and Sources of Data, Data Collection Methods.
 Descriptive Analysis of Data: Organisation and Presentation of Data; Measures of Central Tendency and Dispersion; Quartiles, Percentiles, Skewness and Kurtosis.
 Elementary Probability Theory: Random Experiments, Definitions of Terms and Measure of Probability. Some Basic Laws of Probability involving Compound Events. Computation of Probabilities involving Simple Events, Application of Counting Techniques, and Application to Decision Problems.
 Random Variables and Probability Distributions: Concepts of Random Variables; Definition and Properties of Probability Distribution; Expectation, Median and Variance of Random Variables, Application to Decision Problems.
 Special Probability Distributions: Binomial Distribution, Geometric Distribution, Poisson Distribution and Normal Distribution.
STATISTICS
 What is Statistics?
 Statistics is the science concerned with developing and studying
 methods for collecting, organizing, analyzing, interpreting and
 presenting empirical data.
 Statistics is the science of learning from data.
Notable Packages for Statistical Analysis
Types of Statistics
 1 Descriptive statistics
         Organizing, summarizing and describing the data
         Uses numerical and graphical summaries to characterize
         sample data
 2   Inferential statistics
         Uses sample data to make conclusions about a broader
         range of individuals (population) than just those who are
         observed (sample)
BASIC TERMS
 IVPPSS: Individual, Variable, Population, Parameter, Sample, Statistic
    It is important that we understand the difference between
    population and sample, and between parameter and statistic,
    before we can understand and appreciate the process of
    making statistical inferences.
     Before identifying these items, we must also identify the
     individual and the variable(s) of interest.
     These six items must be explicitly identified at the
     beginning of any statistical analysis for that analysis to be
     conducted properly.
DEFINITION OF IVPPSS
 Individual
 An individual is one of the items examined by the researcher.
 An individual is not necessarily a person.
 Variable
 A variable is the characteristic of interest about each
 individual.
 Population
 A population is the collection of all individuals of interest.
 Parameter
 A parameter is a summary of all individuals in the
 population. It is a number computed from the population.
DEFINITION OF IVPPSS
 Sample
 A sample is a subset of the population examined by the
 researcher.
 Statistic
 A statistic is a summary of the sample. It is a number
 computed from the sample.
PERFORMING IVPPSS
 Steps for statistical analysis
     First, we determine what items we are actually going to
     look at; those are the individuals.
     Second, we decide what we are going to record when we look
     at an individual; that is the variable.
     Third, the population is simply ALL of the individuals.
     Fourth, the parameter is the summary (e.g., mean or
     proportion) of the variable recorded from all the
     individuals in the population.
     Fifth, we realize that we cannot see all the individuals in
     the population, so we examine a few (those few are the
     sample).
     Finally, the summary of the individuals in the sample is
     the statistic. The statistic has to be the same summary of
     the sample as the parameter is of the population.
EXAMPLE
 Example
 My dad owns 60 acres of timber (mostly oak, walnut and
 poplar) in Iowa. He wants to know the mean
 diameter at breast height (DBH) of the oak trees on his
 property. He measures the DBH of 75 randomly selected
 oak trees. Use this information to perform an IVPPSS.
SOLUTION
 Solution
     Individual = an oak tree
     Variable = Diameter at breast height (DBH)
     Population = All oak trees on Dad’s property
     Parameter = mean DBH of all oak trees on Dad’s property.
     Sample = 75 oak trees Dad measured
     Statistic = mean DBH of the 75 oak trees that Dad
     measured.
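 To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a hypothetical population of 1,000 oak trees whose DBH values are simulated with a random number generator (they are invented for illustration and are not Dad's measurements; the real number of oak trees is not given). The variable names are illustrative only.

```python
import random
import statistics

# ASSUMPTION: a hypothetical population of 1,000 oak trees whose DBH values
# (in cm) are simulated; these are NOT real measurements.
rng = random.Random(166)  # fixed seed so the sketch is reproducible
population = [round(rng.uniform(15.0, 90.0), 1) for _ in range(1000)]

# Parameter: the mean DBH of ALL oak trees in the population.
parameter = statistics.mean(population)

# Sample: 75 oak trees selected at random, without replacement.
sample = rng.sample(population, 75)

# Statistic: the SAME summary (the mean) computed from the sample only.
statistic = statistics.mean(sample)

print(f"Parameter (mean DBH of all oak trees): {parameter:.2f} cm")
print(f"Statistic (mean DBH of 75 sampled trees): {statistic:.2f} cm")
```

 In a real study only the sample would be observed; the statistic is then used to estimate the unknown parameter, and a different random sample would generally give a slightly different statistic.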
VARIABLES
 Definition
     A variable is the characteristic of interest about each
     individual. The variable is the information that the
     researcher records about each individual. Note that in most
     “real life” studies the researcher will be interested in more
     than one variable. Studies with one variable are called
     univariate studies, studies with two variables are bivariate
     studies, and studies with more than two variables are called
     multivariate studies.
TYPES OF VARIABLES
 There are two main groups of variable types: quantitative and
 qualitative variables.
 QUANTITATIVE VARIABLE
 Quantitative variables are variables with numerical values for
 which it makes sense to do arithmetic operations (like adding or
 averaging).
 Examples: volume, height, price, number of students in a class.
 QUALITATIVE VARIABLE
 Qualitative variables are variables that record which group or
 category an individual belongs to. They assume non-numeric
 values. Synonyms for qualitative are categorical and attribute.
 Examples: religion, region of residence in Ghana, gender.
 Within each main type of variable are two subgroups.
QUANTITATIVE VARIABLES
 TYPES OF QUANTITATIVE VARIABLES
 The two types of quantitative variables are continuous and
 discrete variables.
   1 Continuous variables are quantitative variables that
     have an uncountable number of possible values. In other
     words, a potential value DOES exist between every pair of
     values of a continuous variable, e.g., height, temperature,
     weight.
   2 Discrete variables are quantitative variables that have a
     countable number of possible values. Stated differently, NOT
     every pair of values of a discrete variable has a potential
     value between them (e.g., no family has 2.5 children).
     Typically, but not always, discrete variables are counts,
     e.g., number of children in a family, number of heads or
     tails.
QUALITATIVE VARIABLES
 TYPES OF QUALITATIVE VARIABLES
 The two types of qualitative variables are ordinal and
 nominal.
   1 Ordinal variables are qualitative variables where a
     natural order or ranking exists among the categories.
   2 Nominal variables are qualitative variables where NO
     order or ranking exists among the categories.
ORDINAL AND NOMINAL EXPLAINED
 Ordinal and Nominal
 Ordinal and nominal variables are easily distinguished by
 determining whether the order of the categories matters. For
 example, suppose that a researcher recorded a subjective measure
 of condition (i.e., poor, average, excellent) and the species of
 each duck. Order matters with the condition variable (i.e., the
 condition improves from the first category, poor, to the last
 category, excellent), and some re-orderings of the categories
 would not make sense; e.g., average, poor, excellent does not
 make sense. Thus, condition is an ordinal variable. In contrast,
 species (e.g., mallard, redhead, canvasback, and wood duck) is
 nominal because there is no inherent order among the categories
 (i.e., any reordering of the categories also “makes sense”).
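 The practical difference between the two kinds of variable can be sketched in Python. The duck records below are hypothetical (invented for illustration), and encoding the order as a list of levels is one convenient choice, not the only one.

```python
from collections import Counter

# Ordinal variable: levels listed from worst to best; the ORDER carries meaning.
CONDITION_LEVELS = ["poor", "average", "excellent"]
RANK = {level: i for i, level in enumerate(CONDITION_LEVELS)}

# Nominal variable: the categories have no inherent order.
SPECIES = {"mallard", "redhead", "canvasback", "wood duck"}

# Hypothetical (species, condition) records for five ducks, for illustration only.
ducks = [
    ("mallard", "average"),
    ("redhead", "excellent"),
    ("mallard", "poor"),
    ("wood duck", "average"),
    ("canvasback", "excellent"),
]
assert all(s in SPECIES and c in RANK for s, c in ducks)  # sanity check on labels

conditions = [c for _, c in ducks]
species = [s for s, _ in ducks]

# Ordinal: sorting by rank, and hence a median category, are meaningful.
sorted_conditions = sorted(conditions, key=RANK.__getitem__)
median_condition = sorted_conditions[len(sorted_conditions) // 2]  # odd n: middle value

# Nominal: we can only count categories and report the most frequent one (mode).
species_counts = Counter(species)
modal_species = species_counts.most_common(1)[0][0]

print("conditions in rank order:", sorted_conditions)
print("median condition:", median_condition)
print("species counts:", dict(species_counts))
print("modal species:", modal_species)
```

 Sorting the species names alphabetically would also run, but that order comes from spelling, not from the variable itself, which is exactly why species is nominal.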
DATA VRS VARIABLE
   The list of observed values for a variable is data.
   For example, gender is a variable; the observations male or
   female are data.
    Qualitative data are observations corresponding to a
    qualitative variable.
    Quantitative data are observations corresponding to a
    quantitative variable.
    Discrete data are observations corresponding to a discrete
    variable.
    Continuous data are observations corresponding to a
    continuous variable.
MEASUREMENT
 What is measurement?
 Measurement is the assignment of numbers or labels to things or
 events according to a rule; it is how mathematics is applied to
 them.
 A system of measurement is a crucial component of research.
 Simple example: How tall is Jane? More complex example:
 How shy is Jane?
Scales of measurement
  Nominal Scale
     Data that represent categories or names or labels. There is
     no implied order to the categories of nominal data.
      Observations are classified into mutually exclusive
      categories
  Examples:
      Colour of soil sample: brown, black, yellow
      Gender: male or female
      Marital status: single, married, divorced
      Employment status: employed, unemployed
      Religious affiliation
      Eye colour: blue, brown, green
  Sometimes numbers are used to designate category membership.
  Here, the numbers do not have numeric implications; they are
  simply convenient labels.
  Example: Country of Origin
  Ghana = 1, Cameroon = 2, Nigeria = 3, Other = 4
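  A short Python sketch shows why such codes are only labels. It assumes a handful of hypothetical coded responses (invented for illustration): counting categories gives the same answer under any coding, while the "average code" changes as soon as the arbitrary codes are reassigned.

```python
from collections import Counter
import statistics

# Coding scheme for Country of Origin: the numbers are labels only.
CODE_TO_COUNTRY = {1: "Ghana", 2: "Cameroon", 3: "Nigeria", 4: "Other"}

# Hypothetical coded responses from eight people (invented for illustration).
responses = [1, 1, 3, 2, 1, 4, 3, 1]

# Meaningful for nominal data: the frequency of each category and the mode.
counts = Counter(CODE_TO_COUNTRY[c] for c in responses)
modal_country = counts.most_common(1)[0][0]

# NOT meaningful: the "average code". An equally valid relabelling
# (Ghana = 4, Cameroon = 3, Nigeria = 2, Other = 1) changes it,
# even though the answers themselves are unchanged.
relabel = {1: 4, 2: 3, 3: 2, 4: 1}
mean_code = statistics.mean(responses)
mean_relabelled = statistics.mean([relabel[c] for c in responses])

print("counts:", dict(counts))
print("modal country:", modal_country)
print("mean code (original coding):", mean_code)
print("mean code (relabelled):", mean_relabelled)
```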
Scales of Measurement
  Ordinal Scale:
      This scale has a logical ordering or ranking of the
      categories.
      However, the precise differences between the ranks are not
      defined or measurable.
  Examples:
      Severity of injury: fatal, serious, minor, no injury
      Taste of food: good, moderate, bad
      Academic performance: excellent, very good, poor
      Socio-economic status: high, middle, low
Scales of measurement
  Interval Scale
  In this scale, data can be ranked and the differences between
  values are meaningful (equal intervals); however, there is no
  meaningful (true) zero.
  Examples:
      Temperature: the difference between temperatures of 10°
      and 50° is 40°, and a 40° difference means the same amount
      of change anywhere on the scale. However, a temperature of
      0° does not mean that there is no temperature, so 50° is not
      five times as warm as 10°.
      IQ of a person: the difference between IQ scores of 50 and
      110 is 60 points, indicating that one person scored higher
      than the other. However, an IQ of zero does not mean that
      the person has no intelligence.
Scales of Measurement
  Ratio Scale
  The ratio scale of measurement possesses all the characteristics
  of the interval scale and, in addition, has a meaningful (true)
  zero, so ratios of values make sense (e.g., 10 kg is twice as
  heavy as 5 kg).
  Examples
      Distance between two locations
      Weight of soil sample
      Area of a landfill
      Volume of a landfill
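  The difference between interval and ratio scales can be checked numerically. This Python sketch uses the standard conversions F = C × 9/5 + 32 and K = C + 273.15; the two temperatures (10 °C and 50 °C) and the two weights are illustrative choices, not data.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Standard conversion: K = C + 273.15 (kelvin has a true zero)."""
    return c + 273.15


# Interval scale: two illustrative temperatures in degrees Celsius.
t_low, t_high = 10.0, 50.0

# Differences are consistent across units: 40 C apart is always 72 F apart.
diff_c = t_high - t_low
diff_f = celsius_to_fahrenheit(t_high) - celsius_to_fahrenheit(t_low)

# Ratios are NOT meaningful on an interval scale: they depend on the unit.
ratio_c = t_high / t_low                                                  # 5.0
ratio_f = celsius_to_fahrenheit(t_high) / celsius_to_fahrenheit(t_low)    # 2.44
ratio_k = celsius_to_kelvin(t_high) / celsius_to_kelvin(t_low)            # about 1.14

# Ratio scale: weight in kg vs g. The ratio survives the change of unit.
w_light, w_heavy = 5.0, 10.0
ratio_kg = w_heavy / w_light                   # 2.0
ratio_g = (w_heavy * 1000) / (w_light * 1000)  # 2.0

print(f"difference: {diff_c} C = {diff_f} F")
print(f"'ratio' of temperatures: {ratio_c} (C), {ratio_f} (F), {ratio_k:.3f} (K)")
print(f"ratio of weights: {ratio_kg} (kg), {ratio_g} (g)")
```

  Only the kelvin ratio reflects a physical ratio, because kelvin is a ratio scale with an absolute zero; the Celsius "five times" and the Fahrenheit "2.44 times" are artefacts of where each scale places its zero.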
MEASUREMENT
 Summary of Measurement Scales
 Measurement scales differ by order, equal intervals between
 adjacent units and absolute zero point.
     Nominal: None
     Ordinal: Order
     Interval: Order + Equal intervals
     Ratio: Order + Equal intervals + True zero
     Nominal or ordinal scaled data: use bar charts (simple,
     multiple, compound, etc.) or pie charts.
     Interval or ratio scaled data: use histograms, frequency
     polygons, ogives, etc.
     Use a scatter plot to assess association between
     quantitative variables.
     Note: No inference is drawn at this point.
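 The summary above can be encoded as a small lookup, sketched in Python. The `Scale` enum and the helper functions `properties` and `suggested_charts` are hypothetical names introduced for illustration; the chart suggestions simply restate the guidance on this slide.

```python
from enum import IntEnum


class Scale(IntEnum):
    # Ordered so that each scale has every property of the scales below it.
    NOMINAL = 1   # categories only
    ORDINAL = 2   # + order
    INTERVAL = 3  # + equal intervals
    RATIO = 4     # + true zero


def properties(scale):
    """Return which of the three distinguishing properties a scale has."""
    return {
        "order": scale >= Scale.ORDINAL,
        "equal_intervals": scale >= Scale.INTERVAL,
        "true_zero": scale >= Scale.RATIO,
    }


def suggested_charts(scale):
    """Chart types suggested on the slide for data on this scale."""
    if scale <= Scale.ORDINAL:
        return ["bar chart", "pie chart"]
    return ["histogram", "frequency polygon", "ogive"]


for s in Scale:
    print(s.name, properties(s), suggested_charts(s))
```

 Using an ordered enumeration mirrors the cumulative structure of the summary: each scale inherits every property of the scale before it, so a single comparison decides each property.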
Data Collection
  Data Collection
      Data can be collected in a variety of ways. Data collection
      is the most important part of the statistical procedure,
      because valid conclusions can only result from data that
      have been collected properly.
      If a proper procedure is used to collect the data, the data
      are far more likely to be representative of the population.
      Wrong or faulty data collection methods lead to wrong
      conclusions, because no statistical tool, however good, can
      produce good results from wrongly collected data.
      There are two main types of data:
           Primary data
           Secondary data
Secondary Source of Data
  Secondary Data
  This refers to data that was collected by someone other than
  the user. That is, data collected for other research purposes.
  Sources of secondary data include
      Government Records
           Census data
           population statistics
           health records
           records of educational institutions
      Private Organisations/Companies
      Published materials (e.g., books, journals, websites)
Secondary Source of Data
  Advantages of using secondary data
     It saves time
      It saves money
      It may be very accurate
  Limitations of using secondary data
      It may be incomplete
      It may not be exactly what you need
      It may not be consistent/reliable
      It may be outdated
Primary Source of Data
  Primary data
  This refers to data that was collected by the user. That is, data
  collected for the first time by the researcher for a defined
  purpose.
  Sources of primary data
  There are several methods of collecting primary data, and
  sometimes you have to design a particular method to suit your
  research.
  Three of the most popular methods are:
      Experiment
      Observation
      Surveys
Primary Source of Data
  Advantages of using primary data
     It makes the research original and free of the bias or
     opinions of third parties
      Ability to change the content or the course of the study
      whenever needed
      What is needed is what is obtained (if it is well designed)
  Limitations of using primary data
      Difficulty in designing a suitable approach
      Costly
      Time-consuming
THANK YOU.