Module 1: Descriptive Statistics
Lecture 1: Introduction and Descriptive Statistics
Data Types
Qualitative/categorical
● Mutually exclusive labels (one label cannot mean two things)
● Usually not numbers; if numbers are used, they have no mathematical meaning
- Nominal: ordering/ranking makes no sense, numerical labels are arbitrary
- Ordinal: ordering/ranking has meaning/can be interpreted, numerical labels
respect the ordering
Quantitative/numerical
● Numbers used to record certain events, numbers have mathematical meaning
- Interval: differences between quantities are meaningful, but ratios are not; zero has no
natural meaning
- Ratio: both differences and ratios of two quantities are meaningful; zero is meaningful
Using categorical/qualitative data
Frequency distribution
● Frequency: the total number of occurrences for each category
● Relative frequency: the fraction of the total number of items belonging to a category
(eg. 102 ÷ 808 = 0.1262)
● Percent frequency: relative frequency x 100%
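The three columns of a frequency distribution can be sketched in a few lines of Python. The category labels and counts below are invented for illustration and do not come from the lecture data.

```python
from collections import Counter

# Hypothetical categorical sample (labels invented for illustration).
plans = ["basic", "premium", "basic", "standard",
         "basic", "premium", "standard", "basic"]

frequency = Counter(plans)                 # occurrences of each category
total = sum(frequency.values())            # total number of items
relative = {k: v / total for k, v in frequency.items()}   # fraction of total
percent = {k: 100 * r for k, r in relative.items()}       # relative x 100%

for category in frequency:
    print(category, frequency[category],
          round(relative[category], 4), f"{percent[category]:.2f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100%, which is a quick sanity check on any frequency table.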
Histograms
● Categories on x-axis
● Frequency, relative frequency, percent frequency on y-axis
Using numerical/quantitative data
Frequency distributions and histograms
● Categories on x-axis are grouped (eg. 0-5, 5-10, 10-15)
● The y-axis can also show density: relative frequency divided by class width
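A grouped frequency distribution can be sketched as follows. The class boundaries follow the 0-5, 5-10, 10-15 example; the data values are hypothetical, and "density" is taken here to mean relative frequency divided by class width, which is an assumption about what the notes intend.

```python
# Hypothetical numerical sample; only the class edges come from the notes.
values = [1.2, 3.4, 4.9, 5.0, 6.1, 7.7, 9.3, 10.5, 12.0, 14.8]
edges = [0, 5, 10, 15]

n = len(values)
rows = []
for lo, hi in zip(edges[:-1], edges[1:]):
    # Left-closed classes [lo, hi), so 5.0 falls in 5-10, not 0-5.
    count = sum(1 for v in values if lo <= v < hi)
    rel = count / n
    density = rel / (hi - lo)      # assumed definition of density
    rows.append((lo, hi, count, rel, density))

for lo, hi, count, rel, density in rows:
    print(f"{lo}-{hi}: freq={count}, rel={rel:.2f}, density={density:.3f}")
```

With this definition the bar areas (density x width) add up to 1, so histograms with unequal class widths remain comparable.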
Probability theory
● Random variable (r.v.) - a variable whose value is the outcome of a random process
● Population - the complete pool of observations of a certain random variable
● Sample - a random collection of a certain size drawn from the population
Probability distribution
● Probability distribution - the general shape of probability for
values that a random variable may take
Notation
● Random variable denoted by X, Y (capital letters)
- Eg. X: number of children in household
- Eg. Y: amount of time spent by husband on
housework per day
● Realisations/observations of a random variable are denoted by xᵢ, yᵢ (lowercase letters
with subscript)
- Eg. x₁: number of children in household is 1
- Eg. y₁₃₇: amount of time spent by husband on housework per day is 137
● N and n denote the size or number of observations.
- N denotes the population size
- n denotes the sample size
Descriptive Statistics
Central tendency
● A measure of central tendency yields info about the centre of a set of numbers
(the distribution of a r.v.); it does not focus on the span of the dataset or how far values
are from the middle
● Gives an idea of the typical, middle, or average value that a r.v. can take
● Sometimes called measures of location
Three measures of central tendency
Mode ● most frequently occurring value in a set of data
● If there are 2 modes, the 2 modes are listed and the data is said to be bimodal
● Datasets with 3 or more modes are referred to as multimodal
● Concept of mode is often used in determining sizes
● Appropriate descriptive summary measure for categorical data
Median ● Middle value in an ordered array of numbers
● Locate the median by finding the $\frac{n+1}{2}$th term in the ordered array
● Large and small values do not inordinately influence the median, hence it is the
best measure of location to use in the analysis of variables in which extreme but
acceptable values can occur at just one end of the data
● Not all info from the dataset is used
● Data must be quantitative or be able to be ranked
Mean ● Average of a set of numbers
● Sample mean is represented by X̄
● Population mean is represented by µ
● Data should be quantitative as it needs to be summed
● Affected by all values – advantage because it reflects all the data, but
disadvantage because extreme values pull the mean towards extremes
● To calculate the mean forecast value (expected value), we multiply each possible value by
its probability and sum up the products.
- If we denote the r.v. by X: $E(X) = \sum_i x_i \, P(X = x_i)$
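The three measures of central tendency, and the probability-weighted mean of a discrete r.v., can be sketched with Python's standard `statistics` module. Both the sample and the probability table are hypothetical.

```python
import statistics

# Hypothetical sample with one extreme value at the upper end.
data = [2, 3, 3, 4, 5, 5, 5, 40]

mode = statistics.mode(data)       # most frequent value
median = statistics.median(data)   # (n+1)/2 = 4.5th term: average of 4th and 5th
mean = statistics.mean(data)       # sum of values divided by n

# Expected value of a discrete r.v.: each value times its probability, summed.
values = [0, 1, 2, 3]              # hypothetical possible values of X
probs = [0.1, 0.4, 0.3, 0.2]       # hypothetical probabilities, summing to 1
expected = sum(x * p for x, p in zip(values, probs))

print(mode, median, mean, round(expected, 4))
```

The extreme value 40 pulls the mean (8.375) well above the median (4.5), which illustrates why the median is preferred when extreme values occur at one end of the data.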
Variability
● Measures of variability yield info about how likely a realisation of the r.v. is to lie
away from the centre of its distribution; they describe the spread/dispersion of a dataset
● Gives an idea of fluctuation and volatility across realisations of the r.v.
● The more variability in a dataset, the less typical the central values are of the whole set
● Using measures of variability in conjunction with measures of central tendency
makes possible a more complete numerical description of the data (a measure of
variability is necessary to complement the mean value when describing data)
● The more spread out the r.v. is, the larger its variability (risk/dispersion)
● Also called measures of scale, spread, dispersion or risk
● Measures of variability
- Variance (Var): average of squared distance from the mean
- Standard deviation (std): square root of variance
- Coefficient of variation (CV): standard deviation / mean x 100%
Variability formulas
Variance
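● It computes the average squared distance between data points and their mean; the formula
depends on whether the data are a sample or a population
● Population variance
- Finite population of size N
- Denoted by σ² (sigma squared) or Var(X) (variance of X)
- $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
● Sample variance
- Denoted by s²
- $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2$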
Standard deviation
● Standard deviation solves the problem of squared units: it has the same unit as the
original data
● Population standard deviation
- Denoted by σ (sigma) or std(X): $\sigma = \sqrt{\sigma^2}$
● Sample standard deviation
- Denoted by s: $s = \sqrt{s^2}$
Coefficient of variation
● Measures standard deviation per unit of mean
● In finance, when the r.v. X denotes asset returns, CV measures risk per unit of
expected return
● It is unit free, because both the numerator and denominator have the same unit as
the original data and they cancel each other
● Population CV: $CV = \frac{\sigma}{\mu} \times 100\%$
- When σ increases, CV increases
- When µ increases, CV decreases
- Ratio between risk and expected return
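The variability measures can be sketched with the standard `statistics` module, whose `pvariance`/`pstdev` functions divide by N and whose `variance`/`stdev` functions divide by n - 1. The return figures are hypothetical.

```python
import statistics

# Hypothetical asset returns, in percent.
returns = [4.0, 6.0, 5.0, 9.0, 6.0]

mu = statistics.mean(returns)

pop_var = statistics.pvariance(returns)   # population variance: divides by N
samp_var = statistics.variance(returns)   # sample variance: divides by n - 1
pop_std = statistics.pstdev(returns)      # square root of population variance
samp_std = statistics.stdev(returns)      # square root of sample variance

cv = pop_std / mu * 100                   # coefficient of variation, in %

print(mu, pop_var, samp_var, round(pop_std, 4), round(samp_std, 4), round(cv, 2))
```

Here the squared deviations sum to 14, so the population variance is 14/5 = 2.8 and the sample variance is 14/4 = 3.5. The standard deviations are back in percent, the same unit as the data, while CV has no unit.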
Skewness
Shape
● Central tendency and variability are useful to describe and summarise data or the
distribution of r.v.’s
● Skewness - a measure of asymmetry; in a skewed distribution the mode, mean and median sit in different places:
● Mode: value on the horizontal axis where the high point of the curve occurs
● Mean: towards the tail of the distribution (drawn towards the extreme values)
● Median: generally located somewhere between the mode and the mean
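The ordering of mode, median and mean in a skewed dataset can be checked numerically. The sample below is hypothetical, and the moment-based skewness coefficient used here (third central moment divided by the second central moment to the power 1.5) is one common definition, assumed for illustration because the notes do not give a formula.

```python
import statistics

# Hypothetical right-skewed sample: one large value stretches the upper tail.
data = [1, 2, 2, 3, 12]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

n = len(data)
m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
skewness = m3 / m2 ** 1.5                     # assumed definition; > 0 means right-skewed

print(mode, median, mean, round(skewness, 3))
```

The mean (4) is dragged towards the long right tail, while the mode and median both stay at 2, and the skewness coefficient is positive.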
Lecture 2: Probability theory
● Multi-dimensional data
● Experiment: a random process that creates outcomes (eg. the data collection
procedure)
● Sample space: the set of all possible outcomes
● Event: a set of outcomes (can contain no outcome, single outcome or multiple
outcomes) of an experiment to which probability is assigned. So an event is a subset
of the sample space
● Relative frequency: outcomes receive probability corresponding to their number of
occurrences → P(outcome i) = number of occurrences of outcome i ÷ total number of
occurrences of all outcomes
Law of addition
Joint vs marginal probabilities
● Distinguish joint and marginal probability through multidimensional outcomes
● Joint probability: denotes relative frequency when asking about all dimensions
- Eg. what is the relative frequency that a customer bought a $49 plan on a weekday?
● Marginal probability: displays relative frequency when only asking about a single
dimension
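The distinction can be sketched with a small two-dimensional table of counts. All counts are invented, and the "$79" plan is a hypothetical second category added so that the table has more than one row.

```python
# Hypothetical purchase counts by (plan, day type).
counts = {
    ("$49", "weekday"): 30,
    ("$49", "weekend"): 10,
    ("$79", "weekday"): 40,
    ("$79", "weekend"): 20,
}
total = sum(counts.values())

# Joint probability: relative frequency of each (plan, day type) combination.
joint = {key: c / total for key, c in counts.items()}
p_49_weekday = joint[("$49", "weekday")]

# Marginal probability: sum the joint probabilities over the other dimension.
marginal_plan = {}
marginal_day = {}
for (plan, day), p in joint.items():
    marginal_plan[plan] = marginal_plan.get(plan, 0.0) + p
    marginal_day[day] = marginal_day.get(day, 0.0) + p

print(p_49_weekday, marginal_plan, marginal_day)
```

The joint probability answers a question about both dimensions at once (a $49 plan and a weekday), while each marginal ignores one dimension by summing over it.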
Law of total probability, version 1
● The complement of the event A is denoted by A’ (pronounced "A prime"), meaning "not A";
a dash at the top means the outcome does not occur
● When referring to joint probability, we use the intersection symbol “∩”. The event A∩B
(read: the intersection of A and B, or A intersection B) is the event where both A and B
are true or both A and B occur
Venn diagram: visualisation of probability
● Venn diagram shows logic relations across sets
● The external rectangle indicates the whole sample space
● The internal circle indicates some event A
Joint events
● A joint event such as A ∩ B is the intersection (∩) of A and B
Union of events
● Indicates the event that A or B (or both) happens
● This is denoted by A∪B, pronounced as the union of A and B or A union B.
So P(A∪B) indicates the probability that A or B is true or that A or B occurs
General rule of addition
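The general rule of addition, P(A∪B) = P(A) + P(B) − P(A∩B), can be verified on a small sample space using Python sets. The die experiment and the events A and B are hypothetical choices for illustration; exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
sample_space = set(range(1, 7))
A = {2, 4, 6}          # the roll is even
B = {4, 5, 6}          # the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

lhs = prob(A | B)                          # P(A union B)
rhs = prob(A) + prob(B) - prob(A & B)      # subtract the overlap counted twice

A_complement = sample_space - A            # A' = "not A"
print(lhs, rhs, prob(A & A_complement), prob(A | A_complement))
```

The subtraction of P(A∩B) corrects for outcomes 4 and 6, which belong to both events and would otherwise be counted twice. For an event and its complement the overlap is empty, so the intersection has probability 0 and the union has probability 1.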
Mutually exclusive events
● If events A and B cannot occur at the same time (A can occur only if B does not occur),
we say A and B are mutually exclusive (events)
● Any event and its complement are mutually exclusive. Either “A
occurs” or “A does not occur”
● P(A∩A’) = 0
Collectively exhaustive events
● If the occurrence of events A and B covers the whole sample
space, we say A and B are collectively exhaustive (events)
● Any event and its complement are collectively exhaustive. “A
occurs” and “A does not occur” make up all possible outcomes
● P( A∪A’) = 1
Conditional probability and independence
Conditional probabilities
● P(A|B) denotes the probability that event A occurs, given that B occurs.
● The symbol P(X=x|Y=y) denotes the probability of r.v. X taking value x, conditional on
the r.v. Y taking value y
● Formula: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, for $P(B) > 0$
● Bayes rule: $P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$
Law of total probability
● Joint probability = conditional probability multiplied by the marginal probability:
$P(A \cap B) = P(A \mid B)\, P(B)$
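Conditional probability, Bayes rule and the joint = conditional x marginal relation can be checked together on a small example. The die experiment and events are hypothetical; the helper names `prob` and `cond_prob` are introduced only for this sketch.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
sample_space = set(range(1, 7))
A = {2, 4, 6}          # the roll is even
B = {4, 5, 6}          # the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

p_a_given_b = cond_prob(A, B)
p_b_given_a = cond_prob(B, A)

# Bayes rule: recover P(A|B) from P(B|A) and the two marginals.
bayes = p_b_given_a * prob(A) / prob(B)

# Joint probability as conditional times marginal.
joint = p_a_given_b * prob(B)

print(p_a_given_b, bayes, joint, prob(A & B))
```

Knowing that the roll is greater than 3 raises the probability of an even roll from 1/2 to 2/3, so in this example A and B are not independent.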
Independent events: formula
● If A and B are independent events, whether or not B occurs should not affect the
probability that A occurs; also, whether or not A occurs should not affect the probability
that B occurs
● Formula: $P(A \cap B) = P(A)\, P(B)$
● Bayes rule: $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$
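Independence can be tested by comparing the joint probability with the product of the marginals, P(A∩B) = P(A)P(B). The sketch below uses a hypothetical die roll with one independent pair of events and one dependent pair; `is_independent` is a helper name introduced here.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
sample_space = set(range(1, 7))
A = {2, 4, 6}          # the roll is even
B = {5, 6}             # the roll is at least 5
C = {4, 5, 6}          # the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def is_independent(e1, e2):
    """True when the joint probability equals the product of the marginals."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

print(is_independent(A, B))   # joint 1/6 equals 1/2 * 1/3
print(is_independent(A, C))   # joint 1/3 differs from 1/2 * 1/2
```

For the independent pair, conditioning changes nothing: P(A|B) = (1/6)/(1/3) = 1/2 = P(A).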
Implications of formulas
Binomial experiments
● Eg. toss a coin 3 times in a row and you are interested in how likely it is that you get
exactly two heads
● A binomial experiment counts how many times a certain outcome occurs in repeated
independent trials
● Each trial has two possible outcomes (eg. heads or tails, success or failure)
Binomial tree
● When two outcomes are independent, P(A|B) = P(A)
● Suppose we have three products, each of which can be defective (D) with probability p
or functional (F) with probability q = 1 - p
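The binomial tree for three products can be enumerated directly: each path through the tree is a sequence of D/F outcomes, and by independence its probability is the product of the branch probabilities. The value p = 1/10 is a hypothetical defect probability chosen for illustration.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 10)    # hypothetical probability that a product is defective (D)
q = 1 - p              # probability that a product is functional (F)

# Enumerate all 2^3 paths of the tree and multiply branch probabilities.
paths = {}
for path in product("DF", repeat=3):
    path_prob = Fraction(1)
    for outcome in path:
        path_prob *= p if outcome == "D" else q
    paths["".join(path)] = path_prob

# Probability of exactly two defective products: add up the matching paths.
p_two_defects = sum(pr for path, pr in paths.items() if path.count("D") == 2)

print(len(paths), sum(paths.values()), p_two_defects)
```

Three paths (DDF, DFD, FDD) contain exactly two defects, each with probability p²q, so the total is 3p²q. This count of matching paths is what the combinatorial factor in the binomial distribution captures.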
Binomial distribution
● A r.v. X taking values in {0, 1, ..., n} is said to follow the binomial distribution, denoted by
$X \sim \mathrm{Bin}(n, p)$, if $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
● $p^x$: the probability of x successes
● $(1 - p)^{n - x}$: the probability of n-x failures. So in total we have n trials
● The factor $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ (combinatorial operator)
- computes the number of cases/combinations of choosing x
objects from the set of n objects. Remember the factorial
operator m! = 1 x 2 x 3 x … x (m-1) x m
● Properties of binomial distribution:
- Almost all distributions have expectation (i.e. mean) and variance (and thus
standard deviation).
- Every distribution (their pdf) is characterised by some parameters.
→ The binomial distribution has two parameters, 𝒏 (the number of trials) and
𝒑 (the success probability or success rate)
→ the mean (expectation) and variance of $X \sim \mathrm{Bin}(n, p)$ are given by:
$E(X) = np$ and $\mathrm{Var}(X) = np(1 - p)$
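The binomial probabilities for the coin example (three tosses, exactly two heads) can be computed with `math.comb`, and the mean and variance can be checked numerically against np and np(1 − p). A fair coin (p = 0.5) is assumed; `binom_pmf` is a helper name introduced for this sketch.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.5                                   # three tosses of a fair coin (assumed)
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

p_two_heads = pmf[2]                            # exactly two heads

# Mean and variance computed from the definition, using the full distribution.
mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(pmf, p_two_heads, mean, var)
```

The probability of exactly two heads is 3 x 0.5² x 0.5 = 0.375, and the computed mean and variance match np = 1.5 and np(1 − p) = 0.75.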