Math Test Prep File
- https://www.youtube.com/playlist?list=PLM-mb7IpX4moTbG2Pe96z2JcQcmIeFYIr
A playlist with mathematics test training videos
- https://www.youtube.com/watch?v=mk8tOD0t8M0
A very simple video, using numerical examples to explain Mode, Median, Mean, Range, and
Standard Deviation
- https://www.youtube.com/watch?v=qqOyy_NjflU
A simple video showing you how to calculate Standard Deviation and Variance, providing a
numerical example with 6 observations. (This video uses a sample variance formula)
- https://www.youtube.com/watch?v=sOb9b_AtwDg
This video can help you distinguish sample and population variance. It also covers more of the
theory, explaining what variance actually is
- https://www.khanacademy.org/math/probability/probability-geometry#probability-basics
This video shows the basics of probabilities
- https://www.youtube.com/watch?v=OvTEhNL96v0
This video shows information about Expected Value and Variance of Discrete Random
Variables with numerical examples
- https://www.khanacademy.org/math/ap-statistics/random-variables-ap/discrete-random-variables/v/variance-and-standard-deviation-of-a-discrete-random-variable
This video shows a numerical example of calculating the variance and standard deviation of a
discrete random variable (5 variables)
1. Some Basics
1.1 Statistics vs. Statistic
Statistics is a field of mathematics dealing with the
collection, explanation and interpretation of data.
A statistic is a measure that tries to capture some
information about the data set.
Examples: mean, median, standard deviation, correlation,
covariance, sample estimate of the population mean, …
(The plural form of “statistic” is also “statistics”!)
1.2 Population vs. Sample
The population of a study is the group
of all items of interest in that study.
A sample is a subset of the population that
is studied. (A desired characteristic is that its
data was obtained randomly.)
The “sampling process” is the method by
which we select observations from the
population to arrive at a sample.
Typically some form of random sampling.
1.3 Parameters vs. Statistics
Descriptive measures about the
population (e.g., the population mean)
are called parameters.
Descriptive measures about a sample are
called statistics.
We typically use Latin letters for
sample statistics.
Example: the (sample) mean $\bar{x}$
(pronounced “x-bar”)
1.4 Data Types
We differentiate between the following data types:
1. INTERVAL or QUANTITATIVE data
Real numbers
Distances between values have intrinsic meaning.
Examples:
On previous slide: Time in seconds
Length of cars
…
2. ORDINAL data
An ordered ranking among data exists.
Distances do not have intrinsic meaning.
Ex. on previous slide: Song rating: {bad, average, good, excellent}
We can assign numbers to each value, but we need to maintain
the order, e.g., bad=1, average=2, good=3, excellent=4.
Also possible: bad=0, average=23, good=34,
excellent=100; distances between values do not matter.
3. NOMINAL or CATEGORICAL data
Values have no order, nor any intrinsic numerical value;
any number can be applied to represent a value.
Examples:
On previous slide: Artist, Album, Genre
In fact there is a hierarchy among data types: interval data → ordinal data → nominal data.
Example: Exam scores
Exam scores (interval data) are often compressed into letter
grades (ordinal data): 94 points = A
Letter grades (ordinal data) can be further compressed into
simple pass/fail categories (nominal data): A = Passed.
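The compression from interval to ordinal to nominal data can be sketched in Python. Only “94 points = A” and “A = Passed” come from the example; the grade cutoffs and the rule for which grades count as passed are assumptions made purely for illustration.

```python
# Hypothetical cutoffs: only "94 points = A" and "A = Passed" come from the text.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]  # assumed scale


def to_letter_grade(score: float) -> str:
    """Compress an interval-scale exam score into an ordinal letter grade."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


def to_pass_fail(grade: str) -> str:
    """Compress an ordinal letter grade into a nominal pass/fail category."""
    return "Failed" if grade == "F" else "Passed"  # assumed passing rule


score = 94
grade = to_letter_grade(score)
result = to_pass_fail(grade)
print(score, "->", grade, "->", result)  # 94 -> A -> Passed
```

Each step loses information: the exact score cannot be recovered from the letter grade, and the letter grade cannot be recovered from pass/fail.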
DESCRIPTIVE STATISTICS
Answer:
You may try to come up with some
summary statistics that somehow
describe the data set.
Simply put, all that is descriptive statistics.
Covariance, Correlation
Population:
Little “n” is used for sample sizes and
capital “N” for the population sizes.
We will see the Sigma summation sign
over and over again; make sure you
understand it!
Rg = ((1+100%)*(1-50%))^(1/2) -1 = 0
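The calculation above appears to be a geometric mean return over two periods (+100%, then -50%). A minimal Python sketch under that reading; the function name is made up for illustration:

```python
import math


def geometric_mean_return(returns):
    """Geometric mean of per-period returns given as decimals (1.0 = +100%)."""
    growth = math.prod(1 + r for r in returns)  # total growth factor
    return growth ** (1 / len(returns)) - 1


returns = [1.00, -0.50]                  # +100%, then -50%
rg = geometric_mean_return(returns)      # ((2.0 * 0.5) ** 0.5) - 1
ra = sum(returns) / len(returns)         # arithmetic mean, for comparison
print(rg, ra)
```

The arithmetic mean of the two returns is 25%, which would suggest a gain even though the investment ends exactly where it started; the geometric mean of 0 reflects that.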
Sample:
Population:
where D(.) is the cumulative distribution function*
* That means that half of the values of the data set are below the value x. More on this in chapter 6.
3. Measures of Dispersion
(B) MAD, Variance & Standard Deviation
More commonly used are MAD, Variance and Standard
Deviation.
One intuitive way could be to measure the spread of the
data as the average distance of the data points from the
center of the data set.
In math terms that would be:
$$\frac{\sum_{i=1}^{N} (x_i - m)}{N}$$
“m” is some measure of center; it could be the arithmetic mean or the median.
1. The numerator sums up all the distances between each observation and the center.
2. The denominator divides by the number of observations to get the average distance.
Suppose we use the arithmetic mean of the population (μ)
as the measure of center, so:
$$\frac{\sum_{i=1}^{N} (x_i - \mu)}{N}$$
Issue: The numerator is by definition 0 if we sum up
the differences between each value and the overall mean.
Problem: the distances to values below the mean and to
values above the mean cancel each other out!
Ex: data set {10,20,30}: μ = 20, and (10-20) + (20-20) + (30-20) = 0.
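2 solutions to get rid of the problem of distances
canceling one another out:
1. We add up the absolute values of the distances.
$$\frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$$
= Mean absolute deviation (MAD)* of the population
2. We add up the squared distances.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
= Population variance
* There is a variation of this formula (the median absolute deviation), where the median is used instead of μ.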
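Using the data set {10, 20, 30} from the earlier example, a short Python sketch shows the cancellation of the raw distances and how taking absolute values avoids it:

```python
data = [10, 20, 30]                      # example data set from the text
N = len(data)
mu = sum(data) / N                       # population mean = 20.0

deviations = [x - mu for x in data]      # [-10.0, 0.0, 10.0]
sum_of_deviations = sum(deviations)      # positive and negative cancel: 0.0

# Mean absolute deviation: average of the absolute distances to the mean
mad = sum(abs(d) for d in deviations) / N

print(sum_of_deviations, mad)
```

Here the MAD is 20/3, i.e. on average an observation lies about 6.67 away from the mean.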
The square root of the population variance is
called “standard deviation” (written “sigma”, σ):
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
= Population standard deviation
(we have “readable” units again).
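Recall that we said earlier that sample statistics are
different from population parameters; this is the case for
the variance and the standard deviation:
Variance: sample $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$; population $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Standard deviation: sample $s = \sqrt{s^2}$; population $\sigma = \sqrt{\sigma^2}$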
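Python's standard `statistics` module has separate functions for the sample and the population versions (the sample versions divide by n - 1, the population versions by N). A small sketch with illustrative data:

```python
import statistics

data = [10, 20, 30]  # illustrative values

pop_var = statistics.pvariance(data)    # population variance, divides by N
samp_var = statistics.variance(data)    # sample variance, divides by n - 1
pop_std = statistics.pstdev(data)       # square root of pop_var
samp_std = statistics.stdev(data)       # square root of samp_var

print(pop_var, samp_var, pop_std, samp_std)
```

For the same data the sample variance is always at least as large as the population variance, because the same sum of squared distances is divided by a smaller number.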
How to compute the MAD, Variance and Std Deviation?
1. By hand (if data set not too large), or
2. Variance and Std Deviation using Excel
3. Measures of Dispersion
(C) Coefficient of Variation
Is a std dev of 161 points among final scores large? The
standard deviation by itself cannot be interpreted when
the magnitude of the variable under question is unknown.
Population: $CV = \frac{\sigma}{\mu}$
Sample: $cv = \frac{s}{\bar{x}}$
For our sample of final scores of last year:
The variation among the
final scores is about 20%
of the mean score. This is
not a large variation.
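As a sketch, the sample coefficient of variation is the sample standard deviation divided by the sample mean. The scores below are made up for illustration and are not last year's final scores:

```python
import statistics

scores = [620, 700, 805, 890, 1010]  # hypothetical final scores

mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)      # sample standard deviation (n - 1)
cv = std_score / mean_score               # coefficient of variation

print(f"mean={mean_score:.1f}, std={std_score:.1f}, CV={cv:.1%}")
```

Because the standard deviation is expressed relative to the mean, the result is unit-free and can be compared across variables of different magnitude.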
PROBABILITY
* Basic outcomes need to be mutually exclusive. (If we consider the weather as a random experiment,
then for example “sunny” and “rainy” could not be basic outcomes as both can occur simultaneously.)
Examples:
countable
Example:
Experiment: Rolling a die once
Sample Space: {1,2,3,4,5,6}
Probability of any number (by classical approach): 1/6.
Event: Rolling an odd number with a die = {1, 3, 5}
Pr (odd number) = 1/6 + 1/6 + 1/6 = ½ = 50%.
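The same calculation can be sketched in Python with exact fractions; the function name is chosen here for illustration:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                 # rolling a die once
p_outcome = Fraction(1, len(sample_space))        # classical approach: 1/6 each


def probability(event):
    """Sum the probabilities of the basic outcomes that make up the event."""
    return sum(p_outcome for outcome in event if outcome in sample_space)


odd = {1, 3, 5}
p_odd = probability(odd)
print(p_odd)  # 1/2
```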
[Venn diagram: events A and B]
What does the Venn diagram look like
for the random experiment “Rolling
a die once”?
Complement of A,
written as $\bar{A}$ or $A^C$ and
called “A not” or
“A complement”
Rolling a 1: Pr(“1”) = 16.7%
Not rolling a 1: Pr(“not rolling a 1”) = 100% - 16.7% = 83.3%
C:= Pr (A and B)
Joint probabilities (inner cells) and marginal probabilities (row and column totals):
                Promoted (B1)   Not Promoted (B2)   Marginal
Female (A1)         0.03              0.12            0.15
Male (A2)           0.17              0.68            0.85
Marginal            0.20              0.80            1.00
* This only works if the events are independent – more on this below!
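The marginal probabilities are the row and column sums of the joint probabilities. A small Python sketch using the table's numbers (the dictionary layout and function name are just one possible choice) that also checks whether “female” and “promoted” satisfy the product rule for independent events:

```python
# Joint probabilities from the table: (gender, outcome) -> probability
joint = {
    ("female", "promoted"): 0.03,
    ("female", "not promoted"): 0.12,
    ("male", "promoted"): 0.17,
    ("male", "not promoted"): 0.68,
}


def marginal(value, position):
    """Sum the joint probabilities whose key has `value` at `position`."""
    return sum(p for key, p in joint.items() if key[position] == value)


p_female = marginal("female", 0)        # 0.03 + 0.12
p_promoted = marginal("promoted", 1)    # 0.03 + 0.17

# For independent events the joint probability equals the product of marginals.
independent = abs(joint[("female", "promoted")] - p_female * p_promoted) < 1e-9
print(round(p_female, 2), round(p_promoted, 2), independent)
```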
[Figure: the population split into Female (15%) and Male (85%), and into promoted (20%) and not promoted (80%)]
[Figure: the same split with the joint probabilities labeled (20% promoted, 80% not promoted)]
Promoted AND female = 3%
Promoted AND male = 17%
Not promoted AND female = 12%
Not promoted AND male = 68%
To answer our question of whether there is discrimination:
Shall we simply compare joint probabilities?
“Promotions make up 20% in total: 17% go to men versus 3% to women. So men are
clearly favored.”?
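One alternative to comparing joint probabilities is to compare the promotion rate within each group, i.e. the conditional probabilities Pr(promoted | gender) = Pr(promoted and gender) / Pr(gender). This is offered as a sketch of that idea using the table's numbers, not necessarily the exact argument made in the rest of the deck:

```python
# Joint and marginal probabilities taken from the promotion table
p_promoted_and_female = 0.03
p_promoted_and_male = 0.17
p_female = 0.15
p_male = 0.85

# Conditional probability: Pr(B | A) = Pr(A and B) / Pr(A)
p_promoted_given_female = p_promoted_and_female / p_female
p_promoted_given_male = p_promoted_and_male / p_male

print(round(p_promoted_given_female, 2), round(p_promoted_given_male, 2))
```

Both rates come out at about 20%, so in this table the promotion rate does not differ by gender; the joint probabilities differ only because there are far more men than women.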
[Venn diagram: events A and B, with the overlap labeled “A and B”]
In our example:
To avoid double-counting
the overlapping part!
(Because that overlap is part of both
events A and B.)
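The double-counting issue can be checked with the die example. In this sketch the events A (odd number) and B (a number of at most 3) are chosen only for illustration:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}


def pr(event):
    """Classical probability: favorable outcomes / all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {1, 3, 5}        # odd number (illustrative event)
B = {1, 2, 3}        # at most 3 (illustrative event)

naive = pr(A) + pr(B)                    # counts the overlap {1, 3} twice
corrected = pr(A) + pr(B) - pr(A & B)    # subtract the overlap once
direct = pr(A | B)                       # count the union directly

print(naive, corrected, direct)  # 1 2/3 2/3
```

Simply adding Pr(A) and Pr(B) gives 1, which would wrongly mean that “A or B” is certain; subtracting the overlap once matches the probability obtained by counting the union directly.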
Event B = “I am in Pittsburgh.”
[Venn diagram with labels A, C, and B]
DISCRETE PROBABILITY DISTRIBUTION
In some experiments …
• the RV can take on the same value for several outcomes
• the outcomes themselves are numbers (e.g., return on an investment); in those cases the value
of the RV simply is the numerical event itself.
                 1st die
↓ 2nd die    1   2   3   4   5   6
    1        2   3   4   5   6   7
    2        3   4   5   6   7   8
    3        4   5   6   7   8   9
    4        5   6   7   8   9  10
    5        6   7   8   9  10  11
    6        7   8   9  10  11  12
Values of X and the number of outcomes that produce each value:
2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1
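The counts in the table can be reproduced by enumerating all 36 outcomes. A minimal Python sketch, assuming X is the sum of the two dice (which is what the table entries suggest):

```python
from collections import Counter
from itertools import product

# All 36 equally likely outcomes of rolling two dice
outcomes = list(product(range(1, 7), repeat=2))

# Random variable X = sum of the two dice; count outcomes per value
counts = Counter(first + second for first, second in outcomes)

# Probability distribution of X: count / 36 for each value
distribution = {x: counts[x] / len(outcomes) for x in sorted(counts)}

for x, p in distribution.items():
    print(x, counts[x], round(p, 3))
```

Several outcomes map to the same value of X (for example six different outcomes give X = 7), which is why the probabilities of the values differ even though each individual outcome is equally likely.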
Number of Boys in Family with 3 Kids
Var (# of boys) =
(0-1.56)^2*0.11 +
(1-1.56)^2*0.359 +
(2-1.56)^2*0.389 +
(3-1.56)^2*0.141
= 0.745
StdDev (# of boys) = sqrt(Var (# of boys)) = sqrt(0.745) = 0.863
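The same steps can be sketched in Python: compute the expected value, then the probability-weighted squared distances from it, then the square root. The probabilities are the rounded values shown in the calculation above:

```python
import math

# Probability distribution as displayed above (rounded values)
pmf = {0: 0.11, 1: 0.359, 2: 0.389, 3: 0.141}

mean = sum(x * p for x, p in pmf.items())                     # expected value
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())   # E[(X - mean)^2]
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 3), round(std_dev, 3))
```

With these rounded probabilities (which sum to 0.999) the sketch gives a variance of roughly 0.748 rather than the 0.745 quoted; the small gap presumably comes from rounding in the displayed inputs.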