Stat 1 Notes Part 1
Lecture 1:
Definitions:
Quantitative variable:
- Continuous: height, weight at birth, yield
- Discrete: number of children in a household, number of diseased plants in a field
Qualitative variable:
- Nominal: hair colour, bachelor program, province
- Ordinal: grade of eggs (AA/A/B), highest level of education completed, annual salary (in ordered categories)
Research question: the question that we want to answer.
Variable: a measured property of an element of the sample.
Sample: the part of the population that we will study and collect information from. We want to draw conclusions about the population, so the sample should be representative of the population.
Units: the elements of a sample from which we collect the information.
Population: every member of a group (persons, objects, etc.) for which we would like to collect information.
Lecture 2: Part 1 – Numerical summaries of data
Assume we are investigating salt in bread.
Population: all loaves of bread sold in the Netherlands on one day
Units: loaves of bread
Variable: amount of salt (g/100g) in bread
Sampling design: SRS from all supermarkets and bakeries, subsequently draw one loaf at random from
each of the selected supermarkets/bakeries (two stage cluster sample, NOT SRS!)
Effect of outliers:
Without outlier: 7 4 9 6 5, put in increasing order: 4 5 6 7 9
M = 6, mean = 6.2
This is a symmetric distribution.
With outlier: 7 4 110 6 5, put in increasing order: 4 5 6 7 110
M = 6 (not sensitive to the outlier 110), mean = 26.4
This is an asymmetric distribution.
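The effect of the outlier can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the variable names are only illustrative.

```python
from statistics import mean, median

# The two data sets from the notes: the second replaces 9 by the outlier 110.
without_outlier = [7, 4, 9, 6, 5]
with_outlier = [7, 4, 110, 6, 5]

for data in (without_outlier, with_outlier):
    # sorted() puts the observations in increasing order, as done by hand above
    print(sorted(data), "median =", median(data), "mean =", mean(data))
```

The median stays the same in both data sets, while the mean is pulled towards the outlier.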
Measures of variability:
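Variance: $s^2 = \frac{(y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
Standard deviation: $s = \sqrt{s^2}$ (s = √variance)
Example: 4 7 3 9 6, ȳ = 5.8
Variance: $s^2 = \frac{(4 - 5.8)^2 + \cdots + (6 - 5.8)^2}{5 - 1} = \frac{22.8}{4} = 5.7$
Standard deviation: s = √5.7 = 2.39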
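The variance example can be reproduced as follows. This is a small sketch: the manual calculation follows the formula with n - 1 in the denominator, and `statistics.variance` and `statistics.stdev` (which also divide by n - 1) are used as a check.

```python
from statistics import mean, stdev, variance

data = [4, 7, 3, 9, 6]
y_bar = mean(data)

# Sample variance by hand: sum of squared deviations divided by n - 1
s2_manual = sum((y - y_bar) ** 2 for y in data) / (len(data) - 1)

# The library functions use the same n - 1 definition
s2 = variance(data)
s = stdev(data)

print("mean =", y_bar, "variance =", s2, "s.d. =", round(s, 2))
```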
Percentiles: The pth percentile of a set of n ordered observations (smallest to largest) is the value for which at most p% of the observations are smaller and at most (100-p)% of the observations are larger.
Example: for the 10th percentile, you look for the value for which at most 10% of the observations are smaller and at most 90% are larger. For the IQR, you find the 25th percentile (Q1) and the 75th percentile (Q3); the IQR is the value at Q3 minus the value at Q1.
Ex: When rolling a fair die, assume that A = “even outcome”, so A = {2, 4, 6}. Then P(A) = 3/6 = 0.5
Statistical Events:
P(S) = 1; S is the sample space (set of all possible outcomes)
Addition law for disjoint events: Events that never occur at the same time are called mutually exclusive or disjoint events. If A and B are disjoint events, P(A∪B) = P(A) + P(B).
Example: Calculate the probability that the offspring is also heterozygous (Yy or yY)
A = Male parent passing on Y, female parent passing on y
B = Male parent passing on y, female parent passing on Y
P(A∪B) = P(A) + P(B) = 0.25 + 0.25 = 0.50
General addition law (for events that are not necessarily disjoint): P(A∪B) = P(A) + P(B) - P(A∩B). If A and B are disjoint (A∩B = Ø), then P(A∩B) = 0 and the addition law for disjoint events follows. Subtracting P(A∩B) removes the double counting of the overlap of both events.
Example: A= Male parent passing on Y B = Female parent passing on Y
P(A∪B) = P(A) + P(B) - P(A∩B) = 0.5 + 0.5 – 0.25 = 0.75
Example: Rolling a die - Event A: uneven number = {1 ,3, 5} Event B = number ≥ 3 = {3, 4, 5, 6}
A∪B = {1, 3, 4, 5, 6} A∩B = {3, 5}
P(A∪B) = P(A) + P(B) - P(A∩B) = 3/6 + 4/6 – 2/6 = 5/6
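The die example can be verified by counting outcomes directly. The sketch below assumes a fair die (all six outcomes equally likely); the helper `prob` is a name introduced here for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}                             # uneven number
B = {y for y in sample_space if y >= 3}   # number >= 3

def prob(event):
    """Probability of an event for a fair die: favourable / possible outcomes."""
    return Fraction(len(event), len(sample_space))

# Left: count the union directly. Right: general addition law.
p_union_direct = prob(A | B)
p_union_law = prob(A) + prob(B) - prob(A & B)
print(p_union_direct, p_union_law)
```

Both routes give 5/6, because subtracting P(A∩B) removes the outcomes 3 and 5 that would otherwise be counted twice.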
Expectation of a variable: µ or E(y): Expectation/ expected value of a random variable y is the mean
outcome of y “in the long run”
Expectation and variance of a discrete random variable y: A random variable y with possible outcomes y1, y2, …, yn and associated probabilities p1, p2, …, pn has the following properties:
The expected value of y is:
$\mu = E(y) = \sum_i y_i p_i$ (= ∑ outcome x probability)
The variance of y is:
$\sigma^2 = \mathrm{Var}(y) = \sum_i (y_i - \mu)^2 p_i$
(the expected value of the squared differences between y and its expectation µ)
Example: Expected value of rolling a die:
$\mu = E(y) = \sum_i y_i p_i = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5$
Example: Variance of y when rolling a die:
$\sigma^2 = \mathrm{Var}(y) = \sum_i (y_i - \mu)^2 p_i = (1 - 3.5)^2 \cdot \frac{1}{6} + (2 - 3.5)^2 \cdot \frac{1}{6} + (3 - 3.5)^2 \cdot \frac{1}{6} + (4 - 3.5)^2 \cdot \frac{1}{6} + (5 - 3.5)^2 \cdot \frac{1}{6} + (6 - 3.5)^2 \cdot \frac{1}{6} = 2.92$
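The two die calculations can be written as a short sketch in Python. Exact fractions are used so that no rounding is involved; the variable names are illustrative.

```python
from fractions import Fraction

outcomes = range(1, 7)   # faces of the die
p = Fraction(1, 6)       # each face has probability 1/6 (fair die assumed)

# Expected value: sum of outcome x probability
mu = sum(y * p for y in outcomes)

# Variance: expected squared difference between y and mu
var = sum((y - mu) ** 2 * p for y in outcomes)

print("mu =", float(mu), "variance =", round(float(var), 2))
```

The exact variance is 35/12, which rounds to 2.92.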
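Example: number of different orders of SSFF (2 successes and 2 failures in 4 trials):
$\binom{4}{2} = \frac{4!}{2!\,(4 - 2)!} = \frac{4 \times 3}{2 \times 1} = \frac{12}{2} = 6$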
Let S-disease be a disease for which the vaccine gives sufficient protection with probability p = 0.80.
Ex: Calculate the probability of the combination PNP (P = protected, N = not protected):
P(PNP) = 0.8 x 0.2 x 0.8 = 0.128
Ex: Calculate the probability of exactly 1 not protected pig:
P(NPP) + P(PNP) + P(PPN) = 0.128 + 0.128 + 0.128 = 0.384
*The events are disjoint, so we can add their probabilities.
We can use the multiplication law to determine the probability of each outcome for 2 or 3 randomly selected pigs. For three pigs this is doable; however, if we add more pigs, it becomes too complex for this method. So, we use the binomial distribution:
We can use the second example again, where we calculate the probability of exactly 1 not protected pig:
P(2 of 3 pigs are protected) = P(1 of 3 pigs is not protected) = $\binom{3}{2} \times 0.80^2 \times 0.20^1 = 3 \times 0.64 \times 0.20 = 0.384$
Here $\binom{3}{2}$ is the number of ways to choose 2 out of 3 pigs, $0.80^2$ is the probability of protection and $0.20^1$ is the probability of insufficient protection. The exponents represent how many pigs have that probability.
Important note: 0! = 1 and $\binom{3}{2} = \frac{3!}{2!\,1!} = 3$
Binomial situation in general – assumptions (when you should use a binomial distribution):
- n trials
- The trials are independent
- Each trial results in either success or failure
- The probability of success π (notation in textbook) remains the same in each trial
Binomial distribution – formula: the number of successes y in n trials in the binomial situation is distributed according to a binomial distribution: y ~ Bin(n, π) (~ means “is distributed as”)
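As a sketch of how these probabilities are computed, the function below implements the standard binomial probability P(y = k) = C(n, k) π^k (1 - π)^(n - k), where C(n, k) is the number of ways to choose k out of n. The function name `binom_pmf` is chosen here for illustration (it mirrors the calculator's Binompdf); it is not taken from the lecture.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(y = k) for y ~ Bin(n, pi): ways to choose k successes x their probability."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

# Number of different orders of 2 successes and 2 failures (SSFF)
print(comb(4, 2))

# Probability that exactly 2 of 3 pigs are protected when pi = 0.80
print(round(binom_pmf(2, 3, 0.80), 3))
```

For the pig example this reproduces 0.384, the same value found by adding P(NPP), P(PNP) and P(PPN).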
Expected value and variance binomial distribution
Suppose 100 pigs get a vaccine against a certain disease.
Let π be the probability of sufficient protection. Extensive
research has shown that this probability is 0.80.
Please note: π̂ = y/n is a consistent estimator for π: the larger the sample size n, the closer π̂ = y/n gets to the true π.
Example:
Suppose the number of sufficiently protected pigs y is binomially distributed: y ~ Bin(75, 0.85)
a. Calculate the expected values for the number of successes (success: pig is sufficiently protected) and the sample proportion of successes.
b. Calculate the variances for the number of successes and the sample proportion of successes.
Answers:
a. $\mu_y = n\pi = (75)(0.85) = 63.75$ and $\mu_{\hat{\pi}} = \pi = 0.85$
b. $\sigma_y^2 = n\pi(1 - \pi) = (75)(0.85)(0.15) = 9.56$ and $\sigma_{\hat{\pi}}^2 = \pi(1 - \pi)/n = (0.85)(0.15)/75 = 0.0017$
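The four quantities in this example follow directly from n and π. A minimal sketch (variable names are illustrative):

```python
n, pi = 75, 0.85   # y ~ Bin(75, 0.85)

# Number of successes y
mean_count = n * pi              # expected number of protected pigs
var_count = n * pi * (1 - pi)    # variance of the number of protected pigs

# Sample proportion y / n
mean_prop = pi                   # expected value of the sample proportion
var_prop = pi * (1 - pi) / n     # variance of the sample proportion

print(round(mean_count, 2), round(var_count, 2), mean_prop, round(var_prop, 4))
```

Note that the variance of the sample proportion is the variance of the count divided by n squared, which is why it is so much smaller.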
With a statistical test, we can decide whether the data provide enough evidence against the null hypothesis.
● Null hypothesis (H0): statement to be tested, usually formulated as “no effect” or “no difference”.
● Alternative hypothesis (Ha): statement that you are hoping or suspecting to be true instead of H0.
Statistical notation:
π = the probability that a pig is sufficiently protected.
H0: π = 0.8 or π ≥ 0.8 (null hypothesis)
Ha: π < 0.8 (alternative hypothesis) → “Vaccine does not protect enough against a certain disease”
Lecture 5: Exact binomial test for a population proportion/probability π
Hypothesis testing steps (*given or chosen level of significance α, e.g., 0.05), each with an example:
a. Give H0 and Ha. Example: H0: π = 0.5 and Ha: π > 0.5
b. Describe the test statistic. Example: number of mites that chose the odour
c. Give the distribution of the test statistic, assuming H0 is true. Example: y ~ Bin(100, 0.5)
d. Sketch the distribution of the test statistic under H0.
Binomial test: hypothesis
Statistical notation:
π = the probability that a predatory mite chooses the odour
H0: π = 0.5 or π ≤ 0.5 (null hypothesis)
Ha: π > 0.5 (alternative hypothesis) → “predatory mites prefer the odour”
● Note: Ha shows that this is a one-sided hypothesis test.
How can we show that predatory mites prefer the odour?
Count how many predatory mites choose the odour!
Test statistic = y = number of predatory mites in the sample choosing the odour
P-value (for the outcome y = 60 with y ~ Bin(100, 0.5)):
P(y = 60) = 0.0108 and P(y > 60) = 0.0176
P(y ≥ 60) = P(y = 60) + P(y > 60) = 0.0284
In graphing calculator:
(1 - Binomcdf(100, 0.5, 60)) + Binompdf(100, 0.5, 60)
= (1 – 0.9824) + 0.0108 = 0.0284
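The same right-sided P-value can be computed without a graphing calculator. The sketch below defines two helper functions, `binom_pmf` and `binom_cdf`, named here to mirror the calculator's Binompdf and Binomcdf; they are illustrative helpers, not library functions.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(y = k) for y ~ Bin(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

def binom_cdf(k, n, pi):
    """P(y <= k) for y ~ Bin(n, pi)."""
    return sum(binom_pmf(i, n, pi) for i in range(k + 1))

n, pi0, observed = 100, 0.5, 60

# Same route as on the calculator: P(y > 60) + P(y = 60)
p_calculator_route = (1 - binom_cdf(observed, n, pi0)) + binom_pmf(observed, n, pi0)

# Equivalent shortcut: P(y >= 60) = 1 - P(y <= 59)
p_value = 1 - binom_cdf(observed - 1, n, pi0)

print(round(p_calculator_route, 4), round(p_value, 4))
```

Both routes give the same P-value; the shortcut avoids adding P(y = 60) back in separately.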
Level of significance α
How do we make a decision based on the P-value we find?
Before we do the study, we decide on a particular probability that we are going to use as a cutoff. This is the level of significance (α). If the P-value that we find is smaller than α, we reject H0. If the P-value is larger than or equal to α, we do not reject H0: we lack the evidence to reject it, and Ha has not been shown.
Choose a level of significance α before the analysis, e.g., α = 0.05
Using a level of significance α means we accept a probability of at most α of rejecting H0 when, in reality, H0 is true.
Often used: α = 0.05
However, the choice is often determined by the field of research. Medicine: small(er) values of α (incorrect conclusions can have large consequences!), e.g., α = 0.01 or less.
Example:
Assume the sample size n is equal to 8 in an experiment with predatory mites. Six out of eight predatory
mites choose the odour. Perform a statistical test with α = 0.05 whether the odour attracts more mites than
the clean air sample.
e. y = 6
f. P-value: P(y ≥ 6) for y ~ Bin(8, 0.5)
   1 - Binomcdf(8, 0.5, 6) + Binompdf(8, 0.5, 6) = 1 - 0.9648 + 0.1094 = 0.1445
g. P(y ≥ 6) = 0.1445 > α = 0.05. This means that we do not reject H0 and Ha has not been shown. It has not been shown (with α = 0.05) that predatory mites prefer the odour.
Calculate the P-value and draw the conclusion when 7 out of 8 mites choose the odour (e. y = 7).
f. P-value: P(y ≥ 7) for y ~ Bin(8, 0.5)
   1 - Binomcdf(8, 0.5, 7) + Binompdf(8, 0.5, 7) = 0.0039 + 0.0313 = 0.0352
g. P(y ≥ 7) = 0.0352 < α = 0.05. This means we do reject H0, and Ha has been shown. It has been shown (with α = 0.05) that predatory mites prefer the odour.
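Steps f and g of both mite examples (6 and 7 out of 8) can be sketched as follows. The function name `right_sided_p_value` is introduced here for illustration only.

```python
from math import comb

def right_sided_p_value(observed, n, pi0):
    """P(y >= observed) for y ~ Bin(n, pi0), i.e. the P-value for Ha: pi > pi0."""
    return sum(
        comb(n, k) * pi0**k * (1 - pi0) ** (n - k)
        for k in range(observed, n + 1)
    )

alpha = 0.05
for observed in (6, 7):
    p = right_sided_p_value(observed, n=8, pi0=0.5)
    # Decision rule from the notes: reject H0 only if the P-value is smaller than alpha
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"y = {observed}: P-value = {p:.4f} -> {decision}")
```

With 6 out of 8 the P-value is above α, with 7 out of 8 it is below α, matching the two conclusions above.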
Example:
Problem:
One week before the election date an agency investigates the voter turnout for the municipal elections.
This research was commissioned by the local newspaper and conducted in the Northern Dutch provinces
of Groningen, Friesland and Drenthe based on a random sample of 500 eligible voters.
In the last municipal elections, the voter turnout was 59% in these provinces. It is assumed that the
attention of the general public to the municipal elections is diminishing rapidly and therefore the voter
turnout is decreasing.
Formulate the research question in the form of a statistical test. Provide the null hypothesis (including
a self-defined parameter) as well as the alternative hypothesis and the test statistic.
Answer:
n = 500, previous election turnout = 59%
Parameter: π = the proportion of eligible voters in these provinces who will vote
It is suspected that fewer people will vote, so the research question of interest is: will fewer than 59% of the population vote?
If nothing has changed, we expect 59% of the people in the sample to vote.
Test statistic: y = the number of people in the sample who will vote
H0: π = 0.59 Ha: π < 0.59
Alternative hypothesis
In a statistical test for a probability π with H0: π = π0, there are three possible alternative hypotheses:
Ha: π > π0 → Right-sided P-value: P(T.S. ≥ outcome)
Ha: π < π0 → Left-sided P-value: P(T.S. ≤ outcome)
Ha: π ≠ π0 → Two-sided P-value
For a two-sided alternative hypothesis Ha: π ≠ π0, both large and small outcomes of the T.S. provide evidence against the null hypothesis H0 and in favour of Ha. We therefore need a two-sided P-value.
The P-value is the probability of the test statistic obtaining the observed value or anything more extreme (supporting Ha), assuming that the null hypothesis is true. In this case, an extreme can be situated on both sides (tails) of the distribution!
Examples:
Ha: π > 0.59 → Right-sided P-value: P(T.S. ≥ outcome) (if we suspect that voter turnout has increased)
Ha: π < 0.59 → Left-sided P-value: P(T.S. ≤ outcome) (if we suspect that voter turnout has decreased)
Ha: π ≠ 0.59 → Two-sided P-value (if we don’t know whether to look for an increase or a decrease in turnout)
Suppose the distribution of the T.S. y when the null hypothesis is true equals: y ~ Bin(n = 9, π0 = 0.5), so µy = nπ0 = 4.5
Ha: π ≠ π0 → Two-sided P-value
The expected value is 4.5 but, in practice, the outcomes closest to it are 4 and 5. This means that under H0 (π0 = 0.5) we expect about 4 or 5 mites to choose the odour.
If 0 or 9 mites choose the odour, then we have evidence to support the alternative hypothesis. So, both sides support Ha, making this a two-sided (two-tailed) P-value.
General rules (symmetric distributions):
Two-sided P-value = 2* one-sided P-value.
The one-sided P-value used is always the smaller of the two tail probabilities:
If the outcome of the test statistic y0 > μy = nπ0, the two-sided P-value is 2*P(y ≥ y0).
If the outcome of the test statistic y0 < μy = nπ0, the two-sided P-value is 2*P(y ≤ y0).
Please note: We assume (in this course) that the two-sided P-value is also equal to 2* one-sided P-value when π0 ≠ 0.5 (for example π0 = 0.3), even though the distribution is then not symmetric.
Example:
Assume the sample size n is now equal to 7 in an experiment with predatory mites with two different
odours. We have no prior idea whether the mites have a preference for one or the other odour. Six out of
seven predatory mites choose the odour no.1. Perform a statistical test with α = 0.05 to investigate
whether there is a difference in attraction between the two odours.
Steps and answers:
Parameter: π = probability that a predatory mite chooses odour 1
a. H0: π = 0.5, Ha: π ≠ 0.5
b. y = number of predatory mites choosing odour source no. 1
c. Under H0: y ~ Bin(n = 7, π = 0.5)
d. P(y ≤ 5) = 0.9375
   P(y = 6) = 0.0547
   P(y ≥ 7) = 0.0078
   P(y ≥ 6) = 0.0625
e. y = 6 (> nπ0 = 3.5)
f. P-value: 2*P(y ≥ 6) for y ~ Bin(7, 0.5) = 2 * 0.0625 = 0.1250
g. Conclusion: 0.1250 > α = 0.05, therefore do not reject H0, and Ha has not been shown. It has not been shown that predatory mites prefer one or the other odour.
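The two-sided rule (2 x the smaller one-sided P-value) can be sketched as below. The function name `two_sided_p_value` is illustrative, and capping the result at 1 is an assumption added here so that the value remains a valid probability; the notes do not discuss that case.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(y = k) for y ~ Bin(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

def two_sided_p_value(observed, n, pi0):
    """2 x the smaller tail probability, following the course's general rule."""
    upper = sum(binom_pmf(k, n, pi0) for k in range(observed, n + 1))  # P(y >= y0)
    lower = sum(binom_pmf(k, n, pi0) for k in range(0, observed + 1))  # P(y <= y0)
    # Assumption: cap at 1, since doubling a tail near the centre can exceed 1
    return min(1.0, 2 * min(upper, lower))

p = two_sided_p_value(observed=6, n=7, pi0=0.5)
print(round(p, 4), "reject H0" if p < 0.05 else "do not reject H0")
```

For 6 out of 7 mites this gives 2 x 0.0625 = 0.125, so H0 is not rejected at α = 0.05.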
Lecture 6: Normal Distributions: For continuous variables (variables that can take all values in an interval of real numbers). The probability distribution therefore cannot be described by individual outcomes yi with associated probabilities pi.
Histogram: Used to display observations. Uses bins to organize the observations. A small number of observations requires larger (wider) bins. As you add more observations, the bin size can decrease, and the histogram approaches a smooth curve. This curve is called a probability density function (a visualization of the probability distribution for continuous random variables).
Continuous random variable: a probability is
a relative frequency in the long run, therefore
the probability of A, P(A), is given as the area
under the curve in interval A. The total area
under the curve is equal to 1.
To compare with discrete random variables, let’s say y = number of blue-eyed persons in a randomly selected group of 10. We are looking for the probability that at most 5 of the 10 people have blue eyes: P(y ≤ 5) ≠ P(y < 5), so you must include the probability that exactly 5 people have blue eyes in the sum!
For a continuous random variable where y = birth weight, and we are looking for the probability that a baby weighs less than 3150 g: P(y ≤ 3150) = P(y < 3150); these both reflect the same area under the curve! This is because a single specific outcome has no area under the curve: P(y = 3150) = 0
Normal Distribution: Symmetrical, Unimodal (one mode = one peak) and Bell-shaped
Notation y ~ N(µ, σ) where µ = population mean and σ = population standard deviation (Greek letters are used to
represent the population) ( ~ means “is distributed as”)
For example: if we are looking at a distribution of height within a population:
It is relatively rare to see someone who is very short, so the bell-shaped curve in that area is relatively low. The same holds for very tall people. As we near the average height, the curve rises, as it is quite common to see someone with an average height.
The shape of the curve is determined by the
population mean and the population standard
deviation.
As the total area under the curve is always equal to 1, the shape of the particular normal distribution curve does not matter: after transforming to z values, the area up to the z value you are looking for is always the same.
This table (this is not the full table) gives the area under the curve for a particular value of z (the area for anything less than or equal to that z value, shaded gray). The shaded area is the probability that Z is smaller than or equal to a particular z value (P(Z ≤ z)).
In the table, the top row and first column correspond to z values. All the numbers in the middle correspond to areas. So, for example, the z value -3.43 will have an area of .0003 (shaded blue in the table).
Equally, you can do this backwards: if needed, you can find a z value from a given area. This doesn’t always give the most accurate z value because, as you reach the outer edges of the standard normal distribution, the probabilities become very small (ex: multiple z values have an area of 0.0001).
Transformation: y ~ N(µ, σ) → Z ~ N(0, 1)
Every normal distribution can be transformed into a Z distribution using the formula y = µ + z * σ, which rearranged gives z = (y - µ)/σ, the “z-score of y”.
This is a probability density for a continuous random variable y, and we know it has a particular mean in the middle (µ) and standard deviation (σ). With this transformation, we are essentially creating a new z axis, so we are transforming y into z.
Ex: For height in the population, we have a mean of 175 cm and a standard deviation of 7.5 cm. What is the probability that the height of an individual is smaller than or equal to 165 cm?
y ~ N(175, 7.5)
P(y ≤ 165) = P(175 + Z x 7.5 ≤ 165) = P(Z ≤ (165 – 175)/7.5)
We can solve for z using the equation: z = (y - µ)/σ
z = (165 – 175)/7.5 = -1.33. Now you can look in the table: P(Z ≤ -1.33) = 0.0918
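The table lookup can be cross-checked with `statistics.NormalDist` from the Python standard library. This is a sketch; rounding z to two decimals is done only to mimic the table, and the variable names are illustrative.

```python
from statistics import NormalDist

height = NormalDist(mu=175, sigma=7.5)   # y ~ N(175, 7.5)

# z-score of y = 165
z = (165 - height.mean) / height.stdev

# Table route: round z to two decimals and use the standard normal distribution
p_table = NormalDist().cdf(round(z, 2))

# Direct route: area under the N(175, 7.5) curve up to 165, without rounding z
p_exact = height.cdf(165)

print(round(z, 2), round(p_table, 4), round(p_exact, 4))
```

The small difference between the two probabilities comes only from rounding z to -1.33 before using the table.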
µ = 250
σ = 50
If we know that the gray area in the first image (A) is 0.60, we know that the second image (B) has a gray area of 1 - 0.60 = 0.40.
Since the curve is symmetrical, we also know that the gray area in C is half of that in B and thus 0.20.
That means we can solve 0.20 = P(y < 250 - k).
Now we look in our table for an area of 0.20 and find the corresponding z, which is -0.84 (the area lies in the left tail, so z is negative).
Now we can solve for k using the formula:
z = (y - µ)/σ
-0.84 = (250 – k – 250)/50 = -k/50
k = 42
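Working backwards from an area to a z value can also be done with `NormalDist.inv_cdf`. The sketch below assumes, as the images suggest, that the central gray area between 250 - k and 250 + k is 0.60.

```python
from statistics import NormalDist

mu, sigma = 250, 50
central_area = 0.60                      # assumed gray area between 250 - k and 250 + k

# One tail holds half of the remaining area: (1 - 0.60) / 2 = 0.20
left_tail = (1 - central_area) / 2

# z value with an area of 0.20 to its left (negative, because it is in the left tail)
z = NormalDist().inv_cdf(left_tail)

# z = (250 - k - 250) / sigma  ->  k = -z * sigma
k = -z * sigma

print(round(z, 2), round(k, 1))
```

The result is slightly above 42 because the table value -0.84 is itself rounded.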
Example:
Given: n = 35 (number of units in sample), mean = 294 grams/mile, S.D. = 26 grams/mile
For y ~ N(µ, σ), µ and σ are unknown, but we have a sample mean and a sample S.D. that we use as estimates of µ and σ. The formulas for the mean and S.D. are called estimators.
Thus, we work with y ~ N(294, 26).
Calculate the probability that a randomly selected car has a CO2 emission larger than 320 grams/mile:
P(y > 320) = P(Z > (320 - 294)/26) = P(Z > 1.00) = 0.1587
(Remember that the table gives the area up to y = 320, which is 0.8413. Since the whole area under the curve is 1, we use the complement rule: 1 - 0.8413 = 0.1587.)
Calculate the probability that a randomly selected car has a CO2 emission between 270 and 320 grams/mile:
P(270 < y < 320) = P((270 – 294)/26 < Z < (320 – 294)/26)
= P(-0.92 < Z < 1.00) = P(Z < 1.00) – P(Z < -0.92)
= 0.8413 – 0.1788
= 0.6625
Calculate the 10th percentile of this distribution: P(y < k) = 0.10, so P(Z < (k - 294)/26) = 0.10
(k – 294)/26 = -1.28, so k = 260.7
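The three CO2 calculations can be checked with `statistics.NormalDist`. This is a sketch that works with unrounded z values, so the results can differ from the table-based answers in the third or fourth decimal.

```python
from statistics import NormalDist

co2 = NormalDist(mu=294, sigma=26)   # y ~ N(294, 26), grams/mile

# P(y > 320): complement of the area up to 320
p_above_320 = 1 - co2.cdf(320)

# P(270 < y < 320): area up to 320 minus area up to 270
p_between = co2.cdf(320) - co2.cdf(270)

# 10th percentile: the value k with P(y < k) = 0.10
p10 = co2.inv_cdf(0.10)

print(round(p_above_320, 4), round(p_between, 4), round(p10, 1))
```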
Lecture 7: Laws of calculating and the probability distributions of a sample mean and a sum
Next, let’s assume that someone adds exactly 0.2 g of salt (per 100 g) to every loaf of bread. What would be the expected value and s.d. of the salt concentration (per 100 g)?
For the expected value, we can simply add the 0.2 g: 1.8 + 0.2 = 2 g
*The standard deviation would not change, however, because the same amount of salt would be added to every loaf of bread.
Example: We use a combination when we want to add different normal distributions, so we need a new σ and µ.
Problem: Assume we are interested in the distribution of the amount of salt in a slice of bread (35 g) with a slice of smoked meat (10 g).
Given:
Bread: x = salt concentration, x ~ N(1.8, 0.1) [g/100g]
Smoked meat: y = salt concentration, y ~ N(3.6, 0.5) [g/100g]
Solution: Calculate the distribution of z = 0.35*x + 0.10*y
So, to create the new normal distribution, you need to know your new expected value and your new standard deviation.
Calculation of the expected value and the standard deviation of z:
$\mu_z = 0.35\mu_x + 0.10\mu_y = 0.35 \times 1.8 + 0.10 \times 3.6 = 0.99$
$\sigma_z = \sqrt{\sigma^2_{0.35x + 0.10y}} = \sqrt{\sigma^2_{0.35x} + \sigma^2_{0.10y}} = \sqrt{0.35^2 \times 0.1^2 + 0.10^2 \times 0.5^2} = 0.061$
Now we know the distribution of the amount of salt in a randomly selected slice of bread with a randomly selected slice of smoked meat: z ~ N(µz, σz) = N(0.99, 0.061)
Example 2:
Problem: Assume we are interested in the distribution of the amount of salt in a pita bread with 200 g of spiced shawarma.
Given:
Pita: x = salt concentration, x ~ N(0.6, 0.2) [g/pita]
Shawarma meat: y = salt concentration, y ~ N(0.5, 0.1) [g/100g]
Solution: Calculate the distribution of z = x + 2y
So, to create the new normal distribution, you need to know your new expected value and your new standard deviation.
Calculation of the expected value and the standard deviation of z:
x ~ N(µx = 0.6, σx = 0.2) and y ~ N(µy = 0.5, σy = 0.1)
$\mu_z = \mu_x + 2\mu_y = 0.6 + 2 \times 0.5 = 1.6$
$\sigma_z = \sqrt{\sigma_x^2 + 2^2 \sigma_y^2} = \sqrt{0.2^2 + 2^2 \times 0.1^2} = \sqrt{0.08} = 0.283$
Now we know the distribution of the amount of salt in a randomly selected pita with a randomly selected 200 g of shawarma meat: z ~ N(µz, σz) = N(1.6, 0.283)
µ∑y and σ∑y for independent drawings from a single distribution: use this if you want to combine the observations of a sample (from the same population) into the µ and σ of their sum.
Take a simple random sample of size n from a population in which a random variable y has an expected value (pop mean) µy and standard deviation σy: observations y1, y2, …, yn. Applying the laws of calculating expected values and variances to the sum of these observations, ∑y = y1 + y2 + … + yn, gives:
For µ: $\mu_{\sum y} = \mu_{y_1} + \mu_{y_2} + \cdots + \mu_{y_n} = n\mu_y$
For σ: $\sigma^2_{\sum y} = \sigma^2_{y_1} + \sigma^2_{y_2} + \cdots + \sigma^2_{y_n} = n\sigma_y^2$, so $\sigma_{\sum y} = \sqrt{n}\,\sigma_y$
In general:
y ~ N(µy, σy) - Random sample of size n from this normal distribution: y1, y2, …, yn
∑y = y1 + y2 + … + yn
Distribution sum: ∑y ~ N(nµy, √n * σy)
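A minimal sketch of the rule for a sum is given below. The function name `sum_distribution` and the choice n = 4 are illustrative assumptions; the values µy = 1.8 and σy = 0.1 are borrowed from the bread example only to have numbers to work with.

```python
from math import sqrt

def sum_distribution(n, mu_y, sigma_y):
    """Mean and s.d. of the sum of n independent draws from N(mu_y, sigma_y)."""
    mu_sum = n * mu_y
    sigma_sum = sqrt(n) * sigma_y   # variances add up: n * sigma_y**2
    return mu_sum, sigma_sum

# Illustration: total of 4 independent draws with mu_y = 1.8 and sigma_y = 0.1
mu_sum, sigma_sum = sum_distribution(4, 1.8, 0.1)
print(round(mu_sum, 2), round(sigma_sum, 2))
```

The mean grows with n, but the standard deviation grows only with √n.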
Example:
Problem:
The Dutch Bakery Association is the independent quality and information institution for the baking industry. Assume that they would like to check the salt concentration in bread on a certain day. They therefore draw a simple random sample of loaves of bread and check these loaves for their salt concentration.
The max limit for salt in bread is 2.1g/100g.
Assume the NBC advises to bake bread with a (mean) salt concentration of 1.7g/100g. The NBC would like to check whether their advice is followed.
The NBC randomly selects 2 loaves of bread and determines the mean salt concentration of these 2 loaves. Is it accurate? And what if 10 loaves of bread were randomly chosen instead?
Answer: 10 randomly chosen loaves of bread would be better than 2.
The following graph shows two simulations (1000x) of the mean salt concentration of 2 and 10 loaves:
Both sets of sample means are normally distributed, but the means of 2 loaves (2000 observations in total) are less accurate than the means of 10 loaves (10 000 observations in total). More observations per sample lead to a narrower curve: the sample means show a smaller spread.
Variability of ȳ can be described by the spread of outcomes of the simulation. Larger sample sizes show a smaller spread in ȳ.
Take a simple random sample of size n from a population in which random variable y has an expected value (pop mean) µy and a standard deviation σy: observations y1, y2, …, yn
Aim: estimate µy
Estimator (formula): sample mean ȳ = (y1 + y2 + … + yn)/n
Expected value of ȳ: $\mu_{\bar{y}} = \frac{n\mu_y}{n} = \mu_y$
Variance of ȳ: $\sigma^2_{\bar{y}} = \frac{n\sigma_y^2}{n^2} = \frac{\sigma_y^2}{n}$. So: $\sigma_{\bar{y}} = \frac{\sigma_y}{\sqrt{n}}$
ȳ is a consistent estimator for µy because the larger the sample, the closer ȳ tends to be to the unknown true value of µy.
Implementation: for observations y1, y2, …, yn, the outcome of ȳ is an estimate of µy.
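In general:
y ~ N(µy, σy)
Random sample of size n from this normal distribution: y1, y2, …, yn
$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{1}{n} \sum y$
Distribution sample mean: $\bar{y} \sim N\left(\mu_y, \frac{\sigma_y}{\sqrt{n}}\right)$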
Example:
Let the salt concentration in bread (y, gram/100gram) be distributed as y ~ N(1.75, 0.1).
Calculate the probability that the mean salt concentration in a simple random sample of 4 loaves of bread is higher than 1.7 g/100g. Repeat the same calculation for a simple random sample of size 16.
For 4 loaves: y = salt concentration in bread, µy = 1.75, σy = 0.1, n = 4
$\mu_{\bar{y}} = \mu_y = 1.75$ and $\sigma_{\bar{y}} = \frac{0.1}{\sqrt{4}} = 0.05$, so ȳ ~ N(1.75, 0.05)
P(ȳ > 1.7) = 1 – P(ȳ < 1.7) = 1 – P(z < (1.7 - 1.75)/0.05) = 1 – P(z < -1) = 0.8413
For 16 loaves: y = salt concentration in bread, µy = 1.75, σy = 0.1, n = 16
$\mu_{\bar{y}} = \mu_y = 1.75$ and $\sigma_{\bar{y}} = \frac{0.1}{\sqrt{16}} = 0.025$, so ȳ ~ N(1.75, 0.025)
P(ȳ > 1.7) = 1 – P(ȳ < 1.7) = 1 – P(z < (1.7 - 1.75)/0.025) = 1 – P(z < -2) = 0.9772
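Both calculations follow the same pattern, so they can be sketched with one small function. The name `prob_mean_above` is introduced here for illustration; it relies on the distribution of the sample mean being N(µy, σy/√n).

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_above(threshold, mu_y, sigma_y, n):
    """P(sample mean > threshold) for a random sample of size n from N(mu_y, sigma_y)."""
    # The sample mean keeps the same mean, but its s.d. shrinks by sqrt(n)
    sample_mean = NormalDist(mu=mu_y, sigma=sigma_y / sqrt(n))
    return 1 - sample_mean.cdf(threshold)

for n in (4, 16):
    p = prob_mean_above(1.7, mu_y=1.75, sigma_y=0.1, n=n)
    print(f"n = {n}: P(mean > 1.7) = {p:.4f}")
```

With the larger sample, the sample mean is more tightly concentrated around 1.75, so the probability of exceeding 1.7 increases.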