Uploaded by keshavbansal2277
Copyright © All Rights Reserved

You are the Product Manager of the rising tech start-up, InnovaTech, aiming to launch a new mobile application that caters to the diverse consumer base of the country. You need to understand the preferences and usage patterns of potential users to ensure the application meets their needs and is user-friendly.

Conducting a detailed survey with every single mobile user in India is not feasible due to the country's vast population, geographic breadth, and the associated time and cost constraints.

On the other hand, relying solely on data from a small, specific group is unlikely to provide a representative picture of the broader Indian population. This could lead to biased results, possibly culminating in ineffective product strategies.

As the Product Manager of InnovaTech, how can you effectively and efficiently gather the information you need to guide your product development process and business strategy?

What must you do?


Basic Terminology in Sampling
• An element is the entity on which data are collected.

• A population is the collection of all the elements of interest.

• A sample is a subset of the population.

• The sampled population (generally referred to simply as the 'population') is the population from which the sample is drawn.

• A frame is a list of the elements from which the sample will be selected.
| Aspect | Population | Sampled Population | Sample | Frame |
|---|---|---|---|---|
| Definition | The entire group you want to study or about which you want to draw conclusions. | The actual group from which you draw your sample, usually a subset of the entire population. | A smaller group selected from the sampled population to represent it. | A list of elements from which the sample is drawn. |
| Scope | All individuals or items of interest (e.g., all consumers in India). | A more specific group chosen based on practical considerations (e.g., consumers in 10 selected states of India). | Specific individuals chosen from the sampled population (e.g., 300 consumers from the 10 selected states). | Specific list or database containing the elements of the sampled population (e.g., a customer database from selected states). |
| Practicality | Often too large to study entirely due to time, cost, and logistical constraints. | A manageable subset that can realistically be accessed and studied. | A smaller, more feasible group to collect data from. | A practical tool for organizing and selecting a sample. |
| Selection | Ideally includes every member of the group of interest. | Chosen based on accessibility and practical constraints. | Randomly or systematically chosen individuals from the sampled population. | Contains all the elements that can be included in the sample. |
| Example | All consumers in India. | Consumers in the states of Maharashtra, Karnataka, Tamil Nadu, Delhi, and West Bengal. | A group of consumers randomly selected from these five states. | A list of consumers from the states of Maharashtra, Karnataka, Tamil Nadu, Delhi, and West Bengal. |
| Purpose | To understand or draw conclusions about the entire group. | To provide a realistic and practical basis for selecting a sample. | To gather data that can be used to infer characteristics of the population. | To provide a comprehensive list from which a sample can be drawn. |
Why Sample?
• To collect data to answer a research question about a population.

• The sample results provide only estimates of the values of the population characteristics.

• The reason is simply that the sample contains only a portion of the population.

• With proper sampling methods, the sample results can provide “good” estimates of the population characteristics.
Sampling from a Finite Population
Simple Random Sample:

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.

Simple random sampling is a type of probability sampling method.
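The equal-probability definition can be checked by brute force on a small case. A minimal Python sketch, assuming a hypothetical four-element population labelled W, X, Y, Z (not from the slides): it enumerates every sample of size n with `itertools.combinations` and uses `math.comb` for the probability 1/C(N, n) that simple random sampling gives each one.

```python
import itertools
import math

def srs_sample_space(population, n):
    """Enumerate every possible sample of size n (order ignored, no repeats)
    and return them with the probability each has under simple random sampling."""
    samples = list(itertools.combinations(population, n))
    # All C(N, n) samples are equally likely, so each has probability 1 / C(N, n).
    prob_each = 1 / math.comb(len(population), n)
    return samples, prob_each

# Hypothetical four-element population (illustration only).
samples, p = srs_sample_space(["W", "X", "Y", "Z"], 2)
print(len(samples), p)  # 6 samples, each with probability 1/6

# For N = 900 and n = 30 the number of possible samples is astronomically large.
print(math.comb(900, 30))
```

Enumeration is only practical for tiny populations; for realistic N the count itself shows why we rely on random numbers rather than listing samples.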
Sampling from a Finite Population
Procedure of Simple Random Sample:

• Assign each element in the population a number from 1 to N

• Use a table of random numbers to select n numbers from 1 to N

• Include the corresponding elements in the sample

• Ignore any numbers that are greater than N or have been used before
(sampling without replacement)
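The procedure above can be sketched in Python as a function that reads a stream of random numbers in order, as one would from a random number table. The function name `srs_from_random_numbers` and the example stream are hypothetical, chosen only to illustrate the skip rules.

```python
def srs_from_random_numbers(random_numbers, N, n):
    """Elements are numbered 1..N. Read random numbers in order, skip any that
    fall outside 1..N or were already used, and stop once n distinct numbers
    are chosen (sampling without replacement)."""
    chosen = []
    for r in random_numbers:
        if r < 1 or r > N or r in chosen:
            continue  # ignore out-of-range and repeated numbers
        chosen.append(r)
        if len(chosen) == n:
            break
    if len(chosen) < n:
        raise ValueError("not enough usable random numbers for a sample of size n")
    return chosen

# Hypothetical stream read from a random number table; population size N = 50.
stream = [12, 73, 12, 5, 0, 49, 88, 31]
print(srs_from_random_numbers(stream, N=50, n=4))  # [12, 5, 49, 31]
```

Here 73 and 88 exceed N, 0 is out of range, and the second 12 is a repeat, so all four are skipped.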
Sampling from a Finite Population
Sampling with Replacement:
• Replacing each sampled element before selecting subsequent elements is
called sampling with replacement.
• An element can appear in the sample more than once.

Sampling without Replacement:


• Once an item is selected from the population and included in the sample,
it's not selected again in the sample.
• This ensures that a single item does not skew the sample by being
represented more than once.
• Unless otherwise stated, when we refer to simple random sampling, we
assume the sampling is without replacement.
| Aspect | Sampling with Replacement | Sampling without Replacement |
|---|---|---|
| Definition | Replacing each sampled element before selecting the next. | Once an item is selected, it is not selected again. |
| Element Repetition | An element can appear in the sample more than once. | An element cannot appear in the sample more than once. |
| Impact on Sample | Each draw is independent; probabilities remain the same. | Each draw reduces the population size; probabilities change. |
| Bias Prevention | Does not prevent bias from repeated elements. | Ensures no single element skews the sample. |
| Example | Drawing a marble, noting its color, putting it back, and drawing again. | Drawing a marble, noting its color, not putting it back, and drawing again. |
| Typical Use | Used when the population size is small and replacement is needed. | Used when the population size is large enough to avoid replacement. |
| Assumption | Less common in practice; not typically assumed. | Commonly assumed unless otherwise stated. |
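The marble example in the table maps directly onto two functions in Python's standard `random` module: `choices` draws with replacement and `sample` draws without replacement. A small sketch, assuming a hypothetical bag of five differently coloured marbles:

```python
import random

population = ["red", "blue", "green", "yellow", "black"]  # hypothetical marbles
rng = random.Random(42)  # fixed seed so the run is reproducible

# With replacement: each draw is independent, so the same marble may repeat.
with_repl = rng.choices(population, k=5)

# Without replacement: a drawn marble is not put back, so no marble repeats.
without_repl = rng.sample(population, k=5)

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

With k equal to the population size, the without-replacement draw is simply a random ordering of all five marbles, while the with-replacement draw will usually contain duplicates.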
Sampling from a Finite Population
An online coding bootcamp startup based in India, CodeGuru, has received 900 applications from prospective students for its popular Full Stack Development course. The applications were numbered from 1 to 900 as they were received. The Admissions Director wants to select a simple random sample of 30 applicants for a pre-course feedback survey.

Step 1: Generate random integers with Excel's RANDBETWEEN(1, 900) function; these follow a uniform probability distribution between 1 and 900. Discard any number that repeats until 30 distinct numbers are obtained.

Step 2: Select the 30 applicants whose application numbers correspond to the 30 random numbers.
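The two steps can be mimicked outside Excel. A minimal Python sketch, assuming `random.randint(1, N)` as a stand-in for RANDBETWEEN(1, N) (both return a uniformly distributed integer between the bounds, inclusive); the function name `select_applicants` is hypothetical.

```python
import random

def select_applicants(N=900, n=30, seed=None):
    """Draw uniform integers in [1, N] (like RANDBETWEEN(1, N)), discard repeats,
    and stop once n distinct application numbers are collected."""
    rng = random.Random(seed)
    selected = []
    while len(selected) < n:
        r = rng.randint(1, N)  # uniform integer in [1, N], endpoints included
        if r not in selected:
            selected.append(r)
    return sorted(selected)

print(select_applicants(seed=2024))
```

In Python the same result is available in one call, `random.sample(range(1, 901), 30)`, which samples without replacement directly.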
Sampling from an Infinite Population
• Sometimes we want to select a sample but find that it is not possible to obtain a list of all elements in the population.

• As a result, we cannot construct a frame for the population.

• Hence, we cannot use the random number selection procedure.

• Most often this situation occurs in the case of an infinite population.
Sampling from an Infinite Population
• Populations are often generated by an ongoing process where
there is no upper limit on the number of units that can be
generated.

• Some examples of ongoing processes with infinite populations are:
• parts being manufactured on a production line
• transactions occurring at a bank
• telephone calls arriving at a technical help desk
• customers entering a store
Sampling from an Infinite Population

A random sample from an infinite population is a sample selected such that the following conditions are satisfied:

• Each element selected comes from the population of interest.

• Each element is selected independently.


Sampling from an Infinite Population
• Replacement isn't important
  - With huge groups, it doesn't matter whether you put items back after picking them.

• Endless options
  - Even if you take one out, there are still many more left to pick from.

• Every item is unique
  - In real-life examples such as store visitors or call-center calls, each one is unique and not repeated.

• Always more to choose from
  - These groups keep generating new items, so there is always more to pick from.
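The "replacement isn't important" point can be quantified: if n draws are made with replacement from N equally likely elements, the chance that no element is picked twice is the product (N/N)((N−1)/N)…((N−n+1)/N). A short Python sketch, using illustrative population sizes that are not from the slides:

```python
def prob_no_repeats(N, n):
    """Probability that n draws made WITH replacement from N equally likely
    elements never pick the same element twice."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N  # the (i+1)-th draw must avoid the i elements already drawn
    return p

# Illustrative population sizes, sample size 30.
for N in (100, 10_000, 1_000_000):
    chance_of_repeat = 1 - prob_no_repeats(N, 30)
    print(f"N = {N:>9,}: chance of at least one repeat = {chance_of_repeat:.4f}")
```

For a small population a repeat is almost certain, but as N grows the chance shrinks toward zero, so sampling with and without replacement become practically the same.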
Consider a finite population with five elements labeled A, B, C, D,
and E. Ten possible simple random samples of size 2 can be
selected.

a. List all the samples of size 2.

b. Using simple random sampling, what is the probability that each sample of size 2 is selected?

c. Assume random number 1 corresponds to A, random number 2 corresponds to B, and so on. List the simple random samples that will be selected by using the random digits: 8 0 5 7 5 3 2.
Indicate which of the following situations involve sampling from
a finite population and which involve sampling from an infinite
population. In cases where the sampled population is finite,
describe how you would construct a frame.

a. Obtain a sample of licensed bikers in Mumbai.

b. Obtain a sample of autos visiting Juhu Beach on a typical weekday.

c. Obtain a sample of students in MBA (Core) at NMIMS Mumbai.

d. Obtain a sample of the orders that are processed by the e-commerce firm Flipkart.
Point Estimation
• Sample statistics, like the sample mean and sample standard deviation, are used to estimate population parameters, such as the population mean and population standard deviation.

• This process is referred to as point estimation. Point estimation is a form of statistical inference.

• In point estimation, we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.

• Point estimates are expected to differ from the actual population parameters since they are based on a sample and not the entire population.
Point Estimation
• Ideally, the sampled population should closely mirror the true
population to ensure accurate inferences.
• There can be significant differences between sampled and true
populations, depending on the sampling process.
• For example, if an amusement park samples only from a population
attending on a day with restricted access for a company's
employees, the sampled population would not represent the
general public, thus skewing inferences.
• It's crucial to ensure that the sampled population closely aligns
with the true population when using a sample to make inferences.
Using sound judgment in study design can aid this process.
CodeGuru, a popular coding bootcamp, received 900 applications from aspiring coders. The application form includes information such as the individual's prior coding experience score and whether the individual prefers online or in-person sessions.

The Admissions Director would like to share the average coding experience score and the proportion of applicants preferring in-person sessions among the pool of 900 applicants.

He decides to estimate the values of the population parameters of interest based on sample statistics. A sample of 30 applicants is selected using computer-generated random numbers.
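| Applicant No. | Prior Coding Experience Score (out of 100) | Prefers In-Person Sessions? | Applicant No. | Prior Coding Experience Score (out of 100) | Prefers In-Person Sessions? |
|---|---|---|---|---|---|
| 1 | 65 | Yes | 16 | 72 | Yes |
| 2 | 85 | No | 17 | 88 | No |
| 3 | 75 | Yes | 18 | 60 | Yes |
| 4 | 90 | Yes | 19 | 85 | No |
| 5 | 50 | No | 20 | 70 | Yes |
| 6 | 72 | No | 21 | 75 | Yes |
| 7 | 88 | Yes | 22 | 65 | No |
| 8 | 80 | Yes | 23 | 82 | Yes |
| 9 | 60 | No | 24 | 60 | No |
| 10 | 85 | No | 25 | 88 | Yes |
| 11 | 70 | Yes | 26 | 70 | No |
| 12 | 75 | Yes | 27 | 85 | Yes |
| 13 | 65 | No | 28 | 75 | Yes |
| 14 | 82 | No | 29 | 65 | No |
| 15 | 90 | Yes | 30 | 72 | Yes |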
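• x̄ is the point estimator of the population mean, μ:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{2244}{30} = 74.8$$

• s is the point estimator of the population standard deviation, σ:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{3280.8}{29}} \approx 10.64$$

• p̄ is the point estimator of the population proportion, p (17 of the 30 sampled applicants answered Yes):
$$\bar{p} = \frac{17}{30} \approx 0.567$$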

Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
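The three point estimates can be reproduced from the sample table with Python's `statistics` module, where `stdev` uses the n − 1 divisor shown in the formula for s. The Yes/No column is coded here as 1/0; the variable names are illustrative.

```python
import statistics

# Prior coding experience scores for the 30 sampled applicants, in table order.
scores = [65, 85, 75, 90, 50, 72, 88, 80, 60, 85, 70, 75, 65, 82, 90,
          72, 88, 60, 85, 70, 75, 65, 82, 60, 88, 70, 85, 75, 65, 72]
# Yes/No answers in the same applicant order, coded 1 = Yes, 0 = No.
yes_responses = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1,
                 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]

x_bar = statistics.mean(scores)   # point estimate of mu
s = statistics.stdev(scores)      # point estimate of sigma (divisor n - 1)
p_bar = sum(yes_responses) / len(yes_responses)  # point estimate of p

print(f"x_bar = {x_bar:.1f}, s = {s:.2f}, p_bar = {p_bar:.3f}")
# x_bar = 74.8, s = 10.64, p_bar = 0.567
```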
| Point Estimator | Point Estimate | Population Parameter | Parameter Value |
|---|---|---|---|
| x̄ = Sample mean score | 74.8 | μ = Population mean score | 75 |
| s = Sample std. deviation for score | 10.64 | σ = Population std. deviation for score | 11.07 |
| p̄ = Sample proportion preferring in-person sessions | 0.567 | p = Population proportion preferring in-person sessions | 0.6 |
Sampling Distribution of x̄
When the population from which we are selecting a random sample does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of x̄.

CENTRAL LIMIT THEOREM

“In selecting random samples of size n from a population, the sampling distribution of the sample mean x̄ can be approximated by a normal distribution as the sample size becomes large.”
Sampling Distribution of x̄
• The sampling distribution of x̄ is a theoretical distribution of x̄ that would result if we were to draw all possible samples of a certain size from a given population.
• The sampling distribution of the sample mean x̄ is the distribution we would get if we calculated the mean x̄ of each of these samples and plotted these means.
• In reality, we would rarely (if ever) be able to construct the exact sampling distribution since it would require drawing all possible samples.
• But in statistics, the concept of the sampling distribution is extremely useful for understanding the behavior of sample means, especially their variability and how they relate to the population parameters they are estimating.
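Since drawing all possible samples is infeasible, a simulation with many repeated samples gives a practical approximation of the sampling distribution and illustrates the central limit theorem. A sketch, assuming a hypothetical right-skewed (exponential) population of 100,000 values with mean about 10; the seed, sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is reproducible

# Hypothetical, strongly right-skewed population: exponential values with
# mean about 10 (for an exponential, the std. deviation equals the mean).
population = [rng.expovariate(1 / 10) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 30        # size of each sample
reps = 5_000  # repeated samples, standing in for "all possible samples"
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

print(f"population mean {mu:.2f} vs mean of sample means {statistics.fmean(sample_means):.2f}")
print(f"sigma/sqrt(n) {sigma / n ** 0.5:.2f} vs std. dev. of sample means {statistics.stdev(sample_means):.2f}")
```

The sample means centre on μ and their spread is close to σ/√n, even though the population itself is far from normal; a histogram of `sample_means` would look roughly bell-shaped.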
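Sampling Distribution of x̄
• We will use the following notation to define the standard deviation of the sampling distribution of x̄:
  - $\sigma_{\bar{x}}$ = standard deviation of x̄
  - $\sigma$ = standard deviation of the population
  - $n$ = sample size
  - $N$ = population size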
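Sampling Distribution of x̄
• The standard deviation of x̄ for a finite population is
$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\left(\frac{\sigma}{\sqrt{n}}\right)$$

• The standard deviation of x̄ for an infinite population is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

• A finite population is treated as being infinite if $n/N \le 0.05$.

• $\sqrt{(N-n)/(N-1)}$ is the finite population correction factor.

• $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.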

“As the sample size increases, the standard error of the mean decreases. This indicates
that larger sample sizes provide more precise estimates of the population mean, as the
variability (error) in the sample mean decreases with larger samples.”
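The standard error formulas, σ_x̄ = σ/√n with the finite population correction √((N − n)/(N − 1)) applied for a finite population, can be wrapped in a small helper. A sketch using the CodeGuru figures from these slides (σ = 11.07, N = 900); the function name `standard_error` is hypothetical, and the n/N ≤ 0.05 label only reports the rule of thumb rather than switching formulas.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean: sigma / sqrt(n), multiplied by the finite
    population correction sqrt((N - n) / (N - 1)) when a population size N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# CodeGuru figures used in these slides: sigma = 11.07, N = 900.
N, sigma = 900, 11.07
for n in (30, 100, 300):
    fpc = math.sqrt((N - n) / (N - 1))
    rule = "can treat as infinite" if n / N <= 0.05 else "correction matters"
    print(f"n = {n:3d}  n/N = {n / N:.3f}  FPC = {fpc:.4f}  "
          f"se = {standard_error(sigma, n, N):.4f}  ({rule})")
```

At n = 30 the correction factor is about 0.98, so it barely changes the result; at n = 300 it is about 0.82 and noticeably shrinks the standard error.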
Sampling Distribution of x̄
• When the population has a normal distribution, the sampling distribution of x̄ is normally distributed for any sample size.

• In most applications, the sampling distribution of x̄ can be approximated by a normal distribution whenever the sample size is 30 or more.

• In cases where the population is highly skewed or outliers are present, samples of size 50 may be needed.

• The sampling distribution of x̄ can be used to provide probability information about how close the sample mean x̄ is to the population mean μ.
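Sampling Distribution of x̄
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean score that is within ±3 of the actual population mean μ?

In other words, what is the probability that x̄ will be between 72 and 78?

With n = 30, E(x̄) = μ = 75 and
$$\sigma_{\bar{x}} = \sqrt{\frac{900-30}{900-1}}\left(\frac{11.07}{\sqrt{30}}\right) \approx 2$$
$$P(72 \le \bar{x} \le 78) = P\left(\frac{72-75}{2} \le z \le \frac{78-75}{2}\right) = P(-1.5 \le z \le 1.5) = 0.8664$$
[Figure: sampling distribution of x̄ for n = 30, centered at 75, with 72 and 78 marked]

Now, with n = 300:
$$\sigma_{\bar{x}} = \sqrt{\frac{900-300}{900-1}}\left(\frac{11.07}{\sqrt{300}}\right) \approx 0.5221$$
$$P(72 \le \bar{x} \le 78) = P(-5.75 \le z \le 5.75) \approx 0.9999999$$
[Figure: sampling distribution of x̄ for n = 300, centered at 75, with 72 and 78 marked]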
As the sample size increases, the standard error of the mean decreases. This
indicates that larger sample sizes provide more precise estimates of the population
mean, as the variability (error) in the sample mean decreases with larger samples.
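The two probabilities can be recomputed with the standard normal CDF, written here in terms of `math.erf`. A sketch using the slide's figures (σ = 11.07, N = 900, margin ±3); the helper names are illustrative. Note that the slide rounds the n = 30 standard error to 2; with the unrounded value the probability comes out slightly higher.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, computed with math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_within(margin, se):
    """P(mu - margin <= x_bar <= mu + margin), assuming x_bar is normal with
    mean mu and standard deviation se (the standard error)."""
    z = margin / se
    return normal_cdf(z) - normal_cdf(-z)

def se_finite(sigma, n, N):
    """Standard error of the mean with the finite population correction."""
    return math.sqrt((N - n) / (N - 1)) * sigma / math.sqrt(n)

# Figures from the CodeGuru example: sigma = 11.07, N = 900, margin = 3.
for n in (30, 300):
    se = se_finite(11.07, n, 900)
    print(f"n = {n}: se = {se:.4f}, P(72 <= x_bar <= 78) = {prob_within(3, se):.8f}")

# Rounding the n = 30 standard error to 2, as the slide does, reproduces 0.8664.
print(f"{prob_within(3, 2.0):.4f}")
```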
| Type of Sampling | Definition and Other Information | Examples | Advantages | Disadvantages |
|---|---|---|---|---|
| Stratified Random | Population divided into strata. | Department, location, age, industry type. | More precise results with smaller sample size | Requires prior knowledge of population strata |
| Cluster | Population divided into clusters. | City blocks, well-defined areas. | Cost-effective, easy identification | Larger total sample size required |
| Systematic | Sample selected at regular intervals. | Selecting every kth element. | Easy sample identification | Relies on a random ordering of population list |
| Convenience | Sample chosen based on convenience. | Student volunteers, easily accessible. | Easy sample selection and data collection | Limited representativeness of the sample |
| Judgment | Sample selected based on judgment. | Selected elements most representative. | Relatively easy sample selection | Quality depends on selector's judgment |
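Systematic sampling from the table can be sketched in a few lines, assuming the frame is a Python list and the sampling interval is k = N // n with a random start among the first k elements. The function name `systematic_sample` and the 1 to 900 frame are illustrative assumptions. As the table warns, the result is only as good as the ordering of the list: a periodic pattern that lines up with k would bias the sample.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k = len(frame) // n elements,
    then take every kth element after it until n elements are chosen."""
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size n cannot exceed the frame size")
    start = random.Random(seed).randrange(k)  # random start index in [0, k)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame: application numbers 1..900, sample of 30 (interval k = 30).
frame = list(range(1, 901))
sample = systematic_sample(frame, 30, seed=11)
print(sample[:5], "...", len(sample))
```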
PRACTICE QUESTIONS
Assume the population standard deviation is σ = 25. Compute the
standard error of the mean for sample sizes of 50, 100, 150, and
200.

What can you say about the size of the standard error of the
mean as the sample size is increased?
A population has a mean of 200 and a standard deviation of 50. A sample of size 100 will be taken and the sample mean x̄ will be used to estimate the population mean.

a. What is the expected value of x̄?

b. What is the standard deviation of x̄?

c. Show the sampling distribution of x̄.

d. What does the sampling distribution of x̄ show?
A population has a mean of 200 and a standard deviation of 50. Suppose a sample of size 100 is selected and x̄ is used to estimate the population mean.

a. What is the probability that the sample mean will be within ±5 of the population mean?

b. What is the probability that the sample mean will be within ±10 of the population mean?
Barron’s reported that the average number of weeks an individual is unemployed is 17.5 weeks. Assume that for the population of all unemployed individuals the population mean length of unemployment is 17.5 weeks and that the population standard deviation is 4 weeks. Suppose you would like to select a sample of 50 unemployed individuals for a follow-up study.

a. Show the sampling distribution of x̄, the sample mean for a sample of 50 unemployed individuals.

b. What is the probability that a simple random sample of 50 unemployed individuals will provide a sample mean within 1 week of the population mean?

c. What is the probability that a simple random sample of 50 unemployed individuals will provide a sample mean within 1/2 week of the population mean?
