Statistical Modeling, Lecture 1
Melike Efe
July 7, 2025
Sabancı University
Table of contents
   1. Introduction
   2. Sampling Distributions
Introduction
Course Rules and Expectations
   Attendance and Participation:
    ▶ Participation is not mandatory.
   Exams:
    ▶ Midterm: 45%.
    ▶ Final Exam: 55% (date to be announced by Student Resources).
    ▶ Make-up exams available with a valid medical report. There will be
      a single make-up exam after the final exam period, which will
      replace all your missing exam grades. The make-up exam will be
      cumulative, covering all topics.
      Note: The make-up exam is expected to be more challenging than
      the regular exams.
   Study Advice:
    ▶ Review material weekly.
    ▶ Attend lectures and recitations regularly.
Why Statistics?
   Statistics is the science of collecting, analyzing, interpreting, and
   presenting data.
   Why Important?
    ▶ Helps us make informed decisions based on data.
    ▶ Used in nearly every field, including science, business, and social
      sciences.
   Examples:
    ▶ Business: Optimize marketing strategies using consumer data.
    ▶ Environmental Science: Analyze climate change trends and predict
      future changes.
    ▶ Insurance: Given your driving record, car information, and coverage,
      what is a fair premium?
    ▶ Clinical Trials: A drug is tested on 100 patients; 56 were cured and
      44 showed no improvement. Is the drug effective?
Relationship Between Statistics and Probability
    ▶ Probability: Starts with a known model or distribution and predicts
      outcomes.
        ▶ Probability Question: Suppose we have a fair coin (we know it’s
          fair). What is the probability of getting 7 heads in 10 flips?
               This is a probability problem because we know the model
               (fair coin: P(heads) = 0.5).
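The probability question can be answered directly from the binomial model. The sketch below assumes the 10 flips are independent; the helper name `binomial_pmf` is only illustrative.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Known model: fair coin, so p = 0.5.
prob_7_heads = binomial_pmf(7, 10, 0.5)
print(prob_7_heads)  # 120 / 1024 = 0.1171875
```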
    ▶ Statistics: Starts with data and infers the underlying model or
      parameters.
        ▶ Statistics Question: Suppose we flip a mystery coin 10 times and
          observe 7 heads. Can we conclude the coin is fair?
               This is a statistics problem because we start with data (7 heads)
               and try to infer something about the model (is the coin fair or biased?).
               We might run a hypothesis test or compute a confidence interval.
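One possible way to approach the statistics question is an exact two-sided binomial test of the hypothesis that the coin is fair. The choice of this particular test is an assumption made for illustration; the slide does not prescribe a method.

```python
from math import comb

n, observed = 10, 7
expected = n * 0.5  # expected number of heads if the coin is fair

# Outcomes at least as far from 5 heads as the observed 7 heads.
extreme = [k for k in range(n + 1)
           if abs(k - expected) >= abs(observed - expected)]

# Under a fair coin, every sequence of 10 flips has probability 1 / 2**10.
p_value = sum(comb(n, k) for k in extreme) / 2**n
print(extreme, p_value)  # [0, 1, 2, 3, 7, 8, 9, 10] 0.34375
```

A p-value this large would not lead us to reject fairness at the commonly used 5% level (that level is also an assumption here), so 7 heads in 10 flips is quite compatible with a fair coin.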
Sampling Distributions
Introduction
   Definition: (Random Sample) If $X_1, X_2, \ldots, X_n$ are independent and
   identically distributed random variables, we say that they constitute a
   random sample from the infinite population given by their common
   distribution.
   Remark: If $\mu$ and $\sigma^2$ denote the mean and variance of the population
   (distribution), it follows from the above definition that $E(X_i) = \mu$ and
   $\mathrm{var}(X_i) = \sigma^2$ for all $i = 1, \ldots, n$.
Definition: If $X_1, X_2, \ldots, X_n$ constitute a random sample, then the
sample mean is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
and the sample variance is given by
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}.$$
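As a small illustration, the sample mean and the sample variance (with the n − 1 divisor) can be computed as follows. The data values are hypothetical, and the function names are only illustrative.

```python
from statistics import variance

def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 32 / 7
print(variance(data))         # the standard library also divides by n - 1
```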
The Sampling Distribution of the Mean
   Theorem: If $X_1, X_2, \ldots, X_n$ constitute a random sample from an infinite
   population with the mean $\mu$ and the variance $\sigma^2$, then
   $$E(\bar{X}) = \mu \quad \text{and} \quad \mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}.$$
   Moreover,
   $$E(S^2) = \sigma^2.$$
   Definition: (Standard error of the mean) The standard deviation of the
   sample mean $\bar{X}$ is called the standard error of the mean, and it is given by
   $\sigma / \sqrt{n}$.
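A brief sketch of why the mean and variance of the sample mean $\bar{X}$ take this form: the first line uses only linearity of expectation, and the second uses the independence of the $X_i$, so that the variance of the sum is the sum of the variances.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu, \\
\mathrm{var}(\bar{X}) &= \frac{1}{n^2}\,\mathrm{var}\!\left(\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(X_i)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
\end{align*}
```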
Chebyshev’s Inequality
   Theorem: (Chebyshev’s Inequality) If $\mu$ and $\sigma$ are the mean and the
   standard deviation, respectively, of a random variable $X$, then for any
   positive constant $c$
   $$P(\mu - c < X < \mu + c) \ge 1 - \frac{\sigma^2}{c^2}.$$
   Theorem: (Law of Large Numbers) If $X_1, X_2, \ldots, X_n$ constitute a
   random sample from an infinite population with mean $\mu$ and finite
   variance $\sigma^2$, then for any $c > 0$
   $$P(\mu - c < \bar{X} < \mu + c) \ge 1 - \frac{\sigma^2}{n c^2}.$$
   This implies
   $$\lim_{n \to \infty} P(\mu - c < \bar{X} < \mu + c) = 1.$$
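To see how the lower bound in the Law of Large Numbers approaches 1, it can be evaluated for increasing sample sizes. The values of the variance (100) and of c (1) below are hypothetical choices for illustration.

```python
def lln_lower_bound(sigma2: float, n: int, c: float) -> float:
    """Chebyshev-type lower bound 1 - sigma^2 / (n c^2) for the sample mean."""
    return 1 - sigma2 / (n * c**2)

sigma2 = 100.0  # hypothetical population variance
c = 1.0         # hypothetical half-width of the interval around the mean

for n in (100, 400, 2000, 10_000):
    print(n, lln_lower_bound(sigma2, n, c))
```

The bound is uninformative for small n (it equals 0 at n = 100 here) and grows towards 1 as n increases, which is exactly the limit statement.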
The Sampling Distribution of the Mean: normal case
   Definition: A random variable $X$ has a normal distribution and is
   referred to as a normal random variable if and only if its probability
   density is given by
   $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty,$$
   where $\sigma > 0$.
   Definition: The normal distribution with $\mu = 0$ and $\sigma = 1$ is referred to
   as the standard normal distribution.
Theorem: If $X$ has a normal distribution with the mean $\mu$ and the
standard deviation $\sigma$, then $Z = \dfrac{X - \mu}{\sigma}$ has the standard normal
distribution.
Theorem: If $X_1, X_2, \ldots, X_n$ constitute a random sample from an infinite
normal population with the mean $\mu$ and the standard deviation $\sigma$, then
$Z_n = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ has the standard normal distribution.
Table of Standard Normal Probabilities P(0 ≤ Z ≤ z)
       z     0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
      0.0   0.0000   0.0040   0.0080   0.0120   0.0160   0.0199   0.0239   0.0279   0.0319   0.0359
      0.1   0.0398   0.0438   0.0478   0.0517   0.0557   0.0596   0.0636   0.0675   0.0714   0.0753
      0.2   0.0793   0.0832   0.0871   0.0910   0.0948   0.0987   0.1026   0.1064   0.1103   0.1141
      0.3   0.1179   0.1217   0.1255   0.1293   0.1331   0.1368   0.1406   0.1443   0.1480   0.1517
      0.4   0.1554   0.1591   0.1628   0.1664   0.1700   0.1736   0.1772   0.1808   0.1844   0.1879
      0.5   0.1915   0.1950   0.1985   0.2019   0.2054   0.2088   0.2123   0.2157   0.2190   0.2224
      0.6   0.2257   0.2291   0.2324   0.2357   0.2389   0.2422   0.2454   0.2486   0.2517   0.2549
       ..      ..       ..       ..       ..       ..       ..       ..       ..       ..       ..
      1.0   0.3413   0.3438   0.3461   0.3485   0.3508   0.3531   0.3554   0.3577   0.3599   0.3621
      1.1   0.3643   0.3665   0.3686   0.3708   0.3729   0.3749   0.3770   0.3790   0.3810   0.3830
      1.2   0.3849   0.3869   0.3888   0.3907   0.3925   0.3944   0.3962   0.3980   0.3997   0.4015
      1.3   0.4032   0.4049   0.4066   0.4082   0.4099   0.4115   0.4131   0.4147   0.4162   0.4177
      1.4   0.4192   0.4207   0.4222   0.4236   0.4251   0.4265   0.4279   0.4292   0.4306   0.4319
      1.5   0.4332   0.4345   0.4357   0.4370   0.4382   0.4394   0.4406   0.4418   0.4429   0.4441
      1.6   0.4452   0.4463   0.4474   0.4484   0.4495   0.4505   0.4515   0.4525   0.4535   0.4545
      1.7   0.4554   0.4564   0.4573   0.4582   0.4591   0.4599   0.4608   0.4616   0.4625   0.4633
      1.8   0.4641   0.4649   0.4656   0.4664   0.4671   0.4678   0.4686   0.4693   0.4699   0.4706
      1.9   0.4713   0.4719   0.4726   0.4732   0.4738   0.4744   0.4750   0.4756   0.4761   0.4767
      2.0   0.4772   0.4778   0.4783   0.4788   0.4793   0.4798   0.4803   0.4808   0.4812   0.4817
      2.1   0.4821   0.4826   0.4830   0.4834   0.4838   0.4842   0.4846   0.4850   0.4854   0.4857
      2.2   0.4861   0.4864   0.4868   0.4871   0.4875   0.4878   0.4881   0.4884   0.4887   0.4890
      2.3   0.4893   0.4896   0.4898   0.4901   0.4904   0.4906   0.4909   0.4911   0.4913   0.4916
      2.4   0.4918   0.4920   0.4922   0.4925   0.4927   0.4929   0.4931   0.4932   0.4934   0.4936
      2.5   0.4938   0.4940   0.4941   0.4943   0.4945   0.4946   0.4948   0.4949   0.4951   0.4952
      2.6   0.4953   0.4955   0.4956   0.4957   0.4959   0.4960   0.4961   0.4962   0.4963   0.4964
      2.7   0.4965   0.4966   0.4967   0.4968   0.4969   0.4970   0.4971   0.4972   0.4973   0.4974
       ..      ..       ..       ..       ..       ..       ..       ..       ..       ..       ..
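Entries of this table, including the rows elided above, can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `table_entry` is only illustrative.

```python
from statistics import NormalDist

def table_entry(z: float) -> float:
    """P(0 <= Z <= z) for a standard normal Z, rounded to 4 decimals."""
    return round(NormalDist().cdf(z) - 0.5, 4)

print(table_entry(1.00))  # 0.3413, matching row 1.0, column 0.00
print(table_entry(1.73))  # 0.4582, matching row 1.7, column 0.03
```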
Example: If Z is a random variable having the standard normal
distribution, find P(Z < 1.73) and P(−1.5 ≤ Z ≤ 0.24).
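From the table, P(Z < 1.73) = 0.5 + 0.4582 = 0.9582 and, by symmetry, P(−1.5 ≤ Z ≤ 0.24) = 0.4332 + 0.0948 = 0.5280. A quick numerical check, sketched with the standard library's `NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p1 = Z.cdf(1.73)                # P(Z < 1.73)
p2 = Z.cdf(0.24) - Z.cdf(-1.5)  # P(-1.5 <= Z <= 0.24)

print(round(p1, 4), round(p2, 4))
```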
Example: Assume that exam scores $X_j$ of students in a certain
population are independent and normally distributed with $\mu = 24$ and
$\sigma = 10$. Assume that 400 randomly chosen students are enrolled in this
course and the passing grade is 21. Find the mean and variance of the
class average $\bar{X}$. Estimate the probability that $\bar{X}$ exceeds the passing
grade.
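A possible worked solution, sketched numerically. Because the population is normal, the standardized class average is exactly standard normal, so no approximation is needed beyond evaluating the normal CDF. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, passing = 24.0, 10.0, 400, 21.0

mean_xbar = mu            # E(X-bar) = mu = 24
var_xbar = sigma**2 / n   # var(X-bar) = sigma^2 / n = 0.25
se = sqrt(var_xbar)       # standard error = 0.5

z = (passing - mean_xbar) / se       # (21 - 24) / 0.5 = -6
p_exceeds = 1 - NormalDist().cdf(z)  # P(X-bar > 21) = P(Z > -6)

print(mean_xbar, var_xbar, z, p_exceeds)
```

Since the passing grade lies 6 standard errors below the mean of the class average, the probability is practically 1.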
The Central Limit Theorem
   Theorem: If $X_1, X_2, \ldots, X_n$ constitute a random sample from an infinite
   population with the mean $\mu$ and the variance $\sigma^2$, then the limiting
   distribution of
   $$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
   as $n \to \infty$ is the standard normal distribution.
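The theorem can be illustrated with a small simulation. The sketch below draws samples from an exponential population with mean 1 and variance 1 (a deliberately non-normal choice, made here only for illustration) and checks how often the standardized sample mean falls within ±1.96, which a standard normal variable does about 95% of the time. The sample size, number of repetitions, and seed are arbitrary assumptions.

```python
import random
from math import sqrt

def standardized_mean(n: int, rng: random.Random) -> float:
    """Standardized sample mean of n Exponential(1) draws (mu = 1, sigma = 1)."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - 1.0) / (1.0 / sqrt(n))

rng = random.Random(0)  # fixed seed so the run is reproducible
n, reps = 50, 5000

zs = [standardized_mean(n, rng) for _ in range(reps)]
frac_within = sum(abs(z) <= 1.96 for z in zs) / reps

print(frac_within)  # should be close to 0.95
```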
Example: A university is planning to accept new students for the next
academic year. The university can accommodate at most 3695 students.
Each applicant is accepted with probability 0.36, and the acceptance
decisions are modeled as independent Bernoulli trials. If the university
receives 10,000 applications, use the Central Limit Theorem (CLT) to
estimate the probability that the university cannot accommodate all
accepted students.
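One way to set up the calculation: the number of accepted students is a sum of 10,000 Bernoulli(0.36) variables, with mean np = 3600 and standard deviation √(np(1 − p)) = 48, and the university overflows when this number exceeds 3695. The sketch below evaluates the normal approximation with and without a continuity correction; whether to apply the correction is a modeling choice that the example does not specify.

```python
from math import sqrt
from statistics import NormalDist

n, p, capacity = 10_000, 0.36, 3695

mean = n * p                # expected number accepted: 3600
sd = sqrt(n * p * (1 - p))  # standard deviation: 48

# Plain CLT approximation of P(accepted > 3695).
z = (capacity - mean) / sd
p_overflow = 1 - NormalDist().cdf(z)

# Same approximation with a continuity correction (P(accepted >= 3695.5)).
z_cc = (capacity + 0.5 - mean) / sd
p_overflow_cc = 1 - NormalDist().cdf(z_cc)

print(round(z, 3), round(p_overflow, 4))
print(round(z_cc, 3), round(p_overflow_cc, 4))
```

Both versions give a probability of roughly 2.3% to 2.4%.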
Example: Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. exponentially
distributed random variables with $E[X_j] = 1$.
Find the smallest value of the sample size $n$ so that for the sample mean
$\bar{X}$ of size $n$, $P(0.9 \le \bar{X} \le 1.1) \ge 0.95$ holds.
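The example does not state which tool to use, so the sketch below assumes the CLT approximation and, for comparison, also shows the guaranteed but more conservative bound from the Law of Large Numbers inequality. An exponential distribution with mean 1 has variance 1. With the CLT, the requirement becomes 0.1·√n ≥ z, where z is the 97.5% quantile of the standard normal distribution.

```python
from fractions import Fraction
from math import ceil
from statistics import NormalDist

c = Fraction(1, 10)      # half-width of the interval [0.9, 1.1]
alpha = Fraction(1, 20)  # allowed miss probability, 1 - 0.95
sigma2 = Fraction(1)     # variance of an exponential with mean 1

# CLT approximation: need c * sqrt(n) / sigma >= z, i.e. n >= sigma^2 * (z / c)^2.
z = NormalDist().inv_cdf(1 - float(alpha) / 2)  # about 1.96
n_clt = ceil(float(sigma2) * (z / float(c)) ** 2)

# Chebyshev-type bound: need 1 - sigma^2 / (n c^2) >= 0.95.
n_chebyshev = ceil(sigma2 / (alpha * c**2))

print(n_clt, n_chebyshev)  # 385 2000
```

The CLT answer is an approximation, while the Chebyshev-type answer holds for any distribution with this variance, which is why it asks for a much larger sample.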