SB11 - Group 1
STUDENT DETAILS
ASSIGNMENT DETAILS
Title:          SB11_Group 1
Length:         34 pages           Due date:       28/04/2024             Date submitted:   28/04/2024
DECLARATION
         I hold a copy of this assignment if the original is lost or damaged.
         I hereby certify that no part of this assignment or product has been copied from any other student’s
         work or from any other source except where due acknowledgement is made in the assignment.
          I hereby certify that no part of this assignment or product has been submitted by me in another
          (previous or current) assessment, except where appropriately referenced, and with prior
          permission from the Lecturer / Tutor / Unit Coordinator for this unit.
         No part of the assignment/product has been written/ produced for me by any other person except
         where collaboration has been authorised by the Lecturer / Tutor /Unit Coordinator concerned.
         I am aware that this work may be reproduced and submitted to plagiarism detection software programs
         for the purpose of detecting possible plagiarism (which may retain a copy on its database for future
         plagiarism checking).
Student’s signature:          Nguyễn Hương Giang
Student’s signature:          Nguyễn Ngọc Thanh Mai
Student’s signature:          Vũ Tuyết Nhi
Student’s signature:          Nguyễn Ngọc Huyền Trân
Student’s signature:          Trần Thị Cát Tường
Note: An examiner or lecturer / tutor has the right to not mark this assignment if the above declaration has not
been signed.
SB-11 | Group 1
                                   STATISTICS FOR
                             BUSINESS ASSIGNMENT
I. Problem Solving
Problem 1
    ● Given P(𝑋 > 9.5) = 0.9255, the Z-score corresponding to a cumulative probability of
        0.9255 is approximately 1.44, since P(Z ≤ 1.44) ≈ 0.925.
Since 𝑋 follows a normal distribution, we standardize using the formula for the standard error of the mean, σ/√n.
⇒ Conclusion: σ ≈ 1.2 and µ ≈ 9.0011
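The Z-score lookup in the first step can be reproduced with Python's standard library. This minimal sketch only inverts the standard normal CDF. It does not re-derive µ and σ, because those depend on details of the problem statement that are not shown here.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # Z ~ N(0, 1)

# z such that P(Z <= z) = 0.9255
z = std_normal.inv_cdf(0.9255)
print(f"z = {z:.2f}")  # z = 1.44

# By symmetry of the normal curve, P(Z > -z) = P(Z <= z)
print(f"P(Z > -z) = {1 - std_normal.cdf(-z):.4f}")  # P(Z > -z) = 0.9255
```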
The normal distribution is a continuous probability distribution that is symmetric about its
mean, with the highest point of the curve occurring at the mean. This means that the left
and right tails of the distribution are mirror images of each other.
The normal curve is bell-shaped. It rises gradually toward the mean, reaches its peak at the mean,
and then falls gradually as it moves away from the mean in both directions. Because of this
symmetry, Mean = Median = Mode.
The mean of the sample is the midpoint of the confidence interval for the population mean.
Given the confidence interval (9.53055, 10.65263), the mean of the sample is the average
of the two endpoints:
    ● Mean of sample = Average of confidence interval for mean = (9.53055 +
        10.65263)÷ 2 ≈ 10.09159
The variance of the sample can be calculated using the confidence interval for the
population variance. Given the interval (1.12632, 3.57520), the variance is the average of
the two endpoints:
    ● Variance of the sample = Average of confidence interval for variance = (1.12632
        + 3.57520)÷ 2 ≈ 2.35076
For a two-sided confidence interval, the margin of error is the distance from the estimated
statistic to each endpoint. When a confidence interval is symmetric, the margin of error is
half the width of the confidence interval → E = (10.65263 - 9.53055) ÷ 2 ≈ 0.56104
⇒ Conclusion:
    -   The mean of the sample ≈ 10.09159
    -   The variance of this sample ≈ 2.35076
    -   The size of this sample ≈ 29 (28.865)
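The two calculations based on the interval for the mean (its midpoint and its half-width) are plain arithmetic. Here is a minimal Python sketch, assuming the interval for the mean is a symmetric t-interval as described above:

```python
# Endpoints of the confidence interval for the population mean (part b)
lower, upper = 9.53055, 10.65263

# A t-interval x̄ ± E is symmetric, so x̄ is its midpoint and E is half its width
sample_mean = (lower + upper) / 2
margin_of_error = (upper - lower) / 2

print(f"mean ≈ {sample_mean:.5f}, E ≈ {margin_of_error:.5f}")  # mean ≈ 10.09159, E ≈ 0.56104
```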
            c.     Test at 90% level of confidence whether the mean and variance of the
                  population in part b) are the same as those of part a).
Let X ~ N(µ, σ²) in part a and 𝑋1 ~ N(µ1, σ1²) in part b.
    ➢ MEAN
Hypothesis
𝐻0: µ = µ1
𝐻1: µ ≠ µ1
Test statistics
Pooled variance:
$$s_p^2 = \frac{(n_1-1)s^2 + (n_2-1)s_1^2}{(n_1-1)+(n_2-1)} = \frac{(12-1)\times 1.2^2 + (29-1)\times 1.533^2}{(12-1)+(29-1)} = 2.0938$$
Decision rule (df = 12 + 29 − 2 = 39, t0.05, 39 = 1.685): Reject 𝐻0 if t-calc < −1.685 or t-calc > 1.685; otherwise do not reject 𝐻0.
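As a check on the arithmetic, the pooled variance can be recomputed from the inputs used above. This sketch assumes s1² ≈ 2.35076 for part b, as reported. The result agrees with 2.0938 up to rounding.

```python
import math

def pooled_variance(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled variance of two samples with sizes n1, n2 and standard deviations s1, s2."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))

# Inputs used above: part a (n = 12, s = 1.2) and part b (n = 29, s² ≈ 2.35076)
sp2 = pooled_variance(12, 1.2, 29, math.sqrt(2.35076))
df = 12 + 29 - 2  # degrees of freedom for the pooled t-test

print(f"s_p^2 ≈ {sp2:.3f}, df = {df}")  # s_p^2 ≈ 2.094, df = 39
```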
⇒ Conclusion: There is not sufficient evidence to reject 𝐻0. The population mean in part
    ➢ VARIANCE
Hypothesis
𝐻0: σ² = σ1²
𝐻1: σ² ≠ σ1²
Test statistics
Decision rule: Reject 𝐻0 if F > 𝐹𝑈 or F < 𝐹𝐿; otherwise do not reject 𝐻0.
⇒ Conclusion: There is not sufficient evidence to reject 𝐻0. The population variance in
Problem 2
              a. Find the probability of the number of defective light bulbs in a box of 12.
                  Draw the graph of the corresponding probability distribution.
Finding the probability distribution of the number of defective light bulbs in a box of 12
involves only two possible outcomes for each bulb: defective or not defective.
Moreover, each bulb has the same probability of being defective (0.06), independently of the
other bulbs. Therefore, the number of defective bulbs X follows a binomial distribution,
X ~ B(12, 0.06):
$$P(X = x) = \binom{12}{x}(0.06)^x(1-0.06)^{12-x}, \quad x = 0, 1, \dots, 12$$
    x      P(X = x)
    0      0.4759203148
    1      0.3645347092
    2      0.1279749511
    3      0.027228713
    4      0.003910506655
    5      0.0003993708924
    6      0.0000297403856
    7      0.000001627133559
    8      0.00000006491224304
    9      0.000000001841482072
   10      ≈ 0
   11      ≈ 0
   12      ≈ 0
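The probability table can be regenerated with the binomial formula. Here is a minimal Python sketch, assuming (as above) n = 12 and a defect probability of 0.06 per bulb:

```python
from math import comb

n, p = 12, 0.06  # 12 bulbs per box, each defective with probability 0.06

# Binomial pmf: P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(f"{x:>2}  {prob:.10f}")

print("total:", sum(pmf))  # equals 1 up to floating-point error
```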
The customer's decision to reject a box depends on how many defective bulbs the box contains.
The 2 bulbs are drawn without replacement from a finite box of 12. Therefore, we use the
hypergeometric distribution to find the probability of getting a specific number of defective
bulbs (successes) in the sample of 2.
    ● If s = 0 ⇒ x = 0
         Probability that a box containing 0 defective bulbs is accepted: P(X = 0) = 1
         Probability that a box containing 0 defective bulbs is rejected: 1 − P(X = 0) = 0
     x    P(Accepted | X = x)    P(Rejected | X = x)
     …
    10    0.015151515            0.984848485
    11    0                      1
    12    0                      1
     x    P(Rejected | X = x)    P(X = x)        E (proportion of rejected boxes)
     0    0                      0.4759203148    0
     …
    10    0.984848485            ≈ 0             ≈ 0
    11    1                      ≈ 0             ≈ 0
    12    1                      ≈ 0             ≈ 0
 Total                                           0.1164
⇒ Conclusion: The expected proportion of rejected boxes, which is also the probability that a
box is rejected, is 0.1164.
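The expected proportion of rejected boxes can be checked with the law of total probability. Each hypergeometric rejection probability is weighted by the binomial probability that the box contains that many defective bulbs. The sketch assumes the rejection rule implied by the table: a box is rejected if at least one of the 2 sampled bulbs is defective, so P(Accepted | X = 10) = 1/66. The helper names are illustrative.

```python
from math import comb

N_BULBS, SAMPLE, P_DEFECT = 12, 2, 0.06

def p_box_has(x: int) -> float:
    """Binomial probability that a box of 12 contains exactly x defective bulbs."""
    return comb(N_BULBS, x) * P_DEFECT**x * (1 - P_DEFECT) ** (N_BULBS - x)

def p_reject_given(x: int) -> float:
    """Hypergeometric probability that a sample of 2 (without replacement) contains
    at least one defective bulb, given x defectives in the box (assumed rejection rule)."""
    p_accept = comb(N_BULBS - x, SAMPLE) / comb(N_BULBS, SAMPLE)  # comb returns 0 if k > n
    return 1 - p_accept

# Law of total probability over the number of defective bulbs in the box
p_reject = sum(p_reject_given(x) * p_box_has(x) for x in range(N_BULBS + 1))
print(round(p_reject, 4))  # 0.1164
```

Under this rule, the total equals exactly 1 − 0.94². This is because, unconditionally, each sampled bulb is non-defective with probability 0.94. It also explains why the acceptance probability 0.8836 = 0.94² appears in part d.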
            d. The customer examines boxes one at a time until he accepts one. What are the expected
                  value and variance of the number of boxes he needs to examine?
The question asks for the number of boxes examined up to and including the first success (the
first accepted box). Therefore, we use the geometric distribution with
π = P(a box is accepted) = 1 − 0.1164 = 0.8836:
$$\mu = \frac{1}{\pi} = \frac{1}{0.8836} \approx 1.1317$$
$$\sigma^2 = \frac{1-\pi}{\pi^2} = \frac{1-0.8836}{0.8836^2} \approx 0.1491$$
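Here is a short Python sketch of the geometric mean and variance, with a quick Monte Carlo simulation as a sanity check. The simulation, seed and sample size are illustrative choices, not part of the original solution.

```python
import random

pi = 0.8836  # probability that an examined box is accepted

# Geometric distribution: boxes examined up to and including the first acceptance
mean = 1 / pi
variance = (1 - pi) / pi**2
print(f"mean ≈ {mean:.4f}, variance ≈ {variance:.4f}")  # mean ≈ 1.1317, variance ≈ 0.1491

def boxes_until_accept(rng: random.Random) -> int:
    """Simulate examining boxes until one is accepted; return how many were examined."""
    count = 1
    while rng.random() >= pi:  # the box is rejected with probability 1 - pi
        count += 1
    return count

rng = random.Random(2024)
draws = [boxes_until_accept(rng) for _ in range(100_000)]
sim_mean = sum(draws) / len(draws)
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"simulated mean ≈ {sim_mean:.3f}, simulated variance ≈ {sim_var:.3f}")
```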
            e.    Suppose 1000 boxes are examined by the above rule. Find the probability
                  that the number of boxes rejected is between 20 and 40 (inclusive). You
                  should find your answers by two different methods.
       x        P(X = x)
       …
      120       0.03645
      121       0.03492
      122       0.03314
      123       0.03116
      124       0.02904
      125       0.02681
      126       0.02452
      127       0.02223
      128       0.01997
      129       0.01779
      130       0.0157
      131       0.01373
      132       0.01191
      133       0.01024
      134       0.00873
      135       0.00738
      136       0.00618
      137       0.00513
      138       0.00423
      139       0.00346
      140       0.0028
       …
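Normal approximation to the binomial, with p = 0.1164 (the probability that a box is rejected),
µ = np = 1000 × 0.1164 = 116.4, σ = √(np(1 − p)) = √(1000 × 0.1164 × 0.8836) ≈ 10.14,
and a continuity correction:
                  P(120 ≤ X ≤ 140) ≈ P(119.5 ≤ X ≤ 140.5)
                  = P(0.31 ≤ 𝑍 ≤ 2.38)
                  = P(Z ≤ 2.38) − P(Z ≤ 0.31)
                  = 0.9913 − 0.6217 = 0.3696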
⇒ Conclusion: Using the binomial distribution directly, the probability that the number of
rejected boxes is between 120 and 140 is about 0.36552. The normal approximation gives 0.3696,
a small error (less than 0.01). The error arises because a continuous, symmetric distribution is
used to approximate a discrete and slightly skewed binomial distribution.
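Both methods can be reproduced in Python using only the standard library. This sketch computes the binomial probabilities on the log scale to avoid overflow, and applies the same continuity correction as above. It does not round the z-values to two decimals, so its normal approximation comes out slightly different from 0.3696 while staying within 0.01 of the exact sum.

```python
from math import exp, lgamma, log
from statistics import NormalDist

n, p = 1000, 0.1164  # boxes examined, probability that a box is rejected

def binom_pmf(k: int) -> float:
    """Binomial pmf computed on the log scale to avoid overflow in C(1000, k)."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

# Method 1: exact binomial sum for 120 <= X <= 140
exact = sum(binom_pmf(k) for k in range(120, 141))

# Method 2: normal approximation with continuity correction
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5
approx_dist = NormalDist(mu, sigma)
approx = approx_dist.cdf(140.5) - approx_dist.cdf(119.5)

print(f"exact ≈ {exact:.5f}, normal approximation ≈ {approx:.5f}")
```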
a. Manufacturers
The diagram illustrates the share of each of the 5 car manufacturers among the 500 vehicles.
In detail:
Ford is the most frequent manufacturer, accounting for 28.8% of the vehicles. Vauxhall follows
closely with 28.0%, only 0.8 percentage points lower. Next, in descending order, are Volkswagen
at 18.2% and BMW at 16.8%. Toyota appears least often among the 500 vehicles, accounting for
only 8.2%.
b. Propulsion type
The diagram illustrates the percentage of vehicles using each propulsion (fuel) type among the
500 vehicles. The propulsion type codes are decoded in the pie chart as follows:
In detail:
Of the 5 propulsion types provided, only 3 appear among the 500 vehicles: Petrol, Diesel, and
Electric/Petrol.
Petrol is the most common of the 3 types, accounting for more than half of the vehicles (58.2%).
Diesel ranks second at 37.6%. Lastly, Electric/Petrol accounts for only 4.2%, the lowest share
among the 500 vehicles. The proportions differ substantially, ranging from 58.2% for Petrol to
4.2% for Electric/Petrol.
2. Provide descriptions for the variables Engine Size, Mass, and CO2 Emission.
a. Engine size
The histogram shows the distribution of engine capacity, in cubic centimeters (cc), for the 500
vehicles.
In detail:
The histogram shows a positively skewed distribution (skewness = 1.87 > 0). This means the
values are concentrated at smaller engine sizes, with a long tail toward larger ones. The kurtosis
of 8.14 indicates a highly leptokurtic distribution, with a sharper peak and heavier tails than a
normal distribution. Therefore, this distribution is asymmetrical.
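The sign conventions used in these descriptions can be checked on small data sets:
    ● Skewness > 0 indicates a longer right tail.
    ● Excess kurtosis > 0 indicates a sharper peak and heavier tails than a normal distribution.
The sketch below uses the bias-adjusted sample formulas behind Excel's SKEW and KURT. We assume, but have not verified, that Minitab's reported values follow the same formulas. The toy data are hypothetical.

```python
import math

def _mean_sd(data: list[float]) -> tuple[float, float]:
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return mean, sd

def adjusted_skewness(data: list[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness (the formula behind Excel's SKEW)."""
    n = len(data)
    mean, sd = _mean_sd(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)

def excess_kurtosis(data: list[float]) -> float:
    """Bias-adjusted sample excess kurtosis (the formula behind Excel's KURT); near 0 for normal data."""
    n = len(data)
    mean, sd = _mean_sd(data)
    m4 = sum(((x - mean) / sd) ** 4 for x in data)
    return n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

# Hypothetical toy data
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 10, 25]  # long right tail
symmetric = [1, 2, 3, 4, 5]

print(adjusted_skewness(right_skewed) > 0)  # True: positive skew
print(round(excess_kurtosis(symmetric), 6))  # -1.2: flatter than a normal curve
```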
    ● The average engine size is 1667.2 cc. The median of 1596 cc provides another measure
        of central tendency and is lower than the mean because of the positive skew. Moreover,
        the minimum value (647 cc) lies much closer to the mean than the maximum value
        (4951 cc) does, which further supports the positive skew.
    ● Values above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR are flagged as potential
        outliers. Using these fences, 2 extreme outliers are found, both located in the right
        tail of the distribution.
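The 1.5 × IQR fences can be sketched with `statistics.quantiles`. Its default `'exclusive'` method places quartiles at position (n + 1)p, which we believe matches Minitab's default. Other software may give slightly different quartiles. The data below are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(data: list[float]) -> tuple[float, float, list[float]]:
    """Return the lower fence, the upper fence and the values outside them."""
    q1, _, q3 = quantiles(data, n=4)  # default method='exclusive': (n + 1)p positions
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outside = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outside

# Hypothetical toy data with one large value in the right tail
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(sample))  # (-5.5, 16.5, [100])
```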
In conclusion, this histogram does not represent a normal distribution, for the following
reasons:
b. Mass
The histogram shows the distribution of vehicle mass, in kilograms, for the 500 vehicles. Each
mass includes an average driver's mass of 75 kilograms.
In detail:
This histogram shows a positively skewed distribution (skewness = 0.7 > 0), suggesting that
there are more vehicles with lighter masses. A kurtosis of 0.64 indicates a slightly leptokurtic
distribution, with a slightly sharper peak and heavier tails than a normal distribution.
Therefore, this distribution is asymmetrical.
    ● The average vehicle mass is 1413 kilograms (kg). The median of 1395 kg provides
        another measure of central tendency and is slightly lower than the mean because of
        the positive skew.
    ● Values above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR are flagged as potential
        outliers. Using these fences, approximately 6 extreme outliers are found in the right
        tail of the distribution.
In conclusion, this histogram does not represent a normal distribution, for the following
reasons:
c. CO2 emission
The histogram shows the distribution of carbon dioxide (CO2) emissions, in grams per
kilometer, for the 500 vehicles.
In detail:
The skewness of 1.13 (> 0) shows that the distribution is positively skewed. The values are
concentrated toward the lower end, with a tail toward higher emissions. The kurtosis of 2.73
indicates a leptokurtic distribution, with a sharper peak than a normal distribution would have.
Therefore, this distribution is asymmetrical.
    ● The average CO2 emission of the 500 vehicles is 135.35 grams per kilometer (g/km),
        which gives a baseline for their typical CO2 output. The median is 125 g/km, lower
        than the mean because of the positive skew.
    ● The standard deviation of 36.97 g/km measures how widely CO2 emissions are spread
        around the mean.
    ● The interquartile range (IQR) is 42.00 g/km. This means that the middle 50% of the
        values, between Q1 and Q3, span 42 g/km.
    ● Values above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR are flagged as potential
        outliers. Using these fences, approximately 7 to 15 extreme outliers are found in the
        right tail of the distribution.
In conclusion, this histogram does not represent a normal distribution, for the following
reasons:
The matrix plot illustrates the correlation between engine size and mass in a dataset of 500
vehicles.
A positive correlation is evident between mass and engine size: as mass increases, engine size
also tends to increase. This is shown by the upward slope of the data points in the diagram.
However, the scatter of points around the upward trend implies that factors other than engine
size also affect mass.
In conclusion, the data above suggests a moderately strong positive correlation between
engine size and mass, with larger engines generally corresponding to heavier vehicles.
4. Test at 1% significance level whether the variance of the CO2 emission level in
 ● The F-test assumes that the two samples come from normally distributed
     populations.
 ● Since we assume that CO2 emissions in both years are normally distributed, we use
     the F-test.
 ●   You can be 99% confident that the true population variance for CO2 Emission in
     2002 lies between (790.229, 1476.920).
 ● You can be 99% confident that the true population variance for CO2 Emission in
     2016 lies between (604.363, 887.415).
 ⇒ The confidence intervals for the variance of CO2 emission in 2002 (790.229, 1476.920)
 and 2016 (604.363, 887.415) overlap only slightly, between 790.229 and 887.415. This
 suggests, but does not by itself prove, that the variances are different.
Estimated ratio of variances: the variance of CO2 Emission in 2002 divided by the variance of
CO2 Emission in 2016.
 99% CI for the ratio using F: We can be 99% confident that the ratio of the two population
 variances is between 1.022 and 2.130. Because the interval does not contain the value 1, we
 can conclude that the population variances differ.
 The graph below illustrates the 99% CI for the ratio of the two population variances, which
 does not include 1. In other words, the 2 population variances are different.
Test statistic
We can reach a decision from the test statistic by comparing it with the critical value and
applying the decision rule.
Minitab was used to draw the graph and find the critical values:
Because F > FU (1.45595 > 1.425), there is sufficient evidence at the 1% significance level to
reject H0.
P-value:
 We can also decide using the p-value. The p-value is 0.006, which is less than the
 significance level (1%), so the decision is to reject the null hypothesis.
 ⇒ Conclusion: The null hypothesis states that the ratio of the two variances is 1.
     Because the p-value is less than the significance level α = 0.01, we reject the null
     hypothesis. In other words, we have enough evidence to conclude that the variances of
     CO2 emissions in 2002 and 2016 are different. The variance in 2002 is significantly
     higher than in 2016 at the α = 0.01 significance level.
5. Test at 1% significance level whether there is a significant decrease in the mean CO2
The result of question 4 shows that the two population variances are different. Therefore, in
Minitab we do not assume equal variances.
SE Mean:
    ● The SE mean for CO2 emissions in 2016 is smaller than that for 2002 (1.4 compared to
        2.8). This is because the 2016 sample is larger (362 vehicles compared to 138) and has a
        smaller variance.
    ● This means the 2016 sample mean is a more precise estimate of its population mean.
Difference:
    ● The difference is the mean of the 2002 sample minus the mean of the 2016 sample
        (CO2 emissions in 2002 minus CO2 emissions in 2016).
    ● 52.45 ≈ 173.2 − 120.8 (the displayed means are rounded)
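Because the two variances are treated as unequal, the test statistic is Welch's t. The sketch below recomputes it, together with the Welch-Satterthwaite degrees of freedom, from the rounded SE means and the difference reported above. Since the inputs are rounded, the result only approximates Minitab's output, which is not reproduced here.

```python
import math

def welch_t(diff: float, se1: float, se2: float, n1: int, n2: int) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom, computed from the
    difference in sample means and each sample's standard error of the mean."""
    v1, v2 = se1**2, se2**2  # squared standard errors, i.e. s_i^2 / n_i
    t = diff / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Rounded values reported above: 2002 (n = 138, SE = 2.8), 2016 (n = 362, SE = 1.4)
t, df = welch_t(52.45, 2.8, 1.4, 138, 362)
print(f"t ≈ {t:.1f}, df ≈ {df:.0f}")  # t ≈ 16.8, df ≈ 209
```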
T value:
We can compare the t-value with critical values of the t-distribution to determine whether to
reject the null hypothesis.
P value:
However, using the p-value to make the same decision is usually more practical. The p-value is
0.000, which is less than the significance level of 0.01, so the decision is to reject the null
hypothesis. In other words, we can conclude that the mean CO2 emissions in 2002 and 2016 are
different.
 ⇒ Conclusion: The null hypothesis states that the difference between the two population
     means is 0. Because the p-value is less than the significance level α = 0.01, we reject the
     null hypothesis. In other words, we have enough evidence to conclude that the mean CO2
     emissions in 2002 and 2016 are different. The mean in 2002 is significantly higher than in
     2016 at the significance level of α = 0.01.
Hypotheses
    ● Null Hypothesis (H0): The mean CO2 emissions for all vehicle manufacturers are
        equal.
    ● Alternative Hypothesis (H1): At least one manufacturer's mean CO2 emission differs
        from the others.
● H0: μ1 = μ2 = μ3 = μ4 = μ5
(where μ1, μ2, μ3, μ4, μ5 represent the population means of the 5 manufacturers)
Assumptions
    ● CO2 emissions for each manufacturer follow a normal distribution, with the same
        but unknown variances.
ANOVA test
    ● To further support this conclusion, we conduct the F-test at α = 0.01 with F(4, 133)
        (numerator df1 = 4; denominator df2 = 133). The decision rule is:
The F-distribution plot below shows the F(4, 133) distribution and its rejection region at
α = 0.01.
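For reference, a one-way ANOVA F statistic is the between-group mean square divided by the within-group mean square. This minimal sketch uses hypothetical toy data rather than the vehicle data set. With 5 manufacturers and 138 vehicles, the same degrees-of-freedom formulas give df1 = 4 and df2 = 133, as used above.

```python
def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """One-way ANOVA F statistic with its numerator and denominator degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of the group means around the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df1, df2 = k - 1, n_total - k
    return (ssb / df1) / (ssw / df2), df1, df2

# Hypothetical toy data for 3 groups
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```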
    ● To further support this conclusion, we conduct the F-test at α = 0.01 with F(4, 357)
        (numerator df1 = 4; denominator df2 = 357). The decision rule is:
The F-distribution plot below shows the F(4, 357) distribution and its rejection region at
α = 0.01.
7. Determine the regression equation for CO2 emission with independent variables
Engine Size and Mass. Comment on the fit of the model. Give the interpretation of the
        ○ b0 = 83.57 is the expected CO2 emission for a vehicle with 0 Mass and 0 Engine
            Size. It has no practical meaning because an engine size or mass of zero is not
            physically possible.
        ○ b1 = 0.03744: For each 1 cc increase in engine size, CO2 emission increases on
            average by 0.03744 g/km, holding Mass constant.
        ○ b2 = -0.00760: For each 1 kg increase in mass, CO2 emission decreases on average
            by 0.00760 g/km, holding Engine Size constant.
        ○ The standard error of the coefficient measures the precision of the estimate of
            the coefficient. The smaller the standard error, the more precise the estimate.
        ○ The standard error of the Engine Size coefficient is smaller than that of Mass
            (0.00392 compared to 0.00733). Therefore, the estimate of the coefficient for
            the Engine Size has greater precision.
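The fitted equation implied by these coefficients is CO2 = 83.57 + 0.03744 × Engine Size − 0.00760 × Mass. A small sketch of it follows. A least-squares fit with an intercept passes through the point of means. So plugging in the sample means of Engine Size (1667.2 cc) and Mass (1413 kg) should give roughly the mean CO2 emission reported earlier (135.35 g/km). The small gap is presumably due to rounding.

```python
def predict_co2(engine_size_cc: float, mass_kg: float) -> float:
    """Fitted CO2 emission in g/km, using the coefficients b0, b1, b2 reported above."""
    b0, b1, b2 = 83.57, 0.03744, -0.00760
    return b0 + b1 * engine_size_cc + b2 * mass_kg

# Fitted value at the sample means of Engine Size and Mass
print(round(predict_co2(1667.2, 1413), 2))  # 135.25

# Effect of +100 cc of engine size with Mass held constant: 100 * b1
print(round(predict_co2(1700, 1400) - predict_co2(1600, 1400), 3))  # 3.744
```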
● P-value:
        ○ The p-value of Engine Size (≈ 0) is less than the significance level (0.01), so we
            can conclude that there is a statistically significant association between Engine
            Size and CO2 Emissions.
        ○ The p-value of Mass (0.3) is greater than the significance level (0.01), so we cannot
            conclude that there is a statistically significant association between Mass and
            CO2 Emissions.
● VIF:
The VIF of Engine Size and Mass is 1.71, which is between 1 and 5. This indicates that Engine
Size and Mass are moderately correlated with each other.
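With exactly two predictors, both VIFs equal 1 / (1 − r²), where r is the correlation between Engine Size and Mass. The reported VIF therefore implies |r| ≈ 0.64, which is consistent with the moderately strong positive correlation seen in the matrix plot. A quick check:

```python
import math

vif = 1.71  # VIF reported for Engine Size and Mass

# With two predictors, VIF = 1 / (1 - r^2), where r = corr(Engine Size, Mass)
r_squared = 1 - 1 / vif
r_abs = math.sqrt(r_squared)  # the VIF gives |r|; the matrix plot shows the sign is positive

print(f"r^2 ≈ {r_squared:.3f}, |r| ≈ {r_abs:.2f}")  # r^2 ≈ 0.415, |r| ≈ 0.64
```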
● P-value:
            ○ The p-value of the regression is ≈ 0, which is less than the significance level
                  (0.01). This means the model explains a significant part of the variation in the
                  response: at least one coefficient is different from 0.
            ○ The p-value for the estimated coefficient of Engine Size is less than the
                  significance level (0.01). This means Engine Size is significantly related to CO2
                  Emissions.
            ○ The p-value for the estimated coefficient of Mass is higher than the
                  significance level (0.01). This means Mass is not significantly related to CO2
                  Emissions at the 0.01 level.
            ○ The p-value for Lack-of-Fit is ≈ 0, which is less than the significance level. We
                  conclude that the model does not correctly specify the relationship between
                  the response and the predictors.
    ● The CO2 Emission regression yields R² = 21.5%. This means Engine Size and Mass together
        explain 21.5 percent of the variation in CO2 Emission, and the remaining 78.5 percent is
        not explained by these two predictors.
    ● This R² is low (21.50%), indicating that the model does not fit the data well and should
        be improved.
8. Test at 1% significance level whether the proportion of cars meeting the CO2
Data
       Year    Sample size    Cars meeting target    Sample proportion
       2016    n1 = 362       x1 = 56                p1 = 56/362 = 0.154696
       2002    n2 = 138       x2 = 0                 p2 = 0/138 = 0.000000
Hypotheses
To determine whether the proportion of cars meeting the CO2 emission target in 2016 is
greater than that in 2002, we set up the following hypotheses:
    ● Null Hypothesis: The population proportion of cars meeting the CO2 emission
        target in 2016 is less than or equal to that in 2002.
    ● H0: π1 ≤ π2
(where π1, π2 represent the population proportion of cars meeting the target in 2016 and
2002, respectively)
● H1: π1 > π2
Assumptions
For a test of two proportions, the criterion for normality is nπ ≥ 10 and n(1 − π) ≥ 10 for
each sample:
    ● n1p1 = 56 > 10
    ● n1(1 − p1) = 306 > 10
    ● n2p2 = 0 < 10 (does not meet the requirement for normality)
    ● n2(1 − p2) = 138 > 10
Because n2p2 = 0, the normal approximation is not reliable for the 2002 sample. The z-test
result from Minitab below should therefore be interpreted with caution.
Test statistic
● The p-value (≈ 0) is less than the significance level α = 0.01 → reject H0.
    ● Given the Z-value (z = 4.90), the decision rule is: if z > z0.01, we reject H0.
        At α = 0.01, the right-tail critical value is z0.01 = 2.326, which is less than z = 4.90,
        confirming that we should reject H0.
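The reported z = 4.90 is reproduced by the two-proportion z statistic when a pooled proportion is used under H0. This is an assumption about which standard error Minitab used, supported by the match. The sketch also computes the one-sided critical value z0.01.

```python
import math
from statistics import NormalDist

x1, n1 = 56, 362  # 2016: cars meeting the target, sample size
x2, n2 = 0, 138   # 2002: cars meeting the target, sample size

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0

# Two-proportion z statistic using the pooled standard error
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled

# Right-tail critical value for alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - 0.01)

print(f"z ≈ {z:.2f}, z_0.01 ≈ {z_crit:.3f}, reject H0: {z > z_crit}")  # z ≈ 4.90, z_0.01 ≈ 2.326, reject H0: True
```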
Confidence interval
To determine whether the proportion in 2016 is greater than that in 2002, we construct a 99%
confidence interval for the difference between these two proportions:
$$(p_1 - p_2) \pm z_{0.005}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} = (0.154696 - 0) \pm 2.576\sqrt{\frac{0.154696(1-0.154696)}{362} + \frac{0}{138}} = [0.10574,\ 0.20365]$$
The lower bound for the difference of 0.10574 is greater than zero, confirming that the
proportion of cars meeting the CO2 emission target in 2016 is higher than that
in 2002.
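The interval can be recomputed with the unpooled standard error. This sketch uses the exact z0.005 ≈ 2.5758 from `statistics.NormalDist` rather than the rounded 2.576, so the last digit may differ slightly from the bounds above.

```python
import math
from statistics import NormalDist

x1, n1 = 56, 362  # 2016
x2, n2 = 0, 138   # 2002
p1, p2 = x1 / n1, x2 / n2

z = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-sided 99% critical value, about 2.5758
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error

diff = p1 - p2
lower, upper = diff - z * se, diff + z * se
print(f"[{lower:.5f}, {upper:.5f}]")
```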
⇒ Conclusion: From 2002 to 2016, there has been a remarkable improvement in the
percentage of vehicles meeting the CO2 emission target. These five manufacturers have
made substantial progress in reducing CO2 emissions from their vehicles.