Name-Niyati Shah
UIN-633007415
STAT-654
email-niyatibhavikshah@tamu.edu
Assignment-1
  #1 .
          E(i)              =. Bl
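For the simple linear regression model,

$$y_i = \beta_0 + \beta_1 x_i + e_i$$

for the data set $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the residuals are $e_i = y_i - \hat{y}_i$ and the sum of squared residuals is

$$SSE = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2$$

To determine $\beta_0$ and $\beta_1$, we have to minimize SSE.

Setting $\partial SSE / \partial \beta_0 = 0$:

$$\sum 2\left[ y_i - (\beta_0 + \beta_1 x_i) \right](-1) = 0$$

$$-\sum y_i + \sum \beta_0 + \beta_1 \sum x_i = 0$$

With $\bar{y} = \frac{\sum y_i}{n}$ and $\bar{x} = \frac{\sum x_i}{n}$:

$$-n\bar{y} + n\beta_0 + n\beta_1 \bar{x} = 0$$

$$\therefore \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Now, setting $\partial SSE / \partial \beta_1 = 0$:

$$\sum 2\left[ y_i - (\beta_0 + \beta_1 x_i) \right](-x_i) = 0$$

Substituting $\beta_0 = \bar{y} - \beta_1 \bar{x}$:

$$\sum \left[ \bar{y} - \beta_1 \bar{x} + \beta_1 x_i - y_i \right] x_i = 0$$

$$\bar{y} \sum x_i - \beta_1 \bar{x} \sum x_i + \beta_1 \sum x_i^2 - \sum x_i y_i = 0$$

$$n\bar{x}\bar{y} - \beta_1 \left[ n\bar{x}^2 - \sum x_i^2 \right] - \sum x_i y_i = 0$$

$$\therefore \hat{\beta}_1 = \frac{n\bar{x}\bar{y} - \sum x_i y_i}{n\bar{x}^2 - \sum x_i^2} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$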
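As a quick numerical check of the closed-form estimates derived above, the sketch below compares them with the coefficients returned by `lm()`. The six data points are made up for illustration and are not part of the assignment.

```r
# Made-up illustrative data (an assumption, not from the assignment)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
n <- length(x)
xbar <- mean(x)
ybar <- mean(y)

# Slope from the two equivalent closed forms
b1_raw <- (sum(x * y) - n * xbar * ybar) / (sum(x^2) - n * xbar^2)
b1_centered <- sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)

# Intercept from b0 = ybar - b1 * xbar
b0 <- ybar - b1_centered * xbar

# Reference fit from R's least squares routine
fit <- lm(y ~ x)

print(c(b0 = b0, b1 = b1_centered))
print(coef(fit))
```

The two printed vectors should agree up to floating point error, which is what the normal equations imply.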
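Sample correlation coefficient and $R^2$ (coefficient of determination)

(1) Residuals: $e_i = y_i - \hat{y}_i$, and the error sum of squares is

$$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$

(2) Total sum of squares SST:

$$SST = \sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum \left[ (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y}) \right]^2$$

$$= \sum (y_i - \hat{y}_i)^2 + 2\sum (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) + \sum (\hat{y}_i - \bar{y})^2$$

Now,

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i = \bar{y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 x_i$$

$$\therefore \hat{y}_i - \bar{y} = \hat{\beta}_1 (x_i - \bar{x})$$

Also,

$$y_i - \hat{y}_i = y_i - \bar{y} + \hat{\beta}_1 \bar{x} - \hat{\beta}_1 x_i = (y_i - \bar{y}) + \hat{\beta}_1 (\bar{x} - x_i)$$

Therefore,

$$\sum (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = \sum \hat{\beta}_1 (x_i - \bar{x}) \left[ (y_i - \bar{y}) + \hat{\beta}_1 (\bar{x} - x_i) \right]$$

$$= \hat{\beta}_1 \sum (x_i - \bar{x})(y_i - \bar{y}) - \hat{\beta}_1^2 \sum (x_i - \bar{x})^2$$

Since the least squares slope is $\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$, we have $\sum (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1 \sum (x_i - \bar{x})^2$, so

$$= \hat{\beta}_1^2 \sum (x_i - \bar{x})^2 - \hat{\beta}_1^2 \sum (x_i - \bar{x})^2 = 0$$

Hence,

$$SST = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$$

$$\therefore SST = SSE + \sum (\hat{y}_i - \bar{y})^2 = SSE + SSR$$

where $SSR = \sum (\hat{y}_i - \bar{y})^2$ is the regression sum of squares.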
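The decomposition above can be checked numerically. The sketch below uses made-up data (an assumption, not the assignment data) to confirm that the cross term vanishes, that SST = SSE + SSR, and that SSR/SST matches both the R-squared reported by `lm()` and the squared sample correlation.

```r
# Made-up illustrative data (an assumption, not from the assignment)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(1.8, 4.1, 5.7, 8.4, 9.6, 12.5, 13.9, 16.2)

fit <- lm(y ~ x)
y_hat <- fitted(fit)
y_bar <- mean(y)

sse <- sum((y - y_hat)^2)               # error sum of squares
ssr <- sum((y_hat - y_bar)^2)           # regression sum of squares
sst <- sum((y - y_bar)^2)               # total sum of squares
cross <- sum((y - y_hat) * (y_hat - y_bar))  # cross term, should be ~0

r2 <- ssr / sst
print(c(SST = sst, SSE_plus_SSR = sse + ssr, cross = cross))
print(c(R2 = r2, lm_R2 = summary(fit)$r.squared, cor_sq = cor(x, y)^2))
```

For simple linear regression the three R-squared values printed on the last line should coincide up to rounding.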
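#2. Given that $\mu_X = 2$,

$$\mu_{Y|X}(x) = \beta_0 + \beta_1 (x - \mu_X) + \beta_2 (x - \mu_X)^2$$

$$= \beta_0 + \beta_1 (x - 2) + \beta_2 (x - 2)^2$$

$$= \beta_0 + \beta_1 x - 2\beta_1 + \beta_2 (x^2 - 4x + 4)$$

$$= \beta_0 + \beta_1 x - 2\beta_1 + \beta_2 x^2 - 4\beta_2 x + 4\beta_2$$

$$= (\beta_0 - 2\beta_1 + 4\beta_2) + (\beta_1 - 4\beta_2) x + \beta_2 x^2$$

Comparing the equation with $\mu_{Y|x} = -8.5 - 3.2x + 0.7x^2$:

$$\beta_2 = 0.7$$

$$\beta_1 - 4\beta_2 = -3.2$$

$$\therefore \beta_1 - 4(0.7) = -3.2$$

$$\therefore \beta_1 = -3.2 + 2.8 = -0.4$$

And

$$\beta_0 - 2\beta_1 + 4\beta_2 = -8.5$$

$$\therefore \beta_0 - 2(-0.4) + 4(0.7) = -8.5$$

$$\therefore \beta_0 = -8.5 - 0.8 - 2.8 = -12.1$$

Thus, the centered model is

$$\mu_{Y|X}(x) = -12.1 - 0.4(x - 2) + 0.7(x - 2)^2$$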
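A short numerical check of the centered model: the sketch below evaluates the original quadratic, -8.5 - 3.2x + 0.7x^2, and the centered form with coefficients -12.1, -0.4 and 0.7 on a grid of x values. The function names and the grid are chosen here for illustration only.

```r
# Original (uncentered) mean function
mu_original <- function(x) -8.5 - 3.2 * x + 0.7 * x^2

# Centered mean function with mu_X = 2
mu_centered <- function(x) -12.1 - 0.4 * (x - 2) + 0.7 * (x - 2)^2

# Illustrative grid of x values (an assumption, any grid would do)
x_grid <- seq(-3, 7, by = 0.5)

# Largest absolute difference between the two forms over the grid
max_gap <- max(abs(mu_original(x_grid) - mu_centered(x_grid)))
print(max_gap)
```

A gap at the level of floating point error indicates that the two parameterizations describe the same curve.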
                                                       *Help of ChatGPT was used in learning some part of the code.
Question D
R Code with Outputs
#Question D1
setwd("C:/Users/n-shah/OneDrive - Texas A&M University/Semester 3/STAT 654 Stats")
# Read the data from the text file
hc <- read.table("HardwoodTensileStr-1.txt", header=TRUE, sep="")
# View the data
head(hc)
Concentration Strength
1           1.0      6.3
2           1.5     11.1
3           2.0     20.0
4           3.0     24.0
5           4.0     26.1
6           4.5     30.0
# Center the predictor
hc$Concentration_centered <- hc$Concentration - mean(hc$Concentration)
y <- hc$Strength
# Fit a third order polynomial model
hc3 <- lm(Strength ~ poly(Concentration_centered, 3, raw=TRUE), data=hc)
#Summary of the model to get the adjusted R-squared and p-value
summary(hc3)
Call:
lm(formula = Strength ~ poly(Concentration_centered, 3, raw = TRUE),
    data = hc)
Residuals:
    Min      1Q     Median       3Q       Max
-4.6250 -1.6109     0.0413   1.5892    5.0216
Coefficients:
                                              Estimate Std. Error t value                    Pr(>|t|)
(Intercept)                                  44.975562   0.869032 51.754                      < 2e-16    ***
poly(Concentration_centered, 3, raw = TRUE)1 4.339394    0.350978 12.364                     2.87e-09    ***
poly(Concentration_centered, 3, raw = TRUE)2 -0.548873   0.039199 -14.002                    5.11e-10    ***
poly(Concentration_centered, 3, raw = TRUE)3 -0.055188   0.009789 -5.638                     4.72e-05    ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.585 on 15 degrees of freedom
Multiple R-squared: 0.9707, Adjusted R-squared: 0.9648
F-statistic: 165.4 on 3 and 15 DF, p-value: 1.025e-11
The adjusted R-squared is 0.9648 and the p-value of the overall F-test is 1.025e-11, which is far below the 0.01
significance level. Thus, the model is statistically significant and can be used to predict the tensile
strength.
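The adjusted R-squared quoted above can be reproduced from the other numbers in the summary. The sketch below applies the usual formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), with n inferred from the 15 residual degrees of freedom plus 4 estimated coefficients. The helper name `adjusted_r2` is introduced here for illustration.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# n = residual degrees of freedom (15) + estimated coefficients (4)
n_obs <- 15 + 4

# Values taken from the printed summary of the third order fit
adj_cubic <- adjusted_r2(r2 = 0.9707, n = n_obs, p = 3)
print(round(adj_cubic, 4))
```

Because the input R-squared is itself rounded to four decimals, agreement with the reported 0.9648 is expected only up to rounding.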
#Question D2
# Plot the data and the fitted model
plot(hc$Concentration_centered, y, main="Scatter Plot with Third Order Polynomial Fit", xlab =
"Concentration(centered)", ylab = "Strength", pch=19)
points(hc$Concentration_centered,fitted(hc3), col="red",pch=19)
# Adding the fitted curve
curve(predict(hc3,newdata = data.frame(Concentration_centered=x)), add = TRUE, col="blue",lwd=2)
As seen in the above graph, it appears that the third order polynomial fits the dataset well.
#Question D3
# Residuals vs Fitted values plot
plot(fitted(hc3),resid(hc3), xlab = "Fitted Values", ylab="Residuals", main = "Residuals vs Fitted Values")
abline(h=0, col="red", lty=2)
# Fit a fifth order polynomial model to this data
hc5 <- lm(Strength ~ poly(Concentration_centered, 5, raw=TRUE), data=hc)
#Summary of the model to get the adjusted R-squared and p-value
summary(hc5)
Call:
lm(formula = Strength ~ poly(Concentration_centered, 5, raw = TRUE),
    data = hc)
Residuals:
     Min       1Q   Median                  3Q          Max
-2.65167 -0.91159 -0.03811             0.96396      2.56865
Coefficients:
                                               Estimate Std. Error t value                               Pr(>|t|)
(Intercept)                                  43.6187788 0.7309210 59.676                                  < 2e-16     ***
poly(Concentration_centered, 5, raw = TRUE)1 5.3479308 0.3896655 13.724                                  4.11e-09     ***
poly(Concentration_centered, 5, raw = TRUE)2 -0.1378567 0.1059263 -1.301                                 0.215700
poly(Concentration_centered, 5, raw = TRUE)3 -0.1630817 0.0289147 -5.640                                 8.06e-05     ***
poly(Concentration_centered, 5, raw = TRUE)4 -0.0114448 0.0026525 -4.315                                 0.000840     ***
poly(Concentration_centered, 5, raw = TRUE)5 0.0021978 0.0005163     4.257                               0.000935     ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.703 on 13 degrees of freedom
Multiple R-squared: 0.989,    Adjusted R-squared: 0.9847
F-statistic: 233.1 on 5 and 13 DF, p-value: 3.022e-12
The adjusted R-squared is 0.9847, which is greater than that of the third order polynomial (0.9648). The p-value (3.022e-12) is
below 0.01 and lower than that of the third order polynomial (1.025e-11).
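Since the third order model is nested in the fifth order model, the comparison can also be made with a partial F-test. This test is not part of the original analysis; the sketch below is an approximation built only from the rounded residual standard errors and degrees of freedom printed in the two summaries. With the data loaded, `anova(hc3, hc5)` would give the exact version.

```r
# Residual standard errors and residual df from the printed summaries
rse3 <- 2.585; df3 <- 15   # third order fit
rse5 <- 1.703; df5 <- 13   # fifth order fit

# Residual sums of squares recovered from RSE^2 * df (rounded inputs)
rss3 <- rse3^2 * df3
rss5 <- rse5^2 * df5

# Partial F statistic for adding the 4th and 5th order terms
f_stat <- ((rss3 - rss5) / (df3 - df5)) / (rss5 / df5)
p_val <- pf(f_stat, df1 = df3 - df5, df2 = df5, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

A small p-value here would support the conclusion that the two extra terms improve the fit beyond what the adjusted R-squared comparison alone shows.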
Question D4
# p-values for the polynomial terms (excluding the intercept)
summary_hc5 <- summary(hc5)
p_values <- summary_hc5$coefficients[-1, "Pr(>|t|)"]
# Finding the index of the term with the largest p-value; p_values already excludes the
# intercept, so the index lines up with poly_terms without any offset
largest_p_value_index <- which.max(p_values)
# Generating the polynomial terms and splitting them into individual terms.
# fixed=TRUE makes " + " a literal separator; read as a regular expression it would only
# match runs of two or more spaces, so the string would stay in one piece and no term would be removed
poly_terms <- paste0("I(Concentration_centered^", 1:5, ")", collapse=" + ")
poly_terms <- strsplit(poly_terms, " + ", fixed=TRUE)[[1]]
# Removing the term with largest p-value
poly_terms <- poly_terms[-largest_p_value_index]
newf <- as.formula(paste("Strength ~ ", paste(poly_terms, collapse=" + ")))
# Fitting the model with the updated formula
hc5_reduced <- lm(newf, data=hc)
summary_hc5_reduced <- summary(hc5_reduced)
summary_hc5_reduced
Call:
lm(formula = newf, data = hc)
Residuals:
     Min       1Q   Median            3Q        Max
-2.65167 -0.91159 -0.03811       0.96396    2.56865
Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                 43.6187788 0.7309210 59.676 < 2e-16                 ***
I(Concentration_centered^1) 5.3479308 0.3896655 13.724 4.11e-09                 ***
I(Concentration_centered^2) -0.1378567 0.1059263 -1.301 0.215700
I(Concentration_centered^3) -0.1630817 0.0289147 -5.640 8.06e-05                ***
I(Concentration_centered^4) -0.0114448 0.0026525 -4.315 0.000840                ***
I(Concentration_centered^5) 0.0021978 0.0005163     4.257 0.000935              ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.703 on 13 degrees of freedom
Multiple R-squared: 0.989,    Adjusted R-squared: 0.9847
F-statistic: 233.1 on 5 and 13 DF, p-value: 3.022e-12
# Scatterplot of the data with the fitted curve from the final model
plot(hc$Concentration_centered, hc$Strength, main="Scatter Plot with Reduced Model Fit",
   xlab="Concentration (Centered)", ylab="Strength", pch=19)
lines(sort(hc$Concentration_centered), predict(hc5_reduced)[order(hc$Concentration_centered)],
col="blue", lwd=2)
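The summary above is identical to that of the full fifth order model: it still lists all five polynomial terms,
including I(Concentration_centered^2) (p-value 0.2157), so the term with the highest p-value was not actually
omitted in this fit. The adjusted R-squared is 0.9847, which is the same as before.
The fifth order polynomial fits the dataset better than the third order polynomial, both visually in the scatter
plot and statistically (adjusted R-squared 0.9847 versus 0.9648).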
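To illustrate how a single polynomial term can be dropped by its p-value, here is a minimal sketch on simulated data. The data frame `sim`, its columns and the true coefficients are all made up for illustration. It also shows why `strsplit()` needs `fixed = TRUE` when the separator is `" + "`.

```r
set.seed(1)

# Simulated data (an assumption, not the hardwood data)
sim <- data.frame(x = seq(-3, 3, length.out = 25))
sim$y <- 2 + 1.5 * sim$x - 0.4 * sim$x^3 + rnorm(25, sd = 0.5)

# Build the term list as a character vector, one entry per power
terms_all <- paste0("I(x^", 1:5, ")")
full <- lm(as.formula(paste("y ~", paste(terms_all, collapse = " + "))), data = sim)

# p-values without the intercept line up one-to-one with terms_all
p_vals <- summary(full)$coefficients[-1, "Pr(>|t|)"]
drop_idx <- which.max(p_vals)

# Refit without the least significant term
reduced <- lm(as.formula(paste("y ~", paste(terms_all[-drop_idx], collapse = " + "))),
              data = sim)
print(names(coef(reduced)))

# strsplit pitfall: " + " as a regex means "two or more spaces"
joined <- paste(terms_all, collapse = " + ")
n_regex <- length(strsplit(joined, " + ")[[1]])                # 1: nothing is split
n_fixed <- length(strsplit(joined, " + ", fixed = TRUE)[[1]])  # 5: split on literal " + "
print(c(regex = n_regex, fixed = n_fixed))
```

Keeping the terms in a vector from the start avoids the split step entirely, which is the design choice made in this sketch.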
Question E
Question E1
setwd("C:/Users/n-shah/OneDrive - Texas A&M University/Semester 3/STAT 654 Stats")
data <- read.table("TreeAgeDiamSugarMaple-1.txt", header = TRUE, sep = "")
x=data$Diamet
y=data$Age
lm1=lm(y~poly(x,1,raw=TRUE), data=data)
lm2=lm(y~poly(x,2,raw=TRUE), data=data)
lm3=lm(y~poly(x,3,raw=TRUE), data=data)
lm4=lm(y~poly(x,4,raw=TRUE), data=data)
lm5=lm(y~poly(x,5,raw=TRUE), data=data)
lm6=lm(y~poly(x,6,raw=TRUE), data=data)
lm7=lm(y~poly(x,7,raw=TRUE), data=data)
lm8=lm(y~poly(x,8,raw=TRUE), data=data)
AIC <- c(AIC(lm1), AIC(lm2), AIC(lm3), AIC(lm4), AIC(lm5), AIC(lm6), AIC(lm7), AIC(lm8))
BIC <- c(BIC(lm1), BIC(lm2), BIC(lm3), BIC(lm4), BIC(lm5), BIC(lm6), BIC(lm7), BIC(lm8))
AIC
BIC
AIC
[1]   239.5899 230.0744 231.8443 233.4604 235.2434 237.2078
[7]   234.6909 234.6980
BIC
[1]   243.4774 235.2577 238.3235 241.2355 244.3142 247.5745
[7]   246.3534 247.6563
which.min(AIC)
[1] 2
which.min(BIC)
[1] 2
Both AIC (230.0744) and BIC (235.2577) are smallest for the second order polynomial. Hence, the second order
model is used for further analysis.
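As a consistency check on the values above, AIC and BIC for the same model differ only in the penalty: BIC - AIC = k(log(n) - 2), where k counts the estimated parameters. The sketch below assumes n = 27 (from the 24 residual degrees of freedom of the second order fit) and k = degree + 2 (polynomial coefficients, intercept and error variance), and reuses the printed values.

```r
# AIC and BIC values copied from the output above (degrees 1 to 8)
aic <- c(239.5899, 230.0744, 231.8443, 233.4604, 235.2434, 237.2078, 234.6909, 234.6980)
bic <- c(243.4774, 235.2577, 238.3235, 241.2355, 244.3142, 247.5745, 246.3534, 247.6563)

n <- 27            # observations, inferred from 24 residual df + 3 coefficients
k <- (1:8) + 2     # degree + intercept + error variance

# Penalty difference implied by the definitions of AIC and BIC
expected_gap <- k * (log(n) - 2)

print(round(cbind(degree = 1:8, gap = bic - aic, expected = expected_gap), 4))
print(c(best_aic = which.min(aic), best_bic = which.min(bic)))
```

The heavier BIC penalty grows with the degree, which is why BIC separates the second order model from its competitors more sharply than AIC does.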
Question E2
summary(lm2)
Call:
lm(formula = y ~ poly(x, 2, raw = TRUE), data = data)
Residuals:
    Min      1Q    Median        3Q      Max
-25.451 -10.027    -1.046     8.201   31.469
Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
(Intercept)              6.4693224 5.7627144    1.123 0.27271
poly(x, 2, raw = TRUE)1 0.4545286 0.0658670     6.901 3.89e-07 ***
poly(x, 2, raw = TRUE)2 -0.0004106 0.0001149 -3.573 0.00154 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.68 on 24 degrees of freedom
Multiple R-squared: 0.9072, Adjusted R-squared: 0.8995
F-statistic: 117.4 on 2 and 24 DF, p-value: 4.062e-13
plot(lm2$fitted,lm2$res,main="",xlab="fitted",ylab="residuals",pch=19)
abline(h=mean(lm2$res),col="red")
The p-value of the overall F-test is 4.062e-13.
Question E4
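# lm2 was fit as y ~ poly(x, 2, raw=TRUE), so newdata must supply the predictor under the name x
# (assuming 110 is the diameter at which the age is to be predicted). With a column named Age
# instead, predict() warns and falls back to the 27 training values of x, giving one row per observation.
new <- data.frame(x=110)
prediction <- predict(lm2, newdata=new, interval="prediction", level=0.95)
prediction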
          fit        lwr       upr
1    7.765916 -26.614207  42.14604
2    8.411693 -25.920323  42.74371
3   10.334476 -23.860730  44.52968
4   12.880690 -21.148054  46.90943
5   13.508629 -20.481620  47.49888
6   14.139317 -19.813281  48.09191
7   15.395654 -18.484949  49.27626
8   29.935475  -3.391954  63.26290
9   32.262878  -1.021172  65.54693
10  43.505908  10.277105  76.73471
11  47.816181  14.548155  81.08421
12  48.877911  15.595732  82.16009
13  49.929246  16.631410  83.22708
14  50.977566  17.662541  84.29259
15  53.054053  19.700554  86.40755
16  74.100469  40.126265 108.07467
17  74.541587  40.551983 108.53119
18  95.148398  60.531815 129.76498
19  95.500596  60.876930 130.12426
20 108.199612  73.486420 142.91280
21 122.220881  87.938708 156.50305
22 130.676961  96.543286 164.81064
23 132.241131  96.559324 167.92294
24 132.058646  97.289721 166.82757
25 132.083716  97.279233 166.88820
26 132.233644  97.063349 167.40394
27 132.198768  97.159801 167.23773
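A table with one row per observation, like the one above, is what `predict()` returns when `newdata` does not contain the predictor used in the formula. The sketch below shows a single-point prediction interval on simulated data. The data frame `trees`, its columns `diam` and `age`, and the generating coefficients are hypothetical and chosen only to resemble the setting.

```r
set.seed(42)

# Simulated tree data (an assumption, not the sugar maple data)
trees <- data.frame(diam = runif(30, min = 10, max = 500))
trees$age <- 6 + 0.45 * trees$diam - 0.0004 * trees$diam^2 + rnorm(30, sd = 15)

# Second order polynomial fit with the predictor referenced by column name
fit <- lm(age ~ poly(diam, 2, raw = TRUE), data = trees)

# newdata must use the same column name as the predictor in the formula
new_tree <- data.frame(diam = 110)

# 95% prediction interval for one new tree
pred <- predict(fit, newdata = new_tree, interval = "prediction", level = 0.95)
print(pred)
```

Fitting with `data = trees` and column names in the formula, rather than free-standing vectors, means a mismatched `newdata` raises an error instead of silently returning predictions for the training rows.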