
Da Lab Exp 7,8,9,10,11,12

The document outlines a series of experiments for a Data Analytics Lab course, focusing on implementing various statistical and machine learning techniques using R. Key experiments include ARIMA for time series forecasting, hierarchical clustering, data visualization methods, and descriptive analytics on healthcare data. Each experiment provides a detailed aim, description, program code, and expected output to guide students in practical applications of data analytics.

Uploaded by

saipawan185063


DATA ANALYTICS LAB

(Academic Year: 2024-25) B.Tech III Year – II Semester (R22)


Experiment 7 Date :
-------------------------------------------------------------------------------------------------------------------------------
Write an R program to implement ARIMA on time series data

Aim: To implement ARIMA on time series data using R

Description :
ARIMA (Autoregressive Integrated Moving Average) is a statistical model used for time series
analysis and forecasting, predicting future values by combining past observations (AR),
differencing to achieve stationarity (I), and past errors to refine predictions (MA).

ARIMA models explain a given time series based on its own past values (lags) and lagged
forecast errors.
Components:
Autoregressive (AR): This part of the model uses past values of the time series to predict future
values.
Integrated (I): This component addresses non-stationarity by differencing the time series data,
making it stationary (i.e., having a constant mean and variance over time).

Moving Average (MA): This part incorporates past forecast errors to improve the accuracy of
future predictions.

Notation:
A non-seasonal ARIMA model is often represented as ARIMA(p, d, q), where:
p is the order of the autoregressive (AR) part.
d is the order of integration (the number of times the data needs to be differenced).
q is the order of the moving average (MA) part.
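In equation form (one common textbook notation; sign conventions for the MA coefficients differ between sources, and R's arima() uses a plus sign), an ARIMA(p, d, q) model fits an ARMA(p, q) model to the series after it has been differenced d times:

```latex
y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p}
     + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t,
\qquad y'_t = (1 - B)^d \, y_t
```

Here B is the backshift operator (B y_t = y_{t-1}), the φ terms are the AR coefficients, the θ terms are the MA coefficients and ε_t is white-noise error. For d = 1 the differenced series is simply y'_t = y_t − y_{t−1}.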

To build an ARIMA model:

Data Preparation: Collect and prepare the time series data.

Stationarity Check: Ensure the data is stationary or make it stationary through differencing.

Model Identification: Determine the appropriate values for p, d, and q using techniques like
autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.

1 Machine Learning Lab | VIGNAN INSTITUTE OF TECHNOLOGY AND SCIENCE

Parameter Estimation: Estimate the model parameters using techniques like maximum
likelihood estimation.

Model Evaluation: Evaluate the model's performance using metrics like root mean squared
error (RMSE) or mean absolute error (MAE).
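The estimation and evaluation steps above can be sketched with base R alone. This is a minimal sketch under stated assumptions: it holds out the last 12 months of AirPassengers, fits the classic "airline" specification ARIMA(0,1,1)(0,1,1)[12] to the log series by maximum likelihood (an illustrative choice, not necessarily the model auto.arima() selects in the program below), and scores the forecast with RMSE and MAE.

```r
# Hold out the final year (1960) as a test set
train <- window(AirPassengers, end = c(1959, 12))
test  <- window(AirPassengers, start = c(1960, 1))

# Parameter estimation by maximum likelihood (assumed airline model on log scale)
fit <- arima(log(train), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12),
             method = "ML")

# Forecast 12 months ahead; exp() undoes the log (giving a median-type forecast)
pred <- exp(predict(fit, n.ahead = length(test))$pred)

# Model evaluation on the held-out months
errors <- as.numeric(test) - as.numeric(pred)
rmse <- sqrt(mean(errors^2))
mae  <- mean(abs(errors))
cat("RMSE:", round(rmse, 2), " MAE:", round(mae, 2), "\n")
```

RMSE penalises large misses more heavily than MAE, so RMSE is always at least as large as MAE on the same errors.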

Steps involved in ARIMA Model :

1. Load and Prepare the Time Series Data
For demonstration, we use the built-in AirPassengers dataset.

2. Check for Stationarity

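ARIMA requires a stationary series, meaning that statistical properties such as the mean and variance should be constant over time. The program checks this with the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series is non-stationary (has a unit root).

If p-value > 0.05, the null hypothesis cannot be rejected: treat the data as non-stationary and apply differencing.

If p-value ≤ 0.05, the null hypothesis is rejected: the data can be treated as stationary.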

3. Apply Differencing (If Necessary)

If the time series is non-stationary, differencing is required.

4. Identify ARIMA Parameters (p, d, q)

Determine ARIMA parameters manually using ACF (AutoCorrelation Function) and PACF (Partial
AutoCorrelation Function) plots.
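Steps 3 and 4 can be tried directly on AirPassengers. This sketch assumes a log transform (to steady the growing variance) followed by first-order and seasonal (lag-12) differencing, then draws the ACF and PACF of the result. As a rough Box-Jenkins guide, a sharp cut-off in the PACF after lag p suggests an AR(p) term, and a sharp cut-off in the ACF after lag q suggests an MA(q) term. The ADF call is optional and only runs if tseries is installed.

```r
x   <- log(AirPassengers)            # log transform stabilises the variance
d1  <- diff(x, differences = 1)       # first difference: y_t - y_{t-1}
d12 <- diff(d1, lag = 12)             # seasonal difference: removes the yearly pattern

cat("Lengths:", length(x), length(d1), length(d12), "\n")

# ACF and PACF of the differenced series, side by side
op <- par(mfrow = c(1, 2))
acf(d12, lag.max = 36, main = "ACF of differenced series")
pacf(d12, lag.max = 36, main = "PACF of differenced series")
par(op)

# Optional: re-check stationarity with the same ADF test used in the program
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(d12))
}
```

Each difference shortens the series: one value is lost to the first difference and twelve more to the seasonal difference.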

Applications:
ARIMA models are widely used for various time series forecasting tasks, including:
Predicting stock prices.
Forecasting sales and demand.
Analyzing financial data.
Understanding and predicting trends in various datasets.

Program:

# Install required packages if not already installed

if (!require(forecast)) install.packages("forecast", dependencies = TRUE)
if (!require(tseries)) install.packages("tseries", dependencies = TRUE)

# Load necessary libraries

library(forecast)
library(tseries)
# Load a sample time series dataset (AirPassengers dataset)
data(AirPassengers)
ts_data <- ts(AirPassengers, start = c(1949, 1), frequency = 12)

# Open a new plot window

dev.new()

# Plot the original time series data

plot(ts_data, main = "AirPassengers Time Series", ylab = "Passengers", xlab = "Year", col =
"blue")

# Check stationarity using Augmented Dickey-Fuller (ADF) test

adf_test <- adf.test(ts_data)
print(adf_test)

# Open a new plot window for ACF

dev.new()
acf(ts_data, main = "ACF Plot")

# Open a new plot window for PACF

dev.new()
pacf(ts_data, main = "PACF Plot")

# If the series is non-stationary, apply first-order differencing

if (adf_test$p.value > 0.05) {
ts_data_diff <- diff(ts_data, differences = 1) # Keep the original ts_data unchanged
print("Differencing applied to make the series stationary.")
} else {
ts_data_diff <- ts_data
}

# Re-check stationarity after differencing

adf_test_diff <- adf.test(ts_data_diff) # adf.test() has no na.action argument; diff() adds no NAs
print(adf_test_diff)

# Determine the best ARIMA model automatically

best_model <- auto.arima(ts_data) # Fit on the original series: auto.arima() chooses the differencing orders itself

# Print model summary

summary(best_model)

# Forecast for the next 12 months

forecast_values <- forecast(best_model, h = 12)
# Open a new plot window for forecast
dev.new()
plot(forecast_values, main = "ARIMA Forecast", col = "blue")

# Print forecasted values

print(forecast_values)

# Check residuals to validate the model

checkresiduals(best_model)

Output :

Augmented Dickey-Fuller Test

data: ts_data
Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
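The program's last step calls checkresiduals() from the forecast package, which plots the residuals and reports a Ljung-Box test. A comparable base-R check is sketched below, assuming the same illustrative airline model on log(AirPassengers) rather than the program's auto.arima() fit. The fitdf argument subtracts the number of estimated ARMA coefficients (here one non-seasonal and one seasonal MA term) from the degrees of freedom.

```r
# Illustrative model (assumption): ARIMA(0,1,1)(0,1,1)[12] on the log series
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
res <- residuals(fit)

# Ljung-Box test on 24 lags; df = 24 - 2 estimated MA coefficients
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 2)
print(lb)

# A p-value above 0.05 gives no evidence of autocorrelation left in the residuals
```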

EXPERIMENT 8 Date :
----------------------------------------------------------------------------------------------

Write an R program for object segmentation using hierarchical methods

AIM: To implement hierarchical clustering for object segmentation

Description : Hierarchical clustering is a technique that groups similar data points together based on their similarity, creating a hierarchy or tree-like structure.

A dendrogram is like a family tree for clusters. It shows how individual data points or
groups of data merge together.

Types of Hierarchical Clustering

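1. Agglomerative clustering (bottom-up): every data point starts as its own cluster, and the closest clusters are merged step by step.
2. Divisive clustering (top-down): all data points start in one cluster, which is split recursively.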

Workflow for Hierarchical Agglomerative clustering

1. Start with individual points
2. Calculate distances between clusters
3. Merge the closest (smallest distance) clusters
4. Update distance matrix
5. Repeat steps 3 and 4 until only one cluster is left.
6. Create a dendrogram
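The workflow above can be made concrete with a deliberately naive sketch of agglomerative clustering using average linkage (the method the program passes to hclust()). The helper name naive_average_linkage is hypothetical, the loops are slow and only meant for a handful of points, and the data are random 2-D points rather than mtcars. The last line checks that its merge heights agree with hclust().

```r
naive_average_linkage <- function(x) {
  d <- as.matrix(dist(x, method = "euclidean"))  # step 2: pairwise distances
  clusters <- as.list(seq_len(nrow(d)))            # step 1: each point is its own cluster
  heights <- numeric(0)
  while (length(clusters) > 1) {                   # step 5: repeat until one cluster is left
    best_d <- Inf
    best <- c(NA_integer_, NA_integer_)
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        # average linkage: mean distance over all pairs across the two clusters
        dij <- mean(d[clusters[[i]], clusters[[j]]])
        if (dij < best_d) {
          best_d <- dij
          best <- c(i, j)
        }
      }
    }
    # steps 3-4: merge the closest pair; cluster distances are recomputed on the next pass
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
    heights <- c(heights, best_d)                  # merge height, as drawn in the dendrogram
  }
  heights
}

set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)                 # 10 random points in 2-D
naive_h <- naive_average_linkage(pts)
hc <- hclust(dist(pts), method = "average")
all.equal(sort(naive_h), sort(hc$height))
```

hclust() reaches the same heights far more efficiently by updating the distance matrix in place instead of recomputing every cluster average.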


Program :
# Finding distance matrix
distance_mat <- dist(mtcars, method = 'euclidean')
distance_mat

# Fitting hierarchical clustering model (average linkage)
set.seed(240) # Setting seed (not strictly needed: hclust() is deterministic)
Hierar_cl <- hclust(distance_mat, method = "average")
Hierar_cl

# Plotting dendrogram
plot(Hierar_cl)

# Choosing no. of clusters

# Mark a candidate cut height on the dendrogram
abline(h = 110, col = "green")

# Cutting tree by no. of clusters

fit <- cutree(Hierar_cl, k = 3 )
fit

table(fit)
rect.hclust(Hierar_cl, k = 3, border = "green")

OUTPUT:

Distance matrix:

• The values are the pairwise Euclidean distances between the 32 cars in mtcars.

Model Hierar_cl:

• In the model, the clustering method is average linkage, the distance is Euclidean and the number of objects is 32.

• The tree is cut into k = 3 clusters; fit gives the cluster number assigned to each car, and table(fit) shows how many cars fall into each cluster.
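One caution about the program: mtcars columns are on very different scales (disp and hp span hundreds of units while wt and drat span only a few), so unscaled Euclidean distances are dominated by disp and hp. A common variation, sketched below as an option rather than the lab's prescribed method, standardises each column with scale() first, and uses the cophenetic correlation to compare how faithfully different linkage methods preserve the original distances (closer to 1 is better).

```r
data(mtcars)
d_scaled <- dist(scale(mtcars), method = "euclidean")  # each column: mean 0, sd 1

hc_scaled  <- hclust(d_scaled, method = "average")
fit_scaled <- cutree(hc_scaled, k = 3)
table(fit_scaled)

# Cophenetic correlation for three linkage methods on the same distances
cc <- sapply(c("single", "complete", "average"), function(m) {
  cor(cophenetic(hclust(d_scaled, method = m)), d_scaled)
})
round(cc, 3)
```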


EXPERIMENT 9 Date :
----------------------------------------------------------------------------------------------
Write an R program to perform visualization techniques (types of charts: bar, column, line, scatter, 3D cubes, etc.)

AIM: To implement data visualization techniques (bar, column, line, scatter, 3D cubes, etc.)

Consider the following airquality data set for visualization in R:

Ozone  Solar.R  Wind  Temp  Month  Day
   41      190   7.4    67      5    1
   36      118   8.0    72      5    2
   12      149  12.6    74      5    3
   18      313  11.5    62      5    4
   NA       NA  14.3    56      5    5
   28       NA  14.9    66      5    6

a) AIM: To implement Bar Graph using R

PROGRAM:

# Horizontal bar plot for
# ozone concentration in air
barplot(airquality$Ozone,
        main = 'Ozone Concentration in Air',
        xlab = 'Ozone levels', horiz = TRUE)

OUTPUT:
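The aim also lists column charts, which are vertical bar charts. A minimal sketch, assuming that a per-month summary is a reasonable thing to show: it averages the airquality Temp column (daily maximum temperature at La Guardia Airport, in Fahrenheit) for each month and draws one column per month. Month is coded 5 (May) to 9 (September).

```r
data(airquality)

# Mean daily maximum temperature for each month
monthly_temp <- tapply(airquality$Temp, airquality$Month, mean)

barplot(monthly_temp,
        names.arg = month.abb[as.integer(names(monthly_temp))],  # "May" ... "Sep"
        main = "Mean Daily Maximum Temperature by Month",
        xlab = "Month", ylab = "Temperature (Fahrenheit)",
        col = "steelblue")
```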

b) AIM: To implement a histogram

PROGRAM:

# Histogram for maximum daily temperature

data(airquality)

hist(airquality$Temp, main = "La Guardia Airport's\nMaximum Temperature (Daily)",
     xlab = "Temperature (Fahrenheit)",
     xlim = c(50, 125), col = "yellow",
     freq = TRUE)

OUTPUT:


c) AIM: To implement a scatter plot

Program:

# Scatter plot for ozone concentration per month

data(airquality)

plot(airquality$Ozone, airquality$Month,
     main = "Scatterplot Example",
     xlab = "Ozone Concentration in parts per billion",
     ylab = "Month of observation", pch = 19)


d) AIM: To implement a line graph

PROGRAM:

# Create the data for the chart.

v <- c(17, 25, 38, 13, 41)
t <- c(22, 19, 36, 19, 23)
m <- c(25, 14, 16, 34, 29)
# Plot the line chart: one line per series, type "o" draws points joined by lines
plot(v, type = "o", col = "red",
     xlab = "Month", ylab = "Articles Written",
     main = "Articles Written Chart")
lines(t, type = "o", col = "blue")
lines(m, type = "o", col = "green")
OUTPUT:


e) AIM: To implement a 3D plot

PROGRAM:

# Install (only if missing) and load the rgl package

if (!require(rgl)) install.packages("rgl")
library(rgl)
# Generate some sample data
x <- seq(-5, 6, by = 0.1)
y <- seq(-5, 7, by = 0.1)
z <- outer(x, y, function(x, y) dnorm(sqrt(x^2 + y^2)))
# Create a 3D surface plot
persp3d(x, y, z, col = "blue")

# add animation
play3d(spin3d(axis = c(0, 0, 1)), duration = 10)
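If the rgl package cannot be installed (it needs an OpenGL-capable display), base R's persp() draws a static version of the same surface. This is a sketch, with viewing angles and colours chosen arbitrarily: theta sets the azimuth and phi the colatitude of the viewpoint, in degrees.

```r
# Same surface as above: a normal density evaluated on the distance from the origin
x <- seq(-5, 6, by = 0.1)
y <- seq(-5, 7, by = 0.1)
z <- outer(x, y, function(x, y) dnorm(sqrt(x^2 + y^2)))

# Static 3D surface with base graphics (no rgl needed)
pm <- persp(x, y, z,
            theta = 30, phi = 25,            # viewpoint angles in degrees
            col = "lightblue", border = NA, shade = 0.5,
            xlab = "x", ylab = "y", zlab = "density",
            main = "3D Surface (base R persp)")
```

persp() invisibly returns a 4 x 4 viewing transformation matrix, which trans3d() can use to add points or lines to the same plot.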


Experiment 10 Date :
-------------------------------------------------------------------------------------------------------------------------------
Write an R program to perform descriptive analytics on healthcare data

AIM: To implement Descriptive Analytics on healthcare data


Description : Descriptive analytics is the process of summarizing and interpreting historical data to understand what has happened in the past.

Goals of Descriptive Analytics in Healthcare :

• Understand patient demographics (e.g., age and gender distribution)
• Analyze clinical metrics such as BMI (body mass index), blood pressure and cholesterol
• Identify the prevalence of conditions such as diabetes and hypertension
• Spot trends over time (e.g., increasing obesity rates)
• Evaluate resource use (e.g., hospital admissions, medication use)
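The first three goals can be computed with a few lines of base R. A minimal sketch, assuming a simulated dataset shaped like the sample data created in the program below (the values are random, not real patients). BMI is grouped with the WHO adult cut-offs: underweight below 18.5, normal 18.5 to under 25, overweight 25 to under 30, obese 30 and above.

```r
set.seed(123)
n <- 100
health <- data.frame(
  Age      = sample(20:80, n, replace = TRUE),
  Gender   = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  BMI      = round(runif(n, 18, 35), 1),
  Diabetes = factor(sample(c("Yes", "No"), n, replace = TRUE))
)

# Demographics: mean age and BMI by gender
aggregate(cbind(Age, BMI) ~ Gender, data = health, FUN = mean)

# Clinical metric: WHO adult BMI categories (intervals closed on the left)
health$BMI_class <- cut(health$BMI,
                        breaks = c(-Inf, 18.5, 25, 30, Inf), right = FALSE,
                        labels = c("Underweight", "Normal", "Overweight", "Obese"))
bmi_pct <- 100 * prop.table(table(health$BMI_class))
round(bmi_pct, 1)

# Prevalence: percentage with diabetes within each gender (row percentages)
prev <- 100 * prop.table(table(health$Gender, health$Diabetes), margin = 1)
round(prev, 1)
```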

Program :
# Install required packages if not already installed
if (!require("dplyr")) install.packages("dplyr", dependencies = TRUE)
if (!require("summarytools")) install.packages("summarytools", dependencies = TRUE)
# Load the libraries
library(dplyr)
library(summarytools)

# Try to load the data from CSV, if not found, create a sample dataset
file_path <- "health_data.csv"

if (!file.exists(file_path)) {
message("File not found. Creating sample dataset...")
set.seed(123)
health_data <- data.frame(
Age = sample(20:80, 100, replace = TRUE),
Gender = factor(sample(c("Male", "Female"), 100, replace = TRUE)),
BMI = round(runif(100, 18, 35), 1),
BloodPressure = sample(90:180, 100, replace = TRUE),
Cholesterol = sample(150:300, 100, replace = TRUE),
Diabetes = factor(sample(c("Yes", "No"), 100, replace = TRUE))
)
write.csv(health_data, file_path, row.names = FALSE)
} else {
health_data <- read.csv(file_path, stringsAsFactors = TRUE)
}

# View structure
str(health_data)

# Summary statistics for numeric variables

numeric_vars <- select(health_data, where(is.numeric))
summary(numeric_vars)

# Frequency tables for categorical variables

cat_vars <- select(health_data, where(is.factor))
lapply(cat_vars, table)

# Cross-tabulation: Gender vs Diabetes

print(table(health_data$Gender, health_data$Diabetes))

# Descriptive report using summarytools

print(dfSummary(health_data), method = "browser")

dev.new()
# 1. Plot healthcare attributes (scatterplot matrix of every pair of columns)
plot(health_data, col = "magenta")

dev.new()
# Plots
# 2. Age Distribution
hist(health_data$Age, main =" Age Distribution ",
xlab = "Age (in years)",
xlim = c(0, 125), col = "green",
freq = TRUE)

Output :

'data.frame': 100 obs. of 6 variables:

$ Age : int 50 34 70 33 22 61 69 73 62 56 ...

$ Gender : Factor w/ 2 levels "Female","Male": 1 1 2 2 2 1 1 1 1 2 ...
$ BMI : num 24.8 33 24.2 22.9 20.9 20.9 26.2 22.3 21.7 29.5 ...
$ BloodPressure: int 170 118 115 116 174 96 149 115 130 173 ...
$ Cholesterol : int 264 270 231 245 251 245 175 290 297 297 ...
$ Diabetes : Factor w/ 2 levels "No","Yes": 1 2 1 2 2 1 2 1 1 1 ...

> summary(numeric_vars)

      Age             BMI         BloodPressure    Cholesterol
 Min.   :22.00   Min.   :18.10   Min.   : 91.0   Min.   :152.0
 1st Qu.:34.00   1st Qu.:21.80   1st Qu.:115.8   1st Qu.:193.5
 Median :47.50   Median :26.05   Median :133.5   Median :243.0
 Mean   :49.19   Mean   :26.26   Mean   :135.8   Mean   :230.8
 3rd Qu.:62.00   3rd Qu.:30.57   3rd Qu.:156.0   3rd Qu.:268.0
 Max.   :79.00   Max.   :34.70   Max.   :180.0   Max.   :297.0

$Gender
Female Male
55 45

$Diabetes
No Yes
52 48

> # Cross-tabulation: Gender vs Diabetes

No Yes
Female 32 23
Male 20 25

Graph of Healthcare Attributes


EXPERIMENT 11 Date :
----------------------------------------------------------------------------------------------
Write an R program to perform predictive analytics on product sales data

AIM: To perform predictive analytics on product sales data

Description : Predictive analytics uses historical data and statistical modeling to forecast future outcomes, determining the likelihood of specific events or trends.

Key characteristics of predictive analytics :

1. Focus on the Future: Predictive analytics is not about understanding past events, but
about anticipating future trends and outcomes.

2. Data-Driven: It relies heavily on data, both current and historical, to identify patterns
and relationships that can be used to make predictions.

3. Statistical Modeling: Techniques like regression analysis, time series analysis, and
machine learning algorithms are used to build models that can predict future outcomes.

4. Decision Support: The predictions generated by predictive analytics can be used to make
informed decisions, such as optimizing business processes, identifying risks, or forecasting
demand.

Example:
Sales Forecasting: Predicting future sales based on historical sales data, market trends, and
promotional activities.
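A self-contained sketch of this sales-forecasting workflow, assuming simulated data: the column names TV, Radio and Newspaper follow the output shown later in this experiment, but the numbers and the "true" coefficients used to generate them are invented for illustration (Newspaper is given no real effect). The last step predicts sales for one hypothetical new advertising budget, with a 95% prediction interval.

```r
set.seed(1)
n <- 200
ads <- data.frame(
  TV        = runif(n, 0, 300),
  Radio     = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
# Hypothetical data-generating rule, used only to create demo data
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)

# 80/20 train/test split
idx   <- sample(seq_len(n), size = 0.8 * n)
train <- ads[idx, ]
test  <- ads[-idx, ]

fit  <- lm(Sales ~ TV + Radio + Newspaper, data = train)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$Sales - pred)^2))
mae  <- mean(abs(test$Sales - pred))
cat("Test RMSE:", round(rmse, 3), " MAE:", round(mae, 3), "\n")

# Forecast for a hypothetical budget, with a 95% prediction interval
new_budget <- data.frame(TV = 150, Radio = 25, Newspaper = 20)
pi_new <- predict(fit, newdata = new_budget, interval = "prediction", level = 0.95)
pi_new
```

A prediction interval is wider than a confidence interval for the mean because it also covers the noise in an individual sale.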

Program :
# Load the data (columns, as in the output below: TV, Radio, Newspaper, Sales)
sales_data <- read.csv("product_sales.csv")
head(sales_data)

dev.new()
plot(sales_data, col = "brown")

# Split data into training (80%) and testing (20%)

set.seed(123)
sample_index <- sample(1:nrow(sales_data), 0.8 * nrow(sales_data))
train_data <- sales_data[sample_index, ]
test_data <- sales_data[-sample_index, ]

# Build a linear regression model of Sales on the three advertising budgets
model <- lm(Sales ~ TV + Radio + Newspaper, data = train_data)

# Predict on test data

predicted_sales <- predict(model, newdata = test_data)

# Calculate RMSE
rmse <- sqrt(mean((test_data$Sales - predicted_sales)^2))
cat("Linear Regression RMSE:", rmse, "\n")
summary(model)

dev.new()

# -------------------------
# Plot: Actual vs Predicted
# -------------------------
plot(test_data$Sales, predicted_sales,
main = "Actual vs Predicted Sales",
xlab = "Actual Sales",
ylab = "Predicted Sales",
col = "blue",
pch = 19)
abline(a = 0, b = 1, col = "red", lwd = 2) # Reference line

OUTPUT:

TV Radio Newspaper Sales

1 86.3 30.0 23.9 17.11161
2 236.5 16.6 96.2 20.51756
3 122.7 24.4 60.1 16.92830
4 264.9 47.7 51.5 27.42344
5 282.1 24.1 40.3 22.14082
6 13.7 44.5 88.0 18.23741


Sales Data

Linear Regression RMSE: 1.609265

summary(model)

Residuals:
Min 1Q Median 3Q Max
-2.4771 -0.9458 -0.1164 0.8441 4.2106

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.316178 0.562091 4.121 9.54e-05 ***
TV 0.039634 0.002340 16.940 < 2e-16 ***
Radio 0.328001 0.012074 27.166 < 2e-16 ***
Newspaper 0.021045 0.005086 4.138 8.98e-05 ***
---


Residual standard error: 1.384 on 76 degrees of freedom
Multiple R-squared: 0.9189, Adjusted R-squared: 0.9157
F-statistic: 287 on 3 and 76 DF, p-value: < 2.2e-16
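The fitted equation is Sales = 2.316178 + 0.039634·TV + 0.328001·Radio + 0.021045·Newspaper. Holding the other two budgets fixed, one more unit of Radio spend is associated with about 0.33 more units of sales, and the Multiple R-squared of 0.9189 means the model explains about 92% of the variance in training-set sales. The arithmetic below applies the coefficients to the first row of the data shown above (whether that row fell in the training or test split is not shown in the output).

```r
# Coefficients copied from the summary(model) output above
b <- c(Intercept = 2.316178, TV = 0.039634, Radio = 0.328001, Newspaper = 0.021045)

# First row of the sales data: TV = 86.3, Radio = 30.0, Newspaper = 23.9
x1 <- c(1, 86.3, 30.0, 23.9)   # the leading 1 multiplies the intercept

yhat <- sum(b * x1)
round(yhat, 2)   # 16.08, versus a recorded Sales value of 17.11
```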


EXPERIMENT 12 Date :
----------------------------------------------------------------------------------------------

Write an R program to apply predictive analytics for weather forecasting.

AIM: To apply predictive analytics for weather forecasting.

Description : Predictive analytics uses historical data and statistical modeling to forecast future outcomes, determining the likelihood of specific events or trends.

Weather forecasting involves the following sequence of steps:

Step 1 : Data Collection : Data such as temperature, humidity, wind speed, pressure and rainfall are collected from ground stations.

Step 2 : Data Preprocessing & Quality Control : Raw data is often noisy or incomplete, so:
Missing values are estimated or removed.
Outliers are detected and handled.

Step 3 : Feature Engineering : To improve model performance:

Create derived variables (e.g., wind chill, heat index).
Convert date/time into cyclical features.
Convert categorical data (e.g., weather types) into numeric encodings.
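"Convert date/time into cyclical features" matters for the program below, which feeds month and weekday to lm() as plain integers. That treats Saturday (7) and Sunday (1) as six units apart, and the 7-day forecast output at the end of this experiment shows the resulting drop from 16.07 on 2024-04-13 (Saturday) to 13.68 on 2024-04-14 (Sunday). A common fix is to map each cyclical value onto a circle with sine and cosine. The helper names cyclical_encode and circle_dist are hypothetical; the weekday coding assumes lubridate::wday()'s default of 1 = Sunday ... 7 = Saturday.

```r
# Map a cyclical value (weekday, month, day of year) onto the unit circle
cyclical_encode <- function(value, period) {
  angle <- 2 * pi * value / period
  data.frame(sin = sin(angle), cos = cos(angle))
}

weekday_enc <- cyclical_encode(1:7, period = 7)

# Euclidean distance between two encoded weekdays
circle_dist <- function(a, b) {
  sqrt(sum((unlist(weekday_enc[a, ]) - unlist(weekday_enc[b, ]))^2))
}
sat_sun <- circle_dist(7, 1)   # Saturday -> Sunday
mon_tue <- circle_dist(2, 3)   # Monday -> Tuesday
c(sat_sun = sat_sun, mon_tue = mon_tue)   # equal: no jump at the week boundary

# Day of year, assuming a 365.25-day period to absorb leap years
doy_enc <- cyclical_encode(c(1, 182, 365), period = 365.25)
doy_enc
```

The two columns (sin, cos) would replace the single integer column as model inputs, e.g. weekday_sin and weekday_cos.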

Step 4 : Model Building : Use statistical or machine learning models:

Use historical data to learn patterns.
Common methods: Linear regression for temperature
SVM / Decision Trees for classification (e.g., rain prediction)
Time series models like ARIMA, LSTM

Step 5 : Model Evaluation :

Models are tested using a training/testing split or cross-validation
Metrics like: RMSE / MAE for temperature
Accuracy, Precision, Recall for rain or storm predictions
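Both kinds of evaluation in Step 5 can be sketched in base R. The helper names classification_metrics and kfold_rmse are hypothetical, the rain labels are made up, and the regression data are simulated. "Yes" (rain) is treated as the positive class: precision is the share of predicted rainy days that were rainy, recall the share of rainy days that were caught.

```r
classification_metrics <- function(actual, predicted, positive = "Yes") {
  tp <- sum(actual == positive & predicted == positive)
  tn <- sum(actual != positive & predicted != positive)
  fp <- sum(actual != positive & predicted == positive)
  fn <- sum(actual == positive & predicted != positive)
  c(accuracy  = (tp + tn) / length(actual),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))
}

# Hypothetical labels: TP = 3, TN = 2, FP = 2, FN = 1
actual    <- c("Yes", "No", "Yes", "Yes", "No",  "No",  "Yes", "No")
predicted <- c("Yes", "No", "No",  "Yes", "Yes", "Yes", "Yes", "No")
m <- classification_metrics(actual, predicted)
m   # accuracy 0.625, precision 0.600, recall 0.750

# k-fold cross-validated RMSE for an lm() regression model
kfold_rmse <- function(formula, data, k = 5, seed = 123) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold per row
  response <- all.vars(formula)[1]
  sapply(seq_len(k), function(f) {
    fit  <- lm(formula, data = data[folds != f, ])            # train on k-1 folds
    pred <- predict(fit, newdata = data[folds == f, ])        # test on held-out fold
    sqrt(mean((data[[response]][folds == f] - pred)^2))
  })
}

# Simulated weather-like data (hypothetical)
set.seed(1)
sim <- data.frame(humidity = runif(100, 30, 100), pressure = runif(100, 995, 1030))
sim$temperature <- 40 - 0.2 * sim$humidity + rnorm(100, sd = 2)
cv <- kfold_rmse(temperature ~ humidity + pressure, sim, k = 5)
round(cv, 2)
mean(cv)
```

Averaging RMSE over k folds uses every row for testing once, which gives a steadier estimate than a single 80/20 split on a small dataset.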

Step 6 : Forecasting : Forecasts are generated for:

Short-term (1–3 days): Highly accurate
Medium-term (4–7 days): Good reliability
Long-term (>7 days): Increasing uncertainty

Forecasts may include temperature, rainfall likelihood, wind speed and direction, and storm alerts.


Program :
# Load libraries
library(lubridate)
library(e1071)

# Load data
weather_data <- read.csv("weather_data.csv", stringsAsFactors = FALSE)

# Initial plot
dev.new()
plot(weather_data, main = "Weather Dataset", col = "green")

# Parse date and extract features

weather_data$date <- ymd(weather_data$date)
weather_data$day_of_year <- yday(weather_data$date)
weather_data$month <- month(weather_data$date)
weather_data$weekday <- wday(weather_data$date)

# Create rain label if applicable

if (!"rain_label" %in% names(weather_data) && "rain" %in% names(weather_data))
{
weather_data$rain_label <- as.factor(ifelse(weather_data$rain > 0, "Yes", "No"))
}

# Split into training and testing sets

set.seed(123)
sample_size <- floor(0.8 * nrow(weather_data))
train_indices <- sample(seq_len(nrow(weather_data)), size = sample_size)
train_data <- weather_data[train_indices, ]
test_data <- weather_data[-train_indices, ]

# Train temperature model

model_temp <- lm(temperature ~ humidity + pressure + day_of_year + month +
weekday, data = train_data)
train_data$predicted_temp <- predict(model_temp, newdata = train_data)

# Train SVM model for rain prediction if applicable

rain_model_exists <- FALSE
if ("rain_label" %in% names(weather_data)) {
model_rain_svm <- svm(rain_label ~ humidity + predicted_temp + pressure +
day_of_year + month + weekday,
data = train_data, type = "C-classification", kernel = "radial")


rain_model_exists <- TRUE
}

# Evaluate temperature model

predictions <- predict(model_temp, newdata = test_data)
rmse <- sqrt(mean((test_data$temperature - predictions)^2, na.rm = TRUE))
cat("Root Mean Squared Error (RMSE):", round(rmse, 2), "\n")

# Summary
summary(model_temp)
if (rain_model_exists) print(summary(model_rain_svm))

# Plot actual vs predicted temperature

dev.new()
plot(test_data$temperature, predictions,
col = "blue", pch = 16,
main = "Actual vs Predicted Temperature",
xlab = "Actual Temperature", ylab = "Predicted Temperature")
abline(0, 1, col = "red", lwd = 2)

# --- Future Forecasting ---

# Generate next 7 days

future_dates <- seq(max(weather_data$date) + 1, by = "day", length.out = 7)

# Create future data frame

future_data <- data.frame(
date = future_dates,
day_of_year = yday(future_dates),
month = month(future_dates),
weekday = wday(future_dates),
humidity = mean(train_data$humidity, na.rm = TRUE),
pressure = mean(train_data$pressure, na.rm = TRUE)
)

# Ensure factor compatibility

if (is.factor(train_data$weekday)) {
future_data$weekday <- factor(future_data$weekday, levels =
levels(train_data$weekday))
}

# Predict temperature
future_data$predicted_temp <- predict(model_temp, newdata = future_data)
future_data$predicted_temperature <- round(future_data$predicted_temp, 2)

# Predict rain if model exists


if (rain_model_exists) {
raw_preds <- predict(model_rain_svm, newdata = future_data)
raw_preds <- as.character(raw_preds)
future_data$rain_prediction <- ifelse(raw_preds == "Yes", "RAIN=YES",
"RAIN=NO")
} else {
future_data$rain_prediction <- rep("RAIN=NA", nrow(future_data))
}

# Display forecast
cat("\nNext 7 Days Forecast:\n")
print(future_data[, c("date", "predicted_temperature", "rain_prediction")])

# Plot forecast
dev.new()
plot(future_data$date, future_data$predicted_temperature, type = "o",
col = "red", lwd = 4, pch = 5,
main = "7-Day Forecast: Temperature & Rain",
xlab = "Date", ylab = "Temperature (°C)",
ylim = range(future_data$predicted_temperature, na.rm = TRUE) + c(-1, 2))
grid()

# Add temperature labels

text(future_data$date, future_data$predicted_temperature + 0.4,
labels = future_data$predicted_temperature,
col = "red", cex = 1)

# Add rain prediction labels with color mapping

rain_colors <- ifelse(future_data$rain_prediction == "RAIN=YES", "blue",
ifelse(future_data$rain_prediction == "RAIN=NO", "magenta",
"yellow"))

text(future_data$date, future_data$predicted_temperature + 1,
labels = future_data$rain_prediction,
col = rain_colors, font = 2, cex = 0.9)

# Add legend
legend("topright", legend = c("Temperature (°C)", "RAIN=YES", "RAIN=NO",
"RAIN=NA"),
col = c("red", "blue", "magenta", "yellow"),
pch = 16, bty = "n")

# --- Bonus: Plot raw temperature over time with rain indicators ---
dev.new()
plot(weather_data$date, weather_data$temperature, type = "o",
col = "magenta", lwd = 2, pch = 16,


main = "Temperature Over Time",
xlab = "Date", ylab = "Temperature (°C)")
points(weather_data$date[weather_data$rain > 0],
weather_data$temperature[weather_data$rain > 0],
col = "blue", pch = 17, cex = 1.2)
legend("topright",
legend = c("Temperature", "Rainy Days"),
col = c("magenta", "blue"),
pch = c(16, 17),
bty = "n")

OUTPUT:

> head(weather_data )

date temperature humidity pressure rain

1 2024-01-01 32.01821 73.83717 1022.128 0
2 2024-01-02 32.79764 45.20104 1012.928 1
3 2024-01-03 10.01488 45.15971 1021.298 1
4 2024-01-04 29.06567 57.22615 1011.070 1
5 2024-01-05 22.46109 95.97190 1003.947 0
6 2024-01-06 18.16836 97.38256 1011.058 0


Root Mean Squared Error (RMSE): 10.86

> #Summary of Models

> summary(model_temp)

Call:
lm(formula = temperature ~ humidity + pressure + day_of_year +
    month + weekday, data = train_data)

Residuals:


Min 1Q Median 3Q Max
-16.4739 -9.5256 0.7116 8.9975 16.6716

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -35.73927 174.87503 -0.204 0.839
humidity -0.02731 0.06225 -0.439 0.662
pressure 0.05533 0.17191 0.322 0.748
day_of_year -0.17027 0.13703 -1.243 0.218
month 3.19049 3.97729 0.802 0.425
weekday 0.36879 0.58653 0.629 0.531

Residual standard error: 10.41 on 74 degrees of freedom

Multiple R-squared: 0.05175, Adjusted R-squared: -0.01232
F-statistic: 0.8077 on 5 and 74 DF, p-value: 0.5479
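The summary above reports Multiple R-squared = 0.05175 and an F-test p-value of 0.5479, so these predictors explain very little of the temperature variation in this dataset. A quick sanity check in that situation is to compare the model's test RMSE with a naive baseline that always predicts the training mean; if the model is not clearly better, its predictors add little. The sketch below uses simulated data in which temperature is deliberately unrelated to humidity and pressure (an assumption for illustration), and the rmse helper is hypothetical.

```r
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

set.seed(42)
n <- 100
# Hypothetical data: temperature has no relationship with the predictors
sim <- data.frame(humidity    = runif(n, 40, 100),
                  pressure    = runif(n, 1000, 1025),
                  temperature = runif(n, 10, 35))
idx   <- sample(seq_len(n), size = 80)
train <- sim[idx, ]
test  <- sim[-idx, ]

fit <- lm(temperature ~ humidity + pressure, data = train)
rmse_model    <- rmse(test$temperature, predict(fit, newdata = test))
rmse_baseline <- rmse(test$temperature, mean(train$temperature))
c(model = rmse_model, baseline = rmse_baseline)
```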

> summary(model_rain_svm)

Call:
svm(formula = rain_label ~ humidity + predicted_temp +
pressure + day_of_year + month + weekday,
data = train_data, type = "C-classification",
kernel = "radial")

Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1

Number of Support Vectors: 72

( 38 34 )
Number of Classes: 2

Levels:
No Yes

Next 7 Days Forecast:

date predicted_temperature rain_prediction

1 2024-04-10 15.47 RAIN=NO
2 2024-04-11 15.67 RAIN=NO
3 2024-04-12 15.87 RAIN=NO
4 2024-04-13 16.07 RAIN=NO
5 2024-04-14 13.68 RAIN=NO
6 2024-04-15 13.88 RAIN=NO
7 2024-04-16 14.08 RAIN=NO
