
Da Lab Exp 7,8,9,10,11,12

The document outlines a series of experiments for a Data Analytics Lab course, focusing on implementing various statistical and machine learning techniques using R. Key experiments include ARIMA for time series forecasting, hierarchical clustering, data visualization methods, and descriptive analytics on healthcare data. Each experiment provides a detailed aim, description, program code, and expected output to guide students in practical applications of data analytics.

Uploaded by

saipawan185063

DATA ANALYTICS LAB

(Academic Year: 2024-25) B.Tech III Year – II Semester (R22)


Experiment 7 Date :
-------------------------------------------------------------------------------------------------------------------------------
Write an R program to implement ARIMA on time series data

Aim: To implement ARIMA on time series data using R

Description :
ARIMA (Autoregressive Integrated Moving Average) is a statistical model used for time series
analysis and forecasting, predicting future values by combining past observations (AR),
differencing to achieve stationarity (I), and past errors to refine predictions (MA).

ARIMA models explain a given time series based on its own past values (lags) and lagged
forecast errors.
Components:
Autoregressive (AR): This part of the model uses past values of the time series to predict future
values.
Integrated (I): This component addresses non-stationarity by differencing the time series data,
making it stationary (i.e., having a constant mean and variance over time).

Moving Average (MA): This part incorporates past forecast errors to improve the accuracy of
future predictions.

Notation:
A non-seasonal ARIMA model is often represented as ARIMA(p, d, q), where:
p is the order of the autoregressive (AR) part.
d is the order of integration (the number of times the data needs to be differenced).
q is the order of the moving average (MA) part.
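
As a minimal sketch of this notation, the snippet below simulates an AR(1) series with base R and fits an ARIMA(1, 0, 0) model to it. The coefficient 0.7, the series length, and the seed are illustrative assumptions and are not part of the lab program.

```r
# Simulate an AR(1) process: x[t] = 0.7 * x[t-1] + e[t]  (0.7 is an assumed value)
set.seed(42)
sim_series <- arima.sim(model = list(ar = 0.7), n = 500)

# Fit ARIMA(p = 1, d = 0, q = 0) using base R's stats::arima
fit_ar1 <- arima(sim_series, order = c(1, 0, 0))

# The estimated AR coefficient should land close to the true value 0.7
ar1_estimate <- coef(fit_ar1)[["ar1"]]
print(ar1_estimate)
```

Here p = 1 because one lagged value is used, d = 0 because the simulated series is already stationary, and q = 0 because no lagged error terms are included.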

To build an ARIMA model:

Data Preparation: Collect and prepare the time series data.

Stationarity Check: Ensure the data is stationary or make it stationary through differencing.

Model Identification: Determine the appropriate values for p, d, and q using techniques like
autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.

1 Machine Learning Lab | VIGNAN INSTITUTE OF TECHNOLOGY AND SCIENCE


Parameter Estimation: Estimate the model parameters using techniques like maximum
likelihood estimation.

Model Evaluation: Evaluate the model's performance using metrics like root mean squared
error (RMSE) or mean absolute error (MAE).
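
The evaluation step can be sketched with base R alone by holding out the last year of data. The model order used here, ARIMA(0,1,1)(0,1,1)[12] on the log scale, is an assumption chosen for illustration; it is not the model selected by the lab program.

```r
# Hold out the last 12 months (1960) of AirPassengers for testing
train <- window(AirPassengers, end = c(1959, 12))
test  <- window(AirPassengers, start = c(1960, 1))

# Assumed model for illustration: ARIMA(0,1,1)(0,1,1)[12] on the log scale
fit <- arima(log(train), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 12 steps ahead and transform back to the original scale
pred <- exp(predict(fit, n.ahead = 12)$pred)

# Error metrics on the held-out year
errors <- as.numeric(test) - as.numeric(pred)
rmse <- sqrt(mean(errors^2))
mae  <- mean(abs(errors))
cat("RMSE:", rmse, " MAE:", mae, "\n")
```

RMSE penalises large errors more heavily than MAE, so RMSE is never smaller than MAE on the same forecasts.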

Steps involved in ARIMA Model :


1. Load and Prepare the Time Series Data
For demonstration, we use the built-in AirPassengers dataset.

2. Check for Stationarity

ARIMA requires a stationary series, meaning that statistical properties like mean and variance
should be constant over time. The Augmented Dickey-Fuller (ADF) test is used to check this.

If the ADF p-value > 0.05, the data is treated as non-stationary, and we apply differencing.

If the ADF p-value ≤ 0.05, the data is treated as stationary.

3. Apply Differencing (If Necessary)


If the time series is non-stationary, differencing is required.

4. Identify ARIMA Parameters (p, d, q)


Determine ARIMA parameters manually using ACF (AutoCorrelation Function) and PACF (Partial
AutoCorrelation Function) plots.
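
Steps 3 and 4 can be sketched with base R as follows. The log transform is an added assumption (it is a common way to stabilise the growing variance of AirPassengers) and is not used in the lab program.

```r
# Log-transform to stabilise the variance, then difference once to remove the trend
log_air  <- log(AirPassengers)
diff_air <- diff(log_air, differences = 1)

# Differencing shortens the series by one observation
length(AirPassengers)  # 144
length(diff_air)       # 143

# Compute ACF and PACF values without plotting (plot = FALSE)
acf_vals  <- acf(diff_air, lag.max = 24, plot = FALSE)
pacf_vals <- pacf(diff_air, lag.max = 24, plot = FALSE)

# acf_vals$acf[1] is lag 0, so index 13 is lag 12 (one seasonal period)
acf_lag12 <- acf_vals$acf[13]
print(acf_lag12)
```

A large autocorrelation at lag 12 points to yearly seasonality, which a purely non-seasonal ARIMA(p, d, q) does not capture; seasonal terms would likely be needed in that case.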

Applications:
ARIMA models are widely used for various time series forecasting tasks, including:
Predicting stock prices.
Forecasting sales and demand.
Analyzing financial data.
Understanding and predicting trends in various datasets

Program:

# Install required packages if not already installed
if (!require(forecast)) install.packages("forecast", dependencies = TRUE)
if (!require(tseries)) install.packages("tseries", dependencies = TRUE)

# Load necessary libraries


library(forecast)
library(tseries)
# Load a sample time series dataset (AirPassengers dataset)
data(AirPassengers)
ts_data <- ts(AirPassengers, start = c(1949, 1), frequency = 12)

# Open a new plot window


dev.new()

# Plot the original time series data


plot(ts_data, main = "AirPassengers Time Series", ylab = "Passengers", xlab = "Year", col = "blue")

# Check stationarity using Augmented Dickey-Fuller (ADF) test


adf_test <- adf.test(ts_data)
print(adf_test)

# Open a new plot window for ACF


dev.new()
acf(ts_data, main = "ACF Plot")

# Open a new plot window for PACF


dev.new()
pacf(ts_data, main = "PACF Plot")

# If the series is non-stationary, apply first-order differencing


if (adf_test$p.value > 0.05) {
ts_data_diff <- diff(ts_data, differences = 1) # Keep the original ts_data unchanged
print("Differencing applied to make the series stationary.")
} else {
ts_data_diff <- ts_data
}

# Re-check stationarity after differencing
# (adf.test() has no na.action argument, so missing values are dropped beforehand)
adf_test_diff <- adf.test(na.omit(ts_data_diff))
print(adf_test_diff)

# Determine the best ARIMA model automatically


best_model <- auto.arima(ts_data) # Use original ts_data for ARIMA fitting

# Print model summary


summary(best_model)

# Forecast for the next 12 months


forecast_values <- forecast(best_model, h = 12)
# Open a new plot window for forecast
dev.new()
plot(forecast_values, main = "ARIMA Forecast", col = "blue")

# Print forecasted values


print(forecast_values)

# Check residuals to validate the model


checkresiduals(best_model)

Output :

Augmented Dickey-Fuller Test

data: ts_data
Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

EXPERIMENT 8 Date :
----------------------------------------------------------------------------------------------

Write an R program for object segmentation using hierarchical methods

AIM: To implement object segmentation using hierarchical clustering methods


Description : Hierarchical clustering is a technique used to group similar data points
together based on their similarity, creating a hierarchy or tree-like structure.

A dendrogram is like a family tree for clusters. It shows how individual data points or
groups of data merge together.

Types of Hierarchical Clustering


1. Agglomerative Clustering
2. Divisive clustering

Workflow for Hierarchical Agglomerative clustering


1. Start with individual points
2. Calculate distances between clusters
3. Merge the closest (smallest distance) clusters
4. Update distance matrix
5. Repeat steps 3 and 4 until only one cluster is left.
6. Create a dendrogram
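
The workflow above can be traced on a tiny example. The five one-dimensional points below are invented for illustration; they are chosen so that no two pairwise distances tie, which keeps the merge order unambiguous.

```r
# Five one-dimensional points (illustrative values, not taken from mtcars)
pts <- matrix(c(1, 2, 6, 8, 20), ncol = 1,
              dimnames = list(c("a", "b", "c", "d", "e"), "x"))

# Steps 1-2: start with individual points and compute pairwise distances
d <- dist(pts, method = "euclidean")

# Steps 3-5: repeatedly merge the closest clusters (average linkage)
hc <- hclust(d, method = "average")

# Merge heights, in order: a+b (1), c+d (2), {a,b}+{c,d} (5.5), all+e (15.75)
print(hc$height)

# Cut the tree into 3 clusters: {a, b}, {c, d}, {e}
groups <- cutree(hc, k = 3)
print(groups)
```

With average linkage, the distance between two clusters is the mean of all pairwise distances between their members, which is why {a, b} and {c, d} merge at (5 + 7 + 4 + 6) / 4 = 5.5.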



Program :
# Finding distance matrix
distance_mat <- dist(mtcars, method = 'euclidean')
distance_mat

# Fitting the hierarchical clustering model
# to the mtcars distance matrix
set.seed(240) # Setting seed
Hierar_cl <- hclust(distance_mat, method = "average")
Hierar_cl

# Plotting dendrogram
plot(Hierar_cl)

# Choosing no. of clusters
# Mark a candidate cut height on the dendrogram
abline(h = 110, col = "green")

# Cutting tree by no. of clusters
fit <- cutree(Hierar_cl, k = 3)
fit

# Number of observations in each cluster
table(fit)
rect.hclust(Hierar_cl, k = 3, border = "green")

OUTPUT:

Distance matrix:

• The values are shown as per the distance matrix calculation with the method as euclidean.

Model Hierar_cl:

• In the model, the cluster method is average, the distance is euclidean, and the number of objects is 32.

• The tree is cut at k = 3, and table(fit) gives the number of observations in each cluster.



EXPERIMENT 9 Date :
----------------------------------------------------------------------------------------------
Write an R program to perform visualization techniques (types of charts - Bar, Column,
Line, Scatter, 3D Cubes, etc.)

AIM: To implement data visualization techniques (Bar, Column, Line, Scatter, 3D Cubes, etc.)

Consider the following airquality data set for visualization in R:

Ozone  Solar.R  Wind  Temp  Month  Day
   41      190   7.4    67      5    1
   36      118   8.0    72      5    2
   12      149  12.6    74      5    3
   18      313  11.5    62      5    4
   NA       NA  14.3    56      5    5
   28       NA  14.9    66      5    6

a) AIM: To implement Bar Graph using R


PROGRAM:

# Horizontal Bar Plot for
# Ozone concentration in air
barplot(airquality$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'ozone levels', horiz = TRUE)

OUTPUT:

b) AIM: To implement Histogram

# Histogram for Maximum Daily Temperature
data(airquality)

hist(airquality$Temp,
     main = "La Guardia Airport's\nMaximum Temperature (Daily)",
     xlab = "Temperature (Fahrenheit)",
     xlim = c(50, 125), col = "yellow",
     freq = TRUE)

OUTPUT:



c) AIM: To implement scatter graph

Program:

# Scatter plot for Ozone Concentration per month
data(airquality)

plot(airquality$Ozone, airquality$Month,
     main = "Scatterplot Example",
     xlab = "Ozone Concentration in parts per billion",
     ylab = "Month of observation", pch = 19)



d) AIM: To implement line graph

PROGRAM:

# Create the data for the chart.
v <- c(17, 25, 38, 13, 41)
t <- c(22, 19, 36, 19, 23)
m <- c(25, 14, 16, 34, 29)
# Plot the line chart (first series).
plot(v, type = "o", col = "red",
     xlab = "Month", ylab = "Articles Written",
     main = "Articles Written chart")
lines(t, type = "o", col = "blue")
lines(m, type = "o", col = "green")
OUTPUT:



e) AIM: To implement a 3D plot

PROGRAM:

# Install (if needed) and load the rgl package
if (!require("rgl")) install.packages("rgl")
library(rgl)
# Generate some sample data
x <- seq(-5, 6, by = 0.1)
y <- seq(-5, 7, by = 0.1)
z <- outer(x, y, function(x, y) dnorm(sqrt(x^2 + y^2)))
# Create a 3D surface plot
persp3d(x, y, z, col = "blue")

# add animation
play3d(spin3d(axis = c(0, 0, 1)), duration = 10)



Experiment 10 Date :
-------------------------------------------------------------------------------------------------------------------------------
Write an R program to perform descriptive analytics on healthcare data

AIM: To implement Descriptive Analytics on healthcare data



Description : Descriptive analytics is the process of summarizing and interpreting
historical data to understand what has happened in the past.

Goals of Descriptive Analytics in Healthcare :

• Understand patient demographics (e.g., age, gender distribution)
• Analyze clinical metrics like BMI (body mass index), blood pressure, cholesterol
• Identify prevalence of conditions like diabetes, hypertension
• Spot trends over time (e.g., increasing obesity rates)
• Evaluate resource use (e.g., hospital admissions, medication use)
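
A few of these goals can be sketched in base R on a small table. The six patient records below are invented for illustration and are unrelated to the health_data.csv file used in the program.

```r
# Small hypothetical patient table (values invented for illustration)
patients <- data.frame(
  Gender   = factor(c("F", "M", "F", "M", "F", "M")),
  BMI      = c(22, 30, 26, 28, 24, 32),
  Diabetes = factor(c("No", "Yes", "No", "No", "Yes", "Yes"))
)

# Demographics: how many patients of each gender
gender_counts <- table(patients$Gender)

# Clinical metric: mean BMI within each gender
bmi_by_gender <- tapply(patients$BMI, patients$Gender, mean)

# Prevalence: proportion of patients with diabetes
diabetes_rate <- prop.table(table(patients$Diabetes))[["Yes"]]

print(gender_counts)
print(bmi_by_gender)
print(diabetes_rate)
```

Each statistic only summarises what is already in the data; no model is fitted, which is what distinguishes descriptive from predictive analytics.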

Program :
# Install required packages if not already installed
if (!require("dplyr")) install.packages("dplyr", dependencies = TRUE)
if (!require("summarytools")) install.packages("summarytools", dependencies = TRUE)
# Load the libraries
library(dplyr)
library(summarytools)

# Try to load the data from CSV, if not found, create a sample dataset
file_path <- "health_data.csv"

if (!file.exists(file_path)) {
message("File not found. Creating sample dataset...")
set.seed(123)
health_data <- data.frame(
Age = sample(20:80, 100, replace = TRUE),
Gender = factor(sample(c("Male", "Female"), 100, replace = TRUE)),
BMI = round(runif(100, 18, 35), 1),
BloodPressure = sample(90:180, 100, replace = TRUE),
Cholesterol = sample(150:300, 100, replace = TRUE),
Diabetes = factor(sample(c("Yes", "No"), 100, replace = TRUE))
)
write.csv(health_data, file_path, row.names = FALSE)
} else {
health_data <- read.csv(file_path, stringsAsFactors = TRUE)
}

# View structure
str(health_data)

# Summary statistics for numeric variables
numeric_vars <- select(health_data, where(is.numeric))
summary(numeric_vars)

# Frequency tables for categorical variables


cat_vars <- select(health_data, where(is.factor))
lapply(cat_vars, table)

# Cross-tabulation: Gender vs Diabetes


print(table(health_data$Gender, health_data$Diabetes))

# Descriptive report using summarytools


print(dfSummary(health_data), method = "browser")

dev.new()
# 1. Plot Healthcare Attributes (scatterplot matrix of all columns)
plot(health_data, col = "magenta")

dev.new()
# Plots
# 2. Age Distribution
hist(health_data$Age, main = "Age Distribution",
     xlab = "Age (in Years)",
     xlim = c(0, 125), col = "green",
     freq = TRUE)

Output :

'data.frame': 100 obs. of 6 variables:
 $ Age          : int 50 34 70 33 22 61 69 73 62 56 ...
 $ Gender       : Factor w/ 2 levels "Female","Male": 1 1 2 2 2 1 1 1 1 2 ...
 $ BMI          : num 24.8 33 24.2 22.9 20.9 20.9 26.2 22.3 21.7 29.5 ...
 $ BloodPressure: int 170 118 115 116 174 96 149 115 130 173 ...
 $ Cholesterol  : int 264 270 231 245 251 245 175 290 297 297 ...
 $ Diabetes     : Factor w/ 2 levels "No","Yes": 1 2 1 2 2 1 2 1 1 1 ...

> summary(numeric_vars)

      Age             BMI        BloodPressure    Cholesterol
 Min.   :22.00   Min.   :18.10   Min.   : 91.0   Min.   :152.0
 1st Qu.:34.00   1st Qu.:21.80   1st Qu.:115.8   1st Qu.:193.5
 Median :47.50   Median :26.05   Median :133.5   Median :243.0
 Mean   :49.19   Mean   :26.26   Mean   :135.8   Mean   :230.8
 3rd Qu.:62.00   3rd Qu.:30.57   3rd Qu.:156.0   3rd Qu.:268.0
 Max.   :79.00   Max.   :34.70   Max.   :180.0   Max.   :297.0

$Gender
Female Male
55 45

$Diabetes
No Yes
52 48

> # Cross-tabulation: Gender vs Diabetes

         No Yes
  Female 32  23
  Male   20  25

Graph of Healthcare Attributes



EXPERIMENT 11 Date :
----------------------------------------------------------------------------------------------
Write an R program to perform predictive analytics on product sales data

AIM: To perform predictive analytics on product sales data


Description : Predictive analytics uses historical data and statistical modeling to forecast
future outcomes, determining the likelihood of specific events or trends

Key aspects of predictive analytics :


1. Focus on the Future: Predictive analytics is not about understanding past events, but
about anticipating future trends and outcomes.

2. Data-Driven: It relies heavily on data, both current and historical, to identify patterns
and relationships that can be used to make predictions.

3. Statistical Modeling: Techniques like regression analysis, time series analysis, and
machine learning algorithms are used to build models that can predict future outcomes.

4. Decision Support: The predictions generated by predictive analytics can be used to make
informed decisions, such as optimizing business processes, identifying risks, or forecasting
demand.

Example:
Sales Forecasting: Predicting future sales based on historical sales data, market trends, and
promotional activities.

Program :
# Load the data
sales_data <- read.csv("product_sales.csv")
head(sales_data)

dev.new()
plot(sales_data, col="brown")

# The predictors (TV, Radio, Newspaper) are numeric, so no factor conversion is needed

# Split data into training (80%) and testing (20%)
set.seed(123)
sample_index <- sample(1:nrow(sales_data), 0.8 * nrow(sales_data))
train_data <- sales_data[sample_index, ]
test_data <- sales_data[-sample_index, ]

# Build a linear regression model
# (predictor names follow the dataset columns shown in the output: TV, Radio, Newspaper)
model <- lm(Sales ~ TV + Radio + Newspaper, data = train_data)

# Predict on test data


predicted_sales <- predict(model, newdata = test_data)

# Calculate RMSE
rmse <- sqrt(mean((test_data$Sales - predicted_sales)^2))
cat("Linear Regression RMSE:", rmse, "\n")
summary(model)

dev.new()

# -------------------------
# Plot: Actual vs Predicted
# -------------------------
plot(test_data$Sales, predicted_sales,
main = "Actual vs Predicted Sales",
xlab = "Actual Sales",
ylab = "Predicted Sales",
col = "blue",
pch = 19)
abline(a = 0, b = 1, col = "red", lwd = 2) # Reference line

OUTPUT:

     TV Radio Newspaper    Sales
1  86.3  30.0      23.9 17.11161
2 236.5  16.6      96.2 20.51756
3 122.7  24.4      60.1 16.92830
4 264.9  47.7      51.5 27.42344
5 282.1  24.1      40.3 22.14082
6  13.7  44.5      88.0 18.23741


Sales Data

Linear Regression RMSE: 1.609265

summary(model)

Residuals:
    Min      1Q  Median      3Q     Max
-2.4771 -0.9458 -0.1164  0.8441  4.2106

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.316178   0.562091   4.121 9.54e-05 ***
TV          0.039634   0.002340  16.940  < 2e-16 ***
Radio       0.328001   0.012074  27.166  < 2e-16 ***
Newspaper   0.021045   0.005086   4.138 8.98e-05 ***
---

Residual standard error: 1.384 on 76 degrees of freedom
Multiple R-squared: 0.9189, Adjusted R-squared: 0.9157
F-statistic: 287 on 3 and 76 DF, p-value: < 2.2e-16



EXPERIMENT 12 Date :
----------------------------------------------------------------------------------------------

Write an R program to apply predictive analytics for weather forecasting.

AIM: To Apply Predictive analytics for Weather forecasting.


Description : Predictive analytics uses historical data and statistical modeling to
forecast future outcomes, determining the likelihood of specific events or trends

Weather forecasting : It involves a sequence of steps

Step 1 : Data Collection : Data such as temperature, humidity, wind speed, pressure, and
rainfall are collected from ground stations.

Step 2 : Data Preprocessing & Quality Control : Raw data is often noisy or incomplete, so:
Missing values are estimated or removed.
Outliers are detected and handled.
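
A minimal sketch of this preprocessing step is shown below. The readings are invented, and median imputation with the 1.5 × IQR rule is only one common choice among several; the lab program itself does not perform this step.

```r
# Hypothetical temperature readings with one missing value and one suspect spike
temperature <- c(21, 22, NA, 23, 95, 22, 24)

# Estimate the missing value with the median of the observed readings
temp_filled <- temperature
temp_filled[is.na(temp_filled)] <- median(temperature, na.rm = TRUE)

# Flag outliers with the 1.5 * IQR rule
q <- quantile(temp_filled, probs = c(0.25, 0.75))
iqr <- q[2] - q[1]
is_outlier <- temp_filled < q[1] - 1.5 * iqr | temp_filled > q[2] + 1.5 * iqr

# Keep only the readings that pass the check
temp_clean <- temp_filled[!is_outlier]
print(which(is_outlier))  # position of the flagged reading
```

The median is used for imputation because, unlike the mean, it is not pulled upward by the spike of 95.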

Step 3 : Feature Engineering : To improve model performance:
Create derived variables (e.g., wind chill, heat index).
Convert date/time into cyclical features.
Convert categorical data (e.g., weather types) into numeric encodings.
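
The cyclical conversion can be sketched with a sine/cosine pair. The column names doy_sin and doy_cos and the 365-day year are assumptions made for this illustration; the lab program uses the raw day_of_year value instead.

```r
# Day-of-year values spanning a year (365-day year assumed)
day_of_year <- c(1, 92, 183, 274, 365)

# Map each day onto a circle so that day 365 sits next to day 1
doy_sin <- sin(2 * pi * day_of_year / 365)
doy_cos <- cos(2 * pi * day_of_year / 365)
cyclical <- data.frame(day_of_year, doy_sin, doy_cos)
print(round(cyclical, 3))

# Distance between 31 Dec (365) and 1 Jan (1) in the encoded space
gap <- sqrt((doy_sin[5] - doy_sin[1])^2 + (doy_cos[5] - doy_cos[1])^2)
print(gap)
```

As raw numbers, days 1 and 365 differ by 364 even though they are adjacent on the calendar; in the encoded space they are nearly identical, which is the behaviour a seasonal model usually wants.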

Step 4 : Model Building : Use statistical or machine learning models:
Use historical data to learn patterns.
Common methods:
Linear regression for temperature
SVM / Decision Trees for classification (e.g., rain prediction)
Time series models like ARIMA, LSTM

Step 5 : Model Evaluation :
Models are tested using a training/testing split or cross-validation.
Metrics like:
RMSE / MAE for temperature
Accuracy, Precision, Recall for rain or storm predictions
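
The classification metrics can be computed from a confusion matrix as sketched below. The actual and predicted labels are invented for illustration, with "Yes" treated as the positive (rain) class.

```r
# Hypothetical actual and predicted rain labels for eight days
actual    <- factor(c("Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"),
                    levels = c("No", "Yes"))
predicted <- factor(c("Yes", "No", "No", "Yes", "Yes", "Yes", "Yes", "No"),
                    levels = c("No", "Yes"))

# Confusion matrix: rows are predictions, columns are actual labels
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

tp <- cm["Yes", "Yes"]  # rain predicted, rain occurred
fp <- cm["Yes", "No"]   # rain predicted, no rain occurred
fn <- cm["No", "Yes"]   # no rain predicted, rain occurred
tn <- cm["No", "No"]    # no rain predicted, no rain occurred

accuracy  <- (tp + tn) / sum(cm)  # 5 / 8 = 0.625
precision <- tp / (tp + fp)       # 3 / 5 = 0.6
recall    <- tp / (tp + fn)       # 3 / 4 = 0.75
cat("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall, "\n")
```

Precision answers "when rain was forecast, how often did it rain?", while recall answers "of the rainy days, how many were forecast?".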

Step 6 : Forecasting : Forecasts are generated for:
Short-term (1–3 days): Highly accurate
Medium-term (4–7 days): Good reliability
Long-term (>7 days): Increasing uncertainty

Forecasts may include: Temperature, Rainfall likelihood, Wind speed and direction, Storm alerts



Program :
# Load libraries
library(lubridate)
library(e1071)

# Load data
weather_data <- read.csv("weather_data.csv", stringsAsFactors = FALSE)

# Initial plot
dev.new()
plot(weather_data, main = "Weather Dataset", col = "green")

# Parse date and extract features


weather_data$date <- ymd(weather_data$date)
weather_data$day_of_year <- yday(weather_data$date)
weather_data$month <- month(weather_data$date)
weather_data$weekday <- wday(weather_data$date)

# Create rain label if applicable


if (!"rain_label" %in% names(weather_data) && "rain" %in% names(weather_data)) {
  weather_data$rain_label <- as.factor(ifelse(weather_data$rain > 0, "Yes", "No"))
}

# Split into training and testing sets


set.seed(123)
sample_size <- floor(0.8 * nrow(weather_data))
train_indices <- sample(seq_len(nrow(weather_data)), size = sample_size)
train_data <- weather_data[train_indices, ]
test_data <- weather_data[-train_indices, ]

# Train temperature model


model_temp <- lm(temperature ~ humidity + pressure + day_of_year + month +
weekday, data = train_data)
train_data$predicted_temp <- predict(model_temp, newdata = train_data)

# Train SVM model for rain prediction if applicable
rain_model_exists <- FALSE
if ("rain_label" %in% names(weather_data)) {
  model_rain_svm <- svm(rain_label ~ humidity + predicted_temp + pressure +
                          day_of_year + month + weekday,
                        data = train_data, type = "C-classification", kernel = "radial")
  rain_model_exists <- TRUE
}

# Evaluate temperature model


predictions <- predict(model_temp, newdata = test_data)
rmse <- sqrt(mean((test_data$temperature - predictions)^2, na.rm = TRUE))
cat("Root Mean Squared Error (RMSE):", round(rmse, 2), "\n")

# Summary
summary(model_temp)
if (rain_model_exists) print(summary(model_rain_svm))

# Plot actual vs predicted temperature


dev.new()
plot(test_data$temperature, predictions,
col = "blue", pch = 16,
main = "Actual vs Predicted Temperature",
xlab = "Actual Temperature", ylab = "Predicted Temperature")
abline(0, 1, col = "red", lwd = 2)

# --- Future Forecasting ---

# Generate next 7 days


future_dates <- seq(max(weather_data$date) + 1, by = "day", length.out = 7)

# Create future data frame


future_data <- data.frame(
date = future_dates,
day_of_year = yday(future_dates),
month = month(future_dates),
weekday = wday(future_dates),
humidity = mean(train_data$humidity, na.rm = TRUE),
pressure = mean(train_data$pressure, na.rm = TRUE)
)

# Ensure factor compatibility


if (is.factor(train_data$weekday)) {
  future_data$weekday <- factor(future_data$weekday, levels = levels(train_data$weekday))
}

# Predict temperature
future_data$predicted_temp <- predict(model_temp, newdata = future_data)
future_data$predicted_temperature <- round(future_data$predicted_temp, 2)

# Predict rain if model exists
if (rain_model_exists) {
  raw_preds <- predict(model_rain_svm, newdata = future_data)
  raw_preds <- as.character(raw_preds)
  future_data$rain_prediction <- ifelse(raw_preds == "Yes", "RAIN=YES", "RAIN=NO")
} else {
  future_data$rain_prediction <- rep("RAIN=NA", nrow(future_data))
}

# Display forecast
cat("\nNext 7 Days Forecast:\n")
print(future_data[, c("date", "predicted_temperature", "rain_prediction")])

# Plot forecast
dev.new()
plot(future_data$date, future_data$predicted_temperature, type = "o",
col = "red", lwd = 4, pch = 5,
main = "7-Day Forecast: Temperature & Rain",
xlab = "Date", ylab = "Temperature (°C)",
ylim = range(future_data$predicted_temperature, na.rm = TRUE) + c(-1, 2))
grid()

# Add temperature labels


text(future_data$date, future_data$predicted_temperature + 0.4,
labels = future_data$predicted_temperature,
col = "red", cex = 1)

# Add rain prediction labels with color mapping


rain_colors <- ifelse(future_data$rain_prediction == "RAIN=YES", "blue",
ifelse(future_data$rain_prediction == "RAIN=NO", "magenta",
"yellow"))

text(future_data$date, future_data$predicted_temperature + 1,
labels = future_data$rain_prediction,
col = rain_colors, font = 2, cex = 0.9)

# Add legend
legend("topright", legend = c("Temperature (°C)", "RAIN=YES", "RAIN=NO",
"RAIN=NA"),
col = c("red", "blue", "magenta", "yellow"),
pch = 16, bty = "n")

# --- Bonus: Plot raw temperature over time with rain indicators ---
dev.new()
plot(weather_data$date, weather_data$temperature, type = "o",
     col = "magenta", lwd = 2, pch = 16,
     main = "Temperature Over Time",
     xlab = "Date", ylab = "Temperature (°C)")
points(weather_data$date[weather_data$rain > 0],
weather_data$temperature[weather_data$rain > 0],
col = "blue", pch = 17, cex = 1.2)
legend("topright",
legend = c("Temperature", "Rainy Days"),
col = c("magenta", "blue"),
pch = c(16, 17),
bty = "n")

OUTPUT:

> head(weather_data)

        date temperature humidity pressure rain
1 2024-01-01    32.01821 73.83717 1022.128    0
2 2024-01-02    32.79764 45.20104 1012.928    1
3 2024-01-03    10.01488 45.15971 1021.298    1
4 2024-01-04    29.06567 57.22615 1011.070    1
5 2024-01-05    22.46109 95.97190 1003.947    0
6 2024-01-06    18.16836 97.38256 1011.058    0



Root Mean Squared Error (RMSE): 10.86

> # Summary of Models
> summary(model_temp)

Call:
lm(formula = temperature ~ humidity + pressure + day_of_year +
    month + weekday, data = train_data)

Residuals:
     Min       1Q   Median       3Q      Max
-16.4739  -9.5256   0.7116   8.9975  16.6716

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -35.73927  174.87503  -0.204    0.839
humidity     -0.02731    0.06225  -0.439    0.662
pressure      0.05533    0.17191   0.322    0.748
day_of_year  -0.17027    0.13703  -1.243    0.218
month         3.19049    3.97729   0.802    0.425
weekday       0.36879    0.58653   0.629    0.531

Residual standard error: 10.41 on 74 degrees of freedom
Multiple R-squared: 0.05175, Adjusted R-squared: -0.01232
F-statistic: 0.8077 on 5 and 74 DF, p-value: 0.5479

> summary(model_rain_svm)

Call:
svm(formula = rain_label ~ humidity + predicted_temp +
pressure + day_of_year + month + weekday,
data = train_data, type = "C-classification",
kernel = "radial")

Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1

Number of Support Vectors: 72

( 38 34 )
Number of Classes: 2

Levels:
No Yes

Next 7 Days Forecast:

        date predicted_temperature rain_prediction
1 2024-04-10                 15.47         RAIN=NO
2 2024-04-11                 15.67         RAIN=NO
3 2024-04-12                 15.87         RAIN=NO
4 2024-04-13                 16.07         RAIN=NO
5 2024-04-14                 13.68         RAIN=NO
6 2024-04-15                 13.88         RAIN=NO
7 2024-04-16                 14.08         RAIN=NO
