L.R.G. GOVERNMENT ARTS COLLEGE FOR WOMEN
(AFFILIATED TO BHARATHIAR UNIVERSITY)
TIRUPUR-641604
DEPARTMENT OF COMPUTER SCIENCE
I M.SC COMPUTER SCIENCE
PRACTICAL-III: DATA MINING USING R
NAME :
REG.NO:
CERTIFICATE
L.R.G. GOVERNMENT ARTS COLLEGE FOR WOMEN
(AFFILIATED TO BHARATHIAR UNIVERSITY)
TIRUPUR-641604
NAME :
REGISTER NO:
CLASS :
This is to certify that this is a bonafide record of the practical work done by the above
student of I-M.SC COMPUTER SCIENCE, PRACTICAL-III: DATA MINING USING R, during the
academic year 2023-2024.
Staff in Charge                                        Head of the Department
Submitted for the Bharathiar University Practical Examination held on ................
Internal Examiner                                      External Examiner
INDEX

S.NO  DATE  CONTENTS                                            PAGE NO  SIGN
  1         APRIORI ALGORITHM TO EXTRACT ASSOCIATION
            RULES OF DATA MINING
  2         K-MEANS CLUSTERING ALGORITHM
  3         HIERARCHICAL CLUSTERING
  4         CLASSIFICATION ALGORITHM
  5         DECISION TREE
  6         LINEAR REGRESSION
  7         DATA VISUALIZATION
1. APRIORI ALGORITHM TO EXTRACT ASSOCIATION RULES OF DATA MINING
# Loading Libraries
library(arules)
library(arulesViz)
library(RColorBrewer)
# import dataset
data('Groceries')
# using apriori() function
rules<-apriori(Groceries,parameter=list(supp=0.01,conf=0.2))
# using inspect() function
inspect(rules[1:10])
# using itemFrequencyPlot() function
arules::itemFrequencyPlot(Groceries,topN=20,
                          col=brewer.pal(8,'Pastel2'),
                          main='Relative Item Frequency Plot',
                          type='relative',
                          ylab='Item Frequency(Relative)')
OUTPUT:
# Loading Libraries
> library(arules)
Loading required package: Matrix
Attaching package: ‘arules’
The following objects are masked from ‘package:base’:
    abbreviate, write
> library(arulesViz)
> library(RColorBrewer)
> # import dataset
> data('Groceries')
> # using apriori() function
> rules<-apriori(Groceries,parameter=list(supp=0.01,conf=0.2))
Apriori
Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.2    0.1    1 none FALSE            TRUE       5    0.01      1
 maxlen target ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
Absolute minimum support count: 98
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
sorting and recoding items ... [88 item(s)] done [0.00s].
creating transaction tree ... done [0.01s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [232 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
> # using inspect() function
> inspect(rules[1:10])
     lhs                rhs                support    confidence coverage   lift     count
[1]  {}              => {whole milk}       0.25551601 0.2555160  1.00000000 1.000000 2513
[2]  {hard cheese}   => {whole milk}       0.01006609 0.4107884  0.02450432 1.607682   99
[3]  {butter milk}   => {other vegetables} 0.01037112 0.3709091  0.02796136 1.916916  102
[4]  {butter milk}   => {whole milk}       0.01159126 0.4145455  0.02796136 1.622385  114
[5]  {ham}           => {whole milk}       0.01148958 0.4414062  0.02602949 1.727509  113
[6]  {sliced cheese} => {whole milk}       0.01077783 0.4398340  0.02450432 1.721356  106
[7]  {oil}           => {whole milk}       0.01128622 0.4021739  0.02806304 1.573968  111
[8]  {onions}        => {other vegetables} 0.01423488 0.4590164  0.03101169 2.372268  140
[9]  {onions}        => {whole milk}       0.01209964 0.3901639  0.03101169 1.526965  119
[10] {berries}       => {yogurt}           0.01057448 0.3180428  0.03324860 2.279848  104
> # using itemFrequencyPlot() function
> arules::itemFrequencyPlot(Groceries,topN=20,
+ col=brewer.pal(8,'Pastel2'),
+ main='Relative Item Frequency Plot',
+ type='relative',
+ ylab='Item Frequency(Relative)')
[Plot: Relative Item Frequency Plot]
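Note: as an optional check (not part of the recorded output), the 232 mined rules can
be ranked to surface the strongest associations; sort() from arules orders rules by any
quality measure. A minimal sketch using the same rules object:

#rank rules by lift and inspect the five strongest
strong=sort(rules,by="lift",decreasing=TRUE)
inspect(strong[1:5])
#the same call with by="confidence" ranks by confidence instead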
2. K-MEANS CLUSTERING ALGORITHM
library(cluster)
df=USArrests
#Number of Rows and Columns of the Actual Data set
dim(df)
head(df)
#remove rows with missing values
df=na.omit(df)
dim(df)
#scale each variable to have mean 0 and sd 1
df=scale(df)
head(df)
set.seed(1)
#Cluster the dataset with 5 groups
km=kmeans(df,centers=5,nstart=25)
print(km)
plot(df)
points(km$centers,col=1:5,pch=8,cex=2)
cnt=table(km$cluster)
print(cnt)
final_data=cbind(df,cluster=km$cluster)
head(final_data)
plot(final_data,cex=0.6,main="Final Data")
ag=aggregate(final_data,by=list(cluster=km$cluster),mean)
head(ag)
plot(ag,cex=0.6,main="Aggregate")
OUTPUT:
#K-means Clustering
> #Number of Rows and Columns of the Actual Data set
> dim(df)
[1] 50 4
> head(df)
           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7
> #remove rows with missing values
> dim(df)
[1] 50 4
> #scale each variable to have mean 0 and sd 1
> head(df)
Murder Assault UrbanPop Rape
Alabama 1.24256408 0.7828393 -0.5209066 -0.003416473
Alaska 0.50786248 1.1068225 -1.2117642 2.484202941
Arizona 0.07163341 1.4788032 0.9989801 1.042878388
Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602
California 0.27826823 1.2628144 1.7589234 2.067820292
Colorado 0.02571456 0.3988593 0.8608085 1.864967207
> #Cluster the dataset with 5 groups
> print(km)
K-means clustering with 5 clusters of sizes 7, 10, 10, 11, 12
Cluster means:
Murder Assault UrbanPop Rape
1 1.5803956 0.9662584 -0.7775109 0.04844071
2 -1.1727674 -1.2078573 -1.0045069 -1.10202608
3 -0.6286291 -0.4086988 0.9506200 -0.38883734
4 -0.1642225 -0.3658283 -0.2822467 -0.11697538
5 0.7298036 1.1188219 0.7571799 1.32135653
Clustering vector:
       Alabama         Alaska        Arizona       Arkansas     California 
             1              5              5              4              5 
      Colorado    Connecticut       Delaware        Florida        Georgia 
             5              3              3              5              1 
        Hawaii          Idaho       Illinois        Indiana           Iowa 
             3              2              5              4              2 
        Kansas       Kentucky      Louisiana          Maine       Maryland 
             4              4              1              2              5 
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
             3              5              2              1              4 
       Montana       Nebraska         Nevada  New Hampshire     New Jersey 
             4              4              5              2              3 
    New Mexico       New York North Carolina   North Dakota           Ohio 
             5              5              1              2              3 
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
             4              4              3              3              1 
  South Dakota      Tennessee          Texas           Utah        Vermont 
             2              1              5              3              2 
      Virginia     Washington  West Virginia      Wisconsin        Wyoming 
             4              3              2              2              4 
Within cluster sum of squares by cluster:
[1] 6.128432 7.443899 9.326266 7.788275 18.257332
(between_SS / total_SS = 75.0 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
> plot(df)
> points(km$centers,col=1:5,pch=8,cex=2)
> print(cnt)
 1  2  3  4  5 
 7 10 10 11 12 
> head(final_data)
               Murder   Assault   UrbanPop         Rape cluster
Alabama    1.24256408 0.7828393 -0.5209066 -0.003416473       1
Alaska     0.50786248 1.1068225 -1.2117642  2.484202941       5
Arizona    0.07163341 1.4788032  0.9989801  1.042878388       5
Arkansas   0.23234938 0.2308680 -1.0735927 -0.184916602       4
California 0.27826823 1.2628144  1.7589234  2.067820292       5
Colorado   0.02571456 0.3988593  0.8608085  1.864967207       5
> plot(final_data,cex=0.6,main="Final Data")
> head(ag)
cluster Murder Assault UrbanPop Rape cluster
1 1 1.5803956 0.9662584 -0.7775109 0.04844071 1
2 2 -1.1727674 -1.2078573 -1.0045069 -1.10202608 2
3 3 -0.6286291 -0.4086988 0.9506200 -0.38883734 3
4 4 -0.1642225 -0.3658283 -0.2822467 -0.11697538 4
5 5 0.7298036 1.1188219 0.7571799 1.32135653 5
> plot(ag,cex=0.6,main="Aggregate")
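Note: centers=5 above is a fixed choice; a common check (not part of the recorded
output) is the elbow method, which plots the total within-cluster sum of squares
against k and looks for the bend. A minimal sketch on the same scaled df:

#total within-cluster SS for k = 1..10
wss=sapply(1:10,function(k) kmeans(df,centers=k,nstart=25)$tot.withinss)
plot(1:10,wss,type="b",pch=19,xlab="Number of clusters k",
     ylab="Total within-cluster SS",main="Elbow Method")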
3. HIERARCHICAL CLUSTERING
#Hierarchical Clustering
library(cluster)
df=USArrests
#remove rows with missing values
df=na.omit(df)
#scale each variable to have a mean 0 and sd of 1
df=scale(df)
head(df)
d=dist(df,method="euclidean")
#complete dendrogram
hc1=hclust(d,method="complete")
plot(hc1,cex=0.6,main="Complete Dendrogram",hang=-1)
#average dendrogram
hc2=hclust(d,method="average")
plot(hc2,cex=0.6,main="Average Dendrogram",hang=-1)
abline(h=3.0,col="green")
groups=cutree(hc2,k=4)
print(groups)
table(groups)
rect.hclust(hc2,k=4,border="red")
final_data=cbind(df,cluster=groups)
head(final_data)
plot(final_data,cex=0.6,main="Final Data")
OUTPUT:
#Hierarchical Clustering
> #remove rows with missing values
> #scale each variable to have a mean 0 and sd of 1
> head(df)
Murder Assault UrbanPop Rape
Alabama 1.24256408 0.7828393 -0.5209066 -0.003416473
Alaska 0.50786248 1.1068225 -1.2117642 2.484202941
Arizona 0.07163341 1.4788032 0.9989801 1.042878388
Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602
California 0.27826823 1.2628144 1.7589234 2.067820292
Colorado 0.02571456 0.3988593 0.8608085 1.864967207
> #complete dendrogram
> plot(hc1,cex=0.6,main="Complete Dendrogram",hang=-1)
> #average dendrogram
> print(groups)
       Alabama         Alaska        Arizona       Arkansas     California 
             1              2              3              4              3 
      Colorado    Connecticut       Delaware        Florida        Georgia 
             3              4              4              3              1 
        Hawaii          Idaho       Illinois        Indiana           Iowa 
             4              4              3              4              4 
        Kansas       Kentucky      Louisiana          Maine       Maryland 
             4              4              1              4              3 
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
             4              3              4              1              3 
       Montana       Nebraska         Nevada  New Hampshire     New Jersey 
             4              4              3              4              4 
    New Mexico       New York North Carolina   North Dakota           Ohio 
             3              3              1              4              4 
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
             4              4              4              4              1 
  South Dakota      Tennessee          Texas           Utah        Vermont 
             4              1              3              4              4 
      Virginia     Washington  West Virginia      Wisconsin        Wyoming 
             4              4              4              4              4 
> table(groups)
groups
 1  2  3  4 
 7  1 12 30 
> rect.hclust(hc2,k=4,border="red")
> final_data=cbind(df,cluster=groups)
> head(final_data)
               Murder   Assault   UrbanPop         Rape cluster
Alabama    1.24256408 0.7828393 -0.5209066 -0.003416473       1
Alaska     0.50786248 1.1068225 -1.2117642  2.484202941       2
Arizona    0.07163341 1.4788032  0.9989801  1.042878388       3
Arkansas   0.23234938 0.2308680 -1.0735927 -0.184916602       4
California 0.27826823 1.2628144  1.7589234  2.067820292       3
Colorado   0.02571456 0.3988593  0.8608085  1.864967207       3
> plot(final_data,cex=0.6,main="Final Data")
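Note: a possible extension (not part of the recorded output) is to compare linkage
methods by their agglomerative coefficient, which agnes() from the already-loaded
cluster package reports; values nearer 1 suggest stronger clustering structure.
A minimal sketch on the same scaled df:

#agglomerative coefficient for each linkage method
methods=c("average","single","complete","ward")
ac=sapply(methods,function(m) agnes(df,method=m)$ac)
print(ac)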
4. CLASSIFICATION ALGORITHM
#Classification Algorithm
library(class)
data(iris)
#Number of Rows and Columns
dim(iris)
head(iris)
rand=sample(1:nrow(iris),0.9*nrow(iris))
head(rand)
#Scale the values using Normalization method
nor<-function(x){
  return((x-min(x))/(max(x)-min(x)))
}
iris_norm=as.data.frame(lapply(iris[,c(1,2,3,4)],nor))
head(iris_norm)
#Train dataset
iris_train=iris_norm[rand,]
iris_train_target=iris[rand,5]
#Test dataset
iris_test=iris_norm[-rand,]
iris_test_target=iris[-rand,5]
dim(iris_train)
dim(iris_test)
#K-nearest neighbour Classification
model1=knn(train=iris_train,test=iris_test,cl=iris_train_target,k=7)
#Confusion Matrix
tab=table(model1,iris_test_target)
print(tab)
accuracy=function(x){
  sum(diag(x)/sum(rowSums(x)))*100
}
cat("Accuracy classifier=",accuracy(tab))
OUTPUT:
#Classification Algorithm
> #Number of Rows and Columns
> dim(iris)
[1] 150 5
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> head(rand)
[1] 114 107 25 128 14 24
> #Scale the values using Normalization method
> nor<-function(x){
+   return((x-min(x))/(max(x)-min(x)))
+ }
> head(iris_norm)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1   0.22222222   0.6250000   0.06779661  0.04166667
2   0.16666667   0.4166667   0.06779661  0.04166667
3   0.11111111   0.5000000   0.05084746  0.04166667
4   0.08333333   0.4583333   0.08474576  0.04166667
5   0.19444444   0.6666667   0.06779661  0.04166667
6   0.30555556   0.7916667   0.11864407  0.12500000
> #Train dataset
> iris_train=iris_norm[rand,]
> iris_train_target=iris[rand,5]
> #Test dataset
> iris_test=iris_norm[-rand,]
> iris_test_target=iris[-rand,5]
> dim(iris_train)
[1] 135 4
> dim(iris_test)
[1] 15 4
> #K-nearest neighbour Classification
> model1=knn(train=iris_train,test=iris_test,cl=iris_train_target,k=7)
> #Confusion Matrix
> print(tab)
            iris_test_target
model1       setosa versicolor virginica
  setosa          6          0         0
  versicolor      0          6         1
  virginica       0          0         2
> accuracy=function(x){
+   sum(diag(x)/sum(rowSums(x)))*100
+ }
> cat("Accuracy classifier=",accuracy(tab))
Accuracy classifier= 100
5. DECISION TREE
#Decision Tree
library(rpart)
data=iris
str(data)
head(data)
#creating the decision tree using regression
dtree=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=iris,method="anova")
plot(dtree,uniform=TRUE,main="Sepal Width Decision Tree Using Regression")
print(dtree)
text(dtree,use.n=TRUE,cex=.7)
#predicting the Sepal Width
adata<-data.frame(Species='versicolor',Sepal.Length=5.1,Petal.Length=4.5,Petal.Width=1.4)
cat("Predicted Value:\n")
pt=predict(dtree,adata,method="anova")
print(pt)
plot(pt)
#creating the decision tree using classification
df=as.data.frame(data)
dt=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=df,method="class")
plot(dt,uniform=TRUE,main="Sepal Width Decision Tree using Classification")
print(dt)
text(dt,use.n=TRUE,cex=.7)
OUTPUT:
> #Decision Tree
> head(data)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> #creating the decision tree using regression
> dtree=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=iris,method="anova")
> plot(dtree,uniform=TRUE,main="Sepal Width Decision Tree Using Regression")
> print(dtree)
n= 150
node), split, n, deviance, yval
* denotes terminal node
1) root 150 28.3069300 3.057333
2) Species=versicolor,virginica 100 10.9616000 2.872000
4) Petal.Length< 4.05 16 0.7975000 2.487500 *
5) Petal.Length>=4.05 84 7.3480950 2.945238
10) Petal.Width< 1.95 55 3.4920000 2.860000
20) Sepal.Length< 6.35 36 2.5588890 2.805556 *
21) Sepal.Length>=6.35 19 0.6242105 2.963158 *
11) Petal.Width>=1.95 29 2.6986210 3.106897
22) Petal.Length< 5.25 7 0.3285714 2.914286 *
23) Petal.Length>=5.25 22 2.0277270 3.168182 *
3) Species=setosa 50 7.0408000 3.428000
6) Sepal.Length< 5.05 28 2.0496430 3.203571 *
7) Sepal.Length>=5.05 22 1.7859090 3.713636 *
> text(dtree,use.n=TRUE,cex=.7)
> #predicting the Sepal Width
> cat("Predicted Value:\n")
Predicted Value:
> print(pt)
       1 
2.805556 
> plot(pt)
> #creating the decision tree using classification
> plot(dt,uniform=TRUE,main="Sepal Width Decision Tree using Classification")
> print(dt)
n= 150
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 150 124 3 (0.0067 0.02 0.027 0.02 0.053 0.033 0.06 0.093 0.067 0.17 0.073 0.087
0.04 0.08 0.04 0.027 0.02 0.04 0.013 0.0067 0.0067 0.0067 0.0067)
2) Petal.Width>=0.8 100 80 3 (0.01 0.03 0.03 0.03 0.08 0.05 0.09 0.14 0.09 0.2 0.07 0.08
0.04 0.03 0 0.01 0 0.02 0 0 0 0 0)
4) Sepal.Length< 6.45 65 55 2.8 (0.015 0.046 0.046 0.046 0.11 0.062 0.14 0.15 0.11 0.14
0.015 0.046 0.031 0.046 0 0 0 0 0 0 0 0 0)
8) Petal.Width< 1.95 56 47 2.7 (0.018 0.054 0.054 0.054 0.11 0.071 0.16 0.11 0.12 0.16
0.018 0.036 0.018 0.018 0 0 0 0 0 0 0 0 0)
16) Sepal.Length< 5.55 12 9 2.4 (0.083 0 0.17 0.25 0.25 0.083 0.083 0 0 0.083 0 0 0 0
0 0 0 0 0 0 0 0 0) *
17) Sepal.Length>=5.55 44 36 2.7 (0 0.068 0.023 0 0.068 0.068 0.18 0.14 0.16 0.18
0.023 0.045 0.023 0.023 0 0 0 0 0 0 0 0 0)
34) Petal.Width< 1.55 29 23 2.9 (0 0.1 0.034 0 0.069 0.1 0.1 0.17 0.21 0.17 0 0.034 0
0 0 0 0 0 0 0 0 0 0)
68) Sepal.Length>=5.95 15 11 2.9 (0 0.2 0.067 0 0.067 0.067 0 0.2 0.27 0.067 0
0.067 0 0 0 0 0 0 0 0 0 0 0) *
69) Sepal.Length< 5.95 14 10 3 (0 0 0 0 0.071 0.14 0.21 0.14 0.14 0.29 0 0 0 0 0 0 0
0 0 0 0 0 0) *
35) Petal.Width>=1.55 15 10 2.7 (0 0 0 0 0.067 0 0.33 0.067 0.067 0.2 0.067 0.067
0.067 0.067 0 0 0 0 0 0 0 0 0) *
9) Petal.Width>=1.95 9 5 2.8 (0 0 0 0 0.11 0 0 0.44 0 0 0 0.11 0.11 0.22 0 0 0 0 0 0 0 0
0) *
5) Sepal.Length>=6.45 35 24 3 (0 0 0 0 0.029 0.029 0 0.11 0.057 0.31 0.17 0.14 0.057 0 0
0.029 0 0.057 0 0 0 0 0) *
3) Petal.Width< 0.8 50 41 3.4 (0 0 0.02 0 0 0 0 0 0.02 0.12 0.08 0.1 0.04 0.18 0.12 0.06
0.06 0.08 0.04 0.02 0.02 0.02 0.02)
6) Sepal.Length< 4.95 20 15 3 (0 0 0.05 0 0 0 0 0 0.05 0.25 0.2 0.2 0 0.15 0 0.1 0 0 0 0 0
0 0)
12) Petal.Length< 1.45 13 8 3 (0 0 0.077 0 0 0 0 0 0.077 0.38 0 0.23 0 0.077 0 0.15 0 0 0
0 0 0 0) *
13) Petal.Length>=1.45 7 3 3.1 (0 0 0 0 0 0 0 0 0 0 0.57 0.14 0 0.29 0 0 0 0 0 0 0 0 0) *
7) Sepal.Length>=4.95 30 24 3.4 (0 0 0 0 0 0 0 0 0 0.033 0 0.033 0.067 0.2 0.2 0.033 0.1
0.13 0.067 0.033 0.033 0.033 0.033)
14) Petal.Length< 1.45 11 7 3.5 (0 0 0 0 0 0 0 0 0 0 0 0.091 0.091 0.091 0.36 0.091 0 0
0.091 0.091 0 0.091 0) *
15) Petal.Length>=1.45 19 14 3.4 (0 0 0 0 0 0 0 0 0 0.053 0 0 0.053 0.26 0.11 0 0.16
0.21 0.053 0 0.053 0 0.053) *
> text(dt,use.n=TRUE,cex=.7)
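Note: rpart also records a cross-validated error for every subtree, which can guide
pruning (not part of the recorded output); printcp(), plotcp() and prune() all come
with the rpart package. A minimal sketch on the regression tree above:

#complexity parameter table and plot for the regression tree
printcp(dtree)
plotcp(dtree)
#prune at the cp value with the lowest cross-validated error
best=dtree$cptable[which.min(dtree$cptable[,"xerror"]),"CP"]
pruned=prune(dtree,cp=best)
plot(pruned,uniform=TRUE,main="Pruned Tree")
text(pruned,use.n=TRUE,cex=.7)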
6. LINEAR REGRESSION
#Linear Regression
setwd("D:/R")
df=read.csv("h2.csv",header=TRUE)
print(df)
lr=lm(height~weight,data=df)
print(lr)
#Linear Regression
plot(df$height,df$weight,col="blue",main="Height_Weight Regression",
     cex=1.3,pch=15,xlab="height",ylab="weight")
print(summary(lr))
print(residuals(lr))
coeff=coefficients(lr)
eq=paste0("y = ",round(coeff[1],1)," + ",round(coeff[2],1),"*x")
print(eq)
#Linear Equation
new.weights=data.frame(weight=c(60,50))
print(new.weights)
df1=predict(lr,newdata=new.weights)
print(df1)
df2=data.frame(df1,new.weights)
names(df2)=c("height","weight")
print(df2)
df3=rbind(df,df2)
print(df3)
write.csv(df3,"h3.csv")
pie(table(df3$height))
OUTPUT:
> #Linear Regression
> setwd("D:/R")
> df=read.csv("h2.csv",header=TRUE)
> print(df)
  height weight
1    174     80
2    150     70
3    160     75
4    180     85
> lr=lm(height~weight,data=df)
> print(lr)
Call:
lm(formula = height ~ weight, data = df)

Coefficients:
(Intercept)       weight  
       4.80         2.08  
> #Linear Regression
> print(summary(lr))
Call:
lm(formula = height ~ weight, data = df)

Residuals:
   1    2    3    4 
 2.8 -0.4 -0.8 -1.6 
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.8000 16.4463 0.292 0.7979
weight 2.0800 0.2117 9.827 0.0102 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.366 on 2 degrees of freedom
Multiple R-squared: 0.9797, Adjusted R-squared: 0.9696
F-statistic: 96.57 on 1 and 2 DF, p-value: 0.0102
> print(residuals(lr))
1 2 3 4
2.8 -0.4 -0.8 -1.6
> print(eq)
[1] "y = 4.8 + 2.1*x"
> #Linear Equation
> print(new.weights)
weight
1 60
2 50
> print(df1)
1 2
129.6 108.8
> print(df2)
height weight
1 129.6 60
2 108.8 50
> df3=rbind(df,df2)
> print(df3)
   height weight
1   174.0     80
2   150.0     70
3   160.0     75
4   180.0     85
11  129.6     60
21  108.8     50
> write.csv(df3,"h3.csv")
> pie(table(df3$height))
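Note: a minimal sketch (not part of the recorded output) that overlays the fitted line
on the scatter plot, with weight on the x-axis so the axes match the height~weight
model, and adds 95% confidence intervals to the predictions:

#scatter of weight (x) vs height (y) with the fitted regression line
plot(df$weight,df$height,col="blue",pch=15,xlab="weight",ylab="height",
     main="Height~Weight with Fitted Line")
abline(lr,col="red")
#predictions for the new weights with 95% confidence intervals
predict(lr,newdata=new.weights,interval="confidence")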
7. DATA VISUALIZATION
#Data Visualization
X=iris
dim(X)
summary(X)
head(X)
hist(X$Sepal.Length,main='Histogram',col='green')
barplot(X$Sepal.Length[1:10],main='Barplot',col='red',xlab='Sepal.Length')
pie(table(X$Sepal.Length),main='pie-chart')
pairs(X)
plot(X$Sepal.Length,main='plot-chart',col='blue')
boxplot(X,main='Boxplot',col='yellow')
OUTPUT:
> #Data Visualization
> dim(X)
[1] 150 5
> summary(X)
  Sepal.Length    Sepal.Width     Petal.Length  
 Min.   :4.300   Min.   :2.000   Min.   :1.000  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600  
 Median :5.800   Median :3.000   Median :4.350  
 Mean   :5.843   Mean   :3.057   Mean   :3.758  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100  
 Max.   :7.900   Max.   :4.400   Max.   :6.900  
  Petal.Width          Species  
 Min.   :0.100   setosa    :50  
 1st Qu.:0.300   versicolor:50  
 Median :1.300   virginica :50  
 Mean   :1.199                  
 3rd Qu.:1.800                  
 Max.   :2.500                  
> head(X)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> hist(X$Sepal.Length,main='Histogram',col='green')
> barplot(X$Sepal.Length[1:10],main='Barplot',col='red',xlab='Sepal.Length')
> pie(table(X$Sepal.Length),main='pie-chart')
> pairs(X)
> plot(X$Sepal.Length,main='plot-chart',col='blue')
> boxplot(X,main='Boxplot',col='yellow')
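Note: the same figures can also be drawn with ggplot2 (assumed installed; it is not
used in the recorded practical). A minimal sketch of one scatter plot:

library(ggplot2)
#scatter of Sepal.Length vs Petal.Length coloured by Species
ggplot(X,aes(x=Sepal.Length,y=Petal.Length,color=Species))+
  geom_point()+
  ggtitle("Sepal vs Petal Length by Species")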