
Lab Manual

The document is a lab manual for a course on Data Sciences Using R at the SRM Institute of Science and Technology. It covers the installation of R and RStudio, basic programming concepts in R, data structures, and various data manipulation techniques. Additionally, it includes practical exercises on implementing functions, data frames, and statistical analysis using R.

Uploaded by

BALAJI
Copyright © All Rights Reserved
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
Deemed to be University u/s 3 of UGC Act, 1956
Directorate of Distance Education

MBRD2142 - Data Sciences Using R
LAB MANUAL

INDEX

S.No  List of Programs
1     Download and install the R programming environment and install basic packages using the install.packages() command in R.
2     Learn all the basics of R programming (data types, variables, operators etc.).
3     Implement R loops with different examples.
4     Learn the basics of functions in R and implement them with examples.
5     Implement data frames in R. Write a program to join columns and rows in a data frame using cbind() and rbind() in R.
6     Implement different string manipulation functions in R.
7     Implement different data structures in R (vectors, lists, data frames).
8     Write a program to read a CSV file and analyze the data in the file in R.
9     Create pie charts and bar charts using R.
10    Create a data set and do statistical analysis on the data using R.
11    Write an R program to find correlation and covariance.
12    Write an R program for regression modeling.
13    Write an R program to build a classification model using the KNN algorithm.
14    Write an R program to build a clustering model using the K-means algorithm.

Brief Introduction of R Programming Language:
R is an open-source programming language that is widely used as statistical software and as a data-analysis tool. R is generally used through a command-line interface, it is available on the widely used platforms (Windows, Linux and macOS), and it continues to be actively developed. It was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Core Team. The R language is an implementation of the S programming language, combined with lexical scoping semantics inspired by Scheme.
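The Scheme-style lexical scoping mentioned above means that a function looks up a free variable in the environment where the function was defined, not in the environment it is called from. A minimal sketch; the names f, g, make_counter and counter are illustrative only:

```r
# A global variable
x <- "global"

# f() has no local x, so it looks x up where f was defined (the global environment)
f <- function() x

# g() creates its own local x and then calls f()
g <- function() {
  x <- "local to g"
  f()
}

print(g())  # [1] "global"  (not "local to g")

# Closures follow from the same rule: the inner function keeps access to
# the environment of make_counter(), where count was created
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # <<- updates count in the enclosing environment
    count
  }
}

counter <- make_counter()
print(counter())  # [1] 1
print(counter())  # [1] 2
```

Under dynamic scoping, the call g() would instead return "local to g", because f() would see the x of its caller.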
The R project was conceived in 1992, with an initial version released in 1995 and a stable version (R 1.0.0) released in 2000.

Use of R Programming:
- It is a platform-independent language, so it runs on all major operating systems.
- It is a free, open-source language, so anyone can install it in any organization without purchasing a license.
- R is a leading tool for machine learning, statistics and data analysis. Objects, functions and packages can easily be created in R.
- R is not only a statistics package: it can also be integrated with other languages (C, C++), so it can interact easily with many data sources and statistical packages.
- R has a large user community that keeps growing.
- R is one of the programming languages most frequently requested in data-science job postings.

1. Installation of RStudio on Windows:
Step 1: With base R installed, move on to installing RStudio. Go to the RStudio download page and click the download button for RStudio Desktop.
Step 2: Click the link for the Windows version of RStudio and save the .exe file.
Step 3: Run the .exe file and follow the installation instructions:
- Click Next on the welcome window.
- Enter or browse to the installation folder and click Next.
- Select the folder for the Start menu shortcut, or click "Do not create shortcuts", and then click Next.
- Wait for the installation to complete, then click Finish.
Output:

Install the R Packages:
- First, run RStudio.
- Open the Packages tab and click Install. The Install Packages dialog box appears.
- In the Install Packages dialog, type the name of the package you want to install in the Packages field and click Install. This installs the package you searched for, or shows a list of matching packages based on the text you typed.
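The dialog steps above can also be done from the R console. A minimal sketch of a helper that downloads a package only when it is not already installed and then loads it; ensure_package is a hypothetical name (not part of base R), and install.packages() needs internet access and a CRAN mirror whenever a package is actually missing:

```r
# Hypothetical helper (not part of base R): install a package only if it is
# missing, then attach it to the current session
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE instead of raising an error when pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  # character.only = TRUE makes library() use the string stored in pkg
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so nothing is downloaded; library() just attaches it
ensure_package("stats")

# For a contributed package, the first call downloads it and later calls only load it:
# ensure_package("XML")
```

Checking with requireNamespace() first avoids re-downloading a package every time a script is run.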
Installing Packages:
    # General form: install a package by name
    install.packages("Package Name")
    # Install the package named "XML"
    install.packages("XML")

Loading Packages:
Once a package has been downloaded to your computer, you can access the functions and resources it provides in two ways: load the whole package with library(), or call a single function with the package::function() syntax.
    # Load the package to use in the current R session
    library("Package Name")

Getting Help on Packages:
    # List the help topics of the XML package
    help(package = "XML")

2. Learn all the basics of R programming (Data types, Variables, Operators etc.)
Program Description:
Variables are reserved memory locations that store values: when you create a variable, you reserve some space in memory. A variable gives a program named storage that it can manipulate. A variable in R can store an atomic vector, a group of atomic vectors, or a combination of many R objects. A valid variable name consists of letters, numbers and the dot or underscore characters, and it starts with a letter, or with a dot that is not followed by a number.

An operator is a symbol that tells the interpreter to perform a specific mathematical or logical manipulation. R is rich in built-in operators; the main types are shown below.

Data Types:
    # Numeric
    v <- 23.5
    print(class(v))
    # Logical
    v <- TRUE
    print(class(v))
    # Integer
    v <- 2L
    print(class(v))
Output:
    [1] "numeric"
    [1] "logical"
    [1] "integer"

R-objects:
The basic types of R objects are:
- Vectors
- Lists
- Matrices
- Arrays
- Factors
- Data Frames

Vectors
When you want to create a vector with more than one element, use the c() function, which combines the elements into a vector.
    # Create a vector.
    apple <- c('red', 'green', 'yellow')
    print(apple)
    # Get the class of the vector.
    print(class(apple))
Output:
    [1] "red"    "green"  "yellow"
    [1] "character"

Lists
A list is an R object that can contain elements of many different types, such as vectors, functions and even other lists.
    # Create a list.
    list1 <- list(c(2, 5, 3), 21.3, sin)
    # Print the list.
    print(list1)
Output:

Matrices
A matrix is a two-dimensional rectangular data set.
It can be created by passing a vector to the matrix() function.
    # Create a matrix.
    M <- matrix(c('a', 'a', 'b', 'c', 'b', 'a'), nrow = 2, ncol = 3, byrow = TRUE)
    print(M)
Output:

Arrays
While matrices are confined to two dimensions, arrays can have any number of dimensions. The array() function takes a dim attribute, which creates the required number of dimensions. In the example below we create an array of two elements, each of which is a 3x3 matrix.
    # Create an array.
    a <- array(c('green', 'yellow'), dim = c(3, 3, 2))
    print(a)
Output:

Factors
Factors are R objects created from a vector. A factor stores the vector together with the distinct values of its elements as labels. The labels are always character, whether the input vector is numeric, character or Boolean. Factors are useful in statistical modeling. They are created with the factor() function, and the nlevels() function gives the count of levels.
    apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')
    # Create a factor object.
    factor_apple <- factor(apple_colors)
    # Print the factor.
    print(factor_apple)
    print(nlevels(factor_apple))
Output:
    [1] green  green  yellow red    red    red    green
    Levels: green red yellow
    [1] 3

Variables:
Variables can be assigned values using the leftward, rightward and equal-to operators. The values of variables can be printed using the print() or cat() function. The cat() function combines multiple items into one continuous printed output.
    # Assignment using the equal operator.
    var.1 = c(0, 1, 2, 3)
    # Assignment using the leftward operator.
    var.2 <- c("learn", "R")
    # Assignment using the rightward operator.
    c(TRUE, 1) -> var.3
    print(var.1)
    cat("var.1 is ", var.1, "\n")
    cat("var.2 is ", var.2, "\n")
    cat("var.3 is ", var.3, "\n")
Output:
    [1] 0 1 2 3
    var.1 is  0 1 2 3
    var.2 is  learn R
    var.3 is  1 1

R Operators:
The types of operators used below are arithmetic, logical and assignment operators.

Arithmetic Operators
    [...]

Logical Operators
    v <- c(3, 1, TRUE, 2+3i)
    t <- c(4, 1, FALSE, 2+3i)
    print(v & t)

Assignment Operators
    v1 <- c(3, 1, TRUE, 2+3i)
    v2 <<- c(3, 1, TRUE, 2+3i)
    v3 = c(3, 1, TRUE, 2+3i)
    print(v1)
    print(v2)
    print(v3)
Output:

3. Implement R-Loops with different examples.
Program Description:
A for loop is the most commonly used control-flow statement. It iterates over the elements of a vector, running its body once for each element. It is similar to the while loop; the difference is that a for loop runs a fixed number of times (once per element of the sequence), whereas a while loop keeps running as long as its condition is TRUE, checking the condition before each pass through the body.
    # Create a fruit vector
    fruit <- c('Apple', 'Orange', 'Guava', 'Pineapple', 'Banana', 'Grapes')
    # Create the for statement
    for (i in fruit) {
        print(i)
    }
Output:

    # Create a matrix
    mat <- matrix(data = seq(10, 21, by = 1), nrow = 6, ncol = 2)
    # Loop with r (rows) and c (columns) to iterate over the matrix
    for (r in 1:nrow(mat))
        for (c in 1:ncol(mat))
            print(paste("mat[", r, ",", c, "] =", mat[r, c]))
    print(mat)
Output:

R while loop:
A while loop is a control-flow statement used to execute a block of code repeatedly. The loop terminates when its Boolean condition becomes FALSE. In a while loop, the condition is checked first and the body is executed afterwards, so if the body runs n times, the condition is checked n + 1 times (the last check returns FALSE and ends the loop).
    v <- c("Hello", "while loop")
    cnt <- 2
    while (cnt < 7) {
        print(v)
        cnt <- cnt + 1
    }
Output:

4. Learn the basics of functions in R and implement with examples.
Program Description:
A function is a set of statements organized together to perform a specific task. R has a large number of built-in functions, and users can also create their own functions.
In R, a function is an object, so the R interpreter can pass control to the function along with any arguments it needs to accomplish its actions. The function in turn performs its task and returns control to the interpreter, together with any result, which may be stored in other objects.

Built-in Function
    # Create a sequence of numbers from 32 to 44.
    print(seq(32, 44))
    # Find the mean of numbers from 25 to 82.
    print(mean(25:82))
    # Find the sum of numbers from 41 to 68.
    print(sum(41:68))
Output:
    [1] 32 33 34 35 36 37 38 39 40 41 42 43 44
    [1] 53.5
    [1] 1526

User-defined Function
We can create user-defined functions in R. They are specific to what a user wants, and once created they can be used like built-in functions. Below is an example of how a function is created and used.
    # Create a function to print squares of numbers in sequence.
    new.function <- function(a) {
        for (i in 1:a) {
            b <- i^2
            print(b)
        }
    }
[...]
    print(details)
Output:

Getting the details of people who joined on or after 2014:
    # Create a data frame from the CSV file.
    csv_data <- read.csv("record.csv")
    # Get the details of people who joined on or after 1 January 2014
    details <- subset(csv_data, as.Date(start_date) >= as.Date("2014-01-01"))
    print(details)
Output:

Writing into a CSV file:
    csv_data <- read.csv("record.csv")
    # Get the details of people who joined on or after 1 January 2014
    details <- subset(csv_data, as.Date(start_date) >= as.Date("2014-01-01"))
    # Write the filtered data into a new file (row.names = FALSE avoids an extra index column)
    write.csv(details, "output.csv", row.names = FALSE)
    new_details <- read.csv("output.csv")
    print(new_details)
Output:

9. Create pie charts and bar charts using R
Program Description:
A pie chart represents values as slices of a circle in different colors. The size of each slice is proportional to its value, and the slices are labeled.
    # Create data for the graph.
    geeks <- c(23, 56, 20, 63)
    labels <- c("Mumbai", "Pune", "Chennai", "Bangalore")
    # Plot the chart.
    pie(geeks, labels)
Output:

    # Create the data for the chart
    A <- c(17, 32, 8, 53, 1)
    # Plot the bar chart
    barplot(A, xlab = "X-axis", ylab = "Y-axis", main = "Bar-Chart")
Output:

10. Create a data set and do statistical analysis on the data using R
Program Description:
R provides easy and quick tools for turning data into insightful visual elements such as graphs. This program uses chickwts, a data set built into R that records chicken weights by feed type.
    # ? before a function or data set name opens its help page
    ?plot
    ?chickwts
    data(chickwts)                 # load the data into the workspace
    plot(chickwts$feed)            # plot the count of each feed type
    feeds <- table(chickwts$feed)  # frequency table of feed types
    # Bar chart with the bars in decreasing order
    barplot(feeds[order(feeds, decreasing = TRUE)])
Output:

11. Write R program to find Correlation and Covariance
Program Description:
Covariance indicates the direction of the linear relationship between two variables: it is positive when they tend to increase together and negative when one tends to decrease as the other increases, and its magnitude depends on the units of the variables. Correlation, on the other hand, is covariance rescaled to lie between -1 and 1, so it measures both the strength and the direction of the linear relationship.
    # R program to illustrate Pearson correlation using cor()
    # Two numeric vectors of the same length
    x <- c(1, 2, 3, 4, 5, 6, 7)
    y <- c(1, 3, 6, 2, 7, 4, 5)
    # Calculate the correlation coefficient using cor()
    result <- cor(x, y, method = "pearson")
    # Print the result
    cat("Pearson correlation coefficient is:", result)
Output:
    Pearson correlation coefficient is: 0.5357143

Covariance
    # Data vectors
    x [...]

[...] 7 from a given matrix.
4. Create the variables a = {1, 13, 16, 2, 3.5} and b = 2 + 3a. Create a scatterplot of a and b with the title "Linear function", the axis labels "a - explanatory variable" and "b - response variable", and the points colored green.
5. Write R code to fit a simple or multiple linear regression on the Housing data set from Kaggle to predict house prices.
Create code to import the data set credit.csv in R and:
a. Generate the following graphics:
   1. A histogram of the balance variable
   2. A boxplot of the income variable
   3. A barplot of the absolute frequencies of the student variable
6. Write code to perform EDA on the diamonds data set.
7. Write code to load and clean the sample data set to perform pre-processing.
