Introduction to R
Analytics and R Workshop
R Module - 1
Copyright 2017 : Anish Roychowdhury Jacob Minz
Agenda
• What is R?
• Understanding the R Studio IDE
• Preliminary Data Assignment and Math Operators
• Vectors and Matrices
• Data frames and Lists
• Initialization concepts
• File I/O – Reading and writing CSV data from files
• Module 1 Quiz
What is R?
Not just another alphabet!
R is a programming language and software environment for statistical computing and graphics
supported by the R Foundation for Statistical Computing.
History
R is an implementation of the S programming language combined with lexical scoping semantics inspired
by Scheme.[11] S was created by John Chambers while at Bell Labs.
R was created by Ross Ihaka and Robert Gentleman[13] at the University of Auckland, New Zealand, and is
currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after
the first names of the first two R authors and partly as a play on the name of S.[14]
Understanding the RStudio IDE
The RStudio window has four panes (labels from the slide's screenshot):
• Editor Window
• Variable Information
• Documentation results
• Command Line
Preliminary Data Assignment and Math Operators
# Comment line marker: text after '#' is a comment
# clear all data variables
rm(list=ls())
# Assignment operators: both <- and = assign a value
# basic operations
x <- 11; y <- 4
# add
z = x+y
# subtract
z = x-y
# multiply
z = x*y
# to the power
z = x^y
# modulo division remainder
> z = x%%y
> z
[1] 3
# integer divide
> z = x%/%y
> z
[1] 2
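As a quick check on the two division operators above, the sketch below confirms that the integer quotient and the remainder reassemble the original value. It reuses x and y from the slide; the names quotient, remainder and rebuilt are introduced here for illustration only.

```r
x <- 11; y <- 4

quotient  <- x %/% y   # integer divide: how many whole times y fits into x
remainder <- x %% y    # modulo: what is left over

# quotient * y + remainder should give back x
rebuilt <- quotient * y + remainder
print(rebuilt == x)
```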
Preliminary Data Assignment and Math Operators contd.
# Log and exponentials
vec <- (1:10)
# Natural log
z = log(vec)
# exponential
y = exp(z)
# Base 10 log
z = log(vec, base = 10)
> z
[1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980 0.9030900 0.9542425 1.0000000
# square root
z = sqrt(4)
# factorial
z = factorial(4)
# combinatorics nCr
n=5; r=3
num = choose(n,r)
> num
[1] 10
num2 = choose(n,n-r)
> num2
[1] 10
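The relationships between these functions can be checked directly. The following is a minimal sketch, not part of the original slides: it confirms that exp() reverses log(), that log(x, base = 10) matches the built-in log10(), and that choose(n, r) agrees with the factorial formula n! / (r! (n-r)!). The variable names round_trip and ncr_manual are illustrative.

```r
vec <- 1:10

# exp() undoes the natural log (compared with all.equal to allow for rounding)
round_trip <- exp(log(vec))
print(all.equal(round_trip, as.numeric(vec)))

# log with base = 10 is equivalent to log10()
print(all.equal(log(vec, base = 10), log10(vec)))

# nCr computed by hand from factorials
n <- 5; r <- 3
ncr_manual <- factorial(n) / (factorial(r) * factorial(n - r))
print(all.equal(ncr_manual, choose(n, r)))
```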
Preliminary Data Assignment and Math Operators contd.
# Rounding Numbers
x = 123.456
# normal rounding to 2 decimal places
z = round(x,digits = 2)
> z
[1] 123.46
# flooring
z = floor(x)
> z
[1] 123
# ceiling
z = ceiling(x)
> z
[1] 124
# truncating decimal part
z = trunc(x)
> z
[1] 123
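For a positive number, floor() and trunc() give the same answer, so the difference between them only shows up with negative values. The sketch below is an illustrative addition (the variable x_neg and the result names are not from the slides): floor() always moves toward minus infinity, while trunc() moves toward zero.

```r
x_neg <- -123.456

neg_floor   <- floor(x_neg)              # rounds down, away from zero: -124
neg_ceiling <- ceiling(x_neg)            # rounds up, toward zero: -123
neg_trunc   <- trunc(x_neg)              # drops the decimal part: -123
neg_round   <- round(x_neg, digits = 2)  # -123.46

print(c(neg_floor, neg_ceiling, neg_trunc, neg_round))
```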
Vectors
A vector is a sequence of data elements of the same basic type. Members in a vector are officially called
components.
# Define a Vector as arbitrary numbers
My_First_Vector <- c(12,4,4,6,9,3)
# Generating a vector using sequence of numbers with increment
My_Second_Vector = seq(from = 2.5, to = 5.0, by = 0.5)
# Note: both vectors are of the same length (6 components)
# linear operation on two vectors
My_Third_Vec = 10* My_First_Vector + 20*My_Second_Vector
> My_Third_Vec
[1] 170 100 110 140 180 130
# combining two vectors
First_and_Second <- c(My_First_Vector, My_Second_Vector)
> First_and_Second
[1] 12.0 4.0 4.0 6.0 9.0 3.0 2.5 3.0 3.5 4.0 4.5 5.0
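The note about equal lengths matters because R recycles the shorter vector when the lengths differ. The sketch below is an added illustration with hypothetical vectors long_vec and short_vec: the shorter vector is reused from its start until the longer one is covered.

```r
long_vec  <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is recycled as 10, 20, 10, 20 to match the length of long_vec
recycled_sum <- long_vec + short_vec
print(recycled_sum)
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning, so it is usually safer to keep the lengths equal as the slide does.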
More on Vectors
# repeat a vector 3 times
vec3 <- c(0,0,7)
Rvec3 <- rep(vec3,times=3)
> Rvec3
[1] 0 0 7 0 0 7 0 0 7
# Generating a vector using 'n' numbers equally spaced
vec2 = seq(from = 2.5, to = 7, length.out = 10)
> vec2
[1] 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
# Repeat individual occurrences of a vector specified number of times
Rvec321 <- rep(c(1,2,3),times = c(3,2,1))
> Rvec321
[1] 1 1 1 2 2 3
# Repeat each occurrence in a vector 'n' times
Rvecn <- rep(c(1,2,3),each=3)
> Rvecn
[1] 1 1 1 2 2 2 3 3 3
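To see how length.out relates to the by argument used earlier, the sketch below derives the step size by hand and builds the same sequence both ways. It also combines each and times in a single rep() call. The names step, vec2_by, vec2_len and rep_combo are illustrative and not from the slides.

```r
from <- 2.5; to <- 7; n <- 10

# n equally spaced points have n - 1 gaps between them
step <- (to - from) / (n - 1)

vec2_by  <- seq(from = from, to = to, by = step)
vec2_len <- seq(from = from, to = to, length.out = n)
print(all.equal(vec2_by, vec2_len))

# 'each' is applied first, then the result is repeated 'times' times
rep_combo <- rep(c(1, 2), each = 2, times = 2)
print(rep_combo)
```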
Logical Vectors
Player_1 <- c(10,34,54,78,99)
Player_2 <- c(4,24,67,49,100)
# Find out How Player 1 performed vs Player 2
Player_1.success <- Player_1 > Player_2
> Player_1.success
[1] TRUE TRUE FALSE TRUE FALSE
# Which matches did Player 1 win?
Player_1_win <- which(Player_1.success)
> Player_1_win
[1] 1 2 4
# What did Player 1 score in the matches Player 1 won?
P1_win_scores <- Player_1[Player_1_win]
> P1_win_scores
[1] 10 34 78
# How many matches did Player 1 win?
> sum(Player_1.success)
[1] 3
# Did Player 1 win any match?
> any(Player_1.success)
[1] TRUE
# Did Player 1 win all the matches?
> all(Player_1.success)
[1] FALSE
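A logical vector can also be used as an index directly, without going through which(). The sketch below is an added illustration using the same player scores; the names P1_win_scores_direct, win_fraction and P2_win_matches are assumptions introduced here. Because TRUE counts as 1 and FALSE as 0, mean() gives the fraction of matches won.

```r
Player_1 <- c(10, 34, 54, 78, 99)
Player_2 <- c(4, 24, 67, 49, 100)
Player_1.success <- Player_1 > Player_2

# index with the logical vector itself: keeps components where it is TRUE
P1_win_scores_direct <- Player_1[Player_1.success]
print(P1_win_scores_direct)

# fraction of matches Player 1 won
win_fraction <- mean(Player_1.success)
print(win_fraction)

# matches that Player 2 won
P2_win_matches <- which(Player_2 > Player_1)
print(P2_win_matches)
```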
Strings
# Define a string
x <- "Hello World"
# Get its length (a single string is a vector of length 1)
lenx = length(x)
> lenx
[1] 1
# How many characters in x ?
ncharx = nchar(x)
> ncharx
[1] 11
# Define a vector of 2 strings
y <- c("Hello","World")
# get its length
leny = length(y)
> leny
[1] 2
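Since length() counts components and nchar() counts characters, the two can be combined on a vector of strings. This is a small illustrative sketch; the names nchar_each and joined are not from the slides. nchar() works on each component, and paste() with collapse joins the components into one string.

```r
y <- c("Hello", "World")

# nchar() is applied to each component of the vector
nchar_each <- nchar(y)
print(nchar_each)

# join the two strings into one, separated by a space
joined <- paste(y, collapse = " ")
print(joined)
print(length(joined))  # one string, so length 1
print(nchar(joined))   # 11 characters, including the space
```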
Naming strings
# Create a vector month.days
month.days <- c(31,28,31,30,31,30,31,31,30,31,30,31)
> month.days
[1] 31 28 31 30 31 30 31 31 30 31 30 31
# Assign Month short names
mon.shortname <- c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
names(month.days) <- mon.shortname
# print name of the 5th month
> names(month.days[5])
[1] "May"
# print month names having days = 31
> names(month.days[month.days==31])
[1] "Jan" "Mar" "May" "Jul" "Aug" "Oct" "Dec"
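Once a vector has names, its components can be looked up by name as well as by position. The sketch below is an added example: it uses R's built-in constant month.abb for the short month names instead of typing them out, and the names feb_days, days_in_year and short_months are illustrative.

```r
month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
names(month.days) <- month.abb   # built-in "Jan" ... "Dec"

# look up a component by its name
feb_days <- month.days["Feb"]
print(feb_days)

# total days in a non-leap year
days_in_year <- sum(month.days)
print(days_in_year)

# names of the months with exactly 30 days
short_months <- names(month.days[month.days == 30])
print(short_months)
```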
Matrices
A matrix is a collection of data elements arranged in a two-dimensional rectangular layout.
The following is an example of a matrix with 2 rows and 3 columns.
A = | 1 2 3 |
    | 4 5 6 |
# Define a 2 x 3 matrix in R
# ('+' at the start of a line is the console's command continuation prompt)
> A = matrix(
+ c(2, 4, 3, 1, 5, 7), # the data elements
+ nrow=2, # number of rows
+ ncol=3, # number of columns
+ byrow = TRUE) # fill matrix by rows
> A # print the matrix
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    1    5    7
# Extract 2nd row 3rd column
A23 <- A[2, 3]
> A23
[1] 7
# Extract 2nd row as a vector
ARow2Vec <- A[2, ] # the 2nd row
> ARow2Vec
[1] 1 5 7
# Extracting a sub matrix
A2by2 <- A[1:2,1:2]
> A2by2
     [,1] [,2]
[1,]    2    4
[2,]    1    5
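A few more ways to inspect the same matrix, sketched here as an addition to the slide: dim() reports the shape, leaving the row index empty extracts a column, t() transposes, and rowSums() / colSums() total along each direction. The result names (dim_A, col2, At, row_totals, col_totals) are illustrative.

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)

dim_A <- dim(A)          # number of rows, number of columns
col2  <- A[, 2]          # the 2nd column as a vector
At    <- t(A)            # transpose: 3 rows, 2 columns

row_totals <- rowSums(A) # one total per row
col_totals <- colSums(A) # one total per column

print(dim_A)
print(col2)
print(dim(At))
print(row_totals)
print(col_totals)
```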
Data Frames
A data frame is used for storing data tables. It is a list of vectors of equal length. For
example, the following variable df is a data frame containing three vectors n, s, b.
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b) # df is a data frame
How the data frame would look: each vector becomes a column in the data frame
> df
  n  s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE
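Because each column of the data frame is a vector, the vector operations from earlier slides carry over. The sketch below is an illustrative addition: it pulls out a column with $, filters rows with a logical column, and inspects the structure with str(). The names n_col and true_rows are not from the slides.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)

# extract one column as a vector
n_col <- df$n
print(n_col)

# keep only the rows where column b is TRUE
true_rows <- df[df$b, ]
print(true_rows)

# compact summary of column names and types
str(df)
```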
Data Frames contd.
Viewing the first 6 rows of the built-in data frame "mtcars"
head(mtcars)
# extract a particular element with row and col names
> mtcars["Mazda RX4", "cyl"]
[1] 6
# Get number of Rows information
> nrow(mtcars)
[1] 32
# Get number of Columns information
> ncol(mtcars)
[1] 11
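The same information can be gathered in a few other ways. This is a minimal sketch using the built-in mtcars data set; the result names (first_three, dims, col_names, mazda_mpg) are illustrative. head() accepts an n argument to change how many rows are shown, and dim() returns the row and column counts together.

```r
# first 3 rows instead of the default 6
first_three <- head(mtcars, n = 3)
print(first_three)

# rows and columns in one call
dims <- dim(mtcars)
print(dims)

# column names of the data frame
col_names <- names(mtcars)
print(col_names)

# another element lookup by row and column name
mazda_mpg <- mtcars["Mazda RX4", "mpg"]
print(mazda_mpg)
```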
Lists
A list is a generic vector containing other objects. In the example below,
the variable x is a list containing copies of three vectors n, s, b,
and a numeric value 3.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
> x = list(n, s, b, 3) # x contains copies of n, s, b
How the List looks
[[1]] [1] 2 3 5
[[2]] [1] "aa" "bb" "cc" "dd" "ee"
[[3]] [1] TRUE FALSE TRUE FALSE FALSE
[[4]] [1] 3
Lists contd.
Complete List
[[1]] [1] 2 3 5
[[2]] [1] "aa" "bb" "cc" "dd" "ee"
[[3]] [1] TRUE FALSE TRUE FALSE FALSE
[[4]] [1] 3
Extracting a sub list from a given list
child_list <- x[c(2, 4)]
[[1]] [1] "aa" "bb" "cc" "dd" "ee"
[[2]] [1] 3
Slicing the list to extract a member (the result is itself a list of length 1)
Second_Elem_Slice <- x[2]
[[1]] [1] "aa" "bb" "cc" "dd" "ee"
Directly referencing a member of the list
Sec_Member <- x[[2]]
[1] "aa" "bb" "cc" "dd" "ee"
Directly referencing an item of a member of a list
Sec_Mem_First_Item <- x[[2]][1]
[1] "aa"
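The difference between single and double brackets is easiest to see by checking what each one returns. The sketch below is an added illustration on the same list x; the names slice and member are assumptions. Single brackets return a list, while double brackets return the member itself.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
b <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
x <- list(n, s, b, 3)

slice  <- x[2]     # single brackets: a list holding one member
member <- x[[2]]   # double brackets: the member itself

print(class(slice))    # a list
print(length(slice))   # containing 1 member
print(class(member))   # a character vector
print(length(member))  # with 5 components
```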
Initialization concepts
Assigning value to a variable
Var1 <- 5
Initialize a numeric vector of length 10
Vec_Size_10 <- vector(mode="numeric", length=10)
Initialize the vector with 5 repeats of '10' and then 5 repeats of '20'
Vec_Size_10 <- rep(c(10,20),each=5)
Create an empty dataframe
df_3col_5row <- as.data.frame(matrix(ncol=3, nrow=5))
# Initialize the first column to 1's, the 2nd col to 2's and the 3rd col to 3's
for (i in 1:5){
df_3col_5row[i,] <- c(1,2,3)
}
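The row-by-row loop can also be written without a loop. The sketch below is one possible alternative, not the slides' method: since matrix() fills column by column, repeating each value 5 times produces a column of 1's, a column of 2's and a column of 3's. The names df_loop, df_vec and same_values are illustrative.

```r
# loop version, as on the slide
df_loop <- as.data.frame(matrix(ncol = 3, nrow = 5))
for (i in 1:5) {
  df_loop[i, ] <- c(1, 2, 3)
}

# loop-free version: rep(..., each = 5) gives 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3,
# which matrix() lays out column by column
df_vec <- as.data.frame(matrix(rep(c(1, 2, 3), each = 5), ncol = 3))

# both approaches should hold the same values
same_values <- all(df_loop == df_vec)
print(same_values)
```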
List Initialization concepts
Create List column names
mylist.names <- c("COL_1", "COL_2", "COL_3")
Create empty list
mylist <- vector("list", length(mylist.names))
Initialize list with 3 Vectors of different length
mylist <- list(a=1, b=1:2, c=1:3)
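The names vector and the empty list above can be tied together so that each slot is filled by name. This is a minimal sketch under the assumption that the COL_1 to COL_3 names are meant as the list's member names; the slides do not show this step.

```r
mylist.names <- c("COL_1", "COL_2", "COL_3")

# empty list with one slot per name
mylist <- vector("list", length(mylist.names))
names(mylist) <- mylist.names

# fill each slot by name with vectors of different length
mylist$COL_1 <- 1
mylist$COL_2 <- 1:2
mylist$COL_3 <- 1:3

# lengths() reports the length of every member
print(lengths(mylist))
```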
Module 1 - Quiz
• All elements of data frames are vectors: TRUE
• All elements of lists must have the same length: FALSE
• Data frames are the most flexible structure in R: FALSE
Thank You