1.
Data Analytics with R: Take your first steps with R, data types, missing
values, basics of R syntax, The R workspace, Vectors, System- and user-
defined objects, Matrices, Lists, Functions, Statistics methodology, Factors
and Data frames, Basic Graphics.
• R is a powerful open-source programming language and software environment
designed for statistical computing, data analysis, and visualization. It is
widely used in research, academia, and industries like healthcare, finance,
marketing, and agriculture.
FEATURES OF R
• Statistical Analysis: R is designed specifically for statistical methods such as regression, ANOVA, and
hypothesis testing.
• Data Visualization: R can create high-quality plots and graphs using libraries like ggplot2, lattice, and
plotly.
• Data Manipulation: Packages like dplyr, tidyr, and data.table allow easy and fast data cleaning and
transformation.
• Wide Package Ecosystem: With over 19,000 packages on CRAN, R offers tools for most kinds of data
science, machine learning, and domain-specific work.
• Reproducible Reporting: R allows you to create documents that combine code, analysis, and results in one
file (for example, with R Markdown), which suits reports and research papers.
• Open Source & Free: R is completely free to use, making it accessible to anyone with an internet
connection.
• Cross-Platform Support: R runs on Windows, macOS, and Linux.
• Community Support: R has a large and active community for learning, troubleshooting, and improving
your skills.
R ENVIRONMENT
R is an integrated suite of software facilities for data manipulation, calculation and
graphical display.
It includes:
• An effective data handling and storage facility
• A suite of operators for calculations on arrays, in particular matrices
• A large, coherent, integrated collection of intermediate tools for data analysis
• Graphical facilities for data analysis and display either on-screen or on hardcopy
• A well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output
facilities
DATA TYPES
The most frequently used data structures in R are:
• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames
VARIABLE DECLARATION
• In R, a variable is created by assigning it a value with the assignment
operator <- (the conventional choice) or =. No separate declaration is needed.
• Numeric: x <- 3.14
• Integer: x <- 5L
• Character: name <- "Bob"
• Logical: flag <- FALSE
• Vector: nums <- c(1, 2, 3)
• List: mylist <- list(1, "a", TRUE)
• Data Frame: df <- data.frame(x=1:3, y=c("a", "b", "c"))
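One way to confirm which type each assignment produces is to inspect the object with class() and typeof(). The sketch below reuses the values from the list above; the names num_x and int_x are chosen here for illustration so that the two values do not overwrite each other.

```r
num_x <- 3.14      # numeric (stored as a double)
int_x <- 5L        # integer; the L suffix requests integer storage
name  <- "Bob"     # character
flag  <- FALSE     # logical

class(num_x)    # "numeric"
typeof(num_x)   # "double"
class(int_x)    # "integer"
class(name)     # "character"
class(flag)     # "logical"
class(5)        # "numeric": without the L suffix, 5 is stored as a double
```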
DATA TYPES
Vectors are one-dimensional arrays that hold elements of the same type (numeric,
character, or logical).
To create a vector with more than one element, use the c() function, which
combines its arguments into a single vector.
# Numeric vector (use 1:5 or c(1L, 2L, 3L) for integer storage)
numbers <- c(1, 2, 3, 4, 5)
# Character vector
names <- c("Alice", "Bob", "Charlie")
# Logical vector
flags <- c(TRUE, FALSE, TRUE)
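Because a vector holds a single type, c() converts mixed inputs to the most general type among them. A minimal sketch of that behaviour, with variable names (mixed, num_lgl) chosen here for illustration:

```r
mixed <- c(1, "a", TRUE)         # numeric, character, and logical inputs
class(mixed)                     # "character"
mixed                            # "1" "a" "TRUE"

num_lgl <- c(1.5, TRUE, FALSE)   # logical values become 1 and 0
class(num_lgl)                   # "numeric"
num_lgl                          # 1.5 1.0 0.0
```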
DATA TYPES
Accessing Elements:
numbers[1] # First element
names[2:3] # Second to third elements
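The indexing calls above can be tried as follows. This sketch recreates the two vectors from the earlier slide and adds a few further index forms (negative indices, an index vector, an out-of-range index) as illustrations.

```r
numbers <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie")

numbers[1]         # 1 (R indexing starts at 1, not 0)
names[2:3]         # "Bob" "Charlie"
numbers[c(1, 5)]   # 1 5: select several positions at once
numbers[-1]        # 2 3 4 5: a negative index drops that position
numbers[10]        # NA: an out-of-range index returns a missing value
```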
VECTOR OPERATIONS
v <- c(10, 20, 30, 40)
v + 2 # Add 2 to each element
v * 3 # Multiply each element by 3
sum(v) # Sum of elements
mean(v) # Mean (average)
length(v) # Number of elements
v > 25 # Logical vector: TRUE where an element is greater than 25
v[v > 25] # Filter elements > 25
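For reference, the operations above give the following results on v. The which() call at the end is an addition for illustration; it returns the positions, rather than the values, of the matching elements.

```r
v <- c(10, 20, 30, 40)

v + 2          # 12 22 32 42
v * 3          # 30 60 90 120
sum(v)         # 100
mean(v)        # 25
length(v)      # 4
v > 25         # FALSE FALSE TRUE TRUE
v[v > 25]      # 30 40
which(v > 25)  # 3 4
```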
DATA TYPES
LISTS
A list is an R object that can hold elements of different types, including
numbers, strings, vectors, functions, and even other lists.
my_list <- list(
id = 101,
name = "Akash",
scores = c(89, 95, 78),
passed = TRUE
)
LIST EXAMPLE
my_list <- list(Name="Alice", Age=25, Marks=c(80, 90, 85))
Accessing Elements
my_list$Name # By name (element names are case-sensitive)
my_list[[2]] # Second item
my_list$Marks[1] # First score
my_list$Age <- 26 # Modify
my_list$Passed <- TRUE # Add new element
my_list$Passed <- NULL # Remove element
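A short sketch of how the different access forms behave on this list. It illustrates two points that are easy to miss: single brackets return a sub-list while double brackets return the element itself, and element names are case-sensitive.

```r
my_list <- list(Name = "Alice", Age = 25, Marks = c(80, 90, 85))

my_list$Name              # "Alice"
my_list[["Age"]]          # 25: the element itself
class(my_list[["Age"]])   # "numeric"
class(my_list["Age"])     # "list": single brackets keep the list wrapper
my_list$name              # NULL: "name" does not match "Name"
names(my_list)            # "Name" "Age" "Marks"

my_list$Passed <- TRUE    # add an element
length(my_list)           # 4
my_list$Passed <- NULL    # remove it again
length(my_list)           # 3
```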
DATA TYPES
MATRICES
A matrix is a two-dimensional rectangular data set. It can be
created using a vector input to the matrix function.
MATRIX
# Create a matrix with numbers 1 to 9, 3 rows and 3 columns
mat <- matrix(1:9, nrow = 3, ncol = 3)
print(mat)
# Fill the matrix by row instead of by column (the default)
mat_by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
print(mat_by_row)
MATRIX OPERATIONS
rownames(mat) <- c("Row1", "Row2", "Row3")
colnames(mat) <- c("Col1", "Col2", "Col3")
print(mat)
mat[1, 2] # Element at 1st row, 2nd column
mat[ , 2] # Entire 2nd column
mat[3, ] # Entire 3rd row
mat2 <- matrix(2, nrow = 3, ncol = 3)
mat + mat2 # Addition
mat - mat2 # Subtraction
mat * mat2 # Element-wise multiplication
mat %*% mat2 # Matrix multiplication (rows times columns)
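The difference between * and %*% is easiest to see on concrete values. The sketch below rebuilds mat and mat2 as defined above and compares the first row of each result; the names elementwise and product, and the t() call, are additions for illustration.

```r
mat  <- matrix(1:9, nrow = 3, ncol = 3)   # filled column by column
mat2 <- matrix(2, nrow = 3, ncol = 3)     # every entry is 2

elementwise <- mat * mat2     # multiplies matching entries
product     <- mat %*% mat2   # sums row entries times column entries

mat[1, ]           # 1 4 7
elementwise[1, ]   # 2 8 14
product[1, ]       # 24 24 24, because (1 + 4 + 7) * 2 = 24
dim(product)       # 3 3
t(mat)[1, ]        # 1 2 3: t() transposes rows and columns
```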
ARRAYS
While matrices are confined to two dimensions, arrays can have any number
of dimensions. The array() function takes a dim argument that sets the size
of each dimension. For example, array(1:18, dim = c(3, 3, 2)) creates an array
that holds two 3x3 matrices.
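A minimal sketch of such an array, assuming the values 1 to 18 as filler data (the data on the original slide is not available). The third index selects which of the two 3x3 matrices to use.

```r
arr <- array(1:18, dim = c(3, 3, 2))   # 3 rows, 3 columns, 2 layers

dim(arr)        # 3 3 2
arr[, , 1]      # first 3x3 matrix, holding the values 1 to 9
arr[, , 2]      # second 3x3 matrix, holding the values 10 to 18
arr[2, 3, 2]    # 17: row 2, column 3 of the second matrix
```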
DATAFRAMES
Data frames are tabular data objects. Unlike a matrix, a data frame can hold a different
type of data in each column: the first column can be numeric, the second character, and
the third logical. Internally, a data frame is a list of vectors of equal length.
Data frames are created using the data.frame() function.
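A small sketch of a data frame with one column of each type described above. The students data and its column names are made up for illustration. stringsAsFactors = FALSE is set explicitly because R versions before 4.0 convert character columns to factors by default.

```r
students <- data.frame(
  score  = c(89.5, 76.0, 92.3),       # numeric column
  name   = c("Asha", "Ben", "Chen"),  # character column
  passed = c(TRUE, TRUE, TRUE),       # logical column
  stringsAsFactors = FALSE
)

sapply(students, class)   # score "numeric", name "character", passed "logical"
nrow(students)            # 3
ncol(students)            # 3
is.list(students)         # TRUE: a data frame is a list of equal-length columns
```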
DATA FRAME
# Create the sales data frame
sales_data <- data.frame(
Product = c("Laptop", "Tablet", "Smartphone", "Monitor", "Keyboard"),
Units_Sold = c(50, 70, 100, 40, 85),
Unit_Price = c(600, 300, 400, 150, 50)
)
# View the data
print(sales_data)
# Total Sales = Units_Sold * Unit_Price
sales_data$Total_Sales <- sales_data$Units_Sold * sales_data$Unit_Price
# Filter products with Total_Sales greater than 20,000
high_sales <- sales_data[sales_data$Total_Sales > 20000, ]
print(high_sales)
# Total revenue from all products
total_revenue <- sum(sales_data$Total_Sales)
# Average unit price
average_price <- mean(sales_data$Unit_Price)
print(paste("Total Revenue:", total_revenue))
print(paste("Average Price:", average_price))
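As a possible next step, the same data frame can be ranked by total sales with order(). This sketch rebuilds sales_data so it runs on its own; the names ranked and top_two are introduced here for illustration.

```r
sales_data <- data.frame(
  Product = c("Laptop", "Tablet", "Smartphone", "Monitor", "Keyboard"),
  Units_Sold = c(50, 70, 100, 40, 85),
  Unit_Price = c(600, 300, 400, 150, 50),
  stringsAsFactors = FALSE
)
sales_data$Total_Sales <- sales_data$Units_Sold * sales_data$Unit_Price

# order() returns the row positions that sort Total_Sales from high to low
ranked <- sales_data[order(sales_data$Total_Sales, decreasing = TRUE), ]
ranked$Product                 # "Smartphone" "Laptop" "Tablet" "Monitor" "Keyboard"

top_two <- head(ranked, 2)     # the two best-selling products
top_two$Total_Sales            # 40000 30000

sum(sales_data$Total_Sales)    # 101250
mean(sales_data$Unit_Price)    # 300
```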
FUNCTIONS IN R
• Functions are reusable blocks of code that perform
specific tasks.
my_function <- function(a, b) {
return(a + b)
}
my_function(5, 3)
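Function arguments can also have default values, and arguments can be passed by name. A minimal sketch, using a hypothetical function called power that is not part of the original slides:

```r
# exponent defaults to 2 when the caller does not supply it
power <- function(base, exponent = 2) {
  base^exponent   # the value of the last expression is returned
}

power(4)                        # 16
power(2, 3)                     # 8
power(exponent = 3, base = 2)   # 8: named arguments can come in any order
```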
FACTORS
• Factors are used to store categorical data like gender
or product types.
gender <- factor(c("male", "female", "female", "male"))
levels(gender)
FACTORS
• Factors are used to represent categorical variables (like gender, status,
product type). They store both the values and the levels.
fruits <- factor(c("apple", "orange", "banana", "apple", "banana"))
print(fruits)
levels(fruits) # Shows unique categories
summary(fruits)
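The sketch below shows what the calls above return for fruits, along with the integer codes a factor stores internally. The ordered factor sizes is a made-up example showing how to set the level order explicitly instead of relying on the default alphabetical order.

```r
fruits <- factor(c("apple", "orange", "banana", "apple", "banana"))

levels(fruits)       # "apple" "banana" "orange" (alphabetical by default)
summary(fruits)      # counts per level: apple 2, banana 2, orange 1
as.integer(fruits)   # 1 3 2 1 2: each value is stored as a level code

# An ordered factor with an explicit level order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]  # TRUE: "small" ranks below "large"
```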
FUNCTION
• A function is a block of code designed to perform a specific task. R supports user-
defined and built-in functions.
General form:
function_name <- function(arg1, arg2, ...) {
  # code block
  return(result)
}
Example:
add_numbers <- function(a, b) {
  result <- a + b
  return(result)
}
add_numbers(5, 3) # Output: 8
BASIC GRAPH
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)
plot(x, y, type = "p", main = "Scatter Plot")
plot(x, y, type = "l", col = "blue", main = "Line Graph")
slices <- c(10, 20, 30)
labels <- c("Math", "Science", "Arts")
pie(slices, labels = labels, main = "Pie Chart")
BASIC GRAPH
values <- c(10, 20, 15)
names <- c("A", "B", "C")
barplot(values, names.arg = names, col = "green", main = "Bar Chart")
data <- c(2, 3, 3, 4, 5, 6, 6, 7)
hist(data, col = "purple", main = "Histogram")
scores <- c(65, 70, 80, 85, 90, 95)
boxplot(scores, main = "Boxplot", col = "orange")
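The plots above appear on screen by default. To keep a plot as a file, one option is to open a file device such as pdf() before plotting and close it with dev.off() afterwards. The sketch below writes the bar chart to a temporary folder; the file name, the axis labels, and the variable bar_labels are assumptions made for this example.

```r
values <- c(10, 20, 15)
bar_labels <- c("A", "B", "C")

# Hypothetical output location inside R's temporary directory
out_file <- file.path(tempdir(), "bar_chart.pdf")

pdf(out_file)                       # open a PDF graphics device
barplot(values, names.arg = bar_labels, col = "green",
        main = "Bar Chart", xlab = "Category", ylab = "Value")
invisible(dev.off())                # close the device so the file is written

file.exists(out_file)               # TRUE
```

The same pattern works with png() or jpeg() in place of pdf(), provided the R installation supports those devices.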
BUILT-IN FUNCTIONS
• mean(): Calculate average
• median(): Middle value
• sum(): Total sum
• sd(): Standard deviation
• length(): Number of elements
• min(): Minimum value
• max(): Maximum value
• sort(): Sort values
• round(): Round numbers
• sqrt(): Square root
BUILT-IN FUNCTIONS - EXAMPLE
scores <- c(88, 92, 79, 85, 90, 95, 87, 78, 94, 89)
# Apply built-in functions
cat("Scores:", scores, "\n")
cat("Total number of scores:", length(scores), "\n")
cat("Sum of scores:", sum(scores), "\n")
cat("Mean (average):", mean(scores), "\n")
cat("Median:", median(scores), "\n")
cat("Standard Deviation:", sd(scores), "\n")
cat("Minimum score:", min(scores), "\n")
cat("Maximum score:", max(scores), "\n")
cat("Range of scores:", range(scores), "\n")
cat("Sorted scores:", sort(scores), "\n")
cat("Summary of scores:\n")
print(summary(scores))
Output:
Scores: 88 92 79 85 90 95 87 78 94 89
Total number of scores: 10
Sum of scores: 877
Mean (average): 87.7
Median: 88.5
Standard Deviation: 5.735852
Minimum score: 78
Maximum score: 95
Range of scores: 78 95
Sorted scores: 78 79 85 87 88 89 90 92 94 95
Summary of scores:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   78.0    85.5    88.5    87.7    91.5    95.0