Materi 4

The document outlines the use of R programming for data analysis, highlighting its advantages over Excel, such as handling large datasets and advanced visualization capabilities. It includes instructions on importing Excel data into R, manipulating DataFrames, handling missing values, and utilizing looping structures. Additionally, it provides examples of R functions and flowcharts for calculating student scores and averages.

Week #4 R Programming

External Data & Looping

Dr. Azhari, MT

Department of Computer Science & Electronics

Faculty of Mathematics and Natural Sciences
Universitas Gadjah Mada
Sample Data: Professional Salary Survey

1 2019_Data_Professional_Salary_Survey_Responses.xlsx
2 Data_Professional_Salary_Survey_Responses.xlsx

Source: https://data.world/finance/data-professional-salary-survey
R programming over Excel for Data Analysis
1 R can handle very large datasets
2 R can automate and calculate much faster than Excel
3 R source code is reproducible
4 Community libraries of R source code are available to all
5 R provides more complex and advanced data visualization
6 R is free, Excel is not

Source: https://www.gapintelligence.com/blog/understanding-r-programming-over-excel-for-data-analysis/
R Package for Reading Excel Data

1 R packages are collections of functions and data sets developed by the community. They increase the power of R by improving existing base R functionalities, or by adding new ones. For example, if you usually work with data frames, you have probably heard about dplyr or data.table, two of the most popular R packages.

2 A package is a suitable way to organize your own work and, if you want to, share it with others. Typically, a package will include code (not only R code!), documentation for the package and the functions inside, some tests to check everything works as it should, and data sets.
Import Excel File into R
1. Click File
2. Click Import Dataset
3. Click From Excel
4. Browse/select your file
5. (Preview area, which displays the contents of your Excel file)
6. (The R instructions generated by RStudio)
7. Click Import
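The import steps can also be written as a script. The sketch below assumes the readxl package, which the slides do not name; the helper name import_survey and the file path in the usage note are hypothetical.

```r
# Minimal sketch: read one sheet of an Excel workbook into a data frame.
# Assumption: the readxl package is installed (install.packages("readxl")).
import_survey <- function(path, sheet = 1) {
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("Package 'readxl' is required: install.packages('readxl')")
  }
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # read_excel() returns a tibble; convert it to a plain data frame
  as.data.frame(readxl::read_excel(path, sheet = sheet))
}
```

A call such as import_survey("Data_Professional_Salary_Survey_Responses.xlsx") would then load the sample workbook, provided the file sits in the working directory.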
Replace Values in a DataFrame in R

▪ DataFrames are generic data objects of R which are used to store tabular data.
▪ Data frames are considered to be the most popular data objects in R programming because it is more comfortable to analyze data in tabular form.
▪ Data frames can also be thought of as matrices where each column can be of a different data type.
▪ A DataFrame is made up of three principal components: the data, rows, and columns.

https://www.geeksforgeeks.org/dataframe-operations-in-r/
Replace Values in a DataFrame in R
(1) Replace a value across the entire DataFrame:

df[df == "Old Value"] <- "New Value"

(2) Replace a value under a single DataFrame column:

df["Column Name"][df["Column Name"] == "Old Value"] <- "New Value"

https://datatofish.com/replace-values-dataframe-r/
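As a small worked case of the two patterns above, the sketch below applies them to a made-up data frame; the column names and values are assumptions chosen only for illustration.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(
  City  = c("Bandung", "Jogja", "Bandung"),
  Grade = c("B", "A", "Old Value"),
  stringsAsFactors = FALSE
)

# (1) Replace a value wherever it occurs in the data frame
df[df == "Jogja"] <- "Yogyakarta"

# (2) Replace a value in a single column only
df["Grade"][df["Grade"] == "Old Value"] <- "New Value"

print(df)
```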
R Missing Value | NA

1 In R, missing values are coded by the symbol NA. To identify missing values in your dataset, use the function is.na(). When you import a dataset from other statistical applications, the missing values might be coded with a number, for example 99.

2 NA is a valid logical object. Where a component of x or y is NA, the result will be NA if the outcome is ambiguous. In other words, NA & TRUE evaluates to NA, but NA & FALSE evaluates to FALSE.

3 Missing values are inevitable in data science, and handling them is a constant issue. In Boolean logic, NA can behave quite differently depending on the operator and the value of the other operand, unlike a lot of other data types. Whether this is useful or not depends on the scenario, but the behavior is something to keep in mind.

cat("\n\n")
ID <- c(1,2,3,4,5,6,7,8)
Name <- c("John", "Tim", NA, "Stone", "Andini", "Rossi", NA, NA)
Sex <- c("male", "male", "female", "male", "female", "male", "female", "male")
Age <- c(52, 23, 20, 21, 23, NA, NA, 19)
Salary <- c(3520.2, NA, 2890.3, 3025.2, 3320.5, 2985.8, NA, 2020.8)
dtFriend <- data.frame(ID, Name, Sex, Age, Salary)
print(dtFriend)
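The logical behavior of NA described above can be checked directly. This is a minimal sketch; the variable names are made up for illustration.

```r
# NA combined with the logical operators & and |
andTrue  <- NA & TRUE    # NA: the answer depends on the unknown value
andFalse <- NA & FALSE   # FALSE: the answer is FALSE whatever the unknown value is
orTrue   <- NA | TRUE    # TRUE: the answer is TRUE whatever the unknown value is
orFalse  <- NA | FALSE   # NA: the answer depends on the unknown value
print(c(andTrue, andFalse, orTrue, orFalse))

# Comparing with NA using == gives NA, so use is.na() to find missing values
age <- c(52, NA, 20)
print(age == NA)     # every element is NA
print(is.na(age))    # FALSE TRUE FALSE
```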
Functions max() and min() in R
vHighestValue <- max(x, na.rm = FALSE)
vLowestValue <- min(x, na.rm = FALSE)

▪ x = a vector or a data frame.
▪ na.rm = whether to remove NA values. If FALSE (the default), NA values are kept and the result is NA when x contains any NA; if TRUE, NA values are removed from the vector or data frame before the result is computed.

cat("\n\n")
#creates a vector
midTestScore <- c(78.8, 65.0, 78.9, 84, 92.1, 73.2, 58.9, 87.6, 88.3)
print(midTestScore)
#returns the max value & min value present in the vector
maxScore <- max(midTestScore)
minScore <- min(midTestScore)
cat(
  "\nThe highest Score : ", maxScore,
  "\nThe lowest Score : ", minScore,
  "\n"
)

cat("\n\n")
#creates a vector that contains missing values
finalTestScore <- c(88.8, 77.0, NA, 86, 94.6, 72.2, NA, 80.3, 88.8)
print(finalTestScore)
#min() without na.rm = TRUE returns NA because the vector contains NA
minScore <- min(finalTestScore)
#max() with na.rm = TRUE ignores the NA values
maxScore <- max(finalTestScore, na.rm = TRUE)
cat(
  "\nThe lowest Score : ", minScore,
  "\nThe highest Score : ", maxScore,
  "\n"
)
Change & Update Item value of Dataframe

1 Create the data frame:
cat("\n\n")
ID <- c(1,2,3,4,5,6,7,8)
Name <- c("John", "Tim", NA, "Stone", "Andini", "Rossi", NA, NA)
Sex <- c("male", "male", "female", "male", "female", "male", "female", "male")
Age <- c(52, 23, 20, 21, 23, NA, NA, 19)
Salary <- c(3520.2, NA, 2890.3, 3025.2, 3320.5, 2985.8, NA, 2020.8)
dtFriend <- data.frame(ID, Name, Sex, Age, Salary)
print(dtFriend)

2 Check and count the missing values:
t1CheckofNA <- is.na(dtFriend)
t2CountofNA <- sum(is.na(dtFriend))
t3CountAveofNA <- mean(is.na(dtFriend))
t4MaxofSalary <- max(dtFriend$Salary)
t5MaxofSalary <- max(dtFriend$Salary, na.rm = TRUE)

3 Print the results:
cat("\nt1 Check missing values :", t1CheckofNA,
    "\nt2 the number of missing values:", t2CountofNA,
    "\nt3 the proportion of missing values:", t3CountAveofNA,
    "\nt4 the maximum of salary with missing values kept:", t4MaxofSalary,
    "\nt5 the maximum of salary with missing values removed:", t5MaxofSalary,
    "\n"
)

4 Update the missing values and correct the item data:
#update missing values, correct the item data
dtFriend$Name[dtFriend$ID == 3] <- "Andri"
dtFriend$Name[dtFriend$ID == 7] <- "Anna"
dtFriend$Salary[dtFriend$Name == "Tim"] <- 1818.5
dtFriend$Age[dtFriend$Age == 52] <- 25
dtFriend$Age[dtFriend$ID == 6] <- 20
print(dtFriend)
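Counting missing values per column, rather than for the whole data frame, is often the next step. The sketch below uses a shortened, made-up version of the friends data; colSums() and complete.cases() are base R functions not used in the slides.

```r
# Shortened illustrative data frame with one NA in each of three columns
dtFriend <- data.frame(
  ID     = 1:4,
  Name   = c("John", "Tim", NA, "Stone"),
  Age    = c(52, 23, 20, NA),
  Salary = c(3520.2, NA, 2890.3, 3025.2)
)

# Number of missing values in each column
naPerColumn <- colSums(is.na(dtFriend))
print(naPerColumn)

# Keep only the rows that have no missing value at all
completeRows <- dtFriend[complete.cases(dtFriend), ]
print(completeRows)
```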
Functions max() and min() in R
cat("\n\n")
#creates a character vector with some names
ComunityFriends <- c('John','Angelina','Smuts','Garena','Lucifer', 'Andini')
#for character vectors, min() and max() return the first and last name in sort order
firstOrderedName <- min(ComunityFriends)
lastOrderedName <- max(ComunityFriends)
print(ComunityFriends)
cat(
  "\nEarliest Ordered Name :", firstOrderedName,
  "\nLatest Ordered Name :", lastOrderedName,
  "\n"
)
R Looping Structure

In R, for loops take an iterator variable and assign it successive values from a sequence or vector. For loops are most commonly used for iterating over the elements of an object (list, vector, etc.).

1 for (elementdata in Listofdata) {
    instruction 1
    instruction 2
    :
  }

2 while ( condition ) {
    instruction 1
    instruction 2
    :
  }

3 repeat {
    statement/instruction 1
    statement/instruction 2
    :
    if( condition ) {
      break
    }
  }
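The while and repeat templates can be made concrete as follows. This is a minimal sketch; the variable names and stopping values are assumptions for illustration.

```r
# while: the condition is tested before every iteration
count <- 1
while (count <= 3) {
  cat("while iteration", count, "\n")
  count <- count + 1
}

# repeat: loops forever until break is reached
total <- 0
step <- 0
repeat {
  step <- step + 1
  total <- total + step
  if (total >= 10) {
    break
  }
}
cat("Stopped after", step, "steps with total", total, "\n")
```

A while loop may run zero times, whereas a repeat loop always runs its body at least once before the break condition is checked.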
Example: R Looping Structure
list_days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
list_months <- list("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
list_years <- c(2021, 2022, 2023)
cat("\n")
for (aDay in list_days) {
  print(aDay)
}
for (aMonth in list_months) {
  cat(aMonth, " ")
  if (aMonth == "Jun") cat("\n")
}
cat("\n\n")
for (year in list_years) {
  for (aMonth in list_months) {
    cat(aMonth, " ", year, "\n")
  }
}

# Nested for loops over two vectors
qualities <- c('funny', 'cute', 'friendly')
animals <- c('koala', 'cat', 'dog', 'panda')
for (x in qualities) {
  for (y in animals) {
    print(paste(x, y))
  }
}

# Average of two exams and the row(s) with the highest average
statisticsClass <- data.frame(
  Exam1 = c(97,91,33,86,40,48,27,53,58,31),
  Exam2 = c(68,85,27,43,46,100,92,73,98,91),
  Quiz = c(75,90,88,81,78,87,90,69,NA,NA)
)
print(statisticsClass)
statisticsClass$avg <- (statisticsClass$Exam1 + statisticsClass$Exam2)/2
max.scores <- statisticsClass[statisticsClass$avg==max(statisticsClass$avg),]
print(max.scores)
R If-then-else Structure

# if
if (condition is true) {
do something
:
}

# if ... else
if (condition is true) {
do something
:
} else { # that is, if the condition is false,
do something different
:
}
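A runnable instance of the if ... else template is sketched below. The score values are made up, and the threshold of 80 is borrowed from the grouping example later in these slides; ifelse() is the vectorized base R counterpart.

```r
# Classify a single score (hypothetical value)
studentScore <- 79.0
if (studentScore >= 80) {
  group <- "A"
} else {
  group <- "B"
}
cat("Score", studentScore, "belongs to group", group, "\n")

# ifelse() applies the same test to every element of a vector
scores <- c(82.5, 79.0, 88.2)
groups <- ifelse(scores >= 80, "A", "B")
print(groups)
```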
R Looping: Examples

1 Counting the number of students

ScoreAccountingClass <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)
nStudents <- 0
for (studentScore in ScoreAccountingClass) {
  print(studentScore)
  nStudents <- nStudents + 1
}
cat("\nNumber of Students :", nStudents, "\n")

2 Counting the number of students who have score >= 80, and who have score < 80 (below 80)

cat("\n\n")
ScoreAccountingClass <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)
nStudentsA <- 0
nStudentsB <- 0
for (studentScore in ScoreAccountingClass) {
  print(studentScore)
  if (studentScore>=80) {
    nStudentsA <- nStudentsA + 1
  } else {
    nStudentsB <- nStudentsB + 1
  }
}
cat("\nNumber of Students Group A:", nStudentsA,
    "\nNumber of Students Group B:", nStudentsB,
    "\nTotal Students (Group A + Group B) :", nStudentsA + nStudentsB,
    "\n"
)
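For comparison, the same counts can be obtained without an explicit loop. This sketch relies on base R's length() and on sum() applied to a logical vector, where each TRUE counts as 1; it is offered as a cross-check on the loop results, not as a replacement for the looping exercise.

```r
ScoreAccountingClass <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)

# length() gives the number of elements, i.e. the number of students
nStudents <- length(ScoreAccountingClass)

# A comparison yields TRUE/FALSE per student; sum() counts the TRUE values
nStudentsA <- sum(ScoreAccountingClass >= 80)
nStudentsB <- sum(ScoreAccountingClass < 80)

cat("Students:", nStudents,
    " Group A:", nStudentsA,
    " Group B:", nStudentsB, "\n")
```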
Example Flow Chart
1 Flowchart to calculate the total number of students

[Flowchart: Start -> input dataSetScore -> nStudents <- 0 -> loop (for each studentScore of dataSetScore): nStudents <- nStudents + 1 -> output nStudents -> Stop]

Example Flow Chart
2 Flowchart to calculate the total number of students who have an Accounting score greater than or equal to 80 (score Accounting >= 80), and the total number of students who have an Accounting score below 80 (score Accounting < 80)

[Flowchart: Start -> input dataSetScore -> nStudentsA <- 0, nStudentsB <- 0 -> loop (for each studentScore of dataSetScore): if studentScore >= 80 then nStudentsA <- nStudentsA + 1, otherwise nStudentsB <- nStudentsB + 1 -> output nStudentsA, nStudentsB -> Stop]

R Looping: Examples
3 Counting the number of students, and calculating the total score and average score of ScorepfAccountingClass

ScorepfAccountingClass <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)

# Initialize the count variable and the total variable with zero
nStudents <- 0
totScores <- 0

for (studentScore in ScorepfAccountingClass) {
  print(studentScore)
  # In each iteration the count is increased by one
  # and the student's score is added to the total
  nStudents <- nStudents + 1
  totScores <- totScores + studentScore
}
# Calculate the average outside the for loop block
AverageScore <- (totScores/nStudents)

cat("\nAccounting Mid Class Score")
cat("\n", ScorepfAccountingClass,
    "\nNumber of Students: ", nStudents,
    "\nTotal of Scores: ", totScores,
    "\nAverage of Mid test: ", AverageScore,
    "\n"
)
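The loop result can be cross-checked against base R's built-in sum() and mean(). This is a small sketch; the vector name scores is an assumption, while the values are the class scores used above.

```r
scores <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)

# Loop-based total and count, as in the example
nStudents <- 0
totScores <- 0
for (studentScore in scores) {
  nStudents <- nStudents + 1
  totScores <- totScores + studentScore
}
AverageScore <- totScores / nStudents

# Built-in equivalents; all.equal() compares with a small numeric tolerance
print(isTRUE(all.equal(totScores, sum(scores))))
print(isTRUE(all.equal(AverageScore, mean(scores))))
```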
Example Flow Chart
3 Flowchart to calculate the total score, number of students, and average score

[Flowchart: Start -> input dataSetScore -> nStudents <- 0, totScores <- 0 -> loop (for each studentScore of dataSetScore): nStudents <- nStudents + 1, totScores <- totScores + studentScore -> AverageScore <- (totScores/nStudents) -> output nStudents, totScores -> Stop]
Flow Chart 3 Example: flowchart to calculate the total score, number of students, and average score

[Flowchart: Start -> input dataSetScore -> nStudents <- 0, totScores <- 0 -> loop (for each studentScore of dataSetScore): nStudents <- nStudents + 1, totScores <- totScores + studentScore -> averageScore <- (totScores/nStudents) -> output nStudents, totScores, averageScore -> Stop]

Flowchart Symbols
Start / Stop : start and stop symbol
StudentName, StudentAddress : input/output symbol
Total <- 25 * discount : processing (calculating) symbol
City == "Bandung" ? : filtering (conditional, control) symbol
Calculate Correlation() : subprogram (sub-processing) symbol
Arrow : flow symbol (to next)
Loop : looping (for) symbol (for start, end)
Install package from local source
install.packages(path_to_source, repos = NULL, type = "source")
install.packages("~/Downloads/dplyr-master.zip", repos = NULL, type = "source")

# Install from the default repository (CRAN)
install.packages("tidyr")

# Check, count, and average the missing values of a data frame
Name <- c("John", "Tim", NA)
Sex <- c("men", "men", "women")
Age <- c(45, 53, NA)
dt <- data.frame(Name, Sex, Age)
print(dt)
is.na(dt)
sum(is.na(dt))
mean(is.na(dt))
# Recode a numeric missing-value code (for example 99) as NA
dt$Age[dt$Age == 99] <- NA

https://datascienceplus.com/missing-values-in-r/
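When a dataset uses a number such as 99 for missing values, recoding it before any calculation avoids distorted results. The sketch below uses made-up ages; wrapping the comparison in which() is one common way to skip elements that are already NA.

```r
# Hypothetical ages where 99 stands for "missing"
Age <- c(45, 53, 99, 31)

# The mean is distorted while 99 is still treated as a real age
meanBefore <- mean(Age)

# which() returns only the positions where the test is TRUE
Age[which(Age == 99)] <- NA

# After recoding, drop the NA when computing the mean
meanAfter <- mean(Age, na.rm = TRUE)
cat("Mean before recoding:", meanBefore,
    " Mean after recoding:", meanAfter, "\n")
```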
How to Analyze a Single Variable using Graphs in R? | DataScience+ (datascienceplus.com)

There are 4 types of plots that we can use to observe single-variable data:
· Histograms
· Index plots
· Time-series plots
· Pie Charts

# How to create Histogram in R
# by Michaelino Mervisiano
datavar <- rnorm(1000, 2.5)
hist(datavar, main = "Awesome Histogram",
     col = "Blue", prob = TRUE,
     xlab = "Random Numbers from a Normal Distribution with Mean 2.5")

https://datascienceplus.com/how-to-analyses-a-single-variable-using-graphs-in-r/
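The other three plot types can be drawn with base R graphics in a similar way. This is a minimal sketch with made-up data; the seed, titles, and group counts are assumptions for illustration.

```r
# Fixed seed so the random data are repeatable (an assumption for this sketch)
set.seed(42)
datavar <- rnorm(100, 2.5)

# Index plot: each value plotted against its position in the vector
plot(datavar, main = "Index Plot", xlab = "Index", ylab = "Value")

# Time-series plot: the same values treated as an ordered series
plot(ts(datavar), main = "Time-Series Plot", ylab = "Value")

# Pie chart: shares of a categorical variable (hypothetical group counts)
groupCounts <- c(A = 5, B = 2)
pie(groupCounts, main = "Students per Group")
```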
