
Materi 4

The document outlines the use of R programming for data analysis, highlighting its advantages over Excel, such as handling large datasets and advanced visualization capabilities. It includes instructions on importing Excel data into R, manipulating DataFrames, handling missing values, and utilizing looping structures. Additionally, it provides examples of R functions and flowcharts for calculating student scores and averages.

Week #4 R Programming

External Data & Looping

Dr. Azhari, MT

Department of Computer Science & Electronics


Faculty of Mathematics and Natural Sciences
Universitas Gadjah Mada
Sample Data: Professional Salary Survey

Source: https://data.world/finance/data-professional-salary-survey
1 2019_Data_Professional_Salary_Survey_Responses.xlsx
2 Data_Professional_Salary_Survey_Responses.xlsx
R programming over Excel for Data Analysis
1 R can handle very large datasets
2 R can automate and calculate much faster than Excel
3 R source code is reproducible
4 Libraries of community-written R source code are available to all
5 R provides more complex and advanced data visualization
6 R is free, Excel is not


Source: https://www.gapintelligence.com/blog/understanding-r-programming-over-excel-for-data-analysis/
R Package for Reading Excel Data

1 R packages are collections of functions and data sets developed by the community. They increase the power of R by improving existing base R functionalities, or by adding new ones. For example, if you usually work with data frames, you have probably heard about dplyr or data.table, two of the most popular R packages.

2 A package is a suitable way to organize your own work and, if you want to, share it with others. Typically, a package will include code (not only R code!), documentation for the package and the functions inside, some tests to check everything works as it should, and data sets.
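As a minimal sketch of how a package is installed and loaded, the lines below use readxl as the example. The choice of readxl is an assumption made for illustration (it is a CRAN package for reading Excel files); the slide itself does not name a package here.

```r
# Assumption: readxl is used only as an example package name.
pkg <- "readxl"

# Check whether the package is already installed, without attaching it.
has_pkg <- requireNamespace(pkg, quietly = TRUE)

if (!has_pkg) {
  # Installing needs an internet connection, so the call is left commented out.
  # install.packages(pkg)
  message("Package '", pkg, "' is not installed; run install.packages() first.")
} else {
  # Attach the package so its functions can be called by name.
  library(pkg, character.only = TRUE)
}
```

install.packages() is needed once per machine, while library() is needed in every new R session that uses the package.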
Import Excel File into R
1. Click File
2. Click Import Dataset
3. Click From Excel
4. Browse/select your file
5. (Preview area that displays your Excel file)
6. (The R instructions generated by RStudio)
7. Click Import
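The instructions that RStudio generates in its import dialog are usually a call to read_excel() from the readxl package. The sketch below shows roughly what that looks like when typed by hand. The folder in salary_path is hypothetical, and the code assumes readxl is installed; it only imports when both the package and the file are present.

```r
# Hypothetical location of the downloaded survey workbook; adjust to your machine.
salary_path <- "~/Downloads/2019_Data_Professional_Salary_Survey_Responses.xlsx"

if (requireNamespace("readxl", quietly = TRUE) &&
    file.exists(path.expand(salary_path))) {
  # read_excel() reads the first sheet by default and returns a data frame (tibble).
  salary_survey <- readxl::read_excel(salary_path)
  print(head(salary_survey))
} else {
  message("readxl or the Excel file is not available; nothing was imported.")
}
```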
Replace Values in a DataFrame in R

· DataFrames are generic data objects of R which are used to store tabular data.
· Data frames are considered to be the most popular data objects in R programming because it is more comfortable to analyze data in tabular form.
· Data frames can also be thought of as matrices in which each column can be of a different data type.
· DataFrames are made up of three principal components: the data, rows, and columns.

https://www.geeksforgeeks.org/dataframe-operations-in-r/
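To make the points above concrete, here is a small sketch of a data frame whose columns have different types. The data and the name students are hypothetical, chosen only for illustration.

```r
# Hypothetical example data: three columns of three different types.
students <- data.frame(
  Name   = c("John", "Tim", "Andini"),  # character
  Age    = c(52, 23, 23),               # numeric
  Passed = c(TRUE, FALSE, TRUE),        # logical
  stringsAsFactors = FALSE
)

str(students)                             # compact overview of rows, columns, types
column_types <- sapply(students, class)   # class of each column
print(column_types)
print(dim(students))                      # number of rows, number of columns
```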
Replace Values in a DataFrame in R
(1) Replace a value across the entire DataFrame:

df[df == "Old Value"] <- "New Value"


(2) Replace a value under a single DataFrame column:

df["Column Name"][df["Column Name"] == "Old Value"] <- "New Value"

https://datatofish.com/replace-values-dataframe-r/
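The two templates above can be tried on a small data frame. This is a sketch with hypothetical data (the column names City and Team and their values are made up for illustration), using character columns without missing values.

```r
# Hypothetical data frame with two character columns.
df <- data.frame(
  City = c("Bandung", "Jogja", "Bandung"),
  Team = c("Jogja", "Solo", "Bandung"),
  stringsAsFactors = FALSE
)

# (1) Replace a value across the entire data frame.
df[df == "Jogja"] <- "Yogyakarta"

# (2) Replace a value in a single column only; City is left untouched.
df["Team"][df["Team"] == "Bandung"] <- "Bandung City"

print(df)
```

After both steps, "Bandung" is unchanged in City but has become "Bandung City" in Team, which shows the difference between the two forms.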
R Missing Value | NA
1 In R, missing values are coded by the symbol NA. To identify missing values in your dataset, use the function is.na(). When you import a dataset from other statistical applications, the missing values might be coded with a number, for example 99.

2 NA is a valid logical object. Where a component of x or y is NA, the result will be NA if the outcome is ambiguous. In other words, NA & TRUE evaluates to NA, but NA & FALSE evaluates to FALSE.

3 Missing values are inevitable in data science, and handling them is a constant issue. In the case of Boolean logic, NA can behave fairly differently depending on the order of arguments and exactly how the expression is set up, unlike a lot of other data types. Whether this is useful or not depends on the scenario, but the behavior is something to keep in mind.

cat("\n\n")
ID <- c(1,2,3,4,5,6,7,8)
Name <- c("John", "Tim", NA, "Stone", "Andini", "Rossi", NA, NA)
Sex <- c("male", "male", "female", "male", "female", "male", "female", "male")
Age <- c(52, 23, 20, 21, 23, NA, NA, 19)
Salary <- c(3520.2, NA, 2890.3, 3025.2, 3320.5, 2985.8, NA, 2020.8)
dtFriend <- data.frame(ID, Name, Sex, Age, Salary)
print(dtFriend)
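The logical behavior of NA and the recoding of a numeric missing-value code can be checked directly. The sketch below uses hypothetical values, and the code 99 is assumed to mean "missing" only for this illustration.

```r
# NA combined with logical operators: the result is NA only when it is ambiguous.
r1 <- NA & TRUE    # NA: the answer depends on the unknown value
r2 <- NA & FALSE   # FALSE: the answer is FALSE whatever the unknown value is
r3 <- NA | TRUE    # TRUE: the answer is TRUE whatever the unknown value is
print(c(r1, r2, r3))

# Recode a numeric missing-value code (assumed here to be 99) as NA, then find it.
age_raw <- c(45, 53, 99, 21)
age <- age_raw
age[age == 99] <- NA
print(is.na(age))        # marks the third element as missing
print(sum(is.na(age)))   # number of missing values
```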
Function Max() and Min() in R
vHighestValue <- max(x, na.rm = FALSE)
vLowestValue <- min(x, na.rm = FALSE)
▪ x = a vector or a numeric data frame.
▪ na.rm = whether to remove NA values. If it is FALSE (the default), NA values are kept and the result is NA whenever x contains one; if it is TRUE, NA values are removed from the vector or data frame before the calculation.

cat("\n\n")
# create a vector without missing values
midTestScore <- c(78.8, 65.0, 78.9, 84, 92.1, 73.2, 58.9, 87.6, 88.3)
print(midTestScore)
# return the max value and min value present in the vector
maxScore <- max(midTestScore)
minScore <- min(midTestScore)
cat(
  "\nThe highest Score : ", maxScore,
  "\nThe lowest Score : ", minScore,
  "\n"
)

cat("\n\n")
# create a vector with missing values
finalTestScore <- c(88.8, 77.0, NA, 86, 94.6, 72.2, NA, 80.3, 88.8)
print(finalTestScore)
# min() without na.rm = TRUE returns NA here; max() with na.rm = TRUE ignores the NA values
minScore <- min(finalTestScore)
maxScore <- max(finalTestScore, na.rm = TRUE)
cat(
  "\nThe lowest Score : ", minScore,
  "\nThe highest Score : ", maxScore,
  "\n"
)
Change & Update Item value of Dataframe
# 1 Create the data frame
cat("\n\n")
ID <- c(1,2,3,4,5,6,7,8)
Name <- c("John", "Tim", NA, "Stone", "Andini", "Rossi", NA, NA)
Sex <- c("male", "male", "female", "male", "female", "male", "female", "male")
Age <- c(52, 23, 20, 21, 23, NA, NA, 19)
Salary <- c(3520.2, NA, 2890.3, 3025.2, 3320.5, 2985.8, NA, 2020.8)
dtFriend <- data.frame(ID, Name, Sex, Age, Salary)
print(dtFriend)

# 2 Check and count the missing values
t1CheckofNA <- is.na(dtFriend)
t2CountofNA <- sum(is.na(dtFriend))
t3CountAveofNA <- mean(is.na(dtFriend))
t4MaxofSalary <- max(dtFriend$Salary)
t5MaxofSalary <- max(dtFriend$Salary, na.rm = TRUE)

# 3 Print the results
cat("\nt1 Check missing values :", t1CheckofNA,
  "\nt2 the number of missing values:", t2CountofNA,
  "\nt3 the proportion of missing values:", t3CountAveofNA,
  "\nt4 the maximum of salary with missing values:", t4MaxofSalary,
  "\nt5 the maximum of salary without missing values:", t5MaxofSalary,
  "\n"
)

# 4 Update missing values, correct the item data
dtFriend$Name[dtFriend$ID == 3] <- "Andri"
dtFriend$Name[dtFriend$ID == 7] <- "Anna"
dtFriend$Salary[dtFriend$Name == "Tim"] <- 1818.5
dtFriend$Age[dtFriend$Age == 52] <- 25
dtFriend$Age[dtFriend$ID == 6] <- 20
print(dtFriend)
Function Max() and Min() in R
cat("\n\n")
# create a character vector with some names
ComunityFriends <- c('John','Angelina','Smuts','Garena','Lucifer', 'Andini')
# for character vectors, min() and max() return the first and last name in sort order
firstOrderedName <- min(ComunityFriends)
lastOrderedName <- max(ComunityFriends)
print(ComunityFriends)
cat(
  "\nEarliest Ordered Name :", firstOrderedName,
  "\nLatest Ordered Name :", lastOrderedName,
  "\n"
)
R Looping Structure
In R, for loops take an iterator variable and assign it successive values from a sequence or vector. For loops are most commonly used for iterating over the elements of an object (list, vector, etc.)

1 for (elementdata in Listofdata) {
    instruction 1
    instruction 2
    :
}

2 while ( condition ) {
    instruction 1
    instruction 2
    :
}

3 repeat {
    statement/instruction 1
    statement/instruction 2
    :
    if( condition ) {
        break
    }
}
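A runnable sketch of the three looping structures, each adding up the numbers 1 to 5. The variable names are chosen here for illustration only.

```r
# for: visit each element of a vector in turn
total_for <- 0
for (value in 1:5) {
  total_for <- total_for + value
}

# while: keep going as long as the condition holds
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat: loop without a condition until break is reached
total_repeat <- 0
j <- 1
repeat {
  total_repeat <- total_repeat + j
  j <- j + 1
  if (j > 5) {
    break
  }
}

cat("for:", total_for, " while:", total_while, " repeat:", total_repeat, "\n")
```

All three loops arrive at the same total; they differ only in where the stopping rule is written.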
Example: R Looping Structure
list_days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
list_months <- list("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
list_years <- c(2021, 2022, 2023)
cat("\n")
for (aDay in list_days) {
  print(aDay)
}
for (aMonth in list_months) {
  cat(aMonth, " ")
  if (aMonth == "Jun") cat("\n")
}
cat("\n\n")
for (year in list_years) {
  for (aMonth in list_months) {
    cat(aMonth, " ", year, "\n")
  }
}

qualities <- c('funny', 'cute', 'friendly')
animals <- c('koala', 'cat', 'dog', 'panda')
for (x in qualities) {
  for (y in animals) {
    print(paste(x, y))
  }
}

statisticsClass <- data.frame(
Exam1 = c(97,91,33,86,40,48,27,53,58,31),
Exam2 = c(68,85,27,43,46,100,92,73,98, 91),
Quiz = c(75,90,88,81,78,87,90,69,NA,NA)
)
print(statisticsClass)
statisticsClass$avg <- (statisticsClass$Exam1 + statisticsClass$Exam2)/2
max.scores <- statisticsClass[statisticsClass$avg==max(statisticsClass$avg),]
print(max.scores)
R If-then-else Structure

# if
if (condition is true) {
do something
:
}

# if ... else
if (condition is true) {
do something
:
} else { # that is, if the condition is false,
do something different
:
}
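A runnable sketch of the if ... else template. The score and the threshold of 80 are hypothetical values chosen for illustration.

```r
# Hypothetical score; the threshold of 80 is an assumption for illustration.
studentScore <- 82.5

if (studentScore >= 80) {
  group <- "A"
} else {
  group <- "B"
}
cat("Score", studentScore, "belongs to group", group, "\n")

# ifelse() applies the same test to every element of a vector at once.
scores <- c(82.5, 79.0, 92.3)
groups <- ifelse(scores >= 80, "A", "B")
print(groups)
```

Note that if () expects a single TRUE or FALSE, so a vector of scores needs either a loop or ifelse().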
R Looping: Examples
1 Counting the number of Students

ScoreAccountingClass <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)
nStudents <- 0
for (studentScore in ScoreAccountingClass) {
  print(studentScore)
  nStudents <- nStudents + 1
}
cat("\nNumber of Students :", nStudents, "\n")

2 Counting the number of Students who have score >= 80, and who have score < 80 (below 80)

cat("\n\n")
ScoreAccountingClass <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)
nStudentsA <- 0
nStudentsB <- 0
for (studentScore in ScoreAccountingClass) {
  print(studentScore)
  if (studentScore >= 80) {
    nStudentsA <- nStudentsA + 1
  } else {
    nStudentsB <- nStudentsB + 1
  }
}
cat("\nNumber of Students Group A:", nStudentsA,
  "\nNumber of Students Group B:", nStudentsB,
  "\nTotal Students (Group A + Group B) :", nStudentsA + nStudentsB,
  "\n"
)
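For comparison, the same counts can be obtained without an explicit loop. This is an alternative sketch, not the method shown on the slide: it relies on the fact that a comparison on a vector returns one TRUE/FALSE per element and that sum() counts the TRUE values. The names nGroupA and nGroupB are introduced here for illustration.

```r
ScoreAccountingClass <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)

# length() gives the number of students directly.
nTotal <- length(ScoreAccountingClass)

# sum() over a logical vector counts how many elements are TRUE.
nGroupA <- sum(ScoreAccountingClass >= 80)
nGroupB <- sum(ScoreAccountingClass < 80)

cat("Total:", nTotal, " Group A:", nGroupA, " Group B:", nGroupB, "\n")
```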
Example Flow Chart
1 Flowchart to calculate the total number of students
Start
Input: dataSetScore
nStudents <- 0
Loop (for each studentScore of dataSetScore): nStudents <- nStudents + 1
Output: nStudents
Stop

Example Flow Chart
2 Flowchart to calculate the total number of students who have an Accounting score greater than or equal to 80 (score Accounting >= 80), and the total number of students who have an Accounting score below 80 (score Accounting < 80)
Start
Input: dataSetScore
nStudentsA <- 0, nStudentsB <- 0
Loop (for each studentScore of dataSetScore):
  studentScore >= 80 ? If yes: nStudentsA <- nStudentsA + 1. If no: nStudentsB <- nStudentsB + 1
Output: nStudentsA, nStudentsB
Stop

R Looping: Examples
3 Counting the number of Students, Calculation of Total Score & Average Score of ScorepfAccountingClass

ScorepfAccountingClass <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)

# Initialize the count variable with zero, and the total variable with zero
nStudents <- 0
totScores <- 0

# Inside the loop, the count variable is increased by one per student,
# and each student's score is added to the total variable
for (studentScore in ScorepfAccountingClass) {
  print(studentScore)
  nStudents <- nStudents + 1
  totScores <- totScores + studentScore
}

# Calculate the average outside the for-loop block
AverageScore <- (totScores/nStudents)

cat("\nAccounting Mid Class Score")
cat("\n", ScorepfAccountingClass,
  "\nNumber of Students: ", nStudents,
  "\nTotal of Scores: ", totScores,
  "\nAverage of Mid test: ", AverageScore,
  "\n"
)
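The loop result can be cross-checked against R's built-in functions. The sketch below repeats the loop on the same scores under a shorter, hypothetical vector name (scores) and compares the outcome with sum(), length(), and mean().

```r
# Same scores as in the example above, under a shorter hypothetical name.
scores <- c(82.5, 92.3, 83.0, 79.0, 88.2, 81.7, 76.6)

# Manual accumulation, as in the looping example.
nStudents <- 0
totScores <- 0
for (studentScore in scores) {
  nStudents <- nStudents + 1
  totScores <- totScores + studentScore
}
AverageScore <- totScores / nStudents

# Built-in equivalents; all.equal() compares numbers with a small tolerance.
print(all.equal(totScores, sum(scores)))
print(nStudents == length(scores))
print(all.equal(AverageScore, mean(scores)))
```

Writing the loop by hand is useful for learning the pattern; in everyday analysis the built-in functions are shorter and less error-prone.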
Example Flow Chart
3 Flowchart to calculate Total Score, number of students, average of score
Start
Input: dataSetScore
nStudents <- 0, totScores <- 0
Loop (for each studentScore of dataSetScore): nStudents <- nStudents + 1, totScores <- totScores + studentScore
AverageScore <- (totScores/nStudents)
Output: nStudents, totScores
Stop
Flow Chart
Symbols of a flowchart, with an example of each:
· Start/stop symbol: Start, Stop
· Input/output symbol: StudentName, StudentAddres
· Processing/calculating symbol: Total <- 25 * discount
· Filtering/conditional/control symbol: City == "Bandung" ?
· Sub-program/sub-processing symbol: Calculate Corelation()
· Flow symbol (to next)
· Looping (for) symbol, marking the start and end of a for loop: Loop, Stop

3 Example: Flowchart to calculate Total Score, number of students, average of score
Start
Input: dataSetScore
nStudents <- 0, totScores <- 0
Loop (for each studentScore of dataSetScore): nStudents <- nStudents + 1, totScores <- totScores + studentScore
averageScore <- (totScores/nStudents)
Output: nStudents, totScores, averageScore
Stop
Install package from local source
install.packages(path_to_source, repos = NULL, type="source")
install.packages("~/Downloads/dplyr-master.zip", repos=NULL, type="source")
install.packages("tidyr")

Name <- c("John", "Tim", NA)
Sex <- c("men", "men", "women")
Age <- c(45, 53, NA)
dt <- data.frame(Name, Sex, Age)
print(dt)
is.na(dt)
sum(is.na(dt))
mean(is.na(dt))
# Recode a numeric missing-value code (for example 99) as NA:
dt$Age[dt$Age == 99] <- NA
# Replace a value across the entire DataFrame:
df[df == "Old Value"] <- "New Value"
# Replace a value under a single DataFrame column:
df["Column Name"][df["Column Name"] == "Old Value"] <- "New Value"

https://datascienceplus.com/missing-values-in-r/#:~:text=In%20R%20the%20missing%20values,function%20is%20is.na()%20.&text=When%20you%20import%20dataset%20from,a%20number%2C%20for%20example%2099%20.
How to Analyze a Single Variable using Graphs in R? | DataScience+ (datascienceplus.com)

There are 4 types of plots that we can use to observe single-variable data:
· Histograms
· Index plots
· Time-series plots
· Pie Charts

# How to create Histogram in R
# by Michaelino Mervisiano
datavar <- rnorm(1000, 2.5)
hist(datavar, main="Awesome Histogram",
     col="Blue", prob=TRUE,
     xlab="Random Numbers from a Normal Distribution with Mean 2.5")

https://datascienceplus.com/how-to-analyses-a-single-variable-using-graphs-in-r/
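The other three plot types in the list can be sketched with base R graphics in a similar way. This is an illustrative sketch, not taken from the cited article: the random data, the start date of the series, and the category values are assumptions.

```r
set.seed(42)                       # make the random numbers repeatable
values <- rnorm(50, mean = 2.5)    # hypothetical single-variable data

# Index plot: each value plotted against its position in the vector.
plot(values, main = "Index plot", xlab = "Index", ylab = "Value")

# Time-series plot: treat the first 24 values as monthly data from Jan 2021 (assumed).
monthly <- ts(values[1:24], start = c(2021, 1), frequency = 12)
plot(monthly, main = "Time-series plot", ylab = "Value")

# Pie chart: share of each category in a categorical variable.
sex <- c("male", "male", "female", "male", "female", "male", "female", "male")
sex_counts <- table(sex)
pie(sex_counts, main = "Pie chart")
```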
