Introduction To R
2019
INTRODUCTION
                     1. Introduction to R
R is a language and environment for statistical computing
and graphics.
R is a GNU project similar to the S language and environment,
which was developed at Bell Laboratories (formerly part of
AT&T, later Lucent Technologies, which merged with Alcatel)
by John Chambers and colleagues.
According to John Chambers, the aim of the language can be
expressed as “to turn ideas into software, quickly and faithfully”.
                          Useful links
Because R is free and open source, many forums, materials, and
examples are shared online at no cost. The following are some
places to look for interesting R analysis material:
  1. github.com
  2. RStudio online learning
  3. Harvard online resource
  4. stackoverflow.com
  5. swirl
  6. R for data science
Prepare yourself!
                                                    Module Contents
This module covers:
Introduction to R and RStudio
     –   Elements of RStudio
     –   Basic objects in R
     –   Basic commands in R
R Programming
     –   Basic mathematical operations
     –   Matrix operations
     –   Iteration functions
     –   Creating functions in R
Data Cleaning
     –   Importing data
     –   Taking a quick look at the data
     –   Identifying object types
     –   Naming variables
     –   Checking for missing or incomplete data
     –   Describing and visualizing data
Data Preparation
     –   Subsetting
     –   Creating new variables
     –   Reshaping data
MATERIAL
                  Introduction to R and RStudio
R is a programming language, while RStudio is an Integrated Development
Environment (IDE) for R. When you open RStudio, there are several
important parts:
   – Console: runs a command or function one line at a time; a command is
     executed when you press Enter after typing it.
   – Output: shows the graphical results, if a command produces any.
   – Environment: lists the objects that currently exist in the working
     session.
   – Script: unlike the console, which executes each line immediately, a script
     lets us write a series of commands and run all or part of them as
     needed.
[Figure: RStudio environment, with the script, workspace, tab section, and console panes labeled]
                    Basic R Objects
As in other programming languages, the data types (classes)
in R are:
  – numeric (real numbers, e.g. 2.3567)
  – integer (whole numbers, e.g. 4L)
  – complex (complex numbers, e.g. 3i)
  – logical (TRUE/FALSE)
  – character (text strings, e.g. "hello")
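The classes listed above can be checked directly in the console. The following is a minimal sketch; the variable names `values` and `classes` are chosen here for illustration and do not come from the slides.

```r
# One value of each basic type; the L suffix requests an integer.
values <- list(
  numeric   = 2.3567,
  integer   = 4L,
  complex   = 3i,
  logical   = TRUE,
  character = "hello"
)

# class() reports the type of each element.
classes <- sapply(values, class)
print(classes)

# A bare 4 is stored as numeric (double), not integer.
class(4)
is.integer(4L)
```

Note that a whole number typed without the `L` suffix is stored as numeric, which is why the integer example uses `4L`.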
                 Basic R Objects
R's basic data structures can be categorized by their
dimensionality and by whether their contents are homogeneous
(all of the same type) or heterogeneous. They can be
summarized as:
Dimension   Homogeneous      Heterogeneous
1d          Atomic vectors   List
2d          Matrix           Data frame
nd          Array
                                Data type
• Atomic data type:
   –   numeric (real number)
   –   integer
   –   complex
   –   logical (true / false)
   –   character
• Data structures:
Object       Same Class Object   Different Class Object   Dimension
Vector       yes
List                             yes
Matrix       yes                                          yes
Data frame                       yes                      yes
                  Dealing with Vector(1)
• Construction:
  > x <- c(1, 5, 4, 9, 0)
  > typeof(x)
  [1] "double"
  > length(x)
  [1] 5
  > x <- c(1, 5.4, TRUE, "hello")   # mixed types are coerced to character
  > x
  [1] "1" "5.4" "TRUE" "hello"
  > typeof(x)
  [1] "character"
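The coercion seen above follows a fixed order: when types are mixed in `c()`, R converts every element to the most flexible type present. A short sketch of that rule, using illustrative variable names that are not part of the slides:

```r
# Combining types in c() coerces everything to the most flexible type.
# The order is logical < integer < double < character.
lgl_int <- c(TRUE, 2L)         # logical + integer  -> integer
int_dbl <- c(2L, 3.5)          # integer + double   -> double
dbl_chr <- c(3.5, "hello")     # double + character -> character

typeof(lgl_int)
typeof(int_dbl)
typeof(dbl_chr)

# A list keeps each element's own type instead of coercing.
mixed <- list(1, 5.4, TRUE, "hello")
sapply(mixed, typeof)
```

This is why a vector is described as homogeneous and a list as heterogeneous.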
                                  Dealing with Vector(2)
• add row or column using rbind() and cbind()
> cbind(x, c(1, 2, 3)) # add column
     [,1] [,2] [,3] [,4]
[1,]    0    0    7    1
[2,]    0   10    8    2
[3,]    0    6    9    3
> rbind(x,c(1,2,3)) # add row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
[4,]    1    2    3
> x <- x[1:2,]; x # remove last row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
• Dimension of matrix can be modified as well, using the dim() function.
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> dim(x) <- c(3,2); x # change to 3X2 matrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> dim(x) <- c(1,6); x # change to 1X6 matrix
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
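The slide does not show how the matrices were created, so the sketch below rebuilds them under the assumption that they were filled column by column with `matrix()`. The names `wider`, `taller`, and `m` are illustrative.

```r
# The 3 x 3 matrix from the slide, filled column by column.
x <- matrix(c(0, 0, 0, 0, 10, 6, 7, 8, 9), nrow = 3, ncol = 3)

wider  <- cbind(x, c(1, 2, 3))   # append a column -> 3 x 4
taller <- rbind(x, c(1, 2, 3))   # append a row    -> 4 x 3
dim(wider)
dim(taller)

# Reshaping with dim() keeps the values in column-major order.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m) <- c(3, 2)
m
```

Neither `cbind()` nor `rbind()` changes `x` itself; the result has to be assigned to keep it.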
                                Dealing with List(1)
• Construction:
> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
> x
$a
[1] 2.5
$b
[1] TRUE
$c
[1] 1 2 3
• Structure
> str(x)
List of 3
 $ a: num 2.5
 $ b: logi TRUE
 $ c: int [1:3] 1 2 3
Without tag:
> x <- list(2.5,TRUE,1:3)
> x
[[1]]
[1] 2.5
[[2]]
[1] TRUE
[[3]]
[1] 1 2 3
                                    Dealing with List(2)
• How to access components of a list?
> x <- list("name"="John","age"=19,"speaks"=c("English","French"))
> x
$name
[1] "John"
$age
[1] 19
$speaks
[1] "English" "French"
> x[c(1:2)] # index using integer vector
$name
[1] "John"
$age
[1] 19
> x[-2] # using negative integer to exclude second component
$name
[1] "John"
$speaks
[1] "English" "French"
> x[c(T,F,F)] # index using logical vector
$name
[1] "John"
> x[c("age","speaks")] # index using character vector
$age
[1] 19
$speaks
[1] "English" "French"
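All the indexing forms above use single brackets, which always return a list. To reach the component itself, R also offers `[[ ]]` and `$`. A minimal sketch, where the `city` component is a hypothetical addition for illustration:

```r
x <- list(name = "John", age = 19, speaks = c("English", "French"))

# Single brackets return a (shorter) list.
sub <- x["age"]
class(sub)

# Double brackets and $ return the component itself.
age1 <- x[["age"]]
age2 <- x$age
class(age1)

# Components can be modified or added by name.
x$age <- 20
x[["city"]] <- "Jakarta"   # hypothetical new component
names(x)
```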
                          Dealing with List(3)
> x$Name
[1] "John" "Dora"
> x[[3]]
[1] "John" "Dora"
                                Adding row
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul
                                Adding col
> cbind(x,State=c("NY","FL"))
  SN Age Name State
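The definition of the data frame `x` is not shown on this slide. The sketch below assumes it was built with `data.frame()` from the values visible in the printed output; the names `x_rows`, `x_cols`, and `x_last` are illustrative.

```r
# Assumed definition of x, reconstructed from the printed output.
x <- data.frame(SN = 1:2, Age = c(20, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

x$Name        # a column by name
x[[3]]        # the same column by position

# Add a row; the new row must supply every column.
x_rows <- rbind(x, data.frame(SN = 1L, Age = 16, Name = "Paul",
                              stringsAsFactors = FALSE))

# Add a column with one value per row.
x_cols <- cbind(x, State = c("NY", "FL"))

# Drop a column by assigning NULL, and a row by negative index.
x_cols$State <- NULL
x_last <- x[-1, ]
```

Setting `stringsAsFactors = FALSE` keeps `Name` as character on every R version; before R 4.0.0 the default converted strings to factors.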
                 Dealing with Data Frame(3)
• Deleting Component
> x$State <- NULL
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x <- x[-1,]
> x
  SN Age Name
2  2  15 Dora
R PROGRAMMING
                 Arithmetic Operation in R
Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Exponent
%%         Modulus (remainder from division)
%/%        Integer division

> x <- 5
> y <- 16
> x+y
[1] 21
> x-y
[1] -11
> x*y
[1] 80
> y/x
[1] 3.2
> y%/%x
[1] 3
> y%%x
[1] 1
> y^x
[1] 1048576
                      R Relational Operators
Operator   Description
<          Less than
>          Greater than
<=         Less than or equal to
>=         Greater than or equal to
==         Equal to
!=         Not equal to

> x <- 5
> y <- 16
> x<y
[1] TRUE
> x>y
[1] FALSE
> x<=5
[1] TRUE
> y>=20
[1] FALSE
> y == 16
[1] TRUE
> x != 5
[1] FALSE
                               Operation on Vectors
• The above mentioned operators work on vectors.
> x <- c(2,8,3)
> y <- c(6,4,1)
> x+y
[1] 8 12 4
> x>y
[1] FALSE TRUE TRUE
• When the operand vectors differ in length (number of elements), the
  elements of the shorter one are recycled cyclically to match the length
  of the longer one.
> x <- c(2,1,8,3)
> y <- c(9,4)
> x+y # Element of y is recycled to 9,4,9,4
[1] 11 5 17 7
> x-1 # Scalar 1 is recycled to 1,1,1,1
[1] 1 0 7 2
> x+c(1,2,3) # recycled to 1,2,3,1; R warns that 4 is not a multiple of 3
[1] 3 3 11 4
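The recycling rule can be made explicit in code. The sketch below captures the warning that R raises when the longer length is not a multiple of the shorter one; the variable `msg` is an illustrative name.

```r
x <- c(2, 1, 8, 3)

# Lengths 4 and 2: the shorter vector is reused twice, no warning.
x + c(9, 4)

# Lengths 4 and 3: the result is still computed, but R warns because
# 4 is not a multiple of 3. tryCatch() captures that warning here.
msg <- tryCatch(
  { x + c(1, 2, 3); "no warning" },
  warning = function(w) conditionMessage(w)
)
msg

# Spelling out the recycling with rep_len() avoids the warning.
x + rep_len(c(1, 2, 3), length(x))
```

Writing the recycling out with `rep_len()` is one way to show the intent when the lengths do not divide evenly.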
                          R Logical Operators
Operator   Description
!          Logical NOT
&          Element-wise logical AND
&&         Logical AND
|          Element-wise logical OR
||         Logical OR

Operators & and | perform element-wise operations, producing a result
with the length of the longer operand.
&& and || work on single values and return a single logical value. Older
versions of R silently used only the first element of longer operands;
since R 4.3.0, operands longer than one element are an error, so the
examples below select the first elements explicitly.
Zero is considered FALSE and non-zero numbers are taken as TRUE.
> x <- c(TRUE,FALSE,0,6)
> y <- c(FALSE,TRUE,FALSE,TRUE)
> !x
[1] FALSE TRUE TRUE FALSE
> x&y
[1] FALSE FALSE FALSE TRUE
> x[1]&&y[1]
[1] FALSE
> x|y
[1] TRUE TRUE FALSE TRUE
> x[1]||y[1]
[1] TRUE
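In practice the two families of operators serve different purposes: `&` and `|` filter vectors, while `&&` and `||` combine single conditions inside `if`. A small sketch with hypothetical data (`scores` and the messages are made up for illustration):

```r
scores <- c(55, 80, 92)   # hypothetical data

# & compares element by element and returns one value per element.
passed_not_top <- scores >= 60 & scores < 90
passed_not_top

# if() needs a single TRUE/FALSE, so reduce a vector with all()/any()
# and combine the single values with && or ||.
if (all(scores > 0) && any(scores >= 90)) {
  verdict <- "valid scores, at least one top mark"
} else {
  verdict <- "check the data"
}
verdict
```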
                     R Assignment Operators
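R's assignment operators include `<-`, `=`, `->`, and `<<-`. The following is a minimal sketch of how each behaves; the names `counter` and `increment` are hypothetical and only serve the illustration.

```r
x <- 5        # leftward assignment, the conventional form
y = 16        # = also assigns at the top level
21 -> z       # rightward assignment

# <<- assigns in an enclosing environment; here it updates the
# counter defined outside the function.
counter <- 0
increment <- function() {
  counter <<- counter + 1
  invisible(counter)
}
increment()
increment()
counter
```

Most R style guides prefer `<-` for ordinary assignment and reserve `=` for naming function arguments.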
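                        Matrix Operation(1)
Scalar Multiplication
       A <- matrix(c(2,3,-2,1,2,2),nrow = 3, ncol = 2)
       A
       ##      [,1] [,2]
       ## [1,]    2    1
       ## [2,]    3    2
       ## [3,]   -2    2
       c <- 3
       c*A
       ##      [,1] [,2]
       ## [1,]    6    3
       ## [2,]    9    6
       ## [3,]   -6    6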
                        Matrix Operation(2)
Addition And Subtraction
        B <- matrix(c(1,4,-2,1,2,1),nrow = 3, ncol = 2)
        B
        ##      [,1] [,2]
        ## [1,]    1    1
        ## [2,]    4    2
        ## [3,]   -2    1
        A + B
        ##      [,1] [,2]
        ## [1,]    3    2
        ## [2,]    7    4
        ## [3,]   -4    3
        A - B
        ##      [,1] [,2]
        ## [1,]    1    0
        ## [2,]   -1    0
        ## [3,]    0    1
                     Matrix Operation(3)
Matrix Multiplication
       D <- matrix(c(2,-2,1,2,3,1),2,3)
       D
       ##      [,1] [,2] [,3]
       ## [1,]    2    1    3
       ## [2,]   -2    2    1
       D %*% A
       ##      [,1] [,2]
       ## [1,]    1   10
       ## [2,]    0    4
       A %*% D
       ##      [,1] [,2] [,3]
       ## [1,]    2    4    7
       ## [2,]    2    7   11
       ## [3,]   -8    2   -4
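The difference between the matrix product `%*%` and the element-wise product `*` is a common source of confusion. The sketch below assumes `A` is the 3 x 2 matrix implied by the printed results of `A + B` and `t(A)`; the names `DA`, `AD`, and `elementwise` are illustrative.

```r
A <- matrix(c(2, 3, -2, 1, 2, 2), nrow = 3, ncol = 2)   # 3 x 2
D <- matrix(c(2, -2, 1, 2, 3, 1), nrow = 2, ncol = 3)   # 2 x 3

# %*% is the matrix product: (2 x 3) %*% (3 x 2) gives 2 x 2.
DA <- D %*% A
dim(DA)

# * is element-wise and needs equal dimensions, so pair A with t(D).
elementwise <- A * t(D)
dim(elementwise)

# The matrix product is not commutative: A %*% D is 3 x 3 instead.
AD <- A %*% D
dim(AD)
```

For `%*%` the number of columns of the left matrix must equal the number of rows of the right matrix; otherwise R reports non-conformable arguments.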
                     Matrix Operation(4)
Transpose
       t(A)
       ##      [,1] [,2] [,3]
       ## [1,]    2    3   -2
       ## [2,]    1    2    2
Diagonal Matrix
       S <- matrix(c(2,3,-2,1,2,2,4,2,3),ncol = 3, nrow = 3)
       S
       ##      [,1] [,2] [,3]
       ## [1,]    2    1    4
       ## [2,]    3    2    2
       ## [3,]   -2    2    3
       diag(S)
       ## [1] 2 2 3
                                 Matrix Operation(5)
Identity Matrix
            I <- diag(c(1, 1, 1))
            I
            ##      [,1] [,2] [,3]
            ## [1,]    1    0    0
            ## [2,]    0    1    0
            ## [3,]    0    0    1
Inverse Matrix
            A <- matrix(c(4,4,-2,2,6,2,2,8,4),3,3)
            A
            ##      [,1] [,2] [,3]
            ## [1,]    4    2    2
            ## [2,]    4    6    8
            ## [3,]   -2    2    4
            solve(A)
            ##      [,1] [,2] [,3]
            ## [1,]  1.0 -0.5  0.5
            ## [2,] -4.0  2.5 -3.0
            ## [3,]  2.5 -1.5  2.0
            # Identity Result
            A %*% solve(A)
            ##      [,1] [,2] [,3]
            ## [1,]    1    0    0
            ## [2,]    0    1    0
            ## [3,]    0    0    1
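Because `solve()` works in floating point, the product of a matrix and its inverse may differ from the identity by tiny rounding errors. A hedged sketch of how one might check the result; the right-hand side `b` is a hypothetical vector added for illustration.

```r
A <- matrix(c(4, 4, -2, 2, 6, 2, 2, 8, 4), nrow = 3, ncol = 3)

# A matrix is invertible only when its determinant is non-zero.
det(A)

A_inv <- solve(A)

# Floating-point rounding can leave tiny non-zero entries, so compare
# with all.equal() rather than ==.
all.equal(A %*% A_inv, diag(3))

# solve(A, b) solves A x = b directly, without forming the inverse.
b <- c(2, 4, 6)                 # hypothetical right-hand side
x <- solve(A, b)
all.equal(as.vector(A %*% x), b)
```

Here `diag(3)` is a shorter way to build the 3 x 3 identity matrix than `diag(c(1, 1, 1))`.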
             Matrix Operation(6)
C <- matrix(c(2,1,6,1,3,4,6,4,-2),ncol = 3, nrow = 3)
C
##      [,1] [,2] [,3]
## [1,]    2     1   6
## [2,]    1     3   4
## [3,]    6     4  -2
CI <- solve(C)
CI
##            [,1]        [,2]        [,3]
## [1,]  0.2156863 -0.25490196  0.13725490
## [2,] -0.2549020  0.39215686  0.01960784
## [3,]  0.1372549  0.01960784 -0.04901961
d <- det(CI)
d
## [1] -0.009803922
LOOPING IN R
Types of Loop
Loop For(1)
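R provides three loop constructs: `for`, `while`, and `repeat`. A minimal sketch of the `for` loop in its two usual forms follows; the data (`fruits`, `squares`) are hypothetical.

```r
# Loop over the elements of a vector.
fruits <- c("apple", "banana", "cherry")   # hypothetical data
for (fruit in fruits) {
  print(paste("I like", fruit))
}

# Loop over positions; seq_along() is safe even for empty vectors.
squares <- numeric(5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}
squares
```

Creating `squares` at its final length before the loop avoids growing the vector on every iteration.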
                                      Loop For(2)
Nested Loop
         H <- matrix(nrow = 30, ncol = 30)
         for(i in 1:dim(H)[1]) {
           for (j in 1:dim(H)[2]) {
             H[i,j] = i*j
           }
         }
         H[1:10, 1:10]
         ##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
         ## [1,]     1    2    3    4    5    6    7    8    9    10
         ## [2,]     2    4    6    8   10   12   14   16   18    20
         ## [3,]     3    6    9   12   15   18   21   24   27    30
         ## [4,]     4    8   12   16   20   24   28   32   36    40
         ## [5,]     5   10   15   20   25   30   35   40   45    50
         ## [6,]     6   12   18   24   30   36   42   48   54    60
         ## [7,]     7   14   21   28   35   42   49   56   63    70
         ## [8,]     8   16   24   32   40   48   56   64   72    80
         ## [9,]     9   18   27   36   45   54   63   72   81    90
         ## [10,]   10   20   30   40   50   60   70   80   90   100
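The nested loop above can also be written without explicit loops. The sketch below compares the loop version with `outer()`, which is one vectorized alternative; `H_vec` is an illustrative name.

```r
# Build the same table with nested loops, preallocating the result.
n <- 30
H <- matrix(NA_real_, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    H[i, j] <- i * j
  }
}

# outer() applies the product to every (i, j) pair in one call.
H_vec <- outer(seq_len(n), seq_len(n))

all.equal(H, H_vec)
```

For element-wise arithmetic like this, the vectorized form is usually shorter and faster than nested loops.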
                          While and Repeat Loop
set.seed(11)
a=rnorm(10)

# while loop: checks the condition before each iteration
x=1
while (x<=length(a)){
  print(paste0("Number ",x," = ",a[x]))
  x=x + 1
}

# repeat loop: runs until break is reached
i=1
repeat {
  print(paste0("Number ",i," = ",a[i]))
  i=i+1
  if (i > length(a)) break
}
CONDITION IN R
                             R if statement
if (test_expression) {
statement
}
  x <- 5
  if(x > 0){
  print("Positive number")
  }
                            if…else statement
if (test_expression) {
  statement1
} else {
  statement2
}
 x <- -5
 if(x >= 0){
   print("Non-negative number")
 } else {
   print("Negative number")
 }
                                if…else Ladder
if (test_expression1) {
  statement1
} else if (test_expression2) {
  statement2
} else if (test_expression3) {
  statement3
} else {
  statement4
}
                              x <- 0
                              if (x < 0) {
                                print("Negative number")
                              } else if (x > 0) {
                                print("Positive number")
                              } else {
                                print("Zero")
                              }
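The ladder above can be wrapped in a function so it can be reused. The sketch below uses a hypothetical helper named `classify_number()` and also shows `ifelse()`, the vectorized counterpart of `if`.

```r
# classify_number() is a hypothetical helper wrapping the ladder.
classify_number <- function(x) {
  if (x < 0) {
    "Negative number"
  } else if (x > 0) {
    "Positive number"
  } else {
    "Zero"
  }
}

classify_number(-5)
classify_number(0)

# if() handles one value at a time; ifelse() is the vectorized form.
values <- c(-5, 0, 3)
labels <- ifelse(values < 0, "Negative number",
                 ifelse(values > 0, "Positive number", "Zero"))
labels
```

The function returns the value of the last expression evaluated, so no explicit `return()` is needed.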
RANDOM NUMBER IN R
                              Uniform: [a,b]
> runif(1,0,2)       # time at light; also runif(1,min=0,max=2)
[1] 1.490857
> runif(5,0,2)       # time at 5 lights
[1] 0.07076444 0.01870595 0.50100158 0.61309213 0.77972391
> runif(5)          # 5 random numbers in [0,1]
[1] 0.1705696 0.8001335 0.9218580 0.1200221 0.1836119
                    Normal: [mu,sigma]
> rnorm(1,100,16)
[1] 94.1719
> rnorm(1,mean=280,sd=10)
[1] 270.4325
                               Binomial:[n,p]
> n=1; p=.5              # set the number of trials and the probability
> rbinom(1,n,p)            # different each time
[1] 1
> rbinom(10,n,p)            # 10 different such numbers
 [1] 0 1 1 0 1 0 1 0 1 0
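Random draws differ on every run unless the generator is seeded. The following sketch shows how `set.seed()` makes the three generators reproducible; the seed value 123 is arbitrary and chosen only for illustration.

```r
set.seed(123)                      # arbitrary seed, chosen for illustration
u1 <- runif(5, min = 0, max = 2)   # uniform on [0, 2]
z1 <- rnorm(5, mean = 100, sd = 16)
b1 <- rbinom(10, size = 1, prob = 0.5)   # ten coin flips

# Resetting the same seed reproduces the same draws.
set.seed(123)
u2 <- runif(5, min = 0, max = 2)
identical(u1, u2)

# With many draws, the sample mean approaches the theoretical mean.
mean(rnorm(10000, mean = 100, sd = 16))
```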
   – or we can use the RStudio feature File > Import Dataset, then choose
     the data format we want
                                    Data Cleaning(3)
# import data from txt file into data.frame
summary(sales)
    cust_id        sales_total      num_of_orders    gender
 Min.   :100001   Min.   :  30.02   Min.   : 1.000   F:5035
 1st Qu.:102501   1st Qu.:  80.29   1st Qu.: 2.000   M:4965
 Median :105001   Median : 151.65   Median : 2.000
 Mean   :105001   Mean   : 249.46   Mean   : 2.428
 3rd Qu.:107500   3rd Qu.: 295.50   3rd Qu.: 3.000
 Max.   :110000   Max.   :7606.09   Max.   :22.000
                                Data Cleaning(4)
str(sales)
'data.frame': 10000 obs. of 4 variables:
 $ cust_id      : int 100001 100002 100003 100004 100005 100006 100007 100008 100009 100010 ...
 $ sales_total  : num 800.6 217.5 74.6 498.6 723.1 ...
 $ num_of_orders: int 3 3 2 3 4 2 2 2 2 2 ...
 $ gender       : Factor w/ 2 levels "F","M": 1 1 2 2 1 1 2 2 1 2 ...
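The import command itself is not shown on these slides. The sketch below is one possible version using `read.csv()` on a small made-up file, so it runs on its own; the file contents, the name `sales_demo`, and the missing value are all hypothetical. It also covers the check for missing data listed in the module contents.

```r
# Write a small hypothetical file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("cust_id,sales_total,num_of_orders,gender",
             "100001,800.64,3,F",
             "100002,217.53,3,F",
             "100003,,2,M",
             "100004,498.60,3,M"), path)

# header = TRUE takes column names from the first line.
sales_demo <- read.csv(path, header = TRUE, stringsAsFactors = TRUE)

str(sales_demo)                 # object types of each column
summary(sales_demo)             # quick overview, including NA counts

# Missing values: count per column, then list the incomplete rows.
na_per_column <- colSums(is.na(sales_demo))
na_per_column
sales_demo[!complete.cases(sales_demo), ]
```

`stringsAsFactors = TRUE` is set so that `gender` is read as a factor, matching the `str()` output above.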
DATA VISUALIZATION
                         Data Visualization(1)
Scatter plot
      plot(sales$num_of_orders, sales$sales_total, main = "Number of Orders vs Sales")
Histogram
      hist(sales$num_of_orders, xlab = "num_of_orders",
           main = "Histogram of num_of_orders", col = "red")
                      Data Visualization(2)
Density
      plot(density(sales$sales_total), xlab = "sales_total", main =
      "Distribution of sales total")
                      Data Visualization(3)
Boxplot
      boxplot(sales$num_of_orders, sales$per_order, main = "Boxplot of sales",
              xlab = "num_of_orders", ylab = "per_order")
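The four plots can be tried without the original file. The sketch below uses simulated data in a data frame named `demo` (hypothetical) and assumes `per_order` means sales per order, which the slides do not define. Plots are written to a temporary PDF so the code also runs non-interactively.

```r
set.seed(1)                                   # arbitrary seed
demo <- data.frame(num_of_orders = rpois(200, lambda = 2) + 1)
demo$sales_total <- demo$num_of_orders * runif(200, 30, 150)
# Assumed definition: average sales per order.
demo$per_order <- demo$sales_total / demo$num_of_orders

out <- tempfile(fileext = ".pdf")
pdf(out)                                      # draw to a file, one plot per page
plot(demo$num_of_orders, demo$sales_total,
     main = "Number of Orders vs Sales")
hist(demo$num_of_orders, xlab = "num_of_orders",
     main = "Histogram of num_of_orders", col = "red")
plot(density(demo$sales_total), xlab = "sales_total",
     main = "Distribution of sales total")
boxplot(demo$per_order, main = "Boxplot of per_order",
        ylab = "per_order")
dev.off()
file.exists(out)
```

In an interactive RStudio session the `pdf()` and `dev.off()` calls can be dropped, and the plots appear in the Output pane.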
                        Let’s Practice
Clear your R environment with the command rm(list = ls())
Create vector variables named x and y, each containing the
numeric values 1 to 5
Create a list variable named z containing logical, numeric,
character, complex, and integer objects, in that order
Create a matrix variable named w with dimensions 2 x 5 by
combining the vectors x and y
Create a matrix variable named q with dimensions 5 x 2 by
combining the vectors x and y
Import the data hbat.csv and draw conclusions about the data
                         Let’s Practice
Create the following matrix:
Then compute:
  –
  –
  –
Build a matrix with diagonal values
                        Reference
– Grolemund, G. (2014). Hands-On Programming with R: Write Your
  Own Functions and Simulations. O’Reilly Media, Inc.
– Wickham, H., & Grolemund, G. (2016). R for data science: import,
  tidy, transform, visualize, and model data. O’Reilly Media, Inc.
– Wickham, H. (2014). Advanced R. Chapman & Hall/CRC The R Series.