R Compendium
Data Structures in R

Vector
An ordered collection of data of the same type.
> a = c(1,2,3)
> a*2
[1] 2 4 6

Matrix
A rectangular table of data of the same type.

List
An ordered collection of data of arbitrary types.
> doe = list(name="john", age=28, married=FALSE)
> doe$name
[1] "john"
> doe$age
[1] 28

Data frame
A table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.
Example:
> a
  Policy_no premium State
      12345     100    IL
      25486     400    NY
      63254     350    FL

                 Linear    Rectangular
All Same Type    Vector    Matrix
Mixed            List      Data frame
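For reference, a minimal sketch constructing a matrix and the data frame shown above (the matrix values are made up; the data frame values are taken from the example):

m <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3 matrix, all elements numeric

a <- data.frame(Policy_no = c(12345, 25486, 63254),
                premium   = c(100, 400, 350),
                State     = c("IL", "NY", "FL"))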
Lapply, Sapply & Apply

Lapply
• When the same or similar tasks need to be performed multiple times for all elements of a list or for all columns of an array.
• May be easier and faster than "for" loops.
• lapply(li, function)
• To each element of the list li, the function function is applied.
• The result is a list whose elements are the individual function results.

> li = list("klaus","martin")
> lapply(li, toupper)
[[1]]
[1] "KLAUS"
[[2]]
[1] "MARTIN"

Sapply
sapply(li, fct)
Like lapply, but tries to simplify the result by converting it into a vector or array of appropriate size.

> li = list("klaus","martin","georg")
> sapply(li, toupper)
[1] "KLAUS" "MARTIN" "GEORG"

> fct = function(x) { return(c(x, x*x)) }
> sapply(1:5, fct)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    1    4    9   16   25
Apply
apply(arr, margin, fct)
Apply the function fct along some dimensions of the array arr, according to margin, and return a vector or array of the appropriate size.

> x
     [,1] [,2] [,3]
[1,]    5    7    0
[2,]    7    9    8
[3,]    4    6    7
[4,]    6    3    5

> apply(x, 1, sum)
[1] 12 24 17 14
> apply(x, 2, sum)
[1] 22 25 20
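The example matrix can be reproduced as follows; a minimal sketch, with the values taken from the printed output above:

x <- matrix(c(5, 7, 0,
              7, 9, 8,
              4, 6, 7,
              6, 3, 5), nrow = 4, byrow = TRUE)
apply(x, 1, sum)   # row sums
apply(x, 2, sum)   # column sums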
Regular expressions

Functions for detecting patterns
• grep(pattern, string, value = TRUE)
• grepl(pattern, string)

Functions for replacing patterns
• sub(pattern, replacement, string) - replace first match
• gsub(pattern, replacement, string) - replace all matches
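A minimal sketch of these functions in use (the example strings are made up):

states <- c("Florida", "Illinois", "New York")
grep("or", states, value = TRUE)   # "Florida" "New York"
grepl("or", states)                # TRUE FALSE TRUE
sub("o", "0", states)              # replaces the first "o" in each string
gsub("o", "0", states)             # replaces every "o" in each string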
Importing & Exporting data
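A minimal sketch of the data.table import/export functions used with the examples below (the file names are assumptions):

library(data.table)
dt <- fread("policies.csv")        # fast import into a data.table
fwrite(dt, "policies_out.csv")     # fast export to csv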
Understanding data table

DT[ i , j , by ]

i - which rows?
  - vector of row numbers
  - logical vector
  - another data table

j - what to do?
  - summarizing
  - updating variable(s)
  - adding variable(s)

by - grouped by what?
  - one or more columns
  - on-the-fly grouping variable(s)

• Order of execution: i is evaluated first, then by, then j
  DT[ i , j , by ]  ->  1 , 3 , 2
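A small made-up data table to run the examples in the sections that follow; the column names match the slides, the values are assumptions:

dt <- data.table(
  policy_no  = c(12345, 25486, 63254, 78901),
  premium    = c(100, 400, 350, NA),
  deductible = c(50, 250, 500, 150),
  state      = c("IL", "NY", "FL", "FL"),
  subln_grp  = c("PL", "CL", "PL", "CL"),
  segment    = c("retail", "retail", "commercial", "commercial")
)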
Filtering rows & selecting columns in data table

Operation: Subsetting rows by numbers
  # select first to tenth row
  dt[ 1:10 , ]

Operation: Selecting rows based on a condition on column values
  # selecting rows with subln_grp as "PL"
  dt[ subln_grp == "PL" , ]
  # selecting rows which don't have NA in the premium field
  dt[ !is.na(premium) , ]
  # select rows with state as Florida and premium greater than zero
  dt[ state == "FL" & premium > 0 , ]

Operation: Selecting columns
  # select multiple columns
  dt[ , .(policy_no , premium , state) ]
  Note - .() is an alias for list(). If .() is used, the returned value is a data.table; if .() is not used, the result is a vector.

Operation: Subsetting rows and selecting columns together
  # select only rows where segment is "retail" and the relevant columns
  dt[ segment == "retail" , .(policy_no , premium , state) ]
Summarizing data table
• Count and aggregation

Operation: Count
  # counting number of policies with PL as their subln_grp
  dt[ subln_grp == "PL" , .N ]

Operation: Count distinct
  # count unique number of segments
  dt[ , uniqueN(segment) ]
  uniqueN(dt , by = "segment")
  # count unique number of segments for Florida
  dt[ state == "FL" , uniqueN(segment) ]

Operation: Simple aggregation
  # taking average premium and count of policies
  dt[ , .(count = .N , average = mean(premium , na.rm = TRUE)) ]

Operation: Aggregation including filtering
  # taking average premium and count of policies for which deductible is greater than $100
  dt[ deductible > 100 , .(count = .N , average = mean(premium , na.rm = TRUE)) ]
Summarizing data table
• Group by

Operation: Simple group by
  # taking average premium and count of policies by segment
  dt[ , .(count = .N , average = mean(premium , na.rm = TRUE)) , by = segment ]

  # taking average premium and count of policies for which deductible is greater than $100, by segment
  dt[ deductible > 100 , .(count = .N , average = mean(premium , na.rm = TRUE)) , by = segment ]
  OR
  dt[ , .(count = .N , average = mean(premium , na.rm = TRUE)) , by = .(segment , ded = deductible > 100) ]
Special symbols - .SD & .SDcols

.SD (Subset of Data)
- a data.table by itself
- holds the data of the current group as defined in by
- when there is no by, .SD refers to the whole data.table
- allows for calculations on multiple columns

.SDcols
- specifies the columns of the data table that are included in .SD
- allows a function to be applied to all rows, but only to selected columns
- comes in handy when there are many columns and a particular operation is to be performed only on a subset of them
Summarizing data table

Operation: .SD & .SDcols
  # computing average premium using .SD
  dt[ , lapply(.SD , function(x) mean(x , na.rm = TRUE)) , by = segment , .SDcols = "premium" ]
  OR
  dt[ , lapply(.SD , function(x) mean(x , na.rm = TRUE)) , by = segment , .SDcols = grep("prem" , names(dt)) ]
Update, add, delete variables in data table
Special operator: :=
Updates a data table in place (by reference)
Can be used to:
• update existing columns
• add new columns
• delete columns
No need to use <-
Operation: Updating variables
  # doubling the premium amount
  dt[ , premium := premium * 2 ]
  # doubling both premium and deductible amount
  dt[ , `:=` (premium = premium * 2 ,
              deductible = deductible * 2) ]

Operation: Updating variables by group
  # modifying premium by segment
  dt[ , premium := premium * uniqueN(deductible) / .N , by = segment ]
Update, add, delete variables in data table

Operation: Adding variables
  # adding a new premium variable with premium doubled
  dt[ , new_premium := premium * 2 ]

Operation: Adding variables by group
  # computing state premium
  dt[ , state_premium := sum(premium , na.rm = TRUE) , by = state ]
  # computing state_premium and state_deductible
  dt[ , `:=` (state_premium = sum(premium , na.rm = TRUE) ,
              state_deductible = max(deductible)) , by = state ]

Operation: Deleting variables
  # deleting the new_premium variable
  dt[ , new_premium := NULL ]
  # deleting the state variables
  dt[ , grep("state_" , names(dt)) := NULL ]
  # deleting the first two columns
  dt[ , (1:2) := NULL ]
Reshaping data table

From wide to long format - melt function
Syntax:
melt(data, id.vars, measure.vars,
     variable.name = "variable",
     value.name = "value")

Parameter: id.vars
  ID columns that identify multiple entries
Parameter: measure.vars
  Columns to be stacked into the long format (their values fill the value column)
Parameter: variable.name and value.name
  Names of the new columns for the variables and values derived from the old headers

From long to wide format - dcast function
Syntax:
dcast(dt, id ~ y, value.var = c("a", "b"))

Parameter: id ~ y
  Formula with a LHS (ID columns containing IDs for multiple entries) and a RHS (columns whose values are spread into the column headers)
Parameter: value.var
  Columns containing values to fill into the cells
Reshaping data table

Example - from wide format to long format and back:

Syntax:
wide_data <- as.data.table(fread("reshape_data.csv"))
long_data <- melt(wide_data,
                  id.vars = c("policy_no" , "state" , "subln_grp"),
                  measure.vars = c("written_premium", "endorsement_premium"),
                  variable.name = "premium_type" , value.name = "premium")
wide_data <- dcast(long_data , policy_no + state + subln_grp ~ premium_type , value.var = "premium")
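If "reshape_data.csv" is not available, a minimal wide_data sketch to run the melt()/dcast() calls above (the values are assumptions):

wide_data <- data.table(
  policy_no = c(12345, 25486),
  state = c("IL", "NY"),
  subln_grp = c("PL", "CL"),
  written_premium = c(100, 400),
  endorsement_premium = c(10, 25)
)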
Set() family in data table

Operation: Create or update column names by reference
  # change the name of the premium field to wrt_prem
  setnames(dt, "premium", "wrt_prem")
  # change the names of multiple columns
  setnames(dt, c("premium", "policy_no"), c("wrt_prem", "policy_id"))

Operation: Setting a key on a data table
  # set policy_no as the key
  setkey(dt, policy_no)

Operation: Reorder columns by reference
  # change the column order of the data table
  setcolorder(dt, c("deductible", "coverage", "segment", "state", "subln_grp", "policy_no", "premium"))
Binning variables in data table
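A minimal sketch of one common approach, binning premium into bands with cut(); the breaks, labels, and the premium_band column name are assumptions:

dt[ , premium_band := cut(premium,
                          breaks = c(0, 100, 250, 500, Inf),
                          labels = c("0-100", "100-250", "250-500", "500+")) ]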
Other operations in data table
Joins in data table
• Right join
Syntax:
right_join <- dt1[ dt2 , on = "policy_no" ]   # right join (dt1 is left, dt2 is right)
right_join <- merge(dt1 , dt2 , by = "policy_no" , all.y = TRUE)

dt1 captures written premium and state information; dt2 captures coverage and location code information.
right_join has all rows from dt2.
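Minimal sketches of dt1 and dt2 for the join examples; the structure follows the descriptions above, and the values and the loc_code column name are assumptions:

dt1 <- data.table(policy_no = c(12345, 25486, 63254),
                  written_premium = c(100, 400, 350),
                  state = c("IL", "NY", "FL"))
dt2 <- data.table(policy_no = c(25486, 63254, 99999),
                  coverage = c("A", "B", "C"),
                  loc_code = c(10, 20, 30))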
Joins in data table
• Left join
Syntax:
left_join <- dt2[ dt1, on = "policy_no"] # left join (dt1 is left, dt2 is right)
left_join <- merge(dt1 , dt2 , by = "policy_no" , all.x = TRUE)

dt1 captures written premium and state information; dt2 captures coverage and location code information.
left_join has all rows from dt1.
Joins in data table
• Inner join
Syntax:
inner_join <- dt1[ dt2, on = "policy_no", nomatch = 0] # inner join
inner_join <- merge(dt1, dt2, by = "policy_no")
dt1 captures written premium and state information; dt2 captures coverage and location code information.
inner_join has all rows where dt1's key column values match dt2's key column values.
Joins in data table
• Left anti join
Syntax:
left_anti_join <- dt1[ !dt2 , on = "policy_no" ]
• Right anti join
Syntax:
right_anti_join <- dt2[!dt1, on = "policy_no"]
• Full outer join
Syntax:
full_outer_join <- merge(dt1 , dt2 , by = "policy_no" , all = TRUE)
Joins in data table
• Rolling join
• Rolling joins are used for analyzing data involving time
• Let’s say data table 1 (dt1) contains policy information like inception date, end date , written
premium and location
• Data table 2 (dt2) has policy endorsement details like endorsement effective date and associated
premium amount
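Minimal sketches of dt1 and dt2 for the rolling join; the column names follow the code on the next slide, and the values are assumptions:

dt1 <- data.table(policy_no = c(12345, 25486),
                  policy_start_date = c("1/1/20", "3/1/20"),
                  policy_end_date = c("12/31/20", "2/28/21"),
                  written_premium = c(100, 400),
                  loc_code = c(10, 20))
dt2 <- data.table(policy_no = c(12345, 25486),
                  endorsement_eff_dt = c("6/15/20", "4/1/20"),
                  endorsement_premium = c(25, 40))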
Joins in data table
• Rolling join
• Rolling join can be used to merge both tables with the condition that endorsement effective date lies
within the policy period
Syntax:
# Converting character to date
dt1[ , `:=` (policy_start_date = as.Date(policy_start_date , "%m/%d/%y") ,
             policy_end_date = as.Date(policy_end_date , "%m/%d/%y")) ]
dt2[ , endorsement_eff_dt := as.Date(endorsement_eff_dt , "%m/%d/%y") ]

# Creating the date variable to be used for joining
dt1[ , join_date := policy_start_date ]
dt2[ , join_date := endorsement_eff_dt ]

# Setting keys on each table
setkey(dt1 , policy_no , join_date)
setkey(dt2 , policy_no , join_date)

# Rolling backward
rolling_join <- dt2[ dt1 , roll = -365 ]
ggplot2 - Grammar of Graphics plot
Data: in ggplot2, data must be stored as an R data frame
Coordinate system: describes 2-D space that data is projected onto - for example, Cartesian
coordinates, polar coordinates, map projections
Geoms: describe type of geometric objects that represent data - for example, points, lines, polygons
Aesthetics: describe visual characteristics that represent data - for example, position, size, color,
shape, transparency, fill
Scales: for each aesthetic, describe how visual characteristic is converted to display values - for
example, log scales, color scales, size scales, shape scales
Stats : describe statistical transformations that typically summarize data - for example, counts,
means, medians, regression lines
Facets: describe how data is split into subsets and displayed as multiple small graphs
Creating a plot object
creates a plot object that can be assigned to a variable
can specify data frame and aesthetic mappings (visual characteristics that represent data)
Syntax:
library(ggplot2)
library(data.table)
plot_data <- as.data.table(fread("ggplot_data.csv"))
p <- ggplot(data = plot_data , aes(x = subln_grp , y = premium))
p
x‐axis position indicates subln_grp
y‐axis position indicates premium
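If "ggplot_data.csv" is not available, a minimal plot_data sketch to run the ggplot examples; the column names follow the slides, the values are assumptions:

set.seed(1)
plot_data <- data.table(
  subln_grp = rep(c("PL", "CL"), each = 50),
  state = sample(c("IL", "NY", "FL"), 100, replace = TRUE),
  premium = round(runif(100, 50, 500)),
  deductible = sample(c(50, 100, 250, 500), 100, replace = TRUE)
)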
Adding a layer
Syntax:
plot_data <- as.data.table(fread("ggplot_data.csv"))
p <- ggplot(data = plot_data , aes(x = subln_grp , y = premium , color = state))
p + geom_point(size = 2)
Layer
Purpose:
Display the data – allows viewer to see patterns, overall structure, local structure, outliers
Display statistical summaries of the data – allows viewer to see counts, means, medians, IQRs, model
predictions
Data and aesthetics (mappings) may be inherited from ggplot() object or added, changed, or dropped
within individual layers
Most layers contain a geom - the fundamental building block of ggplot2
full specification: geom_xxx(mapping, data, stat, position, ...)
Each geom_xxx() has a default stat (statistical transformation) associated with it, but the default statistical transformation can be changed using the stat parameter.
Adding a geom layer
Syntax:
plot_data <- as.data.table(fread("ggplot_data.csv"))
p <- ggplot(data = plot_data , aes(x = subln_grp , y = premium , color = state))
p + geom_blank()
p + geom_point()
p + geom_jitter()
p + geom_count()
Displaying Statistical Summary
Syntax:
plot_data <- as.data.table(fread("ggplot_data.csv"))
p <- ggplot(data = plot_data , aes(x = state))
p + geom_bar()
Already transformed data
Syntax:
plot_data <- as.data.table(fread("ggplot_data.csv"))
transfrmd_data <- plot_data[ , count := .N , by = state]
transfrmd_data <- unique(transfrmd_data[,.(count , state)])
p <- ggplot(data = transfrmd_data , aes(x = state , y = count))
p + geom_col()
# or
p + geom_bar(stat = "identity")
geom_bar: the height of each bar is proportional to the number of observations in the group; it uses the count stat by default.
geom_col: leaves the data as is; it uses the identity stat.
Displaying distributions
Syntax:
plot_data <- as.data.table(fread("ggplot_data.csv"))
p <- ggplot(data = plot_data , aes(x = premium))
p + geom_histogram()
p + geom_freqpoly()
p + geom_freqpoly(aes(color = state))
Displaying Statistical Summaries
Syntax:
plot_data <- as.data.table(fread("ggplot_data.csv"))
p <- ggplot(data = plot_data , aes(x = state , y = premium))
p + geom_boxplot()
Position
Syntax:
plot_data <- as.data.table(fread("ggplot_data.csv"))
p <- ggplot(data = plot_data , aes(x = state , fill = deductible > 100))
p + geom_bar() p + geom_bar(position="stack")
p + geom_bar(position="dodge") p + geom_bar(position="fill")