R Course

This document provides an introduction to using R Studio and performing common data analysis tasks. It outlines how to import and manipulate data frames, calculate descriptive statistics, perform linear and logistic regression modeling, and conduct ANOVA analyses. Key functions and commands are listed for reading data, accessing variable properties, subsetting data frames, calculating statistics, producing graphs, fitting regression models, and performing ANOVA and post-hoc tests. The document serves as a starting point for learning the basics of the R programming language and statistical analysis tools in R Studio.

Uploaded by

Andreea Dobrita


R Studio Course

To read a CSV file into the program's memory as a data frame (both = and <- work for assignment):

- name you want to use for the data frame = read.csv("name of the CSV file saved on your computer")
- name you want to use for the data frame <- read.csv("name of the CSV file saved on your computer")

To obtain the structure of your data frame:

- str(name of the data frame)

To obtain a summary of your data frame:

- summary(name of the data frame)

To obtain a subset of your data frame:

- name of the subset you create = subset(name of the data frame, name of the column == "value you want to keep")

To save your subset to your computer:

- write.csv(name of the subset, "name of the file you want to save as.csv")
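
The recipes above can be tried end to end on data that ships with R. This is a minimal sketch, assuming the built-in `iris` data set stands in for your own CSV file; `csv_path`, `flowers`, `setosa`, and `subset_path` are hypothetical names, and temporary files are used so nothing lands in your working directory.

```r
# Assumption: iris (built into R) plays the role of your own data.
# tempfile() gives throwaway paths so the working directory stays clean.
csv_path <- tempfile(fileext = ".csv")

# Write iris out first so there is a CSV file to read back in.
# row.names = FALSE stops write.csv() from adding a column of row numbers.
write.csv(iris, csv_path, row.names = FALSE)

# Read the CSV file into a new data frame held in memory
flowers <- read.csv(csv_path)

str(flowers)      # structure: variable names and types
summary(flowers)  # summary statistics for each column

# Keep only the rows whose Species value is "setosa"
setosa <- subset(flowers, Species == "setosa")
nrow(setosa)

# Save the subset as its own CSV file
subset_path <- tempfile(fileext = ".csv")
write.csv(setosa, subset_path, row.names = FALSE)
```

If `row.names = FALSE` is left out, reading the file back in gives an extra first column named `X` that holds the old row numbers.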

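To see the objects (such as data frames) stored in your workspace:

- ls()

To see the variables (column names) your data frame has:

- names(name of the data frame)

To see a single variable from your data frame:

- name of the data frame$name of the variable you want to see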

To obtain the mean, standard deviation, and summary of a variable:

- mean(name of the data frame$name of the variable)
- sd(name of the data frame$name of the variable)
- summary(name of the data frame$name of the variable)

To find which subject (row) has the smallest value of a variable (use which.max() for the largest):

- which.min(name of the data frame$name of the variable)

To see the value of a column (for example, the subject's name or code) for a given row:

- name of the data frame$name of the column[row number]
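
A short sketch of `which.min()` and `which.max()` combined with indexing, assuming the built-in `mtcars` data set in place of your own; `cars_df`, `low_row`, and `high_row` are illustrative names. Car names are stored as row names in `mtcars`, so they are copied into an ordinary column first.

```r
# Assumption: mtcars (built into R) stands in for your data frame.
# Copy the row names (car models) into a regular column.
cars_df <- data.frame(model = rownames(mtcars), mpg = mtcars$mpg)

# Row numbers of the smallest and largest mpg values
low_row  <- which.min(cars_df$mpg)
high_row <- which.max(cars_df$mpg)

# Use those row numbers to look up the matching model names
cars_df$model[low_row]   # car with the lowest fuel economy
cars_df$model[high_row]  # car with the highest fuel economy
```

When several rows tie for the minimum, `which.min()` returns only the first one: Cadillac Fleetwood and Lincoln Continental both have 10.4 mpg, and row 15 (Cadillac Fleetwood) is the one reported.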

To make a scatterplot:

- plot(name of the data frame$name of the column for the X axis, name of the data frame$name of the column for the Y axis)

To select the outliers (rows that meet several conditions at once):

- outliers = subset(name of the data frame, name of the column > value & name of the other column > value)

Use >, <, or == in each condition as needed; & means both conditions must be true.

To view the outliers and count them:

- View(outliers)
- nrow(outliers)

To view only selected columns of the outliers:

- outliers[c("name of the column", "name of the column", "name of the column")]
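
The outlier recipe in runnable form. This sketch assumes the built-in `mtcars` data set and arbitrary thresholds (weight over 4, i.e. 4000 lb, and more than 200 horsepower) chosen only for illustration; `cars_df` and `either` are hypothetical names.

```r
# Assumption: mtcars stands in for your data; thresholds are illustrative.
cars_df <- data.frame(model = rownames(mtcars), mtcars)

# Rows where BOTH conditions are true (& means "and")
outliers <- subset(cars_df, wt > 4 & hp > 200)

nrow(outliers)                     # number of rows that matched
outliers[c("model", "wt", "hp")]   # show only the columns of interest

# Use | ("or") instead to keep rows where EITHER condition is true
either <- subset(cars_df, wt > 4 | hp > 200)
nrow(either)
```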

getwd()

WHO = read.csv("WHO.csv")

str(WHO)

summary(WHO)

WHO_Europe = subset(WHO, Region == "Europe")

str(WHO_Europe)

write.csv(WHO_Europe, "WHO_Europe.csv")

ls()

rm(WHO_Europe)

ls()

Under15 # error: object 'Under15' not found, because the variable exists only inside WHO

WHO$Under15

mean(WHO$Under15)

sd(WHO$Under15)

summary(WHO$Under15)

outliers = subset(WHO, GNI > 10000 & FertilityRate > 2.5)

nrow(outliers)

outliers[c("Country", "GNI", "FertilityRate")]

hist(WHO$CellularSubscribers)

boxplot(WHO$LifeExpectancy ~ WHO$Region)
boxplot(WHO$LifeExpectancy ~ WHO$Region, xlab="", ylab="Life Expectancy", main="Life Expectancy by Region")

table(WHO$Region)

tapply(WHO$Over60, WHO$Region, mean)

tapply(WHO$LiteracyRate, WHO$Region, min) # gives NA for any region with missing values

tapply(WHO$LiteracyRate, WHO$Region, min, na.rm=TRUE) # ignores the missing values
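
The WHO file is not bundled with R, so this sketch assumes the built-in `airquality` data set instead, whose `Ozone` column contains missing values (NA). It shows why the `na.rm=TRUE` version of `tapply()` is needed; `raw_means` and `clean_means` are illustrative names.

```r
# Assumption: airquality (built into R) replaces WHO.csv.
# Ozone has NA values, so a plain mean() returns NA for any month containing one.
raw_means <- tapply(airquality$Ozone, airquality$Month, mean)

# Extra arguments after the function name are passed on to mean()
clean_means <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)

raw_means    # NA wherever a month has missing Ozone readings
clean_means  # NAs dropped before averaging
```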

Introduction to R

Setting and getting the working directory

- Use File > Change dir... (R GUI on Windows) or Session > Set Working Directory > Choose Directory... (RStudio)
- setwd("P:/Data/MATH/Hartlaub/DataAnalysis")
- getwd()

Reading data (Creating a dataframe)

- mydata=read.csv(file=file.choose()) #opens a dialog box for choosing the CSV file

Commands for dataframes

- mydata #shows the entire data set
- head(mydata) #shows the first 6 rows
- tail(mydata) #shows the last 6 rows
- str(mydata) #shows the variable names and types
- names(mydata) #shows the variable names
- ls() #shows a list of objects that are available
- attach(mydata) #attaches the data frame to the R search path, so its variables can be used by name alone (detach(mydata) undoes this)
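
A quick sketch of these data frame commands, assuming the built-in `mtcars` data set is used as `mydata`. It also shows `with()`, a base R alternative to `attach()` that allows bare column names for a single expression and leaves the search path unchanged; `avg_mpg` and `avg_mpg_attached` are illustrative names.

```r
# Assumption: mtcars (built into R) plays the role of mydata.
mydata <- mtcars

head(mydata)         # first 6 rows
tail(mydata, n = 3)  # last 3 rows; n sets how many rows are shown
str(mydata)          # variable names and types
names(mydata)        # variable names only
dim(mydata)          # number of rows and columns

# with(): bare column names inside one call, nothing attached
avg_mpg <- with(mydata, mean(mpg))

# attach()/detach(): bare column names until detach() is called
attach(mydata)
avg_mpg_attached <- mean(mpg)
detach(mydata)
```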

Descriptive Statistics

- mean(x) #computes the mean of the variable x
- median(x) #computes the median of the variable x
- sd(x) #computes the standard deviation of the variable x
- IQR(x) #computes the IQR of the variable x
- summary(x) #computes the 5-number summary and the mean of the variable x
- t.test(x) #gets a one-sample t test
- t.test(x,y) #gets a two-sample t test (Welch version, unequal variances, by default)
- t.test(x, y, paired=TRUE) #gets a paired t test
- cor(x,y) #computes the correlation coefficient
- cor(mydata) #computes a correlation matrix (all columns must be numeric)
- cor.test(x,y) #test plus CI for rho
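
These functions can be tried on R's built-in data. This sketch assumes `mtcars$mpg` and `mtcars$wt` play the roles of `x` and `y`, and uses the built-in `sleep` data (extra hours of sleep for the same 10 patients under two drugs) for the paired t test; `cor_res`, `drug1`, `drug2`, and `paired_res` are illustrative names.

```r
# Assumption: built-in data sets stand in for your own x and y.
x <- mtcars$mpg
y <- mtcars$wt

mean(x); median(x); sd(x); IQR(x)
summary(x)

# Correlation coefficient, then a test of H0: rho = 0 with a 95% CI
cor(x, y)
cor_res <- cor.test(x, y)
cor_res$conf.int

# Paired t test: both groups in 'sleep' are the same 10 patients (ID 1-10),
# listed in the same order, so the pairs line up
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]
paired_res <- t.test(drug1, drug2, paired = TRUE)
paired_res$p.value
```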

Graphical Displays

- windows(record=TRUE) #records your work, including plots (Windows only)
- hist(x) #creates a histogram for the variable x
- boxplot(x) #creates a boxplot for the variable x
- stem(x) #creates a stem plot for the variable x
- plot(y~x) #creates a scatterplot of y versus x
- plot(mydata) #provides a scatterplot matrix
- abline(lm(y~x)) #adds regression line to plot
- lines(lowess(x,y)) #adds lowess line (x,y) to plot
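
A sketch of a scatterplot with both added lines, assuming the built-in `cars` data set (speed in mph versus stopping distance in feet). The plot is sent to a temporary PDF file (`out_file`, an illustrative name) so the code also runs without a screen; in RStudio you can drop the `pdf()` and `dev.off()` lines to see it in the Plots pane.

```r
# Assumption: the built-in cars data set stands in for x and y.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)   # open a PDF graphics device

plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(lm(dist ~ speed, data = cars))          # least-squares regression line
lines(lowess(cars$speed, cars$dist), lty = 2)  # lowess smoother, dashed

dev.off()       # close the device and write the file
```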

Linear Regression Models

- regmodel=lm(y~x) #fit a regression model
- summary(regmodel) #get results from fitting the regression model
- anova(regmodel) #get the ANOVA table for the regression fit
- plot(regmodel) #get four diagnostic plots of the residuals, including a normal probability plot
- fits=fitted(regmodel) #store the fitted values in a variable named "fits"
- resids=residuals(regmodel) #store the residual values in a variable named "resids"
- sresids=rstandard(regmodel) #store the standardized residuals in a variable named "sresids"
- studresids=rstudent(regmodel) #store the studentized residuals in a variable named "studresids"
- beta1hat=coef(regmodel)[2] #assign the slope coefficient to the name "beta1hat"
- qt(.975,15) #find the 97.5th percentile of a t distribution with 15 df
- confint(regmodel) #CIs for all parameters
- newx=data.frame(x=41) #create a new data frame with one new x* value of 41 (the column name must match the predictor name)
- predict(regmodel,newx,interval="confidence") #get a CI for the mean response at x*
- predict(regmodel,newx,interval="prediction") #get a prediction interval for an individual Y value at x*
- hatvalues(regmodel) #get the leverage values (hi)
- Model Selection
  o library(leaps) #load the leaps package
  o allmods = regsubsets(y~x1+x2+x3+x4, nbest=2, data=mydata) #(leaps package must be loaded) identify the best two models for each number of predictors
  o summary(allmods) #get summary of best subsets
  o summary(allmods)$adjr2 #adjusted R^2 for the models in allmods
  o summary(allmods)$cp #Cp for the models in allmods
  o plot(allmods, scale="adjr2") #plot that identifies models
  o plot(allmods, scale="Cp") #plot that identifies models
  o fullmodel=lm(y~., data=mydata) #regress y on everything in mydata
  o MSE=(summary(fullmodel)$sigma)^2 #store MSE for the full model
  o extractAIC(lm(y~x1+x2+x3, data=mydata), scale=MSE) #get Cp (equivalent to AIC)
  o none=lm(y~1, data=mydata) #regress y on the constant only
  o step(fullmodel, scale=MSE, direction="backward") #backward elimination
  o step(none, scope=list(upper=fullmodel), scale=MSE, direction="forward") #forward selection (must start from the smaller model)
  o step(fullmodel, scale=MSE, direction="both") #stepwise regression
  o step(none, scope=list(upper=fullmodel), scale=MSE) #use Cp in stepwise regression starting from the constant-only model
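
A compact sketch of the regression workflow, assuming the built-in `cars` data set for the single-predictor model and `mtcars` for model selection (the four candidate predictors are an arbitrary illustrative choice). It uses only base R, so the leaps part is not shown; `conf_int`, `pred_int`, `backward`, and `forward` are illustrative names.

```r
# Assumption: built-in data sets stand in for y, x, and mydata.
regmodel <- lm(dist ~ speed, data = cars)
summary(regmodel)
confint(regmodel)

# New x* value: the column name must match the predictor (speed)
newx <- data.frame(speed = 21)
conf_int <- predict(regmodel, newx, interval = "confidence")  # mean response
pred_int <- predict(regmodel, newx, interval = "prediction")  # one new Y

# Model selection with Cp: scale = MSE of the full model
fullmodel <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
MSE <- summary(fullmodel)$sigma^2
none <- lm(mpg ~ 1, data = mtcars)

backward <- step(fullmodel, scale = MSE, direction = "backward", trace = 0)
forward  <- step(none, scope = list(upper = formula(fullmodel)),
                 scale = MSE, direction = "forward", trace = 0)
formula(backward)
formula(forward)
```

The prediction interval is always wider than the confidence interval at the same x*, because it also accounts for the scatter of a single observation around the mean.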

Logistic Regression

- table(y) #get a table of the distribution of y
- mytable=table(y, x) #get a 2-way table of y by x
- chisq.test(mytable) #chi-square test of independence (the Yates continuity correction is applied to 2x2 tables)
- chisq.test(mytable, correction=FALSE) #chi-square test of independence without the Yates continuity correction
- prop.table(table(y, x),1) #get a table of row proportions
- prop.table(table(y, x),2) #get a table of column proportions
- prop.test(c(39,22), c(100,100), correction=FALSE) #2-sample proportion test without the Yates continuity correction
- plot(x,jitter(y,amount=0.05)) #jitter y in the plot
- anova(reducedmodel, fullmodel, test="Chisq") #nested G test
- drop1(mymodel, test="Chisq") #G tests to see what to drop next
- as.factor(X) #treat X as categorical (the model then builds dummy variables for its levels)
- model1=glm(y~as.factor(X), family=binomial) #fit model with the categories of X as predictors
- summary(model1) #gives Z tests, residual deviance, and null deviance
- anova(model1, test="Chisq") #test of H0: the constant term is all that is needed (i.e., nested G test against the model y~1)
- confint(model1) #CIs for all parameters
- confint(model1, parm="x") #CI for the coefficient of x
- exp(confint(model1, parm="x")) #CI for the odds ratio
- shortmodel=glm(cbind(y1,y2)~x, family=binomial) #binomial inputs (y1 = successes, y2 = failures)
- dresid=residuals(model1, type="deviance") #deviance residuals
- presid=residuals(model1, type="pearson") #Pearson residuals
- plot(residuals(model1, type="deviance")) #plot of deviance residuals
- newx=data.frame(X=20) #set (X=20) for an upcoming prediction
- predict(mymodel, newx, type="response") #get predicted probability at X=20
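
A sketch of the logistic regression commands, assuming the built-in `mtcars` data set: the response is `am` (1 = manual transmission) and the predictor is weight `wt` (in 1000 lb), while the 2x2 table crosses `am` with engine shape `vs`. Wald intervals from `confint.default()` are used here; `confint()` on a glm gives profile-likelihood intervals instead. `p_hat` is an illustrative name.

```r
# Assumption: mtcars stands in for y and x.
# 2-way table and chi-square test (Yates' correction applies: it is 2x2)
mytable <- table(mtcars$am, mtcars$vs)
chisq.test(mytable)
prop.table(mytable, 1)   # row proportions

# Logistic regression of am on wt
model1 <- glm(am ~ wt, family = binomial, data = mtcars)
summary(model1)                  # z tests, null and residual deviance
anova(model1, test = "Chisq")    # G test against the constant-only model

# Odds ratio for a 1000 lb increase in weight, with a Wald 95% CI
exp(coef(model1)["wt"])
exp(confint.default(model1, parm = "wt"))

# Predicted probability of a manual transmission at wt = 3 (3000 lb)
newx <- data.frame(wt = 3)
p_hat <- predict(model1, newx, type = "response")

# Deviance residuals
dresid <- residuals(model1, type = "deviance")
```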

Analysis of Variance

- t.test(y~x, var.equal=TRUE) #pooled t-test where x is a factor
- x=as.factor(x) #coerce x to be a factor variable
- tapply(y, x, mean) #get mean of y at each level of x
- tapply(y, x, sd) #get standard deviations of y at each level of x
- tapply(y, x, length) #get sample sizes of y at each level of x
- library(gplots) #load gplots package
  o plotmeans(y~x) #means and 95% confidence intervals
- AOVmodel=aov(y~x) #one-way ANOVA
- summary(AOVmodel) #get ANOVA output
- oneway.test(y~x, var.equal=TRUE) #one-way test output
- library(car) #load car package
  o leveneTest(y,x) #Levene's test for equal variances (older versions of car called this levene.test)
- blockmodel=aov(y~x+block) #randomized block design model with "block" as a variable
- tapply(y, list(x1,x2), mean) #get the mean of y for each cell of x1 by x2
- anova(lm(y~x1+x2)) #a way to get a two-way ANOVA table
- interaction.plot(FactorA, FactorB, y) #get an interaction plot
- pairwise.t.test(y,x,p.adj="none") #pairwise t tests with no adjustment
- pairwise.t.test(y,x,p.adj="bonferroni") #pairwise t tests with the Bonferroni adjustment
- TukeyHSD(AOVmodel) #get Tukey CIs and P-values
- plot(TukeyHSD(AOVmodel)) #get 95% family-wise CIs
- library(multcomp) #load multcomp package
  o contrast=rbind(c(.5,.5,-1/3,-1/3,-1/3)) #set up a contrast
  o summary(glht(AOVmodel, linfct=mcp(x=contrast))) #test a contrast
  o confint(glht(AOVmodel, linfct=mcp(x=contrast))) #CI for a contrast
- kruskal.test(y~x) #Kruskal-Wallis test
- friedman.test(y,x,block) #Friedman test for block design
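
A sketch of a one-way ANOVA with follow-up comparisons, assuming the built-in `PlantGrowth` data set (dried plant weight for a control group and two treatments, 10 plants each) in place of `y` and `x`. Only base R is used, so the gplots, car, and multcomp steps are left out; `tukey` is an illustrative name.

```r
# Assumption: PlantGrowth stands in for y and x.
y <- PlantGrowth$weight
x <- PlantGrowth$group   # already a factor: ctrl, trt1, trt2

tapply(y, x, mean)       # group means
tapply(y, x, sd)         # group standard deviations
tapply(y, x, length)     # group sample sizes

AOVmodel <- aov(y ~ x)
summary(AOVmodel)
oneway.test(y ~ x, var.equal = TRUE)   # same F test as aov()

tukey <- TukeyHSD(AOVmodel)            # Tukey CIs and adjusted p-values
tukey
pairwise.t.test(y, x, p.adjust.method = "bonferroni")
kruskal.test(y ~ x)                    # rank-based alternative
```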

Create a personalized histogram

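# Create a histogram of the price of
# all the diamonds in the diamond data set.
# (Requires the ggplot2, scales, and dplyr packages.)
library(ggplot2)  # ggplot() and the diamonds data set
library(scales)   # dollar() axis labels
library(dplyr)    # %>%, filter(), group_by(), summarise()

ggplot(diamonds, aes(x = price)) +
  geom_histogram(color = "black", fill = "DarkOrange", binwidth = 500) +
  scale_x_continuous(labels = dollar, breaks = seq(0, 20000, 1000)) +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Price") + ylab("Count")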

# a) How many diamonds cost less than $500?
diamonds %>%
  filter(price < 500) %>%
  summarise(n = n())

# Break out the histogram of diamond prices by cut.
# You should have five histograms in separate
# panels on your resulting plot.
ggplot(diamonds, aes(x = price)) +
  geom_histogram(color = "black", fill = "DarkOrange", binwidth = 25) +
  scale_x_continuous(labels = dollar, breaks = seq(0, 4000, 100)) +
  theme(axis.text.x = element_text(angle = 90)) +
  coord_cartesian(xlim = c(0, 4000)) +
  facet_grid(cut ~ .) +
  xlab("Price") + ylab("Count")

# a) Which cut has the highest priced diamond?
# Premium
# by(diamonds$price, diamonds$cut, max)
# by(diamonds$price, diamonds$cut, min)
# by(diamonds$price, diamonds$cut, median)
diamonds %>%
  group_by(cut) %>%
  summarise(max_price = max(price),
            min_price = min(price),
            median_price = median(price))
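
The same kinds of questions can be answered in base R without dplyr. This sketch assumes the built-in `mtcars` data set in place of `diamonds` (so no packages are needed), counting cars with `mpg` below 15 and summarising `mpg` by number of cylinders; `n_low` and `group_max` are illustrative names. With ggplot2 loaded, `sum(diamonds$price < 500)` would answer question a) the same way.

```r
# Assumption: mtcars stands in for the diamonds data set.

# Counting: TRUE counts as 1 and FALSE as 0, so sum() counts matches
n_low <- sum(mtcars$mpg < 15)

# Max, min, and median mpg for each number of cylinders
aggregate(mpg ~ cyl, data = mtcars,
          FUN = function(v) c(max = max(v), min = min(v), median = median(v)))

# Which group contains the single highest value?
group_max <- tapply(mtcars$mpg, mtcars$cyl, max)
names(which.max(group_max))
```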
