Module 5: Data Presentation: Learning Outcome: Create Appropriate Tabular and Graphical Displays Using R

This module teaches how to present collected data using statistical tables and graphs in R. It discusses constructing frequency distribution tables to summarize categorical or numerical data from a single variable. This involves calculating the frequency, relative frequency, and percent frequency of each category or class to organize the raw data. An example uses purchase data of 50 soft drinks to demonstrate creating these tables manually and in R using packages like readr and pander.

Uploaded by

ABAGAEL CACHO


MODULE 5: DATA PRESENTATION

(For SEPTEMBER 18-21)

Learning Outcome: Create appropriate tabular and graphical displays using R
to present and summarize data in a meaningful manner.

In this module, you will learn how to construct statistical tables and graphs to present
collected data in a more meaningful and visual manner. Most of these can be done using
Microsoft Excel. However, we focus on the use of the R software in producing these graphs
or charts.

Data Presentation and Visualization

After the sampling and data collection process, what results is data in its raw format, which
is often difficult to understand as is. The next step would now be to summarize and organize
these using textual, tabular or graphical forms in order for the researcher or author to be
able to impart useful information to the readers. In preparing texts, tables or graphs, we
must always be mindful of what information the data are conveying, and what must be
done to include more useful information. Planning how the data will be presented is
essential before appropriately processing raw data.

Data Visualization is a term to describe the use of graphical displays to summarize and
present information about a data set. Data become more comprehensible and more
useful when they are organized and presented using graphs, frequency distribution tables,
charts, diagrams and the like to derive logical solutions and conclusions.

Summarizing Qualitative and Quantitative Data for a Single Variable

Data obtained from a single variable can be summarized and presented in many ways. A
frequency distribution table, a bar chart and a pie chart can be used to present
qualitative data. Quantitative data, on the other hand, can be summarized using a dot
plot, a stem-and-leaf display, a frequency distribution table, and a histogram. Let us look at
each of these methods more closely.

FREQUENCY DISTRIBUTION TABLE

A frequency distribution is a table that shows how often each value (or set of values) of the
variable in question occurs in a data set. It is used to summarize categorical (qualitative) or
numerical (quantitative) data. Simply put, it is a tabular summary of data showing the
number or frequency of observations in each of several non-overlapping categories or
classes.

Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or transmitting in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document, without the prior written permission of SLU, is strictly prohibited. 53
The relative frequency of a class equals the fraction or proportion of the observations
belonging to that class or category. Thus, the relative frequency can be computed using

$$\text{relative frequency of a class} = \frac{\text{frequency of the class}}{\text{total number of observations}}$$

A relative frequency distribution gives a tabular summary of data showing the relative
frequency for each class. If the relative frequency is multiplied by 100, we get the percent
frequency of a class. A percent frequency distribution summarizes the percent frequency of
the data for each class.

Example 1:
The raw data in the table below shows fifty soft drink purchases. Notice that little
information can be drawn from the data in its current form, so it is best to consider
other ways to present the data. Let us construct a frequency distribution table for the
sample.

The frequency distribution table for this data set can be constructed manually or by using
the PivotTable feature of Microsoft Excel. With some editing, the following are the
frequency, relative frequency and percent frequency tables generated:

Soft Drink Type    Frequency
Coke Classic       19
Diet Coke          8
Dr. Pepper         5
Pepsi              13
Sprite             5
Total              50
Table 1. Frequency Distribution Table for Soft Drink Purchases

Soft Drink Type    Relative Frequency
Coke Classic       0.38
Diet Coke          0.16
Dr. Pepper         0.10
Pepsi              0.26
Sprite             0.10
Total              1.00
Table 2. Relative Frequency Distribution Table for Soft Drink Purchases

Soft Drink Type    Percent Frequency
Coke Classic       38%
Diet Coke          16%
Dr. Pepper         10%
Pepsi              26%
Sprite             10%
Total              100%
Table 3. Percent Frequency Distribution Table for Soft Drink Purchases
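As a quick check of how Tables 1 to 3 relate to one another, the sketch below recomputes the relative and percent frequencies from the counts in Table 1. It uses base R only, and the object names (freq, rel_freq, pct_freq, summary_table) are illustrative choices rather than part of the module's own scripts.

```r
# Frequencies copied from Table 1 (soft drink purchases)
freq <- c("Coke Classic" = 19, "Diet Coke" = 8, "Dr. Pepper" = 5,
          "Pepsi" = 13, "Sprite" = 5)

# Relative frequency = frequency of the class / total number of observations
rel_freq <- freq / sum(freq)

# Percent frequency = relative frequency multiplied by 100
pct_freq <- rel_freq * 100

# Put the three columns side by side for display
summary_table <- cbind("Frequency" = freq,
                       "Relative Frequency" = rel_freq,
                       "Percent Frequency" = pct_freq)
print(summary_table)
```

The relative frequencies should sum to 1 and the percent frequencies to 100, which is a useful sanity check on any frequency table.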

Using RStudio, on the other hand, the task can be completed by running the following R
code in the Console window. We will use the “purchase.csv” file in our working directory.

R Script
# This script constructs a frequency distribution table for qualitative data

# Install the necessary packages (needed only once)

install.packages("readr")    # readr is a package used to read rectangular data like 'csv' or 'tsv'
install.packages("pander")   # pander provides a minimal and easy tool for rendering R objects

# Load the installed packages in RStudio

library(readr)
library(pander)

# Import the file to RStudio
# (read.csv() is a base R function; the readr equivalent is read_csv())

purchase <- read.csv("purchase.csv")   # the csv data is assigned to the object 'purchase'

# Get frequencies

data.freq <- table(purchase)   # table() tabulates categorical data, giving each category and its frequency

# Arrange the frequencies as a one-column table

freq.dist <- cbind(data.freq)

# To name the table column

colnames(freq.dist) <- c("Frequency")   # colnames sets the column names or labels

# RStudio output
pander(freq.dist)

Frequency
Coke Classic 19
Diet Coke 8
Dr. Pepper 5
Pepsi 13
Sprite 5

The same R code or script can also be written in the Source window or pane if you want to
keep a copy of the scripts you write in RStudio. First, we create a new R script file by
clicking on the File menu, then click on New File and select R Script. The same result can be
obtained by using the hot keys Ctrl+Shift+N.

Write the R code on the Source window. You should be able to have something similar to
Figure 11.

Save the R script file. R script files are named with an .R extension. Click on the save icon on
the Source window and browse to your set working directory. Name the file as purchase.R.

After saving the file, execute the script by highlighting all the lines in the Source window
and then clicking the "Run" icon at the upper right part of the Source window. As an
alternative to the "Run" icon, you can press Ctrl+Enter to run the script. Take
note of this.

For the relative frequency table, we can run the following R script.

R Script
# R script for the relative frequency distribution table

data.relfreq <- data.freq / nrow(purchase)   # nrow counts the total number of rows of the purchase data
relfreq.dist <- cbind(data.relfreq)
colnames(relfreq.dist) <- c("Relative Frequency")

# RStudio output
pander(relfreq.dist)

Relative Frequency
Coke Classic 0.38
Diet Coke 0.16
Dr. Pepper 0.1
Pepsi 0.26
Sprite 0.1

Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or transmitting in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document, without the prior written permission of SLU, is strictly prohibited. 57
Note that since the dataset was already imported in RStudio from the previous R script,
there is no need to import the data again. Also, since the packages were already installed
and loaded from the previous R script, there is no need to repeat these commands.

Example 2:
A survey was taken on Aurora Avenue. In each of 20 homes, people were asked how many
cars were registered to their households. The results were recorded as follows:

1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0

Table 4 shows the frequency, relative frequency and percent frequency for the data in just
one table. Note that in practice, it is customary to only include one such type of
frequency.

Number of cars    Frequency    Relative Frequency    Percent Frequency
0                 4            0.20                  20%
1                 6            0.30                  30%
2                 5            0.25                  25%
3                 3            0.15                  15%
4                 2            0.10                  10%
Table 4. Frequency distribution table for the number of cars registered in each household

In this example, the frequency table constructed is for ungrouped data, which means that
the individual values do not lose their identity in the table.

To do this in RStudio, let us consider a different approach: constructing a vector that
holds the data values. Open a new R script file, then enter and run the following script.

R Script
# Create a vector for the given data.
cars<-c(1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0)

# Get the frequencies

data.freq<-table(cars)
data.relfreq<-data.freq/sum(data.freq)
data.pctfreq<-data.relfreq*100

# To combine necessary columns

freq.dist<-cbind(data.freq, data.relfreq, data.pctfreq)

# Naming the table columns

colnames(freq.dist) <-c("Frequency", "Relative Frequency", "Percent Frequency")

# RStudio output
pander(freq.dist)
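The same three columns can also be obtained with base R's prop.table(), which converts a table of counts into proportions. The following is a minimal sketch rather than the module's own script: it prints with print() instead of pander() so that it runs without extra packages, and the names counts, rel, and freq_table are illustrative.

```r
# Data of Example 2: number of cars registered in each of 20 households
cars <- c(1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0)

counts <- table(cars)        # frequency of each distinct value
rel <- prop.table(counts)    # each count divided by the total count

# Collect the columns in a data frame, one row per distinct value
freq_table <- data.frame(
  cars = names(counts),
  frequency = as.vector(counts),
  relative_frequency = as.vector(rel),
  percent_frequency = as.vector(rel) * 100
)
print(freq_table)
```

The printed values can be compared row by row with Table 4.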

Example 3:
Consider the following data set on the monthly rent ($) for a sample of 70 one-bedroom
apartments:
425 430 430 435 435 435 435 435 440 440 440 440 440 445 445
445 445 445 450 450 450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480 480 485 490 490 490
500 500 500 500 510 510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

A frequency table with 8 class intervals for this sample is shown below. In this case, the
values are grouped together in each class, and the individual values are no longer visible.

Rent (in $) Frequency

425-449 18
450-474 16
475-499 11
500-524 7
525-549 5
550-574 3
575-599 4
600-624 6
Total 70
Table 5. Frequency table for monthly rents of 70 one-bedroom apartments

To create the Grouped Frequency Distribution Table using R, we consider the following R
script and we make use of the rent.csv file in our data repository or working directory.

# Import the data into RStudio and assign it to "data"

data <-read.csv("rent.csv")

# To view the data in RStudio

View(data)

# To Define the Class Intervals

breaks <- seq(425, 625, by = 25)   # Creates class intervals, each with class width equal to 25

# To assign each observation to its class interval

classint <- cut(data$Rent, breaks, right = FALSE)   # data$Rent extracts the variable "Rent" from the data frame

# To obtain the frequency of data in each class interval

freq<-table(classint)

# To combine necessary columns

freq.dist<-cbind(freq)

# To name or label the columns

colnames(freq.dist) <-c("Frequency")

# RStudio output
pander(freq.dist)

Frequency
[425,450) 18
[450,475) 16
[475,500) 11
[500,525) 7
[525,550) 5
[550,575) 3
[575,600) 4
[600,625) 6

In the output, a bracket on the left endpoint means that the value is included in the class
interval, while a parenthesis on the right endpoint means the value is not included in the
interval. For example, [525, 550) corresponds to the class interval 525-549 of Table 5,
since the rents are recorded in whole dollars.
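To see the effect of the right argument of cut() on values that fall exactly on a class boundary, consider the short sketch below. The four test values are chosen only for illustration (they sit at or next to the boundaries 450 and 475), and the object names are hypothetical.

```r
breaks <- seq(425, 625, by = 25)
rents <- c(449, 450, 474, 475)   # illustrative values at or next to class boundaries

# right = FALSE: each interval includes its left endpoint, [a, b)
left_closed <- cut(rents, breaks, right = FALSE)

# right = TRUE (the default of cut): each interval includes its right endpoint, (a, b]
right_closed <- cut(rents, breaks, right = TRUE)

print(data.frame(rents, left_closed, right_closed))
```

With right = FALSE, a rent of 450 is counted in the class that starts at 450, which matches the classes of Table 5. With the default right = TRUE it would be counted in the class that ends at 450 instead.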

A bar graph is a chart used to display qualitative data summarized in a frequency, relative
frequency, or percent frequency distribution.

For a vertical bar chart, the horizontal (x) axis represents the categories; the vertical (y) axis
represents a value (frequency, relative frequency, or percent frequency) for those
categories. In the graph below, the values are frequencies.

The figure below shows the bar chart of the data on soft drink purchases of Example 1.

R Script
To construct the bar chart using RStudio, we use the ggplot function. Using the
“purchase.csv” data, open a new R script file, enter and run the following script:

# Install the tidyverse and forcats packages in RStudio (needed only once)

install.packages("tidyverse")
install.packages("forcats")

# Load the packages into RStudio

library(readr)
library(tidyverse)
library(forcats)

# Import the "purchase.csv" file and assign it to "purchase"

purchase <- read.csv("purchase.csv")
View(purchase) # Presents the data on a different tab

# To generate the bar chart

bar1 <- ggplot(purchase, aes(x = Purchase)) +
  geom_bar(width = .5) +
  ggtitle("Soft Drink Purchases")
bar1

# To order the bars by decreasing frequency
# (each continued line must end with "+" so that R keeps reading the expression)
bar2 <- ggplot(mutate(purchase, Purchase = fct_infreq(Purchase))) +
  geom_bar(aes(x = Purchase))
bar2

Note that you need not assign the bar graphs to the objects bar1 and bar2. Removing
these assignments from the script generates the bar charts right away. The charts are
shown in the Plots window of RStudio, where clicking the "Export" icon gives the options
"Save as Image", "Save as PDF", and "Copy to Clipboard".
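If the frequencies are already known, a bar chart can also be drawn directly from the counts with base R's barplot(), without importing the raw data. This is offered as an alternative sketch, not as the module's method; the counts are those of Table 1 and the object names are illustrative.

```r
# Frequencies from Table 1 (soft drink purchases)
freq <- c("Coke Classic" = 19, "Diet Coke" = 8, "Dr. Pepper" = 5,
          "Pepsi" = 13, "Sprite" = 5)

# Order the categories by decreasing frequency
ordered_freq <- sort(freq, decreasing = TRUE)

# Draw a vertical bar chart: categories on the x-axis, frequencies on the y-axis
barplot(ordered_freq,
        main = "Soft Drink Purchases",
        xlab = "Soft Drink Type", ylab = "Frequency",
        ylim = c(0, 20))
```

Passing freq instead of ordered_freq keeps the categories in their original order.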

PIE CHART

A pie chart (also called a pie graph or circle graph) provides another graphical device for
presenting relative frequency and percent frequency distributions for qualitative data. The
circle is subdivided into sectors, and the numerical value shown for each sector can be a
frequency, a relative frequency, or a percent frequency.

A pie chart makes use of sectors (slices) in a circle. The angle of a sector is proportional to
the frequency of each of the categories of the variable that defines the data. The formula
to determine the angle of a sector in a circle graph is:

$$\text{Angle of sector} = \frac{\text{frequency of category}}{\text{total frequency}} \times 360^{\circ}$$
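As a worked example of the sector-angle formula, the sketch below applies it to the soft drink frequencies of Table 1. The object names freq and angles are illustrative.

```r
# Frequencies from Table 1 (soft drink purchases)
freq <- c("Coke Classic" = 19, "Diet Coke" = 8, "Dr. Pepper" = 5,
          "Pepsi" = 13, "Sprite" = 5)

# Angle of sector = (frequency of category / total frequency) x 360 degrees
angles <- freq / sum(freq) * 360

print(angles)        # one angle (in degrees) per category
print(sum(angles))   # the sector angles together make up the full circle
```

For instance, Coke Classic accounts for 19 of the 50 purchases, so its sector spans 19/50 × 360 = 136.8 degrees.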

The figure below shows the pie chart of the data on soft drink purchases of Example 1
generated using Microsoft Excel.

R Script

Starting from the raw data, the following script creates a simple pie chart in RStudio.
We use the "purchase.csv" file for the same example.

# Load Packages into RStudio

library(readr)

# Import "purchase.csv" file into RStudio and assign it to "purchase"

purchase <- read.csv("purchase.csv")

# Determine the Frequencies

data.freq <- table(purchase)   # Determines the Frequencies
data.freq                      # Presents the Frequencies
# purchase
# Coke Classic    Diet Coke   Dr. Pepper        Pepsi       Sprite
#           19            8            5           13            5

# Construct the vector of Frequencies and the vector of category labels
freq <- c(19, 8, 5, 13, 5)
lbls <- c("Coke Classic", "Diet Coke", "Dr. Pepper", "Pepsi", "Sprite")   # labels must exist before percents are added to them

# Calculate the Percentages

percents <- round(freq/sum(freq)*100, 1)   # round-off values to 1 decimal place
lbls <- paste(lbls, percents)              # add percents to labels
lbls <- paste(lbls, "%", sep = "")         # add % sign to labels

# Construct the pie chart with percentages
# (pie() draws the chart directly and returns nothing, so it is not assigned to an object)

pie(freq, labels = lbls, col = rainbow(length(lbls)),
    main = "Pie Chart of Soft Drink Purchases")

DOT PLOT

A dot plot is a graphical display of data using dots. It is similar to a bar graph because the
height of each “bar” of dots is equal to the number of items in a particular category. To
draw a dot plot, count the number of data points falling in each category and draw a
stack of dots that number high for each category. A dot plot can be used as a graphical
display of the frequency of qualitative and quantitative (ungrouped) data.

The figure that follows shows the dot plot for the data of Example 2 on the number of cars
registered to each household:

1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0

Number of cars Frequency

0 4
1 6
2 5
3 3
4 2
Table 4. Frequency distribution table for the
number of cars registered in each household

Here we present two ways by which a dot plot is constructed. First is by importing a .csv
data file from MS Excel, which is very useful especially if we have a large data set, and the
other way is by constructing the data vector in the RStudio environment. This is applicable if
we would be dealing with a small set of data. The following are the scripts. For the first
method, we use the “cars.csv” data from our directory.

# Install the ggplot2 package in RStudio (needed only once)

install.packages("ggplot2")

# Load necessary packages in RStudio

library(readr)
library(ggplot2)

# Method 1: Import the "cars.csv" data and assign it to "data"

data <- read.csv("cars.csv")

# Generate the dot plot

ggplot(data, aes(cars)) + geom_dotplot(binwidth = 0.5)

# Method 2: Create the vector given the data (for a small data set)
cars <- c(1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0)
ID <- 1:20   # Generates a sequence of integers from 1 to 20
data <- data.frame(ID, cars)
str(data)    # Shows the structure of the data frame with 2 variables
# 'data.frame': 20 obs. of 2 variables:
#  $ ID  : int 1 2 3 4 5 6 7 8 9 10 ...
#  $ cars: num 1 2 1 0 3 4 0 1 1 1 ...
ggplot(data, aes(cars)) + geom_dotplot(binwidth = 0.3)

Notice the difference in dot sizes with different binwidths. You can further explore RStudio
functionality by varying the values of “arguments” in the syntax.
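For comparison, base R can draw a stacked dot plot without ggplot2 by using stripchart() with method = "stack". The sketch below is an alternative illustration, not the module's script; the plotting choices (pch, offset, titles) and the name dot_heights are assumptions made for this example.

```r
# Data of Example 2: number of cars registered in each of 20 households
cars <- c(1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0)

# The height of each stack of dots equals the frequency of that value
dot_heights <- table(cars)
print(dot_heights)

# method = "stack" piles repeated values on top of one another
stripchart(cars, method = "stack", offset = 0.5, pch = 19,
           xlab = "Number of cars",
           main = "Dot Plot of Cars Registered per Household")
```

The tallest stack sits above the value with the highest frequency in Table 4.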

STEM-AND-LEAF PLOT

A stem-and-leaf plot is a graphical display for quantitative data that shows both the rank
order and shape of a data set. It is particularly useful when data are not too numerous.
Stem-and-leaf plots are a method for showing the frequency with which certain classes of
values occur.

Example 1:
The following illustration and steps are taken from the website:
https://study.com/academy/lesson/how-to-make-a-stem-and-leaf-plot.html
The process will be easiest to follow with sample data, so let's pretend that a sports
statistician wants to make a stem-and-leaf plot for a recent game played by the Blues
basketball team. The total minutes played by each team member has been recorded and
shown below:
Blues Member Name Minutes Played
Gifford 22
Slavky 29
Harrison 22
Samon 31
Mantry 20
Lewing 12
Wilson 14
Larriby 24
Paston 13
Lebling 4

Waster 2
Canno 1
Step 1: Determine the smallest and largest number in the data.
Looking at the stats, we see the number of minutes played ranges from a low of 1
minute to a high of 31 minutes.

Step 2: Identify the stems.

For any number, the digit/s to the left of the right-most digit is a stem. For example,
the number 31 has a stem of 3, while the number 29 has a stem of 2. A one-digit
number like 4 has a stem of 0. Think ''04'' for 4. Based on the range of 1 to 31, we
need stems of 0, 1, 2 and 3.

Step 3: Draw a vertical line and list the stem numbers to the left of the line.
0|
1|
2|
3|

Step 4: Fill in the leaves.

The first data value is for Gifford who played 22 minutes. The stem is on the left. The
leaf is on the right.
0|
1|
2|2
3|
Let's enter Lebling's 4 minutes. The stem is 0 and the leaf is 4.
0|4
1|
2|2
3|
Entering the rest of the data:
0|4 2 1
1|2 4 3
2|2 9 2 0 4
3|1

Step 5: Sort the leaf data.

The stem-and-leaf plot is easier to interpret when each row's leaves are sorted from
low to high.
0|1 2 4
1|2 3 4
2|0 2 2 4 9
3|1

The place value of the leaf is called the leaf unit. In the example above, the leaf unit is 1.
Other leaf units may be 100, 10, 0.1, and so on. If the leaf unit is not 1, it should be displayed
in the stem-and-leaf plot.
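The five manual steps above can be sketched in a few lines of base R for data with leaf unit 1. This is only an illustration of the procedure, with hypothetical object names (stems, leaves, rows); R's built-in stem() function chooses its own stems and layout.

```r
# Minutes played by the Blues team members
minutes <- c(22, 29, 22, 31, 20, 12, 14, 24, 13, 4, 2, 1)

stems  <- minutes %/% 10   # digit(s) to the left of the right-most digit
leaves <- minutes %% 10    # the right-most digit (leaf unit = 1)

# Group the leaves by stem, then sort each row from low to high
rows <- lapply(split(leaves, stems), sort)

# Print one line per stem in the form stem|leaves
for (s in names(rows)) {
  cat(s, "|", paste(rows[[s]], collapse = " "), "\n", sep = "")
}
```

Note that a stem with no data values would not appear in this simple version, whereas a hand-drawn plot would still list it with an empty row.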

R Script

For the same example, the stem and leaf plot can be generated in RStudio by using the
stem() function. The script is very short. Try this out in RStudio.

# Create data vector and stem and leaf plot

data <-c(22, 29, 22, 31, 20, 12, 14, 24, 13, 4, 2, 1)
stem(data)
The decimal point is 1 digit(s) to the right of the |

0 | 124
1 | 234
2 | 02249
3 | 1

Example 2:
The stem-and-leaf plot for the data set
8.6 11.7 9.4 9.1 10.2 11.0 8.8
with leaf unit 0.1 is given by

Leaf unit = 0.1
 8|6 8
 9|1 4
10|2
11|0 7

This means that in reading the data from the stem-and-leaf plot, the stems are the digits
to the left of the decimal point while the leaves are the digits in the tenths place (first
decimal place).

R Script

# Create data vector and stem and leaf plot

data<-c(8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8)
stem(data)
The decimal point is at the |

Example 3:
Let us now consider a data frame for this example. In MS Excel, open the data file
“inflation.csv”. The data shows the Inflation rate (in %) of countries in Asia and the Pacific.
Upon inspection of the variables, you would notice that there is only one quantitative
variable which is the inflation rate, labeled “Inflation”. We now create a stem-and-leaf
display for this variable.

R Script

# Load the readr package in RStudio

library(readr)

# Import "inflation.csv" data and assign it to "data"

data <-read.csv("inflation.csv")

# Inspect the data frame

head(data)
#     Regional.Member Year Inflation      Subregion Country.Code
# 1       Afghanistan 2013       7.4     South Asia          AFG
# 2           Armenia 2013       5.8   Central Asia          ARM
# 3        Azerbaijan 2013       2.4   Central Asia          AZE
# 4        Bangladesh 2013       6.8     South Asia          BGD
# 5            Bhutan 2013       8.8     South Asia          BTN
# 6 Brunei Darussalam 2013       0.4 Southeast Asia          BRN

# Select only the "Inflation" variable for the plot
# Assign filtered or isolated "Inflation" variable to "inf"
inf <- data$Inflation   # The "$" sign selects only the "Inflation" variable from the data

# Create the stem and leaf plot

stem(inf)

The decimal point is at the |

-0 | 552
0 | 4588349
2 | 0112444668990778
4 | 1239047889
6 | 446688934
8 | 894499
10 | 7

A histogram is a graphical portrayal of the frequency distribution of grouped data. It
divides the data set into class intervals and gives the frequency for each class. Histograms
are particularly useful for summarizing large sets of data.

The histogram corresponding to the frequency distribution table for the data on monthly
rent ($) for a sample of 70 one-bedroom apartments in Example 3 is shown below:

Rent (in $) Frequency

425-449 18
450-474 16
475-499 11
500-524 7
525-549 5
550-574 3
575-599 4
600-624 6
Total 70
Table 5. Frequency distribution table
for monthly rents of 70 one-bedroom
apartments

R Script

To plot the histogram for the same example, again we use the “rent.csv” file.

# Load the readr package in RStudio

library(readr)

# Import the "rent.csv" file and assign it to "data"

data <-read.csv("rent.csv")

# Create the histogram

hist(data$Rent, breaks=seq(425, 625, by=25), main="Histogram of Rents",
xlab="Monthly Rent", ylab="Frequency", col="gray", border="yellow",right =FALSE,
ylim =c(0,20) )
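To confirm that the bar heights of the histogram agree with Table 5, hist() can be called with plot = FALSE, which returns the class counts without drawing anything. The sketch below types in the 70 rents of Example 3 directly so that it runs without the rent.csv file; the object names rent and h are illustrative.

```r
# Monthly rents ($) of the 70 one-bedroom apartments from Example 3
rent <- c(425, 430, 430, 435, 435, 435, 435, 435, 440, 440, 440, 440, 440, 445, 445,
          445, 445, 445, 450, 450, 450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
          465, 470, 470, 472, 475, 475, 475, 480, 480, 480, 480, 485, 490, 490, 490,
          500, 500, 500, 500, 510, 510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
          575, 575, 580, 590, 600, 600, 600, 600, 615, 615)

# Same class limits as the histogram above; plot = FALSE only computes the bins
h <- hist(rent, breaks = seq(425, 625, by = 25), right = FALSE, plot = FALSE)

print(h$counts)   # number of rents in each class interval
print(h$mids)     # midpoint of each class interval
```

Each entry of h$counts is the height of one bar, in the same order as the rows of Table 5.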

Tabular and graphical displays for data obtained from two variables are helpful in
understanding the relationship between them, if any. In this section we will discuss
the crosstabulation or contingency table and the scatter diagram.

CROSSTABULATION

A crosstabulation or contingency table is a tabular summary of data for two variables. The
variables can both be qualitative or both quantitative, or can be a combination of one
qualitative and one quantitative variable. If either variable is quantitative, classes must be
created for the values of the quantitative variable. The labels shown in the margins of the
table define the categories (classes) for the two variables.
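When one of the variables is quantitative, its values can first be grouped into classes with cut() and then tabulated against the other variable with table(). The sketch below does this for only the first six rows of the salaries data, as printed by head(data) in the example that follows. The class width of 10 years and the upper limit of 50 are assumptions made for illustration.

```r
# First six rows of the salaries data (rank and years of service only)
rank <- c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf")
yrs.service <- c(18, 16, 3, 39, 41, 6)

# Create classes for the quantitative variable: [0,10), [10,20), ..., [40,50)
service.class <- cut(yrs.service, breaks = seq(0, 50, by = 10), right = FALSE)

# Crosstabulation of rank (rows) against years-of-service class (columns)
crosstab <- table(rank, service.class)

# addmargins() appends the row and column totals
print(addmargins(crosstab))
```

Each cell counts the observations that fall in that combination of rank and service class.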

Example:
For an example, we consider the “salaries.csv” file which contains data on professors of a
university, including rank, discipline being taught, years since PhD was obtained, years of
service in the university, sex, and annual salary ($). We construct a crosstabulation of the
rank and sex of the teachers. Using RStudio, we can generate the crosstabulation shown in
Table 6.

The following is the RStudio Script.

# Install the summarytools package in RStudio

install.packages("summarytools")

# Load the readr, summarytools, and pander package in RStudio

library(readr)
library(summarytools)
library(pander)

# Import "salaries.csv" and assign it to "data"

data <-read.csv("salaries.csv")
head(data)
#   X      rank discipline yrs.since.phd yrs.service  sex salary
# 1 1      Prof          B            19          18 Male 139750
# 2 2      Prof          B            20          16 Male 173200
# 3 3  AsstProf          B             4           3 Male  79750
# 4 4      Prof          B            45          39 Male 115000
# 5 5      Prof          B            40          41 Male 141500
# 6 6 AssocProf          B             6           6 Male  97000

# Crosstabulation of Rank and Sex

crosstab <-ctable(x=data$rank, y=data$sex, prop ="r")
pander(crosstab)

• cross_table:
Female Male Total
AssocProf 10 54 64
AsstProf 11 56 67
Prof 18 248 266
Total 39 358 397
Table 6. Crosstabulation of rank and sex of teachers.

• proportions:
Female Male Total
AssocProf 0.1562 0.8438 1
AsstProf 0.1642 0.8358 1
Prof 0.06767 0.9323 1
Total 0.09824 0.9018 1
Table 7. Proportions of crosstabulation of rank and sex of teachers

From the crosstabulation, we can see that the majority of the teachers hold the rank of
"Professor". Males outnumber females in every rank, and male professors make up the
largest group. This could not have been easily observed by just looking at the raw data.
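The proportions in Table 7 are row proportions: each count in Table 6 divided by its row total. As a check, the sketch below rebuilds them from the Table 6 counts using base R's prop.table() with margin = 1. The object names are illustrative, and the sketch does not use the summarytools package.

```r
# Counts from Table 6 (rank by sex), entered row by row
counts <- matrix(c(10,  54,
                   11,  56,
                   18, 248),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(rank = c("AssocProf", "AsstProf", "Prof"),
                                 sex  = c("Female", "Male")))

# margin = 1 divides every cell by its row total
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 4))

# Proportions for the "Total" row: column totals divided by the grand total
overall <- colSums(counts) / sum(counts)
print(round(overall, 4))
```

Because each row is divided by its own total, every row of proportions sums to 1, as in the Total column of Table 7.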

A scatter diagram or scatter plot is a graphical display of the relationship between two
quantitative variables. One variable (independent variable) is shown on the horizontal axis
and the other variable (dependent variable) is shown on the vertical axis. The general
pattern of the plotted points suggests the overall relationship between the variables. This
relationship will be discussed further in Module 11 (Correlation and Regression).

Example:
Consider the advertising/sales relationship for a stereo and sound equipment store. On 10
occasions during the past three months, the store used weekend television commercials to
promote sales at its stores. The managers want to investigate whether a relationship exists
between the number of commercials shown and the sales at the store during the following
week. Sample data for the 10 weeks with sales in hundreds of dollars are shown in the
table. The figure that follows is a scatter diagram for the data.

Week Number of Commercials Sales ($100s)

1 2 50
2 5 57
3 1 41
4 3 54
5 4 54
6 1 38
7 5 63
8 3 48
9 4 59
10 2 46

Here we present two scripts in generating the scatter plot for the same problem. The
example data is contained in the “advertising.csv” data file.

# Load the readr package in RStudio

library(readr)

# Import the "advertising.csv" file and assign it to "data"

data <-read.csv("advertising.csv")

# Plot the chart

plot(x=data$Number.of.Commercials, y=data$Sales...100s.,
xlab="Number of Commercials", ylab="Sales in Hundred Dollars",
main="Number of Commercials vs Sales",
xlim=c(0, 6), ylim=c(0,70))

# To include a trend line in the plot

abline(lm(data$Sales...100s.~data$Number.of.Commercials))

# Load the readr package in RStudio

library(readr)

# Import the "advertising.csv" file and assign it to "data"

data <-read.csv("advertising.csv")

# Assign variables to simple object names

comm <- data$Number.of.Commercials   # Isolates "Number of Commercials" and assigns it to "comm"
sales <- data$Sales...100s.          # Isolates "Sales...100s." and assigns it to "sales"

# Plot the chart
plot(x = comm, y = sales,
     xlab = "Number of Commercials", ylab = "Sales in Hundred Dollars",
     main = "Number of Commercials vs Sales",
     xlim = c(0, 6), ylim = c(0, 70))

# To include a trend line in the plot

abline(lm(sales ~ comm))
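The trend line drawn by abline() is the least-squares line fitted by lm(). To see its intercept and slope, the fitted model can be stored and inspected. The sketch below enters the ten observations from the table directly, so it runs without the advertising.csv file; the names commercials and fit are illustrative.

```r
# Data from the table: number of commercials and sales (in $100s) for 10 weeks
commercials <- c(2, 5, 1, 3, 4, 1, 5, 3, 4, 2)
sales <- c(50, 57, 41, 54, 54, 38, 63, 48, 59, 46)

# Fit the least-squares line: sales = intercept + slope * commercials
fit <- lm(sales ~ commercials)
print(coef(fit))

# Scatter diagram with the fitted trend line
plot(commercials, sales,
     xlab = "Number of Commercials", ylab = "Sales in Hundred Dollars",
     main = "Number of Commercials vs Sales",
     xlim = c(0, 6), ylim = c(0, 70))
abline(fit)
```

For these ten weeks the fitted line works out to sales = 36.15 + 4.95 × commercials, so each additional commercial is associated with about $495 more in weekly sales within the range of the sample.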

Learning Reinforcement Activity No. 5-1: DATA PRESENTATION

Accomplish by September 21, 2020

Use RStudio to construct the tabular and graphical displays required for each problem.
Submit a single .docx file containing the output of R for each problem and submit also the
saved RStudio script.

Please use the following convention for the filename: LRA5-1<LASTNAME>.docx [Example:
LRA5-1MIRANDA.docx] and for the R script, LRA5-1<LASTNAME>.R.

1. According to Kantar Media (March 13, 2020), the top four primetime television
shows in the Philippines were Ang Probinsyano (Prob), Make It With You (MIWY),
Prima Donnas (PD), and Descendants of the Sun Philippine Adaptation (DS). Data
indicating the preferred shows for a sample of 50 viewers follow. (15 points)

Prob PD MIWY MIWY Prob MIWY DS PD Prob Prob

MIWY Prob Prob MIWY PD Prob PD Prob MIWY PD
PD PD DS PD Prob DS Prob Prob PD Prob
Prob MIWY Prob DS PD Prob Prob PD Prob DS
DS Prob DS Prob DS MIWY MIWY Prob MIWY PD

a. Construct a frequency, relative frequency, and percent frequency distribution
for the sample.
b. Construct a bar chart and a pie chart for the sample.
c. On the basis of the sample, which television show has the largest viewing
audience? Which is second?

2. The data below shows the time in days required to complete year-end audits for a
sample of 20 clients of Sanderson and Clifford, a small public accounting firm.
Construct a dot plot for the sample. (5 points)

Year-end Audit Time (in days)

12 20 14 15 21 18 22 18 17 13
15 22 14 27 18 19 33 16 23 28

3. Use the file salaries.csv to construct a crosstabulation of the following pairs of
variables: (15 points)
a. Rank (row variable) vs. Discipline (column variable)
b. Rank (row variable) vs. Years of Service (column variable, grouped by 10s)
c. Rank (row variable) vs. Salary (column variable, grouped by $25000s)

Congratulations! You have just completed Module 5.

You are getting acquainted with the R software.
In the next module, we will start computing descriptive measures.

