Ggplot2 Geom - Violin

A violin plot displays the distribution of a numeric variable for different groups or categories. It can be created in R using ggplot2's geom_violin() function. The document provides examples of basic violin plots, flipping the axes to create a horizontal violin plot, adding boxplots, and creating grouped violin plots to show subgroups within each category.

Uploaded by Luis Emilio

Basic violin plot

Building a violin plot with ggplot2 is pretty straightforward thanks to the dedicated geom_violin() function.


# Library
library(ggplot2)

# Create a dataset (group B pools two samples, so its distribution is bimodal)
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1), rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# Most basic violin chart
# fill=name automatically assigns a color to each group
p <- ggplot(data, aes(x = name, y = value, fill = name)) + geom_violin()

# Display the chart
p
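
Because the groups in this dataset have very different sizes (group C has only 20 observations), it can help to let the violin area reflect the number of observations. The following is a minimal sketch, assuming the scale argument of geom_violin(); the fixed seed and the name p_count are illustrative choices, not part of the original example.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Same structure as the dataset above: group B pools two samples
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1),
            rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# scale = "count" makes violin areas proportional to the number of observations
p_count <- ggplot(data, aes(x = name, y = value, fill = name)) +
  geom_violin(scale = "count")

p_count
```

With the default scale = "area", every violin has the same area regardless of group size, so a group with 20 observations looks as substantial as one with 1000.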

From wide format


When each group is stored in its own column (wide format), the input must first be reshaped to long format. This is possible thanks to the gather() function of the tidyr library, which is part of the tidyverse.
# Let's use the iris dataset as an example:
data_wide <- iris[ , 1:4]

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
         5.1          3.5           1.4          0.2
         4.9          3.0           1.4          0.2
         4.7          3.2           1.3          0.2
         4.6          3.1           1.5          0.2

library(tidyr)
library(ggplot2)
library(dplyr)

# Reshape to long format (one row per measurement), then plot
data_wide %>%
  gather(key = "MesureType", value = "Val") %>%
  ggplot(aes(x = MesureType, y = Val, fill = MesureType)) +
    geom_violin()
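
In tidyr 1.0.0 and later, gather() still works but is superseded by pivot_longer(). As a hedged alternative, the same reshaping might be written as follows; the names data_long and p_long are introduced here for illustration only.

```r
library(tidyr)
library(ggplot2)

data_wide <- iris[, 1:4]

# pivot_longer() stacks the four measurement columns into name/value pairs
data_long <- pivot_longer(
  data_wide,
  cols = everything(),
  names_to = "MesureType",
  values_to = "Val"
)

# Same chart as with gather(): one violin per measurement type
p_long <- ggplot(data_long, aes(x = MesureType, y = Val, fill = MesureType)) +
  geom_violin()

p_long
```

The result has one row per original cell (150 rows times 4 columns), which is the layout geom_violin() expects.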

Horizontal violin plot

Here, calling coord_flip() swaps the X and Y axes and thus gives a horizontal version of the chart. Note also the use of theme_ipsum() from the hrbrthemes library, which improves the general appearance.

# Libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
library(hrbrthemes)
library(viridis)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv", header=TRUE, sep=",")

# Data is in wide format, we need to make it 'tidy' or 'long'
data <- data %>%
  gather(key="text", value="value") %>%
  mutate(text = gsub("\\.", " ", text)) %>%
  mutate(value = round(as.numeric(value), 0)) %>%
  filter(text %in% c("Almost Certainly", "Very Good Chance", "We Believe", "Likely", "About Even", "Little Chance", "Chances Are Slight", "Almost No Chance"))

# Plot
p <- data %>%
  mutate(text = fct_reorder(text, value)) %>% # Reorder groups by their median value
  ggplot(aes(x=text, y=value, fill=text, color=text)) +
    # In ggplot2 >= 3.4.0, linewidth replaces size for line thickness
    geom_violin(width=2.1, size=0.2) +
    scale_fill_viridis(discrete=TRUE) +
    scale_color_viridis(discrete=TRUE) +
    theme_ipsum() +
    theme(
      legend.position="none"
    ) +
    coord_flip() + # Swaps the X and Y axes to get the horizontal version
    xlab("") +
    ylab("Assigned Probability (%)")

# Display the chart
p
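
As an alternative to coord_flip(), recent versions of ggplot2 (3.3.0 and later) can usually infer the orientation when the categorical variable is mapped to y. The sketch below assumes such a version and uses a small made-up data frame (toy, with invented values) rather than the probly.csv data.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed for reproducibility

# Hypothetical data: three phrases with invented probability estimates
toy <- data.frame(
  text = rep(c("Likely", "About Even", "Little Chance"), each = 50),
  value = c(rnorm(50, 70, 10), rnorm(50, 50, 5), rnorm(50, 10, 5))
)

# Categorical variable on y, numeric variable on x: violins are drawn horizontally
p_h <- ggplot(toy, aes(x = value, y = text, fill = text)) +
  geom_violin() +
  theme(legend.position = "none") +
  xlab("Assigned Probability (%)") +
  ylab("")

p_h
```

With this approach the axis labels are set in their final position, so xlab() refers to the horizontal axis, whereas with coord_flip() the labels are swapped along with the axes.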

Violin plot with boxplot

Adding geom_boxplot() with a small width on top of geom_violin() displays a boxplot inside each violin, which provides summary statistics.
Note also a small trick that shows the sample size of each group on the X axis: a new column called myaxis is created and then used for the X axis.

# Libraries
library(ggplot2)
library(dplyr)
library(hrbrthemes)
library(viridis)

# Create a dataset (group B pools two samples, so its distribution is bimodal)
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1), rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# Sample size of each group
sample_size <- data %>% group_by(name) %>% summarize(num = n())

# Plot
data %>%
  left_join(sample_size, by = "name") %>%
  mutate(myaxis = paste0(name, "\n", "n=", num)) %>% # Axis label: group name plus sample size
  ggplot(aes(x=myaxis, y=value, fill=name)) +
    geom_violin(width=1.4) +
    geom_boxplot(width=0.1, color="grey", alpha=0.2) +
    scale_fill_viridis(discrete = TRUE) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("A Violin wrapping a boxplot") +
    xlab("")
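
To see what the myaxis label looks like in isolation, here is a small sketch on a made-up five-row data frame (toy is hypothetical). It uses dplyr's add_count() as a shortcut for the group_by(), summarize() and left_join() steps; this is one possible variant, not the method used in the example above.

```r
library(dplyr)

# Hypothetical data: group A has 3 rows, group B has 2
toy <- data.frame(
  name = c("A", "A", "A", "B", "B"),
  value = c(1.2, 0.7, 1.9, 3.1, 2.8)
)

# add_count() appends the per-group row count as a new column called num
labelled <- toy %>%
  add_count(name, name = "num") %>%
  mutate(myaxis = paste0(name, "\n", "n=", num))

unique(labelled$myaxis)
# [1] "A\nn=3" "B\nn=2"
```

The "\n" puts the sample size on a second line under the group name when the label is drawn on the axis.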

Grouped violin plot

A grouped violin plot displays the distribution of a numeric variable for groups and subgroups. Here, groups are days of the week, and subgroups are Males and Females. ggplot2 supports this kind of representation through the position="dodge" option of the geom_violin() function. Groups must be mapped to x, and subgroups must be mapped to fill.

# Libraries
library(ggplot2)
library(dplyr)
library(forcats)
library(hrbrthemes)
library(viridis)

# Load dataset from github and express the tip as a percentage of the total bill
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/10_OneNumSevCatSubgroupsSevObs.csv", header=TRUE, sep=",") %>%
  mutate(tip = round(tip/total_bill*100, 1))

# Grouped
data %>%
  mutate(day = fct_reorder(day, tip)) %>%
  # The explicit levels below set the final order of the days on the X axis
  mutate(day = factor(day, levels=c("Thur", "Fri", "Sat", "Sun"))) %>%
  ggplot(aes(fill=sex, y=tip, x=day)) +
    # outlier.colour is a geom_boxplot() argument and is not used by geom_violin()
    geom_violin(position="dodge", alpha=0.5) +
    scale_fill_viridis(discrete=TRUE, name="") +
    theme_ipsum() +
    xlab("") +
    ylab("Tip (%)") +
    ylim(0,40)
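
The grouping logic can also be tried without downloading anything. The sketch below uses a made-up data frame (toy, with invented bills and tips, not the dataset above) and shows the tip percentage computation together with position_dodge(), which allows the spacing between subgroups to be set explicitly; the width of 0.9 is an arbitrary choice.

```r
library(ggplot2)

set.seed(7)  # assumption: fixed seed for reproducibility

# Hypothetical data: 2 days x 2 sexes, 30 bills each
toy <- data.frame(
  day = factor(rep(c("Sat", "Sun"), each = 60), levels = c("Sat", "Sun")),
  sex = rep(rep(c("Female", "Male"), each = 30), times = 2),
  total_bill = runif(120, 10, 50)
)
toy$tip_amount <- toy$total_bill * runif(120, 0.10, 0.25)

# Tip as a percentage of the bill, rounded to one decimal
toy$tip <- round(toy$tip_amount / toy$total_bill * 100, 1)

# Groups on x, subgroups on fill, violins placed side by side
p_grouped <- ggplot(toy, aes(x = day, y = tip, fill = sex)) +
  geom_violin(position = position_dodge(width = 0.9), alpha = 0.5) +
  xlab("") +
  ylab("Tip (%)")

p_grouped
```

Each combination of day and sex gets its own violin, so this toy chart shows four violins arranged in two pairs.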
