Data visualization
Working with R graphics
Pie chart:
In R, a pie chart is created with the pie() function, which takes a vector of
non-negative numbers as input. Additional parameters control the labels,
colors, title, etc.
Syntax
pie(x, labels, radius, main, col, clockwise)
Parameter description:
x is a vector containing the numeric values used in the pie chart.
labels gives a description for each slice.
radius sets the radius of the pie. The pie is drawn centered in a square box
whose sides range from −1 to +1, so radius is normally at most 1.
main indicates the title of the chart.
col is a vector of colors used to fill the slices.
clockwise is a logical value indicating whether the slices are drawn clockwise
or counterclockwise.
# create a pie chart
x <- c(60,70,80,55)
labels <- c("product","sales","advertise","mark")
pie(x, labels)
# add a title with main
pie(x, labels, main="Departments")
# set the slice colors with col
colour <- c("pink","blue","yellow","white")
# init.angle=90 starts the first slice at 90 degrees
pie(x, labels, init.angle=90, main="Departments", col=colour)
Legend function:
To add a key that explains each slice, call legend() after drawing the pie.
# add a legend to the pie chart drawn above
legend("bottomright", c("product","sales","advertise","mark"), cex=0.7,
fill=colour)
Bar chart:
• A bar chart is a pictorial representation in which the numerical values of
variables are represented by the length or height of rectangles of equal
width. A bar chart is used for summarizing a set of categorical data.
• R uses the barplot() function to create bar charts. It can draw both
vertical and horizontal bars, and each bar can be given a different color.
Syntax
barplot(H,xlab,ylab,main, names.arg,col)
Following is the description of the parameters :
H is a vector or matrix containing numeric values used in bar chart.
xlab is the label for x axis.
ylab is the label for y axis.
main is the title of the bar chart.
names.arg is a vector of names appearing under each bar.
col is used to give colors to the bars in the graph.
# create a bar chart
a <- c(15,30,45,60)
barplot(a)
The vector a holds the bar heights shown on the y-axis (15, 30, 45, 60).
The vector b holds the category names shown on the x-axis (A, B, C, D).
We pass a to the barplot() function to draw the bars, and
names.arg sets the name printed under each bar.
a <- c(15,30,45,60) # bar heights (y-axis)
b <- c("A","B","C","D") # names under each bar (x-axis)
barplot(a, names.arg=b)
# bar chart with axis labels, title and colors
barplot(a, names.arg=b, xlab="letters", ylab="value",
col="blue", main="bar chart", border="orange")
# Create the input vectors.
colors <- c("pink","white","blue")
months <- c("Mar","Apr","May","Jun","Jul")
regions <- c("one","two","three")
# Create the matrix of values: one row per region, one column per month.
Values <- matrix(c(2,9,3,11,9,4,8,7,3,12,5,2,8,10,11), nrow=3, ncol=5, byrow=TRUE)
# Create the bar chart (a matrix input gives stacked bars by default)
barplot(Values, main="bar chart", names.arg=months,
xlab="month", ylab="value", col=colors)
# Add the legend to the chart
legend("topleft", regions, cex=0.3, fill=colors)
Histogram:
• A histogram is a type of bar chart that shows how many values fall into
each of a set of consecutive ranges. It is used to show the distribution of
a numeric variable,
• whereas a bar chart is used for comparing different entities. In a
histogram, the height of each bar represents the number of values present
in the given range.
• R creates histograms with the hist() function. This function takes a vector
as input and accepts further parameters that control the plot.
Syntax:
hist(v,main,xlab,xlim,ylim,breaks,col,border)
description of the parameters :
v is a vector containing numeric values used in histogram.
main indicates title of the chart.
col is used to set color of the bars.
border is used to set border color of each bar.
xlab is used to give description of x-axis.
xlim is used to specify the range of values on the x-axis.
ylim is used to specify the range of values on the y-axis.
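breaks controls how the values are divided into bins. A single number is
treated as a suggested number of bars, which in turn determines their width.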
# create a histogram
a <- c(45,24,23,15,18,30,44,18,16,20)
hist(a, xlab="value", ylab="order")
# histogram with a title and colors
hist(a, xlab="value", ylab="order", main="histogram", col="orange",
border="green")
Range of X and Y values
To specify the range of values shown on the x-axis and y-axis, use the
xlim and ylim parameters.
The number of bars, and therefore their width, can be adjusted with breaks.
# using the xlim, ylim and breaks parameters
# (xlim extends to 50 so that the largest value, 45, is not cut off)
hist(a, xlab="value", ylab="point", main="histogram", col="orange",
border="green", xlim=c(0,50), ylim=c(0,5), breaks=3)
Example of a histogram:
A histogram is an area diagram that shows the frequency of each class
interval. The class intervals, or ranges of values, are known as bins or
classes. Each bar indicates the number of data points within a specific
class, so the higher the frequency of a class, the taller its bar.
From the table below of tree heights in a region, we will draw a histogram
to illustrate how it is done. Let us look at the frequency table first.
Height of trees (ft)    No. of trees
60-65                   3
65-70                   3
70-75                   8
75-80                   10
80-85                   5
85-90                   2
Here the tree heights are continuous data, the class intervals are the bins,
and the number of trees in each interval is the frequency.
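As a rough sketch, the frequency table above can be drawn directly as a histogram. The example uses Python with Matplotlib (introduced later in these notes) rather than R. The bin edges and counts come from the table; the variable names and the helper draw_tree_histogram are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Class intervals (bins) and frequencies copied from the table above.
bin_edges = [60, 65, 70, 75, 80, 85, 90]
tree_counts = [3, 3, 8, 10, 5, 2]


def draw_tree_histogram(edges, counts):
    """Draw pre-binned data as a histogram: one bar per class interval."""
    widths = [right - left for left, right in zip(edges[:-1], edges[1:])]
    fig, ax = plt.subplots(figsize=(8, 4))
    # align="edge" starts each bar at the left edge of its bin, so adjacent
    # bars touch and together cover the continuous height axis.
    ax.bar(edges[:-1], counts, width=widths, align="edge",
           color="orange", edgecolor="green")
    ax.set_xlabel("Height of trees (ft)")
    ax.set_ylabel("No. of trees")
    ax.set_title("Histogram of tree heights")
    ax.set_xticks(edges)
    return fig, ax


fig, ax = draw_tree_histogram(bin_edges, tree_counts)
```

Calling fig.savefig("trees.png") writes the chart to a file. With an interactive backend (drop the matplotlib.use line), plt.show() opens a window instead. The tallest bar is the 75-80 ft class, with 10 trees.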
[Figure: histogram of the tree heights in the table above]
Histograms vs Bar Charts
In a bar graph, each bar represents one value or category. In a histogram,
on the other hand, each bar represents a range of continuous data.
In a bar graph, the x-axis need not be numerical; it can also be a category.
In a histogram, however, the x-axis is always quantitative, continuous data.
Because of this, a histogram can reveal a pattern in the data, such as a
tendency for values to fall toward the low end or the high end. The same
cannot be done with a bar chart.
Line chart:
A line chart is a graph that connects a series of points by drawing line
segments between them. The points are ordered by one of their coordinates
(usually the x-coordinate). Line charts are usually used to identify trends
in data.
The plot() function in R is used to create the line graph.
Syntax:
plot(v,type,col,xlab,ylab,main)
Description of the parameters
v is a vector containing the numeric values.
type takes the value "p" to draw only the points, "l" to draw only the lines
and "o" to draw both points and lines.
xlab is the label for x axis.
ylab is the label for y axis.
main is the Title of the chart.
col is used to give colors to both the points and lines.
Create line chart:
# create a line chart with both points and lines
x <- c(12,14,16,18,15,29,23,34)
plot(x, type="o")
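Different types of line charts:
x <- c(12,14,16,18,15,29,23,34)
plot(x, type="p") # points only
plot(x, type="l") # lines only
plot(x, type="o") # points and lines, overplotted
plot(x, type="b") # points joined by lines, with gaps around the points
plot(x, type="c") # the line segments of "b" without the points
plot(x, type="h") # vertical lines, histogram-like
plot(x, type="s") # stair steps
plot(x, type="S") # stair steps, stepping the other way
plot(x, type="n") # sets up the axes but draws no data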
# line chart with axis labels, title and color
# (border is not a plot() parameter for lines, so it is left out)
plot(x, type="o", xlab="points", ylab="value",
col="pink", main="Line chart")
Scatter plot
A scatter plot is a type of plot used to display the relationship between two
numerical variables. It plots one dot for each observation.
Each point represents the values of two variables: one variable is placed
on the horizontal axis and the other on the vertical axis.
The simple scatter plot is created using the plot() function.
Syntax
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Description of the parameters :
x is the data set whose values are the horizontal coordinates.
y is the data set whose values are the vertical coordinates.
main is the title of the graph.
xlab is the label in the horizontal axis.
ylab is the label in the vertical axis.
xlim is the limits of the values of x used for plotting.
ylim is the limits of the values of y used for plotting.
axes indicates whether both axes should be drawn on the plot.
Loading a dataset from a CSV file:
library(data.table) # provides fread()
data <- fread("C://Users/Admin/Downloads/archive/cardio.csv")
data
file <- data[, c("Age","Usage")]
head(file)
plot(x=file$Age, y=file$Usage, main="Scatter plot", xlab="Age",
ylab="Usage", col="pink")
(or)
# creating a dataset for the scatter plot:
data <- data.frame(weight=c(3,5,4,2,2,5),
mileage=c(15,30,45,60,75,80))
data
Output:
  weight mileage
1      3      15
2      5      30
3      4      45
4      2      60
5      2      75
6      5      80
# plotting the dataset: the first column (weight) goes on the x-axis,
# the second column (mileage) on the y-axis
plot(data, xlab="weight", ylab="mileage", main="scatterplot", col="red")
The different point symbols (pch values) commonly used in R:
pch = 0, square
pch = 1, circle
pch = 2, triangle point up
pch = 3, plus
pch = 4, cross
pch = 5, diamond
pch = 6, triangle point down
pch = 7, square cross
pch = 8, star
pch = 9, diamond plus
pch = 10, circle plus
pch = 11, triangles up and down
pch = 12, square plus
pch = 13, circle cross
pch = 14, square and triangle down
pch = 15, filled square
pch = 16, filled circle
pch = 17, filled triangle point up
pch = 18, filled diamond
pch = 19, solid circle
pch = 20, bullet (smaller circle)
pch = 21, filled circle (fill color set with bg)
pch = 22, filled square (fill color set with bg)
pch = 23, filled diamond (fill color set with bg)
pch = 24, filled triangle point up (fill color set with bg)
pch = 25, filled triangle point down (fill color set with bg)
# pch=2: triangles
plot(data, xlab="weight", ylab="mileage", main="scatterplot",
col="red", pch=2)
# pch=18: filled diamonds
plot(data, xlab="weight", ylab="mileage", main="scatterplot",
col="red", pch=18)
# apply limits to the x and y axes:
plot(data, xlab="weight", ylab="mileage", main="scatterplot",
col="red", xlim=c(3,5), ylim=c(30,60))
Scatterplot Matrices
When we have more than two variables and want to see how each variable
correlates with the remaining ones, we use a scatterplot matrix.
We use the pairs() function to create matrices of scatterplots.
Syntax
pairs(formula, data)
formula represents the series of variables used in pairs.
data represents the data set from which the variables will be taken.
input <- data.frame(weight=c(3,5,4,2,2,5),
mileage=c(15,30,45,60,75,80),
cyl=c(12,14,8,23,45,60),
km=c(20,30,45,35,40,48))
input
Output:
  weight mileage cyl km
1      3      15  12 20
2      5      30  14 30
3      4      45   8 45
4      2      60  23 35
5      2      75  45 40
6      5      80  60 48
# scatterplot matrix of all four variables
pairs(~weight+mileage+cyl+km, data=input)
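A scatterplot matrix shows the relationships visually; the same pairwise relationships can be summarized numerically with the Pearson correlation coefficient. The sketch below is written in Python (used later in these notes) with only the standard library. It reuses the values of the data frame above, and the helper names pearson and correlations_with are illustrative assumptions, not part of any library.

```python
from math import sqrt

# Same values as the data frame above, one list per variable.
columns = {
    "weight": [3, 5, 4, 2, 2, 5],
    "mileage": [15, 30, 45, 60, 75, 80],
    "cyl": [12, 14, 8, 23, 45, 60],
    "km": [20, 30, 45, 35, 40, 48],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlations_with(target, data):
    """Correlate one variable against each of the remaining ones."""
    return {name: pearson(data[target], values)
            for name, values in data.items() if name != target}


weight_correlations = correlations_with("weight", columns)
for name, r in weight_correlations.items():
    print(f"weight vs {name}: r = {r:+.2f}")
```

Values near +1 or -1 correspond to panels where the points lie close to a straight line; values near 0 correspond to panels with no clear linear pattern.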
# line graph using the data set
# (rows are sorted by cyl so the line runs from left to right)
ord <- order(input$cyl)
plot(input$cyl[ord], input$km[ord], type="l", xlab="cycle", ylab="kilometer",
main="Graph", col="blue")
# bar chart using the data set
x <- input$cyl
y <- input$km
barplot(x, names.arg=y, xlab="first", ylab="second",
col="red", border="green", main="barplot")
ggplot2 package:
R also lets us create graphics declaratively with the ggplot2 package, an
add-on package known for its elegant, high-quality graphs.
Always start by calling the ggplot() function.
Then specify the data object, which has to be a data frame. For a bar chart
it needs one numeric and one categorical variable.
Then set the aesthetics in the aes() function: use the categorical variable
for the x-axis and the numeric variable for the y-axis.
Installation:
install.packages("<package-name>")
install.packages("ggplot2")
library("ggplot2")
qplot() is a quick-plot shortcut in ggplot2. It is deprecated in recent
ggplot2 releases, so the examples here call ggplot() directly:
# bar graph using ggplot2 (one bar per km value, bar height = cyl)
ggplot(input, aes(x=factor(km), y=cyl)) +
  geom_col(fill="green") +
  labs(x="vehicle", y="distance", title="ggplot graph")
# histogram using ggplot2
ggplot(input, aes(x=mileage)) +
  geom_histogram(bins=5, fill="red") +
  labs(x="mileage", y="count", title="ggplot graph")
The Jupyter Notebook is an open-source web application that allows you
to create and share documents that contain live code, equations,
visualizations and narrative text.
Uses include data cleaning and transformation, numerical simulation,
statistical modeling, data visualization, machine learning, and much more.
Matplotlib is one of the most popular Python packages used for data
visualization. It is a cross-platform library for making 2D plots from data in
arrays.
Matplotlib
Matplotlib is a low-level Python library for data visualization. It is easy
to use and produces MATLAB-like graphs.
The library is built on top of NumPy arrays and offers several plot types,
such as line charts, bar charts and histograms. It provides a lot of
flexibility, but at the cost of writing more code.
We will use the pip command to install this module.
You can start plotting with the plot() function. When you're done, remember
to display the plot with the show() function. Matplotlib is written in
Python and makes use of NumPy, the numerical mathematics extension of
Python.
It consists of several plots :
Line
Bar
Scatter
Histogram
And many more
Installation
Matplotlib can be installed using the Python package manager, pip. Open a
terminal window and type:
pip install matplotlib
# importing matplotlib module
from matplotlib import pyplot as plt
Pyplot
Pyplot is a Matplotlib module that provides a MATLAB-like interface.
Matplotlib is designed to be as usable as MATLAB.
Each pyplot function makes some change to a figure: for example, it creates
a figure, creates a plotting area in a figure, plots some lines in a plotting
area, or decorates the plot with labels.
Creating a plot using matplotlib:
import matplotlib.pyplot as plt
x=[12,15,18,20,23]
y=[21,24,26,28,30]
plt.plot(x,y)
plt.show()
Adding Title
The title() function in the pyplot module is used to specify the title of the
visualization.
import matplotlib.pyplot as plt
x=[12,15,18,20,23]
y=[21,24,26,28,30]
plt.plot(x,y)
plt.title("graph")
plt.show()
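Adding font size, color and labels:
import matplotlib.pyplot as plt
x=[12,15,18,20,23]
y=[21,24,26,28,30]
plt.plot(x,y)
# title with a custom font size and color
plt.title("graph", fontsize=50, color="red")
# axis labels
plt.xlabel("x axis")
plt.ylabel("y axis")
# restrict the range of the y-axis
plt.ylim(24,28)
# replace the x-axis tick values with names
plt.xticks(x,labels=["a","b","c","d","e"])
# legend
plt.legend(["ABC"])
# grid lines for each axis
plt.grid(axis='x')
plt.grid(axis='y')
plt.show()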
Creating a bar plot
Matplotlib provides the bar() function, which can be used in MATLAB style
through pyplot or as a method of an Axes object in the object-oriented API.
The pyplot form of the call is:
plt.bar(x, height, width, bottom, align)
import matplotlib.pyplot as plt
import numpy as np
x=np.array(["DT","DV","CLOUD","PYTHON"])
y=np.array([12,14,16,18])
fig=plt.figure(figsize=(8,4))
plt.bar(x,y,width=0.5,color="pink")
plt.xlabel("Subject")
plt.ylabel("Duration")
plt.title("bar chart")
plt.show()
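For comparison, here is a minimal sketch of the same bar chart written with the object-oriented API mentioned above, where the plotting calls are methods of an Axes object rather than pyplot functions. The helper name make_bar_chart is an illustrative assumption; the data are the same subjects and durations as in the pyplot version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

subjects = ["DT", "DV", "CLOUD", "PYTHON"]
durations = [12, 14, 16, 18]


def make_bar_chart(names, heights):
    """Build the bar chart on an explicit Figure/Axes pair."""
    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.bar(names, heights, width=0.5, color="pink")
    # Axes methods use a set_ prefix where pyplot uses plain function names.
    ax.set_xlabel("Subject")
    ax.set_ylabel("Duration")
    ax.set_title("bar chart")
    return fig, ax, bars


fig, ax, bars = make_bar_chart(subjects, durations)
```

Holding explicit fig and ax objects makes it easier to work with several charts at once, since each call names the axes it changes instead of relying on the "current" figure.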
Histogram:
To create a histogram, first divide the whole range of values into a series
of intervals (bins), then count the values that fall into each interval.
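The binning step described above can be sketched in plain Python, without any plotting library. The function name bin_counts and the chosen range of 10 to 50 are assumptions made for illustration; the sample values are the ones used in the R histogram example earlier in these notes. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
def bin_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v < low or v > high:
            continue  # ignore values outside the overall range
        # Values equal to `high` belong to the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts


values = [45, 24, 23, 15, 18, 30, 44, 18, 16, 20]
edges, counts = bin_counts(values, low=10, high=50, n_bins=4)

# Print a small text histogram, one row per bin.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {'#' * count} ({count})")
```

For these ten values the four bins 10-20, 20-30, 30-40 and 40-50 receive 4, 3, 1 and 2 values respectively; a histogram draws one bar of that height over each interval.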
The matplotlib.pyplot.hist() function accepts the following parameters for
customizing the histogram:
bins: Number of equal-width bins
color: For changing the face color
edgecolor: Color of the edges
linestyle: Line style of the bar edges
alpha: blending value, between 0 (transparent) and 1 (opaque)
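A small self-contained sketch of these options, assuming made-up sample values rather than the CSV file used in the example that follows. The range argument, which fixes the overall interval that is split into bins, is an extra hist() parameter not listed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative sample values (not taken from the CSV file).
values = [45, 24, 23, 15, 18, 30, 44, 18, 16, 20]

fig, ax = plt.subplots(figsize=(10, 5))
counts, edges, patches = ax.hist(
    values,
    bins=4,             # four equal-width bins
    range=(10, 50),     # overall interval that is split into the bins
    color="orange",     # face color of the bars
    edgecolor="green",  # color of the bar edges
    linestyle="--",     # dashed edge lines
    alpha=0.7,          # slightly transparent bars
)
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.set_title("Histogram")
```

hist() returns the count per bin, the bin edges and the drawn bar patches, so the numbers behind the picture can be inspected directly; here the four bins hold 4, 3, 1 and 2 values.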
Example:
# load the csv file
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("C://Users/Admin/Downloads/archive/cardio.csv")
data
# create a histogram of the Age column using matplotlib
x=data["Age"]
fig=plt.figure(figsize=(10,5))
plt.hist(x,bins=10,edgecolor="black")
plt.xlabel("Age")
plt.ylabel("frequency")
plt.title("Histogram")
plt.show()