TEB2043 Introduction To Data Science: Descriptive Analytics & Visualization DR Shuhaida Mohamed Shuhidan JAN 2025
Credit: Ts Dr Nurul Aida Osman
Learning Outcomes

At the end of this session, you will be able to:

• Explain descriptive analytics
• Implement R code for descriptive statistics
• Implement R code for visualization

I. Descriptive Analytics
Descriptive Analytics

• The examination of data or content, usually manually performed, to answer the question “What
happened?” (or What is happening?), characterized by traditional business intelligence (BI) and
visualizations such as pie charts, bar charts, line graphs, tables, or generated narratives. (gartner.com)

• The interpretation of historical data to better understand changes that have occurred in a business.
Descriptive analytics describes the use of a range of historic data to draw comparisons. Most commonly
reported financial metrics are a product of descriptive analytics, for example, year-over-year pricing
changes, month-over-month sales growth, the number of users, or the total revenue per subscriber. These
measures all describe what has occurred in a business during a set period. (investopedia.com)

Descriptive Statistics

• Data are described by summary measures, e.g., measures of central tendency and measures of variability
(dispersion).
• Central tendency - represents the whole set of data by a single value that indicates where the middle of
the data lies.
o Mean
o Mode
o Median
• Variability/dispersion - describes the spread of the data, i.e., how widely the values are distributed.
o Range
o Variance
o Standard deviation

Descriptive Statistics

• Load the iris dataset, which is built into R

data <- iris
• The head function displays the first six rows of the data
head(data)

• The tail function displays the last six rows
tail(data)
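Both functions also accept an n argument that changes how many rows are shown. A minimal sketch on the same iris data; dim() is an extra base R function used here to show the row and column counts together:

```r
data <- iris

# First 3 rows instead of the default 6
head(data, n = 3)

# Last 3 rows
tail(data, n = 3)

# Number of rows and columns in one call
dim(data)  # 150 5
```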

Descriptive Statistics AL2 Think Pair Share
(Quiz #2 - Part of 2%)

• The str function is used to describe the structure of the data

str(data)

Quiz #2
Define a factor and discuss the meaning of “Factor w/ 3 levels” in the output of str(data).

Descriptive Statistics

• The minimum and maximum values from the data can be determined using min and max functions
respectively.
min(data$Sepal.Length) #this produces 4.3
max(data$Sepal.Length) #this produces 7.9
• Alternatively, minimum and maximum values can be determined by range function.
range(data$Sepal.Length)
• The above code returns the vector 4.3 7.9; either value can be extracted by its index.
range(data$Sepal.Length)[1] #this produces 4.3
range(data$Sepal.Length)[2] #this produces 7.9
• Equivalently, the result can first be stored in a variable:
range_val <- range(data$Sepal.Length)
range_val[1]
range_val[2]
• To obtain the range as a single value (the maximum minus the minimum), combine the max and min functions:
the_range <- max(data$Sepal.Length)-min(data$Sepal.Length)
the_range #this produces 3.6
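Because range() returns c(min, max), the base R function diff() gives the same single range value in one step. A short sketch, offered as an alternative rather than the slide's method:

```r
data <- iris

# range() returns c(min, max); diff() subtracts the first element from the second
the_range <- diff(range(data$Sepal.Length))
the_range  # 3.6
```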
Descriptive Statistics AL3 In-Class Teams
(Quiz #2 - Part of 2%)

• The mean and median of data can be determined with the mean and median functions.
mean(data$Sepal.Length) #this produces 5.843333
median(data$Sepal.Length) #this produces 5.8

Quiz #2:
How about the mode? Investigate how the mode can be calculated in R.

Descriptive Statistics

• Standard deviation and variance can be determined with the sd and var functions respectively.
sd(data$Sepal.Length) #this produces 0.8280661
var(data$Sepal.Length) #this produces 0.6856935

• Variance (σ²) is the average of the squared differences from the mean.

• Standard deviation (σ) is the square root of the Variance.

[Figure: Normal distribution (image source: mathisfun.com)]
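Note that R's var() and sd() compute the sample variance and standard deviation: the sum of squared differences from the mean is divided by n - 1 rather than n, so the result is slightly larger than the plain average. A minimal check by hand on Sepal.Length, reproducing the values given earlier in this section:

```r
x <- iris$Sepal.Length
n <- length(x)

# Sum of squared differences from the mean
ss <- sum((x - mean(x))^2)

sample_var <- ss / (n - 1)  # divisor n - 1, as used by var()
pop_var    <- ss / n        # plain average of the squared differences

sample_var        # 0.6856935
sqrt(sample_var)  # 0.8280661, same as sd(x)
pop_var
```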

Descriptive Statistics
• We have seen how the summary function is used to generate useful descriptive statistics:

summary(data)
summary(data$Sepal.Length)

• The by function extends this by applying summary separately to each group, here each level of Species:

by(data, data$Species, summary)
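When only one statistic per group is needed, base R's tapply() is a compact alternative to by() (it is not used in the original slides). A minimal sketch computing the mean sepal length of each species:

```r
data <- iris

# Mean Sepal.Length for each level of the Species factor
group_means <- tapply(data$Sepal.Length, data$Species, mean)
group_means  # setosa 5.006, versicolor 5.936, virginica 6.588
```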

Descriptive Statistics

• Data can be divided into quartiles.

• First quartile (lower quartile) → the value that cuts off the first 25% of the data when it is sorted in
ascending order.
• Third quartile (upper quartile) → the value that cuts off the first 75% of the data when it is sorted in ascending order.
• Assume that a vector A contains 9 values:
A<-c(170.2, 181.5, 188.9, 163.9, 166.4, 163.7, 160.4, 175.8, 181.5)
• Using quantile function:
quantile(A)

• Let’s check by hand, with n = 9:
sort(A)
o 1st quartile: 0.25 × 9 = 2.25, rounded up to 3; the 3rd sorted value is 163.9
o 3rd quartile: 0.75 × 9 = 6.75, rounded up to 7; the 7th sorted value is 181.5
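The hand check can be reproduced in R. The round-up rule (position = ceiling(n × p)) happens to land on the same sorted values as R's default quantile() method for this vector; in general the two rules can give different results, which is worth keeping in mind for the exercise on the next slide. A minimal sketch:

```r
A <- c(170.2, 181.5, 188.9, 163.9, 166.4, 163.7, 160.4, 175.8, 181.5)
s <- sort(A)
n <- length(A)

# Round-up rule from the slide: position = ceiling(n * p)
q1_pos <- ceiling(0.25 * n)  # 3
q3_pos <- ceiling(0.75 * n)  # 7
s[q1_pos]  # 163.9
s[q3_pos]  # 181.5

# Compare with R's default quantile() result
quantile(A, c(0.25, 0.75))
```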
Descriptive Statistics

• The quantile function can be used for a specific quartile:

quantile(A,0.25)
quantile(A,0.75)
• Other quantiles (for example, the 40th and 80th percentiles) can also be determined using the quantile function:
quantile(A,0.4)
quantile(A,0.8)
* Exercise: find out how R determines the results for the above code.
• There is a specific function known as IQR that calculates the interquartile range (i.e., the difference between
the 3rd and the 1st quartiles):
IQR(A)
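As a quick check of the definition, IQR() should equal the 3rd quartile minus the 1st quartile computed with quantile(). A minimal sketch on the same vector A:

```r
A <- c(170.2, 181.5, 188.9, 163.9, 166.4, 163.7, 160.4, 175.8, 181.5)

# IQR() is the 3rd quartile minus the 1st quartile
iqr_direct <- IQR(A)
iqr_manual <- unname(quantile(A, 0.75) - quantile(A, 0.25))

iqr_direct  # 17.6, i.e. 181.5 - 163.9
```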

Descriptive Statistics

• Other commonly used descriptive statistics:

o Counting the number of rows
nrow(data)
nrow(data['Sepal.Length'])
o Counting the number of columns
ncol(data)
o Counting the number of NA (missing) values
sum(is.na(data$Sepal.Length))
o Counting the number of negative values
sum(data$Sepal.Length<0)
o Listing and counting the distinct text-based (non-numeric) values
B<-c(rep("Yellow",2),rep("Red",3),rep("Yellow",3),rep("Black",3))
factor(B)          # prints the values and their levels: Black Red Yellow
nlevels(factor(B)) # number of distinct values: 3
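The base R function table() goes one step further and counts how often each distinct value occurs. A sketch using the same vector B (table() and unique() are additions here, not part of the original slide):

```r
B <- c(rep("Yellow", 2), rep("Red", 3), rep("Yellow", 3), rep("Black", 3))

# Frequency of each distinct value, listed alphabetically
counts <- table(B)
counts  # Black 3, Red 3, Yellow 5

# Number of distinct values without creating a factor
length(unique(B))  # 3
```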
II. Visualization
Scatter Plot

• Besides iris, R provides a number of other built-in datasets; data() lists them:
data()
• Load the mtcars data:
data <- mtcars
• View the mtcars data:
data
• To draw a scatter plot, define the x and y values and pass them to the plot function:
x<- -10:10
y<-x*x
plot(x,y,xlab='x',ylab='y',col='red')

Scatter Plot

• Let’s plot mpg (y) vs wt (x)

x<-data$wt
y<-data$mpg
plot(x,y,xlab='wt',ylab='mpg',col='green')

Histogram

hist(data$mpg,col="green")
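A histogram splits the values into bins and counts how many fall into each. With plot = FALSE, hist() returns those bins instead of drawing them, which helps when choosing the breaks argument. A minimal sketch on mtcars:

```r
data <- mtcars

# Compute the bins without drawing anything
h <- hist(data$mpg, plot = FALSE)
h$breaks  # bin edges chosen by R (Sturges' rule by default)
h$counts  # number of cars in each bin

# Ask for roughly 10 bins; R treats this number only as a suggestion
hist(data$mpg, breaks = 10, col = "green", main = "Histogram of mpg", xlab = "mpg")
```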

Bar Chart
val=data$mpg
carnames=row.names(data)
barplot(val,ylab='mpg',main="Car - MPG",names.arg= carnames,
cex.names=0.6,las=2,col="blue")
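Bars are easier to compare when they are sorted. A minimal variation (not from the original slides) that orders the cars by mpg with order() before plotting:

```r
data <- mtcars
val <- data$mpg
carnames <- row.names(data)

# Indices that sort mpg from highest to lowest
o <- order(val, decreasing = TRUE)

barplot(val[o], ylab = "mpg", main = "Car - MPG (sorted)",
        names.arg = carnames[o], cex.names = 0.6, las = 2, col = "blue")
```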

More on Scatter Plot
my<-read.csv("C:/Users/nurulaida/OneDrive - Universiti Teknologi PETRONAS/R/IDS/covid_my.csv") # adjust to where covid_my.csv is saved
my

x<-seq_len(nrow(my)) # one x position per row (state)
y<-my$Confirmed
plot(x,y,pch=16,col='blue',ylab='Confirmed case',main="Covid-19 Confirmed Cases in Malaysia")
text(x,y,labels=my$State,pos=4,cex=0.5)

More on Bar Chart
val=my$Deaths
name_st=my$State
barplot(val,ylab='Deaths',main="Covid-19 Deaths in Malaysia",
names.arg=name_st,cex.names=0.6,las=2,col="orange")

Pie Chart

lbl=my$State
val2=my$Confirmed
pie(val2,lbl,cex=0.5)
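Pie slices are hard to read without their values. A sketch that appends percentages to the labels; it uses the counts of cylinders in the built-in mtcars data so that it runs without the COVID-19 CSV:

```r
# Number of cars with 4, 6 and 8 cylinders
cyl_counts <- table(mtcars$cyl)

# Share of each slice, rounded to one decimal place
pct <- round(100 * as.vector(cyl_counts) / sum(cyl_counts), 1)
lbl <- paste0(names(cyl_counts), " cyl (", pct, "%)")

pie(as.vector(cyl_counts), labels = lbl, main = "mtcars by number of cylinders")
```

The same pattern applies to val2 and lbl above by replacing cyl_counts with the Confirmed column.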

3D Pie Chart
library(plotrix)
val2=my$Confirmed
lbl=my$State

pie3D(val2,
col = hcl.colors(length(val2), "Spectral"),
border = "white",
labels=lbl,labelcex=0.5)

Exploded 3D Pie Chart
library(plotrix)
val2=my$Confirmed
lbl=my$State

pie3D(val2,
col = hcl.colors(length(val2), "Spectral"),
border = "white",
labels=lbl,labelcex=0.5,explode=0.2)

Exploded 3D Pie Chart
val2=my[my$Confirmed>300000,]
val3=val2$Confirmed
lbl=paste(val2$State,val2$Confirmed,sep=",")

pie3D(val3,
col = hcl.colors(length(val3), "Spectral"),
border = "white",
labels=lbl,labelcex=0.5,explode=0.2)

Case Study 1 – Stacked Bar Chart AL5 Reflection
(Quiz #2 - Part of 2%)

val2=my[my$Confirmed>300000,]
tbl1=val2[c("Confirmed","Population")]
legendval=c("Confirmed","Population")
colors=c("green","orange")
row.names(tbl1)=val2$State
tbl2=t(tbl1)
options(scipen=999)
barplot(as.matrix(tbl2),col=colors,cex.names=0.8,las=2, cex.axis=0.8)
legend("topright", legendval, cex=0.8, fill=colors)
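The t() step matters because barplot() draws one bar per column of a matrix and stacks that column's rows as segments. The sketch below shows the same layout on a built-in table (mtcars cars counted by gears and cylinders), so it runs without the CSV:

```r
# Rows = number of gears (stack segments), columns = cylinders (one bar each)
counts <- table(mtcars$gear, mtcars$cyl)
colors <- c("green", "orange", "blue")

barplot(counts, col = colors, xlab = "Cylinders", ylab = "Number of cars")
legend("topright", legend = paste(rownames(counts), "gears"),
       cex = 0.8, fill = colors)
```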

Case Study 2 – Geomap AL5 Reflection
(Quiz #2 - Part of 2%)

library(rworldmap)    # not called below; the Malaysia outline comes from map_data()

library(tidyverse)    # ggplot2 (ggplot, map_data) and dplyr (filter, %>%)
library(tidygeocoder) # not called below

mydat<-read.csv("C:/Users/nurulaida.osman/OneDrive - Universiti Teknologi PETRONAS/R/IDS/covid_my.csv") # needs State, Confirmed, Lat, Long columns
global <- map_data("world") # get map; map_data() needs the maps package installed
ggplot() +
  geom_polygon(data = global %>% filter(region == "Malaysia"),
               aes(x = long, y = lat, group = group),
               fill = "lightskyblue1") +
  coord_fixed(1.3) +
  geom_point(data = mydat, aes(x = Long, y = Lat), color = "red") +
  geom_text(data = mydat,
            aes(x = Long, y = Lat, label = paste(State, Confirmed, sep = ",")),
            nudge_x = 0.25, nudge_y = 0.25,
            color = "black", size = 1.5) +
  theme_void()
[Figure: Case Study 2 – Geomap, the map produced by the code above]
Summary
You have learned…
✓ Descriptive analytics
✓ Descriptive statistics that are important for descriptive analytics (and data preparation)
✓ Visualization

More variations of visualization can be produced using…

➢ ggplot2
➢ plotly
➢ leaflet

Next…
❖ More on data cleaning
❖ Feature settings