What is Descriptive Statistics?
Descriptive statistics refers to a branch of statistics that involves summarizing, organizing, and
presenting data meaningfully and concisely. It focuses on describing and analyzing a dataset's main
features and characteristics without making any generalizations or inferences to a larger population.
The primary goal of descriptive statistics is to provide a clear and concise summary of the data,
enabling researchers or analysts to gain insights and understand patterns, trends, and distributions
within the dataset. This summary typically includes measures such as central tendency (e.g., mean,
median, mode), dispersion (e.g., range, variance, standard deviation), and shape of the distribution
(e.g., skewness, kurtosis).
Descriptive statistics also involves graphical representation of data through charts, graphs, and
tables, which can further aid in visualizing and interpreting the information. Common graphical
techniques include histograms, bar charts, pie charts, scatter plots, and box plots.
By employing descriptive statistics, researchers can effectively summarize and communicate the key
characteristics of a dataset, facilitating a better understanding of the data and providing a foundation
for further statistical analysis or decision-making processes.
Descriptive Statistics Examples
Example 1: Exam Scores
Suppose you have the following scores of 20 students on an exam:
85, 90, 75, 92, 88, 79, 83, 95, 87, 91, 78, 86, 89, 94, 82, 80, 84, 93, 88, 81
To calculate descriptive statistics:
Mean: Add up all the scores and divide by the number of scores. Mean = (85 + 90 + 75 + 92 +
88 + 79 + 83 + 95 + 87 + 91 + 78 + 86 + 89 + 94 + 82 + 80 + 84 + 93 + 88 + 81) / 20 = 1720 / 20
= 86
Median: Arrange the scores in ascending order and find the middle value. With 20 scores there is
no single middle value, so average the 10th and 11th scores. Median = (86 + 87) / 2 = 86.5
Mode: Identify the score(s) that appear(s) most frequently. Mode = 88
Range: Calculate the difference between the highest and lowest scores. Range = 95 - 75 = 20
Variance: Calculate the average of the squared differences from the mean. Variance = [(85-
86)^2 + (90-86)^2 + ... + (81-86)^2] / 20 = 614 / 20 = 30.7
Standard Deviation: Take the square root of the variance. Standard Deviation = √30.7 ≈ 5.54
Dividing by 20 treats the scores as the whole population. Dividing by N-1 = 19 instead gives the
sample variance, 614 / 19 ≈ 32.32.
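The exam-score statistics can be cross-checked with Python's standard statistics module. This is a
minimal sketch, not part of the original example: the variable names are illustrative, and
pvariance/pstdev are used on the assumption that the 20 scores are treated as the whole population
(dividing by N).

```python
import statistics

scores = [85, 90, 75, 92, 88, 79, 83, 95, 87, 91,
          78, 86, 89, 94, 82, 80, 84, 93, 88, 81]

mean = statistics.mean(scores)
median = statistics.median(scores)        # averages the two middle values when N is even
mode = statistics.mode(scores)            # most frequent score
value_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)   # population variance: divides by N
std_dev = statistics.pstdev(scores)       # square root of the population variance

print(mean, median, mode, value_range, variance, round(std_dev, 2))
```

Swapping in statistics.variance and statistics.stdev divides by N-1 instead, which is the usual
choice when the scores are a sample drawn from a larger group.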
Example 2: Monthly Income
Consider a sample of 50 individuals and their monthly incomes:
$2,500, $3,000, $3,200, $4,000, $2,800, $3,500, $4,500, $3,200, $3,800, $3,500, $2,800, $4,200,
$3,900, $3,600, $3,000, $2,700, $2,900, $3,700, $3,500, $3,200, $3,600, $4,300, $4,100, $3,800,
$3,600, $2,500, $4,200, $4,200, $3,400, $3,300, $3,800, $3,900, $3,500, $2,800, $4,100, $3,200,
$3,600, $4,000, $3,700, $3,000, $3,100, $2,900, $3,400, $3,800, $4,000, $3,300, $3,100, $3,200,
$4,200, $3,400.
To calculate descriptive statistics:
Mean: Add up all the incomes and divide by the number of incomes. Mean = ($2,500 +
$3,000 + ... + $3,400) / 50 = $174,500 / 50 = $3,490
Median: Arrange the incomes in ascending order and find the middle value. With 50 incomes,
average the 25th and 26th values, which are both $3,500. Median = $3,500
Range: Calculate the difference between the highest and lowest incomes. Range = $4,500 -
$2,500 = $2,000
Variance: Calculate the average of the squared differences from the mean. Variance =
[($2,500-$3,490)^2 + ($3,000-$3,490)^2 + ... + ($3,400-$3,490)^2] / 50 = 12,325,000 / 50 =
246,500 (in squared dollars)
Standard Deviation: Take the square root of the variance. Standard Deviation = √246,500
≈ $496.49
These calculations provide descriptive statistics that summarize the central tendency and dispersion
of the data in these examples.
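The monthly-income figures can be checked the same way. The sketch below is an illustration only:
the variable names are assumptions, and the population formulas (dividing by N = 50) are used so
that the calculation follows the "average of the squared differences" wording in the example.

```python
import statistics

incomes = [
    2500, 3000, 3200, 4000, 2800, 3500, 4500, 3200, 3800, 3500, 2800, 4200,
    3900, 3600, 3000, 2700, 2900, 3700, 3500, 3200, 3600, 4300, 4100, 3800,
    3600, 2500, 4200, 4200, 3400, 3300, 3800, 3900, 3500, 2800, 4100, 3200,
    3600, 4000, 3700, 3000, 3100, 2900, 3400, 3800, 4000, 3300, 3100, 3200,
    4200, 3400,
]

total = sum(incomes)
mean = total / len(incomes)
median = statistics.median(incomes)        # mean of the 25th and 26th sorted values
value_range = max(incomes) - min(incomes)
variance = statistics.pvariance(incomes)   # divides by N; units are dollars squared
std_dev = statistics.pstdev(incomes)       # back in dollars

print(total, mean, median, value_range, variance, round(std_dev, 2))
```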
Types of Descriptive Statistics
Descriptive statistics break down into several types, characteristics, or measures. Some authors say
that there are two types. Others say three or even four.
Distribution (Also Called Frequency Distribution)
Datasets consist of a distribution of scores or values. Statisticians use graphs and tables to summarize
the frequency of every possible value of a variable, rendered in percentages or numbers. For
instance, if you held a poll to determine people’s favorite Beatle, you’d set up one column with all
possible values (John, Paul, George, and Ringo), and another with the number of votes each received.
Statisticians depict frequency distributions as either a graph or a table.
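A frequency table like the Beatles poll can be sketched with collections.Counter from Python's
standard library. The votes below are hypothetical, invented purely for illustration; only the
four candidate names come from the text.

```python
from collections import Counter

# Hypothetical poll responses (made-up data, not real survey results).
votes = ["Paul", "John", "George", "John", "Ringo",
         "Paul", "John", "George", "Paul", "John"]

counts = Counter(votes)        # value -> number of votes
total = sum(counts.values())

# One row per value: name, number of votes, share of all votes.
for name, n in counts.most_common():
    print(f"{name:<8}{n:>3}{n / total:>8.0%}")
```

The loop prints the table form of the distribution; the same counts could feed a bar chart for the
graph form.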
Measures of Central Tendency
Measures of central tendency estimate a dataset's average or center, finding the result using three
methods: mean, mode, and median.
Mean: The mean is also known as “M” and is the most common method for finding averages. You get
the mean by adding all the response values together, and dividing the sum by the number of
responses, or “N.” For instance, say someone is trying to figure out how many hours they sleep per
night over a week. The data set would be the hour entries (e.g., 6,8,7,10,8,4,9), and the sum of
those values is 52. There are seven responses, so N=7. You divide the value sum of 52 by N, or 7, to
find M, which in this instance is about 7.43.
Mode: The mode is just the most frequent response value. Datasets may have any number of modes,
including “zero.” You can find the mode by arranging your dataset from the lowest to the highest
value and then looking for the most common response. So, in using our sleep study from the last
part: 4,6,7,8,8,9,10. As you can see, the mode is eight. Mode is particularly useful for
analyzing nominal data, as it identifies the most frequently occurring category.
Median: Finally, we have the median, defined as the value in the precise center of the dataset.
Arrange the values in ascending order (like we did for the mode) and look for the number in the set’s
middle. In this case, the median is eight.
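The three measures can be computed by hand for the sleep study, as in the minimal sketch below.
The variable names are illustrative, and the median line relies on the assumption that N is odd
(with an even N you would average the two middle values).

```python
import statistics

sleep_hours = [6, 8, 7, 10, 8, 4, 9]

n = len(sleep_hours)
mean = sum(sleep_hours) / n              # M = sum of values / N

ordered = sorted(sleep_hours)            # 4, 6, 7, 8, 8, 9, 10
median = ordered[n // 2]                 # middle value; valid as written only for odd N

mode = statistics.mode(sleep_hours)      # most frequent value
all_modes = statistics.multimode(sleep_hours)  # list of every mode, for multi-modal data

print(round(mean, 2), median, mode, all_modes)
```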
Variability (Also Called Dispersion)
The measure of variability gives the statistician an idea of how spread out the responses are. The
spread has three aspects — range, standard deviation, and variance.
Range: Use range to determine how far apart the most extreme values are. Start by subtracting the
dataset’s lowest value from its highest value. Once again, we turn to our sleep study: 4,6,7,8,8,9,10.
We subtract four (the lowest) from ten (the highest) and get six. There’s your range.
Standard Deviation: This aspect takes a little more work. The standard deviation (s) is your dataset’s
average amount of variability, showing you how far each score lies from the mean. The larger your
standard deviation, the greater your dataset’s variability. Follow these six steps:
1. List the scores and find their mean.
2. Find the deviation by subtracting the mean from each score.
3. Square each deviation.
4. Total up all the squared deviations.
5. Divide the sum of the squared deviations by N-1.
6. Find the result’s square root.
Raw Number/Data    Deviation from Mean     Deviation Squared
4                  4 - 7.43 = -3.43        11.7649
6                  6 - 7.43 = -1.43        2.0449
7                  7 - 7.43 = -0.43        0.1849
8                  8 - 7.43 = 0.57         0.3249
8                  8 - 7.43 = 0.57         0.3249
9                  9 - 7.43 = 1.57         2.4649
10                 10 - 7.43 = 2.57        6.6049
M = 52/7 ≈ 7.43    Sum ≈ 0 (rounding)      Sum of squares ≈ 23.71
When you divide the sum of the squared deviations by 6 (N-1), 23.71/6, you get 3.952, and the
square root of that result is 1.988. As a result, we now know that each score typically deviates
from the mean by about 1.988 hours.
Variance: Variance reflects the dataset’s degree of spread. The greater the spread of the data, the
larger the variance. You can get the variance by just squaring the standard
deviation. Using the above example, we square 1.988 and arrive at 3.952.
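The six steps for the standard deviation can be sketched directly in Python. This is an
illustration with assumed variable names; it uses the unrounded mean, so intermediate values may
differ slightly in the last decimal place from a hand calculation that rounds the mean first. The
final comparison against statistics.stdev is only a sanity check.

```python
import math
import statistics

scores = [4, 6, 7, 8, 8, 9, 10]   # sleep-study hours

# Step 1: list the scores and find their mean.
mean = sum(scores) / len(scores)
# Step 2: subtract the mean from each score.
deviations = [x - mean for x in scores]
# Step 3: square each deviation.
squared = [d ** 2 for d in deviations]
# Step 4: total the squared deviations.
sum_of_squares = sum(squared)
# Step 5: divide by N-1 (this quotient is the sample variance).
variance = sum_of_squares / (len(scores) - 1)
# Step 6: take the square root.
std_dev = math.sqrt(variance)

value_range = max(scores) - min(scores)

print(value_range, round(variance, 3), round(std_dev, 3))
print(math.isclose(std_dev, statistics.stdev(scores)))  # library result agrees
```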
Univariate Descriptive Statistics
Univariate descriptive statistics examine only one variable at a time and do not compare variables.
Rather, they allow the researcher to describe individual variables. For that reason, univariate
analysis is purely descriptive. The patterns identified in this sort of data may be described using the
following:
Measures of central tendency (mean, mode, and median)
Data dispersion (standard deviation, variance, range, minimum, maximum, and quartiles)
Tables of frequency distribution
Pie graphs
Frequency polygons
Histograms
Bar graphs
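Several of the univariate dispersion measures listed above (minimum, maximum, quartiles) can be
pulled from a single variable in a few lines. The sketch below reuses the 20 exam scores from
Example 1. Note that quartile conventions differ between textbooks and tools; this assumes the
default "exclusive" method of Python's statistics.quantiles, so other software may report slightly
different Q1 and Q3 values.

```python
import statistics

scores = [85, 90, 75, 92, 88, 79, 83, 95, 87, 91,
          78, 86, 89, 94, 82, 80, 84, 93, 88, 81]

minimum = min(scores)
maximum = max(scores)
# n=4 splits the data into four equal-probability groups and returns the 3 cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1   # interquartile range: spread of the middle half of the data

print(minimum, q1, q2, q3, maximum, iqr)
```

The second quartile is the median, so it matches the result of statistics.median on the same data.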
Bivariate Descriptive Statistics
When using bivariate descriptive statistics, two variables are analyzed concurrently (compared) to
see whether they are correlated. By convention, when the data is tabulated, the columns represent
the independent variable and the rows represent the dependent variable.
There are numerous real-world applications for bivariate data. For example, estimating when a
natural event will occur is quite valuable. Bivariate data analysis is one tool in the statistician's
toolbox. Sometimes, something as simple as plotting one variable against the other on a
two-dimensional plane gives a better sense of what the data is telling you. For
example, a scatterplot of the time between eruptions at Old Faithful against the eruption's
duration shows the link between the two.
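Whether two variables are correlated is commonly summarized with Pearson's correlation
coefficient, which ranges from -1 to 1. The sketch below computes it by hand. The paired values
are hypothetical numbers invented for illustration (they are not Old Faithful measurements), and
the function name pearson_r is likewise an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-deviation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (made-up data).
x = [1, 2, 3, 4, 5]   # independent variable
y = [2, 4, 5, 4, 5]   # dependent variable

r = pearson_r(x, y)
print(round(r, 4))
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one. On its
own, the coefficient describes association and does not establish cause.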
Univariate vs. Bivariate Statistics
Univariate                                         Bivariate
Involves only one variable                         Involves two variables
Doesn't deal with relationships or causes          Deals with causes or relationships
The prime purpose of univariate is describing:     The prime purpose of bivariate is explaining:
  Dispersion: variance, range, standard              Correlations: comparisons, explanations,
  deviation, quartiles, maximum, minimum             causes, relationships
  Central tendency: mean, median, and mode           Dependent and independent variables
Bar graph, pie chart, histogram,                   Tables where just one variable is dependent
box-and-whisker plot, line graph                   on other variables' values
                                                   Simultaneous analysis of two variables
What is the Main Purpose of Descriptive Statistics?
Descriptive statistics can be useful for two things: 1) providing basic information about variables in a
dataset and 2) highlighting potential relationships between variables. The most common descriptive
statistics can also be displayed graphically or pictorially, which is another way to summarise data.
Descriptive statistics only make statements about the data set used to calculate them; they never go
beyond your data.
Scatter Plots
A scatter plot employs dots to indicate values for two separate numeric variables. Each dot's location
on the horizontal and vertical axes represents a data point's values. Scatter plots are used to
observe relationships between variables.
The main purpose of a scatter plot is to examine and display the relationship between two numerical
variables. The dots report the values of individual data points and also reveal patterns when the
data is taken as a whole. Scatter plots are commonly used to identify correlational links. In
these situations, we want to know what a good vertical value prediction would be given a specific
horizontal value.
Scatter plots can suffer from overplotting when there are many data points to plot. Overplotting
occurs when data points overlap to the point where it is difficult to see the relationships between
points and variables. It can also be difficult to tell how densely packed the data points are when
many of them sit in a small area.
There are a couple of simple methods to relieve this issue. One approach is to plot only a subset of
the data points: a random sample of points should still offer the basic sense of the patterns in the
whole data. Additionally, we can alter the appearance of the dots by increasing transparency to make
overlaps visible or decreasing point size to minimise overlaps.
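The subsetting approach can be sketched with Python's standard random module. The dataset below is
synthetic, generated only for illustration, and the sizes (10,000 points, a 500-point sample) are
arbitrary assumptions.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic dataset: 10,000 (x, y) points with a noisy linear trend.
points = []
for _ in range(10_000):
    x = random.uniform(0, 10)
    y = 2 * x + random.gauss(0, 1)
    points.append((x, y))

# Plot only a random subset: sampling without replacement keeps the overall pattern.
subset = random.sample(points, k=500)

print(len(points), len(subset))
```

The transparency and point-size approach depends on the plotting library. In matplotlib, for
example, the scatter function accepts alpha (transparency) and s (marker size) arguments.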
What’s the Difference Between Descriptive Statistics and Inferential Statistics?
So, what’s the difference between the two statistical forms? We’ve already touched upon this when
we mentioned that descriptive statistics doesn’t infer any conclusions or predictions, which implies
that inferential statistics do so.
Inferential statistics takes a random sample of data from a portion of the population and describes
and makes inferences about the entire population. For instance, in asking 50 people if they liked the
movie they had just seen, inferential statistics would build on that and assume that those results
would hold for the rest of the moviegoing population in general.
Therefore, if you stood outside that movie theater and surveyed 50 people who had just seen Rocky
20: Enough Already! and 38 of them disliked it (76 percent), you could extrapolate that roughly 76%
of the rest of the movie-watching world will dislike it too, even though you haven’t the means, time,
and opportunity to ask all those people.
Simply put: Descriptive statistics give you a clear picture of what your current data shows. Inferential
statistics makes projections based on that data.
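The movie survey shows both sides in a few lines. The sample proportion is the descriptive part,
and an interval estimate for the wider population is the inferential part. The sketch below is one
common textbook approach, not the only one: it assumes the 50 viewers are a random sample and uses
the normal-approximation interval with z = 1.96 for roughly 95% confidence.

```python
import math

n = 50          # people surveyed
disliked = 38   # of whom disliked the movie

# Descriptive: what the sample itself shows.
p_hat = disliked / n

# Inferential: an approximate 95% interval for the population proportion.
z = 1.96  # normal quantile for ~95% confidence (assumption of this sketch)
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
low, high = p_hat - margin, p_hat + margin

print(f"sample: {p_hat:.0%}, interval: {low:.1%} to {high:.1%}")
```

The width of the interval reflects how much uncertainty a sample of only 50 people leaves about the
wider population.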
Frequently Asked Questions
1. What do you mean by descriptive statistics?
Descriptive statistics refers to a set of methods used to summarize and describe the main features of
a dataset, such as its central tendency, variability, and distribution. These methods provide an
overview of the data and help identify patterns and relationships.
2. What is descriptive statistics? Explain with examples.
Descriptive statistics are methods used to summarize and describe the main features of a dataset.
Examples include measures of central tendency, such as mean, median, and mode, which provide
information about the typical value in the dataset. Measures of variability, such as range, variance,
and standard deviation, describe the spread or dispersion of the data. Descriptive statistics can also
include graphical methods, including histograms, box plots, and scatter plots, to visually represent
the data.
3. What are the four types of descriptive statistics?
The four types of descriptive statistics are:
Measures of central tendency
Measures of variability
Measures of relative position
Graphical methods
Measures of central tendency describe the typical value in the dataset and include mean, median,
and mode. Measures of variability represent the spread or dispersion of the data and include range,
variance, and standard deviation. Measures of relative position describe the location of a specific
value within the dataset, such as percentiles. Graphical methods use charts, histograms, and other
visual representations to display data.
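As an illustration of a measure of relative position, the sketch below computes a percentile rank
for one of the exam scores from Example 1. Definitions of percentile rank vary; this version
assumes the "percentage of values at or below the score" convention, and the function name is
hypothetical.

```python
scores = [85, 90, 75, 92, 88, 79, 83, 95, 87, 91,
          78, 86, 89, 94, 82, 80, 84, 93, 88, 81]

def percentile_rank(values, score):
    """Percentage of values that are less than or equal to the given score."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)

rank_90 = percentile_rank(scores, 90)
print(rank_90)
```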
4. What is the main purpose of descriptive statistics?
The primary objective of descriptive statistics is to effectively summarize and describe the main
features of a dataset, providing an overview of the data and helping to identify patterns and
relationships within it. Descriptive statistics provide a useful starting point for analyzing data, as they
can help to identify outliers, summarize key characteristics of the data, and inform the selection of
appropriate statistical methods for further analysis. They are commonly used in multiple fields,
including social sciences, business, and healthcare.
5. Can Descriptive Statistics be used to make inferences or predictions?
Descriptive statistics is primarily used to summarize and describe data; it does not involve
making inferences or predictions beyond the data itself. Statistical inference methods are needed to
make inferences or predictions about a larger population, which go beyond descriptive statistics and
involve estimating parameters and testing hypotheses.
6. Why is descriptive statistics important?
Descriptive statistics is important because it allows us to summarize and describe data meaningfully.
It helps us understand a dataset's main features and characteristics, identify patterns and trends, and
gain insights from the data. Descriptive statistics provide a foundation for further analysis, decision-
making, and communication of findings.
7. What is descriptive statistics used for?
Descriptive statistics is used to summarize and present data concisely and meaningfully. It is
commonly used in various fields such as research, business, economics, social sciences, and
healthcare. Descriptive statistics helps researchers and analysts to describe the central tendency
(mean, median, mode), dispersion (range, variance, and standard deviation), and shape of the
distribution of a dataset. It also involves graphical representation of data to aid visualization and
understanding.
8. Explain the difference between inferential and descriptive statistics.
The main difference between descriptive and inferential statistics lies in their purpose and scope.
Descriptive statistics focuses on summarizing and describing the characteristics of a sample or
population, without making inferences or generalizations to a larger population. It aims to provide a
concise summary of data and reveal patterns within the observed dataset.
In contrast, inferential statistics involves drawing conclusions, making predictions, or testing
hypotheses about a population based on a sample of data. It uses probability theory and statistical
techniques to generalize findings from a sample to a larger population. Inferential statistics allows
researchers to make inferences, estimate parameters, assess relationships, and make predictions
beyond the observed data.