STS Notes

The document provides an overview of statistical analysis, including definitions, types of statistics, data collection methods, and sampling techniques. It distinguishes between descriptive and inferential statistics, outlines the limitations of statistics, and explains key terminology such as population, sample, and variables. Additionally, it discusses methods for determining sample size and various sampling techniques to ensure representative data collection.

Uploaded by

mkitagawa31
Copyright
© All Rights Reserved

STATISTICAL ANALYSIS

BSMA3103 (Lecture/Lab)
NATURE OF STATISTICS

WHAT IS STATISTICS?
Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

FIELDS OF STATISTICS
A. Mathematical statistics - the study and development of statistical theory and methods in the abstract.
B. Applied statistics - the application of statistical methods to solve real problems involving randomly generated data, and the development of new statistical methodology motivated by real problems.

LIMITATIONS OF STATISTICS
1. Statistics is not suited to the study of qualitative phenomena.
2. Statistics does not study individuals.
3. Statistical laws are not exact.
4. Statistical tables may be misused.
5. Statistics is only one of the methods of studying a problem.

BASIC TERMINOLOGY IN STATISTICS
• The universe is the set of all entities under study.
• A population is the total or entire group of individuals or observations from which information is desired by a researcher. Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.
• A sample is a subset of the population. It is a smaller, manageable subset of a larger population that is selected to represent the whole group.
• An individual is a person or object that is a member of the population being studied.
• A statistic is a numerical summary of a sample.
• A parameter is a numerical summary of a population.

PARAMETER
• The average age of all jeepney drivers in the Philippines is 50 years.
• The average weight of all PHS students is 60 kg.
• The proportion of Filipino teenagers who smoke is 30%.

STATISTIC
• The average age of 3,000 selected jeepney drivers across the country is 48 years.
• The average weight of 150 randomly selected students is 57 kg.
• The proportion of Filipino teenagers who smoke is 33%, based on the responses of 500 teenagers.

TYPES OF STATISTICS
1. Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
2. Inferential statistics uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result.

DATA
- In statistics, data refers to a collection of facts, figures, or information gathered for analysis and interpretation.
- Data are measurements or observations that are gathered for an event under study.

SOURCES OF DATA
• Primary sources - provide a first-hand account of an event or time period and are considered to be authoritative. They represent original thinking, report on discoveries or events, or share new information.
• Secondary sources - offer an analysis, interpretation, or restatement of primary sources and are considered to be persuasive. They often involve generalization, synthesis, interpretation, commentary, or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to describe or explain primary sources.
• Primary data - are data documented by the primary source. The data collectors documented the data themselves.
• Secondary data - are data documented by a secondary source. The data collectors had the data documented by other sources.

DATA COLLECTION
Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to
answer stated research questions, test hypotheses, and evaluate outcomes.

STEPS IN DATA GATHERING
1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design the data gathering forms to be used.
5. Collect the data.

FIVE METHODS TO COLLECT PRIMARY DATA
1. Direct personal interview
2. Indirect/Questionnaire Method
3. Focus Group
4. Experiment
5. Observation

OPEN-ENDED & CLOSED-ENDED
Open-Ended
- More detailed answers
- Could reveal additional insights
- Difficult to encode, tabulate, and analyze
- Low response rate
- Respondent has to be articulate
- Respondent could feel threatened
- Responses could have different levels of detail
Closed-Ended
- Easy to encode, tabulate, and analyze
- Easy to understand
- Enables inter-study comparisons
- Saves time and money
- High response rate
- Could frustrate respondents
- Potentially biased response sets
- Difficult or impossible to detect if respondent truly understood the questions

FIVE METHODS TO COLLECT SECONDARY DATA
1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of government departments.
5. Information from official publications.

VARIABLES
Variables are the characteristics of the individuals within the population.

VARIABLES CAN BE CLASSIFIED INTO TWO GROUPS
1. A qualitative (categorical) variable is a variable that yields a categorical response. Its value is a word or a code that represents a class or category.
2. A quantitative (numeric) variable takes numerical values representing an amount or quantity.

QUANTITATIVE VARIABLES MAY BE FURTHER CLASSIFIED INTO:
1. A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values. If you count to get the value of a quantitative variable, it is discrete.
2. A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. If you measure to get the value of a quantitative variable, it is continuous.

LEVEL OF MEASUREMENT
Quantitative: ratio, interval
Qualitative: ordinal, nominal

CATEGORICAL DATA
- Nominal: no rank or ordering
- Ordinal: natural order or rank

NUMERICAL DATA
- Interval: measurable, but arbitrary zero
- Ratio: measurable and absolute zero

It is important to know which type of scale is represented by your data, since different statistics are appropriate for different scales of measurement. A characteristic may be measured using nominal, ordinal, interval, and ratio scales.

NOMINAL LEVEL
Nominal scales are sometimes called categorical scales or categorical data. Such a scale classifies persons or objects into two or more categories. Whatever the basis for classification, a person can only be in one category, and members of a given category have a common set of characteristics.

Examples:
- method of payment (cash, check, debit card, credit card)
- type of school (public vs. private)
- eye color (blue, green, brown)

ORDINAL LEVEL
This involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. An ordinal scale not only classifies subjects but also ranks them in terms of the degree to which they possess a characteristic of interest.

In other words, an ordinal scale puts the subjects in order from highest to lowest, from most to least. Although ordinal scales indicate that some subjects are higher or lower than others, they do not indicate how much higher or how much better.

INTERVAL LEVEL
This measurement level not only classifies and orders the measurements, but also specifies that the distances between each interval on the scale are equivalent along the scale from low interval to high interval. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.

Examples:
- temperature on a Fahrenheit/Celsius thermometer
- trait anxiety (e.g., high anxious vs. low anxious)
- IQ (e.g., high IQ vs. average IQ vs. low IQ)

RATIO LEVEL
A ratio scale represents the highest, most precise level of measurement. It has the properties of the interval level of measurement, and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

VARIABLES CAN BE CLASSIFIED INTO TWO TYPES:
1. DEPENDENT VARIABLES
• The variable we want to infer or predict is called the dependent variable.
• Response variable
• Outcome variable
• Target variable
• Denoted as “Y”
2. INDEPENDENT VARIABLES
• The variables we use for prediction are called independent variables.
• Predictor variables
• Input variables
• Denoted as “X”

STATISTICS NOTATIONS
In statistics, special notation is used to clearly distinguish between values that describe a population (the entire group being studied) and a sample (a subset of the population). These notations are important so we don't confuse parameters (population measures) with statistics (sample measures).

Population refers to the entire group of individuals or items of interest. Values that describe a population are called parameters.
• Population Mean (μ) - the average of all values in the population.
• Population Variance (σ²) - the average of the squared deviations from the mean.
• Population Standard Deviation (σ) - the square root of the variance.
• Population Proportion (P) - proportion of elements in the population with a certain characteristic.
• Population Size (N) - total number of elements in the population.

Sample refers to a subset of the population. Values calculated from a sample are called statistics.
• Sample Mean (x̄) - the average of all values in the sample.
• Sample Variance (s²) - variance computed from the sample.
• Sample Standard Deviation (s) - square root of the sample variance.
• Sample Proportion (p̂) - proportion of sample elements with a certain characteristic.
• Sample Size (n) - total number of elements in the sample.

Concept              Population (Parameter)   Sample (Statistic)
Mean (average)       μ (mu)                   x̄ (x-bar)
Variance             σ² (sigma squared)       s²
Standard Deviation   σ (sigma)                s
Proportion           P                        p̂ (p-hat)
Size                 N                        n

Key Difference:
Greek letters (μ, σ, σ²) and capital letters (P, N) are commonly used for population parameters.
Lowercase Roman letters (x̄, s, s², p̂, n) are commonly used for sample statistics.

SAMPLE SIZE
In statistics, sample size refers to the number of individual participants, observations, or data points included in a study or research project. It represents a subset of a larger population that is selected to be representative of that population for analysis and drawing conclusions. The sample size is crucial because it impacts the reliability and generalizability of research findings.

The sample size is typically denoted by “n” and it is always a positive integer. No exact sample size can be prescribed here, since it varies across research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

The choice of sample size depends on non-statistical considerations and statistical considerations.
• Non-statistical considerations - may include availability of resources, manpower, budget, ethics, and sampling frame.
• Statistical considerations - include the desired precision of the estimate.

THREE CRITERIA NEED TO BE SPECIFIED TO DETERMINE THE APPROPRIATE SAMPLE SIZE:
A. Level of Precision
Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.
B. Confidence Level
It is a statistical measure of the number of times out of 100 that results can be expected to be within a specified range. For example, a confidence level of 90% means that results of an action will probably meet expectations 90% of the time.

Desired Confidence Level   Z-Score
80%                        1.28
85%                        1.44
90%                        1.65
95%                        1.96
99%                        2.58

C. Degree of Variability
Depending upon the target population and attributes under consideration, the degree of variability varies considerably. The more heterogeneous a population is, the larger the sample size required to get an optimum level of precision.

METHODS IN DETERMINING THE SAMPLE SIZE
A. Estimating the Mean or Average
The sample size required to estimate the population mean with a given level of confidence and a specified margin of error e is given by:

n = (Z σ / e)²

where:
Z is the z-score corresponding to the level of confidence
σ is the population standard deviation
e is the level of precision

Example:
A soft drink machine is regulated so that the amount of drink dispensed is approximately normally distributed with a standard deviation equal to 0.5 ounce. Determine the sample size needed if we wish to be 95% confident that the sample mean will be within 0.03 ounce of the true mean.

1. Find the z-score for the 95% confidence level in the z-table: Z = 1.96.
2. Use the given formula for estimating the mean or average: n = (1.96 × 0.5 / 0.03)² ≈ 1067.11, which is rounded up to 1068.

We need a sample of 1068 for our study.

B. Estimating Proportion (Infinite Population)
The sample size required to obtain a confidence interval for “p” with specified margin of error “e” is given by

n = Z² P (1 − P) / e²

where:
Z is the z-score corresponding to the level of confidence
e is the level of precision
P is the population proportion
There is a dilemma in this formula: it depends on the very proportion being estimated, which we know only after we have taken the sample.

C. Slovin's Formula

D. Finite Population Correction
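
As a rough sketch, the four sample-size methods named in this section can be computed in Python. The function names are illustrative and not part of the notes. The formula for the mean reproduces the soft drink example (n = 1068). The notes do not spell out Slovin's formula or the finite population correction, so the commonly taught forms n = N / (1 + N e²) and n = n₀ / (1 + (n₀ − 1) / N) are assumed here, and p = 0.5 is used as a conservative guess for the unknown proportion.

```python
import math


def n_for_mean(z: float, sigma: float, e: float) -> int:
    """Sample size to estimate a mean: n = (z * sigma / e)^2, rounded up."""
    return math.ceil((z * sigma / e) ** 2)


def n_for_proportion(z: float, p: float, e: float) -> int:
    """Sample size to estimate a proportion: n = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


def slovin(population_size: int, e: float) -> int:
    """Slovin's formula (assumed form): n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population_size / (1 + population_size * e ** 2))


def finite_population_correction(n0: int, population_size: int) -> int:
    """Correction (assumed form): n = n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))


# Soft drink machine example: 95% confidence (z = 1.96), sigma = 0.5, e = 0.03.
print(n_for_mean(1.96, 0.5, 0.03))  # 1068

# Hypothetical inputs: p = 0.5 (conservative guess), e = 0.05, N = 1000.
n0 = n_for_proportion(1.96, 0.5, 0.05)
print(n0)
print(slovin(1000, 0.05))
print(finite_population_correction(n0, 1000))
```

Rounding up with `math.ceil` is a deliberate choice: a fractional respondent cannot be sampled, and rounding down would give slightly less precision than requested.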

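The z-scores in the Desired Confidence Level table can be reproduced from the standard normal distribution. The sketch below uses only Python's standard library; the helper name `z_score` is illustrative. For a two-sided confidence level c, the z-score is the (1 + c) / 2 quantile of the standard normal distribution. The 90% value comes out at about 1.645, which tables commonly round to 1.65.

```python
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value: the (1 + confidence) / 2 quantile of N(0, 1)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


# Print the z-score for each confidence level listed in the table.
for level in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"{level:.0%}  {z_score(level):.3f}")
```
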
SAMPLING TECHNIQUES
• Sampling is the process of selecting a subset (sample) from a larger group (population) in order to study and draw conclusions about the entire population. Since studying the whole population is often time-consuming, costly, or impractical, researchers use sampling to gather information more efficiently. The goal is to make sure the sample is as representative as possible of the population.
• Sampling techniques are the methods or strategies used to select samples from a population. They determine how participants or elements are chosen.
• Good sampling techniques reduce bias and increase the accuracy of results.

TYPES OF SAMPLING TECHNIQUES
1. Probability Sampling is a sampling technique in which every member of the population has a known, non-zero chance of being selected. It relies on randomization, making it more representative and less prone to bias.
2. Non-Probability Sampling is a sampling technique where not all members of the population have a chance of being included. Selection is based on convenience, judgment, or voluntary participation, making it less representative but often easier and cheaper to conduct.

Aspect                Probability Sampling                      Non-Probability Sampling
Chance of Selection   Equal and known chance for all members    Unequal; not all members have a chance
Bias                  Less prone to bias                        More prone to bias
Representativeness    More representative of the population     Less representative
Cost & Time           More costly and time-consuming            Cheaper and faster

1. PROBABILITY SAMPLING

A. Simple Random Sampling (SRS)
A simple random sample is a subset of a larger population where each member has an equal and random chance of being selected. This method is used to ensure the sample is representative of the population and to minimize bias in research or data collection. In a simple random sample, every individual in the population has the same probability of being chosen for the sample. The selection process is random, meaning it's not influenced by any specific characteristics or biases of the researcher or the population itself. By ensuring random selection, a simple random sample aims to create a sample that accurately reflects the characteristics of the larger population.

B. Systematic Sampling (SS)
Systematic sampling is a probability sampling method where the sample is selected from a larger population by choosing every kth element from a list or sequence. It's a straightforward technique often used when a population is ordered in a predictable way.

k = N / n

where,
k = sampling interval
N = population size
n = sample size

C. Stratified Sampling (STS)
• Stratified sampling is a probability sampling method used to divide a population into subgroups (strata) based on shared characteristics, and then samples are randomly selected from each stratum. This ensures representation from all subgroups in the final sample.
• Through stratification, the population is divided into smaller, non-overlapping groups called strata. These strata are formed based on specific characteristics relevant to the research, such as age, gender, income, or education level.
• Once the strata are defined, a random sampling technique (like simple random sampling) is used within each stratum to select individuals. This method ensures that each stratum is adequately represented in the overall sample.

D. Cluster Sampling (CS)
Cluster sampling is a probability sampling technique where the population is divided into groups (clusters), and then a random selection of these clusters is sampled. Instead of sampling individual units, the entire cluster is included in the sample. This is useful for large, geographically dispersed populations where
it's more practical to sample entire groups rather than individuals.

2. NON-PROBABILITY SAMPLING

A. Convenience Or Haphazard Sampling
Units are selected in an arbitrary manner with little or no planning involved. Haphazard sampling assumes that the population units are all alike, so any unit may be chosen for the sample. An example of haphazard sampling is the vox pop survey where the interviewer selects any person who happens to walk by. Unfortunately, unless the population units are truly similar, selection is subject to the biases of the interviewer and whoever happened to walk by at the time of sampling.

B. Purposive / Judgement Sampling
With this method, sampling is done based on previous ideas of population composition and behaviour. An expert with knowledge of the population decides which units in the population should be sampled. In other words, the expert purposely selects what is considered to be a representative sample. Judgment sampling is subject to the researcher’s biases and is perhaps even more biased than haphazard sampling.

C. Quota Sampling
This is one of the most common forms of non-probability sampling. Sampling is done until a specific number of units (quotas) for various subpopulations have been selected. Quota sampling is a means for satisfying sample size objectives for the subpopulations.

D. Snowball Or Network Sampling
Suppose a researcher wishes to find rare individuals in the population, and already knows of the existence of some of these individuals and how to contact them. One approach is to contact those individuals and simply ask them if they know anyone like themselves, then contact those people, etc. The sample grows like a snowball rolling down a hill to hopefully include virtually everybody with that characteristic. Snowball sampling is useful for rare or hard-to-reach populations such as people with disabilities, homeless people, drug users, or other persons who may not belong to an organized group, or such as musicians, painters, or poets, not readily identified on a survey list frame. However, some individuals or subgroups may have no chance of being sampled. In order to be able to generalize the conclusion to the whole population, some assumptions, which are usually not met, are required.

E. Volunteer Sampling
The respondents are only volunteers in this method. Generally, volunteers must be screened so as to get a set of characteristics suitable for the purposes of the survey (e.g. individuals with a particular disease). This method can be subject to large selection biases, but is sometimes necessary. For example, for ethical reasons, volunteers with particular medical conditions may have to be solicited for some medical experiments.

F. Crowdsourcing and Web Panels
Crowdsourcing has been defined slightly differently by researchers from various areas. Despite the multiplicity of definitions for crowdsourcing, one constant has been the broadcasting of a problem to the public, and an open call for contributions to help solve the problem.

Web panels (or online or internet panels), meanwhile, could be defined as access panels of people willing to respond to web questionnaires. A web panel contains a sample of potential respondents who declare that they will cooperate for future data collection if selected. A web panel survey is a survey utilizing samples from web panels.

SOURCES OF ERRORS IN SAMPLING
1. Non-sampling Error
- Errors that result from the survey process
- Any errors that cannot be attributed to the sample-to-sample variability
Sources of Non-sampling Error
1. Non-responses
2. Interviewer Error
3. Misrepresented Answers
4. Data Entry Errors
5. Questionnaire Design
6. Wording of Questions
7. Selection Bias

2. Sampling Error
- Error that results from taking one sample instead of examining the whole population
- Error that results from using sampling to estimate information regarding a population

DATA VISUALIZATION
• The presentation of information or data in a visual format is known as data visualization. The purpose of data visualization is to transmit information or data to readers in a way that is understandable and useful to them. Charts, infographics, diagrams, and maps are the most common ways that data can be represented graphically.
• Data visualization is a form of communication that portrays dense and complex information in graphical form. The resulting visuals are designed to make it easy to compare data and use it to tell a story, both of which can help users in decision making. Data visualization can express data of varying types and sizes: from a few data points to large multivariate datasets.
• The fields of art and data science come together in the discipline of data visualization. Although a data visualization has the potential to be artistic and aesthetically beautiful, it must not lose sight of the fact that its primary purpose is to effectively communicate the facts it depicts.
• Drawing conclusions from collected, processed, and modeled data requires that the data be visualized, so data visualization is one of the steps in the data science workflow. The discipline known as data presentation architecture (DPA) seeks to identify, locate, process, format, and convey data in the most effective manner possible. Data visualization is one component of the larger DPA field.

Uses of data visualization
1. Present information in a way that is both interesting and simple to understand.
Large amounts of numbers can frequently cause us to experience double vision. Finding the meaning of data that is presented in rows might be challenging. The use of pictures, charts, descriptive language, and an engaging design all contribute to data visualization, which enables us to reframe the data in a new light. Visualization also enables us to collect and organize data based on categories and topics, which can make it easier to break it down into manageable chunks. This can be a significant benefit.
2. Locating patterns and anomalies within a given data collection.
If you were to manually sort through raw data, it could take you a very long time to identify patterns, trends, or anything that is out of the ordinary. However, you may sort through a large amount of data in a short amount of time by using data visualization tools such as charts. Even better, charts make it simpler and faster to identify patterns than combing through numerical data.
3. Tell a story that can be found inside the data.
The mere presentation of numbers does not typically elicit an emotional response. However, data visualization allows for the telling of a story that provides context for the data. Designers utilize methods such as color theory, images, design style, and visual cues to appeal to the emotions of readers, put faces to numbers, and introduce a narrative to the data.
4. Putting more weight to a claim or viewpoint.
When it comes to persuading people that your viewpoint is correct, showing them the evidence is often necessary for them to believe you. Your argument can be strengthened while also highlighting your creative potential if you use a good infographic or chart. You can use a comparison infographic, for instance, to compare the various points of view in an argument, various ideas, product or service options, advantages and disadvantages, and more.
5. Bringing attention to the most relevant aspects of a data set.
We make use of data visualizations on occasion so that it is simpler for readers to investigate the data and draw their own conclusions. On the other hand, we frequently employ data visualizations in order to convey a story, present a specific argument, or urge readers to arrive at a particular conclusion. Visual cues are employed by designers to guide the viewer's attention to specific locations on a page. Visual cues are elements such as forms, symbols, and colors that either direct the viewer's attention to a particular
portion of the data visualization or highlight a certain portion of the data.

Types of data visualization
TABLE
When performing an analysis of comparative data on categorical objects, a data table or a spreadsheet can be an effective format to use. The things being compared are typically arranged in a column, and the classified objects are placed in the rows of the table. The numerical value is then placed in what is known as the cell, which is located at the junction of the row and the column.

INFOGRAPHIC
An infographic is a compilation of images, charts, and relatively little text that provides a concise summary of a subject in an easy-to-understand format.

Infographics come in many types, depending on the purpose and the way information is presented. Here are the different types of infographics commonly used:

1. Statistical infographic - focuses on numbers, data, and statistics. Often uses charts, graphs, and percentages to highlight findings.
Example: survey results, market research, population growth data.
2. Informational infographic - explains a concept or provides an overview of a topic in a structured way, usually with text and icons.
Example: introduction to climate change, company mission and values.
3. Timeline infographic - shows events or processes in chronological order.
Example: company history, evolution of technology, project milestones.
4. Process infographic - breaks down the steps of a process or workflow.
Example: steps in job application, cooking recipe, how a bill becomes a law.
5. Comparison infographic - highlights similarities and differences between two or more items.
Example: comparing products, pros and cons, old vs. new methods.
6. Geographic infographic (map infographic) - uses maps and location data to show trends or patterns.
Example: population density by region, global internet usage, tourism hotspots.
7. Hierarchical infographic - presents information in levels of importance or ranking (like a pyramid or flow structure).
Example: Maslow's hierarchy of needs, organizational structure, food pyramid.
8. List infographic - uses a list format to summarize points clearly and attractively.
Example: top 10 tips for productivity, safety rules, best practices.
9. Flowchart infographic - helps readers make decisions by guiding them through different options or paths.
Example: should you buy or rent?, troubleshooting guides.
10. Interactive infographic - digital version that allows users to interact, click, or explore additional data.
Example: online data dashboards, clickable maps, animated infographics.

CHARTS
A chart can be thought of as a graphical representation of facts in its most basic form. Graphical elements like lines, bars, dots, slices, and icons serve to represent the individual data points on charts.

Change over time
Change over time charts show data over a period of time, such as trends or comparisons across multiple categories. Common use cases include:
- stock price performance
- health statistics
- chronologies
(line, bar, stacked bar, candlestick, area, timeline, horizon, waterfall)

Category comparison
Compare data between multiple distinct categories. Use cases include:
- income across different countries
- popular venue times
- team allocations
(bar, grouped bar, bubble, parallel coordinate, multi-line, bullet)

Ranking
Show an item’s position in an ordered list. Use cases include:
- election results
- performance statistics
(ordered bar, ordered column, parallel coordinates)

Part-to-whole
Show how partial elements add up to a total. Use cases include:
- consolidated revenue of product categories
- budgets
(stacked bar, pie, donut, stacked area, treemap, sunburst)

Correlation
Show correlation between two or more variables. Use cases include:
- income and life expectancy
(scatterplot, bubble, column/line, heatmap)

Distribution
Show how often each value occurs in a dataset. Use cases include:
- population distribution
- income distribution
(histogram, box plot, violin, density)

Flow
Show movement of data between multiple states. Use cases include:
- fund transfers
- vote counts and election results
(sankey, gantt, chord, network)

Relationship
Show how multiple items relate to one another. Use cases include:
- social networks
- word charts
(network, venn diagram, chord, sunburst)

DIAGRAM
A diagram is a graphical depiction of information, comparable to a chart in its function. Both two-dimensional and three-dimensional representations of diagrams are possible. Diagrams can be used to plan out projects, assist in decision making, map out processes, determine root causes, connect concepts, and identify connections.

MAPS
A land mass is depicted pictorially on a map in order to facilitate easier comprehension. The geographic characteristics of the land, such as its regions, landscapes, cities, and roadways, as well as its bodies of water, are depicted on maps.

STYLE
Data visualizations use custom styles and shapes to make data easier to understand at a glance, in ways that suit the user's needs and context.
Charts can benefit from customizing the following:
• Graphical elements
• Typography
• Iconography
• Axes and labels
• Legends and annotations

Styling different types of data
Visual encoding is the process of translating data into visual form. Unique graphical attributes can be applied to both quantitative data (such as temperature, price, or speed) and qualitative data (such as categories, flavors, or expressions). These attributes include:
• Shape
Charts can use shapes to display data in a range of ways. A shape can be styled as playful and curvilinear, or precise and high-fidelity, among other ways in between.

Level of shape detail
Charts can represent data at varying levels of precision. Data intended for close exploration should be represented by shapes that are suitable for interaction (in terms of touch target size and related affordances), whereas data that's intended to express a general idea or trend can use shapes with less detail.

• Color
Can be used to differentiate chart data in four primary ways:
-distinguishing categories from one another
-representing quantity
-highlighting specific data or an area of focus
-expressing meaning

• Size
• Area
• Volume
• Length
• Angle
• Position
• Direction
• Density

Accessibility
To accommodate users who don't see color differences, you can use other methods to accentuate data, such as high-contrast shading, shape, or texture.

Line
Chart lines can express qualities about data, such as hierarchy, highlights, and comparisons. Lines can be styled in different ways, such as using dashes or varied opacities.
Lines can be applied to specific elements, including:
• Annotations
• Forecasting elements
• Comparative tools
• Confidence intervals
• Anomalies

Typography
Text can be used to label different chart elements, including:
• Chart titles
• Data labels
• Axis labels
• Legend
The text with the highest level of hierarchy is usually the chart title, with axis labels and the legend having the lowest level of hierarchy.

Text weight
Headings and varying font weights can communicate which content is more (or less) important than other content in the hierarchy. However, these treatments should be used sparingly, with a limited number of typographic styles.

Iconography
Iconography can represent different types of data in a chart and improve a chart's overall usability.
Iconography can be used for:
• Categorical data, to differentiate groups or categories
• UI controls and actions, such as filter, zoom, save, and download
• States, such as errors, no data, completed states, and danger

When placing icons in a chart, it's recommended to use universally recognizable symbols, particularly when representing actions or states, such as save, download, completed, error, and danger.

Labelled axis
-A labelled axis, or multiple axes, indicates the scale and scope of the data displayed. For example, line charts display a range of values along both horizontal and vertical labelled axes.
-Bar charts should always start at the x-axis baseline value of zero.

Bar chart baseline
Bar charts should start at a baseline (the starting value on the y-axis) of zero. Starting at a baseline that isn't zero can cause the data to be perceived incorrectly.

Axis labels
Label usage should reflect the most important data insights in a chart. Axis labels should be used as needed, and in consistent ways across a UI. Their presence should not inhibit reading the chart.

Do: Support legibility by using a balanced number of axis labels.
Don't: Overload the chart with numerous axis labels.

Text orientation
Text labels should be placed horizontally on the chart so that they are easy to read.
Text labels should not:

• Be rotated
• Be stacked vertically

Do: Orient text horizontally on bar charts, rotating the bars if needed to make space.
Don't: Rotate bar labels, as it makes them difficult to
read.

Legends and annotations
Legends and annotations describe a chart's
information. Annotations should highlight data points,
data outliers, and any noteworthy content.
1. Annotation
2. Legend

Labels and legends
Chart elements can be labeled directly in simple charts.
However, charts that are dense (or part of a larger
group of charts) can display labels in a legend.

Small displays
Charts displayed on wearables (or other small screens)
should be a simplified version of the mobile or desktop
chart.

Do: Annotate key points on the graph to describe the data.
Caution: Don't place key data points off-screen, as it
requires the user to scroll to see them.

Threshold lines give users context about the data displayed.
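
As a rough illustration of the context a threshold line provides, the sketch below compares each value with a fixed threshold, which is what a reader does by eye when a threshold line is drawn across a chart. The monthly sales figures, the target of 50, and the function name compare_to_threshold are all made up for this example and are not part of the notes.

```python
def compare_to_threshold(values, threshold):
    """Label each value as 'above', 'at', or 'below' the threshold."""
    labels = {}
    for name, value in values.items():
        if value > threshold:
            labels[name] = "above"
        elif value < threshold:
            labels[name] = "below"
        else:
            labels[name] = "at"
    return labels


# Hypothetical data: monthly sales and a sales target used as the threshold.
monthly_sales = {"Jan": 42, "Feb": 50, "Mar": 61, "Apr": 38}
target = 50

status = compare_to_threshold(monthly_sales, target)
for month, value in monthly_sales.items():
    print(f"{month}: {value} ({status[month]} the threshold of {target})")
```

On a chart, the same comparison is read directly from whether each bar or point sits above or below the threshold line.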
