BAN 6003 – SEC 7B06
Lecture 1 – July 16, 2025
AGENDA
Introduction
Canvas Assignments
Course Expectations & Grading
McGraw Hill
Lecture 1:
Data Measurement
Summary Measures
Data Visualization
ABOUT ME
Education:
BS Psychology and Biology - University of Illinois
MS Industrial and Systems Engineering - Virginia Tech
PhD Industrial and Systems Engineering (Human Factors) - University of Wisconsin
Work History:
Northrop Grumman, Children’s Hospital of Philadelphia, Divvy Dose (Optum), Volkswagen Auto Cloud, Saama, Froedtert Hospital, Teladoc Health*
Hobbies:
Travel, Sports (Tennis, Basketball), Music, Food
CHAPTER 2: DATA MEASUREMENT AND WRANGLING
Turning Raw Data into Reliable Insights
Prof. Siddarth Ponnala | 7/16/2025
LECTURE OVERVIEW
Part 1: Introduction to Data Measurement
Part 2: Data Wrangling Techniques
Part 3: Practical Tips and Wrap-up
WHAT IS DATA MEASUREMENT?
Assigning values to variables for comparison and analysis
Ensures consistency, interpretability, and analytical validity
SCALES OF MEASUREMENT
Nominal – Categories without order (e.g., Gender, Blood Type)
Ordinal – Ordered categories (e.g., Satisfaction rating)
Interval – Ordered with equal spacing, no true zero (e.g., Temperature in °C or °F)
Ratio – Interval with true zero (e.g., Height, Income)
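A quick arithmetic check makes the interval/ratio distinction concrete. The sketch below (plain Python, with illustrative temperatures) converts Celsius to Kelvin, which has a true zero, and shows that differences agree on both scales while ratios are only meaningful on the Kelvin scale.

```python
# Celsius is an interval scale: 0 °C is not "no temperature", so ratios mislead.
# Kelvin is a ratio scale: 0 K is a true zero, so ratios are meaningful.

def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius reading to Kelvin (absolute scale)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0  # illustrative readings

# Differences are meaningful on both scales (the interval property).
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on a scale with a true zero.
naive_ratio = warm_c / cool_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Difference: {diff_c:.2f} °C = {diff_k:.2f} K")
print(f"Naive Celsius ratio: {naive_ratio:.2f}")  # 2.00, which is misleading
print(f"Kelvin ratio: {true_ratio:.3f}")          # 1.035
```

20 °C is not "twice as hot" as 10 °C; on an absolute scale it is only about 3.5% warmer.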
WHY MEASUREMENT MATTERS
Determines suitable visualizations
Influences statistical tests
Affects model interpretation
COMMON RAW DATA ISSUES
Missing values
Inconsistent formats
Duplicates
Outliers
Mixed data types
IMPORTING AND CLEANING DATA
Import from CSV, Excel, SQL, APIs
Remove duplicates: drop_duplicates()
Handle missing data: dropna(), impute values
Standardize: casing, dates, whitespace
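A minimal pandas sketch of the cleaning steps above, assuming pandas is installed. The small DataFrame and its column names (name, visit_date, age) are hypothetical; in practice the data would come from a loader such as pd.read_csv or pd.read_excel. Median imputation is only one possible way to fill gaps.

```python
import pandas as pd

# Hypothetical raw records: stray whitespace, inconsistent casing,
# duplicate rows, and one missing age.
raw = pd.DataFrame({
    "name": ["  Alice", "alice", "Bob ", "Bob ", "Cara"],
    "visit_date": ["2025-07-01", "2025-07-01", "2025-07-03", "2025-07-03", "2025-07-05"],
    "age": [34, 34, 45, 45, None],
})

df = raw.copy()
# Standardize whitespace and casing so "  Alice" and "alice" become identical.
df["name"] = df["name"].str.strip().str.title()
# Parse date strings into real datetime values.
df["visit_date"] = pd.to_datetime(df["visit_date"])
# Drop exact duplicate rows (done after standardizing, so near-duplicates match).
df = df.drop_duplicates()
# Impute the missing age with the median of the observed ages.
df["age"] = df["age"].fillna(df["age"].median())

print(df)
```

Standardizing before removing duplicates matters: without it, "  Alice" and "alice" would survive as two different people.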
TRANSFORMING DATA
Create new columns: BMI = weight (kg) / height (m)^2
Recode values: Yes/No to 1/0
Bin continuous data into categories
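The three transformations above, sketched in pandas (assumed installed). The data and column names are made up for illustration, and the bin edges are the common adult BMI cut-offs (18.5, 25, 30). Passing right=False to pd.cut makes each bin include its left edge.

```python
import pandas as pd

# Hypothetical measurements (weight in kg, height in m).
df = pd.DataFrame({
    "weight_kg": [45.0, 72.0, 85.0, 95.0],
    "height_m": [1.60, 1.80, 1.80, 1.75],
    "smoker": ["Yes", "No", "No", "Yes"],
})

# 1) New column derived from existing ones: BMI = kg / m^2.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# 2) Recode Yes/No text into 1/0 so it can be used in calculations.
df["smoker_flag"] = df["smoker"].map({"Yes": 1, "No": 0})

# 3) Bin the continuous BMI into ordered categories, e.g. [18.5, 25) -> "Normal".
df["bmi_class"] = pd.cut(
    df["bmi"],
    bins=[0, 18.5, 25, 30, float("inf")],
    labels=["Underweight", "Normal", "Overweight", "Obese"],
    right=False,
)

print(df)
```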
RESHAPING AND AGGREGATING
Reshape: pivot(), melt()
Group by category: groupby() + aggregate
Summarize with mean, median, count
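A sketch of reshaping and aggregation with pandas (assumed installed), using a small, hypothetical table of monthly clinic visits. melt() turns it from wide to long, pivot() reverses that, and groupby() with agg() produces the mean, median, and count per clinic.

```python
import pandas as pd

# Hypothetical visit counts in "wide" form: one column per month.
wide = pd.DataFrame({
    "clinic": ["North", "South"],
    "Jan": [120, 95],
    "Feb": [130, 100],
})

# melt(): wide -> long, one row per (clinic, month) observation.
long = wide.melt(id_vars="clinic", var_name="month", value_name="visits")

# pivot(): long -> wide again, with months back as columns.
back = long.pivot(index="clinic", columns="month", values="visits")

# groupby() + agg(): summarize each clinic.
summary = long.groupby("clinic")["visits"].agg(["mean", "median", "count"])

print(long)
print(summary)
```

The long form is usually easier to group and plot; the wide form is often easier to read.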
MERGING DATA SOURCES
Join multiple datasets
Merge types: inner, left, right, outer
Example: Combining patient records with lab results
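To see how the merge types differ, the sketch below joins two hypothetical pandas tables (patient records and lab results) on an assumed patient_id key and counts the rows each join keeps.

```python
import pandas as pd

# Hypothetical tables keyed by patient_id; only IDs 2 and 3 appear in both.
patients = pd.DataFrame({"patient_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
labs = pd.DataFrame({"patient_id": [2, 3, 4], "glucose_mg_dl": [98, 110, 87]})

# Count how many rows each join type keeps.
row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = patients.merge(labs, on="patient_id", how=how)
    row_counts[how] = len(merged)

print(row_counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```

Inner keeps only IDs found in both tables (2, 3); left keeps every patient, leaving glucose empty (NaN) for patient 1; right keeps every lab row; outer keeps every ID from either table.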
POPULAR TOOLS FOR WRANGLING
Python/Pandas: Scalable, efficient scripting
R/dplyr: Tidyverse-style chaining
Excel: Manual but visual; suited to small datasets
SQL: Great for structured databases
BEST PRACTICES
Know your data types and scales
Document each transformation
Validate cleaned data visually/statistically
Write reproducible code
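One way to make validation reproducible is to write the checks as code that can be rerun after every change. The function below is a hypothetical pandas sketch; its rules (no duplicate rows, no missing values, age between 0 and 120) are illustrative and would need to match your own data.

```python
import pandas as pd

def validate_patients(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means all checks passed.

    The rules here are hypothetical examples; adapt them to your own data.
    """
    problems = []
    if df.duplicated().any():
        problems.append("duplicate rows present")
    if df.isna().any().any():
        problems.append("missing values present")
    if not df["age"].between(0, 120).all():
        problems.append("age outside 0-120")
    return problems

clean = pd.DataFrame({"patient_id": [1, 2], "age": [34, 51]})
dirty = pd.DataFrame({"patient_id": [1, 1, 3], "age": [34, 34, 150]})

print(validate_patients(clean))  # []
print(validate_patients(dirty))  # ['duplicate rows present', 'age outside 0-120']
```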
KEY TAKEAWAYS
Proper measurement guides valid analysis
Wrangling transforms messy data into insights
Foundation for data visualization and modeling
WHAT’S NEXT?
Exploratory Data Analysis (EDA)
Data visualization best practices
Feature engineering
Data quality assessment
QUESTIONS OR DISCUSSION
Prompt: What challenges have you faced in cleaning data?
Thank you!
CHAPTER 3: UNDERSTANDING SUMMARY MEASURES IN DATA ANALYSIS
How to Describe and Compare Data Effectively
LECTURE OVERVIEW
Part 1: What Are Summary Measures?
Part 2: Types of Summary Statistics
Part 3: Practical Applications and Tips
WHAT ARE SUMMARY MEASURES?
Describe features of a dataset using single values
Understand data's center, spread, and shape
Reduce complexity and support comparisons
CATEGORIES OF SUMMARY MEASURES
Measures of Central Tendency (mean, median, mode)
Measures of Spread (range, variance, SD, IQR)
Shape of Distribution (skewness, kurtosis)
Position Metrics (percentiles, quartiles)
MEAN, MEDIAN, AND MODE
Mean: Average value – use for symmetric distributions
Median: Middle value – good for skewed data
Mode: Most frequent – use with categorical data
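A short illustration with Python's built-in statistics module: a single large value pulls the mean well above the median in skewed data, while the mode is the only one of the three that applies to categorical values. The incomes and blood types are made-up examples.

```python
import statistics

# Hypothetical incomes (in $1,000s) with one very high earner: right-skewed.
incomes = [30, 35, 40, 45, 250]
print(statistics.mean(incomes))    # 80: pulled up by the outlier
print(statistics.median(incomes))  # 40: a better "typical" value here

# Mode works on categorical data, where mean and median do not apply.
blood_types = ["A", "O", "B", "O", "AB", "O"]
print(statistics.mode(blood_types))  # O
```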
MEASURES OF SPREAD
Range = Max - Min
Variance: Average squared deviation from the mean (divide by n for a population, n − 1 for a sample)
Standard Deviation: √variance
IQR: Q3 - Q1 (middle 50%)
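The spread measures above, computed with Python's statistics module on a small illustrative dataset. Quartiles (and therefore the IQR) depend on the interpolation rule; method="inclusive" is assumed here, and other tools may use a slightly different rule.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean = 5

data_range = max(data) - min(data)          # Max - Min
pop_var = statistics.pvariance(data)        # squared deviations / n
samp_var = statistics.variance(data)        # squared deviations / (n - 1)
pop_sd = statistics.pstdev(data)            # square root of population variance
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                               # spread of the middle 50%

print(f"range={data_range}, pop var={pop_var:.2f}, sample var={samp_var:.3f}, "
      f"pop SD={pop_sd:.2f}, IQR={iqr:.2f}")
# range=7, pop var=4.00, sample var=4.571, pop SD=2.00, IQR=1.50
```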
DISTRIBUTION SHAPE: SKEWNESS & KURTOSIS
Skewness: Direction and extent of asymmetry
Kurtosis: Peakedness or flatness
Tools: Histogram, boxplot, density plot
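Skewness and kurtosis can be computed from central moments. The sketch below uses the simple population (moment-based) formulas, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 minus 3; statistical packages often apply small-sample corrections, so their numbers can differ slightly. The datasets are illustrative.

```python
import statistics

def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    mu = statistics.fmean(xs)
    n = len(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    mu = statistics.fmean(xs)
    n = len(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [30, 35, 40, 45, 250]

print(round(skewness(symmetric), 3))          # 0.0: no asymmetry
print(skewness(right_skewed) > 0)             # True: long right tail
print(round(excess_kurtosis(symmetric), 2))   # -1.3: flatter than a normal curve
```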
POSITION METRICS
Percentiles: The value below which a given percentage of the data falls
Quartiles: Divide data into four parts
Used in boxplots, benchmarks
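Percentiles and quartiles sketched with Python's statistics module, using hypothetical support response times. The percentile_rank helper is a hypothetical function using one common definition (the share of values strictly below x); other definitions treat ties differently.

```python
import statistics

# Hypothetical support response times in minutes.
times = [12, 15, 18, 20, 22, 25, 30, 35, 40, 60]

# Quartiles split the sorted data into four parts (Q2 is the median).
q1, q2, q3 = statistics.quantiles(times, n=4, method="inclusive")
print(q1, q2, q3)  # 18.5 23.5 33.75

def percentile_rank(values, x):
    """Percentage of values strictly below x (one common definition)."""
    return 100 * sum(v < x for v in values) / len(values)

# 60% of the recorded response times are faster than 30 minutes.
print(percentile_rank(times, 30))  # 60.0
```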
SUMMARY STATS BY DATA TYPE
Nominal: Mode, Frequency Count
Ordinal: Median, IQR
Interval/Ratio: Mean, SD, Variance, Percentiles
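A sketch of matching the summary to the scale, using only the Python standard library and made-up data. For ordinal ratings, the assumption here is that levels are encoded as their position in an ordered list; median_low is used so the result is always an observed level rather than a value halfway between two categories.

```python
import statistics
from collections import Counter

# Nominal: frequency counts and the mode are the meaningful summaries.
blood_types = ["A", "O", "B", "O", "AB", "O"]
print(Counter(blood_types).most_common(1))  # [('O', 3)]

# Ordinal: encode the order, then take the median of the codes.
levels = ["Poor", "Fair", "Good", "Excellent"]  # ordered low -> high
ratings = ["Good", "Fair", "Excellent", "Good", "Poor", "Good"]
codes = [levels.index(r) for r in ratings]
# median_low returns an actual observed code, so it maps back to a level.
print(levels[statistics.median_low(codes)])  # Good

# Interval/ratio: arithmetic summaries such as mean and SD are meaningful.
heights_cm = [160, 172, 168, 181, 175]
print(statistics.mean(heights_cm), round(statistics.stdev(heights_cm), 2))
```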
VISUALIZATION AIDS
Boxplots: Show median, IQR, outliers
Histograms: Show distribution and skew
Bar Charts: For mode/categorical frequency
REAL-LIFE EXAMPLES
Median income across cities
IQR of blood pressure in clinic
Average customer support response time
BEST PRACTICES
Assess data type and distribution
Report multiple summary measures
Use visuals to complement stats
Watch for outliers and skew
SUMMARY AND KEY TAKEAWAYS
Simplify complex data with summary measures
Different measures serve different goals
Context and data type matter
WHAT’S NEXT?
Exploratory Data Analysis (EDA)
Inferential statistics
Data visualization techniques
QUESTIONS OR DISCUSSION
Prompt: Which summary measure do you use most—and why?
Thank you!
CHAPTER 4: DATA VISUALIZATION
Data Visualization
• Data visualization: the process of displaying data (often in large quantities) in a meaningful fashion to provide insights that will support better decisions.
– Data visualization improves decision-making, provides managers with better analysis capabilities that reduce reliance on IT professionals, and improves collaboration and information sharing.
Creating Charts in Microsoft Excel
• Highlight the data.
• Select the Insert tab.
• Click on the chart type, then subtype.
• Use the options in the Design (Chart Design on Mac) and Format tabs to customize your chart.
Column and Bar Charts
• Excel distinguishes between vertical and horizontal bar charts, calling the former column charts and the latter bar charts.
– A clustered column chart compares values across categories using vertical rectangles.
– A stacked column chart displays the contribution of each value to the total by stacking the rectangles.
– A 100% stacked column chart compares the percentage that each value contributes to a total.
• Column and bar charts are useful for comparing categorical or ordinal data, for illustrating differences between sets of values, and for showing proportions or percentages of a whole.
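The arithmetic behind a 100% stacked column chart is simple: each value is divided by its category total. The sketch below does this in plain Python for hypothetical regional sales; in Excel, the 100% stacked chart performs this rescaling automatically.

```python
# Hypothetical sales by region (category) and product (series).
sales = {
    "East": {"Product A": 12, "Product B": 24, "Product C": 14},
    "West": {"Product A": 60, "Product B": 20, "Product C": 120},
}

# Rescale each category so its segments sum to 100%.
shares = {
    region: {prod: 100 * v / sum(vals.values()) for prod, v in vals.items()}
    for region, vals in sales.items()
}

for region, pct in shares.items():
    print(region, {p: round(s, 1) for p, s in pct.items()})
```

The two regions have very different totals (50 vs. 200), yet the 100% view lets their product mix be compared directly.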
Line Charts
• Line charts provide a useful means for displaying data over time.
– You may plot multiple data series in line charts; however, they can be difficult to interpret if the magnitude of the data values differs greatly. In that case, it is advisable to create separate charts for each data series.
Pie Charts
• A pie chart displays the proportion that each category contributes to the total by partitioning a circle into pie-shaped areas.
Pie Chart Alternatives
• Data visualization professionals don't recommend using pie charts. In a pie chart, it is difficult to compare the relative sizes of areas; however, the bars in a column chart can easily be compared to determine relative ratios of the data.
– If you do use pie charts, restrict them to a small number of categories, always ensure that the numbers add to 100%, and use labels to display the group names and actual percentages. Avoid three-dimensional (3-D) pie charts, especially those that are rotated, and keep them simple.
Area Charts
• An area chart combines the features of a pie chart with those of line charts.
– Area charts present more information than pie or line charts alone but may clutter the observer’s mind with too many details if too many data series are used; thus, they should be used with care.
Scatter Charts
• Scatter charts show the relationship between two variables. To construct a scatter chart, we need paired observations, each consisting of one value for each of the two variables.
Orbit Charts
• An orbit chart is a scatter chart in which the points are connected in sequence, such as over time. Orbit charts show the “path” that the data take over time, often revealing unusual patterns that can provide unique insights.
– To create one in Excel, insert a scatter chart with smooth lines and markers.
Bubble Charts
• A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions.
Combination Charts
• Often, we wish to display multiple data series on the same chart using different chart types. Excel 2016 for Windows provides a Combo Chart option for constructing such a combination chart; in Excel 2016 for Mac, it must be done manually.
• We can also plot a second data series on a secondary axis; this is particularly useful when the scales differ greatly.
Radar Charts
• Radar charts show multiple metrics on a spider web.
• This is a useful chart to compare survey data from one time period to another or to compare performance of different entities such as factories, companies, and so on using the same criteria.
Stock Charts
• A stock chart allows you to plot stock prices, such as daily high, low, and close values.
• We will explain how to create stock charts in Chapter 6 to visualize some statistical results, and again in Chapter 15 to visualize optimization results.
Sparklines
• Sparklines are graphics that summarize a row or column of data in a single cell.
• Excel has three types of sparklines: line, column, and win/loss.
– Line sparklines are clearly useful for time-series data.
– Column sparklines are more appropriate for categorical data.
– Win/loss sparklines are useful for data that move up or down over time.
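Sparklines are simple enough to approximate in text. The sketch below is a plain Python illustration, not Excel's feature: the column-style version scales each value to one of eight Unicode block heights between the row's minimum and maximum, and the win/loss version keeps only the direction of each value (the symbols are an arbitrary choice). The data are hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest

def column_sparkline(values):
    """Map each value to a block height scaled between the row's min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1  # avoid dividing by zero when all values are equal
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

def win_loss_sparkline(values):
    """Keep only the direction of each value: up, down, or zero."""
    return "".join("▲" if v > 0 else "▼" if v < 0 else "·" for v in values)

monthly_sales = [10, 12, 9, 15, 18, 14, 20]
daily_change = [3, -1, 0, 2, -4]

print(column_sparkline(monthly_sales))    # ▁▂▁▄▆▄█
print(win_loss_sparkline(daily_change))   # ▲▼·▲▼
```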
Dashboards
• A dashboard is a visual representation of a set of key business measures. It is derived from the analogy of an automobile’s control panel, which displays speed, gasoline level, temperature, and so on.
– Dashboards provide important summaries of key business information to help manage a business process or function.