ISOM3360 Data Mining for Business Analytics
Introduction
Instructor: Yi Yang
Department of ISOM
Spring 2025
Welcome
❑ Course information
About me
❑ Instructor: Yi Yang, Associate Professor, ISOM
❑ Research interests: machine learning
❑ Ph.D. from Northwestern University
❑ Taught at UIUC for two years
❑ Worked at IBM Research and Amazon
❑ Consulting for a hedge fund on machine learning
❑ Teaching ISOM 3360, 3370, 5270 and ExecEd
Course information
❑ 18~19 lectures
❑ 10 lab sessions
❑ Hands-on problem solving using Python
❑ 3 assignments
❑ 1 team project (3~4 people per group)
❑ 2 exams: Midterm exam (tentatively Mar 20,
7:00pm-8:30pm); Final exam (TBD)
Course material
❑ All the materials (e.g., lecture slides, readings)
will be posted on Canvas course website.
❑ Data Mining for Business Analytics: Concepts, Techniques, and
Applications in R, by Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R.
Patel, Kenneth C. Lichtendahl
❑ Data Science for Business: What you need to know about data mining
and data-analytic thinking, by Foster Provost, Tom Fawcett
❑ Learning Data Mining with Python, by Robert Layton
Grading components
❑ Lab 5%
❑ Class Attendance/Participation 10%
❑ Homework Assignments 10%
❑ Group Project 15%
❑ Midterm Exam 28%
❑ Final Exam 32%
❑ Instructor: Yi Yang
❑ Email: imyiyang@ust.hk (begin the subject line with [ISOM3360])
❑ Office Hours: by appointment
❑ Teaching Assistant:
❑ Sophie Gu, imsophie@ust.hk
Academic integrity
Questions
You may have heard of these
Big Data, Artificial Intelligence, Data Mining, Data Science, Machine Learning, Python
Course: small data
Python is not inherently tied to machine learning or data mining; they are independent. People usually like using Python, but other programming languages can be used as well.
Data mining
Hong Kong Smart City: everything is interconnected, e.g. traffic lights.
Data
 | Structured Data | Unstructured Data
Definition | Data that has a predefined and organized format or schema, often stored in a database or spreadsheet | Data that does not have a predefined or organized format or schema
Examples | Tables in a relational database, spreadsheets, log files, financial data | Text documents, social media posts, images, videos, audio recordings, email messages, web pages, GPS data, etc.
Techniques | Machine learning (simple) | Natural language processing, computer vision, speech recognition (hard)
Data Volume | Usually smaller in volume | Usually much larger in volume
Focus of the course: structured data.
Unstructured data is the most exciting data for companies; it still relies on machine learning but needs extra techniques, e.g. NLP.
Unstructured data
❑ In addition to traditional numerical data, a wealth
of potentially valuable business information may
originate in unstructured forms.
Unstructured data: Text
The data is in text format, e.g. a financial annual report (operations, earnings, and the numbers are all in the text).
--> Valuable for companies, e.g. for trading.
--> How to process it: use machine learning to understand the information. Turning it into insights is what matters.
Unstructured data: Image
Street view: recognize cars/trucks using machine learning.
Humans can do this too, but it is costly and time-consuming.
Data mining
IMPORTANT: Definition
❑ Finding patterns in large amounts of data, using machine learning methods, for actionable insights
Patterns: similarities/commonalities
We can still build a model with small data; a large amount of data is not required, but of course more is better.
Finding actionable insights is the "purpose".
Prediction is the key
❑ Prediction is the key for decision making under uncertainty.
❑ Better prediction creates competitive advantages.
What are actionable insights?
Insights refer to predictions. Prediction is all we care about, because our lives are full of uncertainty.
e.g. Do we need to bring an umbrella? If we are given a prediction, we can make a better decision.
Reducing uncertainty --> earning money
e.g. estimating customer demand for next month
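The umbrella example can be sketched in a few lines of Python. This is only an illustration: the two cost numbers and the function name `should_bring_umbrella` are assumptions made up here, not something given in the lecture. The point is that a prediction (the probability of rain) turns an uncertain situation into a concrete decision.

```python
# Hypothetical costs, chosen only for illustration (not from the lecture)
COST_CARRY = 1.0   # inconvenience of carrying an umbrella all day
COST_WET = 10.0    # cost of getting soaked without one


def should_bring_umbrella(rain_probability: float) -> bool:
    """Bring the umbrella when the expected cost of going without it
    exceeds the fixed cost of carrying it."""
    expected_cost_without = rain_probability * COST_WET
    return expected_cost_without > COST_CARRY


for p in (0.05, 0.30, 0.80):
    print(f"P(rain)={p:.2f} -> bring umbrella: {should_bring_umbrella(p)}")
```

With these assumed costs, a better estimate of the rain probability directly changes which action is cheaper on average.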
Machine Learning
Machine learning can help us make predictions.
❑ Machine learning algorithms enable computer programs to automatically analyze data, recognize patterns, and make predictions for new unseen data. (This is the key.)
❑ Machine learning models make predictions.
Face recognition from image: machine learning induces patterns from data, then outputs "It's a FACE".
It learns patterns from human faces (e.g. eyes), expressed in mathematical form.
It uses patterns from old data to predict unseen/new data --> better accuracy; not a random guess.
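To make "learn patterns from old data, then predict unseen data" concrete, here is a minimal sketch using a 1-nearest-neighbor rule, which is one simple technique swapped in for illustration (real face recognition uses far richer models). The two numeric features and all training examples are made up; they stand in for measurements extracted from an image.

```python
import math

# Toy labeled examples: (feature vector, label). The features are
# made-up stand-ins for measurements extracted from an image.
training_data = [
    ((0.90, 0.80), "face"),
    ((0.80, 0.90), "face"),
    ((0.85, 0.75), "face"),
    ((0.10, 0.20), "not face"),
    ((0.20, 0.10), "not face"),
    ((0.15, 0.30), "not face"),
]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training_data, key=lambda example: math.dist(example[0], features))
    return nearest[1]


# New, unseen inputs that were not part of the training data
print(predict((0.70, 0.85)))
print(predict((0.25, 0.20)))
```

The model never saw the two query points, yet it labels them by their similarity to old data rather than guessing at random.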
❑ Q: Is ChatGPT a machine learning model?
Is it a prediction process?
Input: our question, which is new unseen data
Output: ChatGPT's prediction
YES
An Example: customer complaint management
Pain point: Firms receive customer complaint filings on different aspects. How can they handle the complaints in a timely manner?
Some complaints have high priority and others low priority --> they can be classified.
--> Machine learning: train a model to predict the priority or the complaint type, e.g. shipping problem or product problem.
--> The complaint can then be sent to the right team very quickly, without manual work.
Traditional vs. machine learning solution
❑ Traditional: hire a team of customer service staff to read the complaints and forward them to different teams for handling.
❑ Machine learning: automatically classify customer complaints
into different categories (e.g. shipping related, product defect
related); automatically rank the priority of the complaints; and
forward the complaints to different teams for handling.
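The machine learning route above might look roughly like the sketch below. It is a deliberately simple word-count classifier, not the method used in any real system; the labeled complaints, the stopword list, and the team names are all invented for illustration.

```python
from collections import Counter, defaultdict

# Made-up labeled complaints (assumed training data for illustration)
labeled_complaints = [
    ("my package arrived late and the delivery was delayed", "shipping"),
    ("the courier lost my package during delivery", "shipping"),
    ("the screen was broken when I opened the box", "product defect"),
    ("the product stopped working after one day and is defective", "product defect"),
]

STOPWORDS = {"the", "my", "and", "was", "when", "i", "a", "is", "after", "one", "during"}

# Hypothetical mapping from category to the team that handles it
TEAMS = {"shipping": "logistics team", "product defect": "quality team"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop common filler words."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


# "Training": count how often each word appears in each category
word_counts = defaultdict(Counter)
for text, category in labeled_complaints:
    word_counts[category].update(tokenize(text))


def classify(text):
    """Pick the category whose training words overlap most with the complaint."""
    words = tokenize(text)
    scores = {cat: sum(counts[w] for w in words) for cat, counts in word_counts.items()}
    return max(scores, key=scores.get)


def route(text):
    """Forward a new complaint to the team responsible for its category."""
    return TEAMS[classify(text)]


print(classify("delivery of my package is late"))
print(route("the product arrived broken and defective"))
```

A real deployment would need far more labeled complaints and a proper model, but the flow is the same: learn from labeled history, classify the new complaint, forward it automatically.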
Business Intelligence pyramid (top to bottom)
❑ Decision making and strategic planning
❑ Data Mining, Machine Learning
❑ Data retrieval and aggregation
❑ Data management and storage
Different layers require different techniques.
Predictive vs. Descriptive analytics
 | Descriptive Analytics | Predictive Analytics
Focus | Understanding past events | Predicting future outcomes
Goal | Summarize and present data | Build predictive models
Data Type | Historical data | Historical data
Analysis | Identify patterns and trends | Build models to make predictions
Use Case | Understand past performance | Forecast future outcomes
Example | Sales data for the past year | Sales forecast for next quarter
Output | Summary statistics and visualizations | Predictive models and forecasts
Techniques | Data aggregation, visualization, and basic statistics | Machine learning
Decision Making | Reactive | Proactive
Descriptive analytics is about understanding the existing data/patterns; prediction is about new data.
A summary cannot answer the question "What are the sales for next month?"; that requires analysis and prediction.
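The contrast can be shown on one small dataset. In this sketch the monthly sales figures are made up, and a straight-line least-squares fit is used as the simplest possible stand-in for a predictive model; the course's actual models may differ.

```python
# Hypothetical monthly sales for the past six months (made-up numbers)
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
months = list(range(1, len(sales) + 1))

# Descriptive analytics: summarize what already happened
average_sales = sum(sales) / len(sales)

# Predictive analytics: fit a straight line by least squares,
# then extrapolate to a month that has not happened yet
mean_x = sum(months) / len(months)
mean_y = average_sales
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x
next_month = len(sales) + 1
next_month_forecast = slope * next_month + intercept

print(f"Average sales over the past {len(sales)} months: {average_sales:.1f}")
print(f"Forecast for month {next_month}: {next_month_forecast:.1f}")
```

The average describes the past year of data, while the fitted line produces a number for a month that is not in the data at all.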
Exercise
You, as a company marketing director, want to know the answers to the following questions. Which ones require a data mining solution?
Who are the high-value customers?
--> descriptive analysis
Is there an age difference between the high-value customers and the low-value customers?
--> descriptive analysis: identify the low-value and high-value customers, calculate the average age of each group, then compare the difference --> t-test (method name)
Will some particular new customer be a high-value customer?
--> predictive analysis
How much sales amount should I expect a new customer to generate?
--> predictive analysis: regression, because we need to predict a real number
Customer | Gender | Age | Membership | Monthly Purchase | Amount
Alice | F | 25 | Y | 5 | $120
Bob | M | 40 | Y | 3 | $30
Charlie | M | 35 | Y | 6 | $210
Doug | M | 18 | N | 4 | $95
… | … | … | … | … | …
Let's define customers whose amount > $100 as high-value customers. The rest are low-value customers.
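The two descriptive questions can be answered directly on the four rows shown in the table. The sketch below uses only those rows and the slide's $100 threshold; the dictionary keys and variable names are chosen here for illustration. A real t-test would need many more customers than this.

```python
# The four customers listed on the slide
customers = [
    {"name": "Alice", "gender": "F", "age": 25, "member": True, "monthly_purchases": 5, "amount": 120},
    {"name": "Bob", "gender": "M", "age": 40, "member": True, "monthly_purchases": 3, "amount": 30},
    {"name": "Charlie", "gender": "M", "age": 35, "member": True, "monthly_purchases": 6, "amount": 210},
    {"name": "Doug", "gender": "M", "age": 18, "member": False, "monthly_purchases": 4, "amount": 95},
]

HIGH_VALUE_THRESHOLD = 100  # amount > $100 means high-value, per the slide

# Question 1 (descriptive): who are the high-value customers?
high_value = [c for c in customers if c["amount"] > HIGH_VALUE_THRESHOLD]
low_value = [c for c in customers if c["amount"] <= HIGH_VALUE_THRESHOLD]


def mean_age(group):
    """Average age of a group of customers."""
    return sum(c["age"] for c in group) / len(group)


# Question 2 (descriptive): compare the average age of the two groups
high_names = [c["name"] for c in high_value]
age_difference = mean_age(high_value) - mean_age(low_value)

print("High-value customers:", high_names)
print("Mean age (high-value):", mean_age(high_value))
print("Mean age (low-value):", mean_age(low_value))
print("Difference:", age_difference)
```

Neither question needs a model: both are answered by filtering and averaging existing records, which is why they are descriptive rather than predictive.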
Descriptive analytics
Exercise
❑ Say you work in a digital media company that provides an online streaming video service. You have lots of data about lots of users watching lots of movies/TV shows. What are the use cases of predictive analytics?
Personalized recommendation of videos --> prediction --> based on the customer's past behavior --> predict the customer's next move
Good example: TikTok keeps suggesting good short videos to you, so you keep watching.
Loop: watch more --> more data --> TikTok knows more about the user --> better prediction/accuracy
Budget planning: know which movies are worth investing in
--> predict which movie topics people are interested in
--> so the company invests more in the hosts or cast
Misuse: sharing a Netflix account is good for individuals but bad for Netflix
--> use IP addresses to check whether people share an account
--> a prediction problem
--> Netflix fights against it
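The recommendation use case (predict the next video from past behavior) can be sketched with a simple overlap-based rule. This is an illustrative toy, not how TikTok or Netflix actually works; the users, titles, and the `recommend` function are all made up.

```python
from collections import Counter

# Made-up viewing histories: user -> set of titles already watched
watch_history = {
    "u1": {"Drama A", "Comedy C"},
    "u2": {"Drama A", "Drama B"},
    "u3": {"Comedy C", "Comedy D"},
    "u4": {"Drama A", "Drama B", "Drama E"},
}


def recommend(user, k=1):
    """Suggest unseen titles watched by users with similar histories.

    Each other user votes for the titles this user has not seen, and the
    vote weight is the number of titles the two users have in common.
    """
    seen = watch_history[user]
    scores = Counter()
    for other, titles in watch_history.items():
        if other == user:
            continue
        overlap = len(seen & titles)
        for title in titles - seen:
            scores[title] += overlap
    return [title for title, score in scores.most_common(k) if score > 0]


print(recommend("u2"))
print(recommend("u3"))
```

As more viewing data accumulates, the overlap counts become more informative, which mirrors the "watch more, more data, better prediction" loop in the notes.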
Course objective
❑ You will learn
❑ Various machine learning models
❑ Hands-on experience through lab practice
❑ Analytical thinking through various business examples
Analogy: with large data, using little shovels to dig for gold is slow. With machine learning, we know how to use bigger tools, which is more effective and efficient.
This course is not about how to design the digger, but about how to use it.
❑ You will not learn
❑ Data warehousing, databases, big data techniques
❑ Business/managerial planning