Course Outline
Course Description
Wayne Gretzky, the former Canadian ice hockey player, once said, “I skate where the puck is going
to be, not where it has been.” As one might guess, a major component of this course deals with predictive
analytics, i.e., how to convert a business objective into a prediction problem, what tools and techniques are available
for prediction, and how to evaluate a prediction model. In addition to predictive analytics, the course focuses
on tools for discovering structure in data (that is, extracting insights from data) that help businesses
make decisions and drive strategy. The course also touches upon some powerful visualization
techniques, again meant to uncover insights from data.
Analytics is being used in almost every area, e.g., FMCG, apparel, finance, consulting,
education, and healthcare. The widespread proliferation of IT-influenced economic activity leaves
behind a rich trail of micro-level data. Yet most organizations are data rich but information poor. Emerging
technologies such as RFID, weblogs, social networks, website usage tracking, and vast amounts of online
information (such as product ratings and bid histories) have the potential to reveal a lot about consumer, supplier,
and competitor preferences to those who have the ears (that is, data-mining capabilities) to listen. The questions that
can be addressed using analytics are plenty: How can mobile companies use their customer database to predict
customer churn or to personalize SMS messages for improving customer service? How can financial institutions
use past loan data to predict the chance of defaulting for a new loan applicant? How can Bollywood use data on
movies to predict the next box-office hit? How can charities use data from a campaign in one location to target
the right people in another location? And how can politicians use databases of supporters to segment and best
target each audience? The knowledge acquired in this course will therefore benefit not only those who plan careers in
pure-play analytics but also those who plan to work in applied fields such as targeted marketing, predictive modeling,
strategic consulting, and risk management. The bottom line is that analytics will give students an edge in
whatever area they go on to work in.
In the course, we work with real business problems and real data. Students will learn the types of questions
that data mining can answer and the appropriate data mining tools for answering different questions. The emphasis
is on understanding the concepts behind a wide set of data mining techniques and their relation to specific business
analytics situations, rather than on mastering the theoretical underpinnings of the techniques. The course
offers practical experience in converting business objectives into data mining problems. Around the middle of the
course, students will have an “Ideation” project that focuses solely on formulating a data mining problem after
identifying a business objective. The course ends with a term project that requires students to work on a data
mining problem with a real dataset. Students will have individual assignments during the term to prepare
themselves for the project.
An important feature of this course is hands-on learning using state-of-the-art business analytics software and
an Excel data mining add-in. All required data mining algorithms (plus illustrative data sets) are provided in an
Excel add-in, XLMiner. In addition, we will introduce TIBCO Spotfire, an industry-leading data visualization
tool. Students are not required to have any extraordinary technical ability in statistics or other methodological
areas to benefit from the course. The material is developed and presented in an intuitive manner, with the objective
of making students smart consumers of this widely applicable technology in a managerial context.
Wherever appropriate, real examples will be used to motivate the topic being covered. However, a sufficient
comfort level in dealing with data is necessary in order to appreciate the value of the different techniques by doing.
Students will not be required to write code; however, it is expected that students taking this course are comfortable
dealing with data and using new software.
Students are expected to attend all classes and to come prepared for each class to make the best use of
classroom time, as the course content is delivered in an interactive fashion and each lecture builds significantly
upon the material covered in previous lecture(s).
Learning Goals
In addition to the course objectives listed above, students should expect to develop the following by the end of
the course:
Each student shall be able to identify key issues in a business setting, develop a perspective that is
supported with relevant information and integrative thinking, and draw and assess conclusions.
Assessment: Individual Assignments, classroom discussions, ideation proposal, team presentations
Recommended Textbook
Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner by Galit
Shmueli, Nitin R. Patel, and Peter C. Bruce.
Required Software
• Analytical Solver Data Mining (formerly known as XLMiner), an Excel add-in.
• Business Intelligence and visualization tool TIBCO Spotfire Professional.
Optional Software
• R with RStudio (open-source software; optional: no assignment or classroom exercise will involve R programming,
but additional asynchronous material will be provided for those interested)
• Python with Jupyter notebooks (also open-source software; code will be provided to replicate the
examples discussed in class)
Session-wise Schedule
Columns: Session ID, Topics, Deliverables, Recommended Readings
Week 1
S01
Topics:
• Business Analytics Applications
• Introduction to Supervised and Unsupervised machine learning techniques
• Explaining vs. Prediction
• Data Partitioning in Supervised Learning
Recommended Readings:
• Book Chapters 1 and 2 – Introduction and Overview of the Data Mining Process
• Article “A Predictive Analytics Primer” by Thomas H. Davenport; HBR, Sep 02, 2014
• Article “Where predictive analytics is having the biggest impact” by Jacob LaRiviere, Justin Rao, Preston McAfee, Vijay K Narayanan, and Walter Sun; HBR, May 25, 2016
Note: Please install the necessary software, at least the trial version, before the next class.
S02
Topics:
• Linear Regression for Prediction
• Prediction goals and performance
Recommended Readings:
• Book Chapters 5.1 and 5.2 – Evaluating Predictive Performance
• Book Chapter 6 – Multiple Linear Regression
Monday
Please submit by Monday noon.
Deliverables:
• Individual Assignment 1 – 15%
Week 2
S03
Topics:
• Profiling vs. Classification
• Logistic Regression for Classification
• Classification goals and performance
• Ranking goals and performance
Recommended Readings:
• Book Chapter 10 – Logistic Regression
• Book Chapters 5.3, 5.4, and 5.5 – Evaluating Predictive Performance
S04
Topics:
• K-NN
• Naïve Bayes
Recommended Readings:
• Book Chapter 7 – k-NN
• Book Chapter 8 – The Naïve Bayes Classifier
• CART
• Book Chapter 9 – CART
•
o K-means clustering
S06
Topics:
• Case Study: Mall of America
• Cluster characterization
• Data Visualization
Recommended Readings:
• Article: Cluster Analysis for Segmentation, Darden Business Publishing, University of Virginia
• Book Chapter 3 – Data Visualization
Meet the instructor to discuss the submitted proposal for the group project.
Monday
Please submit by Monday noon.
Deliverables:
• Individual Assignment 3 – 15%
Asynchronous components:
• Profiling using logistic regression (complementary material to Session 3)
• Ensemble (complementary material to Session 5)
• Multi-class classification (complementary material to Session 5)
• Guest Lecture: “Analytics at Scale” – Abhishek Kumar, Machine Learning, Google Inc
• Tutorial on R and Python programming, with counterparts to all the methods covered in Sessions 2 through 7 (optional
component for students)
NOTE: Classes will not be recorded, and attendance is mandatory.
Graded Components
(For details on each component, stay tuned to LMS posts.)
Name of the component | Submission Deadline | Take-home or in-class | Group Assignment (Y/N) | Softcopy / Hard copy submission | Coding Scheme | Mark release date | Weight
Individual Assignment 1 | Week 2 – Monday noon | Take-home | N | Soft copy | 2N-b | Within 10 days after submission | 15%
Individual Assignment 2 | Week 3 – Monday noon | Take-home | N | Soft copy | 2N-b | Within 10 days after submission | 20%
Group Project Proposal | Week 3 – Monday noon | Take-home | Y | Soft copy | 3N-b | Within 10 days after submission | 10%
Individual Assignment 3 | Week 4 – Monday noon | Take-home | N | Soft copy | 2N-b | Within 10 days after submission | 15%
Group Project Presentation | Session 8 | In-class | Y | Soft copy | 3N-b | Within 10 days after submission | 10%
Individual exercise in class: evaluating other teams’ presentations | Session 8 | In-class | N | Soft copy | 4N | Within 10 days after submission | 05%
Group Project Final Report | Week 5 – Monday noon | Take-home | Y | Soft copy | 3N-b | Within 10 days after submission | 20%
Individual: 60%
1 Any referencing needs to be accompanied with appropriate citations
2 A non-exhaustive list includes journal articles, news items, databases, industry reports, open courseware