Int To Ds
Data Revolution
• Data is created constantly, and at an ever-increasing rate
• Massive amounts of data about many aspects of our lives
• Shopping, communicating, reading news, listening to music, searching for
information, expressing our opinions
• Finance, the medical industry, pharmaceuticals, bioinformatics,
government, education, retail, and the list goes on.
• Websites track every user’s every click.
• Smartphones are building up a record of our location.
• Smart cars collect driving habits, smart homes collect living habits, and
smart marketers collect purchasing habits.
Data Revolution
• Cross-referenced encyclopedia; domain-specific databases about movies,
music, sports results, pinball machines,
Big Data - a tsunami that is hitting us
We are witnessing a tsunami of data:
Huge volumes
Data of different types and formats
Impacting the business at new and ever increasing speeds
The challenges:
Capturing/collecting data
Managing
Processing - from managing the raw data to programming to
provide insight into the data
Storing - safeguarding and securing
“Big Data refers to non-conventional strategies and innovative
technologies used by businesses and organizations to
capture, manage, process, and make sense of a large volume
of data”
Data has an intrinsic property…it grows and grows
1 in 2 business leaders don’t have access to the data they need
Growing interconnected & instrumented world
Data Revolution
eBay captures a terabyte of data per minute
Every mouse click on a web site is captured in Web log files
Machines (smart meters, sensors, GPS, etc.)
Social media sites
Characteristics of the Data Revolution
Characteristics of Big Data
March 3, 2017
Causes of Data Revolution (Historical perspective)
Year  Event
1991  World Wide Web is born
1995  Sun releases the Java platform
      Global Positioning System (GPS) becomes omnipresent in cars and airplanes
1999  The term "Internet of Things" is coined
2001  Wikipedia is launched
2003  The amount of data created surpasses the amount of data created in all of human history before then
      LinkedIn is launched; 260 million users by 2013
Causes of Data Revolution
Demand for Data Science
According to U.S. News & World Report in 2023, information
security analyst, software developer, and data scientist ranked among
the top jobs in terms of pay and demand.
Data scientist
Average annual salary: $152,279
Definition of Data Science
• Data science (DS) is an interdisciplinary field
that uses scientific methods, processes,
algorithms, and systems to extract knowledge
and insights from structured, semi-structured and
unstructured data.
• In simpler terms, DS is about obtaining,
processing, and analyzing data to gain insights
for many purposes.
Definition of Data Science
• DS combines various technologies, techniques,
and theories from various fields, mostly related to
computer science, statistics, and mathematics, to
obtain actionable knowledge from data.
• In simple terms, it is the umbrella of techniques
used when trying to extract insights and
information from data.
Data Science
Discipline Definition (reading assignment)
Groups that have tried to define the data science profession
• ACM Data Science Task Force (2019)
• The EDISON Data Science Framework (2018)
• The National Academies of Sciences, Engineering, and Medicine
Report on Data Science for Undergraduates (2018)
• The Park City Report (2017)
• The Business-Higher Education Forum (BHEF) Data Science and
Analytics (DSA) Competency Map (2016)
• Business Analytics Curriculum for Undergraduate Majors (2015)
Data Analytics life cycle
Identified Data Science Competence Groups
Traditional/known Data Science competences/skills groups
include
Data Analytics or Business Analytics or Machine Learning
Engineering or Programming
Subject/Scientific Domain Knowledge
Identified Data Science Competence Groups
Data Science Competence Groups - Research
Scientific Methods
• Design Experiment
• Collect Data
• Analyse Data
• Identify Patterns
• Hypothesise Explanation
• Test Hypothesis
Business Operations
• Operations Strategy
• Plan
• Design & Deploy
• Monitor & Control
• Improve & Re-design
Data Science Competences Groups – Business
Statistical Inference
• The world we live in is complex, random, and uncertain
• It’s one big data-generating machine
• We capture the world or certain traces of the world into
data
• Those captured traces are then converted into something
more comprehensible and much more concise, such as
mathematical models or functions of the data. These
functions of the data are called statistical estimators.
• This overall process of going from the world to the data,
and then from the data back to the world, is the field of
statistical inference.
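As a minimal sketch of this round trip from the world to the data and back, the Python snippet below simulates a hypothetical population, draws a random sample from it, and uses the sample mean as an estimator of the population mean. The population, its parameters, and the sample size are all made-up assumptions for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical "world": 100,000 values the analyst never sees in full.
population = [random.gauss(170.0, 10.0) for _ in range(100_000)]

# The data: a small random sample captured from the world.
sample = random.sample(population, 500)

# Estimators: functions of the data that summarise it concisely.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Approximate 95% confidence interval for the population mean
# (normal approximation, reasonable for a sample of this size).
margin = 1.96 * sample_sd / len(sample) ** 0.5
interval = (sample_mean - margin, sample_mean + margin)

print(f"estimate: {sample_mean:.2f}, 95% CI: ({interval[0]:.2f}, {interval[1]:.2f})")
print(f"true population mean: {statistics.mean(population):.2f}")
```

The interval is the step back from the data to the world: it states how far the unseen population mean is likely to be from the estimate.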
Statistical Inference
• We usually infer not from the total population but
from the sample
• In the age of Big Data, where we may have the whole
population, the traditional notion of taking a sample may
not apply
• The new kinds of data in Big Data require us to
think more carefully about what sampling means
in these contexts.
• How do you sample from a network and preserve
the complex network structure?
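The difficulty behind this question can be sketched in a few lines of Python. The example assumes a hypothetical ring network and the most naive strategy (sampling nodes uniformly at random and keeping only the links between sampled nodes); it is an illustration of the problem, not a recommended sampling method.

```python
import random

random.seed(0)

# Hypothetical network: a ring of 1,000 nodes, each linked to its two neighbours.
n = 1000
edges = {(i, (i + 1) % n) for i in range(n)}

def mean_degree(nodes, edges):
    """Average number of links per node, counting only links inside `nodes`."""
    nodes = set(nodes)
    kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return 2 * len(kept) / len(nodes)

full = mean_degree(range(n), edges)

# Naive approach: sample 10% of the nodes uniformly at random and keep
# only the links whose two endpoints were both sampled.
sampled_nodes = random.sample(range(n), 100)
sampled = mean_degree(sampled_nodes, edges)

print(f"mean degree in full network: {full:.2f}")
print(f"mean degree in node sample:  {sampled:.2f}")
```

Every node in the full ring has two links, but most sampled nodes lose both neighbours, so a statistic computed on the sample no longer describes the original structure.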
Understanding data
Types of data
Categorical data
Nominal, e.g. colour, gender, …, etc.
Ordinal, e.g. Military rank, academic
rank, overall performance, …, etc.
Numerical data
Discrete, e.g. number of children in a household (HH),
number of students in a class.
Continuous, e.g. income, age, weight, …,
etc.
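The four types above differ in which summaries make sense for them. The Python sketch below illustrates this with made-up values; the class names, rank labels, and numbers are assumptions chosen only for the example.

```python
from enum import Enum, IntEnum
import statistics

class Colour(Enum):            # nominal: labels with no order
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

class AcademicRank(IntEnum):   # ordinal: labels with a meaningful order
    LECTURER = 1
    ASSISTANT_PROFESSOR = 2
    ASSOCIATE_PROFESSOR = 3
    PROFESSOR = 4

colours = [Colour.RED, Colour.BLUE, Colour.RED, Colour.GREEN]
ranks = [AcademicRank.LECTURER, AcademicRank.PROFESSOR,
         AcademicRank.ASSISTANT_PROFESSOR]
children_per_household = [0, 2, 3, 1, 2]    # discrete: counts
weights_kg = [61.5, 72.3, 80.1, 68.0]        # continuous: measurements

# Nominal data supports only counting, so the mode is the natural summary.
most_common_colour = statistics.mode(colours)

# Ordinal data can be sorted, so a median is meaningful (a mean is not).
middle_rank = statistics.median_low(ranks)

# Numerical data supports arithmetic, so means are meaningful.
mean_children = statistics.mean(children_per_household)
mean_weight = statistics.mean(weights_kg)

print(most_common_colour.name, middle_rank.name, mean_children, mean_weight)
```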
Understanding data
Understanding data
unstructured data.
o Storage Considerations: While it can be challenging
Understanding data
model.
o It doesn’t conform to rigid structures like those
Understanding data
Data sources
1. Panel Data: Panel data, also known as
longitudinal data, involves measurements over
time for the same subjects (individuals, firms,
countries, etc.).
Examples:
Understanding Data
Data sources
3. Biological Data: Biological data refers to
information derived from living organisms and their
products.
Examples:
DNA sequences.
Protein structures.
Genomic data.
Amino acid sequences.
Use Case: Bioinformatics leverages biological data to
analyze and interpret vast amounts of genomic
information.
Understanding Data
Data sources
4. Spatial Data: Spatial data directly or
indirectly references specific geographical areas
or locations. It includes both location-specific
data and other relevant information.
Examples:
Data Science Discipline
Knowledge Areas
Identified Data Science Skills/Experience
Groups
A data scientist is a practitioner who has sufficient knowledge in the overlapping
regimes of business needs, domain knowledge, analytical skills, and software and
systems engineering to manage the end-to-end data processes in the data life cycle.
Identified Data Science Skills/Experience
Groups
Group 1: Skills/experience related to competences
• Data Analytics and Machine Learning
• Data Management/Curation (including both general data management and scientific data management)
• Data Science Engineering (hardware and software) skills
• Scientific/Research Methods
• Application/subject domain related (research or business)
• Mathematics and Statistics
Group 2: Big Data (Data Science) tools and platforms
• Big Data Analytics platforms
• Math & Stats apps & tools
• Databases (SQL and NoSQL)
• Data Management and Curation platform
• Data and applications visualisation
• Cloud based platforms and tools
Group 3: Programming and programming languages and IDE
• General and specialized development platforms for data analysis and statistics
Group 4: Soft skills or Social Intelligence
• Personal, inter-personal communication, team work (also called social intelligence or soft skills)
The roles and responsibilities of Data
Scientists, Data Engineers, and the
dynamics of Data Science Teams:
Data Scientists
Role:
Data scientists are analytical experts who extract valuable insights
from data.
They bridge the gap between raw data and actionable
business decisions.
Responsibilities:
Data Collection and Cleaning
Exploratory Data Analysis (EDA)
Feature Selection and Model Building
Communication
Skills: technical, analytical, and communication skills.
Impact: They drive data-driven decision-making within
organizations.
Data Engineers
Role: Data engineers build and maintain the
infrastructure that data scientists use for data
collection, storage, and processing.
Responsibilities:
Data Pipelines.
Database Management
Data Transformation
Skills: database management, programming,
and system architecture.
Impact: They enable efficient data flow and
accessibility for data scientists.
Data Science Teams
Applications of Data Science (Societal
Problems Addressed)
HealthCare
Retail
Customers are savvy, impatient, and busy.
They want instant gratification and excellent customer service.
In order to compete and stay one step ahead, retailers need to have a
360-degree view of the customer.
Analytics helps businesses get deep insights into customer behavior.
It helps them understand their customers’ requirements more precisely.
It gives insights such as:
How to increase margins at a product-level?
Insights into your customer profile that help answer questions like who they are and why
they make certain purchases (Market Basket analysis)
Identify items that are likely to be purchased together.
Which marketing strategies work better than others?
ROI of marketing spend
Optimal Pricing
What promotions and offers to employ in each store?
Store wise product-mix
Personalized offers
Efficient stock strategy
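The "items that are likely to be purchased together" insight from the list above can be sketched by counting how often pairs of items share a basket. The transactions below are hypothetical, and pair counting is only the simplest form of market basket analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk", "bread"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = share of all baskets that contain both items.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count, support[top_pair])
```

Here bread and butter appear together in 3 of the 5 baskets, which is the kind of signal a retailer might use for shelf placement or bundled offers.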
E-commerce
Businesses can collect a wealth of information about their site, their visitors
and where they came from, and use it to find new customers and increase
conversions.
E-commerce businesses primarily use analytics to understand:
Acquisition - how your visitors and customers found and arrived at your
site.
Shopping and purchasing behavior: how users engage with your
website, which products they view, which ones they add or remove from
shopping carts; along with initiating, abandoning, and completing
transactions.
Economic Performance – how many products the average transaction
includes, the average order value, refunds you had to issue.
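The three areas above map onto a handful of simple metrics. The sketch below computes them from a hypothetical session log; the record fields and values are assumptions for illustration, not the format of any particular analytics tool.

```python
# Hypothetical session log: one record per visit to the site.
sessions = [
    {"source": "search", "added_to_cart": True,  "order_value": 40.0, "items": 2},
    {"source": "email",  "added_to_cart": True,  "order_value": None, "items": 0},
    {"source": "search", "added_to_cart": False, "order_value": None, "items": 0},
    {"source": "social", "added_to_cart": True,  "order_value": 60.0, "items": 4},
    {"source": "search", "added_to_cart": True,  "order_value": None, "items": 0},
]

orders = [s for s in sessions if s["order_value"] is not None]
carts = [s for s in sessions if s["added_to_cart"]]

# Acquisition: how many visits each channel brought in.
visits_by_source = {}
for s in sessions:
    visits_by_source[s["source"]] = visits_by_source.get(s["source"], 0) + 1

# Shopping behaviour: share of visits that convert, share of carts abandoned.
conversion_rate = len(orders) / len(sessions)
cart_abandonment_rate = 1 - len(orders) / len(carts)

# Economic performance: average order value and items per transaction.
average_order_value = sum(s["order_value"] for s in orders) / len(orders)
items_per_order = sum(s["items"] for s in orders) / len(orders)

print(visits_by_source, conversion_rate, cart_abandonment_rate,
      average_order_value, items_per_order)
```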
Finance
The global financial analytics market is one of the fastest growing sectors of
the data industry.
Organizations big and small are investing in financial analytics tools and
technologies to solve specific business problems, reduce costs, improve
budgets and get insights into future financial scenarios.
Typically financial analytics includes
Risk analysis
Working capital management
Fraud detection and prevention
Shareholder metric analysis
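As a rough illustration of the fraud detection item, the sketch below flags transactions that sit far from a customer's typical spending. It uses a simple robust outlier rule (distance from the median measured in median absolute deviations) as a stand-in; real fraud systems rely on much richer models, and the amounts and threshold here are assumptions.

```python
import statistics

# Hypothetical card transactions (amounts in one currency) for one customer.
amounts = [25.0, 31.5, 18.0, 27.4, 22.9, 30.1, 950.0, 26.3]

# The median and the median absolute deviation (MAD) are robust to the
# very outliers we are trying to find.
median = statistics.median(amounts)
mad = statistics.median(abs(a - median) for a in amounts)

THRESHOLD = 5.0  # assumed cut-off; it would be tuned on real data

# Flag amounts that lie more than THRESHOLD MADs away from the median.
flagged = [a for a in amounts if abs(a - median) / mad > THRESHOLD]
print(flagged)
```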
Healthcare, Education, Telecom, etc.
Analytics can be used for evidence based medical care, improved patient
care, predicting outbreaks of diseases and reducing hospital operating costs.
Analytics is also being used to improve teaching practices. It enables
teachers to better monitor student progress, personalize learning, and
improve educational institutions’ operational efficiency.
In the telecom industry, analytics is fast gaining ground. Operators are
using analytics to drive revenue, reduce churn, and improve network
performance.
Marketing
Understanding customers and how to find more people like them is the key
to sustainable growth.
Analytics not only helps companies do this but also adds value to other
marketing functions by gathering data across all marketing channels
and consolidating it into a common marketing view.
It helps measure, manage and analyze marketing performance to maximize
its effectiveness and optimize return on investment (ROI).
How are our marketing initiatives performing today?
Which of them are viable in the long run?
How can we improve those which are not effective?
How do our marketing activities compare with our competitors’?
What can we learn from our competition?
Are our marketing resources properly allocated?
Are we using the right channels?
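Several of these questions come down to comparing return on investment across channels, where ROI is the net gain divided by the spend. The sketch below assumes hypothetical spend and attributed revenue per channel; attributing revenue to a channel is itself a modelling decision that the example takes as given.

```python
# Hypothetical campaign results per channel: (spend, revenue attributed to it).
channels = {
    "email":  (1_000.0, 4_000.0),
    "search": (5_000.0, 9_000.0),
    "social": (3_000.0, 2_400.0),
}

def roi(spend, revenue):
    """Return on investment as a fraction: net gain divided by spend."""
    return (revenue - spend) / spend

roi_by_channel = {name: roi(spend, revenue)
                  for name, (spend, revenue) in channels.items()}

# Rank channels from most to least effective.
ranking = sorted(roi_by_channel, key=roi_by_channel.get, reverse=True)

for name in ranking:
    print(f"{name:<7} ROI = {roi_by_channel[name]:.0%}")
```

A negative ROI, as for the hypothetical social channel here, marks an initiative that costs more than it returns and is a candidate for improvement or reallocation.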
Sales
Though sales analytics can help identify, model, understand, and predict
sales trends and outcomes, very few companies are realizing its potential
to aid sales management.
However, the potential is huge, and over the next several years sales
analytics will be one of the most important domains for Data Analytics and
Big Data.
What sales analytics can essentially do is:
See what goods and services have and have not sold well.
Determine optimal inventory
Measure the effectiveness of the sales force and determine optimal sales force
size
Sales incentive cost analysis
Competitor sales analysis
Supply chain management
Supply chain analytics helps monetize and optimize:
Current inventory status
Forecasts
Demand planning
Sourcing
Production
Improved worker productivity measurement
Transportation routing
Human Resources
HR analytics helps managers by creating a single view of all relevant
workforce and other HR related data.
These insights can be used to make business decisions that drive business
processes and initiatives and improve profitability.
Key areas where workforce related data driven analytics can be used are:
Talent acquisition and retention
Attrition
Headcount Management and Workforce Optimization
Optimization of Compensation and Benefits
Build Leadership
Performance and Career Management
Training and Development
What is Data Analytics?
• Analytics can range from a simple exploration into how
many sales of a particular product were made last year to
a complex neural network model predicting which
customers to target for next year’s marketing campaign.
• Analytics is the extensive use of data, statistical and quantitative
analysis, exploratory and predictive models, and fact-based
management to drive decisions and actions.
• A key to deriving value from data is the use of analytics.
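The simple end of that range, counting how many sales of a particular product were made last year, can be sketched as follows. The sales records and the helper name `units_sold_by_product` are hypothetical and exist only for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical sales records: (date of sale, product, units sold).
sales = [
    (date(2023, 1, 14), "laptop", 2),
    (date(2023, 6, 2), "phone", 5),
    (date(2023, 11, 23), "laptop", 1),
    (date(2024, 2, 9), "laptop", 4),
]

def units_sold_by_product(records, year):
    """Total units sold per product in the given calendar year."""
    totals = Counter()
    for sold_on, product, units in records:
        if sold_on.year == year:
            totals[product] += units
    return totals

totals_2023 = units_sold_by_product(sales, 2023)
print(totals_2023["laptop"])
```

The complex end of the range, such as a predictive model for campaign targeting, starts from the same kind of records but fits a model to them instead of summing them.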
What is Data Analytics?
Three Kinds of Analytics
Example of Descriptive analytics
Example of Prescriptive analytics