Week 1 Lecture 1
Unit Coordinator - Dr Liwan Liyanage
School of Computing, Engineering and Mathematics
What is Data Science?
Data Science is:
Statistics?
Machine Learning?
Big Data?
Data Science is the extraction of knowledge from large volumes of
structured or unstructured data. It has applications in Science,
Business, Social Science, and wherever else data is collected...
We will look at some areas of Data Science.
Data Science
[Figure from towardsdatascience.com]
Big data
[Figure from eureka.co]
Jobs in Data Science
[Figure from indeed.com]
Jobs in Big Data Analytics
[Figure from indeed.com]
Data Science or Statistics
Data Science uses a blend of methods from statistics, computing
and machine learning to extract information from data. It is LESS
concerned with p-values and hypothesis testing, although these still
have their uses.
Typical data has multiple measurements (variables) on several
observations.
For example, Fisher's Iris data records the species, sepal lengths and
widths, and petal lengths and widths of 150 iris flowers.
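A quick look at this data in R (the language used by the prescribed textbook); iris is built into base R:

```r
# Fisher's Iris data ships with base R as `iris`:
# 150 observations of 4 numeric measurements plus the species factor.
str(iris)            # variable names, types and first few values
head(iris, 3)        # the first three observations
table(iris$Species)  # 50 flowers of each of the three species
```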
Types of Data
Structured data
Quantitative or Numeric data - height, weight, salary, sales
dollars
Qualitative or Factor/Categorical data - Ethnicity, product
code, hair colour
Structured data is usually a series of measurements on distinct
observational units, and can be arranged as a table.
Unstructured data
Images - Flickr
Videos - YouTube
Text - e.g. Twitter
Unstructured data usually DOES NOT look like a nice table, but it
contains information nonetheless. Often the first step in analysing
unstructured data is to extract information to make structured data.
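A minimal sketch of that first step in R, using a made-up vector of short text messages (the `tweets` object below is illustrative, not real data):

```r
# Unstructured input: free text
tweets <- c("Loving the new phone! #happy",
            "Battery died after two hours #fail",
            "Great camera, average battery")

# Extract simple structured features into a table (data frame)
structured <- data.frame(
  n_chars     = nchar(tweets),                      # characters per message
  n_words     = lengths(strsplit(tweets, "\\s+")),  # words per message
  has_hashtag = grepl("#", tweets)                  # contains a hashtag?
)
structured
```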
Big Data
You’ve no doubt heard the hype around “Big Data”.
Data from multiple sources that can be linked to form a complete (or
extensive) picture of an individual?
This is probably harder than it looks.
However, many areas now have access to large amounts of data or linked
“big data”.
Examples of Data Science problems
Predict the outcome of marketing campaigns
Model (anticipate) demand for a product or service
Model (understand) the relationship between stress and working
conditions
Recommend similar products to previous purchases
Find likely fraudulent insurance claims
Understand structures in groups or networks
Supervised versus Unsupervised
Data Science problems generally split into supervised and
unsupervised problems.
Supervised learning involves data where each observational unit has
one special variable - the output/outcome.
Patients survive a treatment or not
Customer spend on a certain product
Unsupervised learning DOES NOT have a special variable; we are
interested in discovering patterns.
Fraud - finding observations that don’t fit the usual pattern
Segmentation - grouping a market into more homogeneous groups
Supervised Learning
In supervised learning, we have a response or outcome.
We are interested in understanding or predicting the relationship
between the output and several inputs.
Supervised Learning
We want to learn about f from a sample of inputs and outputs
Supervised Learning
Starting point:
Outcome measurement Y (also called dependent variable,
response, target).
Vector of p predictor measurements X (also called inputs,
regressors, covariates, features, independent variables).
In the regression problem, Y is quantitative (e.g. price, blood
pressure).
In the classification problem, Y takes values in a finite,
unordered set (survived/died, digit 0-9, cancer class of tissue
sample).
We have training data (x1, y1), (x2, y2), ..., (xn, yn). These are
observations (examples, instances) of these measurements.
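A small sketch of this setup in R using the built-in iris data, with one choice of outcome for each type of problem:

```r
# Training data (x_i, y_i), i = 1, ..., n, built from the iris data.
X <- iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")]  # p = 3 predictors
n <- nrow(X)                                                  # n = 150 observations

# Regression problem: a quantitative outcome Y
y_reg <- iris$Sepal.Length

# Classification problem: Y takes values in a finite, unordered set
y_class <- iris$Species
levels(y_class)   # "setosa", "versicolor", "virginica"
```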
Supervised Learning
In mathematical terms we would write:
E(Y) = f(X1, X2, ..., Xp)
Y is the output and the X's are the inputs.
We DO NOT model Y itself, but its expected value.
Even for the same set of inputs, the output may vary: measurement
error, random variation, etc.
So E(Y) is the expected, or average, value of the output for a given
set of inputs.
Any difference between the expected value and an observed value is
called noise.
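A small simulation sketch of this idea, with an assumed true f chosen purely for illustration:

```r
set.seed(1)
x <- runif(200, 0, 10)
f <- function(x) 2 + 3 * x       # the (normally unknown) true f, so E(Y) = 2 + 3x
y <- f(x) + rnorm(200, sd = 2)   # observed output = expected value + noise

# The noise is the gap between observed and expected values:
mean(y - f(x))   # averages out to roughly zero
sd(y - f(x))     # spread close to the true noise sd of 2
```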
Supervised Learning
The simplest form of supervised learning is simple linear regression:
E(Y) = a + bX
There is one input (X), one output, and two parameters, a and b. A
sample of data (input/output pairs) would be used to estimate the
parameters.
The results could then be used to:
Make inferences about the relationship between input and output.
Make predictions of future outputs for a given input value.
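A minimal version of this in R, using the built-in cars data (stopping distance as the output, speed as the input):

```r
fit <- lm(dist ~ speed, data = cars)   # estimate a (intercept) and b (slope)
coef(fit)

summary(fit)                           # inference: standard errors, p-values, R-squared

predict(fit, newdata = data.frame(speed = 15))   # predicted output for input speed = 15
```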
Bias and Variance
[Figure: a simple smooth f (left) and a more complex f (right) fitted to the same data]
Both graphs have the same Y and X. The left graph has a very
simple smooth f, the right a more complex f. Modelling must
choose the right form of f as well as fitting the actual f (estimating
parameters).
Bias and Variance
Generally, fitting complex functions results in more variance - there is
more uncertainty around the fitted parameters.
Fitting simple functions can result in more bias - there are systematic
differences between the fitted and true functions.
Both bias and variance contribute to prediction accuracy.
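A sketch of this trade-off on simulated data, comparing a deliberately simple and a deliberately complex f:

```r
set.seed(2)
x <- runif(100, 0, 10)
y <- sin(x) + rnorm(100, sd = 0.3)   # the true f is sin(x)

simple  <- lm(y ~ x)            # straight line: high bias, low variance
complex <- lm(y ~ poly(x, 15))  # degree-15 polynomial: low bias, high variance

# The complex fit always has the smaller training error,
# but its wiggly curve would change a lot on a new sample.
mean(residuals(simple)^2)
mean(residuals(complex)^2)
```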
Prediction Accuracy and Interpretation
Prediction Accuracy refers to how closely we can predict a future
observation.
Often it must be estimated from the same sample that was used to fit
the function (not good practice). Better practice is to split the data
(sketched below):
The training data set is used to build the model.
The validation data set is used to compare and tune candidate models.
The test data set is used to measure prediction accuracy.
More complex functions sometimes have better prediction accuracy but
the results can be hard to interpret.
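A sketch of such a split (roughly 60% / 20% / 20%) on the iris data, with the test set kept aside for the final accuracy estimate:

```r
set.seed(3)
n    <- nrow(iris)
role <- sample(rep(c("train", "valid", "test"), times = c(0.6, 0.2, 0.2) * n))

train <- iris[role == "train", ]
valid <- iris[role == "valid", ]
test  <- iris[role == "test", ]

# Fit on the training set; the validation set would be used to choose
# between candidate models; the test set gives the final accuracy estimate.
fit <- lm(Sepal.Length ~ Petal.Length, data = train)
sqrt(mean((predict(fit, test) - test$Sepal.Length)^2))   # test-set RMSE
```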
Regression vs. Classification
When the output is a numeric variable, supervised learning is
sometimes referred to as regression. Some examples of regression
methods are:
(Simple) Linear Regression
Generalised Linear Models
Neural Networks
When the output is a class or factor variable, supervised learning is
classification. Some examples of classification methods are:
Nearest Neighbours
Generalised Linear Models (Logistic Regression)
Support Vector Machines
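A classification sketch using logistic regression (a Generalised Linear Model) to predict whether an iris flower is of the species virginica:

```r
iris2 <- transform(iris, is_virginica = as.integer(Species == "virginica"))

fit <- glm(is_virginica ~ Petal.Length + Petal.Width,
           family = binomial, data = iris2)

prob <- predict(fit, type = "response")          # estimated P(virginica | inputs)
pred <- ifelse(prob > 0.5, "virginica", "other")
table(predicted = pred, actual = iris2$Species)  # confusion table
```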
Unsupervised Learning
When there is NO variable that can be considered an output, or special,
then unsupervised learning may be appropriate. Unsupervised
learning looks for patterns amongst the input variables.
Methods include:
Visualisation and dimension reduction (e.g. PCA): for a large number of
inputs, finding combinations of variables that can be plotted to display
features in the data (sketched below).
Clustering (e.g. k-means, hierarchical clustering): using automated
techniques to find groups in the data.
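A dimension-reduction sketch of the first kind, using PCA on the four numeric iris measurements:

```r
pca <- prcomp(iris[, 1:4], scale. = TRUE)  # standardise each variable first
summary(pca)                               # proportion of variance explained per component

# Plot the observations on the first two principal components,
# coloured by the (otherwise unused) species label
plot(pca$x[, 1:2], col = iris$Species, pch = 19,
     xlab = "PC1", ylab = "PC2")
```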
Clustering the Iris data
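A minimal sketch of this kind of analysis with k-means (three clusters, fitted without using the species labels):

```r
set.seed(4)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 20)

# Compare the discovered clusters with the withheld species labels
table(cluster = km$cluster, species = iris$Species)
```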
Unsupervised Learning
No outcome variable, just a set of predictors (features) measured on
a set of samples.
The objective is more fuzzy: find groups of samples that behave similarly,
find features that behave similarly, or find linear combinations of
features with the most variation.
Difficult to know how well you are doing.
Different from supervised learning, but can be useful as a
pre-processing step for supervised learning.
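A sketch of that last point: an unsupervised summary (the first principal component of three of the iris measurements) used as the input to a supervised model. The particular choice of variables here is illustrative only:

```r
pca <- prcomp(iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")], scale. = TRUE)
pc1 <- pca$x[, 1]                     # first principal component scores

fit <- lm(iris$Sepal.Length ~ pc1)    # supervised model built on the unsupervised summary
summary(fit)$r.squared
```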
This Unit
Supervised Learning:
Linear models: Simple Linear Regression and Multiple Linear
Regression
Classification: Logistic Regression, Discriminant Analysis and kNN
Classification and Regression Trees (Decision Trees)
Support Vector Machines
Unsupervised Learning:
Dimension reduction: Principal Component Analysis
Clustering: K Means and Hierarchical
Unstructured Data:
Text Mining (NOT COVERED)
Resampling and Error estimation
Visualisation
Objectives
On the basis of the training data, we would like to:
Accurately predict unseen test cases.
Understand which inputs affect the outcome, and how.
Assess the quality of our predictions and inferences.
Philosophy
It is important to understand the ideas behind the various
techniques, in order to know how and when to use them.
One has to understand the simpler methods first, in order to grasp
the more sophisticated ones.
It is important to accurately assess the performance of a method, to
know how well or how badly it is working [simpler methods
often perform as well as fancier ones!]
This is an exciting research area, having important applications in
science, industry and finance.
Statistical learning is a fundamental ingredient in the training of
a modern data scientist.
TEXTBOOK
Lecture notes are based on the prescribed textbook; refer to it for
further reading.
Prescribed Textbook
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An
Introduction to Statistical Learning: with Applications in R. Springer.