
Introduction to Data Science

Dr. Irfan Yousuf

Department of Computer Science (New Campus)
UET, Lahore
(Week 1; January 15 - 19, 2024)
Instructor
• Dr. Irfan Yousuf
• irfan.yousuf@uet.edu.pk
Weekly Contents
Why Data Science?
• One of the top professions today.
• Data is the new driving force behind industries.
• Data Science is the career of tomorrow.
Skill Set Needed
• Statistics
• Programming skills
• Multivariable Calculus & Linear Algebra
Statistics
• In plural form, it refers to a set of numerical data.
• In singular form, it is an academic discipline.

Data
• Facts and statistics collected for reference or analysis.
• Data are units of information, often numeric, that are collected through observation.
• Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things.
What is Statistics
• Statistics is a branch of mathematics that deals with the
scientific collection, organization, presentation, analysis,
and interpretation of numerical data in order to obtain
useful and meaningful information.
Descriptive Statistics
• A statistical method concerned with the collection,
organization, presentation and description of sample data.
Inferential Statistics
• Inferential statistics is concerned with the analysis of sample data, leading to predictions, inferences, interpretations, decisions or conclusions about the entire population.
Population vs. Sample
• Population: The totality of all the elements or persons in which one has an interest at a particular time.
• Students of the 2018 session of CS-KSK
• Sample: A subset of a population.
• Students with CGPA > 3.0
Parameter vs. Statistic
• A parameter is a number describing a whole population.
• A statistic is a number describing a sample.
• With inferential statistics, we use sample statistics to make educated guesses about population parameters.
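A minimal Python sketch of the parameter/statistic distinction. The population of 100 scores is hypothetical (not course data): its mean is a parameter, while the mean of a random sample drawn from it is a statistic used to estimate that parameter.

```python
import random
import statistics

# Hypothetical population: 100 scores (illustrative values, not course data).
population = list(range(1, 101))

# Parameter: a number describing the whole population.
mu = statistics.mean(population)

# Statistic: the same quantity computed on a random sample of 10 elements.
rng = random.Random(42)             # fixed seed so the sample is reproducible
sample = rng.sample(population, 10)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu}")
print(f"sample mean (statistic):     {x_bar}")
```

Rerunning with a different seed gives a different statistic, while the parameter stays fixed; that variation is the sampling error inferential statistics has to account for.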
Quantitative vs. Qualitative Data
• Quantitative: Numerical information obtained by counting or measuring, which can be manipulated with the fundamental arithmetic operations.
• Age, Weight, Height
• Qualitative: Descriptive attributes, characterized by categorical responses.
• Gender, Weather, Attitude
Variable
• A variable is any characteristic, number, or quantity that can be measured or counted.
• Independent variables: Variables you manipulate in order to affect the outcome of an experiment, e.g., Age.
• Dependent variables: Variables that represent the outcome of the experiment, e.g., Salary.
Descriptive vs. Inferential Statistics
• Descriptive: concerned with the collection, organization, presentation and description of sample data.
• Inferential: concerned with the analysis of sample data, leading to predictions, inferences, interpretations, decisions or conclusions about the entire population.
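Descriptive statistics summarize the sample itself. A short Python sketch using the standard `statistics` module on the ten daily travel times from the PMF example in these notes:

```python
import statistics

# The ten daily travel times (minutes) from the PMF example in these notes.
travel_times = [25, 26, 26, 28, 28, 32, 33, 34, 34, 35]

summary = {
    "n": len(travel_times),
    "mean": statistics.mean(travel_times),
    "median": statistics.median(travel_times),
    "modes": statistics.multimode(travel_times),   # every most-frequent value
    "stdev": statistics.stdev(travel_times),       # sample SD (divides by n - 1)
}
for name, value in summary.items():
    print(name, value)
```

Nothing here goes beyond the ten observed days; claiming that a typical day takes about 30 minutes would be an inferential step.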
Inferential Statistics
• Inferential statistics takes data from a sample and makes
inferences about the larger population from which the sample
was drawn.
• Because the goal of inferential statistics is to draw
conclusions from a sample and generalize them to a
population, we need to have confidence that our sample
accurately reflects the population.
• Define the population we are studying.
• Draw a representative sample from that population.
• Use analyses that incorporate the sampling error.
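One common way to incorporate sampling error is the standard error of the mean, s/√n. The Python sketch below treats the ten travel times as a sample from a larger population of days (an assumption for illustration) and builds a rough 95% interval with the normal approximation (z = 1.96).

```python
import math
import statistics

# Treat the ten daily travel times (minutes) as a sample from a larger population of days.
sample = [25, 26, 26, 28, 28, 32, 33, 34, 34, 35]
n = len(sample)

x_bar = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)             # standard error: typical size of the sampling error of x_bar

# Rough 95% interval using the normal approximation (z = 1.96).
# For a sample this small, a t critical value would give a somewhat wider interval.
low, high = x_bar - 1.96 * se, x_bar + 1.96 * se
print(f"mean = {x_bar:.2f}, SE = {se:.2f}, approx. 95% interval = ({low:.2f}, {high:.2f})")
```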
Probability Distributions
• A probability distribution is the mathematical function that gives the probabilities of occurrence of the different possible outcomes of an experiment, e.g.:
• Tossing a coin
• Throwing a fair die
• Probability distributions are typically defined in terms of probability distribution functions.
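For the two examples above, the distribution can be written down directly. A small Python sketch using exact fractions (the dictionary layout is just one convenient representation, not a standard API):

```python
from fractions import Fraction

# PMF of a fair coin toss: each outcome has probability 1/2.
coin_pmf = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# PMF of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF assigns a non-negative probability to every outcome and sums to 1.
for name, pmf in [("coin", coin_pmf), ("die", die_pmf)]:
    assert all(p >= 0 for p in pmf.values())
    print(name, "total probability =", sum(pmf.values()))
```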
Probability Distribution Functions

• Probability Mass Function (PMF) for discrete data
• Cumulative Distribution Function (CDF)
• Probability Density Function (PDF) for continuous data
Discrete vs. Continuous Variable
• A discrete variable is a variable that takes on distinct, countable values. In theory, you should always be able to count the values of a discrete variable.
• A continuous variable is a variable that can take on any value within a range. Because the possible values of a continuous variable are infinite, we measure continuous variables (rather than count them).
Probability Density Functions (PDFs)
• For a discrete random variable X that takes on a finite or countably infinite number of possible values, we determine P(X = x) for all the possible values of X and call it the probability mass function (pmf).
• For continuous random variables, the probability that X takes on any particular value x is 0. That is, finding P(X = x) for a continuous random variable is not going to work. Instead, we need to find the probability that X falls in some interval (a, b), that is, P(a < X < b). We do that using a probability density function (pdf).
Probability Mass Function
Day   Travel Time (min)   p(X = x)
1     25                  0.1
2     26                  0.2
3     26                  0.2
4     28                  0.2
5     28                  0.2
6     32                  0.1
7     33                  0.1
8     34                  0.2
9     34                  0.2
10    35                  0.1

x (min)   p(X = x)
25        0.1
26        0.2
28        0.2
32        0.1
33        0.1
34        0.2
35        0.1
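The PMF in the table is a relative frequency: the number of days with travel time x divided by the number of days. A Python sketch using `collections.Counter`:

```python
from collections import Counter

# Daily travel times (minutes) for days 1-10, from the table above.
travel_times = [25, 26, 26, 28, 28, 32, 33, 34, 34, 35]

# p(X = x) = (number of days with travel time x) / (total number of days)
counts = Counter(travel_times)
n = len(travel_times)
pmf = {x: counts[x] / n for x in sorted(counts)}
print(pmf)
# → {25: 0.1, 26: 0.2, 28: 0.2, 32: 0.1, 33: 0.1, 34: 0.2, 35: 0.1}
```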
Cumulative Distribution Function of PMF
x (min)   PMF   CDF
25        0.1   0.1
26        0.2   0.3
28        0.2   0.5
32        0.1   0.6
33        0.1   0.7
34        0.2   0.9
35        0.1   1.0
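The CDF column is the running total of the PMF, F(x) = P(X ≤ x). A Python sketch with `itertools.accumulate`; accumulating integer counts and dividing once is a design choice to avoid floating-point drift (0.1 + 0.2 is not exactly 0.3 in binary floating point).

```python
from collections import Counter
from itertools import accumulate

# Daily travel times (minutes) for days 1-10.
travel_times = [25, 26, 26, 28, 28, 32, 33, 34, 34, 35]
n = len(travel_times)

values = sorted(set(travel_times))
counts = Counter(travel_times)

# Running total of counts, then divide by n: F(x) = P(X <= x).
cumulative_counts = accumulate(counts[x] for x in values)
cdf = {x: c / n for x, c in zip(values, cumulative_counts)}
print(cdf)
# → {25: 0.1, 26: 0.3, 28: 0.5, 32: 0.6, 33: 0.7, 34: 0.9, 35: 1.0}
```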
Probability Density Function
Let the random variable X denote the time a person waits for
an elevator to arrive. Suppose the longest one would need to
wait for the elevator is 2 minutes, so that the possible values of
X (in minutes) are given by the interval [0, 2].
A possible pdf for X is given by:
Probability Density Function

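We want the probability that a person waits less than 30 seconds (0.5 minutes). Since X cannot be negative, this is the area under the pdf from 0 to 0.5:
$P(X < 0.5) = \int_{0}^{0.5} f(x)\,dx$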
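The slide's pdf formula did not survive extraction, so this Python sketch assumes a hypothetical triangular density: f(x) = x for 0 ≤ x ≤ 1, f(x) = 2 - x for 1 < x ≤ 2, and 0 elsewhere. It is non-negative and integrates to 1 on [0, 2]; substitute the slide's own f if it differs. The integral is approximated with the midpoint rule.

```python
def f(x: float) -> float:
    """Assumed (hypothetical) triangular pdf for the elevator wait time X in minutes."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def integrate(g, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

print(f"total area = {integrate(f, 0, 2):.4f}")    # should be 1 for a valid pdf
print(f"P(X < 0.5) = {integrate(f, 0, 0.5):.4f}")
```

Under this assumed pdf, the second line prints 0.1250, i.e. a 12.5% chance of waiting less than 30 seconds.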
Probability Density Function
Continuous random variables have zero point probabilities, i.e., the probability that a continuous random variable equals a single value is always 0.

Probability for a continuous random variable is given by areas under pdfs.
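A quick numerical check of the zero-point-probability claim, again assuming the hypothetical triangular elevator pdf (f(x) = x on [0, 1], 2 - x on (1, 2]). Shrinking an interval around x = 1 shrinks its area toward 0; for this pdf the exact area is 2w - w² for half-width w.

```python
def f(x: float) -> float:
    """Assumed (hypothetical) triangular pdf on [0, 2], as in the elevator example."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under g between a and b."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# P(1 - w < X < 1 + w) is an area; it shrinks to 0 as the interval collapses to the point x = 1.
for w in (0.1, 0.01, 0.001, 0.0001):
    print(f"half-width {w:<7} -> probability {integrate(f, 1 - w, 1 + w):.8f}")
```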
Cumulative Distribution Function of PDF

Let X have pdf f; then the cdf F is given by
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
Cumulative Distribution Function of PDF

PDF to CDF
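Integrating the same hypothetical triangular pdf assumed for the elevator example (f(x) = x on [0, 1], 2 - x on (1, 2]) gives a closed-form CDF. This Python sketch shows that once F is known, interval probabilities are just differences of F values, with no further integration needed.

```python
def F(x: float) -> float:
    """CDF obtained by integrating the assumed triangular pdf from 0 up to x."""
    if x < 0:
        return 0.0
    if x <= 1:
        return x * x / 2               # integral of t dt from 0 to x
    if x <= 2:
        return 1 - (2 - x) ** 2 / 2    # 1/2 + integral of (2 - t) dt from 1 to x
    return 1.0

print(F(0.5))           # P(X <= 0.5) → 0.125
print(F(1.5) - F(0.5))  # P(0.5 < X <= 1.5) → 0.75
```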
Normal Distribution
• The mean, median and mode are exactly the same.
• The distribution is symmetric about the mean: half the values fall below the mean and half above it.
• The distribution is completely described by two values: the mean and the standard deviation.
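These properties can be checked with Python's `statistics.NormalDist` (the `mode` property needs Python 3.9+). The mean 29.52 and SD 1.96 are taken from the worked example in these notes; any other values behave the same way.

```python
import math
from statistics import NormalDist

# Normal distribution with the mean and SD from the worked example in these notes.
dist = NormalDist(mu=29.52, sigma=1.96)

# Mean, median and mode coincide.
print(dist.mean, dist.median, dist.mode)

# Symmetry about the mean: equal density at equal distances on either side,
# and exactly half of the probability lies below the mean.
d = 1.5
print(math.isclose(dist.pdf(dist.mean - d), dist.pdf(dist.mean + d)))
print(dist.cdf(dist.mean))
```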
Normal Distribution

Day   Time
1     32.14
2     31.30
3     29.17
4     28.15
5     30.30
6     30.41
7     32.37
8     33.19
9     31.19
10    30.37
11    28.24
12    29.10
13    28.34
14    28.50
15    29.26
16    28.29
17    25.36
18    27.18
19    30.29
20    27.15
Normal Distribution
Day   Time    f(x)
1     32.14   0.08
2     31.30   0.13
3     29.17   0.20
4     28.15   0.16
5     30.30   0.19
6     30.41   0.18
7     32.37   0.07
8     33.19   0.04
9     31.19   0.14
10    30.37   0.19
11    28.24   0.16
12    29.10   0.20
13    28.34   0.17
14    28.50   0.18
15    29.26   0.20
16    28.29   0.17
17    25.36   0.02
18    27.18   0.10
19    30.29   0.19
20    27.15   0.10

Mean = 29.52
St. Dev = 1.96
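The f(x) column is the normal pdf evaluated at each time, using the sample's mean and standard deviation. The slide's 1.96 matches the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes. A Python sketch that recomputes the column:

```python
from statistics import NormalDist, mean, stdev

# Times for days 1-20, from the table above.
times = [32.14, 31.30, 29.17, 28.15, 30.30, 30.41, 32.37, 33.19, 31.19, 30.37,
         28.24, 29.10, 28.34, 28.50, 29.26, 28.29, 25.36, 27.18, 30.29, 27.15]

m = mean(times)
s = stdev(times)          # sample SD (n - 1); this reproduces the slide's 1.96
dist = NormalDist(m, s)

print(f"Mean {m:.2f}  St. Dev {s:.2f}")
for day, t in enumerate(times, start=1):
    print(f"{day:>2}  {t:.2f}  {dist.pdf(t):.2f}")   # density f(x) at each observed time
```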
Normal Distribution
Time f(x)
25.36 0.02
27.15 0.10
27.18 0.10
28.15 0.16
28.24 0.16
28.29 0.17
28.34 0.17
28.50 0.18
29.10 0.20
29.17 0.20
29.26 0.20
30.29 0.19
30.30 0.19
30.37 0.19
30.41 0.18
31.19 0.14
31.30 0.13
32.14 0.08
32.37 0.07
33.19 0.04
Normal Distribution

Mean = 29.52, St. Dev = 1.96
M + SD = 31.48
M - SD = 27.55
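Counting how many of the 20 observed times fall between M - SD and M + SD gives an empirical check against the roughly 68% a normal distribution predicts (the 68-95-99.7 rule in these notes). A Python sketch:

```python
from statistics import mean, stdev

times = [32.14, 31.30, 29.17, 28.15, 30.30, 30.41, 32.37, 33.19, 31.19, 30.37,
         28.24, 29.10, 28.34, 28.50, 29.26, 28.29, 25.36, 27.18, 30.29, 27.15]

m, s = mean(times), stdev(times)
low, high = m - s, m + s   # roughly 27.55 and 31.48, matching M - SD and M + SD

inside = [t for t in times if low <= t <= high]
print(f"{len(inside)} of {len(times)} days ({len(inside) / len(times):.0%}) fall within one SD of the mean")
```

This prints 14 of 20 days (70%), close to the theoretical 68% for such a small sample.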
Normal Distribution
68-95-99.7 Rule
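The rule's three percentages are the probabilities of landing within 1, 2 and 3 standard deviations of the mean. Because they are the same for every normal distribution, this Python sketch computes them on the standard normal from `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

for k in (1, 2, 3):
    # P(mu - k*sigma < X < mu + k*sigma), computed as a difference of CDF values.
    p = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```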
CDF of Normal Distribution
• The cumulative distribution function (cdf) is the probability that the
variable X takes a value less than or equal to x.
• In the figure below, Mean = 0 and SD = 1.
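A Python sketch evaluating the CDF of the standard normal (Mean = 0, SD = 1) at a few points with `statistics.NormalDist`, plus the inverse CDF, which answers the reverse question of which x has a given cumulative probability.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# F(x) = P(X <= x): the area under the pdf to the left of x.
for x in (-1.96, 0, 1, 1.96):
    print(f"F({x}) = {std_normal.cdf(x):.4f}")

# Inverse CDF: which x has F(x) = 0.975?
print(f"x with F(x) = 0.975: {std_normal.inv_cdf(0.975):.2f}")
```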
Z-Distribution
• The standard normal distribution, also called the z-distribution, is a
special normal distribution where the mean is 0 and the standard
deviation is 1.
• Z-scores tell you how many standard deviations away from the mean
each value lies.
Z-Distribution
Z-Score
$z = \frac{x - \mu}{\sigma}$
As the formula shows, the z-score is simply the raw score x minus the population mean μ, divided by the population standard deviation σ.
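The formula translates directly into code. In this Python sketch, `z_score` is a hypothetical helper name; Python 3.9+ also provides `NormalDist.zscore`, which applies the same (x - mean) / stdev formula. The mean 38.8 and SD 11.4 come from the example that follows.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Population mean 38.8 and SD 11.4, as in the travel-time example.
print(z_score(65, 38.8, 11.4))
print(NormalDist(38.8, 11.4).zscore(65))   # same formula, built into the standard library
```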
Z-Distribution
Day   Time (min)
1     26
2     33
3     65
4     28
5     34
6     55
7     25
8     44
9     50
10    36
11    26
12    37
13    43
14    62
15    35
16    38
17    45
18    32
19    28
20    34

Mean is 38.8 minutes; standard deviation is 11.4 minutes.
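Here the slide's 11.4 matches the population standard deviation (divisor n), which is `statistics.pstdev`; the sample version, `statistics.stdev`, would give about 11.70. A Python sketch computing the z-score of every day:

```python
from statistics import mean, pstdev

# Travel times (minutes) for days 1-20, from the table above.
times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mu = mean(times)        # 38.8
sigma = pstdev(times)   # population SD (divides by n): 11.4, matching the slide

for day, t in enumerate(times, start=1):
    z = (t - mu) / sigma
    print(f"day {day:>2}: {t} min, z = {z:+.2f}")
```

Day 3 (65 minutes) has the largest z-score, about +2.30, making it the most unusual day relative to the mean.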
Summary
• Introduction to Data Science
