Intro to Machine Learning Analysis



Lecture 0: Background

Rafael A. Irizarry and Hector Corrada Bravo

January 2010

The purpose of this class is for you to learn machine learning techniques commonly used in data analysis. By the end of the term, you should be able to critically read papers that use these methods and to analyze data using them.
When using any of these tools we will be asking ourselves if our findings are statistically significant. For example, if we use a particular classification algorithm and find that we can correctly predict the outcome in 7 out of our 10 cases, how can we determine if this could have happened by chance alone? To answer these questions, we need to understand some basic probabilistic and statistical principles. Today we will review some of these principles.
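As a rough sketch of the 7-out-of-10 question, suppose (this is an assumption, not something stated above) that each case has two possible outcomes and that guessing at random gets each one right with probability 1/2, independently. Then the number of correct predictions is binomial, and we can compute the chance of seeing 7 or more correct by luck alone. The function name binomial_tail is made up for this illustration.

```python
from math import comb

def binomial_tail(successes, trials, p=0.5):
    """Return Pr(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Assumed setting: 10 independent cases, chance accuracy 1/2 per case.
p_chance = binomial_tail(7, 10, p=0.5)
print(f"Pr(at least 7 of 10 correct by chance) = {p_chance:.4f}")
```

Under these assumptions the probability is 176/1024, or about 0.17, so 7 out of 10 on its own is weak evidence that the classifier beats guessing.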

Probability
If I toss a coin, what is the chance it lands heads?
In this class we will sometimes be using notation like this:
Let X be a random variable that takes values 0 (tails) or 1 (heads) such that
Pr(X = 1) = 1/2
For a die, we would write:
Pr(X = k) = 1/6, k = 1, ..., 6.
We will refer to these as probability distributions.
More complicated distributions can be defined, for example, by considering the random variable

Y = \sum_{i=1}^{N} X_i

where the X_i, i = 1, ..., N, are independent tosses of the same die (or coin).
What are possible values of Y? What is the distribution of Y? What does independent mean?
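One way to explore these questions is to compute the distribution of Y exactly for small N. The sketch below assumes a fair die (or fair coin) and uses independence in the step where probabilities are multiplied. The name sum_distribution is hypothetical and chosen only for this example.

```python
from fractions import Fraction

def sum_distribution(n_tosses, faces=(1, 2, 3, 4, 5, 6)):
    """Exact distribution of Y = X_1 + ... + X_N for independent fair tosses.

    Returns a dict mapping each possible value of Y to its probability.
    """
    p_face = Fraction(1, len(faces))  # fair: every face is equally likely
    dist = {0: Fraction(1)}           # before any toss the sum is 0
    for _ in range(n_tosses):
        new_dist = {}
        for total, p_total in dist.items():
            for face in faces:
                # Independence: Pr(sum so far and this face) is the product.
                key = total + face
                new_dist[key] = new_dist.get(key, Fraction(0)) + p_total * p_face
        dist = new_dist
    return dist

two_dice = sum_distribution(2)
three_coins = sum_distribution(3, faces=(0, 1))
for value in sorted(two_dice):
    print(value, two_dice[value])
```

For two dice, Y takes the values 2 through 12 and Pr(Y = 7) = 1/6. For coins coded 0/1, Y counts the heads and follows a binomial distribution.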

Populations, LLN and CLT

In science, randomness usually comes from either random sampling or randomization.
Side note: What about observational studies?
How does the above relate to populations?
The coin toss can be related to a very large population where each subject is either, say, a Democrat (heads) or a Republican (tails). If half are Democrats and half are Republicans, then picking a person at random is just like a coin toss.
If dems are 1 and reps are 0, what is the population average \bar{x}?
If I take a random sample with replacement (a poll) of N = 10 subjects, what
is the distribution of the sample average?
What happens to the difference between the sample average and the population
average as the sample size gets bigger?
Why is the sample average \bar{X} a random variable? What about the distribution? Is the population average a random variable? What does the law of large numbers (LLN) say? What does the central limit theorem (CLT) say?
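A small simulation can make these questions concrete. The sketch below assumes a population that is half 1s and half 0s (population average 1/2) and draws samples with replacement. The seed, sample sizes, and number of repetitions are arbitrary choices for illustration.

```python
import random
from statistics import pstdev

def sample_average(n, p, rng):
    """Average of n independent 0/1 draws, each equal to 1 with probability p."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
p = 0.5                 # assumed population average

# LLN: the gap between sample average and population average tends to
# shrink as the sample size grows.
lln_errors = {n: abs(sample_average(n, p, rng) - p) for n in (10, 1_000, 100_000)}
for n, err in lln_errors.items():
    print(f"N = {n:>6}: |sample average - population average| = {err:.4f}")

# CLT: across many polls of size n, the sample averages are approximately
# normal with standard deviation sqrt(p * (1 - p) / n).
n = 100
averages = [sample_average(n, p, rng) for _ in range(2_000)]
observed_sd = pstdev(averages)
predicted_sd = (p * (1 - p) / n) ** 0.5
within_2sd = sum(abs(a - p) <= 2 * predicted_sd for a in averages) / len(averages)
print(f"observed SD = {observed_sd:.4f}, predicted SD = {predicted_sd:.4f}")
print(f"fraction of polls within 2 SD of the population average = {within_2sd:.3f}")
```

The sample average changes from poll to poll, which is why it is a random variable, while the population average p stays fixed throughout.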

Inference
How does this all relate to scientific problems? Many times in science we can model the process producing data with a stochastic (probabilistic) model where parameters (such as population averages) are unknown. We then make inferences based on the data.
For example, in the dems and reps problem we may not know the percentages
of 1s and 0s. To find out, we take a random sample, and construct estimates
(the sample average), confidence intervals and p-values.
How do we construct a confidence interval for the percentage of democrats?
What would be an interesting null hypothesis in this case? How would we
construct a p-value for this null hypothesis?
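One possible answer, sketched below, uses the normal approximation suggested by the CLT. The poll numbers (58 Democrats among 100 sampled) are made up for illustration, the null hypothesis that half the population are Democrats is one choice among several, and proportion_inference is a hypothetical helper name. With a sample as small as N = 10, an exact binomial calculation would be a safer choice than this approximation.

```python
from math import sqrt
from statistics import NormalDist

def proportion_inference(successes, n, p0=0.5, level=0.95):
    """Normal-approximation confidence interval and two-sided p-value
    for a population proportion, testing the null hypothesis p = p0."""
    p_hat = successes / n                            # estimate: the sample average
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se_hat = sqrt(p_hat * (1 - p_hat) / n)           # standard error at the estimate
    ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)
    se_null = sqrt(p0 * (1 - p0) / n)                # standard error under the null
    z = (p_hat - p0) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, ci, z, p_value

# Hypothetical poll: 58 of 100 sampled subjects are Democrats.
p_hat, ci, z, p_value = proportion_inference(58, 100, p0=0.5)
print(f"estimate = {p_hat:.2f}")
print(f"95% CI   = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

In this made-up poll the interval contains 0.5 and the p-value is above 0.05, so the data would not rule out an evenly split population.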
For continuous data, this is all pretty much the same. For example, we may want
to know if the average weight of Baltimore women is over some recommended
ideal weight.

Note: In this case, we could use a t-test if the sample is small.
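As a minimal sketch of the t-test idea, the code below computes the one-sample t statistic and its degrees of freedom. The weights and the reference value of 65 are invented for illustration, and t_statistic is a hypothetical helper name. Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which the Python standard library does not provide. If SciPy is available, scipy.stats.ttest_1samp is one option.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """One-sample t statistic for the null hypothesis 'population mean = mu0'.

    Returns (t, degrees_of_freedom).
    """
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - mu0) / standard_error, n - 1

# Hypothetical weights (kg) from a small random sample, and a hypothetical
# recommended weight of 65 kg. These numbers are not from the lecture.
weights = [68.2, 71.5, 63.9, 75.1, 66.4, 70.3, 72.8, 64.7]
t, df = t_statistic(weights, mu0=65.0)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

A large positive t suggests the average weight is above the reference value. How large counts as significant depends on the degrees of freedom, which matters most when the sample is small.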

This inferential approach is used in any situation where a population average is of interest and we can only obtain a random sample. It is also used when the randomness comes from randomization rather than random sampling.
