Python for Machine Learning Enthusiasts
Heikki Huttunen
          University Lecturer, Dr.Eng.
          Heikki.Huttunen@tut.fi
 • Motivation
    • Why machine learning?
    • Machine learning basics
    • Why Python?
 • Review of widely used classifiers and their implementation in
   Python
 • Examples
     • Data driven design of a classifier for ovarian cancer detection
     • ICANN MEG Mind Reading Challenge 2011
     • IEEE MLSP Birds 2013 Competition
     • DecMeg 2014 Decoding the Human Brain Competition
 • Hands-on session (TC407)
 • Slides available at http://www.cs.tut.fi/~hehu/MMLP.pdf
Introduction
• However, due to licensing issues and the heavy development of Python, scientific Python started to gain its user base.
• Python's strength is in its versatility and huge community.
• There are two versions in use: Python 2.7 and 3.4. We'll use the former, as it is better supported by most packages interesting to us.
(Figure: the scientific Python stack — Python, numpy, sklearn, matplotlib.)
Alternatives to Python in Science

Python vs. Matlab
 • Matlab is the #1 workhorse for linear algebra.
 • Matlab is a professionally maintained product.
 • Some of Matlab's toolboxes are great (Image Processing tb); some are obsolete (Neural Network tb).
 • New versions twice a year. The amount of novelty varies.
 • Matlab is expensive for non-educational users.

Python vs. R
 • R is the #1 workhorse for statistics and data analysis.
 • Popularity: http://www.kdnuggets.com/polls/2015/r-vs-python.html
 • R is great for specific data analysis and visualization needs.
 • However, Python interfaces to other domains, ranging from deep neural networks (Caffe, Theano) and image analysis (OpenCV) to even a full-blown web server (Django).
Essential Modules
                  Samples: $x_1, x_2, \ldots, x_N \in \mathbb{R}^P$
              Class labels: $y_1, y_2, \ldots, y_N \in \{1, 2, \ldots, C\}$
                 Classifier: $F(x): \mathbb{R}^P \mapsto \{1, 2, \ldots, C\}$
  • Now, the task is to find the function $F$ that maps the samples most accurately to their corresponding labels.
  • For example: find the function $F$ that minimizes the number of erroneous predictions, i.e., the cases in which $F(x_k) \neq y_k$.
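  • As a quick illustration, the number of erroneous predictions can be counted directly with numpy (the data and the dummy classifier F below are made up for this sketch):

   import numpy as np

   # Made-up data: N = 100 samples with P = 5 features, labels in {0, 1, 2}.
   X = np.random.randn(100, 5)
   y = np.random.randint(0, 3, size=100)

   def F(X):
       """Dummy classifier that always predicts class 0."""
       return np.zeros(len(X), dtype=int)

   errors = np.sum(F(X) != y)        # number of cases where F(x_k) != y_k
   error_rate = np.mean(F(X) != y)   # fraction of erroneous predictions
   print(errors, error_rate)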
Classification Example
(Figure: two-class example; the sample is classified as RED.)
• The expression $\mathbf{w}^T \mathbf{x} = \sum_k w_k x_k$ essentially transforms the multidimensional data $\mathbf{x}$ to a real number, which is then compared to a threshold $b$.
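• A minimal numpy sketch of this decision rule (the weight vector w and the threshold b below are invented for illustration):

   import numpy as np

   w = np.array([1.5, -0.5])   # weight vector (illustrative values)
   b = 1.0                     # decision threshold
   x = np.array([2.0, 3.0])    # one sample in R^2

   score = np.dot(w, x)        # w^T x = sum_k w_k x_k, a single real number
   print("RED" if score > b else "BLUE")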
     Flavors of Linear Classifiers
(Figure: three panels comparing linear classifier decision boundaries on the same two-class data.)
Details
 • The LDA:
     • The oldest of the three: Fisher, 1935.
     • ”Find the projection that maximizes class separation”, i.e., pull
        the classes as far from each other as possible.
     • Closed form solution, fast to train.
 • The SVM:
     • Vapnik and Chervonenkis, 1963.
     • ”Maximize the margin between classes”.
     • Slowest of the three, performs well with sparse
        high-dimensional data.
 • Logistic Regression (a.k.a. Generalized Linear Model):
      • History traces back to the 1700s, but the proper formulation and an
         efficient training algorithm are due to Nelder and Wedderburn (1972).
      • Statistical algorithm: ”Maximize the likelihood of the data”.
      • Also outputs class probabilities. Has been extended to
         automatic feature selection.
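 • All three are available in scikit-learn; a minimal sketch on made-up data (class and module names follow recent scikit-learn releases; older versions expose LDA as sklearn.lda.LDA):

   import numpy as np
   from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
   from sklearn.svm import LinearSVC
   from sklearn.linear_model import LogisticRegression

   # Made-up two-class data.
   X = np.random.randn(200, 2)
   y = (X[:, 0] + X[:, 1] > 0).astype(int)

   for clf in (LinearDiscriminantAnalysis(), LinearSVC(), LogisticRegression()):
       clf.fit(X, y)
       print(type(clf).__name__, clf.score(X, y))   # training accuracy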
The Class Probabilities of Logistic Regression
(Figure: example point classified as RED with class probability p = 60.6 %.)
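• The probability in the figure is exactly what scikit-learn's predict_proba returns; a small sketch on made-up data:

   import numpy as np
   from sklearn.linear_model import LogisticRegression

   X = np.random.randn(100, 2)
   y = (X[:, 0] > 0).astype(int)

   clf = LogisticRegression().fit(X, y)
   print(clf.predict([[1.0, 0.5]]))         # predicted class label
   print(clf.predict_proba([[1.0, 0.5]]))   # class probabilities for the new point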
 Nonlinear SVM
(Figure: Support Vector Machine with kernel trick; example point classified as BLUE.)
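• A minimal scikit-learn sketch of a kernel SVM (the RBF kernel and the toy data below are assumptions for illustration; the slide does not specify the kernel or its parameters):

   import numpy as np
   from sklearn.svm import SVC

   # Toy data with a nonlinear class boundary: inside vs. outside the unit circle.
   X = np.random.randn(300, 2)
   y = (np.sum(X ** 2, axis=1) < 1.0).astype(int)

   clf = SVC(kernel="rbf", C=1.0)   # the kernel trick yields a nonlinear decision boundary
   clf.fit(X, y)
   print(clf.score(X, y))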
• The classifier also outputs feature importances.
(Figure: Relative Importance of Attributes — Vertical axis 73.2 %, Horizontal axis 26.8 %.)
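• A hedged sketch of extracting such attribute importances, assuming a tree ensemble like scikit-learn's RandomForestClassifier (the slide does not name the classifier used here):

   import numpy as np
   from sklearn.ensemble import RandomForestClassifier

   # Made-up data with two attributes; the labels depend mostly on the second one.
   X = np.random.randn(200, 2)
   y = (X[:, 1] > 0).astype(int)

   clf = RandomForestClassifier(n_estimators=100).fit(X, y)
   print(clf.feature_importances_)   # relative importance of each attribute, sums to 1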
Generalization
 • The important thing is how well the classifier works with
   unseen data.
 • An overfitted classifier memorizes the training data and does
   not generalize.
(Figure: 1-NN classifier decision regions — Training Data: Acc = 100.0 %; Test Data: Acc = 88.2 %.)
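 • The effect is easy to reproduce; a sketch with a 1-NN classifier on made-up data (the import path matches the scikit-learn versions of this era; newer releases use sklearn.model_selection):

   import numpy as np
   from sklearn.cross_validation import train_test_split
   from sklearn.neighbors import KNeighborsClassifier

   # Made-up data with noisy labels, so memorizing the training set does not generalize.
   X = np.random.randn(400, 2)
   y = (X[:, 0] + np.random.randn(400) > 0).astype(int)

   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
   clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
   print("train acc:", clf.score(X_train, y_train))   # 100 % (training data memorized)
   print("test acc:", clf.score(X_test, y_test))      # clearly lower on unseen data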
 Overfitting
• Generalization is also related to overfitting.
• A zero coefficient for a linear operator is equivalent to discarding the corresponding feature altogether.
• The plots illustrate the model coefficients without regularization, with traditional regularization, and with sparse regularization.
• The importance of sparsity is twofold: the model can be used for feature selection, but it often also generalizes better.
(Figure: an example time series and the fitted model coefficients by order — ℓ2 penalty: 24 nonzeros; ℓ1 penalty: 8 nonzeros.)
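• A hedged sketch of the ℓ2 vs. ℓ1 effect with scikit-learn's Ridge and Lasso (the regression problem and the penalty strengths below are made up; the slide's own example is a fit to a time series):

   import numpy as np
   from sklearn.linear_model import Ridge, Lasso

   # Made-up regression problem: 25 candidate features, only two truly relevant.
   rng = np.random.RandomState(0)
   X = rng.randn(100, 25)
   y = 3 * X[:, 0] - 2 * X[:, 3] + 0.5 * rng.randn(100)

   ridge = Ridge(alpha=1.0).fit(X, y)   # l2 penalty: shrinks coefficients, rarely exactly zero
   lasso = Lasso(alpha=0.1).fit(X, y)   # l1 penalty: drives many coefficients exactly to zero
   print("ridge nonzeros:", np.sum(ridge.coef_ != 0))
   print("lasso nonzeros:", np.sum(lasso.coef_ != 0))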
   Example: Classifier Design for Ovarian Cancer
   Detection
 • … (preprocessing step).ᵃ
ᵃ Conrads, Thomas P., et al., ”High-resolution serum proteomic features for ovarian cancer detection,” Endocrine-Related Cancer (2004).
(Figure: an example spectrum from the dataset, roughly 4000 data points.)
 Reading the Data
   from sklearn.cross_validation import cross_val_score
   from sklearn.linear_model import LogisticRegression

   # X: (N, P) feature matrix, y: length-N label vector (loaded earlier)
   clf = LogisticRegression()
   scores = cross_val_score(clf, X, y)
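 • In the scikit-learn releases contemporary with these slides, cross_val_score defaults to 3-fold (stratified) cross-validation and to the classifier's accuracy score, so scores holds three accuracies; scores.mean() gives a single figure of merit.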
 • The four most popular deep learning toolboxes are (in order of decreasing popularity):
     •   Caffe: http://caffe.berkeleyvision.org/ from Berkeley
     •   Torch: http://torch.ch/ from NYU (used by Facebook)
     •   pylearn2: http://deeplearning.net/ from U. Montreal
     •   Minerva: https://github.com/dmlc/minerva from NYU and Peking Univ.
 • All except Torch can be used via a Python front end (Torch uses the Lua language).
Applications
    1. H. Huttunen et al., ”Regularized logistic regression for mind reading with parallel validation,” Proc. ICANN/PASCAL2 Challenge: MEG Mind-Reading, Aalto University publication series, Espoo, June 2011.
    2. H. Huttunen et al., ”MEG Mind Reading: Strategies for Feature Selection,” Proc. of The Federated Computer Science Event 2012, Helsinki, May 2012.
    3. H. Huttunen et al., ”Mind Reading with Regularized Multinomial Logistic Regression,” Machine Vision and Applications, Aug. 2013.
ICANN MEG Mind Reading Challenge Data
(Figure: example MEG signals over 200 time points; amplitudes on the order of 10⁻¹¹.)
Where Are the Selected Features Located?
(Figure: three histograms of the selected features — Count vs. Atom index (0–50).)
 DecMeg2014 Competition
 • The task was to predict whether the test persons were shown a face.
 • Sequences were 1.5 seconds long.
 • Approximately 600 data points from each subject.
(Figure: MEG responses as Sensor Number (0–300) vs. Time / s (0–1.5); upper panel: face shown.)
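 • A hedged sketch of how such trial × sensor × time data can be flattened for a scikit-learn classifier (the array shapes below are illustrative assumptions, not the exact competition dimensions):

   import numpy as np

   # Illustrative MEG data: trials x sensors x time samples.
   n_trials, n_sensors, n_times = 600, 306, 375
   data = np.random.randn(n_trials, n_sensors, n_times)
   labels = np.random.randint(0, 2, size=n_trials)   # e.g. face shown / not shown

   # Flatten each trial into one long feature vector for a standard classifier.
   X = data.reshape(n_trials, n_sensors * n_times)
   print(X.shape)                                     # (600, 114750)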
Our Model
(Figure: model architecture — a bank of logistic regression (LR) classifiers along the sensor dimension, whose outputs are combined by a random forest (RF) that makes the final decision.)
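 • A minimal sketch of that kind of two-stage (stacked) model; the per-sensor split, the sizes, and all parameters below are assumptions read off the diagram, not the authors' exact pipeline:

   import numpy as np
   from sklearn.linear_model import LogisticRegression
   from sklearn.ensemble import RandomForestClassifier

   # Small illustrative sizes to keep the sketch fast.
   n_trials, n_sensors, n_times = 200, 30, 50
   data = np.random.randn(n_trials, n_sensors, n_times)
   y = np.random.randint(0, 2, size=n_trials)

   # Stage 1: one logistic regression per sensor, each yielding a class probability.
   stage1 = np.zeros((n_trials, n_sensors))
   for s in range(n_sensors):
       lr = LogisticRegression().fit(data[:, s, :], y)
       stage1[:, s] = lr.predict_proba(data[:, s, :])[:, 1]

   # Stage 2: a random forest combines the per-sensor probabilities into the decision.
   rf = RandomForestClassifier(n_estimators=100).fit(stage1, y)
   print(rf.score(stage1, y))

 • In practice the stage-1 probabilities would be produced with cross-validation, so that the random forest is not trained on probabilities that have already seen the labels.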
 Feature Importances
(Figure: feature importances, with values around 0.01–0.06; annotation: 31 timepoints.)