This document defines common mathematical notation used in machine learning and statistics: notation for sets, vectors, matrices, probabilities, estimators, hypothesis sets, and loss functions; special symbols for gradients, inverses, transposes, floors, intervals, and indicators; and notation for learning algorithms and their components, including regularization parameters, hypothesis selection, and training and test errors.

Uploaded by

hprof1
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
40 views4 pages

· (· · ·) - · - k · k ⌊ · ⌋ (a, b) a b J ·K ∇ ∇E E (w) w (·) (·) (·) k N A / B A B 0 (1) × R d ǫ δ ǫ η λ λ C Ω θ θ(s) = e / (1 + e) Φ z = Φ (x) Φ Q

1) The document defines common mathematical notation used in machine learning and statistics. It defines notation for sets, vectors, matrices, probabilities, estimators, hypothesis sets, loss functions, and other concepts. 2) Special symbols are introduced for concepts like gradients, inverses, transposes, floors, intervals, indicators, and various operators. 3) Notation is also defined for learning algorithms and their components, including regularization parameters, hypothesis selection, training and test errors, and other algorithmic details.

Uploaded by

hprof1
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Notation

{ · } : set
| · | : absolute value of a number, or cardinality (number of elements) of a set, or determinant of a matrix
‖ · ‖² : square of the norm; sum of the squared components of a vector
⌊ · ⌋ : floor; largest integer which is not larger than the argument
[a, b] : the interval of real numbers from a to b
⟦ · ⟧ : evaluates to 1 if the argument is true, and to 0 if it is false
∇ : gradient operator, e.g., ∇E_in (gradient of E_in(w) with respect to w)
(·)⁻¹ : inverse
(·)† : pseudo-inverse
(·)ᵀ : transpose (columns become rows and vice versa)
C(N, k) : number of ways to choose k objects from N distinct objects (equals N!/((N − k)! k!), where '!' is the factorial)
A \ B : the set A with the elements from set B removed
0 : zero vector; a column vector whose components are all zeros
{1} × ℝᵈ : d-dimensional Euclidean space with an added 'zeroth coordinate' fixed to 1
ε : tolerance in approximating a target
δ : bound on the probability of exceeding ε (the approximation tolerance)
η : learning rate (step size in iterative learning, e.g., in stochastic gradient descent)
λ : regularization parameter
λ_C : regularization parameter corresponding to weight budget C
Ω : penalty for model complexity; either a bound on generalization error, or a regularization term
θ : logistic function θ(s) = eˢ/(1 + eˢ)
Φ : feature transform, z = Φ(x)
Φ_Q : Qth-order polynomial transform
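As a quick illustration, several of the operators above translate directly into code. A minimal sketch; the helper names (theta, indicator, norm_sq, choose) are hypothetical and chosen for readability, not part of the notation itself.

```python
import math

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s)."""
    return math.exp(s) / (1.0 + math.exp(s))

def indicator(condition):
    """The bracket operator: 1 if the argument is true, 0 if it is false."""
    return 1 if condition else 0

def norm_sq(v):
    """Square of the norm: sum of the squared components of a vector."""
    return sum(x * x for x in v)

def choose(N, k):
    """Number of ways to choose k objects from N distinct objects."""
    return math.factorial(N) // (math.factorial(N - k) * math.factorial(k))

print(theta(0))          # 0.5
print(indicator(3 > 2))  # 1
print(norm_sq([3, 4]))   # 25
print(choose(10, 3))     # 120
```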

φ : a coordinate in the feature transform Φ, z_i = φ_i(x)
μ : probability of a binary outcome
ν : fraction of a binary outcome in a sample
σ² : variance of noise
A : learning algorithm
argmin_a(·) : the value of a at which the minimum of the argument is achieved
B : an event (in probability), usually a 'bad' event
b : the bias term in a linear combination of inputs, also called w₀
bias : the bias term in the bias-variance decomposition
B(N, k) : maximum number of dichotomies on N points with a break point k
C : bound on the size of weights in the soft order constraint
d : dimensionality of the input space X = ℝᵈ or X = {1} × ℝᵈ
d̃ : dimensionality of the transformed space Z
d_vc, d_vc(H) : VC dimension of hypothesis set H
D : data set D = (x₁, y₁), ..., (x_N, y_N); technically not a set, but a vector of elements (x_n, y_n). D is often the training set, but sometimes split into training and validation/test sets.
D_train : subset of D used for training when a validation or test set is used
D_val : validation set; subset of D used for validation
E(h, f) : error measure between hypothesis h and target function f
eˣ : exponent of x in the natural base e = 2.71828...
e(h(x), f(x)) : pointwise version of E(h, f), e.g., (h(x) − f(x))²
e_n : leave-one-out error on example n when this nth example is excluded in training [cross validation]
E[·] : expected value of the argument
E_x[·] : expected value with respect to x
E[y|x] : expected value of y given x
E_aug : augmented error (in-sample error plus regularization term)
E_in, E_in(h) : in-sample error (training error) for hypothesis h
E_cv : cross validation error
E_out, E_out(h) : out-of-sample error for hypothesis h
E_out^D : out-of-sample error when D is used for training
Ē_out : expected out-of-sample error
E_val : validation error
E_test : test error
f : target function, f : X → Y
g : final hypothesis g ∈ H selected by the learning algorithm; g : X → Y
g^(D) : final hypothesis when the training set is D
ḡ : average final hypothesis [bias-variance analysis]
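The error measures above can be sketched on a toy data set. The target f and hypothesis h below are hypothetical stand-ins, used only to show how the pointwise error e(h(x), f(x)) and the in-sample error E_in(h) relate.

```python
def f(x):
    """A made-up target function (known here only for illustration)."""
    return 2.0 * x

def h(x):
    """A made-up hypothesis to be evaluated against f."""
    return 1.5 * x + 0.5

def e(hx, fx):
    """Pointwise error e(h(x), f(x)) = (h(x) - f(x))^2."""
    return (hx - fx) ** 2

def E_in(h, data):
    """In-sample error: average pointwise error over the data set D."""
    return sum(e(h(x), y) for x, y in data) / len(data)

# D is a list of (x_n, y_n) pairs with y_n = f(x_n)
D = [(x, f(x)) for x in [0.0, 1.0, 2.0]]
print(E_in(h, D))  # (0.25 + 0.0 + 0.25) / 3
```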

g⁻ : final hypothesis when trained using D minus some points
g : gradient, e.g., g = ∇E_in
h : a hypothesis h ∈ H; h : X → Y
h̃ : a hypothesis in transformed space
H : hypothesis set
H̃ : hypothesis set that corresponds to perceptrons in transformed space
H(C) : restricted hypothesis set by weight budget C [soft order constraint]
H(x₁, ..., x_N) : dichotomies (patterns of ±1) generated by H on the points x₁, ..., x_N
H : the hat matrix [linear regression]
I : identity matrix; square matrix whose diagonal elements are 1 and off-diagonal elements are 0
K : size of validation set
L_q : qth-order Legendre polynomial
ln : logarithm in base e
log₂ : logarithm in base 2
M : number of hypotheses
m_H(N) : the growth function; maximum number of dichotomies generated by H on any N points
max(·, ·) : maximum of the two arguments
N : number of examples (size of D)
o(·) : absolute value of this term is asymptotically negligible compared to the argument
O(·) : absolute value of this term is asymptotically smaller than a constant multiple of the argument
P(x) : (marginal) probability or probability density of x
P(y | x) : conditional probability or probability density of y given x
P(x, y) : joint probability or probability density of x and y
P[·] : probability of an event
Q : order of polynomial transform
Q_f : complexity of f (order of polynomial defining f)
ℝ : the set of real numbers
ℝᵈ : d-dimensional Euclidean space
s : signal s = wᵀx = Σᵢ wᵢxᵢ (i goes from 0 to d or 1 to d depending on whether x has the x₀ = 1 coordinate or not)
sign(·) : sign function, returning +1 for positive and −1 for negative
sup_a(·) : supremum; smallest value that is at least as large as the argument for all a
T : number of iterations, number of epochs
t : iteration number or epoch number
tanh(·) : hyperbolic tangent function; tanh(s) = (eˢ − e⁻ˢ)/(eˢ + e⁻ˢ)
trace(·) : trace of a square matrix (sum of diagonal elements)
V : number of subsets in V-fold cross validation (V × K = N)
v : direction in gradient descent (not necessarily a unit vector)
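The signal s = wᵀx and the sign function combine into the familiar perceptron output sign(wᵀx). A minimal sketch with made-up weights and an input that includes the x₀ = 1 coordinate absorbing the bias w₀:

```python
def signal(w, x):
    """s = w^T x = sum_i w_i x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(s):
    """Sign function: +1 for positive, -1 for negative."""
    return 1 if s > 0 else -1

# x0 = 1 is the added coordinate; w0 plays the role of the bias b
w = [-1.0, 2.0, 0.5]   # w0, w1, w2
x = [1.0, 1.0, 2.0]    # x0 = 1, x1, x2

s = signal(w, x)       # -1 + 2 + 1 = 2.0
print(sign(s))         # 1
```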

v̂ : unit vector version of v [gradient descent]
var : the variance term in the bias-variance decomposition
w : weight vector (column vector)
w̃ : weight vector in transformed space
ŵ : selected weight vector [pocket algorithm]
w* : weight vector that separates the data
w_lin : solution weight vector to linear regression
w_reg : regularized solution to linear regression with weight decay
w_PLA : solution weight vector of the perceptron learning algorithm
w₀ : added coordinate in weight vector w to represent bias b
x : the input x ∈ X. Often a column vector x ∈ ℝᵈ or x ∈ {1} × ℝᵈ. x is used if the input is scalar.
x₀ : added coordinate to x, fixed at x₀ = 1 to absorb the bias term in linear expressions
X : input space whose elements are x ∈ X
X : matrix whose rows are the data inputs x_n [linear regression]
XOR : exclusive OR function (returns 1 if the number of 1's in its input is odd)
y : the output y ∈ Y
y : column vector whose components are the data set outputs y_n [linear regression]
ŷ : estimate of y [linear regression]
Y : output space whose elements are y ∈ Y
Z : transformed input space whose elements are z = Φ(x)
Z : matrix whose rows are the transformed inputs z_n = Φ(x_n) [linear regression]