Notation
{ · }    set; event (in probability)
| · |    absolute value of a number, or cardinality (number of elements) of a set, or determinant of a matrix
‖ · ‖^2    square of the norm; sum of the squared components of a vector
⌊ · ⌋    floor; largest integer which is not larger than the argument
[a, b]    the interval of real numbers from a to b
⟦ · ⟧    evaluates to 1 if argument is true, and to 0 if it is false
∇    gradient operator, e.g., ∇E_in (gradient of E_in(w) with respect to w)
( · )^-1    inverse
( · )^†    pseudo-inverse
( · )^T    transpose (columns become rows and vice versa)
(N k)    number of ways to choose k objects from N distinct objects (equals N!/((N-k)! k!), where `!' is the factorial)
A \ B    the set A with the elements from set B removed
0    zero vector; a column vector whose components are all zeros
{1} × R^d    d-dimensional Euclidean space with an added `zeroth coordinate' fixed to 1
ε    tolerance in approximating a target
δ    bound on the probability of exceeding ε (the approximation tolerance)
η    learning rate (step size in iterative learning, e.g., in stochastic gradient descent)
λ    regularization parameter
λ_C    regularization parameter corresponding to weight budget C
Ω    penalty for model complexity; either a bound on generalization error, or a regularization term
θ    logistic function θ(s) = e^s/(1 + e^s)
Φ    feature transform, z = Φ(x)
Φ_Q    Qth-order polynomial transform
φ    a coordinate in the feature transform Φ, z_i = φ_i(x)
μ    probability of a binary outcome
ν    fraction of a binary outcome in a sample
σ^2    variance of the noise
A    learning algorithm
argmin_a(·)    the value of a at which the minimum of the argument is achieved
B    an event (in probability), usually a `bad' event
b    the bias term in a linear combination of inputs, also called w_0
B(N, k)    maximum number of dichotomies on N points with a break point k
bias    the bias term in the bias-variance decomposition
C    bound on the size of weights in the soft order constraint
d    dimensionality of the input space X = R^d or X = {1} × R^d
d̃    dimensionality of the transformed space Z
d_vc, d_vc(H)    VC dimension of hypothesis set H
D    data set D = (x_1, y_1), . . . , (x_N, y_N); technically not a set, but a vector of elements (x_n, y_n). D is often the training set, but sometimes split into training and validation/test sets.
D_train    subset of D used for training when a validation or test set is used.
D_val    validation set; subset of D used for validation.
e(h(x), f(x))    pointwise version of E(h, f), e.g., (h(x) - f(x))^2
E(h, f)    error measure between hypothesis h and target function f
e^x    exponent of x in the natural base e = 2.71828...
e_n    leave-one-out error on example n when this nth example is excluded in training [cross validation]
E[·]    expected value of the argument
E_x[·]    expected value with respect to x
E[y|x]    expected value of y given x
E_aug    augmented error (in-sample error plus regularization term)
E_in, E_in(h)    in-sample error (training error) for hypothesis h
E_cv    cross validation error
E_out, E_out(h)    out-of-sample error for hypothesis h
E_out^D    out-of-sample error when D is used for training
Ē_out    expected out-of-sample error
E_val    validation error
E_test    test error
f    target function, f : X → Y
g    final hypothesis g ∈ H selected by the learning algorithm; g : X → Y
g^(D)    final hypothesis when the training set is D
ḡ    average final hypothesis [bias-variance analysis]
g^-    final hypothesis when trained using D minus some points
g    gradient, e.g., g = ∇E_in
h    a hypothesis h ∈ H; h : X → Y
h̃    a hypothesis in the transformed space
H    hypothesis set
H̃    hypothesis set that corresponds to perceptrons in the transformed space
H(C)    restricted hypothesis set by weight budget C [soft order constraint]
H(x_1, . . . , x_N)    dichotomies (patterns of ±1) generated by H on the points x_1, . . . , x_N
H    the hat matrix [linear regression]
I    identity matrix; square matrix whose diagonal elements are 1 and off-diagonal elements are 0
K    size of validation set
L_q    qth-order Legendre polynomial
ln    logarithm in base e
log_2    logarithm in base 2
M    number of hypotheses
m_H(N)    the growth function; maximum number of dichotomies generated by H on any N points
max(·, ·)    maximum of the two arguments
N    number of examples (size of D)
o(·)    absolute value of this term is asymptotically negligible compared to the argument
O(·)    absolute value of this term is asymptotically smaller than a constant multiple of the argument
P(x)    (marginal) probability or probability density of x
P(y | x)    conditional probability or probability density of y given x
P(x, y)    joint probability or probability density of x and y
P[·]    probability of an event
Q    order of polynomial transform
Q_f    complexity of f (order of polynomial defining f)
R    the set of real numbers
R^d    d-dimensional Euclidean space
s    signal s = w^T x = Σ_i w_i x_i (i goes from 0 to d or 1 to d depending on whether x has the x_0 = 1 coordinate or not)
sign(·)    sign function, returning +1 for positive and -1 for negative
sup_a(·)    supremum; smallest value that is ≥ the argument for all a
T    number of iterations, number of epochs
t    iteration number or epoch number
tanh(·)    hyperbolic tangent function; tanh(s) = (e^s - e^(-s))/(e^s + e^(-s))
trace(·)    trace of a square matrix (sum of its diagonal elements)
V    number of subsets in V-fold cross validation (V × K = N)
v    direction in gradient descent (not necessarily a unit vector)
v̂    unit vector version of v [gradient descent]
var    the variance term in the bias-variance decomposition
w    weight vector (column vector)
w̃    weight vector in the transformed space
ŵ    selected weight vector [pocket algorithm]
w*    weight vector that separates the data
w_lin    solution weight vector to linear regression
w_reg    regularized solution to linear regression with weight decay
w_PLA    solution weight vector of the perceptron learning algorithm
w_0    added coordinate in weight vector w to represent the bias b
x    the input x ∈ X. Often a column vector x ∈ R^d or x ∈ {1} × R^d; x is used if the input is a scalar.
x_0    added coordinate to x, fixed at x_0 = 1 to absorb the bias term in linear expressions
X    input space whose elements are x ∈ X
X    matrix whose rows are the data inputs x_n [linear regression]
XOR    exclusive OR function (returns 1 if the number of 1's in its input is odd)
y    the output y ∈ Y
y    column vector whose components are the data set outputs y_n [linear regression]
ŷ    estimate of y [linear regression]
Y    output space whose elements are y ∈ Y
Z    transformed input space whose elements are z = Φ(x)
Z    matrix whose rows are the transformed inputs z_n = Φ(x_n) [linear regression]
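As a minimal sketch (not part of the text), the snippet below writes out a few of the quantities defined above in NumPy: the logistic function θ(s), the signal s = w^T x, and the in-sample error E_in as an average of pointwise errors e(h(x_n), y_n). The function names, the toy weight vector, and the toy data are assumptions made only for this illustration.

import numpy as np

# Illustrative sketch: a few quantities from the notation list, written in NumPy.
# All names and data here are hypothetical examples, not definitions from the text.

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s)."""
    return np.exp(s) / (1.0 + np.exp(s))

def signal(w, x):
    """Signal s = w^T x, where x includes the added coordinate x_0 = 1."""
    return np.dot(w, x)

def E_in(h, X, y):
    """In-sample error: average of pointwise errors e(h(x_n), y_n) = (h(x_n) - y_n)^2."""
    return np.mean([(h(x_n) - y_n) ** 2 for x_n, y_n in zip(X, y)])

# Toy usage: a linear hypothesis h(x) = sign(w^T x) on inputs in {1} x R^2.
w = np.array([-0.5, 1.0, 2.0])       # w_0 = -0.5 absorbs the bias b
X = np.array([[1.0, 0.3, -0.2],      # each row is (1, x_1, x_2)
              [1.0, -1.0, 0.5]])
y = np.array([1.0, -1.0])

h = lambda x: np.sign(signal(w, x))  # sign of the signal
print("theta(0) =", theta(0.0))      # 0.5
print("tanh(0)  =", np.tanh(0.0))    # 0.0
print("E_in     =", E_in(h, X, y))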