10 KRR
Machine Learning
Siamac Fazli
Human Categorization   Kernel Methods    Kernel Ridge Regression   Cross-Validation   Summary
Overview
Categorization Revisited
[Figure: two scatter plots of positive and negative examples in the (x1, x2) plane; both axes range from −3 to 3.]
Non-linear models (such as neural networks) can learn problems that are not linearly separable.
Ivan Pavlov
Pavlovian Conditioning (example from [Jäkel et al., 2008]):
      1. Condition dog to salivate after 1000 Hz tone
      2. Present other tones (900 Hz, 1100 Hz, ...)
      3. Measure amount of saliva secreted
      4. Perceptual similarity ∝ amount of saliva
Gaussian Kernel
[Figure: Gaussian kernel in R^1, a curve k(µ, x) over x, centered at µ with width σ.]
k(\mu, x) = e^{-(\mu - x)^2 / \sigma}    (1)
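A minimal sketch of Eq. (1) as a similarity measure, in the spirit of the tone example. The function name `gaussian_kernel_1d`, the width `sigma=1e4` and the tone values are illustrative assumptions, not measurements or code from the lecture.

```python
import numpy as np

def gaussian_kernel_1d(mu, x, sigma):
    """Gaussian kernel from Eq. (1): exp(-(mu - x)^2 / sigma)."""
    return np.exp(-((mu - x) ** 2) / sigma)

# Illustrative values only: similarity of three test tones (in Hz)
# to a training tone at 1000 Hz. sigma is an assumed kernel width.
mu = 1000.0
tones = np.array([900.0, 1000.0, 1100.0])
similarity = gaussian_kernel_1d(mu, tones, sigma=1e4)
print(similarity)
```

The similarity is largest at the training tone itself and falls off symmetrically on both sides; a larger `sigma` makes the fall-off slower.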
Multilayer Perceptron
Kernel Trick
      Prediction (e.g. category membership) ŷ given a new data point x_new can be expressed
      as a linear combination of kernel evaluations between x_new and the data points x_i:
      \hat{y} = f(x_{new}) = \sum_{i=1}^{N} k_\phi(x_{new}, x_i) \, \alpha_i    (2)
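Eq. (2) can be sketched directly as a weighted sum. The Gaussian kernel, the name `predict`, the training points and the weights below are all hypothetical choices for illustration; data points are stored one per row.

```python
import numpy as np

def gaussian_kernel(x_new, x_i, sigma=1.0):
    # Gaussian kernel between two points; sigma is an assumed width.
    return np.exp(-np.sum((x_new - x_i) ** 2) / (2.0 * sigma ** 2))

def predict(x_new, X_train, alpha, kernel=gaussian_kernel):
    """Eq. (2): y_hat = sum_i k(x_new, x_i) * alpha_i."""
    return sum(kernel(x_new, x_i) * a_i for x_i, a_i in zip(X_train, alpha))

# Hypothetical training points (one per row) and weights, for illustration only.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
alpha = np.array([0.5, -1.0, 2.0])
y_hat = predict(np.array([0.0, 0.0]), X_train, alpha)
print(y_hat)
```

Training points close to `x_new` contribute most to the prediction, since their kernel values are close to 1.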
Linear Kernel
Polynomial Kernel
Gaussian Kernel
k(x_i, x_j) = e^{-\|x_i - x_j\|^2 / (2\sigma^2)}    (5)
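A minimal sketch of how kernel matrices might be computed for a whole data set at once. It assumes data points are stored one per row, uses the standard inner-product form for the linear kernel and Eq. (5) for the Gaussian kernel; the function names and the random data are hypothetical.

```python
import numpy as np

def linear_kernel(A, B):
    # Inner products between all rows of A and all rows of B.
    return A @ B.T

def gaussian_kernel(A, B, sigma=1.0):
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, then Eq. (5).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # 5 hypothetical points in R^2, one per row
K = gaussian_kernel(X, X, sigma=0.5)
print(K.shape)
```

The resulting Gaussian kernel matrix is symmetric with ones on the diagonal, because every point has distance zero to itself.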
          • Training
            Compute weights α ∈ R^{N_train × 1}, one for each training sample
          • Testing
            Compute predictions on test data X_test
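The two phases can be sketched as two small functions. This assumes the closed-form solution α = (K + λI)^{-1} y from Eq. (13) in these slides, a Gaussian kernel, and data stored one sample per row; the names `krr_train` and `krr_predict` and the toy data are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_train(X_train, y_train, lam=1e-3, sigma=1.0):
    """Training: one weight alpha_i per training sample."""
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def krr_predict(X_test, X_train, alpha, sigma=1.0):
    """Testing: kernel between test and training points, times alpha."""
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

# Hypothetical 1-D toy data, for illustration only.
X_train = np.linspace(-3, 3, 20)[:, None]
y_train = np.sin(X_train[:, 0])
alpha = krr_train(X_train, y_train, lam=1e-3, sigma=1.0)
y_pred = krr_predict(X_train, X_train, alpha, sigma=1.0)
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a common design choice for numerical stability.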
\frac{\partial E_{RR}(w)}{\partial w} = -2 X y^\top + 2 X X^\top w + 2 \lambda w    (10)
\rightarrow w = X \underbrace{\frac{1}{\lambda} (y^\top - X^\top w)}_{a} = X a    (11)
a = \frac{1}{\lambda} (y^\top - X^\top w)
\lambda a = y^\top - X^\top X a
y^\top = (X^\top X + I \lambda) a
\rightarrow a = (\underbrace{X^\top X}_{K} + I \lambda)^{-1} y^\top    (12)
a = \frac{1}{\lambda} (y^\top - X^\top w)
a = (K + I \lambda)^{-1} y^\top    (13)
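A small numerical check of the derivation, as a sketch on random data. It follows the convention of Eqs. (10) to (13), where the columns of X are data points and y is a row vector; the sizes D, N and the value of λ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, lam = 3, 10, 0.1
X = rng.normal(size=(D, N))   # columns are data points, as in Eqs. (10)-(13)
y = rng.normal(size=(1, N))   # row vector of targets

# Dual solution, Eq. (13), with the linear kernel K = X^T X.
K = X.T @ X
a = np.linalg.solve(K + lam * np.eye(N), y.T)
w_dual = X @ a                # Eq. (11): w = X a

# Primal ridge solution obtained by setting Eq. (10) to zero.
w_primal = np.linalg.solve(X @ X.T + lam * np.eye(D), X @ y.T)

print(np.allclose(w_dual, w_primal))
```

The dual form only needs inner products between data points (the matrix K), which is what allows X^T X to be swapped for another kernel.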
For many algorithms there are kernel versions [Müller et al., 2001].
Gaussian Processes
      A Gaussian Process
        • describes a distribution over functions
        • is completely specified by its mean and covariance function
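To illustrate "a distribution over functions", the sketch below draws a few functions from a GP prior. The zero mean function, the Gaussian kernel as covariance function, the input grid and the jitter term are assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Covariance function: Gaussian kernel between 1-D input arrays.
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

x = np.linspace(-3.0, 3.0, 50)
mean = np.zeros_like(x)                           # assumed zero mean function
cov = gaussian_kernel(x, x) + 1e-8 * np.eye(50)   # small jitter for numerical stability

rng = np.random.default_rng(0)
# Each row is one function f(x) evaluated on the grid x.
samples = rng.multivariate_normal(mean, cov, size=3)
print(samples.shape)
```

Changing `sigma` changes how quickly the sampled functions vary: the mean and covariance function together determine the whole distribution.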
There is a lot to know about Gaussian Processes (GPs), most of which can be found in [Rasmussen and Williams, 2005].
      notes:
         • x indicates train data and x̂ indicates test data
         • the first and second lines calculate the Gaussian kernel between train and test data all at once (K ∈ R^{n×m})
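A minimal sketch of such an all-at-once kernel computation, not the original listing. It assumes n training points in `X` and m test points in `X_hat`, stored one per row, and a hypothetical function name `cross_kernel`.

```python
import numpy as np

def cross_kernel(X, X_hat, sigma=1.0):
    """Gaussian kernel between all train rows X (n, d) and test rows X_hat (m, d)."""
    # Broadcasting builds every pairwise difference at once, without Python loops.
    diff = X[:, None, :] - X_hat[None, :, :]      # shape (n, m, d)
    sq_dist = np.sum(diff ** 2, axis=2)           # shape (n, m)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))  # K in R^{n x m}

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))       # hypothetical train data, n = 6
X_hat = rng.normal(size=(4, 2))   # hypothetical test data, m = 4
K = cross_kernel(X, X_hat)
print(K.shape)
```

Entry `K[i, j]` holds the kernel value between training point i and test point j.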
Hyperparameter Selection
[Figure: panels of f(x) against x for σ = 0.5, σ = 1.0 and σ = 2.0; the second parameter in each panel title is 0.500.]
Generalization and Model Selection
The best model is the model that generalizes best.
[Figure: model fits in the (x1, x2) plane, axes from −3 to 3, for σ = 0.2, 0.5 and 2.0, each with κ = 10^-6 and κ = 100.]
      • σ = 0.2, κ = 10^-6: Overfitting. Model is too complex → Bad generalization
      • σ = 0.5, κ = 10^-6: Better fit. Appropriate complexity → Good generalization
      • σ = 2.0, κ = 10^-6: Underfitting. Model is too simple → Bad generalization
σ - kernel width, κ - regularizer
Cross-Validation
              fold 2   [ x1, x2, x3, x4, x5, x6 ], split into a test part F_1^{test} and a training part F_1^{train}
              fold 3   ...
      For each fold:
              Train your model on the training data
              Test your model on the test data
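The fold loop above can be sketched as a small index generator. The name `k_fold_indices`, the shuffling with a fixed seed, and the choice of three folds over six samples are assumptions for illustration.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

# Six samples x1..x6 (indices 0..5) split into three folds.
splits = list(k_fold_indices(n_samples=6, n_folds=3))
for train_idx, test_idx in splits:
    # Train your model on train_idx, then test it on test_idx.
    print("train:", np.sort(train_idx), "test:", np.sort(test_idx))
```

Train and test indices never overlap within a fold, so each test score is computed on data the model has not seen during training.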
              Model Evaluation
              Report the mean evaluation score (e.g. accuracy) across folds
              Model Selection
              Choose the parameter value with the highest mean score across folds
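A minimal sketch of model selection by cross-validation for kernel ridge regression. Because the toy task is a regression, the score used here is the negative mean squared error rather than accuracy; the data, the candidate widths and the helper name `cv_score` are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise Gaussian kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cv_score(X, y, sigma, lam, n_folds=5):
    """Mean negative squared error of kernel ridge regression across folds."""
    folds = np.array_split(np.arange(len(X)), n_folds)
    scores = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        K = gaussian_kernel(X[train], X[train], sigma)
        alpha = np.linalg.solve(K + lam * np.eye(len(train)), y[train])
        pred = gaussian_kernel(X[test], X[train], sigma) @ alpha
        scores.append(-np.mean((pred - y[test]) ** 2))  # higher is better
    return np.mean(scores)

# Hypothetical noisy 1-D regression data, drawn in random order.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

sigmas = [0.2, 0.5, 2.0]          # candidate kernel widths
mean_scores = [cv_score(X, y, s, lam=1e-3) for s in sigmas]
best_sigma = sigmas[int(np.argmax(mean_scores))]
print(best_sigma)
```

The same loop extends to a grid over both the kernel width and the regularizer by scoring every pair of values.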
Nested Cross-Validation
Summary
Further reading
References
      A. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
      C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer US, 2007.
      T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. 2003.
      F. Jäkel. Some Theoretical Aspects of Human Categorization Behaviour: Similarity and Generalization. PhD thesis, 2007.
      F. Jäkel, B. Schölkopf, and F. A. Wichmann. Generalization and similarity in exemplar models of categorization: insights from machine learning. Psychon Bull Rev, 15(2):256–71, 2008.
      K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2):181–201, 2001.
      K. P. Murphy. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning. The MIT Press, 1st edition, 2012. ISBN 0262018020, 9780262018029.
      C. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2005.
      B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
      J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
      R. N. Shepard. Toward a universal law of generalization for psychological science. Science, 237(4820):1317–23, 1987.