SURF-Face: Face Recognition Under Viewpoint Consistency Constraints
                       Philippe Dreuw, Pascal Steingrube, Harald Hanselmann and Hermann Ney
 Human Language Technology and Pattern Recognition, RWTH Aachen University, Aachen, Germany

Introduction
- Most face recognition approaches are sensitive to registration errors
  - they rely on a very good initial alignment and illumination
- We propose/analyze:
  - grid-based and dense extraction of local features
  - block-based matching accounting for different viewpoints and registration errors

Databases
- AR-Face
  - variations in illumination
  - many different facial expressions
- CMU-PIE
  - variations in illumination (frontal images from the illumination subset)

Feature Extraction
[Figure: original image (Orig.) with interest-point-based (IP) and grid-based (Grid) feature extraction]
- Interest-point-based feature extraction
  - SIFT or SURF interest point detector
  - leads to a very sparse description
- Grid-based feature extraction
  - overlaid regular grid
  - leads to a dense description

Results: Manually Aligned Faces
- AR-Face: 110 classes, 770 train, 770 test

Descriptor   Extraction     # Features              Error Rates [%]
                            (dim. × count)          Maximum     Grid  Grid-Best
SURF-64      IPs             64 × 5.6 (avg.)          80.64    84.15      84.15
SIFT         IPs            128 × 633.78 (avg.)        1.03    95.84      95.84
SURF-64      64x64-2 grid    64 × 1024                 0.90     0.51       0.90
SURF-128     64x64-2 grid   128 × 1024                 0.90     0.51       0.38
SIFT         64x64-2 grid   128 × 1024                11.03     0.90       0.64
U-SURF-64    64x64-2 grid    64 × 1024                 0.90     1.03       0.64
U-SURF-128   64x64-2 grid   128 × 1024                 1.55     1.29       1.03
U-SIFT       64x64-2 grid   128 × 1024                 0.25     0.25       0.25
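The grid-based extraction and the upright descriptors compared in the table can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: "64x64-2" is read as a 64×64 face image sampled every 2 pixels (consistent with the 1024 features listed), each descriptor uses a 20×20 window (SURF's 20s window at an assumed scale s = 1), finite-difference gradients stand in for SURF's Haar wavelet responses, and SURF's Gaussian weighting is omitted. "Upright" here means that no dominant orientation is estimated, so the descriptor is not rotation invariant. `grid_positions`, `upright_surf_like`, `dense_descriptors`, `WINDOW` and `SUB` are hypothetical names.

```python
import numpy as np

WINDOW = 20  # SURF samples a 20s x 20s window; scale s = 1 is an assumption here
SUB = 5      # 4 x 4 sub-regions of 5 x 5 samples each


def grid_positions(height=64, width=64, step=2):
    """Centres of a regular grid; '64x64-2' is read as a 64x64 image with step 2 (assumption)."""
    ys, xs = np.meshgrid(np.arange(step // 2, height, step),
                         np.arange(step // 2, width, step), indexing="ij")
    return np.stack([ys.ravel(), xs.ravel()], axis=1)


def upright_surf_like(dy, dx, y, x, extended=False):
    """Upright SURF-style descriptor centred at (y, x) of the padded gradient images.

    No dominant orientation is estimated (the 'upright' variant). Finite-difference
    gradients replace SURF's Haar wavelet responses; Gaussian weighting is omitted.
    """
    h = WINDOW // 2
    wy, wx = dy[y - h:y + h, x - h:x + h], dx[y - h:y + h, x - h:x + h]
    feats = []
    for i in range(0, WINDOW, SUB):
        for j in range(0, WINDOW, SUB):
            sy, sx = wy[i:i + SUB, j:j + SUB], wx[i:i + SUB, j:j + SUB]
            if not extended:  # 64-dim: (sum dx, sum dy, sum |dx|, sum |dy|) per sub-region
                feats += [sx.sum(), sy.sum(), np.abs(sx).sum(), np.abs(sy).sum()]
            else:             # 128-dim: dx sums split by sign of dy, dy sums by sign of dx
                for m in (sy < 0, sy >= 0):
                    feats += [sx[m].sum(), np.abs(sx[m]).sum()]
                for m in (sx < 0, sx >= 0):
                    feats += [sy[m].sum(), np.abs(sy[m]).sum()]
    v = np.asarray(feats, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v  # normalise to unit length


def dense_descriptors(img, step=2, extended=False):
    """Descriptors at every grid point; shape (num_points, 64) or (num_points, 128)."""
    img = np.asarray(img, dtype=float)
    h = WINDOW // 2
    padded = np.pad(img, h, mode="edge")  # keeps windows of border grid points inside the array
    dy, dx = np.gradient(padded)          # derivatives along rows (y) and columns (x)
    pts = grid_positions(img.shape[0], img.shape[1], step)
    return np.stack([upright_surf_like(dy, dx, y + h, x + h, extended) for y, x in pts])
```

For a 64×64 image, `dense_descriptors(img)` returns a (1024, 64) array and `dense_descriptors(img, extended=True)` a (1024, 128) array, i.e. the 64 × 1024 and 128 × 1024 entries of the grid rows.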

- CMU-PIE (manually aligned faces): 68 classes, 68 train (“one-shot” training), 1360 test

Descriptor   Extraction     # Features              Error Rates [%]
                            (dim. × count)          Maximum     Grid  Grid-Best
SURF-64      IPs             64 × 6.80 (avg.)         93.95    95.21      95.21
SIFT         IPs            128 × 723.17 (avg.)       43.47    99.33      99.33
SURF-64      64x64-2 grid    64 × 1024                13.41     4.12       7.82
SURF-128     64x64-2 grid   128 × 1024                12.45     3.68       3.24
SIFT         64x64-2 grid   128 × 1024                27.92     7.00       9.80
U-SURF-64    64x64-2 grid    64 × 1024                 3.83     0.51       0.66
U-SURF-128   64x64-2 grid   128 × 1024                 5.67     0.95       0.88
U-SIFT       64x64-2 grid   128 × 1024                16.28     1.40       6.41

Feature Description
- Scale Invariant Feature Transform (SIFT)
  - 128-dimensional descriptor, histogram of gradients, scale invariant
- Speeded Up Robust Features (SURF)
  - 64-dimensional descriptor (128-dimensional in the extended SURF-128 version), built from sums of Haar wavelet responses, scale invariant
- Face recognition: invariance w.r.t. rotation is often not necessary
  - rotation-dependent upright versions: U-SIFT, U-SURF-64, U-SURF-128

Feature Matching
- Recognition by matching
  - nearest neighbor matching strategy
  - descriptor vectors extracted at keypoints in a test image X are compared to all descriptor vectors extracted at keypoints from the reference images Y_n, n = 1, ..., N, by the Euclidean distance
  - decision rule, with Y_{n,c} denoting reference image n of class c:

    $$X \rightarrow r(X) = \arg\max_{c} \Big\{ \max_{n} \sum_{x_i \in X} \delta(x_i, Y_{n,c}) \Big\}$$

  - additionally, a ratio constraint is applied in δ(x_i, Y_{n,c})
- Viewpoint matching constraints
  - maximum matching: unconstrained
  - grid-based matching: absolute box constraints
  - grid-based best matching: absolute box constraints, overlapping

Results: Unaligned Faces
- Faces automatically aligned by the Viola & Jones face detector
[Figure: example manually aligned faces and unaligned faces]

Descriptor     Error Rates [%]
               AR-Face   CMU-PIE
SURF-64           5.97     15.32
SURF-128          5.71     11.42
SIFT              5.45      8.32
U-SURF-64         5.32      5.52
U-SURF-128        5.71      4.86
U-SIFT            4.15      8.99
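The matching procedure described under Feature Matching (Euclidean nearest-neighbour search, a ratio constraint inside δ, the decision rule r(X), and an optional box constraint) can be sketched as follows. The poster does not give the exact form of the ratio or box constraints, so this sketch assumes a Lowe-style ratio test with a hypothetical threshold of 0.8 and a square window of ±`box` pixels around each test keypoint (overlapping boxes, in the spirit of grid-based best matching). `delta`, `score`, `classify`, `ratio` and `box` are hypothetical names.

```python
import numpy as np


def delta(x, ref_desc, x_pos=None, ref_pos=None, ratio=0.8, box=None):
    """Return 1 if descriptor x has an accepted nearest neighbour in one reference image, else 0.

    Assumed ratio constraint: nearest distance < ratio * second-nearest distance.
    Assumed box constraint: only reference keypoints within +-box pixels of x_pos are
    candidates; box=None corresponds to unconstrained ('maximum') matching.
    """
    d = np.linalg.norm(ref_desc - x, axis=1)  # Euclidean distance to every reference descriptor
    if box is not None:
        d = d[np.all(np.abs(ref_pos - x_pos) <= box, axis=1)]
    if d.size < 2:  # the ratio test needs two candidates
        return 0
    d1, d2 = np.partition(d, 1)[:2]  # two smallest distances, in order
    return int(d1 < ratio * d2)


def score(test_desc, test_pos, ref_desc, ref_pos, **kw):
    """Sum of delta(x_i, Y_{n,c}) over all test descriptors x_i for one reference image."""
    return sum(delta(x, ref_desc, p, ref_pos, **kw) for x, p in zip(test_desc, test_pos))


def classify(test_desc, test_pos, references, **kw):
    """r(X) = argmax_c max_n sum_i delta(x_i, Y_{n,c}).

    references: list of (class_label, descriptors, positions), one tuple per reference image.
    """
    best = {}
    for label, desc, pos in references:
        s = score(test_desc, test_pos, desc, pos, **kw)
        best[label] = max(best.get(label, 0), s)  # max over reference images n of class c
    return max(best, key=best.get)                # argmax over classes c
```

Calling `classify(test_desc, test_pos, references)` without `box` corresponds to unconstrained (maximum) matching; passing e.g. `box=8` restricts each test descriptor to reference keypoints near its own position. A non-overlapping grid variant could instead compare fixed block indices such as `pos // block_size`; that reading is likewise an assumption.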
- Postprocessing of the matching results
  - RANSAC-based outlier removal
  - RANSAC-based system combination

Results: Partially Occluded Faces
- AR-Face: 110 classes, 110 train (“one-shot” training), 550 test

Descriptor           Error Rates [%]
                     AR1scarf  AR1sun  ARneutral  AR2scarf  AR2sun    Avg.
SURF-64                  2.72   30.00       0.00      4.54   47.27   16.90
SURF-128                 1.81   23.63       0.00      3.63   40.90   13.99
SIFT                     1.81   24.54       0.00      2.72   44.54   14.72
U-SURF-64                4.54   23.63       0.00      4.54   47.27   15.99
U-SURF-128               1.81   20.00       0.00      3.63   41.81   13.45
U-SIFT                   1.81   20.90       0.00      1.81   38.18   12.54
U-SURF-128+R             1.81   19.09       0.00      3.63   43.63   13.63
U-SIFT+R                 2.72   14.54       0.00      0.90   35.45   10.72
U-SURF-128+U-SIFT+R      0.90   16.36       0.00      2.72   32.72   10.54

Matching Examples for the AR-Face and CMU-PIE Database
[Figure: example matches with Maximum, Grid and Grid-Best matching for the AR-Face (left) and CMU-PIE (right) databases; feature rows SIFT and U-SIFT (left), SURF and U-SURF (right)]
- Matching results for the AR-Face (left) and the CMU-PIE database (right)
  - maximum matching shows false classification examples
  - grid-based matchings show correct classification examples
  - upright descriptor versions reduce the number of false matches

Conclusions
- Grid-based local feature extraction instead of interest points
- Local descriptors:
  - upright descriptor versions achieved better results
  - SURF-128 better than SURF-64
- System robustness: manually aligned/unaligned/partially occluded faces
  - SURF more robust to illumination
  - SIFT more robust to changes in viewing conditions
- RANSAC-based system combination and outlier removal
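The RANSAC-based outlier removal listed under Postprocessing and in the Conclusions can be sketched as follows. The geometric model (a 2D affine transform), the minimal sample of three matches, the 3-pixel inlier threshold and the 200 iterations are all assumptions, not settings taken from the poster; `fit_affine` and `ransac_inliers` are hypothetical names. `src` and `dst` hold the keypoint positions of the accepted matches in the test image and in one reference image.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src (k, 2) onto dst (k, 2); returns a (3, 2) matrix."""
    A = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coordinates [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M


def ransac_inliers(src, dst, iters=200, thresh=3.0, seed=0):
    """Boolean mask of matches consistent with the best affine model found by RANSAC."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    if len(src) < 3:  # not enough matches for a minimal affine sample
        return best
    A = np.hstack([src, np.ones((len(src), 1))])
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)  # minimal sample for an affine model
        M = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(A @ M - dst, axis=1)     # transfer error of every match
        inliers = err < thresh
        if inliers.sum() > best.sum():                 # keep the model with most support
            best = inliers
    return best
```

The number of inliers (`ransac_inliers(src, dst).sum()`) could replace the raw match count Σ δ in the decision rule, so that geometrically inconsistent matches no longer vote for a class. For a system combination, one option would be to pool the matches of two descriptors (e.g. U-SURF-128 and U-SIFT) before running RANSAC; the poster does not state how its combination is performed.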
http://www-i6.informatik.rwth-aachen.de    <surname>@cs.rwth-aachen.de
Created with LaTeX beamerposter: http://www-i6.informatik.rwth-aachen.de/~dreuw/latexbeamerposter.php