What kinds of features might one use to establish a set of correspondences between these images?
                             Points and patches
• Required as a precursor to:
   – computing camera pose
   – computing sparse point clouds from multiple views (SfM, Visual SLAM)
   – aligning different images
   – stitching image mosaics
   – performing video stabilization
   – object instance and category recognition
Keypoint detection, description and matching
[Figure: the keypoint detection, description, and matching pipeline]
1. Detect keypoints $k_1, k_2, \dots, k_n$ in each image.
2. Describe the feature of each keypoint with a descriptor: $\boldsymbol{d}_{k_1}, \boldsymbol{d}_{k_2}, \dots, \boldsymbol{d}_{k_n}$ (and $\boldsymbol{d}_{k'_1}, \boldsymbol{d}_{k'_2}, \dots, \boldsymbol{d}_{k'_m}$ for the other view).
3. Match the features of keypoints detected in different views: if $\|\boldsymbol{d}_{k_i} - \boldsymbol{d}_{k'_j}\| < th$, the pair is matched.
Feature detectors (keypoint detection)
   Notice how some patches can be
   localized or matched with higher accuracy than others.
Keypoints need to be easy to match
• The simplest possible matching criterion for comparing two image patches is their summed square difference (SSD):
$$E_{SSD}(\boldsymbol{u}) = \sum_i \left[I_1(\boldsymbol{x}_i + \boldsymbol{u}) - I_0(\boldsymbol{x}_i)\right]^2$$
• Stability metric with respect to small variations in position $\Delta\boldsymbol{u}$: an auto-correlation (AC) function or surface
$$E_{AC}(\Delta\boldsymbol{u}) = \sum_i \left[I(\boldsymbol{x}_i + \Delta\boldsymbol{u}) - I(\boldsymbol{x}_i)\right]^2$$
Examples: three auto-correlation surfaces $E_{AC}(\Delta\boldsymbol{u})$
[Figure: three auto-correlation surfaces plotted over $(\Delta u_x, \Delta u_y)$; a surface that rises sharply in every direction away from $\Delta\boldsymbol{u} = \boldsymbol{0}$ indicates a good feature to track.]
Approximation of $E_{AC}(\Delta\boldsymbol{u})$

$$E_{AC}(\Delta\boldsymbol{u}) = \sum_i \left[I(\boldsymbol{x}_i + \Delta\boldsymbol{u}) - I(\boldsymbol{x}_i)\right]^2 \approx \sum_i \left[I(\boldsymbol{x}_i) + \left(\frac{\partial I(\boldsymbol{x}_i)}{\partial x}, \frac{\partial I(\boldsymbol{x}_i)}{\partial y}\right) \cdot \Delta\boldsymbol{u} - I(\boldsymbol{x}_i)\right]^2$$

$$= \sum_i \left[\left(\frac{\partial I(\boldsymbol{x}_i)}{\partial x}, \frac{\partial I(\boldsymbol{x}_i)}{\partial y}\right) \cdot \Delta\boldsymbol{u}\right]^2 = \sum_i \left(I_x \Delta u_x + I_y \Delta u_y\right)^2$$

$$= \sum_i \left(I_x^2 \Delta u_x^2 + 2 I_x I_y \Delta u_x \Delta u_y + I_y^2 \Delta u_y^2\right) = \Delta u_x^2 \sum_i I_x^2 + 2 \Delta u_x \Delta u_y \sum_i I_x I_y + \Delta u_y^2 \sum_i I_y^2$$

$$= \begin{pmatrix}\Delta u_x & \Delta u_y\end{pmatrix} \begin{pmatrix}\sum_i I_x^2 & \sum_i I_x I_y \\ \sum_i I_x I_y & \sum_i I_y^2\end{pmatrix} \begin{pmatrix}\Delta u_x \\ \Delta u_y\end{pmatrix} = \Delta\boldsymbol{u}^T \boldsymbol{A}\, \Delta\boldsymbol{u}$$

where $\nabla I(\boldsymbol{x}_i) = \left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right)(\boldsymbol{x}_i)$ and $\boldsymbol{A}$ is the autocorrelation matrix.
Uncertainty ellipse corresponding to an eigenvalue analysis of the autocorrelation matrix $\boldsymbol{A}$

$$\boldsymbol{A} = \begin{pmatrix}\sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2\end{pmatrix}$$

The eigenvalues $\lambda_0 \le \lambda_1$ and eigenvectors of $\boldsymbol{A}$ define an uncertainty ellipse whose axis lengths are proportional to $\lambda_0^{-1/2}$ and $\lambda_1^{-1/2}$.
Good features to track [Shi and Tomasi, CVPR94]

Key points (feature points):
   A patch defined by a window (e.g., 25×25) is accepted as a candidate feature if, at the center of the window, both eigenvalues of $\boldsymbol{A}$, $\lambda_0$ and $\lambda_1$, exceed a predefined threshold $\lambda_{th}$.
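OpenCV exposes this minimum-eigenvalue criterion through cv2.goodFeaturesToTrack; a brief sketch with illustrative parameters:

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
# Keeps points whose smaller eigenvalue of A exceeds
# qualityLevel * (the best response found in the image)
corners = cv2.goodFeaturesToTrack(img, maxCorners=500,
                                  qualityLevel=0.01, minDistance=10)
```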
 Outline of a basic feature detection algorithm
1. Compute the horizontal and vertical derivatives $I_x(\boldsymbol{x})$ and $I_y(\boldsymbol{x})$ by convolving the input image $I(\boldsymbol{x})$ with derivatives of Gaussians, a Sobel filter, etc.
2. Compute the three images $I_x^2(\boldsymbol{x})$, $I_x(\boldsymbol{x})I_y(\boldsymbol{x})$, and $I_y^2(\boldsymbol{x})$.
3. Convolve each of these three images with a larger Gaussian.
4. Form $\boldsymbol{A}$ for each pixel from these three images:
$$\boldsymbol{A} = \begin{pmatrix}\sum I_x^2(\boldsymbol{x}) & \sum I_x(\boldsymbol{x}) I_y(\boldsymbol{x}) \\ \sum I_x(\boldsymbol{x}) I_y(\boldsymbol{x}) & \sum I_y^2(\boldsymbol{x})\end{pmatrix}$$
5. Compute the eigenvalues $\lambda_0$ and $\lambda_1$ ($\lambda_0 < \lambda_1$) as a scalar interest measure $\lambda_0(\boldsymbol{x})$.
6. Find local maxima above a certain threshold and report them as detected feature point locations (keypoints $k_1, k_2, \dots, k_n$).
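A NumPy/SciPy sketch of steps 1–5, assuming Sobel derivatives and a Gaussian window; the thresholding and non-maximum suppression of step 6 are left out:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def interest_measure(I, sigma=2.0):
    I = I.astype(float)
    # Step 1: horizontal and vertical derivatives
    Ix, Iy = sobel(I, axis=1), sobel(I, axis=0)
    # Steps 2-3: derivative products, smoothed with a larger Gaussian
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    # Steps 4-5: smaller eigenvalue of A = [[Sxx, Sxy], [Sxy, Syy]] per pixel
    tr, det = Sxx + Syy, Sxx * Syy - Sxy ** 2
    lam0 = tr / 2 - np.sqrt(np.maximum((tr / 2) ** 2 - det, 0))
    return lam0  # step 6 (threshold + local maxima) would follow
```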
Popular Keypoint detectors
• Shi and Tomasi (1994)
   – minimum eigenvalue $\lambda_0$
• Harris and Stephens (1988)
   – $\det(\boldsymbol{A}) - \alpha\,\mathrm{trace}(\boldsymbol{A})^2$
• Triggs (2004)
   – $\lambda_0 - \alpha\lambda_1$
• Brown, Szeliski, and Winder (2005)
   – harmonic mean $\det(\boldsymbol{A})/\mathrm{trace}(\boldsymbol{A})$
Isocontours of popular keypoint detection functions
Adaptive Non-Maximal Suppression (ANMS)
Simple thresholding tends to produce an uneven distribution of feature points across the image. ANMS only detects features that are both local maxima and whose response value is significantly (10%) greater than that of all of their neighbors within a radius r (a sketch follows below).
[Figure: features selected with ANMS, top 250 and top 500]
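A simple quadratic-time sketch of this rule, assuming arrays of keypoint coordinates and response values; the 0.9 factor corresponds to the 10% margin mentioned above:

```python
import numpy as np

def anms(pts, responses, top_k=250, c_robust=0.9):
    # For each point, the suppression radius is the distance to the nearest
    # point that is significantly (10%) stronger; keep the largest radii.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    stronger = c_robust * responses[None, :] > responses[:, None]
    radii = np.where(stronger, d2, np.inf).min(axis=1)
    return pts[np.argsort(-radii)[:top_k]]
```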
Feature descriptors
We know how to detect good points.
Next question: how do we match them?
Lots of possibilities (this is a popular research area):
   – Simple option: match square windows around the point
   – SIFT, ORB, AKAZE, etc.
                      Feature descriptors
• After detecting features (keypoints), we must match them.
   – i.e., determine which features come from corresponding locations in different images.
• A simple error metric: the sum of squared differences (SSD) or normalized cross-correlation (NCC)
   – but this only works well when the motion is nearly translational and the scale is unchanged
• The local appearance of features will change in:
   – Orientation
   – Scale
   – Affine / Perspective deformations
   …
                                    Invariance
Suppose we are comparing two images I1 and I2 (I2 may be a transformed version of I1)
We want to find the same features regardless of the transformation: transformational invariance
    – Translation, 2D rotation, scale
    – Limited 3D rotations (SIFT works up to about 60 degrees)
    – Limited affine transformations (some are fully affine invariant)
    – Limited illumination/contrast changes
Scale invariant feature transform (SIFT)
[Figure: (a) image gradients in a 16×16 pixel window aligned to the dominant orientation; (b) keypoint descriptor]

Image gradients $(I_x, I_y)$, orientations $\tan^{-1}(I_y/I_x)$, and magnitudes $\sqrt{I_x^2 + I_y^2}$ are computed at each pixel and weighted by a Gaussian fall-off function (blue circle). A weighted gradient orientation histogram is then computed in each subregion, using trilinear interpolation. The final descriptor has 8 bins × 16 subregions = 128 dimensions.
Scale-space feature detection using a sub-octave Difference of Gaussian pyramid
                      (Keypoint detector proposed in SIFT)
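SIFT (both the DoG detector and the descriptor) is available in recent OpenCV builds; a minimal sketch:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
# Keypoints carry scale and orientation; descriptors are 128-D
# (8 orientation bins x 16 subregions)
keypoints, descriptors = sift.detectAndCompute(img, None)
```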
Feature matching
Establish some preliminary feature matches between two (or more) images.
ORB: an efficient alternative to SIFT or SURF [Ethan Rublee et al., ICCV 2011]
   ORB (Oriented FAST and Rotated BRIEF)
   https://docs.opencv.org/3.4/d1/d89/tutorial_py_orb.html
OpenCV feature point detection and description:
   https://docs.opencv.org/3.4/db/d27/tutorial_py_table_of_contents_feature2d.html
Matching strategy and error rates
The assumption for the feature descriptors:
   Euclidean distances in feature space can be used for ranking potential matches.
The simplest matching strategy:
   set a threshold and return all matches from other images within this threshold.
Nearest neighbor distance ratio:
$$\mathrm{NNDR} = \frac{d_1}{d_2}$$
where $d_1$ and $d_2$ are the distances to the nearest and second-nearest neighbor, respectively.
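The NNDR criterion is what OpenCV tutorials implement as the ratio test; a sketch with a brute-force matcher, where `des1` and `des2` stand for descriptor arrays from two views and 0.8 is a commonly used threshold:

```python
import cv2

bf = cv2.BFMatcher(cv2.NORM_L2)
# For each descriptor in des1, find its two nearest neighbors in des2
pairs = bf.knnMatch(des1, des2, k=2)
# Accept a match only if it is clearly better than the second-best one
good = [m for m, n in pairs if m.distance < 0.8 * n.distance]
```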
AKAZE (Accelerated KAZE) [Alcantarilla et al., BMVC 2013]
http://www.robesafe.com/personal/pablo.alcantarilla/kaze.html
Machine learning for feature detectors and descriptors
•   FAST and FASTER (Rosten, Porter, and Drummond 2010), one of the first learned detectors
•   Learning covariant feature detectors (Lenc and Vedaldi 2016)
•   Learning to assign orientations to feature points (Yi, Verdie et al. 2016)
Jointly optimizing the detectors and descriptors in a single (multi-head) pipeline
• LIFT, learned invariant feature transforms (Yi, Trulls et al. 2016)
• SuperPoint, self-supervised interest point detection and description (DeTone, Malisiewicz, and
    Rabinovich 2018) https://github.com/magicleap/SuperPointPretrainedNetwork
• LF-Net, learning local features from images (Ono, Trulls et al. 2018)
•   Key.Net (Barroso-Laguna, Riba et al. 2019), which combines handcrafted and learned CNN
    features to produce state-of-the-art results on the HPatches benchmark
    (Balntas, Lenc et al. 2017)
•   D2D, Describe-to-Detect, (Tian, Balntas et al. 2020), which extracts dense local feature
    descriptors and then keeps the ones that have high saliency
                 Large-scale matching and retrieval
• A large (huge) number of objects in the database vs a query image
     – techniques are needed to quickly narrow down the search to a few likely images
•   Content-based image retrieval (CBIR) or Instance retrieval
     – the problem of finding a particular object in a large collection
•   Similar to document retrieval
     – which uses the frequency of occurrence of particular words in a document to quickly match it against a query.
• Visual Words
    – the high-dimensional feature descriptors that occur in each image must first be mapped into discrete visual words (see the sketch below).
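A sketch of this mapping with scikit-learn's k-means, where `train_descriptors` is an assumed (n × 128) array of descriptors collected from a training set:

```python
import numpy as np
from sklearn.cluster import KMeans

# Learn a vocabulary of visual words from the training descriptors
vocab = KMeans(n_clusters=1000).fit(train_descriptors)

def bow_histogram(descriptors, vocab):
    # Quantize each descriptor to its nearest visual word, then represent
    # the image as a word-frequency histogram (bag of visual words)
    words = vocab.predict(descriptors)
    return np.bincount(words, minlength=vocab.n_clusters)
```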
Typical pipeline for feature-based instance retrieval
Lines
Hough Transform
• Vote for possible line parameters for each detected edge point (see the sketch below)
[Figure: original vs. oriented Hough transform]
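A minimal OpenCV sketch: Canny edges followed by Hough voting over the (ρ, θ) line parameters; all thresholds are illustrative:

```python
import cv2
import numpy as np

img = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)
# Each edge pixel votes for all (rho, theta) lines passing through it;
# accumulator peaks with at least 200 votes are returned as lines
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)
```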
LSD: a Line Segment Detector
     https://doi.org/10.5201/ipol.2012.gjmr-lsd
                     Contour Tracking
Locating boundary curves corresponding to object boundaries
1. Snakes [Kass, Witkin, and Terzopoulos 1988]
    an energy-minimizing, two-dimensional spline curve that
    evolves (moves) towards image features such as strong edges.
2. Intelligent Scissors [Mortensen and Barrett 1995]
   allows the user to sketch, in real time, a curve that clings to object boundaries.
 Snakes (Kass, Witkin, and Terzopoulos 1988)
A snake is represented as a parametric curve $f(s) = (x(s), y(s))$, discretized at sample points $f(i) = (x(i), y(i))$.
                       Intelligent scissors
The system optimizes the contour in real time as the user is drawing.
                          Segmentation
Image segmentation is one of the oldest and most widely studied problems in computer vision.
Experimental comparisons on human-labeled databases
• Berkeley Segmentation Dataset and Benchmark [Martin, Fowlkes, Tal et al. 2001]
   http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/
   Many of the more recent image segmentation algorithms report comparative results on this database.
• A database of foreground and background segmentations [Alpert, Galun, Basri et al. 2007]
   http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/index.html
                 Mean shift and mode finding
• Model the feature vectors associated with each pixel (e.g., color and position) as samples from an unknown probability density function
• Try to find clusters (modes) in this distribution
               How can we find these clusters?
• The k-means and mixtures of Gaussians techniques
   – a parametric model of the density function
   – the density is the superposition of a small number of simpler distributions
   – the locations (centers) and shape (covariance) can be estimated.
• Mean shift
   – implicitly models the density function
     using a smooth continuous non-parametric model
   – smoothes the distribution
   – finds its peaks
   – the regions of feature space that correspond to each peak
   http://comaniciu.net/Papers/MsAnalysis.pdf
                                   K-means
•   implicitly models the probability density as a superposition of spherically symmetric
    distributions
•   given the number of clusters k, it finds the k cluster centers (see the sketch after this list)
•   iteratively updates the cluster center location based on the samples that are closest
    to each center
•   initialized by randomly sampling k centers from the input feature vectors
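A brief OpenCV sketch of these steps applied to pixel colors; k, the termination criteria, and the number of attempts are arbitrary choices:

```python
import cv2
import numpy as np

img = cv2.imread("photo.png")
samples = img.reshape(-1, 3).astype(np.float32)  # one 3-D color sample per pixel
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
# Random initialization of k centers, then iterative center updates
_, labels, centers = cv2.kmeans(samples, 8, None, criteria, 3,
                                cv2.KMEANS_RANDOM_CENTERS)
# Replace each pixel by its cluster center (color quantization)
quantized = centers[labels.ravel()].reshape(img.shape).astype(np.uint8)
```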
                       Mixture of Gaussians
•   each cluster center is augmented by a covariance matrix whose values are re-
    estimated from the corresponding samples
•   a Mahalanobis distance is used to associate input samples with cluster centers
$\boldsymbol{x}_i$: input samples, $\boldsymbol{\mu}_k$: cluster centers, $\boldsymbol{\Sigma}_k$: covariance estimates
                               Mean shift
           https://docs.opencv.org/4.x/d7/d00/tutorial_meanshift.html
The key to mean shift:
• a technique for efficiently finding peaks in this high-dimensional data
  distribution without ever computing the complete function explicitly
How can we estimate the density function given a sparse set of samples?
>>> One of the simplest approaches is to just smooth the data, e.g., by convolving it with a fixed kernel; in higher dimensions, however, it is computationally prohibitive to evaluate $f(\boldsymbol{x})$ over the full space.
Multiple restart gradient descent
• Start at some guess for a local maximum, $\boldsymbol{y}_k$ (e.g., a random input sample $\boldsymbol{x}_i$)
• Compute the gradient of the density estimate $f'(\boldsymbol{x})$ at $\boldsymbol{y}_k$
• Take an uphill step in that direction:
$$\boldsymbol{y}_{k+1} = \boldsymbol{y}_k + \boldsymbol{m}(\boldsymbol{y}_k)$$
where the mean shift vector $\boldsymbol{m}(\boldsymbol{y}_k)$ is the difference between the kernel-weighted mean of the samples around $\boldsymbol{y}_k$ and $\boldsymbol{y}_k$ itself.
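A NumPy sketch of one such restart, assuming a Gaussian kernel of bandwidth h; each iteration moves $\boldsymbol{y}$ to the kernel-weighted mean of the samples, i.e., applies $\boldsymbol{y}_{k+1} = \boldsymbol{y}_k + \boldsymbol{m}(\boldsymbol{y}_k)$:

```python
import numpy as np

def mean_shift_mode(x_start, X, h=1.0, n_iter=50):
    # X: (n, d) samples; y climbs the density estimate toward a mode
    y = x_start.copy()
    for _ in range(n_iter):
        w = np.exp(-((X - y) ** 2).sum(axis=1) / (2 * h ** 2))
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()  # kernel-weighted mean
        if np.linalg.norm(y_new - y) < 1e-6:            # converged to a peak
            break
        y = y_new
    return y
```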
Clustering in the joint domain of color and location
• Spatial domain: the spatial coordinates of the image, $\boldsymbol{x}_s = (x, y)$
• Range domain: the color values, $\boldsymbol{x}_r = (r, g, b)$ or $(L^*, a^*, b^*)$
• Together they form a five-dimensional feature space $\boldsymbol{x}_j$
[Figure: mean shift segmentation with $(h_s, h_r, M) = (16, 19, 40)$, where spatial regions containing fewer than $M$ pixels are eliminated]
Superpixel segmentation
SLIC (Simple Linear Iterative Clustering)
http://www.sanko-shoko.net/note.php?id=mpfg
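SLIC is implemented in scikit-image; a minimal sketch with illustrative parameters:

```python
from skimage import io, segmentation

img = io.imread("photo.png")
# Roughly 300 superpixels; compactness trades color fit vs. spatial regularity
labels = segmentation.slic(img, n_segments=300, compactness=10)
```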
Visual comparison of superpixels produced by various methods. The
average superpixel size in the upper left of each image is 100 pixels,
and 300 in the lower right. Alternating rows show each segmented image
followed by a detail of the center of each image.
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2011.
Summary
• Keypoint detection, description, and matching
   – Derivatives of image intensity are used for detecting features
   – Feature descriptors should be invariant to camera pose, illumination, etc.
   – Keypoint matching is one of the basic technologies for 3D reconstruction, image retrieval, AR/MR, etc.
• Line features are also useful, especially for artificial targets.
• Learning-based approaches are also becoming more popular.
• Pixels can be grouped by bottom-up segmentation techniques.