BSB663 Image Processing
Pinar Duygulu
Slides are adapted from Selim Aksoy
Image matching
• Image matching is a fundamental aspect of many problems in
  computer vision.
   •   Object or scene recognition
   •   Solving for 3D structure from multiple images
   •   Stereo correspondence
   •   Image alignment & stitching
   •   Image indexing and search
   •   Motion tracking
• Find “interesting” pieces of the image.
   • Focus the attention of subsequent algorithms.
   • Speed up computation.
Image matching applications
    Object recognition: Find correspondences between
    feature points in training and test images.
Image matching applications
Stereo correspondence and 3D reconstruction
Image matching applications
   Two images of Rome from Flickr
Image matching applications
        Two images of Rome from Flickr: harder case
Image matching applications
      Two images from NASA Mars Rover: even harder case
Image matching applications
   Two images from NASA Mars Rover: matching using local features
Image matching applications
Recognition: texture recognition and car detection.
Advantages of local features
• Locality
   • features are local, so robust to occlusion and clutter
• Distinctiveness
   • can differentiate a large database of objects
• Quantity
   • hundreds or thousands in a single image
• Efficiency
   • real-time performance achievable
• Generality
   • exploit different types of features in different situations
Local features
• What makes a good feature?
• We want uniqueness.
   • Look for image regions that are unusual.
   • Lead to unambiguous matches in other images.
• How to define “unusual”?
   • 0D structure: not useful for matching.
   • 1D structure: an edge; can be localized in 1D, but subject to the aperture problem.
   • 2D structure: a corner; can be localized in 2D, good for matching.
Local measures of uniqueness
• We should easily recognize the local feature by looking through a
  small window.
• Shifting the window in any direction should give a large change in
  intensity.
   • “Flat” region: no change in any direction.
   • “Edge”: no change along the edge direction.
   • “Corner”: significant change in all directions.
Local features and image matching
• There are three important requirements for feature points to yield good
  correspondences for matching:
   • Points corresponding to the same scene points should be detected
     consistently over different views.
   • They should be invariant to image scaling, rotation and to change in
     illumination and 3D camera viewpoint.
   • There should be enough information in the neighborhood of the points so
     that corresponding points can be automatically matched.
• These points are also called interest points.
Overview of the approach
   [Figure: interest points detected in two images, each described by a local descriptor]
    1.   Extraction of interest points (characteristic locations).
    2.   Computation of local descriptors.
    3.   Determining correspondences.
    4.   Using these correspondences for matching/recognition/etc.
Local features: detection
• We will now talk about one particular feature detection algorithm.
• Idea: find regions that are dissimilar to their neighbors.
   [Figure: a small window W on the image]
Local features: detection
• Consider shifting the window W by (u,v):
   • How do the pixels in W change?
   • The auto-correlation function measures the self-similarity of a signal and
     is related to the sum-of-squared differences.
   • Compare each pixel before and after the shift by summing up the squared
     differences (SSD).
   • This defines an SSD “error” E(u,v):

     E(u,v) = Σ_{(x,y)∈W} [I(x+u, y+v) − I(x,y)]²
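A minimal numpy sketch of this SSD error, assuming a grayscale float image I and a window W given by its top-left corner and size (all names here are illustrative, and the shifted window is assumed to stay inside the image):

```python
import numpy as np

def ssd_error(I, top, left, size, u, v):
    """Sum up squared differences between window W and W shifted by (u, v)."""
    W = I[top:top + size, left:left + size]
    W_shifted = I[top + v:top + v + size, left + u:left + u + size]
    return np.sum((W_shifted - W) ** 2)
```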
Local features: detection
• Taylor series expansion of I:

     I(x+u, y+v) ≈ I(x,y) + u·Ix(x,y) + v·Iy(x,y)

• If the motion (u,v) is assumed to be small, then the first-order
  approximation is good.
• Plugging this into the formula on the previous slide...
Local features: detection
• Sum-of-squared differences error E(u,v):

     E(u,v) ≈ Σ_{(x,y)∈W} [u·Ix(x,y) + v·Iy(x,y)]²
Local features: detection
• This can be rewritten as:

     E(u,v) ≈ [u v] H [u v]ᵀ

  where H = Σ_{(x,y)∈W} [[Ix², Ix·Iy], [Ix·Iy, Iy²]]  (sum over all (x,y) in W).

• For the example above:
   • You can move the center of the green window to anywhere on the blue unit circle.
   • Which directions will result in the largest and smallest E values?
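As a sketch, H could be assembled in numpy like this (the window is again given by its top-left corner and size; names are illustrative):

```python
import numpy as np

def second_moment_matrix(I, top, left, size):
    """Build H = sum over W of [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]]."""
    Iy, Ix = np.gradient(I)  # np.gradient differentiates along rows (y) first, then columns (x)
    Ix = Ix[top:top + size, left:left + size]
    Iy = Iy[top:top + size, left:left + size]
    return np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                     [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
```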
Local features: detection
• We want to find (u,v) such that E(u,v) is maximized or minimized:
                                 E(u,v) = [u v] H [u v]ᵀ
•   By definition, we can find these directions by looking at the eigenvectors of H.
•   The first eigenvector of H is the unit vector that maximizes E(u,v).
•   The second eigenvector of H is the unit vector that minimizes E(u,v).
Quick eigenvector/eigenvalue review
• Relevant theorem:
            http://fedc.wiwi.hu-berlin.de/xplore/tutorials/mvahtmlnode16.html
Quick eigenvector/eigenvalue review
• The eigenvectors of a matrix A are the vectors x that satisfy:

     Ax = λx

• The scalar λ is the eigenvalue corresponding to x.
   • The eigenvalues are found by solving:

     det(A − λI) = 0

   • In our case, A = H is a 2x2 matrix, so we have:

     λ² − trace(H)·λ + det(H) = 0

   • The solution:

     λ± = (trace(H) ± √(trace(H)² − 4·det(H))) / 2

• Once you know λ, you find x by solving (A − λI)x = 0.
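A quick numpy check of the 2x2 closed form against np.linalg.eigh, using a made-up symmetric matrix:

```python
import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 2.0]])  # an illustrative symmetric 2x2 matrix

# Eigenvalues are the roots of lambda^2 - trace(H)*lambda + det(H) = 0.
tr, det = np.trace(H), np.linalg.det(H)
lam_plus = (tr + np.sqrt(tr ** 2 - 4 * det)) / 2
lam_minus = (tr - np.sqrt(tr ** 2 - 4 * det)) / 2

# np.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order.
vals, vecs = np.linalg.eigh(H)
assert np.allclose([lam_minus, lam_plus], vals)
```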
Local features: detection
• This can be rewritten as:

     E(u,v) ≈ [u v] H [u v]ᵀ   (sum over all (x,y) in W)

  [Figure: eigenvectors x+ and x− of H]

• Eigenvalues and eigenvectors of H:
   • Define the shifts with the smallest and largest change (E value).
   • x+ = direction of largest increase in E.
   • λ+ = amount of increase in direction x+.
   • x− = direction of smallest increase in E.
   • λ− = amount of increase in direction x−.
Local features: detection
• How are λ+, x+, λ−, and x− relevant for feature detection?
    • What’s our feature scoring function?
• We want E(u,v) to be large for small shifts in all directions.
    • The minimum of E(u,v) over all unit vectors [u v] should be large.
    • This minimum is given by the smaller eigenvalue λ− of H.
Local features: detection
• Here’s what you do:
   •   Compute the gradient at each point in the image.
   •   Create the H matrix from the entries in the gradient.
   •   Compute the eigenvalues.
   •   Find points with large response (λ− > threshold).
   •   Choose those points where λ− is a local maximum as features.
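A compact numpy/scipy sketch of this recipe, approximating the per-window sums by Gaussian smoothing of the gradient products (sigma and the threshold are illustrative choices for a float image in [0, 1]):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def lambda_min_features(I, sigma=1.0, threshold=0.01):
    """Detect features where the smaller eigenvalue of H is large and locally maximal."""
    Iy, Ix = np.gradient(I)
    # Smoothed gradient products give the entries of H at every pixel.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Closed-form smaller eigenvalue of [[Sxx, Sxy], [Sxy, Syy]].
    tr = Sxx + Syy
    det = Sxx * Syy - Sxy * Sxy
    lam_minus = tr / 2 - np.sqrt(np.maximum((tr / 2) ** 2 - det, 0))
    # Keep thresholded points that are also 3x3 local maxima of lam_minus.
    is_max = lam_minus == maximum_filter(lam_minus, size=3)
    return np.argwhere((lam_minus > threshold) & is_max)
```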
Harris detector
• To measure the corner strength:

     R = det(H) − k·(trace(H))²

  where
     trace(H) = λ1 + λ2
     det(H) = λ1 · λ2
  (λ1 and λ2 are the eigenvalues of H).

• R is positive for corners, negative in edge regions, and small in flat regions.
• Very similar to λ− but less expensive (no square root).
• Also called the “Harris Corner Detector” or “Harris Operator”.
• There are lots of other detectors; this is one of the most popular.
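A per-pixel sketch of the Harris response under the same smoothed-gradient-products approximation (k around 0.04–0.06 is the usual choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(I, sigma=1.0, k=0.05):
    """Corner strength R = det(H) - k * trace(H)^2 at every pixel."""
    Iy, Ix = np.gradient(I)
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2
```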
Harris detector example
Harris detector example
               R values (red high, blue low)
Harris detector example
                Threshold (R > value)
Harris detector example
                 Local maxima of R
Harris detector example
                 Harris features (red)
Local features: descriptors
   [Figure: a local descriptor computed from the neighborhood of an interest point]
    • Describe points so that they can be compared.
    • Descriptors characterize the local neighborhood of a
      point.
Local features: matching
• We know how to detect good features.
• Next question: how to match them?
   [Figure: are the local descriptors of two detected points equal?]
• Vector comparison using a distance measure can be used.
Local features: matching
•   Given a feature in I1, how to find the best match in I2?
    1. Define a distance function that compares two descriptors.
    2. Test all the features in I2, find the one with minimum distance.
   [Figure: candidate features in I2 at feature distances 50, 75, and 200 from the query feature]
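As a sketch, the brute-force nearest-neighbor step in numpy (descriptor arrays of shape N x D are an assumption):

```python
import numpy as np

def match_features(desc1, desc2):
    """For each row of desc1, return the index of the closest row of desc2
    under squared Euclidean distance."""
    d = np.sum((desc1[:, None, :] - desc2[None, :, :]) ** 2, axis=2)
    return np.argmin(d, axis=1)
```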
Matching examples
Matching examples
Local features: matching
• Matches can be improved using local constraints
   • neighboring points should match
   • angles, length ratios should be similar
   [Figure: matched point pairs; corresponding angles and length ratios should be approximately equal]
      Summary of the approach
• Detection of interest points/regions
   • Harris detector
   • Blob detector based on Laplacian
• Computation of descriptors for each point
   • Gray value patch, differential invariants, steerable filter, SIFT descriptor
• Similarity of descriptors
   • Correlation, Mahalanobis distance, Euclidean distance
• Semi-local constraints
   • Geometrical or statistical relations between neighborhood points
• Global verification
   • Robust estimation of geometry between images
Local features: invariance
    • Suppose you rotate the image by some angle.
       • Will you still pick up the same features?
    • What if you change the brightness?
    • What about scale?
    • We’d like to find the same features regardless of the
      transformation.
       • This is called transformational invariance.
       • Most feature methods are designed to be invariant to
          • Translation, 2D rotation, scale.
       • They can usually also handle
          • Limited 3D rotations.
          • Limited affine transformations (some are fully affine invariant).
          • Limited illumination/contrast changes.
How to achieve invariance?
    Need both of the following:
    1. Make sure your detector is invariant.
       •   Harris is invariant to translation and rotation.
       •   Scale is trickier.
           •   Common approach is to detect features at many scales using a Gaussian
               pyramid (e.g., MOPS).
           •   More sophisticated methods find “the best scale” to represent each feature
               (e.g., SIFT).
    2. Design an invariant feature descriptor.
       •   A descriptor captures the information in a region around the detected
           feature point.
       •   The simplest descriptor: a square window of pixels.
       •   Let’s look at some better approaches…
Rotation invariance for descriptors
• Find the dominant orientation of the image patch.
   • This is given by x+, the eigenvector of H corresponding to λ+ (the larger
     eigenvalue).
   • Rotate the patch according to this angle.
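Reading that angle off H in numpy, as a sketch (H built as before; the sign ambiguity of eigenvectors is ignored here):

```python
import numpy as np

def dominant_orientation(H):
    """Angle of x+, the eigenvector of H with the larger eigenvalue."""
    vals, vecs = np.linalg.eigh(H)   # ascending eigenvalues; eigenvectors in columns
    x_plus = vecs[:, -1]             # eigenvector for the largest eigenvalue
    return np.arctan2(x_plus[1], x_plus[0])
```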
Multi-scale Oriented Patches (MOPS)
• Take a 40x40 square window around the detected feature.
   •   Scale to 1/5 size (using prefiltering).
   •   Rotate to horizontal.
   •   Sample an 8x8 square window centered at the feature.
   •   Intensity-normalize the window by subtracting the mean and dividing by
       the standard deviation of the window.
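A simplified numpy sketch of these steps, with the rotation step omitted for brevity and the 40x40 window assumed to lie inside the image (the prefilter sigma is an illustrative choice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mops_patch(I, row, col):
    """8x8 intensity-normalized patch sampled every 5 pixels from a 40x40 window."""
    blurred = gaussian_filter(I, sigma=2.0)                    # prefilter before subsampling
    patch = blurred[row - 20:row + 20:5, col - 20:col + 20:5]  # 8x8 samples
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```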
Multi-scale Oriented Patches (MOPS)
• Extract oriented patches at multiple scales of the Gaussian pyramid.
      Scale Invariant Feature Transform (SIFT)
• The SIFT operator developed by David Lowe is both a detector and a
  descriptor, and is invariant to translation, rotation, scale, and other
  imaging parameters.
Overall approach for SIFT
    1. Scale space extrema detection
       •   Search over multiple scales and image locations.
    2. Interest point localization
       •   Fit a model to determine location and scale.
       •   Select interest points based on a measure of stability.
    3. Orientation assignment
       •   Compute best orientation(s) for each interest point region.
    4. Interest point description
       •   Use local image gradients at selected scale and rotation to
           describe each interest point region.
Scale space extrema detection
• Goal: Identify locations and scales that can be repeatably assigned
  under different views of the same scene or object.
• Method: search for stable features across multiple scales using a
  continuous function of scale.
• Prior work has shown that under a variety of assumptions, the best
  function is a Gaussian function.
• The scale space of an image is a function L(x,y,σ) that is produced
  from the convolution of a Gaussian kernel (at different scales) with
  the input image.
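A direct scipy sketch of L(x, y, σ) for a handful of scales:

```python
from scipy.ndimage import gaussian_filter

def scale_space(I, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """L(x, y, sigma): the image convolved with Gaussians of increasing scale."""
    return [gaussian_filter(I.astype(float), s) for s in sigmas]
```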
Scale space interest points
    • Laplacian of Gaussian (LoG) kernel
    • Scale space detection
       • Find local maxima across scale-space
    • The difference-of-Gaussian kernel is a close approximation to the
      scale-normalized LoG.
Lowe’s pyramid scheme
    For each octave of scale space, the initial image is repeatedly convolved
    with Gaussians to produce the set of scale-space images (left). Adjacent
    Gaussian images are subtracted to produce the difference-of-Gaussian images
    (right). After each octave, the Gaussian image is downsampled by a factor of 2.
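A simplified sketch of one octave of this scheme (the number of levels and the scale multiplier k are illustrative choices, not Lowe's exact parameters):

```python
from scipy.ndimage import gaussian_filter

def dog_octave(I, sigma=1.6, levels=5):
    """Blur to a ladder of scales, subtract adjacent levels, downsample for the next octave."""
    k = 2 ** (1 / (levels - 2))  # scale multiplier between adjacent levels (an assumption)
    gaussians = [gaussian_filter(I, sigma * k ** i) for i in range(levels)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    next_octave = gaussians[-1][::2, ::2]  # downsample by a factor of 2
    return dogs, next_octave
```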
Interest point localization
• Detect maxima and minima of the difference of Gaussians in scale space.
• Each point is compared to its 8 neighbors in the current image and 9
  neighbors each in the scales above and below.
• Select a point only if it is greater or smaller than all 26 of these
  neighbors.
• For each maximum or minimum found, the output is the location and the scale.
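The 26-neighbor test as a numpy sketch (dogs is a list of same-sized DoG images; s, r, c are assumed to be interior indices):

```python
import numpy as np

def is_scale_space_extremum(dogs, s, r, c):
    """True if dogs[s][r, c] is strictly greater or smaller than all 26 neighbors."""
    cube = np.stack([d[r - 1:r + 2, c - 1:c + 2] for d in dogs[s - 1:s + 2]])  # 3x3x3
    center = dogs[s][r, c]
    others = np.delete(cube.ravel(), 13)  # drop the center value itself
    return bool(np.all(center > others) or np.all(center < others))
```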
Orientation assignment
    • Create histogram of local gradient
      directions computed at selected
      scale.
    • Assign canonical orientation at
      peak of smoothed histogram.
    • Each key specifies stable 2D
      coordinates (x, y, scale,
      orientation).
    [Figure: orientation histogram over 0 to 2π]
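A sketch of the histogram step (Lowe uses 36 bins over 0..2π and smooths the histogram before taking the peak; smoothing is omitted here):

```python
import numpy as np

def canonical_orientation(patch, bins=36):
    """Peak of the gradient-magnitude-weighted histogram of gradient directions."""
    gy, gx = np.gradient(patch)
    angles = np.arctan2(gy, gx) % (2 * np.pi)
    hist, edges = np.histogram(angles, bins=bins, range=(0, 2 * np.pi),
                               weights=np.hypot(gx, gy))
    return edges[np.argmax(hist)]  # left edge of the peak bin, in radians
```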
Interest point descriptors
• At this point, each interest point has
   • location,
   • scale,
   • orientation.
• Next step is to compute a descriptor for the local image region about
  each interest point that is
   • highly distinctive,
   • as invariant as possible to variations such as changes in viewpoint and
     illumination.
Lowe’s interest point descriptor
• Use the normalized circular region about the interest point.
     • Rotate the window to standard orientation.
     • Scale the window size based on the scale at which the point was found.
•   Compute gradient magnitude and orientation at each point in the region.
•   Weight them by a Gaussian window overlaid on the circle.
•   Create an orientation histogram over each of the 4x4 subregions of the window.
•   In practice, a 4x4 array of histograms computed over a 16x16 sample array
    was used; 4x4 histograms times 8 directions gives a vector of 128 values.
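In practice one would usually call an existing implementation rather than rebuild this pipeline; a usage sketch with OpenCV's SIFT (requires the opencv-python package; the file name is illustrative):

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(descriptors.shape)  # (number of keypoints, 128)
```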
Lowe’s interest point descriptor
   [Figure: an input image and the descriptors overlaid on it. Adapted from www.vlfeat.org]
Example applications
• Object and scene recognition
• Stereo correspondence
• 3D reconstruction
• Image alignment & stitching
• Image indexing and search
• Motion tracking
• Robot navigation
Examples: 3D recognition
Examples: 3D reconstruction
Examples: location recognition
Examples: robot localization
Examples: robot localization
    Map continuously built over time
Examples: panoramas
   • Recognize overlap in an unordered set of images and automatically stitch
     them together.
   • SIFT features provide the initial feature matching.
   • Image blending at multiple scales hides the seams.
        Panorama of Lowe’s lab automatically assembled from 143 images
Examples: panoramas
   Image registration and blending
Examples: panoramas
Sony Aibo
• SIFT usage:
   • Recognize the charging station.
   • Communicate with visual cards.
   • Teach object recognition.
      Photo tourism: exploring photo collections
• Joint work by University of Washington and Microsoft Research
   • http://phototour.cs.washington.edu/
   • http://research.microsoft.com/IVM/PhotoTours/
• Photosynth Technology Preview at Microsoft Live Labs
   • http://photosynth.net/
• Don’t forget to check the cool video and demo at
  http://phototour.cs.washington.edu/.
      Photo tourism: exploring photo collections
• Detect features using SIFT.
     Photo tourism: exploring photo collections
• Match features between each pair of images.
     Photo tourism: exploring photo collections
• Link up pairwise matches to form connected components of matches
  across several images.
         [Figure: matches linked across Image 1, Image 2, Image 3, and Image 4]
Photo tourism: exploring photo collections
   Photos are automatically placed inside a sketchy 3D model of the scene;
   an optional overhead map also shows each photo's location.
Photo tourism: exploring photo collections
An info pane on the left shows information about the current image and navigation buttons
for moving around the collection; the filmstrip view on the bottom shows related images;
mousing over these images brings them up as a registered overlay.
Photo tourism: exploring photo collections
   Photographs can also be taken in outdoor natural environments. The photos
   are correctly placed in 3D, and more free-form geometric models can be used
   for inter-image transitions.
Photo tourism: exploring photo collections
     Annotations entered in one image (upper left) are automatically
     transferred to all other related images.
     Scene summarization for online collections
• http://grail.cs.washington.edu/projects/canonview
   [Figure panels: scene summary browsing; enhanced 3D browsing]