Ninio, J. and Stevens, K. A. (2000) Variations on the Hermann grid: an extinction illusion.
Perception, 29, 1209-1217.
Fundamental matrix
Let p be a point in left image, p’ in right image
                                  l                 l’
                                      p        p’
Epipolar relation
   • p maps to epipolar line l’
   • p’ maps to epipolar line l
Epipolar mapping described by a 3x3 matrix F
                         𝑝′𝑇 𝐹𝑝 = 0
Fundamental matrix
This matrix F is called
   • the “Essential Matrix”
       – when image intrinsic parameters are known
   • the “Fundamental Matrix”
       – more generally (uncalibrated case)
Can solve for F from point correspondences
   • Each (p, p’) pair gives one linear equation in entries of F
                           𝑝′𝑇 𝐹𝑝 = 0
   • F has 9 entries, but really only 7 or 8 degrees of freedom.
Today’s lecture
• Depth Estimation from Stereo Matching (Sparse
  correspondence to Dense Correspondence)
CS 4495 Computer Vision – A. Bobick     Motion and Optic Flow
                      Stereo Matching
Stereo image rectification
Stereo image rectification
    • Reproject image planes
      onto a common plane
      parallel to the line
      between camera centers
    • Pixel motion is horizontal
      after this transformation
    • Two homographies (3x3
      transform), one for each
      input image reprojection
    ➢ C. Loop and Z. Zhang. Computing
      Rectifying Homographies for Stereo
      Vision. IEEE Conf. Computer Vision
      and Pattern Recognition, 1999.
Rectification example
The correspondence problem
• Epipolar geometry constrains our search, but we still have a
  difficult correspondence problem.
Fundamental Matrix + Sparse correspondence
Fundamental Matrix + Dense correspondence
SIFT + Fundamental Matrix + RANSAC
       Building Rome in a Day
        By Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, Richard Szeliski
        Communications of the ACM, Vol. 54 No. 10, Pages 105-112
Sparse to Dense Correspodence
       Building Rome in a Day
        By Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, Richard Szeliski
        Communications of the ACM, Vol. 54 No. 10, Pages 105-112
Structure from motion (or SLAM)
     • Given a set of corresponding points in two or more
       images, compute the camera parameters and the 3D
       point coordinates
                        ?
        Camera 1
        R1,t1      ?   Camera 2
                        R2,t2     ?        ?
                                               Camera 3
                                                R3,t3        Slide credit:
                                                            Noah Snavely
Structure from motion ambiguity
• If we scale the entire scene by some factor k and, at the same
  time, scale the camera matrices by the factor of 1/k, the
  projections of the scene points in the image remain exactly the
  same:
                                  1 
                         x = PX =  P (k X)
                                  k 
         It is impossible to recover the absolute scale of the scene!
How do we know the scale of image content?
Bundle adjustment
     • Non-linear method for refining structure and motion
     • Minimizing reprojection error
                                                         2
                     E (P, X) =  D(x ij , Pi X j )
                                 m    n
                                i =1 j =1
                                 Xj
                 P1Xj
                        x1j                        x3j
                                            P3Xj
                              P2Xj    x2j
                P1
                                                         P3
                                      P2
CS 4495 Computer Vision – A. Bobick                             Motion and Optic Flow
                      Correspondence problem
                                                Multiple match
                                                hypotheses
                                                satisfy epipolar
                                                constraint, but
                                                which is correct?
               Figure from Gee & Cipolla 1999
CS 4495 Computer Vision – A. Bobick                                     Motion and Optic Flow
 Correspondence problem
 • Beyond the hard constraint of epipolar geometry, there are “soft” constraints to
    help identify corresponding points
     • Similarity
     • Uniqueness
     • Ordering
     • Disparity gradient
 • To find matches in the image pair, we will assume
    • Most scene points visible from both views
    • Image regions for the matches are similar in appearance
CS 4495 Computer Vision – A. Bobick                                                         Motion and Optic Flow
 Dense correspondence search
                         For each epipolar line
                             For each pixel / window in the left image
                                       • compare with every pixel / window on same epipolar line
                                         in right image
                                       • pick position with minimum match cost (e.g., SSD,
                                         normalized correlation)
               Adapted from Li Zhang
CS 4495 Computer Vision – A. Bobick                                                     Motion and Optic Flow
                     Correspondence search with similarity constraint
                                         Left                       Right
                        scanline
                                                    Matching cost
                                                                            disparity
                         • Slide a window along the right scanline and compare
                           contents of that window with the reference window in
                           the left image
                         • Matching cost: SSD or normalized correlation
CS 4495 Computer Vision – A. Bobick                                     Motion and Optic Flow
                     Correspondence search with similarity constraint
                                      Left               Right
                        scanline
                                                         SSD
CS 4495 Computer Vision – A. Bobick                                     Motion and Optic Flow
                     Correspondence search with similarity constraint
                                      Left                Right
                        scanline
                                                       Norm. corr
CS 4495 Computer Vision – A. Bobick                    Motion and Optic Flow
 Correspondence problem
                 Intensity
                 profiles
                                      Source: Andrew Zisserman
CS 4495 Computer Vision – A. Bobick                                                           Motion and Optic Flow
 Correspondence problem
                                      Neighborhoods of corresponding points are
                                      similar in intensity patterns.
                                                                             Source: Andrew Zisserman
CS 4495 Computer Vision – A. Bobick   Motion and Optic Flow
 Correlation-based window matching
CS 4495 Computer Vision – A. Bobick   Motion and Optic Flow
 Correlation-based window matching
CS 4495 Computer Vision – A. Bobick   Motion and Optic Flow
 Correlation-based window matching
CS 4495 Computer Vision – A. Bobick   Motion and Optic Flow
 Correlation-based window matching
CS 4495 Computer Vision – A. Bobick                                   Motion and Optic Flow
 Correlation-based window matching
                                      ???
                                            Textureless regions are
                                            non-distinct; high
                                            ambiguity for matches.
CS 4495 Computer Vision – A. Bobick                            Motion and Optic Flow
                      Effect of window size
                                              Source: Andrew Zisserman
CS 4495 Computer Vision – A. Bobick                                                    Motion and Optic Flow
                                       Effect of window size
                                                      W=3                    W = 20
                             Want window large enough to have sufficient intensity
                             variation, yet small enough to contain only pixels with
                             about the same disparity.
               Figures from Li Zhang
CS 4495 Computer Vision – A. Bobick                              Motion and Optic Flow
                                      Left image   Right image
CS 4495 Computer Vision – A. Bobick                             Motion and Optic Flow
 Results with window search
                         Window-based matching   Ground truth
                           (best window size)
CS 4495 Computer Vision – A. Bobick                         Motion and Optic Flow
 Better solutions
 • Beyond individual correspondences to estimate disparities:
 • Optimize correspondence assignments jointly
   • Scanline at a time (e.g. dynamic programming)
   • Full 2D grid (e.g. graph cuts)
   • Approximate 2D solution (e.g. semi-global matching)
CS 4495 Computer Vision – A. Bobick                              Motion and Optic Flow
 Scanline stereo
 • Try to coherently match pixels on the entire scanline
 • Different scanlines are still optimized independently
                                      Left image   Right image
Robert Collins
CSE486, Penn State
                     Matching using Epipolar Lines
                     Left Image             Right Image
    For a patch in left image
   Compare with patches along
    same row in right image
                                     Match Score Values
Robert Collins
CSE486, Penn State
                     Matching using Epipolar Lines
                     Left Image             Right Image
    Select patch with highest
       match score.
    Repeat for all pixels in         Match Score Values
    left image.
Robert Collins
CSE486, Penn State
                      Example: 5x5 windows
                        NCC match score
           Computed disparities         Ground truth
  Black pixels: bad disparity values,
  or no matching patch in right image
Robert Collins
CSE486, Penn State
                     Occlusions: No matches
                                       Left image
                                       Right image
Robert Collins
CSE486, Penn State
                          Effects of Patch Size
                 5x5 patches                11x11patches
       Smoother in some areas            Loss of finer details
Robert Collins
               Adding Intra-Scanline Consistency
CSE486, Penn State
               So far, each left image patch has been matched
               independently along the right epipolar line.
               This can lead to errors.
               We would like to enforce some consistency
               among matches in the same row (scanline).
Robert Collins
CSE486, Penn State
                                                             Disparity Space Image
            First we introduce the concept of DSI.
            The DSI for one row represents pairwise match scores
            between patches along that row in the left and right image.
                                                               Pixels along left scanline
                                                                               Pixel i
                                                                                            C(i,j) = Match score
                     Pixels along right scanline                                            for patch centered
                                                                                            at left pixel i with
                                                                                            patch centered at
                                                   Pixel j                                  right pixel j.
Robert Collins
CSE486, Penn State
                      Disparity Space Image (DSI)
                     Left Image             Right Image
                                     Dissimilarity Values
                                        (1-NCC) or SSD
Robert Collins
CSE486, Penn State
                      Disparity Space Image (DSI)
                     Left Image             Right Image
                                     Dissimilarity Values
                                        (1-NCC) or SSD
Robert Collins
CSE486, Penn State
                      Disparity Space Image (DSI)
                     Left Image             Right Image
                                     Dissimilarity Values
                                        (1-NCC) or SSD
Robert Collins
CSE486, Penn State
                      Disparity Space Image (DSI)
                     Left Image               DSI
                                           Enter each vector of
                                           match scores as a
                                           column in the DSI
       Dissimilarity Values
Robert Collins
CSE486, Penn State
                           Disparity Space Image
                                    Left scanline
          Right scanline
Robert Collins
CSE486, Penn State
                           Disparity Space Image
                                         Left scanline
                                         Invalid entries due to constraint
                                         that disparity <= high value
                                         64 in this case)
          Right scanline
     Invalid entries due to constraint
     that disparity >= low value
     (0 in this case)
Robert Collins
CSE486, Penn State
                     DSI and Scanline Consistency
                     Assigning disparities to all pixels in left scanline now
                     amounts to finding a connected path through the DSI
        Start
                                                                                End
Robert Collins
CSE486, Penn State
                         Lowest Cost Path
           We would like to choose the “best” path.
           Want one with lowest “cost” (Lowest sum of
           dissimilarity scores along the path)
                                   ?
                                                        ?
Robert Collins
CSE486, Penn State
                         Cox et.al. Stereo Matching
                       Occluded                       i-1,j-1                  i-1,j
                       from right
                                                                      match
        Occluded           match
                                                                              Occluded
        from left                                                             from left
                                                     i,j-1      Occluded          i,j
        Three cases:                                            from right
             – Matching patches. Cost = dissimilarity score
             – Occluded from right. Cost is some constant value.
             – Occluded from left. Cost is some constant value.
                     C(i,j)= min([ C(i-1,j-1) + dissimilarity(i,j)
                                   C(i-1,j) + occlusionConstant,
                                   C(i,j-1) + occlusionConstant]);
Robert Collins
CSE486, Penn State
                     Cox et.al. Stereo Matching
  Start                            Recap: want to find lowest
                                   cost path from upper left to
                                   lower right of DSI image.
                                   At each point on the path we
                                   have three choices: step left,
                                   step down, step diagonally.
                            End
                                   Each choice has a well-defined
                                   cost associated with it.
   This problem just screams out for Dynamic Programming!
   (which, indeed, is how Cox et.al. solve the problem)
Robert Collins
CSE486, Penn State
                       Real Scanline Example
                     DSI                             DP cost matrix
                                       (cost of optimal path from each point to END)
              Every pixel in left column now is marked with
              either a disparity value, or an occlusion label.
              Proceed for every scanline in left image.
Robert Collins
CSE486, Penn State
                          Example
     Result of DP alg              Result without DP (independent pixels)
  Result of DP alg. Black pixels = occluded.
Robert Collins
CSE486, Penn State
                          Occlusion Filling
           Simple trick for filling in gaps caused by occlusion.
                                = left occluded
           Fill in left occluded pixels with value from the
           nearest valid pixel preceding it in the scanline.
 Similarly, for right occluded, look for valid pixel to the right.
Robert Collins
CSE486, Penn State
                                    Example
                     Result of DP alg with occlusion filling.
Robert Collins
CSE486, Penn State
                                       Example
     Result of DP alg with occlusion filling.   Result without DP (independent pixels)
Robert Collins
CSE486, Penn State
                                       Example
     Result of DP alg with occlusion filling.   Ground truth
CS 4495 Computer Vision – A. Bobick                                                                      Motion and Optic Flow
 Stereo with 2D smoothness constraint
                         • What defines a good stereo correspondence?
                            1. Match quality
                                      •   Want each pixel to find a good match in the other image
                                2.    Smoothness
                                      •   If two pixels are adjacent, they should (usually) move about
                                          the same amount
CS 4495 Computer Vision – A. Bobick                                                                   Motion and Optic Flow
 Optimizing for match quality and smoothness (in any direction)
                        I1                        I2                        D
                                      W1(i )              W2(i+D(i ))                         D(i )
                                        E =  Edata ( I1 , I 2 , D) +  Esmooth ( D)
                  Edata =  (W1 (i ) − W2 (i + D(i)) )
                                                           2
                                                                Esmooth =         (D(i) − D( j ))
                                                                            neighbors i , j
                                  i
                   • Energy functions of this form can be minimized using
                      graph cuts
                       Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate
                       Energy Minimization via Graph Cuts, PAMI 2001 Source: Steve Seitz
CS 4495 Computer Vision – A. Bobick                             Motion and Optic Flow
 Results with window search
                         Window-based matching   Ground truth
                           (best window size)
CS 4495 Computer Vision – A. Bobick                                                                   Motion and Optic Flow
 Better results…
                                      Graph cut method                                 Ground truth
                 Boykov et al., Fast Approximate Energy Minimization via Graph Cuts,
                   International Conference on Computer Vision, September 1999.
CS 4495 Computer Vision – A. Bobick                                      Motion and Optic Flow
 Semi-global matching
            • Approximate the full smoothness optimization by
              considering 8 or 16 directions in two or three
              passes.
            • Optimization looks like scanline, dynamic
              programming stereo, but with a 2d notion of
              smoothness
  Stereo Processing by Semi-Global Matching and Mutual Information. Hirschmuller,
  PAMI 2007. 3500+ citations
CS 4495 Computer Vision – A. Bobick   Motion and Optic Flow
 Semi-global matching
CS 4495 Computer Vision – A. Bobick            Motion and Optic Flow
 https://vision.middlebury.edu/stereo/eval3/
CS 4495 Computer Vision – A. Bobick                                  Motion and Optic Flow
 Stereo Depth Estimation Challenges
 • Low-contrast ; textureless image regions
 • Occlusions
 • Violations of brightness constancy (e.g., specular reflections)
 • Really large baselines (foreshortening and appearance change)
 • Camera calibration errors
CS 4495 Computer Vision – A. Bobick   Motion and Optic Flow
 Quizlet
Active stereo with structured light
   • Project “structured” light patterns onto the object
       • Simplifies the correspondence problem
       • Allows us to use only one camera
                                     camera
                         projector
   L. Zhang, B. Curless, and S. M. Seitz. Rapid Shape Acquisition Using Color Structured
   Light and Multi-pass Dynamic Programming. 3DPVT 2002
Kinect: Structured infrared light
                  http://bbzippo.wordpress.com/2010/11/28/kinect-in-infrared/
iPhone X
 iPhone 12 switched to lidar
 (time of flight)
Self-driving efforts use both lidar and stereo
State of the art lidar