Introduction to Object Recognition
Outline
•   The Problem of Object Recognition
•   Approaches to Object Recognition
•   Requirements and Performance Criteria
•   Representation Schemes
•   Matching Schemes
•   Example Systems
•   Indexing
•   Grouping
•   Error Analysis
              Problem Statement
• Given some knowledge of how certain objects
  may appear and an image of a scene possibly
  containing those objects, report which objects are
  present in the scene and where.
• Recognition should be:
   – invariant to viewpoint changes and object transformations
   – robust to noise and occlusions
                           Challenges
• The appearance of an object can have a large range of
  variation due to:
   –   photometric effects
   –   scene clutter
   –   changes in shape (e.g., non-rigid objects)
   –   viewpoint changes
• Different views of the same object can give rise to
  widely different images!
       Object Recognition Applications
•   Quality control and assembly in industrial plants.
•   Robot localization and navigation.
•   Monitoring and surveillance.
•   Automatic exploration of image databases.
         Human Visual Recognition
• A spontaneous, natural activity for humans and
  other biological systems.
   – People know about tens of thousands of different
     objects, yet they can easily distinguish among them.
   – People can recognize objects with movable parts or
     objects that are not rigid.
   – People can balance the information provided by
     different kinds of visual input.
              Why Is It Difficult?
• Hard mathematical problems in understanding
  the relationship between geometric shapes and
  their projections into images.
• We must match an image to one of a huge number
  of possible objects, in any of an infinite number of
  possible positions (computational complexity)
      Why Is It Difficult? (cont’d)
• We do not yet fully understand the recognition
  problem itself.
       What do we do in practice?
• Impose constraints to simplify the problem.
• Construct useful machines rather than
  modeling human performance.
   Approaches Differ According To:
• Knowledge they employ
  – Model-based approach (i.e., based on explicit model of
    the object's shape or appearance)
  – Context-based approach (i.e., based on the context in
    which objects may be found)
  – Function-based approach (i.e., based on the function
    that objects may serve)
    Approaches Differ According To:
                      (cont’d)
• Restrictions on the form of the objects
  – 2D or 3D objects
  – Simple vs complex objects
  – Rigid vs deforming objects
• Representation schemes
  – Object-centered
  – Viewer-centered
   Approaches Differ According To:
                        (cont’d)
• Matching scheme
  – Geometry-based
  – Appearance-based
• Image formation model
  – Perspective projection
  – Affine transformation (e.g., planar objects)
  – Orthographic projection + scale
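To make the image formation models concrete, here is a minimal Python/NumPy sketch (the focal length f and reference depth Z0 are illustrative) contrasting perspective projection with orthographic projection + scale (weak perspective) for a single 3D point.

```python
import numpy as np

def perspective(X, f=1.0):
    """Perspective projection: x = f*X/Z, y = f*Y/Z."""
    X = np.asarray(X, dtype=float)
    return f * X[:2] / X[2]

def weak_perspective(X, f=1.0, Z0=10.0):
    """Orthographic projection + scale: all points share one reference
    depth Z0, so the projection is just a uniform scaling s = f/Z0."""
    X = np.asarray(X, dtype=float)
    return (f / Z0) * X[:2]

P = [2.0, 1.0, 9.5]  # a 3D point close to the reference depth
print(perspective(P), weak_perspective(P))  # the two projections nearly agree
```

When the depth variation across the object is small relative to its distance from the camera, the two models give nearly the same image, which is why orthographic projection + scale is often used as a simpler substitute for full perspective.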
                   Requirements
• Viewpoint Invariant
  – Translation, Rotation, Scale
• Robust
  – Noise (i.e., sensor noise)
  – Local errors in early processing modules (e.g., edge
    detection)
  – Illumination/Shadows
  – Partial occlusion (i.e., self and from other objects)
  – Intrinsic shape distortions (i.e., non-rigid objects)
             Performance Criteria
• Scope
  – What kind of objects can be recognized and in what
    kinds of scenes?
• Robustness
  – Does the method tolerate reasonable amounts of noise
    and occlusion in the scene?
  – Does it degrade gracefully as those tolerances are
    exceeded?
       Performance Criteria (cont’d)
• Efficiency
  – How much time and memory are required to search the
    solution space?
• Accuracy
  – Correct recognition
  – False positives (wrong recognitions)
  – False negatives (missed recognitions)
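A small sketch of how these accuracy counts are often summarized as precision (how many reported recognitions are correct) and recall (how many objects actually present are found); the counts below are made up for illustration.

```python
def precision_recall(correct, false_positives, false_negatives):
    precision = correct / (correct + false_positives)  # correct among reported recognitions
    recall = correct / (correct + false_negatives)     # correct among objects actually present
    return precision, recall

print(precision_recall(correct=90, false_positives=10, false_negatives=20))  # (0.9, ~0.818)
```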
Object-centered Representation (cont’d)
• Two different matching approaches:
    (1) Derive a similar object-centered description from
    the scene and match it with the models (e.g., using
    “shape from X” methods).
    (2) Apply a model of the image formation process to
    the candidate model to back-project it onto the scene
    (camera calibration required); see the sketch below.
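A minimal sketch of approach (2), assuming point features, a known intrinsic matrix K, and a hypothesized pose (R, t); the 3-pixel tolerance is illustrative. The candidate model is back-projected into the image, and the hypothesis is kept only if enough projected points land near detected scene features.

```python
import numpy as np

def back_project(model_pts, K, R, t):
    """Project 3D model points into the image: x ~ K (R X + t)."""
    cam = (R @ model_pts.T).T + t        # model -> camera coordinates
    img = (K @ cam.T).T
    return img[:, :2] / img[:, 2:3]      # perspective division by depth

def support(projected, scene_pts, tol=3.0):
    """Count projected model points lying within tol pixels of some scene feature."""
    d = np.linalg.norm(projected[:, None, :] - scene_pts[None, :, :], axis=2)
    return int(np.sum(d.min(axis=1) < tol))
```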
          Predicting New Views
• There is some evidence that the human visual
  system uses a “viewer-centered” representation for
  object recognition.
• It predicts the appearance of objects in images
  obtained under novel conditions by generalizing
  from familiar images of the objects.
Predicting New Views (cont’d)
[Figure: familiar views of an object and the predicted novel view]
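A rough sketch of one classical way such view generalization can be realized, in the spirit of Ullman and Basri's linear combination of views (point features, scaled orthographic projection assumed): the x-coordinates of a novel view lie, generically, in the span of {x1, y1, x2, 1} taken from two familiar views, so four coefficients fitted from a few known correspondences predict all the remaining points.

```python
import numpy as np

def predict_novel_x(x1, y1, x2, known_idx, known_novel_x):
    """Fit novel_x ≈ a*x1 + b*y1 + c*x2 + d from a few corresponding points
    (indices known_idx), then predict x for all remaining model points.
    The y-coordinates are predicted the same way with their own coefficients."""
    B = np.column_stack([x1, y1, x2, np.ones_like(x1)])   # basis built from two familiar views
    coeffs, *_ = np.linalg.lstsq(B[known_idx], known_novel_x, rcond=None)
    return B @ coeffs
```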
                 Matching Schemes
(1) Geometry-based: explore correspondences between
    model and scene features.
(2) Appearance-based: represent objects from all possible
    viewpoints and all possible illumination directions.
         Geometry-based Matching
• Advantage: efficient at “segmenting” the object of
  interest from the scene and robust in handling
  “occlusion”.
• Disadvantage: relies heavily on feature extraction,
  and performance degrades when imaging conditions
  give rise to poor segmentations (see the sketch below).
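A toy sketch of the correspondence exploration behind geometry-based matching, in the spirit of interpretation-tree search (Grimson and Lozano-Perez): try assignments of model point features to scene point features and keep only those whose pairwise distances are mutually consistent. All names and the tolerance are illustrative, and the brute-force search is only meant for a handful of features.

```python
import numpy as np
from itertools import permutations

def consistent_assignment(model_pts, scene_pts, tol=2.0):
    """Return scene indices matched to each model point such that every
    model/scene pairwise distance agrees within tol, or None if none exists."""
    for assign in permutations(range(len(scene_pts)), len(model_pts)):
        ok = True
        for i in range(len(model_pts)):
            for j in range(i + 1, len(model_pts)):
                dm = np.linalg.norm(model_pts[i] - model_pts[j])
                ds = np.linalg.norm(scene_pts[assign[i]] - scene_pts[assign[j]])
                if abs(dm - ds) > tol:
                    ok = False
                    break
            if not ok:
                break
        if ok:
            return assign
    return None
```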
         Appearance-based Matching
• Advantage: circumvents the feature extraction
  problem by enumerating many possible object
  appearances in advance (see the sketch below).
• Disadvantages: (i) difficulty segmenting the objects
  from the background and dealing with occlusions,
  (ii) too many possible appearances, (iii) how to
  sample the space of appearances?
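A minimal appearance-based matcher, assuming the sample views are already cropped and aligned: store many views of each object as normalized image vectors and classify a new image by its nearest stored view. Real systems (e.g., Murase and Nayar) first compress the view set with PCA; that step is omitted here.

```python
import numpy as np

class AppearanceMatcher:
    """Store example views as unit-norm vectors; recognize by nearest neighbor."""
    def __init__(self):
        self.views, self.labels = [], []

    def add_view(self, image, label):
        v = np.asarray(image, dtype=float).ravel()
        self.views.append(v / np.linalg.norm(v))   # discount global illumination scale
        self.labels.append(label)

    def recognize(self, image):
        v = np.asarray(image, dtype=float).ravel()
        v = v / np.linalg.norm(v)
        dists = [np.linalg.norm(v - w) for w in self.views]
        return self.labels[int(np.argmin(dists))]
```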
     Model-Based Object Recognition
• The environment is rather constrained, and recognition
  relies upon the existence of a set of predefined objects.
              Goals of Matching
• Identify a group of features from an unknown scene
  which approximately match a set of features from a
  known view of a model object.
• Recover the geometric transformation that the model
  object has undergone
            Transformation Space
• 2D objects (2 translation, 1 rotation, 1 scale)
• 3D objects, perspective projection (3 rotation, 3
  translation)
• 3D objects, orthographic projection + scale
  (essentially 5 parameters and a constant for depth)
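For the 2D case above (4 parameters: 2 translation, 1 rotation, 1 scale), the transformation a matched model has undergone can be recovered linearly from point correspondences. A sketch, assuming at least two matched point pairs; all names are illustrative.

```python
import numpy as np

def fit_similarity(model_pts, scene_pts):
    """Least-squares 2D similarity transform:  scene ≈ s*R(theta)*model + t.
    Parameterized as [a -b; b a] with a = s*cos(theta), b = s*sin(theta);
    returns (a, b, tx, ty)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(model_pts, scene_pts):
        rows.append([x, -y, 1, 0]); rhs.append(u)   # u = a*x - b*y + tx
        rows.append([y,  x, 0, 1]); rhs.append(v)   # v = b*x + a*y + ty
    params, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return params
```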
         Indexing-based Recognition
• Preprocessing step: groups of model features are
  used to index the database, and the indexed locations
  are filled with entries containing references to the
  model objects and information that can later be used
  for pose recovery.
• Recognition step: groups of scene features are used to
  index the database, and the model objects listed in the
  indexed locations are collected into a list of candidate
  models (hypotheses); see the sketch below.
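A toy sketch of the two steps, assuming point features and using a simple similarity-invariant index key (the quantized side-length ratios of a feature triple); real indexing schemes such as geometric hashing store basis-relative coordinates instead. All names and the quantization are illustrative.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

def index_key(triple, q=20):
    """Similarity-invariant key for a feature triple: its sorted pairwise
    distances divided by the largest one, quantized into q bins."""
    a, b, c = sorted(np.linalg.norm(p - r) for p, r in combinations(triple, 2))
    return (int(q * a / c), int(q * b / c))

def build_index(models):                              # preprocessing step
    table = defaultdict(list)
    for name, pts in models.items():
        for ids in combinations(range(len(pts)), 3):
            # each entry keeps the model reference and feature ids for later pose recovery
            table[index_key([pts[i] for i in ids])].append((name, ids))
    return table

def candidate_models(table, scene_pts):               # recognition step
    votes = defaultdict(int)
    for ids in combinations(range(len(scene_pts)), 3):
        for name, _ in table.get(index_key([scene_pts[i] for i in ids]), []):
            votes[name] += 1
    return sorted(votes, key=votes.get, reverse=True)  # hypotheses, most supported first
```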
                         References
•   E. Grimson and T. Lozano-Perez, "Localizing overlapping parts by
    searching the interpretation tree", IEEE Transactions on Pattern Analysis
    and Machine Intelligence, vol. 9, no. 4, pp. 469-482, July 1987.
•   D. Huttenlocher and S. Ullman, "Recognizing solid objects by
    alignment with an image", International Journal of Computer
    Vision, vol. 5, no. 2, pp. 195-212, 1990.
•   Y. Lamdan, J. Schwartz, and H. Wolfson, "Affine invariant model-
    based object recognition", IEEE Trans. on Robotics and
    Automation, vol. 6, no. 5, pp. 578-589, October 1990.
•   I. Rigoutsos and R. Hummel, "A Bayesian approach to model matching
    with geometric hashing", CVGIP: Image Understanding, vol. 62,
    pp. 11-26, 1995.
                   References (cont’d)
•   D. Clemens and D. Jacobs, "Space and time bounds on indexing 3D
    models from 2D images", IEEE Transactions on Pattern Analysis and
    Machine Intelligence, vol. 13, no. 10, pp. 1007-1017, 1991.
•   D. Thompson and J. Mundy, "Three dimensional model matching from
    an unconstrained viewpoint", IEEE Conference on Robotics and
    Automation, pp. 208-220, 1987.
•   D. Ballard, "Generalizing the Hough transform to detect arbitrary
    shapes", Pattern Recognition, vol. 13, no. 2, pp. 111-122, 1981.
•   H. Murase and S. Nayar, "Visual learning and recognition of 3D
    objects from appearance", International Journal of Computer Vision,
    vol. 14, pp. 5-24, 1995.
                   References (cont’d)
•   M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of
    Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
•   D. Jacobs, "Robust and efficient detection of salient convex groups",
    IEEE Transactions on Pattern Analysis and Machine Intelligence,
    vol. 18, no. 1, pp. 23-37, 1996.
•   K. Bowyer and C. Dyer, "Aspect graphs: an introduction and survey of
    recent results", International Journal of Imaging Systems and
    Technology, vol. 2, pp. 315-328, 1990.