INTRODUCTION TO COMPUTER VISION
(Computer Vision and Robotics)
Contents
CHAPTER 1
1.1 Introduction
CHAPTER 1
COMPUTER VISION
1.1    Introduction
      This chapter describes the vision-based control strategies for a pick-and-place robotic application. The software implementation of these strategies is accomplished using MATLAB/Simulink from MathWorks. The vision algorithms identify the objects of interest and send their position and orientation data to the data-acquisition system and then to the microcontroller, which solves the inverse kinematics and commands the robot to pick these objects and place them at a target goal.
[Block diagram: PC → data-acquisition system → microcontroller]
[Figure: coordinate frames {B}, {W}, {T}, {S}, {G} and {C} of the robot, goal and camera]
{}^{B}T_{W} = {}^{B}T_{S} \, {}^{S}T_{G} \, ({}^{W}T_{G=T})^{-1}                 (1.1)
      where {}^{B}T_{S} and ({}^{W}T_{G=T})^{-1} are known from the physical dimensions of the robot. The robot variables are included in the robot matrix {}^{B}T_{W}. To get {}^{B}T_{W}, the matrix {}^{S}T_{G}, which gives the position and orientation of the goal relative to the frame {S}, must be determined as seen in Section 1.7.
      It is worth noting that (x, y)_image will be determined through the image-processing algorithms discussed in the following sections. However, determining (X, Y)_world of the object centroid in mm needs a good calibration process; see Section 1.5.
Fig. 1.3 Open-loop control block diagram (Camera → Video Processing → (x, y) in pixels → Camera Calibration → (X, Y) in mm → Robot Inverse Kinematics → robot variables d1, θ2, θ3, θ4, θ5 → Robot → end-effector position)

      Certain errors can arise in this open-loop control system from different sources, such as inverse-kinematics inaccuracy, robot precision and camera calibration.
image at that point. When (x), (y), and the amplitude values of (f) are all
finite, discrete quantities, we call the image a digital image. The field of
digital image processing refers to processing digital images by means of a
digital computer. Note that a digital image is composed of a finite number
of elements, each of which has a particular location and value. These
elements are referred to as picture elements, image elements, and pixels.
Pixel is the term most widely used to denote the elements of a digital
image.
Image Coordinates
      Assume that an image f(x, y) is sampled so that the resulting image has M rows and N columns; the image is then of size M × N. The values of
the coordinates are discrete quantities. For notational clarity and
convenience, we shall use integer values for these discrete coordinates.
The image origin is usually defined to be at (x, y) = (0, 0). The next
coordinate values along the first row of the image are (x, y) = (0, 1). The
notation (0, 1) is used to signify the second sample along the first row. It
does not mean that these are the actual values of physical coordinates
when the image was sampled.
    Fig. 1.6 shows this coordinate convention. Note that (x) ranges from
0 to (M–1) and (y) from 0 to (N–1) in integer increments.
      The coordinate system in Fig. 1.6 and the preceding discussion lead
to the following representation for a digitized image:
f(x, y) = \begin{bmatrix}
f(0,0) & f(0,1) & \cdots & f(0,N-1) \\
f(1,0) & f(1,1) & \cdots & f(1,N-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(M-1,0) & f(M-1,1) & \cdots & f(M-1,N-1)
\end{bmatrix}
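      As a concrete illustration of this representation, a minimal MATLAB sketch is given below; the file name 'scene.png' is hypothetical, and the snippet assumes a colour image that is reduced to a single intensity plane. Note that MATLAB indexes matrices from 1, so the sample at (x, y) = (0, 0) in the convention above is f(1, 1) in MATLAB.

    % Read an image and treat it as an M-by-N matrix of intensity values
    f = rgb2gray(imread('scene.png'));   % hypothetical file, reduced to grayscale
    [M, N] = size(f);                    % M rows, N columns
    firstSample  = f(1, 1);              % sample at (x, y) = (0, 0)
    secondSample = f(1, 2);              % second sample along the first row, (x, y) = (0, 1)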
in front of the lens. Fig. 1.7 illustrates the pinhole camera model with the
image plane located in front of the lens.
  Fig. 1.7 Illustration of the pinhole camera model, image plane in front
            of the lens to simplify calculations [Courtesy of Maria
            Magnusson Seger].
     The point where the Z axis pierces the image plane is known as the
principal point and the Z axis as the principal axis. The origin of the
image coordinate system is chosen, for now, as the principal point and its
x- and y axes are aligned with the X and Y axes of the camera coordinate
system. All of this is illustrated in Fig. 1.8.
\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}                 (1.2)

x_{image} = P \, X_{cam}                 (1.3)

P = \operatorname{diag}(f, f, 1) \, [\, I \mid 0 \,]                 (1.4)
      The camera matrix derived above assumes that the origin of the image coordinate system is at the principal point p. However, this is not usually the case in practice. If the coordinates of the principal point are (p_x, p_y) in the image coordinate system (see Fig. 1.9), the projection becomes:
\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}                 (1.5)

K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}                 (1.6)

P = K \, [\, I \mid 0 \,]                 (1.7)

x_{image} = K \, [\, I \mid 0 \,] \, X_{cam}                 (1.8)
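      As an illustration of (1.5)-(1.8), the following MATLAB sketch builds a calibration matrix and projects a camera-frame point to pixel coordinates; the focal length, principal point and test point are assumed values, not taken from the text.

    f  = 1000;  px = 320;  py = 240;        % assumed focal length and principal point
    K  = [f 0 px; 0 f py; 0 0 1];           % calibration matrix, equation (1.6)
    P  = K * [eye(3) zeros(3,1)];           % camera matrix P = K [I | 0], equation (1.7)
    Xcam   = [0.1; 0.05; 1.5; 1];           % assumed homogeneous camera-frame point [X Y Z 1]'
    ximage = P * Xcam;                      % homogeneous image point, equation (1.8)
    xy     = ximage(1:2) / ximage(3);       % pixel coordinates after dividing by the third component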
     The next step is to introduce the world coordinate system and relate
it to the camera coordinate system.
X_{world} = [\, X \;\; Y \;\; Z \;\; 1 \,]^{T}                 (1.9)
    Fig. 1.10 Camera geometry in a general world coordinate system.
Note that X_cam = 0 if X_world = C, i.e. the camera coordinate is zero at the camera center, as expected. From (1.9) we can write:
X_{cam} = \begin{bmatrix} R & -RC \\ 0^{T} & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
= \begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix} X_{world}                 (1.11)
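      A short MATLAB sketch of (1.11) follows; the rotation R, camera centre C and world point are assumed values used only to show how X_cam is formed.

    theta = deg2rad(10);                              % assumed rotation angle about Z
    R = [cos(theta) -sin(theta) 0;
         sin(theta)  cos(theta) 0;
         0           0          1];                   % assumed rotation of the camera frame
    C = [0.2; 0.1; 0.5];                              % assumed camera centre in world coordinates
    t = -R * C;                                       % translation term t = -RC, as in (1.11)
    Xworld = [0.3; 0.4; 0.0; 1];                      % assumed homogeneous world point
    Xcam   = [R t; 0 0 0 1] * Xworld;                 % point expressed in the camera frame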
1.4.3
true. In particular, if the numbers of pixels per unit distance in image coordinates are m_x and m_y in the x and y directions, respectively, then the calibration matrix becomes:
K = \begin{bmatrix} m_x & 0 & 0 \\ 0 & m_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} \alpha_x & 0 & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}                 (1.13)

where α_x = f m_x, α_y = f m_y, x_0 = m_x p_x and y_0 = m_y p_y. Allowing for a skew parameter s gives the general form:

K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}                 (1.14)
{}^{C}T_{S} = \begin{bmatrix} {}^{C}R_{S} & {}^{C}t_{S} \\ 0^{T} & 1 \end{bmatrix}                 (1.15)

{}^{C}R_{S} = \begin{bmatrix} 0.9992 & 0.0027 & -0.0399 \\ -0.002 & 0.9999 & 0.0136 \\ 0.0399 & -0.0162 & 0.9991 \end{bmatrix}, \quad
{}^{C}t_{S} = \begin{bmatrix} -226.1425 \\ -166.5716 \\ 1.24 \times 10^{3} \end{bmatrix}                 (1.16)

k = \begin{bmatrix} 1.064 \times 10^{3} & 0 & 0 \\ -0.8118 & 1.0658 \times 10^{3} & 0 \\ 258.3899 & 295.171 & 1 \end{bmatrix}                 (1.17)
\begin{bmatrix} x_{image} \\ y_{image} \\ z_{image} \end{bmatrix} =
\begin{bmatrix} 1.064 \times 10^{3} & 0 & 0 \\ -0.8118 & 1.0658 \times 10^{3} & 0 \\ 258.3899 & 295.171 & 1 \end{bmatrix} \cdot
\begin{bmatrix} 0.9992 & 0.0027 & -0.0399 & -226.1425 \\ -0.002 & 0.9999 & 0.0136 & -166.5716 \\ 0.0399 & -0.0162 & 0.9991 & 1.24 \times 10^{3} \end{bmatrix}
\begin{bmatrix} X_{world} \\ Y_{world} \\ Z_{world} \\ 1 \end{bmatrix}                 (1.18)

\begin{bmatrix} x_{image} \\ y_{image} \\ z_{image} \end{bmatrix} =
\begin{bmatrix} 0.0106 & 0 & -0.0004 & -2.4079 \\ 0 & 0.0107 & 0.0002 & -1.7735 \\ 0.0026 & 0.0030 & 0 & -1.0636 \end{bmatrix}
\begin{bmatrix} X_{world} \\ Y_{world} \\ 0 \\ 1 \end{bmatrix}                 (1.19)
      where x_image and y_image are in pixels and are determined from the designed blob-analysis algorithms. So, after identifying the values of (x, y)_image in pixels, we can calculate (X, Y)_world in mm of the target objects from (1.20) and (1.21).
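      Equations (1.20) and (1.21) are not reproduced here, so the following is only a minimal MATLAB sketch of how such a pixel-to-millimetre conversion could be computed, assuming the planar-workspace case of (1.19) with Z_world = 0, so that columns 1, 2 and 4 of the 3 × 4 matrix form an invertible 3 × 3 mapping; the pixel values are hypothetical.

    M = [0.0106  0       -0.0004  -2.4079;
         0       0.0107   0.0002  -1.7735;
         0.0026  0.0030   0        -1.0636];   % 3-by-4 matrix of equation (1.19)
    H = M(:, [1 2 4]);                         % planar mapping, valid when Z_world = 0
    pixel = [320; 240; 1];                     % hypothetical (x, y)_image in homogeneous form
    sol = H \ pixel;                           % solve H * [X; Y; s] proportional to pixel
    Xworld_mm = sol(1) / sol(3);               % X_world in mm after normalising the scale
    Yworld_mm = sol(2) / sol(3);               % Y_world in mm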
      The proposed algorithm should guide the gripper to grasp the objects at their centroids, so the centroids of the objects should be obtained. The Blob Analysis block in the Simulink software is very similar to the "regionprops" function in MATLAB. They both measure a set of properties for each connected object in an image file. The properties include area, centroid, bounding box, major and minor axes, orientation and so on. The details of the proposed Simulink models will be explained in the next section. In the following sub-sections three different image-processing algorithms are discussed.
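      As an illustration of these blob-analysis properties, a minimal MATLAB sketch using regionprops is shown below; the file name and the small-area threshold are assumptions for the example.

    bw    = imbinarize(rgb2gray(imread('objects.png')));     % hypothetical scene, segmented to binary
    bw    = bwareaopen(bw, 50);                               % remove very small noise blobs (assumed size)
    stats = regionprops(bw, 'Area', 'Centroid', ...
                        'BoundingBox', 'Orientation');        % per-object properties
    for k = 1:numel(stats)
        c = stats(k).Centroid;                                % object centroid in pixels
        fprintf('Object %d: centroid (%.1f, %.1f), area %.0f\n', k, c(1), c(2), stats(k).Area);
    end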
      The method tracks and estimates the velocity of the robot arm only. It assumes all objects in the scene are rigid, so no shape changes are allowed. This assumption is often relaxed to local rigidity. It assures that optical flow actually captures real motions in a scene rather than expansions, contractions, deformations and/or shears of various scene objects.
Computation of differential optical flow is, essentially, a two-step procedure.
      The optical-flow methods try to calculate the motion between two image frames, which are taken at times (t) and (t + δt), at every voxel position. These methods are called differential since they are based on local Taylor-series approximations of the image signal; that is, they use partial derivatives with respect to the spatial and temporal coordinates.
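      As a rough MATLAB counterpart of the Simulink Optical Flow block used later, the sketch below estimates differential (Lucas-Kanade) optical flow between two consecutive frames; the video file name and noise threshold are assumptions.

    reader    = VideoReader('workspace.avi');            % hypothetical video of the scene
    opticFlow = opticalFlowLK('NoiseThreshold', 0.009);  % differential (Lucas-Kanade) estimator
    frame1 = rgb2gray(readFrame(reader));                % frame at time t
    estimateFlow(opticFlow, frame1);                     % initialise the estimator with the first frame
    frame2 = rgb2gray(readFrame(reader));                % frame at time t + dt
    flow   = estimateFlow(opticFlow, frame2);            % velocity field between the two frames
    speed  = flow.Magnitude;                             % per-pixel velocity magnitude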
techniques such as thresholding and median filtering are then sequentially applied to obtain labeled regions for statistical analysis.
      The idea of this project is derived from the tracking section of the demos listed on the MATLAB Computer Vision Toolbox website. The algorithm consists of a software simulation in Simulink.
      The Simulink model for this algorithm mainly consists of three parts, which are "Velocity Estimation (yellow block)", "Velocity Threshold Calculation (green block)" and "Blob Analysis (Centroid Determination) (red block)"; see Fig. 1.12.
      For the velocity estimation, the Optical Flow block (yellow block) from the Simulink built-in library is used. The Optical Flow block reads the image intensity values and estimates the velocity of object motion. The velocity estimation can be performed either between two images or between the current frame and the Nth frame back; see Fig. 1.12.
      After obtaining the velocity from the Optical Flow block, the velocity threshold must be calculated in order to determine the minimum velocity magnitude that corresponds to a moving object (green subsystem block; see Fig. 1.12).
      After that, the input velocity is compared with the mean velocity value using a Relational Operator block (gray block). If the input velocity is greater than the mean value, it is mapped to one, and to zero otherwise. The output of this comparison becomes a threshold intensity matrix that is passed to a Median Filter block (green block) and a Closing block (yellow block) to remove noise; see Fig. 1.13.
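      A minimal MATLAB sketch of this thresholding and clean-up stage is given below; it assumes 'speed' holds the per-pixel velocity magnitude from the optical-flow step, and the filter and structuring-element sizes are assumptions.

    vmean = mean(speed(:));                    % mean velocity magnitude over the frame
    mask  = speed > vmean;                     % one where velocity exceeds the mean, zero otherwise
    mask  = medfilt2(mask, [5 5]);             % median filter to remove isolated noise pixels
    mask  = imclose(mask, strel('disk', 3));   % morphological closing to fill small gaps
    stats = regionprops(mask, 'Centroid', 'BoundingBox');   % blob analysis of the moving regions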
be visualized as a 3-D matrix with the primary colors set out on the axes. The values for the primary colors vary from 0 to 1. Each color is coded with three values: one for red, one for green and one for blue. In this color space, an image imported on a computer is thus transformed into three matrices with a value per pixel for the corresponding primary color (Fig. 1.15).
colors. The color definition can be done with two 2-D images, but that is still very difficult.
      The Simulink model for this algorithm mainly consists of two parts, which are "Identifying RGB of target objects and Gripper Label" and "Boundary Box, Centroid Determination". For the RGB identification, the color-analyzer program "Camtasia Studio" is used to identify the RGB values of the objects, and a Simulink subsystem block called "RGB Filter" is built for the proposed RGB values; see Fig. 1.16 and Fig. 1.17.
      After obtaining the RGB values from the RGB Filter block, they are passed to the Blob Analysis block in order to obtain the bounding box, centroid and corresponding box area for each object; see Fig. 1.14.
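      A minimal MATLAB sketch of this RGB filtering followed by blob analysis is shown below; the target RGB value and the tolerance stand in for the values read with the colour analyzer and are assumptions.

    rgb    = im2double(imread('workspace.png'));     % hypothetical frame, channel values in 0..1
    target = [0.85 0.10 0.10];                       % assumed RGB of a red target object
    tol    = 0.15;                                   % assumed per-channel tolerance
    mask   = abs(rgb(:,:,1) - target(1)) < tol & ...
             abs(rgb(:,:,2) - target(2)) < tol & ...
             abs(rgb(:,:,3) - target(3)) < tol;      % pixels close to the target colour
    mask   = bwareaopen(mask, 100);                  % suppress small false detections
    stats  = regionprops(mask, 'Centroid', 'BoundingBox', 'Area');   % boxes, centroids and areas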
in which the current frame containing the moving object is subtracted from the background frame to detect the moving object. This method is simple and easy to realize, and it accurately extracts the characteristics of the target data, but it is sensitive to changes in the external environment, so it is applicable only when the background is known.
Fig. 1.19 Subsystem that takes the median over time of each pixel
      The absolute value of the difference between the whole picture and the background is taken to eliminate negatives. Then a threshold is established, so that anything above it is in the foreground and becomes white, and anything below it is in the background and becomes black; see Fig. 1.20.
Fig. 1.20 Subsystem that determines the threshold and blob analysis
Simulink can perform blob detection on the white objects and determine the points needed to draw a rectangle around them; see Fig. 1.14. Unfortunately, this software does not work perfectly due to lag: the system takes almost a full second to get a new frame and analyze it for background and object detection.
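      A minimal MATLAB sketch of this background-subtraction stage is given below; it assumes 'reader' is an open VideoReader, 'background' is the per-pixel median computed from earlier frames, and the threshold value is an assumption.

    frame     = rgb2gray(readFrame(reader));                 % current frame
    diffImage = abs(double(frame) - double(background));     % absolute difference eliminates negatives
    fgMask    = diffImage > 30;                               % assumed threshold: foreground becomes white
    fgMask    = medfilt2(fgMask, [5 5]);                      % clean isolated noise pixels
    stats     = regionprops(fgMask, 'BoundingBox');           % blobs of the detected moving objects
    if ~isempty(stats)
        frame = insertShape(frame, 'Rectangle', vertcat(stats.BoundingBox));   % draw rectangles
    end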