Object recognition using eigenvectors
Ovidiu Ghita1 and Paul F. Whelan2
Vision Systems Laboratory, School of Electronic Engineering,
Dublin City University, Glasnevin, Dublin 9, Ireland
ABSTRACT
A method for object recognition and pose estimation for robotic bin picking is presented. The approach discussed is a
variant on current approaches to eigenimage analysis. Compared to traditional approaches which use object geometry only
(shape invariants), the implementation described uses the eigenspace determined by processing the eigenvalues and
eigenvectors of the image set. The image set is obtained by varying pose whilst maintaining a constant level of illumination in
space, and the eigenspace is computed for each object of interest. For an unknown input image, the recognition algorithm
projects this image to each eigenspace and the object is recognised using space partitioning methods which determine the
object and the position in space. Several experimental results have been obtained to demonstrate the robustness of this
method when applied to the robotic bin picking task.
Keywords: 3-D, bin picking, robot, eigenspace, object recognition, image set
2. INTRODUCTION
One of the main aims of an intelligent robotic vision system is to recognise objects and to compute their position in space.
The vision recognition task must be performed as close to real time as possible. Therefore the recognition algorithm must be
computationally inexpensive.
In this paper we present an approach based on the use of eigenvectors (eigenimage). This can be briefly described as a
method, which computes the eigenspace determined by processing the eigenvalues and eigenvectors of the image set (see
also [7], [8], [13], [16]). For our practical implementation in order to decrease the number of images, the image set is
obtained by varying pose while maintaining a constant level of illumination. The eigenspace is determined using the
eigenvalues (eigenvectors) of covariance matrix in order to obtain a low dimensional subspace. For an unknown input image,
the recognition algorithm projects this image to eigenspace and the object is recognised using the space partitioning method,
which determines the object and also its position in space according with the number of images that describe the position
from the image set [6].
The image set is normalised in brightness and the background noise removed in order to eliminate the redundant
information. The eigenspace for the image set is built by computing the biggest eigenvalues (eigenvectors) of the set. The
next step is to project all the images onto eigenspace and we obtain a set of points that characterise the object and the pose in
space. In order to increase the resolution for position estimation we connected these points in eigenspace and an unknown
position can be better approximated. The first step could be considered by analogy with neural networks as “learning” and the
recognition by space partitioning as “matching”.
In our approach we consider the objects from images are not occluded. Another important problem can be the texture
reflectance and this problem is directly connected to the illumination conditions.
In comparison with other approaches based on geometry recognition (see also [1], [4], [14]) this method is suitable for a
large number of objects with different shape geometry.
1
Ovidiu Ghita, Email: ghitao@eeng.dcu.ie, WWW: http://www.eeng.dcu.ie/~whelanp/vsg/vsgper.html
2
Paul F. Whelan, Email: whelanp@eeng.dcu.ie, WWW: http://www.eeng.dcu.ie/~whelanp/home.html
3. CALCULATING EIGENSPACE
An image I(x,y) is represented by a two-dimensional 256 by 256 array of 8-bit intensity values. This image can also be
considered as a vector of dimension 65536. This is obtained by reading pixel brightness values in a raster scan manner.
To reconstruct an image we map a collection of points (in our case eigenvalues) in this huge space. The main idea of the
principal component analysis (Karhunen-Loeve expansion) is to find the vectors that can better describe the image (or the
entire space).
These vectors describe an orthonormal space and this property is presented below:
1, if i
ui T
j ij j
(1)
u , 0
if i
j
where ui, uj are two eigenvectors and ij is the scalar product.
I= [i1,i2,i3,… ,iN] (2)
where i1, i2,… ,iN are the pixel values.
This vector represents an unprocessed image. The reflectance proprieties of the object shape can vary from scene to scene.
If the illumination conditions of the environment are maintained constant, the appearance is affected only by object position.
The image set for an object is obtained as:
[I1,I2,I3,… ,IP]T (3)
where P is the number of considered positions in space.
Figure 1. Image set obtained by rotating the object. All images are normalised for brightness.
Because we do not want our images to be affected by the variations in the intensity illumination or the aperture of the
imaging system (lens), we normalise each image so that the total energy contained in the image is unity [6]. The normalised
image is obtained by dividing each pixel by the following number:
1
'
i , B
N
in n
i
n1 n
2
(4)
B
Before computing the eigenspace we need to calculate the average image (A) of all images:
1
A ' (5)
I
P
i
P i 1
Figure 2. The average matrix for image set
The image set will be obtained by subtracting the average image (A) from each normalised image:
S [ I ' A, I ' A, I ' A,..., I '
T (6)
A]
1 2 3 P
The dimension of matrix S is P x N, where P is the number of positions (images) and N the number of pixels. The next
step involves the computing the covariance matrix:
CSTS (7)
This matrix is very large (65536 x 65536) and it will be extremely difficult to compute the eigenvalues and the
corresponding eigenvectors. If the number of images P is smaller than N it is easier to construct the P by P matrix Q=SST,
but in this case the dimension of space is maximum P. The eigenvalues and the corresponding eigenvectors are computed
solving the following well known equation:
Qui v (8)
i
ui
where ui is the ith eigenvector and vi is the corresponding eigenvalue.
The eigenspace is obtained by multiplying matrix of eigenvectors (eigenmatrix) with the matrix S:
E US (9)
where U=[u1,u2,… ,uP]T, U is P x P dimensional. As we expect the matrix E that represents the eigenspace is P x N
dimensional.
Figure 3. Eight of the eigenvectors corresponding to the largest eigenvalues calculated for the input image set
Using this approach the number of calculations is significantly reduced but in this case the number of eigenvectors is
relatively small (up to 36). For an exact description we need N eigenvectors, but for recognition purpose a smaller number is
sufficient to describe a low dimensional subspace.
4. OBJECT RECOGNITION AND POSITION ESTIMATION
Once the eigenspace is computed we can project all images from the set on this subspace ( E=[e1, e2,… ,ep]T). The result
will be a collection of points that describe the object and its position. Before projecting the image set onto eigenspace, we
subtract the average image from the image set.
hi [e1 , e2 (10)
,..., eP ]T ( I '
A) i
where e1, e2, … , ep are the eigenspace vectors and each vector is N dimensional.
As we expect each point is P dimensional and this appears to be a sever restriction in image reconstruction. But for the
purpose of recognition, this allows us a fairly accurate method under the condition of maintaining a constant illumination. If
these points are connected in P space to give us the possibility of estimating positions of the object that are not included in the
image set. A direct connection between position of the object and the projection on the eigenspace (or object space) is
obtained.
An unknown input image will be projected onto eigenspace and as result we will obtain a P dimensional point. This vector
can be used in a standard recognition algorithm and the simplest method to determine the best matching is to compute the
Euclidean distance [6], [15].
d 2 h h 2 (h h ) T (h h ) h T h h T h (11)
i i i i i i
where i=1,2,… ,P and h is the projection of the input image onto eigenspace.
We consider the recognition task to find the object and its position. The object is in the collection when the object gives us
the minimum distance and this distance is smaller than a threshold value.
di min h hi
(12)
If we have a large collection of objects the best approach is to compute the universal eigenspace which contains all images
and all positions. The first step is to recognise the object projecting the input image on this universal eigenspace and after this
we can estimate precisely the position by projecting the input image on the object eigenspace.
Certainly the universal eigenspace will be difficult to compute since the number of images for a large collection of objects
is huge. For this reason we will compute the largest P eigenvalues and the corresponding eigenvectors. But the advantage in
recognition terms is very important because we can determine the object from collection in only one step (see also [2], [5]).
The second step is to estimate the position of the object. Otherwise we have to search each object’s eigenspace one by one
and this method is far from efficient. We can use this approach in case we have a small number of objects or the computation
time is not very important for our application.
5. 3-D OBJECT RECOGNITION
We use this approach for our bin picking application. In order to recognise the object and its position in space is necessary
to have a depth perception of the scene, allowing the comprehension of 3-D relationship between objects in the real world.
For this purpose was developed a depth from defocusing technique, the depth estimation is calculated from the degree of
blurring in the two images (more details can be found in [11], [12]).
This transition from 2-D image to 3-D image is presented in figure 4.
(a) (b)
(c)
Figure 4. (a) 2-D image, (b) 3-D depth map, and (c) the spatial representation of the 3-D depth map
Certainly in this case it is necessary to construct the universal eigenspace and each objects eigenspace using images
corresponding to the depth map. The recognition task remains unchanged by projecting the input image on the eigenspace and
compute the Euclidean distance.
6. EXPERIMENTS AND RESULTS
Designing a practical system for object recognition require accuracy and speed. If the number of objects and corresponding
positions are small (less than 50), the best implementation is to construct only a universal eigenspace. Clearly this is an
approximate position estimation, but suitable for a range of applications. If the accuracy or the number of objects is large we
are required to construct the universal eigenspace and the object eigenspace for each object [6]. In our implementation we
used both methods. Another key problem is the dimension of the eigenspace. As we discussed before this number is limited
to P which is the number of positions for each object. This number can vary from object to object and is in direct connection
with the number of objects. For this present algorithm we have obtained poor results if the dimension of the eigenspace is less
than 5. We increase the eigenspace dimension to 30 step by step but the rate of recognition is not affected visibly after 15
when the recognition rate is near to 100%.
It is very important to maintain the illumination at a constant level. This requirement is very important especially for 3-D
recognition, as the depth map is sensitive to different levels of illumination. For 3-D detection we project a structured light
pattern to obtain a better representation of shape. Object reflectance and occlusion can cause errors in the recognition process
(more details concerning object reflectance can be found in [9]). If the level of light is not constant for each position, then we
have to acquire the same image with different degree of illumination. It is clear that in this case the number of images is huge
and to solve this problem is computationally intensive.
As a result we have obtained correct recognition for objects from an image set (universal image set) and position
estimation with an error of less than 6 degrees for an object set which contains 36 images. If we use a small number of
objects (less than 15) for an object set, the results concerning position estimation are unreliable.
7. CONCLUSIONS
We have presented a possible approach to object recognition for robotic vision applications. In comparison with other
methods this implementation is simple, fast and reliable even if it is not a unique solution for recognition problem (also see
[3], [4], [14]). It is very important to remember that many applications do not require unity rate of recognition for position
and only estimation with a certain error. Furthermore this approach can be easily implemented in hardware for a real time
application and the future research will be focused in this direction (for more details see [10], [15], [16]).
Our experiments show that the eigenspace technique can be used for object recognition and position estimation with a very
high degree of accuracy. Also a novel approach for 3-D object recognition was briefly presented. All these results prove the
robustness and recommend this method for robotic applications.
ACKNOWLEDGEMENTS
This research was supported by Motorola BV, Dublin, Ireland.
REFERENCES
1. P. Besl and R. Jain, “Three-dimensional object recognition”, ACM Computing Surveys, 17(1), pp. 75-145, 1985
2. T. Baily and A. K. Jain, “A note on distance-weighted k-nearest neighbour rules”, IEEE Trans. Syst. Man and Cybern.,
vol. 8, no. 4, pp. 311-313, 1978
3. K. Ikeuchi, “Generating an interpretation tree from a CAD model for 3-D object recognition in bin-picking tasks”, Inter. J.
Comp. Vision, vol.1, no.2, pp. 145-165, 1987
4. D. Kriegman and J. Ponce, ”On recognizing and positioning curved 3-D objects from image contours ”, IEEE Trans. on
Pattern Anal. and Machine Intell., PAMI 12, no. 12, pp 1127-1137,1990
5. J. Macleod, A. Luc and D. Titterington, “A re-examination of the distance-weighted k-nearest neighbour classification
rule”, IEEE Trans. Syst. Man and Cybern., vol. 17, no. 4, pp. 689-696, 1987
6. H. Murase and S. K. Nayar, “Visual learning and recognition of 3-D objects from appearance”, International Journal of
Computer Vision,14, pp. 5-24, 1995
7. B. Moghaddam and A. Pentland, “Face recognition using view-based and modular eigenspaces”, Automatic Systems for
the Identification and Inspection of Humans”, SPIE, vol.2277,1994
8. H. Murakami and V. Kumar, “Efficient calculation of primary images from a set of images”, IEEE Trans. on Pattern Anal.
and Machine Intell., PAMI 4, no. 5, pp. 511-515, 1982
9. S. K. Nayar and R. Bolle, “Reflectance based object recognition”, Inter. J. Comp. Vision vol. 17, no.3, pp.219-240,1996
10. S. A. Nene, S. K. Nayar, H. Murase, “SLAM: A software Library for Appearance Matching”, Proc. of ARPA Image
Understanding Workshop, Monterey, Nov. 1994
11. S. K. Nayar, M. Watanabe and M. Noguchi, “Real-time focus range sensor”, Columbia University, Tech. Rep. CUCS-028-
94, 1994
12. A. P. Pentland, “A new sense for depth of field”, IEEE Trans. on Pattern Anal. and Machine Intell., PAMI-9, no.4, pp.
523-531, 1987
13. L. Sirovich and M. Kirby, “Low-dimensional procedure for the caracterization of human faces”, J. Opt. Soc. Amer., vol. 4,
no.3, pp. 519-524, 1987
14. F. Stein and G. Medioni, “Structural indexing: efficient 3-D object recognition”, IEEE Trans. on Pattern Anal. and
Machine Intell., PAMI 14, no. 2, pp. 125-145,1992
15. M. Turk and A. Pentland, “Eigenfaces for recognition”, Journal of Cognitive Neuroscience, vol. 3, pp. 71-86, 1991
16. M. Turk and A. Pentland, “Face recognition using eigenfaces”, Proc. of IEEE Conference on Computer Vision and Pattern
Recognition, pp.586-591, June 1991