Geometric Camera Calibration
Using Circular Control Points
Janne Heikkila
Abstract: Modern CCD cameras are usually capable of a spatial accuracy greater than 1/50 of the pixel size. However, such
accuracy is not easily attained due to various error sources that can affect the image formation process. Current calibration methods
typically assume that the observations are unbiased, the only error is the zero-mean independent and identically distributed random
noise in the observed image coordinates, and the camera model completely explains the mapping between the 3D coordinates and the
image coordinates. In general, these conditions are not met, causing the calibration results to be less accurate than expected. In this
paper, a calibration procedure for precise 3D computer vision applications is described. It introduces bias correction for circular control
points and a nonrecursive method for reversing the distortion model. The accuracy analysis is presented and the error sources that can
reduce the theoretical accuracy are discussed. The tests with synthetic images indicate improvements in the calibration results in
limited error conditions. In real images, the suppression of external error sources becomes a prerequisite for successful calibration.
Index Terms: Camera model, lens distortion, reverse distortion model, calibration procedure, bias correction, calibration accuracy.
1 INTRODUCTION

In 3D machine vision, it is necessary to know the relationship between the 3D object coordinates and the image
coordinates. This transformation is determined in geometric
camera calibration by solving the unknown parameters of
the camera model. Initially, camera calibration techniques
were developed in the field of photogrammetry for aerial
imaging and surveying. First, photographic cameras were
used, but recently video cameras have replaced them
almost completely. Also new application areas, like robot
vision and industrial metrology, have appeared, where
camera calibration plays an important role.
Depending on the application, there are different
requirements for camera calibration. In some applications,
such as robot guidance, the calibration procedure should be
fast and automatic, but in metrology applications, the
precision is typically a more important factor. The traditional camera calibration procedures, such as bundle
adjustment [1], are computationally demanding full-scale
optimization approaches. Therefore, most of the calibration
methods suggested during the last few decades in computer
vision literature are mainly designed for speeding up the
process by simplifying or linearizing the optimization
problem. The well-known calibration method developed
by Roger Tsai [2] belongs to this category. Other techniques
also based on the linear transformation, for example [3], [4],
[5], [6], are fast but quite inaccurate. The simplifications
made reduce the precision of the parameter estimates and,
as a consequence, they are not suitable for applications in
3D metrology as such.
. The author is with the Machine Vision and Media Processing Unit,
Infotech Oulu and Department of Electrical Engineering, FIN-90014
University of Oulu, Oulu, Finland. E-mail: jth@ee.oulu.fi.
Manuscript received 4 Dec. 1998; revised 4 Oct. 1999; accepted 31 May 2000.
Recommended for acceptance by K. Bowyer.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number 108389.
Due to the increased processing power of standard
workstations, the nonlinear nature of the estimation
problem is not as restricting as it was a few years ago.
The calibration procedure can be accomplished iteratively in a couple of seconds. This gives us a good reason for
improving the accuracy of the calibration methods without
introducing a lot of extra time for computation. An accuracy
of 1/50 of the pixel size (around 1/50,000 of the image size)
is a realistic goal that can be achieved in low noise
conditions with proper subpixel feature extraction techniques. The main improvements in the new calibration
procedure presented in the following sections are the
camera model, which allows accurate mapping in both
directions, and the elimination of the bias in the coordinates
of the circular control points.
In Section 2, we begin by describing the camera model
for projection and back-projection. In Section 3, the
projective geometry of the circular control points is
reviewed and the necessary equations for mapping the
circles into the image plane are presented. Section 4
describes a three-step calibration procedure for circular
control points. Experimental results with the calibration
procedure are reported in Section 5 and the effects of some
typical error sources are discussed in Section 6. Finally,
Section 7 offers concluding remarks.
2 CAMERA MODEL
In camera calibration, the transformation from 3D world
coordinates to 2D image coordinates is determined by
solving the unknown parameters of the camera model.
Depending on the accuracy requirements, the model is
typically based on either orthographic or perspective
projection. Orthographic transformation is the roughest
approximation assuming the objects in 3D space to be
orthogonally projected on the image plane. It is more
suitable for vision applications where the requirements of
Fig. 1. Pinhole camera model.
the geometric accuracy are somewhat low. Due to linearity,
it provides a simpler and computationally less expensive
solution than perspective projection which is a nonlinear
form of mapping. However, for 3D motion estimation and
reconstruction problems, perspective projection gives an
idealized mathematical framework, which is actually quite
accurate for high quality camera systems. For off-the-shelf
systems, the perspective projection model is often augmented with a lens distortion model.
Let us first consider a pure perspective projection (i.e.,
pinhole) camera model illustrated in Fig. 1. The center of
projection is at the origin O of the camera frame C. The
image plane $\pi$ is parallel to the xy plane and it is displaced
with a distance f (focal length) from O along the z axis. The
z axis is also called the optical axis, or the principal axis, and
the intersection of $\pi$ and the optical axis is called the
principal point o. The u and v axes of the 2D image
coordinate frame I are parallel to the x and y axes,
respectively. The coordinates of the principal point in I
are $(u_0, v_0)^T$.
Let P be an arbitrary 3D point located on the positive side of the z axis and p its projection on $\pi$. The coordinates of P in the camera frame C are $(x, y, z)^T$ and in the world frame W the coordinates are $(X, Y, Z)^T$. The coordinates of p in I are $(u, v)^T$, and they can be solved from the homogeneous coordinates given by the transformation

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = F \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = P M \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad (1)$$
where F is the perspective transformation matrix (PTM),
$$P = \begin{bmatrix} sf & 0 & u_0 & 0 \\ 0 & f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad (2)$$
$\lambda$ is a scale factor, $s$ is the aspect ratio, and $M$ is a 4 by 4
matrix describing the mapping from W to C. It is
decomposed as follows:
$$M = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}, \qquad (3)$$
where $\mathbf{t} = (t_x, t_y, t_z)^T$ describes the translation between the two frames, and $R$ is a 3 by 3 orthonormal rotation matrix which can be defined by the three Euler angles $\omega$, $\varphi$, and $\kappa$. If $R$ is known, these angles can be computed using, for example, the following decomposition [6]:
$$\varphi = \sin^{-1}(r_{31}), \qquad \omega = \operatorname{atan2}\!\left(-\frac{r_{32}}{\cos\varphi},\, \frac{r_{33}}{\cos\varphi}\right), \qquad \kappa = \operatorname{atan2}\!\left(-\frac{r_{21}}{\cos\varphi},\, \frac{r_{11}}{\cos\varphi}\right), \qquad (4)$$

where $r_{ij}$ is the entry from the $i$th row and the $j$th column of the matrix $R$, and $\operatorname{atan2}(y, x)$ is the two-argument inverse tangent function giving the angle in the range $(-\pi, \pi]$.
Because $\sin\varphi = \sin(\pi - \varphi)$, the Euler angles do not represent the rotation matrix uniquely. Hence, there are two equivalent decompositions for the matrix $R$. As we can see, (4) has a singularity if $r_{31} = \pm 1$, i.e., $\varphi = \pi/2$ or $\varphi = 3\pi/2$. In those cases, we can choose $\kappa = 0$ and $\omega = \operatorname{atan2}(r_{12}, r_{22})$, or vice versa [6]. In most situations, we could also prevent the singularities by carefully planning the calibration setup.
The parameters $t_x$, $t_y$, $t_z$, $\omega$, $\varphi$, and $\kappa$ are called extrinsic parameters or exterior projective parameters, and the parameters $s$, $f$, $u_0$, and $v_0$ are the intrinsic parameters or interior projective parameters of the pinhole camera model.
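As a concrete check of (4), the following numpy sketch extracts the Euler angles from a rotation matrix, including the singular case; the function name and tolerance are illustrative, not part of the original procedure.

```python
import numpy as np

def euler_from_rotation(R, eps=1e-12):
    """Extract (omega, phi, kappa) from a 3x3 rotation matrix via (4).

    Illustrative sketch; assumes the convention phi = asin(r31) used above.
    """
    phi = np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    c = np.cos(phi)
    if abs(c) > eps:
        omega = np.arctan2(-R[2, 1] / c, R[2, 2] / c)
        kappa = np.arctan2(-R[1, 0] / c, R[0, 0] / c)
    else:
        # Singular case r31 = +/-1: choose kappa = 0 (see text).
        kappa = 0.0
        omega = np.arctan2(R[0, 1], R[1, 1])
    return omega, phi, kappa
```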
It is usually more convenient to express the image
coordinates in pixels. Therefore, the coordinates obtained
from (1) are multiplied by factors Du and Dv that specify the
relationship between pixels and the physical object units,
for example, millimeters. However, knowing the precise
values of these conversion factors is not necessary because
they are linearly dependent on the parameters s and f that
are adjusted during calibration.
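For illustration, the projection chain of (1)-(3) can be written in a few lines of numpy; the function and argument names below are assumptions, and the conversion factors Du and Dv are taken as folded into s and f, as described above.

```python
import numpy as np

def project_pinhole(XYZ, s, f, u0, v0, R, t):
    """Project Nx3 world points to image coordinates with (1)-(3)."""
    P = np.array([[s * f, 0.0, u0,  0.0],
                  [0.0,   f,   v0,  0.0],
                  [0.0,   0.0, 1.0, 0.0]])        # eq. (2)
    M = np.eye(4)
    M[:3, :3], M[:3, 3] = R, t                    # eq. (3)
    Xh = np.hstack([XYZ, np.ones((len(XYZ), 1))]).T
    uvw = P @ M @ Xh                              # eq. (1), up to scale
    return (uvw[:2] / uvw[2]).T                   # divide out lambda
```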
In real cameras, perspective projection is typically not
sufficient for modeling the mapping precisely. Ideally, the
light rays coming from the scene should pass through the
optical center linearly, but in practice, lens systems are
composed of several optical elements introducing nonlinear
distortion to the optical paths and the resulting images. The
camera model of (1) produces the ideal image coordinates
$(u, v)^T$ of the projected point p. In order to separate these errorless but unobservable coordinates from their observable distorted counterparts, we will henceforth denote the correct or corrected coordinates of (1) by $\mathbf{a}_c = (u_c, v_c)^T$ and the distorted coordinates by $\mathbf{a}_d = (u_d, v_d)^T$.
Several methods for correcting the lens distortion have
been developed. The most commonly used approach is to
decompose the distortion into radial and decentering
components [7]. Knowing the distorted image coordinates
$\mathbf{a}_d$, the corrected coordinates $\mathbf{a}_c$ are approximated by

$$\mathbf{a}_c = \mathbf{a}_d + F_D(\mathbf{a}_d, \delta), \qquad (5)$$

where

$$F_D(\mathbf{a}_d, \delta) = \begin{bmatrix} \tilde{u}_d(k_1 r_d^2 + k_2 r_d^4 + k_3 r_d^6 + \cdots) + \left[2p_1 \tilde{u}_d \tilde{v}_d + p_2(r_d^2 + 2\tilde{u}_d^2)\right](1 + p_3 r_d^2 + \cdots) \\ \tilde{v}_d(k_1 r_d^2 + k_2 r_d^4 + k_3 r_d^6 + \cdots) + \left[p_1(r_d^2 + 2\tilde{v}_d^2) + 2p_2 \tilde{u}_d \tilde{v}_d\right](1 + p_3 r_d^2 + \cdots) \end{bmatrix}, \qquad (6)$$

with $\tilde{u}_d = u_d - u_0$, $\tilde{v}_d = v_d - v_0$, $r_d = \sqrt{\tilde{u}_d^2 + \tilde{v}_d^2}$, and $\delta = (k_1, k_2, \ldots, p_1, p_2, \ldots)^T$. The parameters $k_1, k_2, \ldots$ are the coefficients for the radial distortion, which causes the actual image point to be displaced radially in the image plane, and the parameters $p_1, p_2, \ldots$ are the coefficients for the decentering or tangential distortion, which may occur when the centers of curvature of the lens surfaces in the lens system are not strictly collinear.
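As a reference sketch, the distortion term of (6) truncated to two radial and two decentering coefficients (the truncation used later in the paper) might be written as follows; the names are illustrative:

```python
import numpy as np

def F_D(a_d, u0, v0, k1, k2, p1, p2):
    """Distortion term of (6), truncated to k1, k2, p1, p2.

    a_d is an Nx2 array of distorted coordinates; the return value is the
    correction such that a_c = a_d + F_D(a_d, delta), as in (5).
    """
    u = a_d[:, 0] - u0
    v = a_d[:, 1] - v0
    r2 = u * u + v * v
    radial = k1 * r2 + k2 * r2 * r2
    du = u * radial + 2 * p1 * u * v + p2 * (r2 + 2 * u * u)
    dv = v * radial + p1 * (r2 + 2 * v * v) + 2 * p2 * u * v
    return np.column_stack([du, dv])
```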
Other distortion models have also been proposed in the
literature. For example, Melen [6] used a model where P in
(2) was augmented with terms for linear distortion. This is
useful if the image axes are not orthogonal, but in most
cases the CCD arrays are almost perfect rectangles, and
hence, the linear distortion component is insignificant (less
than 0.01 pixels [8]). In the calibration method proposed by
Faugeras and Toscani [5], the geometric distortion was
corrected using bilinear transformation in small image
regions.
For camera calibration, it is favorable to find a
transformation from the 3D world coordinates to the real
image coordinates. This enables us to use a least-squares
technique as an optimal estimator of the camera parameters. Directly applying (1) and (5) implies that we first
need to correct the distortion and then estimate the camera
parameters of (1). An obvious problem is that the distortion
coefficients are not usually known in advance and due to
strong coupling, they cannot be reliably estimated without
knowing the other camera parameters.
In the literature, there are several solutions to overcome
this problem. Tsai [2] and Lenz and Tsai [9] decomposed
the camera model into linear and nonlinear parts where the
parameters are decoupled. However, only radial distortion
can be used and the solution is not optimal. Weng et al. [10]
suggested an iterative scheme where the parameters of the
distortion model and the projection model are fixed, in turn,
and estimated separately. A commonly used approach in
photogrammetry [7] is to perform a full-scale optimization
for all parameters by minimizing the sum of squared errors
between the corrected image coordinates and the synthetic
coordinates given by the camera model. In practice, this
means that (6) is evaluated with noisy observations, which
may deteriorate the calibration result. In order to minimize
the error between the observed and model coordinates, the
distorted image coordinates should be expressed in terms of
their undistorted counterparts. For this, we need an inverse
distortion model. As can be easily noticed, there is no
analytic solution for the inverse problem, and thus, we need
to approximate it. Melen [6], for example, used the
following model:
$$\mathbf{a}_d = \mathbf{a}_c - F_D(\mathbf{a}_c, \delta). \qquad (7)$$
The fitting results given by this model are often satisfactory,
because the distortion coefficients are typically small, which makes the model almost linear. It should be noticed
that the optimal distortion coefficients in a least squares
sense are different for (5) and (7).
Another solution is to create the following recursion
based on (5):
$$\mathbf{a}_d = \mathbf{a}_c - F_D(\mathbf{a}_d, \delta) \approx \mathbf{a}_c - F_D\big(\mathbf{a}_c - F_D(\mathbf{a}_d, \delta), \delta\big) \approx \mathbf{a}_c - F_D\Big(\mathbf{a}_c - F_D\big(\mathbf{a}_c - F_D(\mathbf{a}_d, \delta), \delta\big), \delta\Big) \approx \cdots \qquad (8)$$
The error introduced when substituting ad with ac on the
right-hand side gets smaller for each iteration. In practice, at
least three or four iterations are required to compensate for
strong lens distortions. This means that the distortion
function F D is evaluated several times in different locations
of the image, which makes this technique less attractive.
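A minimal sketch of the recursion (8), reusing the F_D sketch above; as the text notes, three or four iterations are usually enough:

```python
def distort_recursive(a_c, u0, v0, k1, k2, p1, p2, n_iter=4):
    """Approximate a_d from a_c by iterating a_d <- a_c - F_D(a_d), as in (8)."""
    a_d = a_c.copy()                               # initial guess a_d ~ a_c
    for _ in range(n_iter):
        a_d = a_c - F_D(a_d, u0, v0, k1, k2, p1, p2)
    return a_d
```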
In order to avoid extensive computation, we can take a
first order Taylor series approximation of $F_D$ about $\mathbf{a}_c$:

$$\mathbf{a}_c = \mathbf{a}_d + F_D(\mathbf{a}_c, \delta) + D(\mathbf{a}_c)(\mathbf{a}_d - \mathbf{a}_c), \qquad (9)$$

where

$$D(\mathbf{a}_c) = \left[ \frac{\partial}{\partial u} F_D(\mathbf{a}, \delta) \quad \frac{\partial}{\partial v} F_D(\mathbf{a}, \delta) \right]\bigg|_{\mathbf{a} = \mathbf{a}_c}. \qquad (10)$$

Solving $\mathbf{a}_d$ from (9) yields

$$\mathbf{a}_d = \mathbf{a}_c - \left[I + D(\mathbf{a}_c)\right]^{-1} F_D(\mathbf{a}_c, \delta). \qquad (11)$$

The elements of $D(\mathbf{a}_c)$ are small ($\ll 1$), which makes it possible to use the following approximation:

$$\mathbf{a}_d \approx \mathbf{a}_c - \left[1 + d_{11}(\mathbf{a}_c) + d_{22}(\mathbf{a}_c)\right]^{-1} F_D(\mathbf{a}_c, \delta), \qquad (12)$$
where $d_{11}(\mathbf{a}_c)$ and $d_{22}(\mathbf{a}_c)$ are the upper left and lower right elements of $D(\mathbf{a}_c)$, respectively. If only two radial distortion coefficients $k_1$ and $k_2$ and two decentering distortion coefficients $p_1$ and $p_2$ are used, the approximation of the inverse distortion model becomes

$$F_D^{-1}(\mathbf{a}_c, \delta) = -\frac{F_D(\mathbf{a}_c, \delta)}{1 + 4k_1 r_c^2 + 6k_2 r_c^4 + 8p_1 \tilde{v}_c + 8p_2 \tilde{u}_c}, \qquad (13)$$

so that $\mathbf{a}_d \approx \mathbf{a}_c + F_D^{-1}(\mathbf{a}_c, \delta)$, where $\tilde{u}_c = u_c - u_0$, $\tilde{v}_c = v_c - v_0$, and $r_c = \sqrt{\tilde{u}_c^2 + \tilde{v}_c^2}$. If necessary, extending this model with higher order terms is straightforward.

By replacing $u = u_c$ and $v = v_c$ in (1) and combining it with (12), we obtain a forward camera model which converts the 3D world coordinates to distorted image coordinates. Using a backward camera model, we can transform the distorted image coordinates to lines of sight in the 3D world coordinate system, or to the intersections of these lines with a known 2D plane. Let us assume that we have a 2D plane $\pi_0$ with a coordinate system H, whose origin is at $\mathbf{h}_0 = (X_0, Y_0, Z_0)^T$ and which is spanned by the 3D vectors $\mathbf{h}_1 = (X_1, Y_1, Z_1)^T$ and $\mathbf{h}_2 = (X_2, Y_2, Z_2)^T$. The transformation from the corrected image coordinates $\mathbf{a}_c = (u_c, v_c)^T$ produced by (5) to the plane $\pi_0$ can be expressed as

$$\lambda \begin{bmatrix} \mathbf{x}_H \\ 1 \end{bmatrix} = (FH)^{-1} \begin{bmatrix} \mathbf{a}_c \\ 1 \end{bmatrix}, \qquad (14)$$

where $\lambda$ is a scale factor, $\mathbf{x}_H = (X_H, Y_H)^T$ are the back-projected coordinates in H, and

$$H = \begin{bmatrix} \mathbf{h}_1 & \mathbf{h}_2 & \mathbf{h}_0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Due to the approximations made, $F_D^{-1}$ is not the exact inverse function of $F_D$. Therefore, it may be necessary to use a slightly different parameter vector $\delta'$ in (13) than the $\delta$ used in (6). A method for estimating these parameters will be presented in Section 4.
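In contrast to the recursion (8), the first-order inverse (13) needs no iteration. A minimal sketch under the same assumptions, again reusing the F_D function from above:

```python
def distort_approx(a_c, u0, v0, k1, k2, p1, p2):
    """One-shot approximation a_d = a_c + F_D^{-1}(a_c, delta), eq. (13)."""
    u = a_c[:, 0] - u0
    v = a_c[:, 1] - v0
    r2 = u * u + v * v
    denom = 1 + 4 * k1 * r2 + 6 * k2 * r2 * r2 + 8 * p1 * v + 8 * p2 * u
    return a_c - F_D(a_c, u0, v0, k1, k2, p1, p2) / denom[:, None]
```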
3 CIRCULAR CONTROL POINTS
Lines in the object space are mapped as lines on the image plane, but in general, perspective projection is not a shape preserving transformation. Two- and three-dimensional shapes are deformed if they are not coplanar with the image plane. This is also true for circular landmarks, which are commonly used control points in calibration. However, a bias between the observations and the model is induced if the centers of their projections in the image are treated as projections of the circle centers. In this section, we will review the necessary equations for using the centers of the projections as image observations without introducing any bias.

Let us assume that a circular control point R with radius $r$ is located on the plane $\pi_0$ so that its center is at the origin of the planar coordinate frame H. Circles are quadratic curves that can be expressed in the following manner:

$$A X_H^2 + 2B X_H Y_H + C Y_H^2 + 2D X_H + 2E Y_H + F = 0, \qquad (15)$$

where $A$, $B$, $C$, $D$, $E$, and $F$ are coefficients that define the shape and location of the curve. In homogeneous coordinates, this curve can be written as

$$\begin{bmatrix} \mathbf{x}_H \\ 1 \end{bmatrix}^T Q \begin{bmatrix} \mathbf{x}_H \\ 1 \end{bmatrix} = 0, \qquad (16)$$

where

$$Q = \begin{bmatrix} A & B & D \\ B & C & E \\ D & E & F \end{bmatrix}.$$

For the circle R,

$$Q = \begin{bmatrix} 1/r^2 & 0 & 0 \\ 0 & 1/r^2 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

Using (14), R can be projected on the image plane. The resulting curve becomes

$$\begin{bmatrix} \mathbf{a}_c \\ 1 \end{bmatrix}^T S \begin{bmatrix} \mathbf{a}_c \\ 1 \end{bmatrix} = 0, \qquad (17)$$

where

$$S = \left[(FH)^{-1}\right]^T Q \,(FH)^{-1}. \qquad (18)$$

We can see that the result is a quadratic curve, whose geometric interpretation is a circle, hyperbola, parabola, or ellipse. In practice, due to the finite rectangle corresponding to the image plane, the projection will be an ellipse or, in some special cases, a circle. As shown in [11], the center of the ellipse $\mathbf{e}_c = (u_e, v_e)^T$ can be obtained from

$$\lambda \begin{bmatrix} \mathbf{e}_c \\ 1 \end{bmatrix} = S^{-1} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \qquad (19)$$

By combining (18) and (19), we get

$$\lambda \begin{bmatrix} \mathbf{e}_c \\ 1 \end{bmatrix} = F G \mathbf{f}_3, \qquad (20)$$

where $G = H Q^{-1} H^T$ and $\mathbf{f}_3$ is a vector consisting of the last row of $F$. The matrix $G$ specifies the geometry, position, and orientation of the control point, and the perspective transformation matrix $F$ specifies the mapping from the world coordinates to undistorted image coordinates, which can be further converted to distorted image coordinates $\mathbf{e}_d$ using (12). As a result, we get

$$\mathbf{e}_d = \mathbf{e}_c + F_D^{-1}(\mathbf{e}_c, \delta). \qquad (21)$$

It should be noticed that by using (20) instead of (1), we obtain an unbiased relationship between the observed centers of the ellipses and the camera model.
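Equation (20) is simple enough to evaluate directly. The following numpy sketch, with illustrative names, returns the undistorted center of the projected ellipse; as written it assumes r > 0, since the circle matrix Q is inverted.

```python
import numpy as np

def project_circle_center(F, h0, h1, h2, r):
    """Undistorted image of the projected-ellipse center, eq. (20).

    F: 3x4 perspective transformation matrix; h0: circle center (3-vector);
    h1, h2: 3D vectors spanning the circle plane; r: circle radius (r > 0).
    """
    H = np.zeros((4, 3))
    H[:3, 0], H[:3, 1], H[:3, 2] = h1, h2, h0
    H[3, 2] = 1.0
    Q = np.diag([1.0 / r**2, 1.0 / r**2, -1.0])
    G = H @ np.linalg.inv(Q) @ H.T                # G = H Q^{-1} H^T
    e = F @ (G @ F[2])                            # F G f_3, eq. (20)
    return e[:2] / e[2]                           # divide out lambda
```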
4 CALIBRATION PROCEDURE
The calibration procedure presented next is mainly
intended to be used with circular landmarks. However, it
is also suitable for small points without any specific
geometry. In that case, the radius is set to zero. It is
assumed that the camera model includes eight intrinsic parameters, $\theta_{int} = [s, f, u_0, v_0, k_1, k_2, p_1, p_2]^T$, and six extrinsic parameters, $\theta_{ext} = [t_x, t_y, t_z, \omega, \varphi, \kappa]^T$. If necessary, adding
higher order terms to the model is straightforward. In
multiframe calibration, each frame has separate extrinsic
parameters, but common intrinsic parameters. The radius
of control points r and the conversion factors Du and Dv are
assumed to be known in advance, as well as the 3D
coordinates, and the observed image coordinates of the
control points.
Due to the nonlinear mapping of the camera model, there
are no direct estimators of all the camera parameters that
would produce an optimal solution in a least squares sense.
We therefore use an iterative searching technique to
estimate the vector
$$\theta = \left[\theta_{int}^T, \theta_{ext}^T(1), \theta_{ext}^T(2), \ldots, \theta_{ext}^T(K)\right]^T, \qquad (22)$$
where K is the number of the frames. Before using the
estimator, we need to have initial values of the camera
parameters for guaranteeing that a global minimum can be
achieved. This problem is solved by ignoring the lens
distortion and using a linear estimator for the rest of the
parameters.
Step 1: Initialization. Many variations of the linear
calibration technique have been proposed in the literature, for example, [3], [4], [5], [6]. However, they are all
based on the same idea where the perspective transformation matrix F is first estimated and then decomposed
into intrinsic and extrinsic camera parameters. These
methods provide a closed-form solution to the camera
parameters. The major shortcoming is that they are not
optimal estimators, because they do not produce minimum variance estimates. Another problem is that they
ignore the effects of the radial and decentering lens
distortion. Despite these shortcomings, linear calibration
techniques can provide a good starting point for iterative
search.
Several well-known procedures for estimating F are
available, e.g., [5]. The main difference between the linear calibration methods is how F is decomposed
into camera parameters. For example, Faugeras and
Toscani [5] extracted five intrinsic and six extrinsic
parameters using a set of linear equations and Melen [6]
used QR decomposition for estimating six intrinsic and
six extrinsic parameters. In principle, these techniques
can be directly applied to obtain the initial estimate for
the iterative optimization step. However, due to the
shortcomings listed above, these estimators tend to be
rather sensitive to observation noise. Even a small inaccuracy in F can cause significant errors, especially in the
intrinsic parameters. As a consequence, optimization
may fail.
In the initialization step, where the aim is to produce
the first, not the final, estimates of the camera parameters, it is often more reliable to use the nominal values
for the focal length, the aspect ratio and the image center
as the intrinsic parameters. With these values, we can
directly write the projection matrix P using (2). Let us
assume that the control points are not coplanar, which
means that we have a full 3 by 4 estimate of the perspective transformation matrix F, denoted by $\hat{F}$. Thus, we can write the following equation:

$$\hat{F} = P\hat{M} = \begin{bmatrix} P_{13}\hat{R} & P_{13}\hat{\mathbf{t}} \end{bmatrix}, \qquad (23)$$

where $P_{13}$ is a matrix containing the first three columns of P, and $\hat{R}$ and $\hat{\mathbf{t}}$ are the estimates of R and t, respectively. Now, it is straightforward to get

$$\hat{R} = P_{13}^{-1}\hat{F}_{13} \qquad (24)$$

and

$$\hat{\mathbf{t}} = P_{13}^{-1}\hat{F}_{4}, \qquad (25)$$

where $\hat{F}_{13}$ is a matrix containing the first three columns, and $\hat{F}_{4}$ the last column, of $\hat{F}$.

The matrix $\hat{R}$ does not satisfy the orthonormality constraint of a standard rotation matrix, but we can normalize and orthogonalize it using the singular value decomposition (SVD):

$$\hat{R} = U\Sigma V^T. \qquad (26)$$

The orthonormal version of $\hat{R}$ is given by

$$\hat{R}' = U\Sigma' V^T, \qquad (27)$$

where $\Sigma'$ is a diagonal matrix with diagonal elements 1, 1, and $|UV^T|$ in descending order [11]. The Euler angles can now be extracted from $\hat{R}'$ using (4).
If all the control points are coplanar, only a submatrix of F can be determined. Basically, the procedure is the same as above, with the exception that only two columns of $\hat{R}'$ are estimated. The third column is easily obtained by utilizing the orthogonality of the basis vectors. If multiframe calibration is used, the initialization step is performed separately for every frame. The initial parameters are collected into a vector

$$\hat{\theta}_0 = \left[1, f_0, N_u/2, N_v/2, 0, 0, 0, 0, \hat{\theta}_{ext}^T(1), \hat{\theta}_{ext}^T(2), \ldots, \hat{\theta}_{ext}^T(K)\right]^T, \qquad (28)$$

where $f_0$ is the nominal focal length, and $N_u$ and $N_v$ are the horizontal and vertical dimensions of the image.
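A compact sketch of the noncoplanar initialization, eqs. (23)-(27), under the nominal-intrinsics assumption described above; all names are illustrative:

```python
import numpy as np

def initialize_extrinsics(F_hat, s0, f0, Nu, Nv):
    """Step 1 sketch: extrinsics from an estimated 3x4 PTM, eqs. (23)-(27).

    Nominal intrinsics are assumed: aspect ratio s0, focal length f0, and
    the principal point at the image center, as suggested in the text.
    """
    P13 = np.array([[s0 * f0, 0.0, Nu / 2],
                    [0.0,     f0,  Nv / 2],
                    [0.0,     0.0, 1.0]])
    R_hat = np.linalg.inv(P13) @ F_hat[:, :3]     # eq. (24)
    t_hat = np.linalg.inv(P13) @ F_hat[:, 3]      # eq. (25)
    U, _, Vt = np.linalg.svd(R_hat)               # eq. (26)
    S_prime = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R_orth = U @ S_prime @ Vt                     # eq. (27)
    return R_orth, t_hat
```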
Step 2: Iterative search. In this step, the parameters of the
forward camera model are estimated by minimizing the
weighted sum of squared differences between the
observations and the model. Let us assume that there
are N circular control points and K images that are
indexed by n 1; . . . ; N and k 1; . . . ; K. A vector
containing the observed image coordinates of the center
of the ellipse n in the frame k is denoted by eo(n, k), and
the corresponding vector produced by the forward
camera model of (20) and (21) is denoted by ed(n, k).
Now, the objective function can be expressed as

$$J(\theta) = \mathbf{y}^T C_e^{-1} \mathbf{y}, \qquad (29)$$

where

$$\mathbf{y} = \Big[ [\mathbf{e}_o(1,1) - \mathbf{e}_d(1,1)]^T, [\mathbf{e}_o(2,1) - \mathbf{e}_d(2,1)]^T, \ldots, [\mathbf{e}_o(N,K) - \mathbf{e}_d(N,K)]^T \Big]^T$$

and $C_e$ is the covariance matrix of the observation error. There are various sources that can affect $C_e$; some of them are discussed later, in Section 6. If the covariance matrix is unknown, or the measurement noise for individual observations is statistically independent and identically distributed, $C_e^{-1}$ in (29) can be omitted. In general, the assumption of statistically independent and identically distributed noise is not fulfilled, because the error variance depends on the size of the ellipse in the image, and the covariance between the horizontal and vertical coordinates is nonzero if the ellipse is inclined. As a consequence, the measurement noise covariance matrix $C_e$ can be expressed as

$$C_e = \begin{bmatrix} C_e(1,1) & 0 & \cdots & 0 \\ 0 & C_e(2,1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_e(N,K) \end{bmatrix}, \qquad (30)$$

where

$$C_e(n,k) = E\Big\{ \big[\mathbf{e}_o(n,k) - E\{\mathbf{e}_o(n,k)\}\big] \big[\mathbf{e}_o(n,k) - E\{\mathbf{e}_o(n,k)\}\big]^T \Big\}.$$

If $C_e(n,k)$ is not known, it can be estimated. In practice, this means that we need to take several images from the same position and orientation, and then calculate the sample covariance in the following manner:

$$\hat{C}_e(n,k) = \frac{1}{W} \sum_{i=1}^{W} \big[\mathbf{e}_o(n,k,i) - \bar{\mathbf{e}}_o(n,k)\big] \big[\mathbf{e}_o(n,k,i) - \bar{\mathbf{e}}_o(n,k)\big]^T, \qquad (31)$$

where $W$ is the number of the images used for estimating $C_e(n,k)$, $\mathbf{e}_o(n,k,i)$ contains the observed image coordinates (ellipse $n$, pose $k$, frame $i$), and $\bar{\mathbf{e}}_o(n,k)$ is the vector of the average coordinates. An obvious problem of this approach is that a large number of images needs to be taken for a single calibration.

The parameters of the forward camera model are obtained by minimizing $J(\theta)$:

$$\hat{\theta} = \arg\min_{\theta} J(\theta). \qquad (32)$$

There are several numerical techniques, such as the Levenberg-Marquardt method [12], that can be applied to the optimization problem of (32). With an efficient technique, only a few iterations are typically needed to attain the minimum.
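The minimization of (29) maps directly onto a standard nonlinear least-squares solver once the residuals are whitened with the covariance blocks. The sketch below uses scipy's Levenberg-Marquardt option; forward_model is a hypothetical helper that evaluates (20) and (21) for all control points, and the fragment is illustrative rather than a description of the author's Matlab toolbox.

```python
import numpy as np
from scipy.optimize import least_squares

def calibrate(theta0, e_obs, Ce_blocks, forward_model):
    """Step 2 sketch: minimize (29) with a Levenberg-Marquardt solver.

    e_obs: (N*K, 2) observed ellipse centers; Ce_blocks: list of 2x2
    covariance blocks C_e(n, k); forward_model(theta) -> (N*K, 2) model
    centers via (20)-(21) (hypothetical helper, not shown here).
    """
    # Whitening matrices W with W^T W = C_e(n, k)^{-1}.
    Ws = [np.linalg.cholesky(np.linalg.inv(C)).T for C in Ce_blocks]

    def residuals(theta):
        d = e_obs - forward_model(theta)
        return np.concatenate([W @ di for W, di in zip(Ws, d)])

    return least_squares(residuals, theta0, method='lm').x
```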
Step 3: Backward camera model. In many applications, the observed image coordinates need to be projected back to 3D coordinates. This procedure was already described in Section 2. However, the reverse distortion model of (13), used in Step 2, is not exactly the inverse of (6). In order to get more consistent results in both directions, we estimate the parameter vector $\delta$ of (6) separately. The other intrinsic parameters are the same for both the forward and backward models.

If Steps 1 and 2 have been completed, the parameters of the forward model are already available. Using this model, we can produce a set of distorted points $\{\mathbf{a}_d(i)\}$ for arbitrary points $\{\mathbf{a}_c(i)\}$, where $i = 1, \ldots, M$. The points $\{\mathbf{a}_c(i)\}$ are generated so that they form an equally spaced 2D grid of about 1,000 to 2,000 points, for example, 40 by 40. The points must cover the entire image area, and the outermost points should be approximately 5-10 percent outside the effective image region in order to obtain an accurate mapping also close to the image borders. If the inversion were exact, these points would fulfill (5), expressed in the following matrix form:

$$\begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_M \end{bmatrix} \delta = \begin{bmatrix} \mathbf{a}_c(1) - \mathbf{a}_d(1) \\ \mathbf{a}_c(2) - \mathbf{a}_d(2) \\ \vdots \\ \mathbf{a}_c(M) - \mathbf{a}_d(M) \end{bmatrix}, \qquad (33)$$

where

$$B_i = \begin{bmatrix} \tilde{u}_d(i) r_d^2(i) & \tilde{u}_d(i) r_d^4(i) & 2\tilde{u}_d(i)\tilde{v}_d(i) & r_d^2(i) + 2\tilde{u}_d^2(i) \\ \tilde{v}_d(i) r_d^2(i) & \tilde{v}_d(i) r_d^4(i) & r_d^2(i) + 2\tilde{v}_d^2(i) & 2\tilde{u}_d(i)\tilde{v}_d(i) \end{bmatrix}$$

with $\tilde{u}_d(i) = u_d(i) - u_0$, $\tilde{v}_d(i) = v_d(i) - v_0$, and $r_d(i) = \sqrt{\tilde{u}_d^2(i) + \tilde{v}_d^2(i)}$.

Due to the approximations made in the derivation of the inverse distortion model, (33) does not hold exactly. Therefore, we need to adjust the parameter vector for back projection. This can be performed using the following least-squares formulation:

$$\hat{\delta}' = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_M \end{bmatrix}^{+} \begin{bmatrix} \mathbf{a}_c(1) - \mathbf{a}_d(1) \\ \mathbf{a}_c(2) - \mathbf{a}_d(2) \\ \vdots \\ \mathbf{a}_c(M) - \mathbf{a}_d(M) \end{bmatrix}, \qquad (34)$$

where $\hat{\delta}'$ is the vector of the distortion parameters to be used in the backward camera model, and $[\cdot]^{+}$ denotes the pseudoinverse of the matrix [13].
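Since (33) is linear in the distortion parameters, Step 3 reduces to a single linear least-squares solve. An illustrative numpy sketch of (33) and (34):

```python
import numpy as np

def fit_backward_distortion(a_c, a_d, u0, v0):
    """Step 3 sketch: solve (33)-(34) for delta' = (k1, k2, p1, p2)."""
    u = a_d[:, 0] - u0
    v = a_d[:, 1] - v0
    r2 = u * u + v * v
    # Stacked B matrix of (33): two rows per grid point.
    B = np.zeros((2 * len(a_c), 4))
    B[0::2] = np.column_stack([u * r2, u * r2 * r2, 2 * u * v, r2 + 2 * u * u])
    B[1::2] = np.column_stack([v * r2, v * r2 * r2, r2 + 2 * v * v, 2 * u * v])
    rhs = (a_c - a_d).reshape(-1)
    delta, *_ = np.linalg.lstsq(B, rhs, rcond=None)  # pseudoinverse solution
    return delta
```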
5 EXPERIMENTS
The camera calibration procedure suggested in Section 4
was tested in two experiments. First, the parameter
estimates of the forward camera model were analyzed
statistically using synthetic images and the results were
compared with the outcome of a corresponding real image.
Second, the precision of the inverse distortion model of (13)
was evaluated by correcting and distorting random image
coordinates.
Fig. 2. Calibration images: (a) real and (b) synthetic.
5.1 Calibrating the Forward Camera Model
The tests were performed with 200 synthetic images and
one real image that was used as a reference model. This real
image shown in Fig. 2a was captured using an off-the-shelf
monochrome CCD camera (Sony SSC-M370CE) equipped
with a 6.3 mm by 4.7 mm image sensor and an 8.5 mm
Cosmicar TV lens. The image was digitized from an analog
CCIR type video signal using a Sunvideo frame grabber.
The size of the image is 768 by 576 pixels, and the maximum
dynamics is 256 gray levels.

TABLE 1
Estimated Bias with Confidence Limits

TABLE 2
Estimated Standard Deviation with Confidence Limits

In the image, there is a calibration object which has two perpendicular planes, each
with 256 circular control points. The centers of these points
were located using the moment and curvature preserving
ellipse detection technique [14] and renormalization conic
fitting [15]. The calibration procedure was first applied to
estimate the camera parameters based on the real image.
The synthetic images were then produced using ray tracing
with the camera parameters obtained from calibration and
the known 3D model of the control points. In order to make
the synthetic images better correspond to the real images,
their intensity values were perturbed with additive Gaussian noise (σ = 2) and blurred using a 3 by 3 Gaussian filter (σ = 1 pixel). A sample image is shown in Fig. 2b.
With the synthetic images, we restrict the error sources
only to quantization and random image noise. Ignoring all
the other error sources is slightly unrealistic, but in order to
achieve extremely accurate calibration results, these error
sources should be eliminated or at least minimized
somehow. The advantage of using simulated images
instead of real images is that we can estimate some
statistical properties of the procedure.
The control points from the synthetic images were
extracted using the same techniques as with the real image,
i.e., subpixel edge detection and ellipse fitting. Three
different calibration methods were applied separately for
all point sets. The first method (method A) is the traditional
camera calibration approach which does not assume any
geometry for the control points. In the second method
(method B), circular geometry is utilized and all the
observations are equally weighted. In the third method
(method C), each observation is weighted by the inverse of
the observation error covariance matrix Ce that was
estimated using (31) and all 200 images. Table 1 shows
the estimated bias (average error) of the intrinsic
parameters for all three methods. Confidence intervals with
a 95 percent confidence level are also presented. As we can
notice, all three methods produced slightly biased estimates. However, by regarding the circular geometry of the
control points we can reduce the bias significantly. The
remaining error is caused by the subpixel ellipse detector
which is not completely unbiased. However, the effect of
the remaining error is so small that it does not have any
practical significance.
The estimated standard deviations of the intrinsic parameters are given in Table 2. The theoretical accuracy that can be attained is approximated using the following equation:

$$C_{\hat{\theta}} = E\left\{ (\hat{\theta} - \theta)(\hat{\theta} - \theta)^T \right\} \approx \left[ \hat{L}^T \hat{C}_e^{-1} \hat{L} \right]^{-1}, \qquad (35)$$

where $\hat{L}$ is the Jacobian matrix of $\mathbf{y}$ evaluated at $\hat{\theta}$. The same $C_e$ is used as in the calibration method C. Now, we
can see that the results provided by the methods A and B do
not differ significantly in standard deviation, but in method
C, where individual weighting values were used, the
precision is much better. For all methods, the theoretical
value is still quite far. The main reason for this is that the
covariance matrix Ce was not known precisely in advance,
and it had to be estimated first. As a result, the camera
parameters are not optimal, and on the other hand, Ce also
became underestimated in this case, causing the theoretical
values to be clearly smaller than their true values.
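For reference, (35) is a one-line computation once the Jacobian is available; an illustrative numpy sketch:

```python
import numpy as np

def theoretical_param_std(L, Ce):
    """Approximate standard deviations of the parameter estimates, eq. (35)."""
    C_theta = np.linalg.inv(L.T @ np.linalg.solve(Ce, L))
    return np.sqrt(np.diag(C_theta))
```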
Computationally, the new procedure does not increase
the number of the iterations required. For methods A and B,
11 iterations were performed on average. For method C,
around 20 iterations were needed in order to achieve
convergence. For methods B and C, where the circular
geometry is regarded, the average fitting residual for the
synthetic images was around 0.01 pixels, which is better
than the original requirement of 1/50 pixels. For method A,
the error is over 0.02 pixels, which exceeds the limit.
In reality, the error sources that are not considered in the
previous simulations can increase the residual significantly.
For example, using the real image shown in Fig. 2a the
average residual error is 0.048 pixels (horizontal) and
0.038 pixels (vertical), which is a sign of the small error
components originating from the camera electronics, illumination, optical system, and calibration target. Obviously,
the accuracy of the calibration result for real images is
strongly coupled with these error sources.
5.2 Calibrating the Reverse Distortion Model
In this experiment, the accuracy of the reverse distortion
model was tested with a wide range of the distortion
parameters. The third step of the calibration procedure was applied to solve the parameter vector $\hat{\delta}'$ for each $\hat{\delta}$. Next,
10,000 random points were generated inside the image
region. These simulated points were then corrected using
(5) and distorted again using (12). The difference between
the resulting coordinates and the original coordinates was
examined with respect to the different values of the
distortion parameters. The solid curve in Fig. 3a represents
the root mean square error in the image coordinates as a
function of the first radial distortion parameter k1. The
range of the parameter should cover most of the commonly
used off-the-shelf lenses. It can be seen that the error
remains below 0.005 pixels for almost all parameter values.
As a reference, results given by two other techniques have
also been presented. Method 2 is based on the idea of using
the distortion model in a reverse direction, as given by (7),
with the exception of estimating the parameter values
separately for both directions in order to improve the result.
Fig. 3. The error of three different inverse distortion models (a) as a function of k1, (b) as a function of k2, and (c) the maximum distortion in the image coordinates for k1 and k2.

Fig. 4. (a) Calibration object under two different light sources: a fluorescent lamp on the left and a halogen lamp on the right. (b) and (c) Difference (a_halogen - a_fluorescent) between detected point locations.

Method 3 is a recursive approach and it is based on (8) with three iterations. These two methods clearly result in a
significantly larger error than method 1.
In Fig. 3b, the same test was performed with the second
radial distortion coefficient k2. Now, method 3 seems to
outperform method 1, but in both cases the error remains
small. Method 2 is clearly the most inaccurate, causing
errors that can be significant in precision applications.
Fig. 3c shows the maximum radial distortion for both of the
two parameters. The first parameter is typically dominating
and the effect of the second parameter is much smaller. For
some lenses, using the third radial distortion coefficient can
be useful, but still its effect remains clearly below the other
two parameters. The same test procedure was also
performed with decentering distortion, but the error after
correcting and distorting the points was noticed to be
insignificantly small for all three methods.
6 OTHER CALIBRATION ERRORS
The synthetic images used in the previous section were only
subject to quantization noise and random Gaussian noise.
As noticed with the real image, there are also other sources
that may deteriorate the performance of the calibration
procedure. In this section, we will briefly review some of
the more significant error sources.
Insufficient projection model. Although the distortion
model of (6) is derived using exact ray tracing, there
can still exist distortion components, especially in wide-angle optics, that are not compensated for by this model. The camera model used assumes that the principal point coincides with the center of distortion. As stated in [16], this assumption is not valid for real lens systems.
Furthermore, in the pinhole camera model, the rays
coming from different directions and distances go
through a single point, i.e., the projection center. This is
a good approximation for most of the applications, but
unfortunately real optics do not behave exactly like that.
There is usually some nonlinear dependence between the
focal length, imaging distance and the size of the
aperture. As a consequence, for apertures larger than a
pinhole the focal length is slightly changing, especially in
short distances.
Illumination changes. Changes in the irradiance and the
wavelength perceived by the camera are typically
neglected in geometric camera calibration. However,
variation in lighting conditions can also have a substantial effect on the calibration results. In Fig. 4a, there
are two images captured from exactly the same position
and angle using the same camera settings, and only the
overall lighting has changed from fluorescent to halogen
light. Fig. 4b shows the difference between the detected
control point locations in the image plane. The difference
is smallest near the image center and increases radially
when approaching the image borders. From Fig. 4c, we
notice that the difference is almost linear as a function of
the horizontal and vertical coordinates. The reason for
this is chromatic aberration. According to Smith and
Thomson [17], chromatic aberration is a measure of the spread of an image point over a range of colors. It
may be represented either as a longitudinal movement of
an image plane, or as a change in magnification, but
basically it is a result of the dependence of the power of a
refracting surface on wavelength.
Camera electronics. Another type of systematic error is
distinguished from Fig. 4c, which is the horizontal shift
in the observed point locations. This shift is caused by a
small change in the overall intensity of lighting. The
problem originates from the phase locked loop (PLL)
operation, which is affected by the changes in the
intensity level of the incoming light. According to Beyer
[18], detection of sync pulses is level sensitive and
influenced by any changes on the signal level. This
deviation is called line jitter, and it can be systematic as
in Fig. 4c, or random, which is detected as increased
observation noise in the horizontal image coordinates of
the control points.
Calibration target. In the calibration procedure presented, it
is assumed that the coordinates of the 3D control points
are known with such precision that their errors are not
observable from the images. If this assumption is not
met, the estimates of the camera parameters become
inaccurate. The relative accuracy in the object space
should be better than the accuracy aspired to in the image
space. For example, if the goal is to achieve an accuracy
of 1/50 of the pixel or around 1/50,000 of the image size,
and if the dimensions of the calibration target are 100 mm
by 100 mm in lateral directions, the 3D coordinates of the
circular control points should be known to a resolution
better than 2 μm. Especially systematic errors that exceed
the limit can cause significant bias to the parameter
estimates.
The effects of the error sources listed above are mixed in
the observations and it is often very difficult to recognize
individual sources. However, the mixed effect of those
sources that are not compensated for by the camera model
can be observed from the fitting residual as increased
random noise and as a systematic fluctuation. In [7], the
systematic component of the remaining error is called
anomalous distortion. In precise camera calibration, this
component should be so small that its influence on the
parameter estimates becomes insignificant.
7 CONCLUSIONS
Geometric camera calibration is needed to describe the
mapping between 3D world coordinates and 2D image
coordinates. An optimal calibration technique should
produce unbiased and minimum variance estimates of the
camera parameters. In practice, this is quite difficult to
achieve due to different error sources affecting the imaging
process. The calibration procedure suggested in this paper
utilizes circular control points and performs mapping from
world coordinates into image coordinates and backward
from image coordinates to lines of sight or 3D plane
coordinates. The camera model used allows least-squares
optimization with the distorted image coordinates. The
amount of computation is slightly increased with respect to
the traditional camera model, but the number of the
iterations needed in optimization remains the same if
weighting is not used. In the weighted least squares, the
number of the iterations is doubled. Experiments showed
that an accuracy of 1/50 of the pixel size is achievable with
this technique if error sources, such as line jitter and
chromatic aberration, are eliminated.
The camera calibration toolbox for Matlab used in the
experiments is available on the Internet at http://
www.ee.oulu.fi/~jth/calibr/.
ACKNOWLEDGMENTS
The author would like to thank the anonymous reviewers
for valuable comments and suggestions. The financial
support of the Academy of Finland and the Graduate
School in Electronics, Telecommunications, and Automation (GETA) is gratefully acknowledged.
REFERENCES
[1]
D.C. Brown, Evolution, Application and Potential of the Bundle
Method of Photogrammetric Triangulation, Proc. ISP Symp.,
pp. 69, Sept. 1974.
[2] R.Y. Tsai, A Versatile Camera Calibration Technique for HighAccuracy 3D Machine Vision Metrology Using Off-the-Shelf TV
Cameras and Lenses, IEEE J. Robotics and Automation, vol. 3, no. 4,
pp. 323-344, Aug. 1987.
[3] Y.I. Abdel-Aziz and H.M. Karara, Direct Linear Transformation
into Object Space Coordinates in Close-Range Photogrammetry,
Proc. Symp. Close-Range Photogrammetry, pp. 1-18, Jan. 1971.
[4] S. Ganapathy, Decomposition of Transformation Matrices for
Robot Vision, Pattern Recognition Letters, vol. 2, pp. 401-412, 1984.
[5] O.D. Faugeras and G. Toscani, Camera Calibration for 3D
Computer Vision, Proc. Int'l Workshop Industrial Applications of
Machine Vision and Machine Intelligence, pp. 240-247, Feb. 1987.
[6] T. Melen, Geometrical Modelling and Calibration of Video
Cameras for Underwater Navigation, doctoral dissertation,
Norwegian Univ. of Science and Technology, Trondheim, Norway, 1994.
[7] Manual of Photogrammetry, fourth ed., C.C. Slama, ed., Falls
Church, Va.: Am. Soc. Photogrammetry, 1980.
H. Haggren, Photogrammetric Machine Vision, Optics and Lasers
in Eng., vol. 10, nos. 3-4, pp. 265-286, 1989.
[9] R.K. Lenz and R.Y. Tsai, Techniques for Calibration of the Scale
Factor and Image Center for High Accuracy 3D Machine Vision
Metrology, IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 10, no. 5, pp. 713-720, May 1988.
[10] J. Weng, P. Cohen, and M. Herniou, Camera Calibration with
Distortion Models and Accuracy Evaluation, IEEE Trans. Pattern
Analysis and Machine Intelligence, vol. 14, no. 10, pp. 965-980, Oct.
1992.
[11] K. Kanatani, Geometric Computation for Machine Vision. Oxford:
Clarendon Press, 1993.
[12] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery,
Numerical Recipes in C: The Art of Scientific Computing, second ed.,
Cambridge Univ. Press, 1992.
[13] G. Strang, Linear Algebra and Its Applications, third ed., San Diego,
Calif.: Harcourt Brace Jovanovich, 1988.
[14] J. Heikkila, Moment and Curvature Preserving Technique for
Accurate Ellipse Boundary Detection, Proc. 14th Int'l Conf. Pattern
Recognition, pp. 734-737, Aug. 1998.
[15] K. Kanatani, Statistical Bias of Conic Fitting and Renormalization, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16,
no. 3, pp. 320-326, Mar. 1994.
[16] R.G. Willson and S.A. Shafer, What is the Center of the Image?,
Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 670-671, 1993.
[17] F.G. Smith and J.H. Thomson, Optics, second ed., Wiley,
Chichester, 1988.
[18] H.A. Beyer, Linejitter and Geometric Calibration of CCD
Cameras, ISPRS J. Photogrammetry and Remote Sensing, vol. 45,
pp. 17-32, 1990.
Janne Heikkila received the MSc degree in
electrical engineering from the University of
Oulu, Finland, in 1993 and the PhD degree in
information engineering in 1998. From September 1998 to July 2000, he was a postdoctoral
fellow in the Machine Vision and Media Processing
Unit at the University of Oulu. Currently, he is an
acting professor of signal processing engineering at the University of Oulu. His research
interests include camera-based precise 3D
measurements, geometric camera calibration, human tracking from
image sequences, and motion estimation.