Proceedings of the 2011 IEEE International Conference on Mechatronics
April 13-15, 2011, Istanbul, Turkey

Altitude Control of a Quadrotor Helicopter Using
Depth Map from Microsoft Kinect Sensor

John Stowers#1, Michael Hayes#2 and Andrew Bainbridge-Smith#3
#Electrical & Computer Engineering, University of Canterbury, New Zealand.
1john.stowers@ieee.org
2michael.hayes@canterbury.ac.nz
3andrew.bainbridge-smith@canterbury.ac.nz
Abstract - Reliable depth estimation is a cornerstone of many autonomous robotic control systems. The Microsoft Kinect is a new, low cost, commodity game controller peripheral that calculates a depth map of the environment with good accuracy and high rate. In this paper we calibrate the Kinect depth and image sensors and then use the depth map to control the altitude of a quadrotor helicopter. This paper presents the first results of using this sensor in a real-time robotics control application.

Index Terms - quadrotor, visual flight control, Microsoft Kinect, depth map

I. INTRODUCTION

The Microsoft Kinect (Figure 1) is a low cost peripheral, released November 2010, for use as a game controller with the Xbox 360 game system. The device can be modified to obtain, simultaneously at 30 Hz, a 640 x 480 pixel monochrome intensity coded depth map and a 640 x 480 RGB video stream.

Fig. 1. The Microsoft Kinect sensor. From left to right, the sensors shown are: the IR projector, the RGB camera, and the monochrome camera used for the depth computation.

Computation of depth maps is common in visual robotic control systems. Depth maps are used in autonomous navigation [1], map building [2], [3] and obstacle avoidance [4].

Due to the importance of depth maps in robotics, this paper attempts to quantify the accuracy, performance and operation of the Kinect sensor, as no public information is available. Furthermore, we test the Kinect sensor and its suitability for use in dynamic robotic environments by using the computed depth map to control the altitude of a flying quadrotor helicopter.

This paper will proceed as follows. Section I introduces the Kinect sensor hardware and the use of depth maps in research. Section II describes the calibration procedure and calibration results. Section III introduces quadrotor helicopters and the experimental platform and control system against which the Kinect was tested. Section IV discusses experimental flight results using the sensor, and Section V concludes.

A. Computation of Depth Maps

Attempts to compute depth maps can be grouped into passive or active methods [5]. Passive depth sensing tries to infer depth from 2D images from multiple cameras, for example through stereo correspondence algorithms [6] or optical flow [7]. Active methods usually employ additional physical sensors such as lasers, lighting or infra-red illumination cast on the scene. Structured light based sensors [9] use triangulation to detect the ranges of points within their field of view [10]. This solves the correspondence problem of stereo vision via the constraints induced by the structure of the light source. Once it is determined that a camera pixel contains the primary laser return (i.e. not laser light returned from secondary reflections), the range of the reflecting surface viewed in the direction of the pixel is immediately determined, to within the resolution capabilities of the system. Thus the correspondence problem, which consumes a great deal of the CPU time in stereo vision algorithms, is replaced with the computationally much simpler problem of determining which pixels of the sensor detect the primary laser return. Time-of-flight (TOF) cameras also avoid the correspondence problem, instead utilising the time-of-flight principle. They illuminate the scene for a short period of time, for example by using a brief pulse of light, and measure the duration before the illumination pulse is reflected back and detected on the image sensor. TOF cameras typically have high power consumption due to the high illumination switch currents required, while only achieving moderate resolution [8]. The PrimeSense1 chipset in the Kinect uses a form of structured light, a proprietary Light Coding™ technique, to compute depth. The Kinect sensor consists of an infrared laser projector combined with a monochrome CMOS camera, and a second RGB video camera. Both cameras provide 640 x 480 pixel images at 30 Hz.

Little information is available on the Light Coding™ technology, or on the accuracy of the depth map from the Kinect. This paper quantifies the absolute accuracy of the Kinect depth map and verifies its performance in the dynamic environment of quadrotor helicopter flight.

1 http://www.primesense.com/?p=535
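To make the triangulation principle described above concrete, the following minimal Python sketch (our illustration only; the PrimeSense Light Coding algorithm itself is proprietary and undisclosed) shows how a structured-light sensor recovers range once a projected feature has been identified in the image. The focal length, baseline and disparity values are assumptions, not measured Kinect parameters.

```python
# Structured-light range by triangulation: once the correspondence of a
# projected feature is known, depth follows from simple geometry.
# Illustrative sketch only; all numeric values below are assumptions.

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Range of the reflecting surface seen by a pixel.

    focal_px     -- focal length of the camera, in pixels
    baseline_m   -- projector-to-camera baseline, in metres
    disparity_px -- observed shift of the projected feature, in pixels
    """
    if disparity_px <= 0:
        return float("inf")  # no detectable shift: feature at infinity
    return focal_px * baseline_m / disparity_px

# Example with an assumed Kinect-like geometry (f ~ 580 px, b ~ 7.5 cm):
print(depth_from_disparity(580.0, 0.075, 30.0))  # -> 1.45 m
```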
B. Kinect Hardware Details

The Kinect sensor connects to a PC/Xbox using a modified USB cable2. The physical USB interface remains unchanged, however subsequent to the Kinect release the protocol3 was decoded and software to access the Kinect was enabled.

The Kinect features two cameras, a Micron MT9M112 640 x 480 pixel RGB camera, and a 1.3 megapixel monochrome Micron MT9M001 camera fitted with an IR pass filter. Accompanying the monochrome IR camera is a laser diode for illuminating the scene. Through reverse engineering it was determined that the depth map has 11-bit resolution, and the video 8-bit. Despite the monochrome IR camera having higher resolution, both cameras only deliver 640 x 480 pixel images. Both image sensors have an angular field of view of 57° horizontally and 43° vertically. The Kinect also features a microphone array and a motorized pivot, although neither of these features was required for visual flight control nor subsequently tested as part of this evaluation.

2 A modified cable is necessary to provide additional current.
3 Gratefully started by the OpenKinect project; https://github.com/OpenKinect.
II. CALIBRATION OF THE KINECT SENSORS

A. Depth Camera Calibration

The depth camera returns an 11-bit number (raw values in the range 0...2047) which needs further processing in order to extract the true depth from the sensor. A calibration procedure was performed whereby a number of reference images were captured at known distances (Figure 2). This process was repeated multiple times over varied ambient light conditions in order to check the insensitivity of the depth measurement to environmental conditions. The results of this calibration procedure are shown in Figure 3.

Fig. 2. The calibration environment for testing. The board in the centre of the frame is placed 650 mm from the image plane. (a) Image captured from RGB camera. (b) False coloured depth map from depth camera.
Fig. 3. Kinect depth camera calibration results (measured depth against raw sensor value) and line of best fit.

A second order Gaussian model was found to be an appropriate fit (r^2 = 0.9989) for the data over the calibrated range. Let f(x) be the true depth from the image sensor, and x the raw range value; then

f(x) = a_1 \exp\left(-\left(\frac{x - b_1}{c_1}\right)^2\right) + a_2 \exp\left(-\left(\frac{x - b_2}{c_2}\right)^2\right),    (1)

where

a_1 = 3.169 \times 10^4, \quad b_1 = 1338.0, \quad c_1 = 140.4,
a_2 = 6.334 \times 10^{18}, \quad b_2 = 2.035 \times 10^4, \quad c_2 = 3154.0.

It can be seen from the calibration results (Figure 3) that the depth map is accurate and repeatable over the 0.4-7.0 m range. Additionally, if the Kinect is unable to estimate the depth of certain regions in the image, those pixels are filled with the value 2047, making it easy to exclude these pixels from further image analysis.
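As an illustration of how Equation (1) is applied in practice, the following Python sketch (ours, not the authors' published code) converts an array of raw 11-bit Kinect values to calibrated depth and masks the invalid-pixel value 2047:

```python
import numpy as np

# Coefficients of the second order Gaussian model, Equation (1).
A1, B1, C1 = 3.169e4, 1338.0, 140.4
A2, B2, C2 = 6.334e18, 2.035e4, 3154.0

INVALID = 2047  # raw value reported where the Kinect cannot estimate depth

def raw_to_depth(raw):
    """Apply f(x) from Equation (1) to an array of raw 11-bit values.

    Returns depth in the units of the calibration data, with invalid
    pixels set to NaN so they are ignored by downstream analysis.
    """
    x = np.asarray(raw, dtype=np.float64)
    depth = (A1 * np.exp(-((x - B1) / C1) ** 2)
             + A2 * np.exp(-((x - B2) / C2) ** 2))
    return np.where(x == INVALID, np.nan, depth)

print(raw_to_depth([700, 900, 2047]))  # last value is masked as NaN
```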
B. Camera Calibration and Alignment

Future research may involve combining the images from both cameras. In order to do so accurately, the intrinsic and extrinsic parameters of both cameras must be known so their images may be represented in a single co-ordinate system. Standard stereo computer vision techniques, illustrated in Figure 4, were used to perform this calibration [11].

Fig. 4. The standard chessboard approach for calculating the camera intrinsic parameters for the two cameras. (a) and (b) Manual matching of 4 points in both images (the corners of the chessboard) in order to calculate the R and T matrices.

The RGB camera intrinsic parameters were calculated using the 'chessboard' approach and the implementation from OpenCV (cvFindChessboardCorners was used). Calibration of the intrinsic parameters for the depth camera was performed by manually picking out the chessboard corners from the depth image. The calibration results for the two cameras are shown in Table I.
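For reference, the intrinsic part of this procedure can be reproduced with OpenCV's Python bindings; the paper used the C API (cvFindChessboardCorners), for which cv2.findChessboardCorners and cv2.calibrateCamera below are the modern equivalents. The board dimensions and image filenames are assumptions for the sketch.

```python
import cv2
import numpy as np

PATTERN = (8, 6)  # inner corners of the chessboard (assumed size)

# Planar reference points for one view of the board, in board units.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in ["rgb_00.png", "rgb_01.png", "rgb_02.png"]:  # assumed files
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K packs fx, fy, cx, cy; dist packs k1, k2, p1, p2, k3 as in Table I.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("fx fy cx cy:", K[0, 0], K[1, 1], K[0, 2], K[1, 2])
```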
         RGB Camera        Depth Camera
  cx     2.62 x 10^2       3.51 x 10^2
  cy     3.29 x 10^2       3.02 x 10^2
  fx     5.22 x 10^2       5.80 x 10^2
  fy     5.25 x 10^2       5.38 x 10^2
  k1     2.45 x 10^-1      -2.01 x 10^-1
  k2     -8.39 x 10^-1     9.82 x 10^-1
  p1     -2.05 x 10^-2     -7.72 x 10^-4
  p2     1.49 x 10^-2      4.89 x 10^-2
  k3     8.99 x 10^-1      -1.38 x 10^0

TABLE I
INTRINSIC CALIBRATION VALUES FOR THE KINECT RGB AND DEPTH CAMERAS.
The extrinsic parameters, the physical relationship between the cameras, were computed by manually matching the outline of the chessboard between the frames. The rotation (R) and translation (T) matrices were thus computed to be:

R = \begin{bmatrix}
 9.99 \times 10^{-1} &  1.3 \times 10^{-3}   & -1.83 \times 10^{-2} \\
-1.88 \times 10^{-3} &  9.999 \times 10^{-1} & -1.32 \times 10^{-2} \\
 1.74 \times 10^{-2} &  1.20 \times 10^{-2}  &  9.99 \times 10^{-1}
\end{bmatrix}

T = \begin{bmatrix} 2.09 \times 10^{-2} \\ -7.12 \times 10^{-4} \\ -1.34 \times 10^{-2} \end{bmatrix}
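With R and T in hand, a point measured in one camera's co-ordinate frame can be re-expressed in the other's, which is the alignment step needed to combine the two images. Below is a minimal sketch using the values above; the direction of the mapping (depth frame to RGB frame) is our assumption, since the paper does not state it.

```python
import numpy as np

# Extrinsic parameters between the two Kinect cameras (values above).
R = np.array([[ 9.99e-1,  1.30e-3, -1.83e-2],
              [-1.88e-3,  9.999e-1, -1.32e-2],
              [ 1.74e-2,  1.20e-2,  9.99e-1]])
T = np.array([2.09e-2, -7.12e-4, -1.34e-2])

def to_rgb_frame(p_depth):
    """Rigid transform of a 3D point from the depth frame to the RGB frame."""
    return R @ np.asarray(p_depth, dtype=np.float64) + T

# A point 1 m in front of the depth camera, expressed in the RGB frame:
print(to_rgb_frame([0.0, 0.0, 1.0]))
```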
III. QUADROTOR HELICOPTER EXPERIMENTAL PLATFORM

The first quadrotors for UAV research were developed by Pounds et al. [12] and Bouabdallah et al. [13], and quadrotors are now a popular rotorcraft concept for unmanned aerial vehicle (UAV) research platforms. The vehicle consists of four rotors in total, with two pairs of counter-rotating, fixed-pitch blades located at the four corners of the aircraft (Figure 5).

Fig. 5. Quadrotor helicopter in flight. The Kinect sensor is seen mounted below the centre of the craft, pointing towards the ground.

Due to its specific capabilities, the quadrotor also provides a good basis for visual flight control research. First, quadrotors do not require complex mechanical control linkages for rotor actuation, relying instead on fixed pitch rotors and using variation in motor speed for vehicle control. This simplifies both the design and maintenance of the vehicle. Second, the use of four rotors ensures that individual rotors are smaller in diameter than the equivalent main rotor on a helicopter, relative to the airframe size. The individual rotors, therefore, store less kinetic energy during flight, mitigating the risk posed by the rotors should they collide with people or objects. Combined, these properties greatly accelerate the design and test flight process by allowing testing to take place indoors, by inexperienced pilots, with a short turnaround time for recovery from incidents. Finally, the improvement of lithium polymer battery technology has enabled longer flight times with heavier payloads, increasing the computational power that can be carried onboard, and thus the complexity of the visual algorithms that can be experimented with in real time.

A. Experimental Hardware

The quadrotor is of custom design and construction [14]. It features a real time embedded attitude controller running on a 32-bit ARM7 microprocessor at 60 MHz. An inertial measurement unit (IMU) is also present, and contains a 3-axis accelerometer, three single-axis gyroscopes, and a 3-axis magnetometer.

The quadrotor hardware consists of a cross-frame made of square aluminum tubes joined by plates in the center. This design has proved to be robust, usually only requiring a propeller replacement after a crash.
On this frame are mounted four brushless motor / propeller combinations, four brushless motor controllers, the avionics and the battery. The two pairs of opposite rotating propellers are mounted at a distance of 400 mm.

The Kinect sensor is mounted under the craft, pointing towards the ground (Figure 5). Using the calibrated output from the depth camera, a control system was developed to maintain a constant altitude during flight of the quadrotor helicopter and thus evaluate the suitability of the Kinect sensor in dynamic environments. The visual altitude controller runs on a standard laptop PC. The Kinect is connected to the PC via USB, which means the quadrotor is limited to operation within cable range of the laptop.

The embedded attitude controller4 running on the quadrotor hardware will maintain attitude stability, while all control of the altitude will be handled by the visual controller using the Kinect depth map. The attitude state (roll, pitch and yaw) is continuously sent from the attitude controller over a wireless serial link, over which commands are also returned to the quadrotor.
Consider Figure 6. Let \theta be the pitch angle of the quadrotor, and \phi be the roll angle. Let z_k be the depth observed in the Kinect body frame, that is, the depth perpendicular to the image sensor. The true depth z_b, corrected for the craft attitude, is given by

z_b = z_k \cos\theta \cos\phi.    (2)

Fig. 6. Co-ordinate system for the quadrotor helicopter and Kinect camera (represented here by the arrow adjacent to z_k).
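Equation (2) translates directly into code; a small sketch follows (the angle conventions and units are assumed for illustration):

```python
from math import cos, radians

def corrected_depth(z_k, pitch_deg, roll_deg):
    """Equation (2): z_b = z_k * cos(theta) * cos(phi).

    z_k is the depth measured perpendicular to the Kinect image sensor;
    pitch and roll come from the onboard attitude controller.
    """
    return z_k * cos(radians(pitch_deg)) * cos(radians(roll_deg))

print(corrected_depth(1350.0, 10.0, 5.0))  # ~1324 mm when tilted
```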
A proportional-integral (PI) controller was implemented to control z_b, the quadrotor altitude. The commanded output, c, from the controller is given by

c = K_p \Delta + K_I \int \Delta \, dt,    (3)

where \Delta is the altitude error, and K_p = 5 and K_I = 1 are the proportional and integral control gains, determined experimentally. The control system is discrete, a 4th-order Runge-Kutta (RK4) integrator is used, and dt, the update period of the control system, is 5 ms (200 Hz).
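A minimal discrete form of the controller in Equation (3) is sketched below; note that the paper integrates the error with an RK4 scheme, whereas this illustration uses simple rectangular integration, and the interface to the thrust command is assumed.

```python
KP = 5.0    # proportional gain from Equation (3)
KI = 1.0    # integral gain from Equation (3)
DT = 0.005  # 5 ms update period (200 Hz)

class AltitudePI:
    """PI controller: c = Kp * error + Ki * integral(error dt)."""

    def __init__(self):
        self.integral = 0.0

    def update(self, setpoint_mm, z_b_mm):
        """Return the commanded output c for one 5 ms control step."""
        error = setpoint_mm - z_b_mm
        self.integral += error * DT  # rectangular rule; the paper uses RK4
        return KP * error + KI * self.integral

pi = AltitudePI()
c = pi.update(1300.0, 1324.0)  # hover set-point vs. corrected Kinect depth
```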
IV. CONTROL RESULTS

The visual flight controller was given complete authority to command the quadrotor thrust, and hence its altitude. Figure 7 shows the performance of the control system. A video5 is also available.

Fig. 7. Flight performance of the quadrotor attempting to maintain an absolute altitude of 1300 mm using the Kinect sensor provided depth map (set-point and measured values plotted against flight time).

The quadrotor was commanded to hover at 1300 mm above the laboratory floor. While the PI controller is not optimal for this task (note the constant offset from the set-point due to insufficient integral action), it was suitable and allowed the quadrotor to maintain altitude until reflection from the wooden floor gave incorrect depth measurements and caused the craft to become unstable.

V. CONCLUSION

The successful control of quadrotor altitude using the Kinect depth map demonstrates that the sensor is capable of operation in dynamic environments. Its low cost, high frame rate and absolute depth accuracy over a useful range make it suitable for use on robotic platforms.

Further work will involve integrating the Kinect into the navigation layer of the quadrotor system. It will likely be moved into a traditional forward pointing orientation and the depth and RGB images combined in the manner described in Section II-B. The forward orientation of the depth camera will require more robust methods to detect the ground plane; the Hough transform and RANSAC will be explored for this purpose.

4 http://www.waspuav.org/
5 http://www.waspuav.org/resources/icm2011/kinect-hover-video

REFERENCES

[1] D. Murray and J. J. Little, "Using real-time stereo vision for mobile robot navigation," Autonomous Robots, vol. 8, no. 2, p. 161, 2000.
[2] D. Wooden, "A guide to vision-based map building," IEEE Robotics & Automation Magazine, vol. 13, no. 2, pp. 94-98, June 2006.
[3] R. Sim and J. Little, "Autonomous vision-based robotic exploration and mapping using hybrid maps and particle filters," Image and Vision Computing, vol. 27, no. 1-2, p. 167, 2009.
[4] M. Kumano, A. Ohya, and S. Yuta, "Obstacle Avoidance of Autonomous Mobile Robot using Stereo Vision Sensor," in Proceedings of the 2nd International Symposium on Robotics and Automation, 2000, pp. 497-502.
[5] S.-Y. Kim, E.-K. Lee, and Y.-S. Ho, "Generation of ROI Enhanced Depth Maps Using Stereoscopic Cameras and a Depth Camera," IEEE Transactions on Broadcasting, vol. 54, no. 4, pp. 732-740, Dec. 2008.
[6] L. Nalpantidis, D. Chrysostomou, and A. Gasteratos, "Obtaining Reliable Depth Maps for Robotic Applications from a Quad-Camera System," p. 906, 2009.
[7] S. C. Diamantas, A. Oikonomidis, and R. M. Crowder, "Depth estimation for autonomous robot navigation: A comparative approach," in Imaging Systems and Techniques (IST), 2010 IEEE International Conference on, July 2010, pp. 426-430.
[8] A. Medina, F. Gaya, and F. d. Pozo, "Compact laser radar and three-dimensional camera," J. Opt. Soc. Am. A, vol. 23, no. 4, pp. 800-805, Apr. 2006.
[9] S. Yi, J. Suh, Y. Hong, and D. Hwang, "Active ranging system based on structured laser light image," in SICE Annual Conference 2010, Proceedings of, Aug. 2010, pp. 747-752.
[10] E. G. H. nstrup, David, "Single Frame Processing for Structured Light Based Obstacle Detection," in Proceedings of the 2008 National Technical Meeting of The Institute of Navigation, San Diego, CA, 2008, pp. 514-520.
[11] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, ISBN: 0521540518, 2004.
[12] P. Pounds, R. Mahony, P. Hynes, and J. Roberts, "Design of a Four-Rotor Aerial Robot," Auckland, New Zealand, November 2002.
[13] S. Bouabdallah, P. Murrieri, and R. Siegwart, "Design and control of an indoor micro quadrotor," vol. 5, Apr. 2004, pp. 4393-4398.
[14] J. Stowers, M. Hayes, and A. Bainbridge-Smith, "Quadrotor Helicopters for Visual Flight Control," in Proceedings of Electronics New Zealand Conference 2010, Hamilton, New Zealand, 2010, pp. 21-26.