VIDEO STABILIZATION
A BACHELOR’S THESIS
submitted in partial fulfillment
of the requirements for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
ELECTRONICS & COMMUNICATION
(B.Tech. in ECE)
Submitted by:
ARSHDEEP SINGH
(Enrollment No. - IEC2006034)
May, 2010
CANDIDATE’S DECLARATION
I hereby declare that the work presented in this thesis entitled “Video Stabilization”, submitted
in the partial fulfillment of the degree of Bachelor of Technology (B.Tech), in Electronics &
Communication at Indian Institute of Information Technology, Allahabad, is an authentic record
of my original work carried out under the guidance of Dr. Pavan Chakraborty. Due
acknowledgements have been made in the text of the thesis to all other material used. This thesis
work was done in full compliance with the requirements and constraints of the prescribed
curriculum.
CERTIFICATE
This is to certify that the above statement made by the candidate is correct to the best of my
knowledge.
_______________________ _______________________
_______________________ _______________________
ACKNOWLEDGEMENTS
I would like to thank my guide for sparing his valuable time and effectively guiding me in the
project. He provided me with the general overview of the work I was required to do and was
always available for help when I encountered roadblocks in my project. Without my mentor, this
project would never have been a success. In the end I would like to thank Larry Page and Sergey
Brin for their prestigious product Google, without whose extensive search I could not have even
moved an inch. Their constant support and guidance were invaluable for the timely completion of
the project.
Thanks a ton!
Arshdeep Singh
Place: Allahabad
Date: May 19, 2010
ABSTRACT
The aim of the project was to develop an efficient and robust algorithm to
compensate for the camera motion that introduces shakiness and jitter in videos.
Shakiness may be introduced by shooting video with a handheld camera or from a
moving vehicle. I implemented the video stabilization process using a block
matching algorithm. The algorithm removes shakiness from raw YUV video and
produces stabilized YUV video. It is based on extracting the motion between
consecutive frames of the raw video. Intentional camera motion, such as panning or
translation with respect to the scene, is usually smooth, with slow variations
from frame to frame (in the time domain). On the other hand, unwanted, parasitic
camera motion involves rapid motion variations from frame to frame. That is to say,
high-frequency components in the motion vector variations over time are considered
to be effects of unwanted camera motion. The intentional motion parameters can
therefore be recovered with a low-pass filter; I have used a moving average filter
for this purpose. After extracting the noisy component of the motion vector, frames
can be stabilized by corrective shifting according to the noisy motion component.
The algorithm developed here has been implemented in C++ and successfully run on
shaky videos, giving accurate results without incurring significant quality degradation.
1 INTRODUCTION
Taking videos with a hand-held camera or from a moving vehicle introduces shaking, which
inevitably reduces video quality. Video stabilization is a process that compensates for camera
motion using image processing techniques. Starting from the root, I had to understand what video
stabilization is and what the real-world applications of stabilizing videos are. After
understanding the basics, I started with YUV videos, which are videos in raw format. This amounts
to preprocessing the data captured by the camera for the stabilization process before saving it in
standard formats such as .mpeg or .avi.
In short, the aim of the project was to learn video stabilization and execute it in sequential
steps. Currently there exist two ways to estimate and compensate for the relative motion between
the frames of a shaky video:
1. Mechanical Compensation
2. Digital Stabilization by means of Image Processing
Mechanical Compensation:
The sensor which captures the image can be moved in a way as to counteract the motion of the
camera, a technology referred to as mechanical image stabilization. When the camera rotates,
causing angular error, gyroscopes encode information to the actuator that moves the sensor. The
sensor is adjusted to maintain the projection of the image onto the image plane, which is a
function of the focal length of the lens being used; modern cameras acquire information about
focal length from the lens mounted.
Most cameras are mounted inside gimbals for mechanical stabilization, yet there is still a need
for electronic stabilization. There are several reasons why mechanical stabilization does not
eliminate the need for electronic stabilization.
The first reason is that image vibration cannot be removed completely by mechanical
stabilization. The performance of gimbals in compensating for vibration is typically specified by
the minimum angular displacement that can be mechanically stabilized.
A second reason that favors electronic image stabilization is its low cost. High-performance
gimbals can provide stabilization as fine as 12 micro-radians; however, this kind of optical
system is very expensive.
A third reason is that advanced mechanical stabilization systems add a certain amount of weight
and volume.
Flexibility in control is a further advantage of electronic stabilization. As opposed to mechanical
stabilization, an operator at the observation console, or smart software, can dynamically adjust
the stabilization process and its parameters, or even enable/disable it.
The research paper [1] presents a video stabilization process based on an affine transformation.
The motion estimation between two sequential frames, f(x, y, t) and f(x, y, t − 1), is done using
a 6-parameter affine transform.
With relatively few parameters, the affine model captures a sufficiently rich range of motions
such as translation, scale, and rotation. The basic framework described here can easily be
extended to accommodate higher-order polynomial models that capture a richer set of motions
(or even a lower-order model that describes the inter-frame motion with only a translation). In
order to estimate the affine parameters, a quadratic error function is minimized.
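For concreteness, the six-parameter model can be written as follows; this is a standard formulation, and the exact notation in [1] may differ slightly:

f(x, y, t) = f(m1 x + m2 y + m5, m3 x + m4 y + m6, t − 1)

where m1, m2, m3, m4 capture rotation, scale and shear, and m5, m6 capture translation. The quadratic error function to be minimized is then

E(m) = Σ_{x,y} [ f(x, y, t) − f(m1 x + m2 y + m5, m3 x + m4 y + m6, t − 1) ]²

summed over all pixels (x, y) in the region of interest; linearizing f around the current estimate yields a closed-form least-squares solution for m.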
Pros
The model can capture translation, rotation and scaling motion between two frames.
Cons
A relatively large number of parameters has to be estimated, so it is computationally expensive.
The paper [2] presents a technique that infers inter-frame motion by tracking SIFT features
through consecutive frames. The algorithm does not depend on the point detector adopted prior to
SIFT descriptor creation; performance has therefore been evaluated against a wide set of point
detection algorithms, in order to investigate how stabilization quality can be increased with an
appropriate detector. Feature points are detected and their stability is evaluated through a
combination of geometric error measures and fuzzy logic modeling.
Two different error measures have been adopted to evaluate the quality of a matching:
(a) Angle between the two local motion vectors: this measure performs well with rotational
components.
(b) Euclidean distance between the expected and the actual point: this measure performs well
since it rejects matchings that do not agree with the found translational components, but it
may be inaccurate for border points when a rotation occurs.
Compensation is done using a fuzzy model. The fuzzy logic model takes as input the two error
measures mentioned above and gives as output a single quality index, a real value in the range
[0, 1] that represents the extent of matching between a pair of points.
Taking videos with a hand-held camera introduces shaking, which inevitably reduces video
quality. Video stabilization is a process that compensates for camera motion. In this project I
implemented a video stabilization program in C++. The user gives as input a shaky raw video in
YUV format, and the software outputs a processed, stable video sequence in YUV format.
I formulated the problem in a sequential manner and planned my work according to the following
steps.
Chapter 2 describes the hardware used for grabbing videos and the software (converter, video
player, Dev-C++) used for the development of the project.
Chapter 3 describes the theory and algorithm on which the whole project is based. It discusses
the YUV frame format and the steps of the block matching algorithm: motion estimation,
motion smoothing and motion correction.
Chapter 4 describes the complete software development process: specification, design and the
flowchart of the algorithm. It also explains the data structures used to handle the input
frames.
Chapter 5 describes the testing and analysis of the software. Frames of the input video and the
corresponding stabilized frames are shown, and the results obtained are analyzed.
Chapter 6 concludes the work, and Chapter 7 describes possible modifications and the future
prospects of the project.
2.1 Hardware
I have used a Canon digital camera for shooting videos so that the simulation could be run on
real footage.
2.2 Software
1. Dev-C++
The C++ implementation of the software was done using Dev-C++. This software was used for
coding and debugging.
2. YUVTools_3.0 (converter)
This software was used to convert .avi, .mp4 and other file formats to YUV format. The resolution
of the YUV video can also be varied with this software.
Y'UV was first invented when engineers wanted color television within a black-and-white
infrastructure. They needed a signal transmission method that was compatible with black-and-white
(B&W) TV while being able to add color. The luma component already existed as the B&W signal;
they added the UV signal to this as a solution.
A single YUV420 frame looks as shown in the figure below.
Fig-3.1
Each pixel has its own Y value, while U and V values are shared among 2x2 blocks of pixels, as
shown in the figure. In the byte stream, all the Y values are stored first in contiguous
locations, followed by the U values and finally the V values. In my application, since I needed to
estimate the motion between consecutive frames, I used only the Y values; when writing the
compensated frame, the U and V values were also shifted according to the estimated motion.
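As an illustration, the following sketch shows how individual pixel values can be located inside one such planar YUV420 frame; the names (Yuv420View, data, width, height) are mine, not the project's:

// Byte layout of one planar YUV420 frame of size W x H:
//   [0, W*H)               : Y plane, one byte per pixel
//   [W*H, W*H + W*H/4)     : U plane, one byte per 2x2 pixel block
//   [W*H + W*H/4, W*H*3/2) : V plane, one byte per 2x2 pixel block
struct Yuv420View {
    const unsigned char* data;
    int width, height;

    unsigned char Y(int x, int y) const { return data[y * width + x]; }
    unsigned char U(int x, int y) const {
        // chroma is subsampled by 2 in both directions
        return data[width * height + (y / 2) * (width / 2) + (x / 2)];
    }
    unsigned char V(int x, int y) const {
        return data[width * height + (width * height) / 4
                    + (y / 2) * (width / 2) + (x / 2)];
    }
};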
There are several different approaches available for digital video stabilization: differential
approaches, matching-based approaches, energy-based approaches and phase-based approaches. This
project uses block matching. In the block matching algorithm, we estimate the motion between
consecutive frames by matching the blocks of the current frame against those of the previous one.
Fig-3.2 - figure shows the block diagram of our motion stabilization architecture.
After reading in the video frame sequence, the motion estimator compares every two consecutive
frames, estimates the possible motions and generates motion vectors. These motion vectors are
then sent to the motion smoother, which helps to remove the unwanted motions. Finally, the motion
corrector adjusts the current frames based on the smoothed motion vector sequence and outputs
stabilized frames. The basic assumption in the algorithm is that the motion is translational
only, i.e. there is no rotation, zoom-in or zoom-out.
This is the first step in a three-step process to stabilize any video. For motion estimation I
divided each frame into blocks of 16x16 pixels. Each block of the current frame is matched
against the corresponding block in the previous frame within a proximity of 8 pixels in each
direction. To decide the motion vector of each block, we calculate the Sum of Squared Differences
(SSD); the displacement that gives the minimum SSD for a block is taken as the local motion of
that block.
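A minimal sketch of this block search, assuming row-major Y planes and ignoring clipping at the frame borders for brevity (function and variable names are illustrative, not the project's):

// SSD between a 16x16 block of the current frame at (bx, by) and the
// block of the previous frame displaced by (dx, dy).
long long BlockSSD(const unsigned char* curr, const unsigned char* prev,
                   int width, int bx, int by, int dx, int dy) {
    long long ssd = 0;
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x) {
            int d = curr[(by + y) * width + (bx + x)]
                  - prev[(by + y + dy) * width + (bx + x + dx)];
            ssd += (long long)d * d;
        }
    return ssd;
}

// Exhaustive search over a +/-8 pixel window; the displacement with the
// minimum SSD is taken as the local motion vector of the block.
void BestMatch(const unsigned char* curr, const unsigned char* prev,
               int width, int bx, int by, int& bestDx, int& bestDy) {
    long long best = -1;
    for (int dy = -8; dy <= 8; ++dy)
        for (int dx = -8; dx <= 8; ++dx) {
            long long ssd = BlockSSD(curr, prev, width, bx, by, dx, dy);
            if (best < 0 || ssd < best) { best = ssd; bestDx = dx; bestDy = dy; }
        }
}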
The local displacements of all blocks are collected in the Local Motion Vector, and the mode of
that vector is taken to be the Global Motion between the two consecutive frames.
Fig-3.3
The figure above shows the quiver plot of the Local Motion Vectors for a frame. It is apparent
from the graph that most of the Local Motion Vectors point in the same direction; these represent
the Global Motion of the video sequence. The vectors that deviate from this dominant direction
appear as scattered values in the graph; they represent the Local Motions of the objects in the
frame. For example, in a video the motion of the background can be considered Global Motion and
the motion of objects can be considered Local Motion.
The accompanying histogram shows the distribution of the Local Motion Vectors in the two
directions, horizontal and vertical. As explained above, the horizontal component of the Global
Motion is the mode of all the horizontal components of the Local Motion Vectors; the vertical
component is found similarly. These values are stored in the Global Motion Vector. Up to this
point we have encountered two vectors: the Local Motion Vector, which holds one displacement per
block, and the Global Motion Vector, which holds one displacement per frame pair.
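The mode itself can be computed with a simple frequency count; a sketch for one component of the local motion vectors (names are illustrative):

#include <map>

// Returns the most frequent value among the n local motion components in v.
int ModeOf(const int* v, int n) {
    if (n == 0) return 0;
    std::map<int, int> count;                  // displacement -> frequency
    for (int i = 0; i < n; ++i) ++count[v[i]];
    int mode = v[0], best = 0;
    for (std::map<int, int>::const_iterator it = count.begin();
         it != count.end(); ++it)
        if (it->second > best) { best = it->second; mode = it->first; }
    return mode;
}

Computing the mode per component (horizontal and vertical separately) matches the histogram view described above.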
This New Global Motion Vector is then smoothed to remove the shakes and jitters, which yields the
Smooth Motion Vector. In our algorithm we used a moving average (MA) filter with window size
N = 5 as the smoothing filter.
By accumulating the Global Motion Vector we can find how much displacement there is between any
two frames. We cannot use this accumulated vector directly for motion compensation, because all
the desired motion is accumulated in it as well. We do not want to remove the desired motion; we
only want to compensate the unwanted motion. So we smooth the New Global Motion Vector to obtain
the Smooth Motion Vector, which contains only the desired motion, since smoothing removes the
unwanted random motion.
The figure above shows the original New Global Motion Vector and the Smooth Motion Vector. It is
apparent from the figures that the unwanted random shakes are removed by this smoothing process,
leaving only the desired motion in the Smooth Motion Vector.
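A minimal sketch of such a moving average filter, assuming (as the implementation in Chapter 4 suggests) that boundary frames without a full window keep their original values:

const int N = 5;   // smoothing window size

// Low-pass filter one component of the accumulated motion over all frames.
void MovingAverage(const int* newVec, int* smoothVec, int frames) {
    for (int i = 0; i < frames; ++i) {
        if (i < N / 2 || i >= frames - N / 2) {
            smoothVec[i] = newVec[i];          // not enough neighbours here
        } else {
            int sum = 0;
            for (int k = -N / 2; k <= N / 2; ++k)
                sum += newVec[i + k];
            smoothVec[i] = sum / N;            // average of N samples
        }
    }
}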
The motion corrector performs motion compensation and constructs the smoothed frames. Having
found the desired motion in the form of the Smooth Motion Vector, we now have to find the
unwanted motion. To do this, I subtracted the Smooth Motion Vector from the New Global Motion
Vector. This removes all the desired motion from the New Global Motion Vector, leaving behind the
unwanted motion, which is stored in the Difference Vector.
After finding the unwanted motion, we compensate for it while writing out the new stabilized
video sequence.
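The arithmetic is simple enough to state directly; a sketch, per frame and per component (using the struct vector type defined in Chapter 4):

// Unwanted motion = accumulated motion minus its smoothed version.
void ComputeDiff(const struct vector* newVec, const struct vector* smoothVec,
                 struct vector* diff, int frames) {
    for (int i = 0; i < frames; ++i) {
        diff[i].h = newVec[i].h - smoothVec[i].h;
        diff[i].w = newVec[i].w - smoothVec[i].w;
    }
}

Each output frame is then shifted by the corresponding Difference Vector entry when it is written out, as described below.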
Fig-3.4
In the figure above, I1, I2, I3 are the input frames; GMV1, GMV2 are components of the Global
Motion Vector; SMV1, SMV2 are components of the Smooth Motion Vector; DV1, DV2 are components of
the Difference Vector; and O1, O2 are the final smoothed, stabilized frames. The motion corrector
generates smoothed frame 2 by applying DV1 to frame 1, smoothed frame 3 by applying DV2 to input
frame 2 (I2), and so on. The thick lines in the figure show the motion correction.
4) Development of Software:
4.2 Design:
The software is modularized into the following modules:
1. GrabeFrame-
This module grabs the frames from the input video sequence and calculates the total number of
frames. The Y, U and V pixel values are stored separately in data structures.
2. GetMotion-
This module compares two consecutive frames and calculates the local motion vector as well as
the global motion vector.
3. CalMotion-
This module actually implements the block matching algorithm and compares the current frame
with the previous one. The entire frame is divided into 16 x 16 blocks, and each block is
compared with its corresponding block in the previous frame within a proximity of 16 pixels.
4. Mode-
This module implements the mode function. The mode of the local motion vector is calculated and
stored as a GlobalVector element.
5. MovingAvg-
This module works as a low-pass filter. Its input is the accumulated version of the
GlobalVector, called NewGlobalVector. This vector contains the shaky motion due to camera
movement as well as the desired motion. The shaky motion is the high-frequency component of
NewGlobalVector and is removed by the low-pass filtering, so the output of this module is the
SmoothVector. Subtracting the SmoothVector from the NewGlobalVector gives a vector whose elements
describe the noisy component only; we call this vector the DiffVector.
6. Write-
The Write module is responsible for writing the stabilized frames back to the output file. The
very first frame of the input is written unchanged. The i-th element of DiffVector corresponds to
the shift of the (i+1)-th frame with respect to the i-th frame due to camera vibration only.
Hence, when writing back the (i+1)-th frame, it is shifted according to DiffVector[i]. The
exposed edges are filled with black, as sketched below.
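A minimal sketch of such a corrective shift for the Y plane (names are mine; black is taken as 0 here, though 16 is the nominal black level in studio-range YUV):

// Copy src into dst shifted by (dh, dw); uncovered borders become black.
void ShiftPlane(const unsigned char* src, unsigned char* dst,
                int width, int height, int dh, int dw) {
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int sy = y + dh, sx = x + dw;
            if (sy >= 0 && sy < height && sx >= 0 && sx < width)
                dst[y * width + x] = src[sy * width + sx];
            else
                dst[y * width + x] = 0;        // fill exposed edge with black
        }
}

The U and V planes are shifted in the same way, but by half the displacement, because of the 2x2 chroma subsampling.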
4.3 Flowchart:
[Flowchart: start; grab the first frame from the input and write it to the output; for each
remaining frame, calculate the local motion of each block of the current frame w.r.t. the
previous frame; store the mode in GlobalVector; accumulate GlobalVector into NewGlobalVector.]
4.4 Implementation
The algorithm was implemented in C++. The first task was to read the pixel values from the input
YUV video and store them in an appropriate data structure. To store a complete frame, the
following class was defined:
class Myframe{
public:
    unsigned char *Ybuff, *Ubuff, *Vbuff;   // one buffer per YUV plane
    Myframe(bool Cap_UV);
};

// Allocates the Y plane always; the U and V planes only when Cap_UV is set.
Myframe::Myframe(bool Cap_UV){
    Ybuff = (unsigned char*)malloc(sizeof(char)*Ybuff_size);
    if (Cap_UV){
        Ubuff = (unsigned char*)malloc(sizeof(char)*Ubuff_size);
        Vbuff = (unsigned char*)malloc(sizeof(char)*Ubuff_size);   // U and V planes have equal size
    }
}
Ybuff is a pointer to a contiguous memory block which is used to keep Y pixel values.
Ubuff is a pointer to a contiguous memory block which is used to keep U pixel values.
Vbuff is a pointer to a contiguous memory block which is used to keep V pixel values.
When calculating the local and global motion vectors we use only the Y values; the U and V values
are required only when writing back the stabilized frames. When the bool variable Cap_UV is 0
(false), declaring an object of class Myframe allocates space only for the Y values; when it is 1
(true), memory is allocated for all three planes: Y, U and V.
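For example (the exact object names here are illustrative, not the project's), the frames used during motion estimation can be allocated Y-only, and a full frame only where output is written:

Myframe Frame1(false);    // Y plane only: enough for motion estimation
Myframe OutFrame(true);   // Y, U and V planes: needed when writing output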
struct vector
{
    int h;   // shift along the height (rows)
    int w;   // shift along the width (columns)
};
Any variable of type struct vector can hold the shift of one frame with respect to another.
The approach gave results of good quality. First, the code was tested on simulated videos; the
results were of very fine quality, as even small jitters were removed with high accuracy. After
that, some videos were shot with a Canon digital camera inside the office environment, with very
heavy shakes and vibrations deliberately introduced while shooting. When the code was first
tested on these videos the results were not satisfactory, but after debugging and a few
modifications the results were satisfactory for real videos as well.
Demo 1-
This demo shows the video stabilization results for a YUV video taken in an office environment.
The video was shot with a Canon digital camera and obtained in MP4 format; it was converted to
YUV format at a resolution of 176 x 144 using the converter YUVTools. The command line console
after running the program on this video is shown below.
As is clear from the console, the inputs to the program are the input video file name, the output
video name, and the width and height of the video.
After running, the program shows the total number of frames in the video file and the number of
frames for which motion calculation was skipped. Motion estimation is not done for all frames: if
a frame has a high correlation with its previous frame, the motion calculation can be skipped for
it, since shaking has no significant effect on that frame with respect to the previous one. A
sketch of such a test is given below.
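One simple way to implement such a test is to compare the mean absolute difference of the Y planes against a threshold; the measure and threshold actually used in the project are not reproduced here, so treat this as an assumption:

// Skip motion estimation when two consecutive Y planes are nearly identical.
bool CanSkip(const unsigned char* curr, const unsigned char* prev, int n) {
    const double THRESH = 2.0;     // hypothetical tuning value, in gray levels
    long long sum = 0;
    for (int i = 0; i < n; ++i) {
        int d = curr[i] - prev[i];
        sum += d < 0 ? -d : d;     // accumulate absolute difference
    }
    return (double)sum / n < THRESH;
}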
Consecutive frames of the input video and the corresponding stabilized frames are shown in
Fig-5.2 and Fig-5.3 respectively. The frames were captured using a YUV player.
Here are the corresponding stabilized frames (same frame numbers) from the output video.
6) Conclusion
I have verified a global motion estimation algorithm, based on a three-step estimation model,
that can remove the hand-shaking effect with good performance for translational video motion, and
I have implemented the algorithm in C++.
The video stabilizer developed here can stabilize videos accurately and removes the shakiness
without incurring any significant quality degradation. The results were satisfactory for
simulated videos as well as for real videos taken with a handheld camera.
With a video stabilization algorithm developed for the Pocket PC, we could integrate it with a
wireless network to enable video conferencing with people wherever they are. However, limited by
the transmission bandwidth and the processing speed of the Pocket PC, achieving the goal of video
conferencing requires video sequences with a much lower bit rate. One option is to compress the
image more aggressively and adopt MPEG-7 technology to extract the part of the image we want,
i.e. the human face, out of the background. Since the background is less important than the face,
we do not have to send its information in every frame.
Video stabilization can also be applied to consumer electronics such as camcorders. The
hand-shaking effect is hard to eliminate, but this algorithm can be used to remove it, improving
video quality and letting people enjoy the multimedia world more.
The video stabilization process implemented here uses a block matching algorithm, which requires
extensive computation for the SSD evaluations. The algorithm could be modified by introducing
parallel processing (e.g. CUDA) for the computation of the motion vectors, which would help
reduce the running time of the program.
Appendix 1
/*************************************************************************************
8Pixel_MA
3. The threshold in the CalMotion function: its value is an optimization trade-off
between efficiency and accuracy.
*************************************************************************************/
#include<iostream>
#include<fstream>
#include<cstring>
#include<cstdio>
#include<cctype>
#include<cstdlib> // malloc, exit
using namespace std;
bool Cap_UV; /* Decides whether to initialize the UV buffers or not; while calculating global motion only the Y buffer is needed. */
int SkipCount =0; //To know how many frames were skipped during Motion Estimation
struct vector{
int h;
int w;
};
class Myframe{
public:
unsigned char *Ybuff,*Ubuff,*Vbuff;
Myframe(bool Cap_UV);
};
Myframe::Myframe(bool Cap_UV){
Ybuff=(unsigned char*)malloc(sizeof(char)*Ybuff_size);
if (Cap_UV){
Ubuff=(unsigned char*)malloc(sizeof(char)*Ubuff_size);
Vbuff=(unsigned char*)malloc(sizeof(char)*Ubuff_size); // U and V planes have equal size
}
}
struct vector* CalMotion(unsigned char *Curr, int h, int w, unsigned char *Prev, struct vector* LVector);
/*************************************************************************************
MAIN FUNCTION
*************************************************************************************/
int main(){
char str[20],str1[20];
cin>>str;  // input video file name
cin>>str1; // output video file name
cin>>Width;
cin>>Height;
Step = Width;
Framesize = Height*Width*(1.5); // YUV420: 1.5 bytes per pixel
Ybuff_size = Height*Width;
Ubuff_size = Ybuff_size/4;
FILE *in = fopen(str,"rb"); // open input; exit if it cannot be opened
if (in==NULL)
exit(0);
FILE *out = fopen(str1,"wb"); // open output; exit if it cannot be created
if (out==NULL)
exit(0);
rewind(out);
int ch; // int, so that EOF can be detected reliably
do{
fseek(in,offset,SEEK_SET); // offset advances by one Framesize per iteration
ch=fgetc(in);
FrameCount++;
offset = FrameCount*Framesize;
}while( ch!=EOF );
TotalFrames = FrameCount-1;
cout << "Total no. of frames in the file are "<<TotalFrames <<"\n";
FrameCount = 1;
GrabeFrame ( in,Frame1);
GrabeFrame ( in,Frame2);
do{
/* SO FRAME1 WILL BE LATEST FRAME AND FRAME2 WILL BE THE OLDER FRAME
WE WILL DETECT THE MOTION OF FRAME1 W.R.T FRAME2 AND VICE VERSA*/
if (FrameCount%2==0)
GetMotion(Frame1.Ybuff ,Frame2.Ybuff);
else
GetMotion(Frame2.Ybuff ,Frame1.Ybuff);
if (FrameCount%2==0)
GrabeFrame(in,Frame2);
else
GrabeFrame(in,Frame1);
if (FrameCount%2==0)
GetMotion(Frame1.Ybuff ,Frame2.Ybuff);
else
GetMotion(Frame2.Ybuff ,Frame1.Ybuff);
cout<<"\n Total no. of Frames for which motion calculation is skipped are "<<SkipCount;
GlobalVector--;
/* DiffVector contains the difference of NewVector and SmoothVector which is then passed to the Write
Function*/
for(int i=0; i<TotalFrames-1; i++){
if(i!=0){
// NewVector is the accumulated (running-sum) version of GlobalVector
NewVector[i].h = NewVector[i-1].h + GlobalVector[i].h;
NewVector[i].w = NewVector[i-1].w + GlobalVector[i].w;
}
else{
NewVector[i].h = GlobalVector[i].h ;
NewVector[i].w = GlobalVector[i].w;
}
}
MovingAvg(NewVector, SmoothVector);
for(int i=0; i<TotalFrames-1; i++){
if(i<N/2){
// not enough history for a full averaging window: keep the raw value
DiffVector[i].h = NewVector[i].h ;
DiffVector[i].w = NewVector[i].w ;
}
P a g e | 37
else{
// keep only the noisy component: accumulated minus smoothed motion
DiffVector[i].h = NewVector[i].h - SmoothVector[i].h;
DiffVector[i].w = NewVector[i].w - SmoothVector[i].w;
}
}
Cap_UV = true;
//Call to Write
Write(in, out,DiffVector);
fclose(in);
fclose(out);
return(0);
}
8) Bibliography
[1] Hung-Chang Chang, Shang-Hong Lai, Kuang-Rong Lu, "A robust and efficient video stabilization
algorithm," Proc. IEEE International Conference on Multimedia and Expo (ICME '04), 27-30 June
2004.
[2] S. Battiato, G. Gallo, G. Puglisi, "Fuzzy-based motion estimation for video stabilization
using SIFT interest points," Proc. Digital Photography V, San Jose, CA, USA, 19 January 2009.
[4] M. Ondrej, Z. Frantisek, D. Martin, "Software video stabilization in a fixed point
arithmetic," Proc. First International Conference on Applications of Digital Information and Web
Technologies (ICADIWT 2008), 4-6 Aug. 2008, pp. 389-393.
[5] Chao-Ho Chen, Yi-Li Kuo, Tsong-Yi Chen, Jie-Ru Chen, "Real-time video stabilization based on
motion compensation," Proc. Fourth International Conference on Innovative Computing, Information
and Control (ICICIC 2009), 7-9 Dec. 2009, pp. 1495-1498.