0% found this document useful (0 votes)
89 views2 pages

Fast Convolution

A simple and fast algorithm to compute the convolution of an arbitrary sequence x with a sequence of a specific type, a. The number of steps for computing the convolution depends on a certain complexity of a and not on its length. This makes it feasible to convolve a sequence with very large kernels fast.

Uploaded by

gulanitoyo
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
89 views2 pages

Fast Convolution

A simple and fast algorithm to compute the convolution of an arbitrary sequence x with a sequence of a specific type, a. The number of steps for computing the convolution depends on a certain complexity of a and not on its length. This makes it feasible to convolve a sequence with very large kernels fast.

Uploaded by

gulanitoyo
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

Fast Convolution

Michael Werman
School of Computer Science and Engineering
The Hebrew University of Jerusalem
Jerusalem 91904, Israel
werman@cs.huji.ac.il
Abstract
We present a very simple and fast algorithm to compute the convolution of an arbitrary sequence
x with a sequence of a specific type, a. The sequence a is any linear combination of polynomials,
exponentials and trigonometric terms. The number of steps for computing the convolution depends
on a certain complexity of a and not on its length, thus making it feasible to convolve a sequence
with very large kernels fast.

Computing the convolution (correlation, filter- depend on r), or equivalently, ar = di=1 αi ar+i .
ing) of a sequence x together with a fixed sequence For d smaller than log |m| this is faster and much
a is one of the ubiquitous operations in graphics, simpler than using FFT.
image and signal processing. Often the sequence Examples of such sequences are; d−1
a is a polynomial, exponential or trigonometric • polynomials of degree d − 1, ai = j=0 λj ij ,
function sampled at discrete points or a piecewise the
d LHE isj d
sum of such terms, such as, splines, or else the se- j=0 (−1) j ai+j = 0, this is of complexity d.
quence can can be well approximated with a few • ai = βλi , the LHE is λai − ai+1 = 0, this is
such terms. The computation of these convolutions of complexity2.
d−1
is usually computed straight from the definition • ai = λi j=0 αj ij , the LHE is
taking O(|x||a|) time or using a more complicated d 
j d
 d−j
j=0 (−1) j ai+j λ = 0, this is of complexity
FFT based O(|x| log |a|) time algorithm.
d.
Here we present a simple and fast algorithm
• ai = α sin(iθ) + β cos(iθ), the LHE is ai −
to compute the convolution of x1 , x2 , . . . , xn with
2 cos(θ)ai+1 + ai+2 = 0, this is of complexity 3.
am , am−1
m, . . . , a1 , namely y1 , y2 , . . . , yn−m where • ai = λi (α sin(iθ) + β cos(iθ)), the LHE is
yi = k=1 ak xi+k−1 . The number of steps of the 2
λ ai − 2λ cos(θ)ai+1 + ai+2 = 0, this is of com-
algorithm depends on a measure of complexity of
plexity 3.
a and not on m, its length. The number of steps to
Sums of above like terms also satisfy a linear
compute the convolution is O(dn) (m < n) where
homogeneous equation with a complexity that is
the sequence asatisfies a linear homogeneous equa-
d additive, such as ai = 3 sin(21πi/4)+(−2)i +i3 −4,
tion, (LHE), i=0 βi ar+i = 0 (where the β do not
this is of complexity 3 + 2 + 4 = 9.
The complete algorithm, consists of two steps;
Permission to make digital or hard copies of all or initialization and the running computation:
part of this work for personal or classroom use is
granted without fee provided that copies are not • for i=1 . . . d
made or distributed for profit or commercial ad- m
– yi = k=1 ak xi+k−1
vantage and that copies bear this notice and the 
full citation on the first page. To copy otherwise, – Fd+1
d+1−i
= yi − d−i+1 ak xi+k−1
m+d−i+1 k=1
or republish, to post on servers or to redistribute + k=m ak xi+k−1 .
to lists, requires prior specific permission and/or
a fee. • for i = d + 1 . . . n − m
Journal of WSCG, Vol.11, No.1., ISSN 1213- d
– yi = Fi0 = j=1 αj Fij
6972 WSCG(92)2003, February 3-7, 2003, Plzen,
Czech Republic. Copyright UNION Agency (96) – for j = 1 . . . d
j
Science Press ∗ Fi+1 = Fij−1 −aj+1 xi +am+j+1 xm+i
m
The invariant is Fij = k=1 aj+k xi+k−1 . signals, images, whenever the function is separable
dThe correctness d of the malgorithm follows from, in x and y and if the 1-d functions are simple, for
2πj
α
j=1 j i F j
= j=1 jα k=1 aj+k xi+k−1 example, a(i, j) = 3 sin( 2πi 3 2
10 )(2j − j + cos( 20 )).
 m d Then the amount of work for each output is just
= k=1 ( j=1 αj aj+k )xi+k−1
m the sum of the two complexities.
= k=1 ak xi+k−1 = yi .
There is no reason to save all the previous F ’s The family of 2-d functions that satisfy a linear
so that the extra memory requirement over the in- homogeneous equation is much larger than what
put/output is just O(d). can be computed with the method of separable
Notes: functions, as it is possible to add arbitrary 1-d
• This is a special case of a recursive filter[Smi97], functions of linear combinations of x and y for free.
whereit is easy to define The reason that we do not propose using a straight
d d the sequence a.
d+1−k forward generalization of the the 1-d algorithm is
yi = j=1 αj yi−j − k=1 ( l=1 al )xi−k
d d+1−k that the updating of the F ’s (in the algorithm) now
+ k=1 ( l=1 al )xm+i−k takes time proportional to the the perimeter of the
• Special cases of LHE’s with a fast algorithm √
signal which is now very large, (O( m) in 2-d).
such as the constant function (order 0 polynomial),
box filtering, [SBHC88], and exponential functions Experiments
have appeared before [Smi97, SBW02]. In order to test the algorithm we compared dif-
• The case of polynomials where the filter can ferent convolutions with different algorithms. The
be space variant was treated in [SBHC88, Hec86] algorithms were convolution using a straight for-
using repeated integration and differentiation. They ward implementation of the definition, convolution
gave slightly different formula as they used a con- based on FFT and our proposed fast convolution.
tinuous set up. The code was written in JAVA and only the FFT
Formulas equivalent to ours can be derived us- based algorithm was optimized, time is in ms.
ing finite differences m[MT33] and the summation N M d Conv FFT Fast
by parts formula: k=1 x(r + k)a(k)
d 16384 16 3 160 611 110
= l=0 (−1)l x−l (r + m + l)∆l a(m). 16384 128 3 1212 751 110
x (sum) is defined by, x0 (i) = x(i) and xp (i) =
i −l l 0
16384 1024 3 9273 731 81
j=0 xp−1 (j), and ∆ a(m) by ∆ a(m) = a(m) 16384 2048 3 18066 751 101
and ∆ a(m) = ∆ a(m + 1) − ∆ a(m).
p+1 p p 8186 16 5 140 120 70
∆d+1 a(m) = 0 whenever a(m) is a sequence of 8186 128 5 991 101 70
equally spaced samples of a polynomial of degree d. 8186 1024 5 8242 100 60
So that as in [SBHC88, Hec86] once the sequences 8186 2048 5 15332 90 50
xl and ∆l a(m) for l = 0..d are precomputed m
O(d) per element the computation of k=1 x(r +
in
References
k)a(k) takes only O(d) time. [Hec86] P. Heckbert. Filtering by repeated inte-
l gration. In Proceedings of SIGGRAPH,
∆ a(m) isx especially
l x
 easy to compute using that
∆ n = n−l and that any polynomial can be pages 317–321, 1986.
 x
written as βj j .
[MT33] L. M. Milne-Thomson. The Calculus of
• The complexity of convolving with a sequence
Finite Differences. Macmillan, 1933.
that is a piecewise sum of simple functions such as
a spline can be computed by adding up the contri- [SBHC88] P. Y. Simard, L. Bottou, P. Haffner,
butions of each of the pieces. and Y. Le Cun. Boxlets: a fast convolu-
• Almost exactly the same same formulas can tion algorithm for signal processing and
be used to compute the convolution for sequences neural networks. Neural Information
defined by linear equations that are not homoge- Processing Systems, NIPS 11, 1988.
neous. This saves a few arithmetic operations
d but
does not add any new functions, for if i=0 βi ar+i = [SBW02] H. Schweitzer, W. Bell, and F. Wu.
Very fast template matching. In ECCV,
d−1 two consecutive sums gives β0 ar −
c then subtracting
pages 358–372, 2002.
βd ar+d+1 + i=1 (βi+1 − βi)ar+i = 0 a LHS of or-
der d + 1. [Smi97] S. W. Smith. The Scientist and Engi-
• Of course the algorithm can be used succes- neer’s Guide to Digital Signal Process-
sively on rows then on columns for convolving 2-d1 , ing. California Technical Publishing,
1 or any dimension http://www.dspguide.com/, 1997.

You might also like