Efficient polynomial L∞-approximations

Nicolas Brisebarre
LaMUSE, Université J. Monnet, 23, rue du Dr P. Michelon, 42023 St-Étienne Cedex 02,
and Projet Arénaire, LIP, 46 allée d'Italie, 69364 Lyon Cedex 07, France
Nicolas.Brisebarre@ens-lyon.fr

Sylvain Chevillard
LIP (CNRS/ENS Lyon/INRIA/Univ. Lyon 1), Projet Arénaire, 46 allée d'Italie, 69364 Lyon Cedex 07, France
Sylvain.Chevillard@ens-lyon.fr
Abstract

We address the problem of computing a good floating-point-coefficient polynomial approximation to a function, with respect to the supremum norm. This is a key step in most processes of evaluation of a function. We present a fast and efficient method, based on lattice basis reduction, that often gives the best polynomial possible and most of the time returns a very good approximation.

Keywords: Efficient polynomial approximation, floating-point arithmetic, absolute error, L∞ norm, lattice basis reduction, closest vector problem, LLL algorithm.

1. Introduction

To evaluate a mathematical function on a computer, in software or hardware, one frequently replaces the function with good polynomial approximations of it. This is due to the fact that floating-point (FP) addition, subtraction and multiplication are carefully and efficiently implemented on modern processors, and also because one may take advantage of existing efficient schemes of polynomial evaluation. Such polynomial approximations are used in several recent elementary function evaluation algorithms [15, 17, 20, 6].

The first natural goal for someone who tries to evaluate a function f is thus to obtain a polynomial p which is sufficiently close to f for the approximation required by the application.

Hence, we are naturally led to consider the problem of getting, for a given continuous real-valued function f, a real interval [a, b] and a degree n ∈ N, the polynomial p ∈ R_n[X] that approximates f best, where R_n[X] is the R-vector space of the polynomials with real coefficients and degree less than or equal to n. The optimal polynomial depends on the way used to measure the quality of the approximation. The most usual choices are the supremum norm (or L∞ norm, or absolute error)

    ||p − f||_{∞,[a,b]} = sup_{a ≤ x ≤ b} |p(x) − f(x)|,

or the relative error

    ||p − f||_{rel,[a,b]} = sup_{a ≤ x ≤ b} (1/|f(x)|) |p(x) − f(x)|,

or the least squares approximation norm

    ||p − f||_{2,[a,b]} = ( ∫_a^b w(x) (f(x) − p(x))^2 dx )^{1/2},

where w is a continuous weight function.
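For concreteness, here is a small sketch (ours, not from the paper) that estimates these three error measures numerically for a toy pair f, p on [a, b]; the example function, the sampling grid and the choice w = 1 are arbitrary illustrative assumptions.

```python
# Rough numerical estimates of the three error measures above, for a toy
# example: f = exp and p = its degree-2 truncated Taylor series on [0, 1].
import numpy as np

a, b = 0.0, 1.0
f = np.exp
p = np.polynomial.Polynomial([1.0, 1.0, 0.5])   # 1 + x + x^2/2

x = np.linspace(a, b, 10001)                    # dense sampling grid
err = p(x) - f(x)
sup_norm = np.max(np.abs(err))                  # ~ ||p - f||_{inf,[a,b]}
rel_norm = np.max(np.abs(err) / np.abs(f(x)))   # ~ ||p - f||_{rel,[a,b]}
l2_norm  = np.sqrt(np.mean(err**2) * (b - a))   # ~ ||p - f||_{2,[a,b]} with w = 1
print(sup_norm, rel_norm, l2_norm)
```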
The method proposed in this paper aims at minimizing the absolute error between f and polynomials with FP coefficients. In practice, people are mostly interested in minimizing the relative error. This second problem is in general more difficult than the first one (even when searching for polynomials with real coefficients) and we are currently working on this issue. But we can remark that there are numerous situations where the two problems do not differ much. For example, in many applications the domain of definition of the function is first cut into small intervals on which the order of magnitude of the function is constant. In these cases, the absolute and relative errors become almost proportional. Hence minimizing the absolute error instead of the relative one is in general quite satisfying.

From now on, we will focus on the supremum norm, which we shall write ||·||_{[a,b]} instead of ||·||_{∞,[a,b]} in the sequel. Among important theoretical works by various mathematicians (such as Bernstein, Weierstrass and Lagrange), the work of Chebyshev in this area of L∞ function approximation is especially remarkable. He showed in particular that when p runs over R_n[X], ||f − p||_∞ reaches a minimum at a unique polynomial, and he gave a very precise characterization of this best polynomial approximant (see [5] for example). From that characterization, Remez [19] designed an algorithm that makes it possible to compute the best L∞ polynomial approximation, also called the minimax approximation (see [5] for a proof of the algorithm), in a fairly short time (see [24] for a proof of its quadratic convergence). Therefore, the general situation for L∞ approximation by real polynomials can be considered quite satisfying. The problem for the scientist who implements such approximations in software or hardware is that he uses finite-precision arithmetic and, unfortunately, most of the time the minimax approximation characterized by Chebyshev's theorem and computed by Remez' algorithm has coefficients that are transcendental (or at least irrational) numbers, hence not exactly representable with a finite number of bits.

Thus, the coefficients of the approximation usually need to be rounded according to the requirements of the targeted application (for example, in current software implementations, one often uses FP numbers in IEEE single or double precision for storing the coefficients of the polynomial approximation). But this rounding, if carelessly done, may lead to an important loss of accuracy. For instance, if we choose to round each coefficient of the minimax approximation to the nearest number in the required format (this yields a polynomial that we will call the rounded minimax in the sequel of the paper), the quality of the approximation we get can be very poor. To illustrate that problem, let us look at the following simple example. We want to approximate the function f : x ↦ √2 + πx + e x² on the interval [2, 4] by a degree-2 polynomial with IEEE double precision FP coefficients. The minimax approximation is the function f itself. If we round the coefficients of f to the closest doubles, we obtain the polynomial

    P = 6369051672525773/4503599627370496 + 884279719003555/281474976710656 X + 6121026514868073/2251799813685248 X²

and we have ||f − P||_∞ ≈ 2.70622 · 10^−15. But the best degree-2 polynomial with double precision coefficients is

    P* = 6369051672525769/4503599627370496 + 3537118876014221/1125899906842624 X + 6121026514868073/2251799813685248 X²

for which ||f − P*||_∞ ≈ 2.2243 · 10^−16. Therefore, there is a factor bigger than 10 between the optimum and the rounded polynomial.
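The first of these two error values is easy to reproduce; the following sketch (ours, relying on the mpmath package) rounds each coefficient of f to the nearest double and estimates the resulting supremum error on [2, 4] by dense sampling.

```python
# Reproduce the "rounded minimax" error for f(x) = sqrt(2) + pi*x + e*x^2 on [2, 4].
from mpmath import mp, mpf, sqrt, pi, e

mp.dps = 40                                  # high-precision reference values
exact = [sqrt(2), +pi, +e]                   # exact coefficients of f
rounded = [mpf(float(c)) for c in exact]     # each one rounded to the nearest double

def err(x):
    """|f(x) - P(x)| where P is the coefficient-rounded polynomial."""
    return abs(sum((r - c) * x**k for k, (c, r) in enumerate(zip(exact, rounded))))

samples = [mpf(2) + mpf(2) * i / 10000 for i in range(10001)]
print(max(err(x) for x in samples))          # about 2.7e-15, as stated above
```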
In 2005, Brisebarre, Muller and Tisserand proposed in [3] a first general method for tackling this problem. Given a fixed finite sequence m_0, m_1, ..., m_n of natural integers, they search for the polynomials of the form

    p(x) = a_0/2^{m_0} + a_1/2^{m_1} x + ··· + a_n/2^{m_n} x^n,  with a_i ∈ Z,

that minimize ||f − p||_∞. The approach given in [3] consists in, first, carefully constructing a rational polytope containing all the solutions (a_0, ..., a_n) ∈ Z^{n+1} and as few other elements of Z^{n+1} as possible and, in a second step, efficiently scanning all the points with integer coordinates that lie inside this polytope.

This approach, which currently makes it possible to obtain polynomials of degree up to 10, has two major drawbacks. First of all, its complexity is not precisely estimated, but it might be exponential in the worst case, since the scanning of the integer points of the polytope is done using rational linear programming. The other problem is that the efficiency of the method relies on the relevance of a prior estimate of the optimal error ||f − p||_∞. If this error is overestimated, the polytope may contain too many integer points and the scanning step may become intractable. If this error is underestimated, the polytope contains no solution. Hence, we were led to design a tool that could give us a better insight into the value of the optimal error, in order to speed up the method of [3]. To do so, we developed a new approach based on lattice basis reduction and in particular on the LLL algorithm. That tool actually proved to be far more useful than expected: most of the time it gives an excellent approximant (if not the best), and this is done quickly and at low memory cost. The goal of this paper is to present this new approach.

The outline of the paper is the following. In the second section, we recall basic facts about lattices and about the closest vector problem (CVP), an algorithmic problem related to lattices that appears naturally in our approach, and we present an algorithm that solves an approximate version of the CVP. Then we present our new approach in Section 3 and give some worked examples in Section 4, before giving a brief conclusion in the last section.

2. A reminder on Lattice Basis Reduction and the LLL Algorithm

Let x = (x_1, ..., x_ℓ) ∈ R^ℓ. We set ||x||_2 = (x|x)^{1/2} = (x_1² + ··· + x_ℓ²)^{1/2} and ||x||_∞ = max_{1≤i≤ℓ} |x_i|.

A lattice is a discrete algebraic object that is encountered in several scientific domains, such as Mathematics, Computer Science or Chemistry. It is a rich and powerful modelling tool thanks to the deep and numerous theoretical, algorithmic and implementation works available (see [4, 9, 14, 22] for example). The lattice structure is ubiquitous in our approach.

Definition 2.1 Let L be a nonempty subset of R^ℓ. L is a lattice if and only if there exists a set of R-linearly independent vectors b_1, ..., b_d such that

    L = {λ_1 b_1 + ··· + λ_d b_d : λ_1, ..., λ_d ∈ Z}.

The family (b_1, ..., b_d) is a basis of the lattice L and d is called the rank of the lattice L.

In the sequel, we will have to deal with the following algorithmic problem.

Problem 2.2 (Closest vector problem, CVP) Let ||·|| be a norm on R^ℓ. Given a basis of a lattice L ⊂ Q^ℓ of rank k, k ≤ ℓ, and x ∈ R^ℓ, find y ∈ L such that ||x − y|| = dist(x, L), where dist(x, L) denotes min{||x − y||, y ∈ L}. Associated approximation problem: find y ∈ L such that ||x − y|| ≤ γ dist(x, L), where γ ∈ R is fixed.

Let us recall some of the complexity results known about this problem (a presentation of different algorithms for solving the CVP is given in [1]). In 1981, van Emde Boas [23] proved that CVP is NP-hard (see also [16]). On the other hand, Goldreich and Goldwasser showed in [8] (but their proof is nonconstructive) that approximating CVP, in the Euclidean norm case (CVP_2), to a factor √(d/log d) is not NP-hard. Regarding the infinity norm (CVP_∞), they also show that approximating CVP to a factor d/log d is not NP-hard (as noted in [18], the CVP problem seems to be more difficult when the norm is the infinity norm). Unfortunately, no polynomial algorithm is known for approximating CVP to a polynomial factor.

However, if we relax the constraint on the factor, the situation becomes far better. In 1982, Lenstra, Lenstra and Lovász [13] gave an algorithm that allows one to get "fairly" short vectors of the lattice in polynomial time. Among many important applications of that algorithm, Babai [2] proposed a polynomial algorithm for solving CVP with an exponential approximation factor. For x ∈ R, we denote by ⌊x⌉ the integer nearest to x, defined by ⌊x⌉ = ⌊x + 1/2⌋.

Algorithm 1: Babai's nearest plane algorithm (ApproximatedCVP)
    Data: an LLL-reduced basis (b_i)_{1≤i≤d}; its Gram–Schmidt orthogonalization (b*_i)_{1≤i≤d}; a vector v.
    Result: an approximation to the factor 2^{d/2} of the CVP_2 of v.
    begin
        t := v
        for j = d downto 1 do
            t := t − ⌊⟨t, b*_j⟩ / ⟨b*_j, b*_j⟩⌉ · b_j
        end
        return v − t
    end

Remark 2.3 In practice, the result of Babai's algorithm is of better quality and obtained faster than expected.

For a practical exact CVP algorithm, see [12], in which the given algorithm is super-exponential. This complexity has recently been improved in [10].
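As an illustration, here is a direct transcription of Algorithm 1 in Python/NumPy (our sketch, not the authors' code); the Gram–Schmidt orthogonalization is recomputed from the basis instead of being passed as an input.

```python
import numpy as np

def gram_schmidt(basis):
    """Non-normalized Gram-Schmidt orthogonalization of the rows of `basis`."""
    ortho = np.array(basis, dtype=float)
    for i in range(len(ortho)):
        for j in range(i):
            ortho[i] -= (ortho[i] @ ortho[j]) / (ortho[j] @ ortho[j]) * ortho[j]
    return ortho

def babai_nearest_plane(basis, v):
    """Lattice vector close to v (approximates CVP_2 to the factor 2^(d/2)
    when the rows of `basis` form an LLL-reduced basis)."""
    basis = np.array(basis, dtype=float)
    ortho = gram_schmidt(basis)
    t = np.array(v, dtype=float)
    for j in reversed(range(len(basis))):                               # j = d, ..., 1
        c = np.floor((t @ ortho[j]) / (ortho[j] @ ortho[j]) + 0.5)      # nearest integer
        t -= c * basis[j]
    return np.array(v, dtype=float) - t

# toy check: closest vector to (2.3, 3.7) in the lattice Z^2
print(babai_nearest_plane([[1.0, 0.0], [0.0, 1.0]], [2.3, 3.7]))        # -> [2. 4.]
```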
3. Our approach

In many applications, scientists and engineers wish to find the best (or a very good) polynomial approximation among those that have FP coefficients, and this is the problem we address in this paper. First, let us make the following important remark: to solve that problem, it is sufficient to be able to solve the following problem.
Problem 3.1 Let f be a continuous real-valued function defined on a given real interval [a, b], let n ∈ N and let (m_i)_{0≤i≤n} be a finite sequence of rational integers. Give a polynomial of the form

    p = a_0/2^{m_0} + a_1/2^{m_1} X + ··· + a_n/2^{m_n} X^n,  where the a_i ∈ Z,

that minimizes (or at least makes very small) ||f − p||_{[a,b]}.

The problem we address can be reduced to this one by the following heuristic.

Let us denote by P the best polynomial with FP coefficients and by R the minimax. For the sake of simplicity, we will assume here that all the coefficients of P have the same precision t (but the arguments would be the same with several different precisions for the coefficients). In many cases (in particular if t is big enough), the order of magnitude of the i-th coefficient of P is the same as that of the corresponding coefficient of R. Hence, we can suppose, initially, that the exponents of these coefficients are the same. From Remez' algorithm, we know the i-th coefficient of R and thus its exponent e. Since the i-th coefficient of P is a FP number with precision t and since we may suppose that its exponent is e, we can write it 1.u_1...u_{t−1} · 2^e, where 1u_1...u_{t−1} is the unknown binary mantissa of the coefficient. This can be rewritten (1u_1...u_{t−1})_2 / 2^{t−1−e}, and we can set m_i = t − 1 − e.

However, the guessed exponent may slightly differ from the true optimal one. In that case, the computed coefficient of P will be of the form a_i/2^{m_i} but with an a_i that needs more than t bits to be written. Our heuristic is then to use this computed coefficient as a new insight into the order of magnitude of the optimal one: we denote by e′ its exponent, set m_i to t − 1 − e′, and try again until we reach a fixed point.

This is just a heuristic and we have no proof of convergence for that process. However, in every practical case we met, this procedure was observed to converge in at most two or three steps. In particular, we (sometimes, but rarely) encountered situations where the initial assumption that the coefficients of P and R have the same order of magnitude was completely wrong. But in those cases too, this heuristic converged in two or three steps and let us find a polynomial with FP coefficients which was really good.
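The initial guess of this heuristic is straightforward to write down; the sketch below (ours, with illustrative values) computes m_i = t − 1 − e_i from the minimax coefficients, leaving the fixed-point iteration on the exponents to the surrounding procedure.

```python
# Initial choice of the scalings m_i for coefficients stored on t bits:
# m_i = t - 1 - e_i, where e_i is the exponent of the i-th minimax coefficient
# (c = 1.xxx * 2^e_i).  If some returned integer a_i then needs more than t
# bits, the exponent guess is updated and the computation is redone.
import math

def guess_scalings(minimax_coeffs, t):
    return [t - 1 - (math.frexp(c)[1] - 1) for c in minimax_coeffs]

# coefficients of order sqrt(2), pi, e stored on t = 53 bits (double precision):
print(guess_scalings([1.41, 3.14, 2.72], 53))    # -> [52, 51, 51]
```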
Now that we have reduced our general problem to Problem 3.1, let us explain how we solve it.

The first idea is to discretize the continuous problem: let ℓ ≥ n + 1 and let x_1, ..., x_ℓ be points in [a, b]; we want a_0/2^{m_0} + a_1/2^{m_1} x_i + ··· + a_n/2^{m_n} x_i^n to be as close as possible to f(x_i) for all i = 1, ..., ℓ. That is to say, we want the vectors

    ( a_0/2^{m_0} + a_1/2^{m_1} x_1 + ··· + a_n/2^{m_n} x_1^n,
      a_0/2^{m_0} + a_1/2^{m_1} x_2 + ··· + a_n/2^{m_n} x_2^n,
      ...,
      a_0/2^{m_0} + a_1/2^{m_1} x_ℓ + ··· + a_n/2^{m_n} x_ℓ^n )     and     y = (f(x_1), f(x_2), ..., f(x_ℓ))

to be as close as possible, with respect to the ||·||_∞ norm. This can be rewritten as follows: we want the vectors

    a_0 v_0 + a_1 v_1 + ··· + a_n v_n,     where  v_j = (x_1^j/2^{m_j}, x_2^j/2^{m_j}, ..., x_ℓ^j/2^{m_j}),

and y to be as close as possible. Hence, we have to find (a_0, ..., a_n) ∈ Z^{n+1} that minimizes ||a_0 v_0 + ··· + a_n v_n − y||_∞: this is an instance of CVP_∞.
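A sketch (ours) of this set-up is given below: it builds the vectors v_0, ..., v_n and y on ℓ chosen points and feeds them to the babai_nearest_plane routine from the sketch of Section 2. For brevity the basis is used as is; in practice it should be LLL-reduced first (e.g. with GP's qflll), which is what makes Babai's answer good. The example function, degree and scalings are arbitrary illustrative choices.

```python
# Lattice set-up of Section 3: rows of B are v_j = (x_i^j / 2^m_j)_i, y = (f(x_i))_i.
# Assumes babai_nearest_plane from the earlier sketch is in scope.
import numpy as np

def build_instance(f, points, m):
    points = np.asarray(points, dtype=float)
    B = np.array([points**j / 2.0**mj for j, mj in enumerate(m)])
    y = np.array([f(x) for x in points])
    return B, y

# toy example: degree-2 approximation of exp on [0, 1], 10-bit fixed-point coefficients
m = [10, 10, 10]
points = np.linspace(0.1, 0.9, 5)                  # any l >= n+1 points
B, y = build_instance(np.exp, points, m)
w = babai_nearest_plane(B, y)                      # approximate closest lattice vector
a = np.rint(np.linalg.lstsq(B.T, w, rcond=None)[0]).astype(int)
print(a, np.max(np.abs(w - y)))                    # integers a_i and the error estimate
```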
Two problems arise at this point: first, how to choose the points x_i, and second, how to deal with the associated CVP_∞ instance.

3.1. Choice of the points x_i

This is a critical step in our approach. We want a small value of ||a_0 v_0 + ··· + a_n v_n − y||_∞ to imply a small value of the supremum norm ||p − f||_{[a,b]}. More precisely, requiring ||a_0 v_0 + ··· + a_n v_n − y||_∞ to be small can be viewed as an approximate interpolation problem, and it is well known that, if the points x_i are carelessly chosen, one may find a polynomial that coincides with the function at the points x_i but is quite far from it on the whole of [a, b] (one may consider the classical Runge example [7, Chap. 2] for instance).
Our choice of the points relies on the observation that, when the imposed precision of the coefficients is big enough (i.e. with m_i big enough, which is the case with double precision FP coefficients for instance), the good polynomial approximants to the function are close to the minimax approximation. Hence, we generally choose the points where f and the degree-n minimax polynomial R are the closest possible, i.e. where f − R vanishes. The following classical result (see [5, Chap. 3] for a more general statement) tells us that there are at least n + 1 such points.

Theorem 3.2 (Chebyshev) A polynomial p ∈ R_n[X] is the best approximation of f on [a, b] if and only if there exist a ≤ y_0 < y_1 < ··· < y_{n+1} ≤ b such that
  • ∀i ∈ {0, ..., n + 1}, |f(y_i) − p(y_i)| = ||f − p||_∞,
  • ∀i ∈ {0, ..., n}, f(y_{i+1}) − p(y_{i+1}) = −(f(y_i) − p(y_i)).

Another possible choice is the set of Chebyshev points [7, Chap. 2]. They are known to be good choices in some approximation problems (for example, they are the starting points of Remez' algorithm). They proved to be a quite valuable choice in the experiments we made, especially when the size of the m_i is small.
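For reference, here is a short sketch (ours) of the Chebyshev points of an interval [a, b], i.e. the images of the Chebyshev nodes of [−1, 1] under the affine map onto [a, b]; the interval and the count ℓ = 3 in the usage line are arbitrary.

```python
# Chebyshev points of [a, b]: images of the zeros of the Chebyshev polynomial T_l.
import numpy as np

def chebyshev_points(a, b, l):
    k = np.arange(l)
    nodes = np.cos((2 * k + 1) * np.pi / (2 * l))    # zeros of T_l on [-1, 1]
    return (a + b) / 2 + (b - a) / 2 * nodes

print(chebyshev_points(2.0, 4.0, 3))                 # three points in [2, 4]
```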
3.2. Solving the CVP_∞ instance

Kannan's algorithm [12] makes it possible to solve either CVP_2 or CVP_∞, but its complexity is super-exponential. Since the Euclidean and infinity norms have the same order of magnitude (remember that, in R^ℓ, ||·||_∞ ≤ ||·||_2 ≤ √ℓ ||·||_∞), we preferred to use Babai's algorithm, which solves CVP_2 with, in theory, an exponential approximation factor, but whose performance, in time, space and quality of the result, is directly related to that of LLL, hence very good in practice for n, say, not greater than 50.

We may then take the approximation of CVP_2 provided by Babai's algorithm as the sought approximation of CVP_∞. But we can also refine that result in the following way. We can use the LLL-reduced basis to explore the neighborhood of the computed approximate CVP_2 solution. Let us denote by v the vector given by Babai's nearest plane algorithm and by (ε_0, ..., ε_n) the LLL-reduced basis. LLL gives pretty short vectors of the lattice: for small values of n (say up to 50), ε_0 is often the shortest nonzero vector of the lattice (in the norm ||·||_2) and the other vectors of the basis are also short. Since ||·||_∞ ≤ ||·||_2 ≤ √ℓ ||·||_∞ in R^ℓ, they are also pretty short vectors for ||·||_∞. If v is not the exact CVP_∞ solution, it is perhaps not so far from it: the exact solution is probably of the form v + a_0 ε_0 + ··· + a_n ε_n, where the a_i are rational integers with small absolute value. We can thus explore the neighborhood of v with the following (convergent) method:

1. Let w denote successively the 2n + 2 following vectors: v + ε_0, v − ε_0, ..., v + ε_n, v − ε_n.

2. If there is no w such that ||w − y||_∞ < ||v − y||_∞, we may consider that the quality of v is quite good: return v.

3. Otherwise, denote by w_opt a w such that ||w − y||_∞ is minimal. Set v := w_opt and go to step 1.
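This local search is easy to express in a few lines; the following sketch (ours) takes Babai's answer v, the LLL-reduced basis and the target y, and walks to a local optimum for the sup norm exactly as in steps 1–3 above.

```python
# Neighborhood exploration of Section 3.2: greedy descent on ||. - y||_inf
# along the vectors of the reduced basis.
import numpy as np

def refine(v, reduced_basis, y):
    v = np.array(v, dtype=float)
    y = np.asarray(y, dtype=float)
    basis = np.asarray(reduced_basis, dtype=float)
    while True:
        neighbours = [v + s * eps for eps in basis for s in (+1.0, -1.0)]
        best = min(neighbours, key=lambda w: np.max(np.abs(w - y)))
        if np.max(np.abs(best - y)) >= np.max(np.abs(v - y)):
            return v        # no neighbour improves the sup norm: stop
        v = best            # otherwise move to the best neighbour and repeat

# toy check in Z^2: starting from the origin, the search walks to (2, 4)
print(refine([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [2.3, 3.7]))    # -> [2. 4.]
```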
4. Examples

We will now present two examples that illustrate the behavior of our algorithm in real situations. Our first example shows the possible benefit that may come from the use of our method in the implementation of a function for the Intel Itanium. The second example illustrates well the gap between an (almost) optimal polynomial and the rounded minimax; it comes from the implementation of the function arcsin in the library CRlibm [21].

To implement our method, we have used Maple and GP (GP is an interactive calculator for number theory algorithms; in particular, it implements the LLL algorithm; it is distributed under the GPL at http://pari.math.u-bordeaux.fr/). Maple lets us compute the minimax polynomial R (Remez' algorithm is available in Maple through the function minimax of the package numapprox), which is used to determine the points x_i by solving the equation R(x) = f(x). Then, we use GP to compute an LLL-reduced basis and to solve the approximate CVP_2 instance by Babai's nearest plane algorithm. We thus get a polynomial P, and we finally use Maple to compute ||P − f||_{[a,b]} (with Maple's numapprox[infnorm]) and compare it to ||R − f||_{[a,b]}.
Applying our method to an Itanium implementation of a function

The Intel Itanium processor uses a particular FP format called extended double: it is a FP format with a 64-bit mantissa. An extended double allows more accurate computations, but this gain in accuracy has a cost: extended doubles must be loaded into cache one after the other, whereas two regular doubles may be loaded at the same time. Moreover, the latency of such a load is 6 cycles (when a multiplication costs 4 cycles). Thus the loading time is quite critical on this processor, and it is very interesting to replace an extended double with a double when the accuracy requirements allow it. See [6] and [11] for more information about the Itanium architecture.

Intel developed and included in the glibc a mathematical library optimized for the Itanium. Let us consider an example inspired by the implementation of the function erf in this library. The function erf is defined by erf(x) = (2/√π) ∫_0^x e^{−t²} dt for all x ∈ R. The domain is cut into several intervals on which the function can be approximated by polynomials of reasonable degree (say less than 30). This example deals with erf on the interval [1, 2]. The goal is to provide a polynomial, with extended and/or regular double coefficients, which approximates the function with a relative error bounded by 2^−64. The domain is first translated to the interval [0, 1]. The problem is thus reduced to the following: find a polynomial p(x) = a_0 + a_1 x + ··· + a_n x^n such that ||p(x) − erf(x + 1)||_{rel,[0,1]} < 2^−64.

The minimax of degree 18 gives an error of 2^−61.19. A fortiori, a polynomial of degree 18 with FP coefficients cannot provide the required accuracy. The minimax of degree 19 gives an error of 2^−66.92. Can we find a satisfying polynomial of degree 19?

A common strategy consists, when 0 belongs to the definition interval, in using a bigger precision for the first coefficients and a smaller precision for the last ones. Setting the first 8 coefficients to be extended doubles and the others to be doubles, and rounding each coefficient of the minimax to the nearest number in the corresponding format, we obtain an error of 2^−61.13: it is not sufficient. However, using 9 extended doubles and the same classical procedure, we obtain an error of 2^−64.74, which is enough.

When we use our method, we obtain a polynomial with an error of 2^−64.74 and only two extended doubles. We saved 7 extended doubles thanks to our method. The computing time is the time needed by Maple to compute the minimax approximation, since the other steps are performed instantaneously.
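The rounding step used in the comparison above (each minimax coefficient rounded to the nearest number in its prescribed format) can be sketched as follows; this is our illustration, using mpmath, with 64 bits standing for extended double and 53 bits for double.

```python
# Round a high-precision coefficient to the nearest FP number with a given
# mantissa size (64 bits ~ extended double, 53 bits ~ regular double).
from mpmath import mp, mpf

def round_to_precision(c, bits):
    old = mp.prec
    mp.prec = bits
    r = +mpf(c)          # unary + re-rounds to the current working precision
    mp.prec = old
    return r

mp.dps = 60
c = mpf('3.14159265358979323846264338327950288')
print(round_to_precision(c, 64) - c)    # extended double rounding error
print(round_to_precision(c, 53) - c)    # double rounding error (larger)
```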
Second example: an example from CRlibm

The first example showed that our method makes it possible to reduce the size of the coefficients for a given target accuracy. This can also be very interesting in hardware applications, where every single bit may count.

Another possibility offered by our method is to improve the accuracy provided by the polynomial approximation (compared to the rounded minimax for example) while keeping the same precision for the coefficients. Our second example comes from the implementation of the function arcsin in CRlibm. CRlibm is a mathematical library which provides correct rounding: the value returned by CRlibm when evaluating a function is the exact mathematical value rounded to the nearest double. To achieve this goal, the developers of CRlibm must use a precision larger than the standard double precision provided by the processor. They use special formats such as double-double and triple-double, which are unevaluated sums of two or three regular doubles. In particular, they approximate functions by polynomials with double-double and triple-double coefficients (and also regular doubles).

Basically, a double-double gives the same precision as a FP number with a 106-bit mantissa. However, a number of the form 1.x_1x_2...x_52 000000001 y_1y_2...y_52 · 2^e is representable as the double-double (1.x_1x_2...x_52 · 2^e + 1.y_1y_2...y_52 · 2^{e−8−53}) but is not representable in a 106-bit format. This limitation is not a problem in general: we can often prove a priori that, to achieve a given precision, the coefficients of a polynomial are so constrained that their first 53 bits are fixed. Thus, one of the two doubles of the double-double is known and we are back to the problem of searching for a regular double (that is, a FP number with at most 53 bits). At worst, we cannot prove that the first 53 bits are fixed and we just search for numbers with 106 bits: the resulting number is representable as a double-double (it may just be nonoptimal).
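The following toy sketch (ours, using mpmath) shows the idea of a double-double as an unevaluated sum of two doubles: the leading double carries the first 53 bits and the trailing double carries roughly the next 53.

```python
# A double-double is an unevaluated sum hi + lo of two regular doubles.
from mpmath import mp, mpf

mp.dps = 60
x  = mpf(2) ** mpf('0.5')       # a value that is not representable as a double
hi = mpf(float(x))              # leading double: nearest double to x
lo = mpf(float(x - hi))         # trailing double: nearest double to the remainder
print(x - (hi + lo))            # residual error, roughly 2^-107 or smaller here
```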
We focus here on the approximation of arcsin on the interval [0.79, 1]. It is a real-life example we worked on with C. Lauter (who is one of the developers of CRlibm). After an argument reduction, we obtain the problem of approximating

    g(z) = (arcsin(1 − (z + m)) − π/2) / √(2 · (z + m)),

where z ranges approximately over [−0.110, 0.110] and m ≈ 0.110. The developers know from previous computations that they need an accuracy of more than 119 bits to achieve correct rounding. The minimax of degree 21 provides this accuracy (see Figure 1).
Figure 1. Binary logarithm of the absolute error of several approximants
    Target              −119
    Minimax             −119.83
    Rounded minimax     −103.31
    Our polynomial      −119.77

Our method gives a satisfying polynomial with two triple-doubles, eight double-doubles and twelve regular doubles (here again, the computing time is the time needed by Remez' algorithm to compute the minimax approximation).

More precisely, we searched for a polynomial of the form

    p(x) = p_0 + p_1 x + p_2 x² + ··· + p_9 x⁹ + p_10 x^10 + ··· + p_21 x^21,

where the p_i are FP numbers whose mantissa sizes are, respectively, 173, 159, 106, ..., 106 bits for p_0, p_1, p_2, ..., p_9 and 53 bits for p_10, ..., p_21. Note that the first coefficient is actually searched with the method described above: we prove that its first 53 bits are fixed and we then search for a FP number with 106 bits for the two remaining doubles. The size 173 indicated for p_0 only means that the 159 bits of this triple-double are nonconsecutive.

The accuracy provided by our polynomial is very close to the one given by the minimax. However, without our method, we would have to use the rounded minimax, which gives only 103 bits of accuracy: it would not be enough to provide correct rounding. This example illustrates the gap between the rounded minimax and the optimal polynomial with FP coefficients. Here, 16 precious bits are lost by the rounding of the minimax coefficients. This loss can be seen very well if the absolute errors are plotted: Theorem 3.2 indicates that a polynomial is optimal if and only if the absolute error oscillates n + 2 times between its extrema. Our polynomial almost satisfies this criterion (see Figure 2). On the other hand, one can see in Figure 3 that the rounded minimax does not satisfy it at all.

[Figure 2. Absolute error between g and our polynomial: the error oscillates between about −10^−36 and +10^−36 over the reduced interval.]

[Figure 3. Absolute error between g and the rounded minimax: the error reaches about 8 · 10^−32.]
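The oscillation criterion used in this comparison can be checked numerically in a crude way; the sketch below (ours) counts, on a dense sample of the error, the sign alternations among the points whose magnitude comes close to the maximal error. The 0.9 threshold is an arbitrary choice.

```python
# Rough numerical check of the equioscillation criterion of Theorem 3.2:
# count sign alternations of the error at its near-maximal points.
import numpy as np

def oscillation_count(err_samples, tol=0.9):
    e = np.asarray(err_samples, dtype=float)
    big = e[np.abs(e) >= tol * np.max(np.abs(e))]    # near-extremal samples
    signs = np.sign(big)
    return 1 + int(np.sum(signs[1:] != signs[:-1]))  # number of alternating blocks

# an optimal degree-n approximation sampled densely should give n + 2 here
```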
5. Conclusion

We have presented a new method for computing efficient polynomial approximations with machine-number coefficients, in particular FP coefficients. It improves on the results provided by the existing Remez-based method. The method, based on lattice basis reduction, is much faster and more efficient than the one given in [3] and gives a very good approximant. Moreover, it may considerably speed up the polytope-based method of [3], which gives the best polynomial possible, by providing a relevant (indeed often a very good) estimate of the L∞ approximation error.

This method can be adapted to several kinds of coefficients: fixed-point formats, multi-double formats, or classical floating-point arithmetic with several precision formats.

For some applications, one may want to fix the value of certain coefficients of the approximant. For example, one may search for approximants whose first terms match the corresponding ones in the Taylor expansion of the approximated function. We should be able to deal with this additional constraint in the near future.

We are currently working on an adaptation of this method to the problem of minimizing the relative error, and we plan to extend it to the problem of approximation by sums of cosines, which arises in the field of signal processing and more specifically in FIR filters.

References

[1] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger. Closest point search in lattices. IEEE Transactions on Information Theory, 48(8):2201–2214, Aug. 2002.
[2] L. Babai. On Lovász' lattice reduction and the nearest lattice point problem. Combinatorica, 6(1):1–13, 1986.
[3] N. Brisebarre, J.-M. Muller, and A. Tisserand. Computing machine-efficient polynomial approximations. ACM Transactions on Mathematical Software, 32(2), June 2006.
[4] J. W. S. Cassels. An introduction to the geometry of numbers. Classics in Mathematics. Springer-Verlag, Berlin, 1997. Corrected reprint of the 1971 edition.
[5] E. W. Cheney. Introduction to approximation theory. AMS Chelsea Publishing, second edition, 1982.
[6] M. Cornea, J. Harrison, and P. T. P. Tang. Scientific Computing on Itanium-Based Systems. Intel Press, 2002.
[7] W. Gautschi. Numerical analysis. An introduction. Birkhäuser Boston Inc., Boston, MA, 1997.
[8] O. Goldreich and S. Goldwasser. On the limits of non-approximability of lattice problems. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC), pages 1–9. ACM, May 1998.
[9] P. M. Gruber and C. G. Lekkerkerker. Geometry of numbers, volume 37 of North-Holland Mathematical Library. North-Holland Publishing Co., Amsterdam, second edition, 1987.
[10] G. Hanrot and D. Stehlé. Tighter analysis of Kannan's enumeration algorithm. 2007. Submitted.
[11] Intel Corporation. Intel Itanium 2 processor reference manual for software development and optimization. Technical Report 251110-003, Intel, 2004.
[12] R. Kannan. Minkowski's convex body theorem and integer programming. Math. Oper. Res., 12(3):415–440, 1987.
[13] A. K. Lenstra, H. W. Lenstra, and L. Lovász. Factoring polynomials with rational coefficients. Math. Annalen, 261:515–534, 1982.
[14] L. Lovász. An algorithmic theory of numbers, graphs and convexity, volume 50 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1986.
[15] P. Markstein. IA-64 and Elementary Functions: Speed and Precision. Hewlett-Packard Professional Books. Prentice Hall, 2000.
[16] D. Micciancio. The hardness of the closest vector problem with preprocessing. IEEE Trans. Inform. Theory, 47(3):1212–1215, 2001.
[17] J.-M. Muller. Elementary Functions, Algorithms and Implementation. Birkhäuser, Boston, 1997.
[18] P. Q. Nguyen and J. Stern. The two faces of lattices in cryptology. In Proceedings of CALC '01, volume 2146 of Lecture Notes in Computer Science, pages 146–180. Springer-Verlag, 2001.
[19] E. Remes. Sur un procédé convergent d'approximations successives pour déterminer les polynômes d'approximation. C. R. Acad. Sci. Paris, 198:2063–2065, 1934.
[20] S. Story and P. T. P. Tang. New algorithms for improved transcendental functions on IA-64. In Koren and Kornerup, editors, Proceedings of the 14th IEEE Symposium on Computer Arithmetic, pages 4–11, Los Alamitos, CA, Apr. 1999. IEEE Computer Society Press.
[21] The Arénaire Project. CRlibm, Correctly Rounded mathematical library, July 2006. http://lipforge.ens-lyon.fr/www/crlibm/.
[22] V. Shoup. NTL, a library for doing number theory, version 5.4. http://shoup.net/ntl/, 2005.
[23] P. van Emde Boas. Another NP-complete problem and the complexity of computing short vectors in a lattice. Technical Report 81-04, Mathematisch Instituut, University of Amsterdam, 1981.
[24] L. Veidinger. On the numerical determination of the best approximations in the Chebyshev sense. Numerische Mathematik, 2:99–105, 1960.