
THE CONDITION NUMBER OF

RIEMANNIAN APPROXIMATION PROBLEMS∗


PAUL BREIDING† AND NICK VANNIEUWENHOVEN‡

Abstract. We consider the local sensitivity of least-squares formulations of inverse problems.


The sets of inputs and outputs of these problems are assumed to have the structures of Riemann-
ian manifolds. The problems we consider include the approximation problem of finding the nearest
point on a Riemannian embedded submanifold from a given point in the ambient space. We char-
acterize the first-order sensitivity, i.e., condition number, of local minimizers and critical points to
arbitrary perturbations of the input of the least-squares problem. This condition number involves
the Weingarten map of the input manifold, which measures the amount by which the input manifold
curves in its ambient space. We validate our main results through experiments with the n-camera
triangulation problem in computer vision.

Key words. Riemannian least-squares problem, sensitivity, condition number, local minimizers,
Weingarten map, second fundamental form

AMS subject classifications. 90C31, 53A55, 65H10, 65J05, 65D18, 65D19

1. Introduction. A prototypical instance of the Riemannian optimization prob-


lems we consider here arises from the following inverse problem (IP): Given an input
x ∈ I, compute y ∈ O such that F (y) = x. Herein, we are given a system modeled
as a smooth map F between a Riemannian manifold O, the output manifold, and a
Riemannian embedded submanifold I ⊂ Rn , the input manifold. In practice, however,
we are often given a point a ∈ Rn in the ambient space rather than x ∈ I, for example
because a was obtained from noisy or inaccurate physical measurements. This leads
to the Riemannian least squares optimization problem associated to the foregoing IP:
(PIP)    argmin_{y∈O} ½ ‖F(y) − a‖².

In [15] we called this type of problem a parameter identification problem (PIP). Ap-
plication areas where PIPs originate are data completion with low-rank matrix and
tensor decompositions [16, 23, 38, 44, 60, 63], geometric modeling [18, 48, 61], computer
vision [32, 37, 50], and phase retrieval problems in signal processing [9, 12].
The manifold O is called the output manifold because its elements are the outputs
of the optimization problem. The outputs are the quantities of interest; for instance,
they could be the unknown system parameters that one wants to identify based on a
given observation a and the forward description of the system F . In order to distin-
guish between the inputs x ∈ I of the IP and the inputs a ∈ Rn of the optimization
formulation (PIP), we refer to the latter as ambient inputs. In this paper, we study
the sensitivity of the output of the optimization problem (PIP) (and an implicit gen-
eralization thereof) when the ambient input a ∈ Rn is perturbed by considering its
condition number κ at global and local minimizers and critical points.
1.1. The condition number. Condition numbers are one of the cornerstones of
numerical analysis [7, 17, 26, 36, 40, 68, 69]. Their definition was introduced for Riemannian
manifolds already in the early days of numerical analysis. In 1966, Rice [58] presented
the definition in the case where input and output space are general nonlinear metric
spaces; the special case of Riemannian manifolds is [58, Theorem 3]. Let I and O be

∗ Submitted to the editors.


Funding: P. Breiding has received funding from the European Research Council (ERC) un-
der the European Union’s Horizon 2020 research and innovation programme (grant agreement
No 787840). N. Vannieuwenhoven was supported by a Postdoctoral Fellowship of the Research
Foundation—Flanders (FWO) with project 12E8119N.
† Institute of Mathematics, TU Berlin, Berlin, Germany. (breiding@math.tu-berlin.de)
‡ KU Leuven, Department of Computer Science, Leuven, Belgium.
(nick.vannieuwenhoven@kuleuven.be)
Riemannian manifolds and assume that the map f : I → O models a computational
problem. Then, Rice’s definition of the condition number of f at a point x ∈ I is
(1.1)    κ[f](x) := lim_{ε→0} sup_{x′∈I, dist_I(x,x′)≤ε} dist_O(f(x), f(x′)) / dist_I(x, x′),

where distI (·, ·) denotes the distance given by the Riemannian metric on I and likewise
for O. It follows immediately that the condition number yields an asymptotically
sharp upper bound on the perturbation of the output of a computational problem
when subjected to a perturbation of the input x. Indeed, we have
  dist_O(f(x), f(x′)) ≤ κ[f](x) · dist_I(x, x′) + o(dist_I(x, x′)).
The number κ[f](x) is an absolute condition number. On the other hand, if I is a
submanifold of a vector space equipped with the norm ‖·‖_I, and analogously for O
and ‖·‖_O, then the relative condition number is κ_rel[f](x) := κ[f](x) · ‖x‖_I / ‖f(x)‖_O. We
focus on computing the absolute condition number, as the relative condition number
can be inferred from the absolute one.
1.2. Condition number of approximation problems. Our starting point
is Rice’s classic definition (1.1), which is often easy to compute for differentiable
maps, rather than Demmel and Renegar’s inverse distance to ill-posedness, which is
generally more difficult to compute. Nevertheless, Rice’s definition cannot be applied
unreservedly to measure how sensitive a solution of (PIP) is to perturbations of the
ambient input a ∈ Rn . At first glance, it seems that we could apply (1.1) to
  φ : R^n → O,  a ↦ argmin_{y∈O} ‖F(y) − a‖².

However, this is an ill-posed expression, as φ is not a map in general! Indeed, there


can be several global minimizers for an ambient input a ∈ Rn and their number
can depend on a. Even if we could restrict the domain so that φ becomes a map, the
foregoing formulation has two additional limitations. First, when (PIP) is a nonconvex
Riemannian optimization problem, it is unrealistic to expect that we can solve it
globally. Instead we generally find local minimizers whose sensitivity to perturbations
of the input is also of interest. For our theory to be practical, we need to consider the
condition number of every local minimizer separately. Exceeding this goal, we will
present the condition number of each critical point in this paper. Second, requiring
that the systems in (PIP) can be modeled by a smooth map F : O → I excludes
some interesting computational problems. For example, computing the eigenvalues of
a symmetric matrix cannot be modeled as a map from outputs to inputs: there are
infinitely many symmetric matrices with a prescribed set of eigenvalues.
The solution to the foregoing complications consists of allowing multivalued or set-
valued maps g : I ⇒ O in which one input x ∈ I can produce a set of outputs
Y ⊂ O [5, 28]. Such maps are defined implicitly by the graph
G = {(x, y) ∈ I × O | y ∈ g(x)}.
In numerical analysis, Wilkinson [68] and Woźniakowski [70] realized that the con-
dition number of single-valued localizations or local maps of G could still be studied
using the standard techniques. Blum, Cucker, Shub, and Smale [10] developed a
geometric framework for studying the condition number of single-valued localizations
when the graph G is an algebraic subvariety. More recently, Bürgisser and Cucker [17]
considered condition numbers when the graph G is a smooth manifold. Because these
graphs are topologically severely restricted, the condition number at an input–output
pair (x, y) ∈ G can be computed efficiently using only linear algebra.
The heart of this paper consists of characterizing the condition number of critical
points of PIPs (and an implicit generalization thereof) by employing the geometric
approach to conditioning from [10,17]. Therefore, we summarize its main points next.
1.3. Implicit smooth computational problems. Following [10,17], we model
an inverse problem with input manifold I and output manifold O implicitly by a
solution manifold :
S ⊂ I × O.
Given an input x ∈ I, the problem is finding a corresponding output y ∈ O such
that (x, y) ∈ S. As S models a computational problem, we can consider its condition
number. Now it is defined at (x, y) ∈ S and not only at the input x, because the
sensitivity to perturbations in the output depends on which output is considered.
The condition number of (the computational problem modeled by) S is derived
in [10, 17]. Let πI : S → I and πO : S → O be the projections on the input
and output manifolds, respectively. Recall that [10, 17] assume that dim S = dim I,
because otherwise almost all inputs x have infinitely many outputs (dim S > dim I) or
no outputs (dim S < dim I). Furthermore, we will assume that πI is surjective, which
means that every input x ∈ I has at least one output. When the derivative¹ D_{(x,y)}π_I
is invertible, the inverse function theorem [47, Theorem 4.5] implies that π_I is locally
invertible. Hence, there is a local smooth solution map π_O ∘ π_I^{-1}, which locally around
(x, y) makes the computational problem explicit. This local map has condition number
κ[π_O ∘ π_I^{-1}](x) from (1.1). Moreover, because the map is differentiable, Rice gives an
expression for (1.1) in terms of the spectral norm of the differential in [58, Theorem 4].
In summary, [10, 17] conclude that the condition number of S at (x, y) ∈ S is
(1.2)    κ[S](x, y) = { κ[π_O ∘ π_I^{-1}](x)  if D_{(x,y)}π_I is invertible;  ∞  otherwise }
                    = ‖(D_{(x,y)}π_O)(D_{(x,y)}π_I)^{-1}‖_{I→O},
where the latter spectral norm is defined below in (2.2). Note that if S is the graph
of a smooth map f : I → O, then κ[S](x, f (x)) = κ[f ](x).
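The following toy sketch (our own illustration, not taken from the paper) checks (1.2) numerically for a solution manifold with fold points; Newton's method plays the role of the local solution map π_O ∘ π_I^{-1}.

```python
import numpy as np

# Toy illustration of (1.2): the solution manifold S = {(x, y) : x = y^3 - y},
# parametrized by the output y. Along the tangent direction (3y^2 - 1, 1) the
# condition number of the single-valued localization around (x0, y0) is
# |(D pi_O)(D pi_I)^{-1}| = 1/|3 y0^2 - 1|, infinite at the folds y0 = ±3^{-1/2}.
y0 = 1.2
x0 = y0**3 - y0
kappa = 1.0 / abs(3.0 * y0**2 - 1.0)

# Finite-difference check: perturb the input and track the nearby output
# with Newton's method applied to y^3 - y = x0 + dx.
dx = 1e-8
y1 = y0
for _ in range(50):
    y1 -= (y1**3 - y1 - (x0 + dx)) / (3.0 * y1**2 - 1.0)
print(abs(y1 - y0) / dx, kappa)   # both approx 0.301
```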
Remark 1.1. By definition, (1.2) evaluates to ∞ if D(x,y) πI is not invertible.
The points in W := {(x, y) ∈ S | κ[S](x, y) < ∞} are called well-posed tuples.
Points in S\W are called ill-posed tuples. An infinitesimal perturbation in the input x
of an ill-posed tuple (x, y) can cause an arbitrary change in the output y. It is therefore
natural to assume that W ≠ ∅, because otherwise the computational problem is always
ill-posed. If there exists (x, y) ∈ W, it follows from the inverse function theorem that
W is an open submanifold of S, so dim S = dim W. We also assume that W is dense.
Otherwise, since S is Hausdorff, around any data point outside of W there would be
an open neighborhood on which the computational problem is ill-posed; we believe
these input-output pairs should be removed from the definition of the problem.
In summary, we make the following assumption throughout this paper.
Assumption 1. The m-dimensional input manifold I is an embedded smooth sub-
manifold of Rn equipped with the Euclidean inner product inherited from Rn as Rie-
mannian metric. The output manifold O is a smooth Riemannian manifold. The
solution manifold S ⊂ I × O is a smoothly embedded m-dimensional submanifold.
The projection πI : S → I is surjective. The well-posed tuples W form an open dense
embedded submanifold of S.
This assumption is typically not too stringent. It is satisfied for example if S is the
graph of a smooth map f : I → O or F : O → I (as for IPs) by [47, Proposition 5.7].
Another example is when S is the smooth locus of an algebraic subvariety of I × O,
as for most problems in [17]. Moreover, the condition of an input–output pair (x, y) is a
local property. Since every immersed submanifold is locally embedded [47, Proposition
5.22], taking a restriction of the computational problem S to a neighborhood of (x, y),
in its topology as immersed submanifold, yields an embedded submanifold. The
assumption and proposed theory always apply in this local sense.
¹ See Section 2 for a definition.
1.4. Related work. The condition number (1.1) was computed for several com-
putational problems. Already in the 1960s, the condition numbers of solving linear
systems, eigenproblems, linear least-squares problems, and polynomial rootfinding,
among others, were determined [17, 26, 36, 68, 69]. An astute observation by Dem-
mel [24, 25] in 1987 paved the way for the introduction of condition numbers in op-
timization. He realized that many condition numbers from linear algebra can be
interpreted as an inverse distance to the nearest ill-posed problem. This characteri-
zation generalizes to contexts where an explicit map as required in (1.1) is not avail-
able. Renegar [56, 57] exploited this observation to determine the condition number
of linear programming instances and he connected it to the complexity of interior-
point methods for solving them. This ignited further research into the sensitivity of
optimization problems, leading to (Renegar’s) condition numbers for linear program-
ming [19,56,57], conic optimization [8,20,31,53,65], and convex optimization [4,21,34]
among others. In many cases, the computational complexity of methods for solving
these problems can be linked to the condition number; see [22,30,35,56,57,64]. In our
earlier work [15], we connected the complexity of Riemannian Gauss–Newton meth-
ods for solving (PIP) to the condition number of the exact inverse problem F (y) = x.
We anticipate that the analysis from [15] could be refined to reveal the dependence
on the condition number of the optimization problem (PIP).
Beside Renegar’s condition number, other measures of sensitivity are studied in
optimization. The sensitivity of a set-valued map g : I ⇒ O at (x, y) with y ∈ g(x) is
often measured by the Lipschitz modulus [28, Chapter 3E]. This paper only studies
the sensitivity of a single-valued localization ĝ of the set-valued map g, in which case
the Lipschitz modulus K of g at (x, y) is a Lipschitz constant of ĝ [28, Proposition
3E.2]. Consequently, Rice’s condition number κ[ĝ](x), which we characterize in this
paper for Riemannian approximation problems, is always upper bounded by K; see,
e.g., [43, Proposition 6.3.10].
1.5. Main contributions. We characterize the condition number of comput-
ing global minimizers, local minimizers, and critical points of the following implicit
generalization of (PIP):
(IPIP)    π_O ∘ argmin_{(x,y)∈S} ½ ‖a − π_I(x, y)‖²,
where a ∈ Rn is a given ambient input and the norm is the Euclidean norm. We call
this implicit formulation of (PIP) a Riemannian implicit PIP.
Our main contributions are Theorems 4.3, 5.2 and 6.2. The condition number of
critical points of the Riemannian approximation problem min_{x∈I} ‖a − x‖² is treated
in Theorem 4.3. Theorem 5.2 contains the condition number of (IPIP) at global
minimizers when a is sufficiently close to the input manifold. Finally, the most general
case is Theorem 6.2: it gives the condition number of (IPIP) at all critical points
including local and global minimizers.
The aforementioned theorems will show that the way in which the input manifold I
curves in its ambient space R^n, as measured by classic differential invariants, enters
the picture. To the best of our knowledge, the role of curvature in the theory of con-
dition of smooth computational problems involving approximation was not previously
identified. We see this as one of the main contributions of this article.
1.6. Approximation versus idealized problems. The results of this paper
raise a pertinent practical question: If one has a computational problem modeled by
a solution manifold S that is solved via an optimization formulation as in (IPIP),
which condition number is appropriate? The one of the idealized problem S, where
the inputs are constrained to I? Or the more complicated condition number of the
optimization problem, where we allow ambient inputs from I’s ambient space Rn ?
We argue that the choice is only an illusion.
The first approach should be taken if the problem is defined implicitly by a solu-
tion manifold S ⊂ I × O and the ambient inputs are perturbations a ∈ Rn of inputs
x ∈ I. The idealized framework will still apply because of Corollary 5.5. Consider for
example the n-camera triangulation problem, discussed in Section 9, where the prob-
lem SMV is defined by computing the inverse of the camera projection process. If the
measurements a were taken with a perfect pinhole camera for which the theoretical
model is exact, then even when a ∉ IMV (e.g., errors caused by pixelization), as long
as it is a sufficiently small perturbation of some x ∈ IMV , the first approach is suited.
On the other hand, if it is only postulated that the computational problem can
be well approximated by the idealized problem S, then the newly proposed framework
for Riemannian approximation problems is the more appropriate choice. In this case,
the “perturbation” to the input is a modeling error. An example of this is applying n-
camera triangulation to images obtained by cameras with lenses, i.e., most real-world
cameras. In this case, the computational problem is not SMV , but rather a more
complicated problem that takes into account lens distortions. When using SMV as
a proxy, as is usual [32, 37, 50], the computational problem becomes an optimization
problem. Curvature should be taken into account then, as developed here.
1.7. Outline. Before developing the main theory, we briefly recall the necessary
concepts from differential geometry and fix the notation in the next section. A key
role will be played by the Riemannian Hessian of the distance from an ambient input
in Rn to the input manifold I. Its relation to classic differential-geometric objects
such as the Weingarten map is stated in Section 3. The condition number of critical
points of (IPIP) is first characterized in Section 4 for the special case where the
solution manifold represents the identity map. The general case is studied in Sections 5
and 6; the former deriving the condition number of global minimizers, while the latter
treats critical points. The expressions of the condition numbers in Sections 4 to 6 are
obtained from implicit formulations of the Riemannian IPIPs. For clarity, in the case
of local minimizers, we also express them as condition numbers of global minimizers
of localized Riemannian optimization problems in Section 7. Section 8 explains a few
standard techniques for computing the condition number in practice along with their
computational complexity. Finally, an application of our theory to the triangulation
problem in multiview geometry is presented in Section 9.
2. Preliminaries. We briefly recall the main concepts from Riemannian geome-
try that we will use. The notation introduced here will be used throughout the paper.
Proofs of the fundamental results appearing below can be found in [27,46,47,51,52,54].
2.1. Manifolds. By smooth m-dimensional manifold M we mean a C ∞ topo-
logical manifold that is second-countable, Hausdorff, and locally Euclidean of dimen-
sion m. The tangent space Tp M is the m-dimensional linear subspace of differential
operators at p. A differential operator at p ∈ M is a linear map vp from the vector
space C ∞ (M) over R of smooth functions f : M → R to R that satisfies the product
rule: vp (f · g) = (vp f ) · g(p) + f (p) · (vp g) for all f, g ∈ C ∞ (M). If M ⊂ Rn is em-
bedded, the tangent space can be identified with the linear span of all tangent vectors
(d/dt)|_{t=0} γ(t), where γ ⊂ M is a smooth curve passing through p at 0.
A differentiable map F : M → N between manifolds M and N induces a linear
map between tangent spaces D_pF : T_pM → T_{F(p)}N, called the derivative of F at
p. If v_p ∈ T_pM, then w_{F(p)} := (D_pF)(v_p) is the differential operator w_{F(p)}(f) :=
v_p(f ∘ F) for all f ∈ C^∞(N).
The identity map 1M : M → M will be denoted by 1, the space on which it acts
being clear from the context.
2.2. Tubular neighborhoods. Let Mm ⊂ Rn be an embedded submanifold.
The disjoint union of tangent spaces to Mm is the tangent bundle of M:
  TM := ∐_{p∈M} T_pM = {(p, v) | p ∈ M and v ∈ T_pM}.

The tangent bundle is a smooth manifold of dimension 2m. The normal bundle is
constructed similarly. The normal space of M in R^n at p is the orthogonal complement
in R^n of the tangent space T_pM, namely N_pM := (T_pM)^⊥. The normal bundle is a
smooth embedded submanifold of R^n × R^n of dimension n. Formally, we write

  NM := ∐_{p∈M} N_pM = {(p, η) | p ∈ M and η ∈ N_pM}.

Let δ : M → R+ be a positive, continuous function. Consider the following open


neighborhood of the normal bundle:

(2.1)    V_δ = {(p, η_p) ∈ NM | ‖η_p‖ < δ(p)},

where the norm is the one induced from the Riemannian metric; in our setting it is the
standard norm on R^n. The map E : NM → R^n, (p, η) ↦ p + η, is smooth and there
exists a δ such that the restriction E|Vδ becomes a diffeomorphism onto its image
[47, Theorem 6.24]. Consequently, T = E(Vδ ) is an n-dimensional, open, smooth,
embedded submanifold of Rn that forms a neighborhood of M. The submanifold T
is called a tubular neighborhood of M and δ is called its height (function).
2.3. Vector fields. A smooth vector field is a smooth map X : M → TM
from the manifold to its tangent bundle taking a point p to a tangent vector vp . By
the notation X|p we mean X(p). A smooth vector field X on a properly embedded
submanifold M ⊂ N can be extended to a smooth vector field X̂ on N such that it
agrees on M: X̂|_M = X.
A smooth frame of Mm is a tuple of m smooth vector fields (X1 , . . . , Xm ) that
are linearly independent: (X1 |p , . . . , Xm |p ) is linearly independent for all p ∈ M. If
these tangent vectors are orthonormal for all p, then the frame is called orthonormal.
Let X be a smooth vector field on M. An integral curve of X is a smooth curve
γ ⊂ M such that Dt γ = X|γ(t) for all t ∈ (−1, 1). For every p ∈ M the vector field X
generates an integral curve γ with starting point p = γ(0). There is always a smooth
curve realizing a tangent vector vp ∈ Tp M, i.e., a curve γ with p = γ(0) and v = D0 γ.
2.4. Riemannian manifolds. A Riemannian manifold is a smooth manifold
M equipped with a Riemannian metric g: a positive definite symmetric bilinear form
g_p : T_pM × T_pM → R that varies smoothly with p. The metric g_p induces a norm on
T_pM denoted ‖t_p‖_M := √(g_p(t_p, t_p)). The only Riemannian metric we will explicitly
use in computations is the standard Riemannian metric of R^n, i.e., the Euclidean
inner product g_p(x, y) = ⟨x, y⟩ = xᵀy.
If M and N are Riemannian manifolds with induced norms k · kM and k · kN ,
and if F : M → N is a differentiable map, then the spectral norm of Dp F is
(2.2)    ‖D_pF‖_{M→N} := sup_{t_p ∈ T_pM \ {0}} ‖(D_pF)(t_p)‖_N / ‖t_p‖_M.

If the manifolds are clear from the context we can drop the subscript and write ‖D_pF‖.
2.5. Riemannian gradient and Hessian. Let M be a Riemannian manifold
with metric g, and let f : M → R be a smooth function. The Riemannian gradient
of f at p ∈ M is the tangent vector ∇f |p ∈ Tp M that is dual to the derivative
Dp f : Tp M → R under g, i.e., gp (∇f |p , t) = (Dp f )(t) for all t ∈ Tp M. It is known
that ∇f is a vector field, hence explaining the notation ∇f |p for its value at p ∈ M.
When M ⊂ R^n is a Riemannian embedded submanifold with the inherited Euclidean
metric, the Riemannian Hessian [2, Definition 5.5.1] can be defined without ex-
plicit reference to connections, as in [13, Chapter 5]:

  Hess_p(f) : T_pM → T_pM,  η_p ↦ P_{T_pM}((D_p∇f)(η_p)),

where ∇f now has to be viewed as a vector field on R^n and P_{T_pM} projects or-
thogonally onto the tangent space T_pM. In other words, the Hessian takes η_p to
P_{T_pM} (d/dt)|_{t=0} ∇f|_{γ(t)}, where γ(t) ⊂ M is a curve realizing the tangent vector η_p.
The Hessian Hessp (f ) is a symmetric bilinear form on the tangent space Tp M
whose (real) eigenvalues contain essential information about the objective function f
at p. If the Hessian is positive definite, negative definite, or indefinite at a critical
point p of f (i.e., ∇f |p = 0 or, equivalently, Dp f = 0), then p is respectively a local
minimum, local maximum, or saddle point of f .
3. Measuring curvature with the Weingarten map. A central role in this
paper is played by the Riemannian Hessian of the distance function
(3.1)    d : I → R,  x ↦ ½ ‖x − a‖²,
which measures the Euclidean distance between a fixed point a ∈ Rn and points on the
embedded submanifold I ⊂ R^n. It is the inverse of this Riemannian Hessian
that will appear as an additional factor multiplying (D_{(x,y)}π_O)(D_{(x,y)}π_I)^{-1}
from (1.2) in the condition numbers of the problems studied in Sections 4 to 6.
The Riemannian Hessian of d contains classic differential-geometric information
about the way I curves inside of Rn [54]. Indeed, from [3, Equation (10)] we conclude
that it can be expressed at x ∈ I as
  Hess_x(d) = 1_{T_xI} + II_x(P_{N_xI}(x − a)),

where P_{N_xI} is the projection onto the normal space, and II_x : N_xI → S²(T_xI) is
a classical differential invariant called the second fundamental form [27, 46, 51, 52, 54],
which takes a normal vector in N_xI and sends it to a symmetric bilinear form on
the tangent space T_xI. If we let η_x := a − x, then at a critical point of d we have
η_x ∈ N_xI, so that the Hessian of interest satisfies

(H)    H_η := Hess_x(d) = 1 − S_η,  where S_η := II_x(η_x).
Note that we dropped the base point x ∈ I from the notation in Hη and Sη . In the
above, Sη is interpreted as a symmetric endomorphism on Tx I and it is classically
called the shape operator or Weingarten map [27, 46, 51, 52, 54].
The Weingarten map measures how much the embedding into the ambient space
R^n bends the manifold I^m in the following concrete way [46, Chapter 8]. For η ≠ 0,
let w := η/‖η‖ and let c₁, …, c_m be the eigenvalues of S_w = (1/‖η‖) S_η. As S_w is self-adjoint,
all c_i are real. The eigenvalues of S_η are then c₁‖η‖, …, c_m‖η‖. The c_i are
called the principal curvatures of I at x in direction of η. They have the following
classic geometric meaning [59, Chapter 1]: If u_i ∈ T_xI is a unit length eigenvector
of the eigenvalue c_i, then locally around x and in the direction of u_i the manifold I
contains an infinitesimal arc of a circle with center x + c_i^{-1} w = x + (c_i‖η‖)^{-1} η through
x. This circle is called an osculating circle. The radii r_i := |c_i|^{-1} of those circles are
called the critical radii of I at x in direction η. See also Figure 4.1.
For several submanifolds I ⊂ Rn relevant in optimization applications, the Wein-
garten map Sη was computed. Absil, Mahoney, and Trumpf [3] provide expressions
for the sphere, Stiefel manifold, orthogonal group, Grassmann manifold, and low-rank
matrix manifold. Feppon and Lermusiaux [33] further extended these results by com-
puting the Weingarten maps and explicit expressions for the principal curvatures of
the isospectral manifold and “bi–Grassmann” manifold, among others. Heidel and
Schultz [38] computed the Weingarten map of the manifold of higher-order tensors of
fixed multilinear rank [42].
4. The condition number of critical point problems. Prior to treating
(IPIP) in general, we focus on an important and simpler special case. This case arises
when the solution manifold S ⊂ I × O is the graph of the identity map 1 : I → I, so
O = I. Given an ambient input a ∈ Rn , the problem (IPIP) now comprises finding
an output x ∈ I so that the distance function d from (3.1) is minimized. This can
be formulated as the standard Riemannian least-squares problem

(4.1)    min_{x∈I} ½ ‖x − a‖².
Computing global minimizers of this optimization problem is usually too ambitious
a task when I is a complicated nonlinear manifold. Most Riemannian optimization
methods will converge to local minimizers by seeking to satisfy the first-order necessary
optimality conditions of (4.1), namely a−x ∈ Nx I. We say that such methods attempt
to solve the critical point problem (CPP) whose implicit formulation is

(CPP) SCPP := {(a, x) ∈ Rn × I | a − x ∈ Nx I} .

We call SCPP the critical point locus. It is an exercise to show that SCPP is
linearly diffeomorphic to the normal bundle NI so that we have the following result.
Lemma 4.1. SCPP is an embedded submanifold of Rn × I of dimension n.
Consequently, the CPP falls into the realm of the theory of condition from (1.2).
We denote the coordinate projections of SCPP by ΠRn : SCPP → Rn and ΠI : SCPP →
I for distinguishing them from the projections associated to S. The corresponding
submanifold of well-posed tuples is

WCPP := {(a, x) ∈ SCPP | D(a,x) ΠRn is invertible}.

The fact that it is a manifold is the following result, proved in Appendix A.1.
Lemma 4.2. WCPP is an open dense embedded submanifold of SCPP .
We show in Appendix A.1 that the condition number of this problem can be
characterized succinctly as follows.²
Theorem 4.3 (Critical points). Let (a, x) ∈ S_CPP and η := a − x. Then,

  κ[S_CPP](a, x) = ‖H_η^{-1}‖_{I→I},

where H_η is given by (H). Moreover, if (a, x) ∈ S_CPP \ W_CPP, then H_η is not invertible.
Corollary 4.4. κ[SCPP ] : SCPP → R ∪ {+∞} is continuous and κ[SCPP ](x, y)
is finite if and only if (x, y) ∈ WCPP .
Curvature has entered the picture in Theorem 4.3 in the form of the Riemannian
Hessian Hη from (H). It is an extrinsic property of I, however, so that the condition
number depends on the Riemannian embedding of I into Rn . Intuitively this is
coherent, as different embeddings give rise to different computational problems each
with their own sensitivity to perturbations in the ambient space.
Let us investigate the meaning of the foregoing theorem a little further. Recall,
from Section 3, the definition of the principal curvatures c_i as the eigenvalues of the
Weingarten map S_w, where w = η/‖η‖ and η = a − x. Then, Theorem 4.3 implies that

(4.2)    κ[S_CPP](a, x) = max_{1≤i≤m} 1 / |1 − c_i‖η‖|,

so its condition is completely determined by the amount by which I curves inside of
R^n and the distance of the ambient input a to the critical point x ∈ I. We see that
κ[S_CPP](a, x) = ∞ if and only if I contains an infinitesimal arc of the circle with
center x + η = a (so that there is an i with c_i‖η‖ = 1). Consequently, if ‖η‖ lies close
to any of the critical radii r_i = |c_i|^{-1} then the CPP is ill-conditioned at (a, x).
The formula from (4.2) makes it apparent that for some inputs we may have
κ[S_CPP](a, x) < 1. In other words, for some ambient inputs a ∈ R^n the infinitesimal
error ‖∆a‖ can shrink! Figure 4.1 gives a geometric explanation of when the error is
amplified (panel (a)) and when it is shrunk (panels (b) and (c)).
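These effects are easy to check numerically. The following sketch (our own illustration, not from the paper) uses the parabola I = {(t, t²/2)} ⊂ R², whose principal curvature at x = (0, 0) in the normal direction (0, 1) is c₁ = 1; for an ambient input a = (0, h) with 0 < h < 1, formula (4.2) predicts κ[S_CPP](a, x) = 1/(1 − h).

```python
import numpy as np

# Sanity check of (4.2) for the parabola I = {(t, t^2/2)}: at x = (0,0) the
# principal curvature towards the focal point (0,1) is c1 = 1, so for the
# ambient input a = (0, h) formula (4.2) predicts kappa = 1/|1 - h|.
def closest_point_param(a1, a2):
    # Critical points of t -> (1/2)((t - a1)^2 + (t^2/2 - a2)^2) solve the
    # cubic t^3/2 + (1 - a2) t - a1 = 0; pick the global minimizer.
    roots = np.roots([0.5, 0.0, 1.0 - a2, -a1])
    real = roots[np.abs(roots.imag) < 1e-9].real
    dist = (real - a1)**2 + (real**2 / 2 - a2)**2
    return real[np.argmin(dist)]

h, delta = 0.5, 1e-7
t0 = closest_point_param(0.0, h)          # critical point for a = (0, h)
t1 = closest_point_param(delta, h)        # ... for the perturbed input
x0 = np.array([t0, t0**2 / 2])
x1 = np.array([t1, t1**2 / 2])
print(np.linalg.norm(x1 - x0) / delta)    # approx 2.0
print(1.0 / abs(1.0 - h))                 # predicted kappa = 2.0
```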
We also have an interpretation of κ[SCPP ](a, x) as a normalized inverse distance
to ill-posedness: for (a, x) ∈ SCPP , let η = a − x be the normal vector that connects x
² We take the convention that whenever an expression ‖·‖_{I→O} is ill-posed it evaluates to ∞.
Figure 4.1. The graphs show ambient inputs a for the CPP of the parabola. The osculating
circle with critical radius lies above the parabola, and its center (the gray point) is a focal point of
the parabola, i.e., it is a point in Σ_CPP = S_CPP \ W_CPP. In graph (a), the ambient input a also
lies above the parabola, so η = a − x points towards the focal point. This means that the principal
curvature at x in direction of η is c₁ > 0. Moreover, c₁‖η‖ < 1, and thus, by (4.2), we have
κ[S_CPP](a, x) > 1. In other words, the curvature of the parabola amplifies the perturbation ‖∆a‖.
In graph (b), the ambient input a lies below the parabola, so η = a − x points away from the focal
point. This means that the principal curvature at x in direction of η is negative, so that by (4.2) we
have κ[S_CPP](a, x) < 1. The curvature of the parabola shrinks the perturbation ‖∆a‖. In graph (c),
the ambient input a lies above the parabola, so that the corresponding principal curvature is positive.
However, now c₁‖η‖ > 1, and so by (4.2) we have κ[S_CPP](a, x) < 1. Again, the curvature of the
parabola shrinks the perturbation ‖∆a‖.

to a. The locus of ill-posed inputs for the CPP is Σ_CPP = S_CPP \ W_CPP. The
projection Π_{R^n}(Σ_CPP) is well studied in the literature. Thom [62] calls it the target
envelope of S_CPP, while Porteous [55] calls it the focal set of I, and for curves in the
plane it is called the evolute; see Chapter 10 of [51]. The points in the intersection of
(x + Rη) × {x} with Σ_CPP are given by the multiples of the normal vector η whose
length equals one of the critical radii r₁, …, r_m; this is also visible in Figure 9.3.
Comparing with (4.2) we find

  1/κ[S_CPP](a, x) = min_{η₀ ∈ Rη : (x+η₀, x) ∈ Σ_CPP} |‖η₀‖ − ‖η‖| / ‖η₀‖,  where η = a − x.

This is an interpretation of Rice’s condition number in the spirit of Demmel [24, 25]
and Renegar [56, 57].
Remark 4.5. If I is a smooth algebraic variety in Rn , then solving SCPP reduces
to solving a system of polynomial equations. For almost all inputs a ∈ Rn this system
has a constant number of solutions over the complex numbers, called the Euclidean
distance degree [29]. Theorem 4.3 thus characterizes the condition number of real
solutions of this special system of polynomial equations.
5. The condition number of approximation problems. Next, we analyse
the condition number of global minimizers of (IPIP). This setting will be crucial for
a finer interpretation of Theorem 4.3 and 6.2 in Section 7.
By suitably choosing the height function δ [41, Chapter 4, Section 5], we can
assume for each ambient input a in a tubular neighborhood T ⊂ Rn of I that there
is a unique point x on I that minimizes the distance from I to a. In this case, the
computational problem that consists of finding global minimizers of (IPIP) can be
modeled implicitly with the graph

(AP)    S_AP := {(a, y) ∈ T × O | (P_I(a), y) ∈ S},

where P_I : T → I, a ↦ argmin_{x∈I} ‖x − a‖ is the nonlinear projection that maps a ∈ T
to the unique point on I that minimizes the distance to a. We refer to the problem
SAP as the approximation problem (AP) to distinguish it from the problem of finding
critical points of (IPIP), which is treated in Section 6.
If SAP were a smooth submanifold of T × O, its condition number would be given
by (1.2). However, this is not the case in general. We can nevertheless consider the
subset of well-posed tuples for (AP):
WAP := {(a, y) ∈ T × O | (PI (a), y) ∈ W} .
In Appendix A.2, we prove the following result under Assumption 1.
Lemma 5.1. WAP is a smooth embedded submanifold of T × O of dimension n.
Moreover, WAP is dense in SAP .
The condition number of (AP) is now given by (1.2) on WAP . For the remaining
points, i.e., SAP \ WAP , the usual definition (1.1) still applies. From this, we obtained
the following characterization, which is proved in Appendix A.2.
Theorem 5.2 (Global minimizers). Let the ambient input a ∈ T ⊂ Rn be
sufficiently close to I, specifically lying in a tubular neighborhood of I, so that it has
a unique closest point x ∈ I. Let η = a − x. Then,
  κ[S_AP](a, y) = ‖(D_{(x,y)}π_O)(D_{(x,y)}π_I)^{-1} H_η^{-1}‖_{I→O},
where πI and πO are the projections onto the input and output manifolds respectively,
and Hη is as in (H). Moreover, if (a, y) ∈ SAP \ WAP then D(x,y) πI is not invertible.
Corollary 5.3. κ[SAP ] : T → R∪{+∞} is continuous and κ[SAP ](a, y) is finite
if and only if (a, y) ∈ WAP .
Remark 5.4. If f : I → O is a smooth map and S is the graph of f, then the
foregoing specializes to

  (D_{(x,y)}π_O)(D_{(x,y)}π_I)^{-1} = D_x f,  so  κ[S_AP](a, y) = ‖(D_x f) H_η^{-1}‖_{I→O}.

Similarly, if F : O → I is a smooth map with graph S, then

  (D_{(x,y)}π_O)(D_{(x,y)}π_I)^{-1} = (D_y F)^{-1},  so  κ[S_AP](a, y) = ‖(D_y F)^{-1} H_η^{-1}‖_{I→O}

by the inverse function theorem. These observations apply to Theorem 6.2 as well.
The contribution (D(x,y) πO )(D(x,y) πI )−1 to the condition number originates from
the solution manifold S; it is intrinsic and does not depend on the embedding. On
the other hand, the curvature factor Hη−1 is extrinsic in the sense that it depends on
the embedding of I into Rn , as in Theorem 4.3.
The curvature factor in Theorem 5.2 disappears when S_η = 0. This occurs when
the ambient input a lies on the input manifold, so η = 0. In this case, H_0^{-1} = 1, so
we have the following result.
Corollary 5.5. Let (x, y) ∈ S. Then, we have κ[S](x, y) = κ[SAP ](x, y).
In other words, if the ambient input, given by its coordinates in Rn , lies exactly
on I, the condition number of (AP) equals the condition number of the computational
problem S, even though there can be many more directions of perturbation in Rn than
in I! By continuity, the effect of curvature on the sensitivity of the computational
problem can essentially be ignored for small kηk, i.e., for points very close to I.
This is convenient in practice because computing the Weingarten map is often more
complicated than obtaining the derivatives of the projection maps πO and πI .
6. The condition number of general critical point problems. We now
consider the sensitivity of critical points of (IPIP). As mentioned before, our moti-
vation stems from solving (IPIP) when a closed-form solution is lacking. Applying
Riemannian optimization methods to them, one can in general only guarantee to find
points satisfying the first-order optimality conditions [2]. Consequently, in theory,
they solve a generalized critical point problem: Given an ambient input a ∈ Rn , pro-
duce an output y ∈ O such that there is a corresponding x ∈ I that satisfies the
first-order optimality conditions of (IPIP). We can implicitly formulate this compu-
tational problem as follows:
(GCPP) SGCPP := {(a, x, y) ∈ Rn × I × O | (a, x) ∈ SCPP and (x, y) ∈ S} .
The graph SGCPP is defined using three factors, where the first is the input and the
third is the output. The second factor I encodes how the output is obtained from the
input. Triples are ill-posed if either (x, y) ∈ S is ill-posed for the original problem or
(a, x) ∈ SCPP is ill-posed for the (CPP). Therefore, the well-posed locus is

WGCPP := {(a, x, y) ∈ Rn × I × O | (a, x) ∈ WCPP and (x, y) ∈ W} .

We prove in Appendix A.3 that the well-posed tuples form a manifold.


Lemma 6.1. WGCPP is an embedded submanifold of Rn × W of dimension n.
Moreover, WGCPP is dense in SGCPP .
Consequently, on WGCPP we can use the definition of condition from (1.2). Our
final main result generalizes Theorem 5.2; it is proved in Appendix A.3.
Theorem 6.2 (Generalized critical points). Let (a, x, y) ∈ SGCPP and η = a−x.
Then, we have

  κ[S_GCPP](a, x, y) = ‖(D_{(x,y)}π_O)(D_{(x,y)}π_I)^{-1} H_η^{-1}‖_{I→O},

where πI : S → I and πO : S → O are coordinate projections, and Hη is given by (H).


Moreover, if (a, x, y) ∈ SGCPP \ WGCPP , then either D(x,y) πI or Hη is not invertible.
Corollary 6.3. κ[SGCPP ] : SGCPP → R∪{+∞} is continuous and the condition
number κ[SGCPP ](a, x, y) is finite if and only if (a, x, y) ∈ WGCPP .
Since I inherits the Euclidean metric from Rn , the following result is immediately
obtained from Theorem 6.2.
Corollary 6.4. Let (a, x, y) ∈ S_GCPP and η = a − x. We have

  κ[S](x, y) / max_{1≤i≤m} |1 − c_i‖η‖|  ≤  κ[S_GCPP](a, x, y)  ≤  κ[S](x, y) / min_{1≤i≤m} |1 − c_i‖η‖|,

where c₁, …, c_m are the principal curvatures of I^m at x in direction η = a − x.
Proof. Since κ[S](x, y) = ‖(D_{(x,y)}π_O)(D_{(x,y)}π_I)^{-1}‖_{I→O}, the submultiplicativity
of the spectral norm yields both κ[S](x, y) ≤ κ[S_GCPP](a, x, y)·‖H_η‖ and
κ[S_GCPP](a, x, y) ≤ κ[S](x, y)·‖H_η^{-1}‖, concluding the proof.
7. Consequences for Riemannian optimization. In Section 4, we estab-
lished a connection between the CPP and the Riemannian optimization problem (4.1):
SCPP is the manifold of critical tuples (a, x) ∈ Rn × I, such that x is a critical point
of the squared distance function from I to a. For the general implicit formulation in
(GCPP), however, the connection to Riemannian optimization is more complicated.
For clarity, we therefore also characterize the condition number of local minimizers
of (IPIP) as the condition number of the unique global minimizer of a localized Rie-
mannian optimization problem.
The solutions of the GCPP were defined as the generalization of the AP to critical
points. However, we did not show that they too can be seen as the critical tuples
of a distance function optimized over in a Riemannian optimization problem. This
connection is established next and proved in Appendix A.4.
Proposition 7.1. (a, x, y) ∈ WGCPP if and only if (a, (x, y)) is a well-posed crit-
ical tuple of the function optimized over in the Riemannian optimization problem
  min_{(x,y)∈W} ½ ‖a − π_I(x, y)‖².

Our main motivation for studying critical points instead of only the global min-
imizer as in (AP) stems from our desire to predict the sensitivity of outputs of Rie-
mannian optimization methods for solving approximation problems like (4.1), (PIP),
and (IPIP). When applied to a well-posed optimization problem, these methods in
general only guarantee convergence to critical points [2]. However, in practice, con-
vergence to critical points that are not local minimizers is extremely unlikely [2]. The
reason is that they are unstable outputs of these algorithms: Applying a tiny per-
turbation to such critical points will cause the optimization method to escape their
vicinity, and with high probability, converge to a local minimizer instead.
Remark 7.2. One should be careful to separate condition from numerical stabil-
ity. The former is a property of a problem, the latter the property of an algorithm. We
do not claim that computing critical points of GCPPs other than local minimizers is
an ill-conditioned problem. We only state that many Riemannian optimization methods
are unstable algorithms for computing them via (IPIP). This is not a bad property,
since the goal was to find minimizers, not critical points!
For critical points of the GCPP that are local minimizers of the problem in Propo-
sition 7.1 we can characterize their condition number as the condition number of the
unique global minimizer of a localized problem. This result is proved in Appendix A.4.
Theorem 7.3 (Condition of local minimizers). Let the input manifold I, the
output manifold O, and well-posed loci W and W_GCPP be as in (GCPP). Assume that
(a⋆, x⋆, y⋆) ∈ W_GCPP and x⋆ is a local minimizer of d_{a⋆} : I → R, x ↦ ½‖a⋆ − x‖².
Then, there exist open neighborhoods A_{a⋆} ⊂ R^n around a⋆ and N_{(x⋆,y⋆)} ⊂ W around
(x⋆, y⋆), so that

  ρ_{(a⋆,x⋆,y⋆)} : A_{a⋆} → O,  a ↦ π_O ∘ argmin_{(x,y)∈N_{(x⋆,y⋆)}} ½ ‖a − π_I(x, y)‖²

is an (AP) and the condition number of this smooth map satisfies

  κ[ρ_{(a⋆,x⋆,y⋆)}](a) = κ[S_GCPP](a, P_{I_{x⋆}}(a), ρ_{(a⋆,x⋆,y⋆)}(a)),

where I_{x⋆} = π_I(N_{(x⋆,y⋆)}) is the projection onto the first factor, and P_{I_{x⋆}}(a) =
argmin_{x∈I_{x⋆}} ‖a − x‖² is the nonlinear projection onto I_{x⋆}.
Taking I = O and S as the graph of the identity map 1, the next result follows.
Corollary 7.4. Let the input manifold I, and well-posed loci W and W_CPP
be as in (CPP). Assume that (a⋆, x⋆) ∈ W_CPP is well-posed and that x⋆ is a local
minimizer of d_{a⋆} : I → R, x ↦ ½‖a⋆ − x‖². Then, there exist open neighborhoods
A_{a⋆} ⊂ R^n around a⋆ and I_{x⋆} ⊂ I around x⋆, so that

  ρ_{(a⋆,x⋆)} : A_{a⋆} → I,  a ↦ argmin_{x∈I_{x⋆}} ½ ‖a − x‖²

is an (AP) and its condition number is κ[ρ_{(a⋆,x⋆)}](a) = κ[S_CPP](a, ρ_{(a⋆,x⋆)}(a)).
An interpretation of these results is as follows. The condition number of a gen-
eralized critical point (a⋆, x⋆, y⋆) that happens to correspond to a local minimum of
(IPIP) equals the condition number of the map ρ_{(a⋆,x⋆,y⋆)}. This smooth map mod-
els a Riemannian AP on the localized computational problem N_{(x⋆,y⋆)} ⊂ W which
has a unique global minimum for all ambient inputs from A_{a⋆}. Consequently, every
local minimum of (IPIP) is robust in the sense that it moves smoothly according to
ρ_{(a⋆,x⋆,y⋆)} in neighborhoods of the input a⋆ and (x⋆, y⋆) ∈ W. In this open neighbor-
hood, the critical point remains a local minimum.
The foregoing entails that if a local minimum (x⋆, y⋆) ∈ W is computed for
(IPIP) with ambient input a⋆ using Riemannian optimization, then computing a local
minimum (x, y) for (IPIP) with perturbed input a⋆ + ∆ and starting point (x⋆, y⋆)
results in dist_O(y, y⋆) ≤ κ[S_GCPP](a⋆, x⋆, y⋆)‖∆‖ + o(‖∆‖), where dist_O is as in (1.1),
provided that ‖∆‖ is sufficiently small.
8. Computing the condition number. Three main ingredients are required
for evaluating the condition number of (IPIP) numerically:
(i) the inverse of the Riemannian Hessian Hη from (H),
(ii) the derivative of the computational problem (D(x,y) πO )(D(x,y) πI )−1 , and
(iii) the spectral norm (2.2).
The only additional item for evaluating the condition number of (IPIP) relative to
the idealized inverse problem “given x ∈ I, find a y ∈ O such that (x, y) ∈ S” is the
curvature term Hη−1 from item (i). Therefore, we focus on evaluating (i) and only
briefly mention the standard techniques for (ii) and (iii).
The derivative in (ii) is often available analytically as in Remark 5.4. If no
analytical expression can be derived, it can be numerically approximated when O ⊂ Rq
is an embedded submanifold by computing the standard Jacobian of S, viewed as the
graph of a map Rn → Rq , and composing it with the projections onto Tx I and Ty O.
Evaluating the spectral norm in (iii) is a linear algebra problem. Let I ⊂ R^n be
embedded with the standard inherited metric and let the Riemannian metric of O be
given in coordinates by a matrix G. If G = RᵀR is its Cholesky factorization [36],
then ‖A‖_{I→O} = ‖RA‖₂ = σ₁(RA), where A : T_xI → T_yO is a linear map and
‖·‖₂ is the usual spectral norm, i.e., the largest singular value σ₁(·). The largest
singular value can be computed numerically in O(min{m²p, p²m}) operations, where
m = dim I and p = dim O, using standard algorithms [36]. If the linear map A can
be applied in fewer than O(mp) operations, a power iteration can be more efficient [6].
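In coordinates this recipe is a few lines of linear algebra; the sketch below (with made-up dimensions and a random metric G, purely for illustration) computes ‖A‖_{I→O} via the Cholesky factor of G.

```python
import numpy as np

# Computing the spectral norm ||A||_{I->O} from (2.2) when the metric of O is
# given in coordinates by a positive definite matrix G = R^T R (Cholesky).
rng = np.random.default_rng(0)
m, p = 3, 5                          # illustrative dim I and dim O
A = rng.standard_normal((p, m))      # coordinate matrix of T_x I -> T_y O
M = rng.standard_normal((p, p))
G = M @ M.T + p * np.eye(p)          # some Riemannian metric on O
R = np.linalg.cholesky(G).T          # upper-triangular factor, G = R^T R
print(np.linalg.norm(R @ A, 2))      # ||A||_{I->O} = sigma_1(R A)
```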
The Riemannian Hessian Hη can be computed analytically using standard tech-
niques from differential geometry for computing Weingarten maps [27, 46, 51, 52, 54],
or it can be approximated numerically [11]. Numerical approximation of the Hes-
sian with forward differences is implemented in the Matlab package Manopt [14]. Its
inverse can in both cases be computed with O(m³) operations.
We found two main analytical approaches helpful to compute Weingarten maps
of embedded submanifolds I ⊂ Rn . The first relies on the following characterization
of the Weingarten map S_{η_x} in the direction of a normal vector η_x = a − x. It is defined
in [27, Section 6.2] as

(8.1)    S_{η_x} : T_xI → T_xI,  v_x ↦ P_{T_xI}(−(∇̃_{v_x} N)|_x),

where P_{T_xI} denotes the orthogonal projection onto T_xI, N is a smooth extension of η_x
to a normal vector field on an open neighborhood of x in I, and ∇̃_{v_x} is the Euclidean
covariant derivative [27, 46, 51, 52, 54]. In practice, the latter can be computed as

  (∇̃_{v_x} Y)|_x = (d/dt)|_{t=0} Y|_{γ(t)},

where Y is an arbitrary vector field on I and γ(t) is a smooth curve realizing v_x [46,
Lemma 4.9]. In other words, S_{η_x} maps v_x to the tangential part of the usual directional
derivative of N in direction of v_x ∈ T_xI ⊂ R^n.
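As a minimal worked example of (8.1) (our own illustration; the sphere is among the manifolds treated in [3]), the following sketch recovers the Weingarten map of the unit sphere S² ⊂ R³ by a forward difference along a great circle, assuming the ambient point a lies outside the sphere on the ray through x.

```python
import numpy as np

# Weingarten map of the unit sphere S^2 via (8.1): extend eta_x = a - x to the
# normal field N(p) = ||eta|| p and differentiate along a curve realizing v_x.
x = np.array([0.0, 0.0, 1.0])             # point on the sphere
a = 2.5 * x                               # ambient input; eta = 1.5 x (outward)
eta = a - x
v = np.array([1.0, 0.0, 0.0])             # tangent vector at x
t = 1e-6
gamma_t = np.cos(t) * x + np.sin(t) * v   # great circle with gamma'(0) = v
N = lambda p: np.linalg.norm(eta) * p     # normal extension of eta
dN = (N(gamma_t) - N(x)) / t              # Euclidean covariant derivative
P = np.eye(3) - np.outer(x, x)            # projection onto T_x S^2
print(P @ (-dN))                          # approx -1.5 * v, i.e. c_i ||eta|| = -1.5
# Hence H_eta = 1 - S_eta = 2.5 * Id, and kappa[S_CPP](a, x) = 1/2.5 = 0.4 < 1:
# projecting onto the sphere from outside damps perturbations, cf. Figure 4.1(b).
```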
The second approach relies on the second fundamental form of I, which is defined
as the projection of the Euclidean covariant derivative onto the normal space N_xI:

  II_x(X, Y) := (II(X, Y))|_x := P_{N_xI}((∇̃_{X|_x} Y)|_x),

where now both X and Y are vector fields on I. The second fundamental form is sym-
metric in X and Y, and II_x(X, Y) only depends on the tangent vectors X|_x and Y|_x.
It can be viewed as a smooth map II_x : T_xI × T_xI → N_xI. Contracting the second
fundamental form with a normal vector η_x ∈ N_xI yields an alternative definition of
the Weingarten map: For all v, w ∈ T_xI we have ⟨S_{η_x}(v), w⟩ = ⟨II_x(v, w), η_x⟩, where
⟨·,·⟩ is the Euclidean inner product. Computing the second fundamental form is often
facilitated by pushing forward vector fields through a diffeomorphism F : M → N.
In this case, there exists a vector field Y on N that is F-related to a vector field X
on M: Y|_p = (F_*X)|_p := (D_{F^{-1}(p)}F)(X|_{F^{-1}(p)}). The integral curves generated by X
and Y = F_*X are related by Proposition 9.6 of [47]. This approach will be illustrated
in the next section.

Figure 9.1. Setup of the 2-camera triangulation problem. The world coordinates of y ∈ R³ are
to be reconstructed from the projections x₁, x₂ ∈ R² (in the respective image coordinates) of y onto
the image planes of the cameras with centers at c₁ and c₂ (in world coordinates), respectively.
As a rule of thumb, we find that computation of the condition number of (IPIP) is
approximately as expensive as one iteration of a Riemannian Newton method for solv-
ing this problem. Typically, the cost is not worse than O(m³ + nm² + min{m²p, p²m})
operations, where the terms correspond one-to-one to items (i)–(iii), and where I^m ⊂
R^n and p = dim O.
9. Triangulation in computer vision. A rich source of approximation prob-
lems whose condition can be studied with the proposed framework is multiview ge-
ometry in computer vision; for an introduction to this domain see [32, 37, 50]. The
tasks consist of recovering information from r ≥ 2 camera projections of a scene in the
world R3 . We consider the pinhole camera model. In this model the image formation
process is modeled by the following transformation, as visualised in Figure 9.1:
  μ_r : y ↦ ( (A_ℓ y + b_ℓ) / (c_ℓᵀ y + d_ℓ) )_{ℓ=1}^r,

where A_ℓ ∈ R^{2×3}, b_ℓ ∈ R², c_ℓ ∈ R³, and d_ℓ ∈ R; see [32, 37, 50]. Clearly, μ_r is not
defined on all of R³. We clarify its domain in Lemma 9.1 below.
The vector x = µr (y) ∈ R2r obtained by this image formation process is called a
consistent point correspondence. Information that can often be identified from consis-
tent point correspondences include camera parameters and scene structure [32,37,50].
As an example application of our theory we compute the condition number of
the triangulation problem in computer vision [37, Chapter 12]. In this computational
problem we have r ≥ 2 stationary projective cameras and a consistent point corre-
spondence x = (x1 , x2 , . . . , xr ) ∈ R2r . The goal is to retrieve the world point y ∈ R3
from which they originate. Since the imaging process is subject to noisy measurements
and the above idealized projective camera model holds only approximately [37], we
expect that instead of x we are only given a = (a1 , a2 , . . . , ar ) close to x. Thus, x is
the true input of the problem, y is the output and a is the ambient input.
According to [37, p. 314], the “gold standard algorithm” for triangulation solves
the (Riemannian) optimization problem

(9.1)    min_{y∈R³} ½ ‖a − μ_r(y)‖².

Consequently, the triangulation problem can be cast as a GCPP provided that two
conditions hold: (i) the set of consistent point correspondences is an embedded man-
ifold, and (ii) intersecting the back-projected rays is a smooth map from the aforemen-
tioned manifold to R³. We verify both conditions in the following subsection.
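For concreteness, here is a minimal sketch of solving (9.1) with a generic nonlinear least-squares solver (SciPy's least_squares as a stand-in for the Matlab solver lsqnonlin used in subsection 9.4; the two camera matrices are made up for illustration).

```python
import numpy as np
from scipy.optimize import least_squares

# Two synthetic pinhole cameras P = [A, b; c^T, d] with distinct centers.
P_list = [np.hstack([np.eye(3), np.zeros((3, 1))]),
          np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])]

def mu(y):
    # mu_r(y): stacked pinhole projections of the world point y, as above.
    yh = np.append(y, 1.0)
    return np.concatenate([(P[:2] @ yh) / (P[2] @ yh) for P in P_list])

y_true = np.array([0.3, -0.2, 4.0])
a = mu(y_true) + 1e-3 * np.random.default_rng(1).standard_normal(4)  # noisy input
res = least_squares(lambda y: mu(y) - a, x0=np.array([0.0, 0.0, 3.0]))
print(res.x)                               # close to y_true
```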
9.1. The multiview manifold. The image formation process can be inter-
preted as a projective transformation from the projective space P³ to P² in terms of
the 3 × 4 camera matrices P_ℓ = [A_ℓ, b_ℓ; c_ℓᵀ, d_ℓ] [32, 37, 50]. It yields homogeneous coordinates
of x_ℓ = (z₁^ℓ/z₃^ℓ, z₂^ℓ/z₃^ℓ) ∈ R², where z^ℓ = P_ℓ [y; 1]. Note that if z₃^ℓ = 0 then the point y
has no projection³ onto the image plane of the ℓ-th camera, which occurs precisely
when y lies on the principal plane of the camera [37, p. 160]. This is the plane parallel
to the image plane through the camera center c_ℓ; see Figure 9.1. It is also known that
points on the baseline of two cameras, i.e., the line connecting the camera centers
visualised by the dashed line in Figure 9.1, all project to the same two points on the
two cameras, called the epipoles [37, Section 10.1]. Such points cannot be triangulated
from only two images. For simplicity, let B ⊂ R³ be the union of the principal planes
of the first two cameras and their baseline, so B is a 2-dimensional subvariety. The
next result shows that a subset of the consistent point correspondences outside of B
forms a smooth embedded submanifold of R^{2r}.
Lemma 9.1. Let O_MV = R³ \ B with B as above. The map μ_r : O_MV → R^{2r} is a
diffeomorphism onto its image I_MV = μ_r(O_MV).
Proof. Clearly μ_r is a smooth map between manifolds (c_ℓᵀy + d_ℓ = z₃^ℓ ≠ 0). It
only remains to show that it has a smooth inverse. Theorem 4.1 of [39] states that
μ₂'s projectivization is a birational map, entailing that μ₂ is a diffeomorphism onto
its image. Let π_{1:4} : R^{2r} → R⁴ denote projection onto the first 4 coordinates. Then,
μ₂^{-1} ∘ π_{1:4} is a smooth map such that (μ₂^{-1} ∘ π_{1:4}) ∘ μ_r = μ₂^{-1} ∘ μ₂ = 1_{O_MV}, having
used that the domain O_MV is the same for all r. For the right inverse, we see that
any element of I_MV can be written as μ_r(y) for some y ∈ O_MV. Hence,

  (μ_r ∘ (μ₂^{-1} ∘ π_{1:4}))(μ_r(y)) = (μ_r ∘ 1_{O_MV})(y) = μ_r(y),

so it is the identity on I_MV. This proves that μ_r has μ₂^{-1} ∘ π_{1:4} as smooth inverse.
As μ_r^{-1} is required in the statement of the Weingarten map, we give an algorithm
for computing μ_r^{-1} = μ₂^{-1} ∘ π_{1:4}. Assume that we are given x ∈ I_MV ⊂ R^{2r} in the r-
camera multiview manifold and let its first four coordinates be (x₁, y₁, x₂, y₂). Then,
by classic results [37, Section 12.2], the unique element in the kernel of

  [ (x₁e₃ᵀ − e₁ᵀ)P₁ ;  (y₁e₃ᵀ − e₂ᵀ)P₁ ;  (x₂e₃ᵀ − e₁ᵀ)P₂ ;  (y₂e₃ᵀ − e₂ᵀ)P₂ ]  ∈ R^{4×4}

yields homogeneous coordinates of the back-projected point in R³.
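A direct translation of this recipe into code could look as follows (a hypothetical helper, not the authors' implementation); following subsection 9.4, the kernel vector is extracted as the right singular vector of the least singular value.

```python
import numpy as np

def back_project(P1, P2, x1, y1, x2, y2):
    # mu_2^{-1} composed with pi_{1:4}: triangulate from the first two cameras,
    # whose 3x4 camera matrices are P1 and P2.
    e1, e2, e3 = np.eye(3)
    M = np.stack([(x1 * e3 - e1) @ P1,
                  (y1 * e3 - e2) @ P1,
                  (x2 * e3 - e1) @ P2,
                  (y2 * e3 - e2) @ P2])   # the 4x4 matrix displayed above
    _, _, Vt = np.linalg.svd(M)
    z = Vt[-1]                            # kernel vector = homogeneous coords
    return z[:3] / z[3]
```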


The solution manifold S_MV ⊂ I_MV × O_MV is the graph of μ_r^{-1}. Therefore, it is a
properly embedded smooth submanifold; see, e.g., [47, Proposition 5.7].
9.2. The second fundamental form. Vector fields and integral curves on IMV
can be viewed through the lens of µr : We construct a local smooth frame of IMV by
pushing forward a frame from OMV by µr . Then, the integral curves of each of the
local smooth vector fields are computed, after which we apply the Gauss formula for
curves [46, Lemma 8.5] to compute the second fundamental form.
Because of Lemma 9.1 a local smooth frame for the multiview manifold IMV ⊂
R2r is obtained by pushing forward the constant global smooth orthonormal frame
(e₁, e₂, e₃) of R³ by the derivative of μ_r. Its derivative is

  D_yμ_r : T_yO_MV → T_{μ_r(y)}I_MV,  ẏ ↦ ( A_ℓẏ / (c_ℓᵀy + d_ℓ) − (c_ℓᵀẏ)(A_ℓy + b_ℓ) / (c_ℓᵀy + d_ℓ)² )_{ℓ=1}^r,

and so a local smooth frame of I_MV is given by

  E_i : I_MV → TI_MV,  μ_r(y) ↦ ( A_ℓe_i / (c_ℓᵀy + d_ℓ) − c_{ℓ,i} (A_ℓy + b_ℓ) / (c_ℓᵀy + d_ℓ)² )_{ℓ=1}^r,  i = 1, 2, 3,
³ Actually, it has a projection if we allow points at infinity, i.e., if we consider the triangulation problem in projective space.


where c_{ℓ,i} = c_ℓᵀe_i. It is generally neither orthonormal nor orthogonal.
We compute the second fundamental form of the multiview manifold by differenti-
ation along the integral curves generated by the smooth local frame (E₁, E₂, E₃). The
integral curves through μ_r(y) generated by this frame are the images of the integral
curves passing through y generated by the e_i's due to [47, Proposition 9.6]. The latter
are seen to be g_i(t) = y + te_i by elementary results on linear differential equations.
Therefore, the integral curves generated by E_i are

  γ_i(t) = μ_r(g_i(t)) = ( (A_ℓ(y + te_i) + b_ℓ) / (c_ℓᵀ(y + te_i) + d_ℓ) )_{ℓ=1}^r.

The components of the second fundamental form at x = μ_r(y) ∈ I_MV are then

  II_x(E_i, E_j) = P_{NI_MV}( (d/dt)|_{t=0} E_j|_{γ_i(t)} )
                = P_{NI_MV}( ( (d/dt)|_{t=0} [ A_ℓe_j / (c_ℓᵀ(y + te_i) + d_ℓ) − c_{ℓ,j} (A_ℓ(y + te_i) + b_ℓ) / (c_ℓᵀ(y + te_i) + d_ℓ)² ] )_{ℓ=1}^r )
                = P_{NI_MV}( ( −(c_{ℓ,i}/α_ℓ²(y)) A_ℓe_j − (c_{ℓ,j}/α_ℓ²(y)) A_ℓe_i + 2 (c_{ℓ,i}c_{ℓ,j}/α_ℓ³(y)) (A_ℓy + b_ℓ) )_{ℓ=1}^r ),

where α_ℓ(y) = c_ℓᵀy + d_ℓ and i, j = 1, 2, 3.


9.3. A practical algorithm. The Weingarten map of I_MV in the direction of
the normal vector η ∈ N_xI_MV is obtained by contracting the second fundamental form
with η; that is, (Ŝ_η)_{ij} = ⟨II_x(E_i, E_j), η⟩. This can be computed efficiently using linear
algebra operations. Partitioning η = [η_ℓ]_{ℓ=1}^r with η_ℓ ∈ R², the symmetric coefficient
matrix of the Weingarten map relative to the frame E = (E₁, E₂, E₃) becomes

  Ŝ_η = Σ_{ℓ=1}^r [ 2 (c_{ℓ,i}c_{ℓ,j}/α_ℓ³(y)) η_ℓᵀ(A_ℓy + b_ℓ) − (c_{ℓ,i}/α_ℓ²(y)) η_ℓᵀA_ℓe_j − (c_{ℓ,j}/α_ℓ²(y)) η_ℓᵀA_ℓe_i ]_{i,j=1}^3 ∈ R^{3×3},

where y = μ_r^{-1}(x).


To compute the spectrum of the Weingarten map using efficient linear algebra
algorithms, we need to express it with respect to an orthonormal local smooth frame by
applying Gram–Schmidt orthogonalization to E. This is accomplished by placing the
tangent vectors E1 , E2 , E3 as columns of a 2r ×3 matrix J and computing its compact
QR decomposition QR = J. The coefficient matrix of the Weingarten map expressed
with respect to the orthogonalized frame Q = (Q₁, Q₂, Q₃) is then S_η = R^{-T} Ŝ_η R^{-1}.
Since μ_r : O_MV → I_MV is a diffeomorphism, (D_{(x,y)}π_{O_MV})(D_{(x,y)}π_{I_MV})^{-1} equals
the inverse of the derivative of μ_r by Remark 5.4. As R is the matrix of D_yμ_r :
T_yO_MV → T_xI_MV expressed with respect to the orthonormal basis (Q₁, Q₂, Q₃) of
T_xI_MV and the standard basis of R³, we get that κ[S_GCPP](x + η, x, y) equals

  ‖(D_{(x,y)}π_{O_MV})(D_{(x,y)}π_{I_MV})^{-1} H_η^{-1}‖_{I→O} = ‖R^{-1}(I − S_η)^{-1}‖₂ = 1 / σ₃((I − S_η)R),

where I is the 3 × 3 identity matrix, and σ3 is the third largest singular value.
The computational complexity of this algorithm grows linearly with the number of cameras r. Indeed, the Weingarten map Ŝη can be constructed in O(r) operations, the smooth frame E is orthogonalized in O(r) operations, the change of basis from Ŝη to Sη requires a constant number of operations, and computing the singular values of a 3 × 3 matrix adds another constant.
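To make the procedure concrete, the following Matlab sketch assembles Ŝη, orthogonalizes the frame, and evaluates the condition number. It is a minimal illustration under naming assumptions of our own (a cell array P of 3 × 4 cameras with Aℓ = P{l}(1:2,1:3), bℓ = P{l}(1:2,4), cℓ = P{l}(3,1:3)', dℓ = P{l}(3,4); a point y ∈ R3; and a unit-norm normal vector eta ∈ R2r). It is not the supplementary code accompanying this article.

    function kappa = triangulation_condition(P, y, eta)
    % Condition number of triangulation at the ambient input a = mu_r(y) + eta.
    r = numel(P);
    J = zeros(2*r, 3);              % columns will hold the frame E_1, E_2, E_3
    Shat = zeros(3);                % Weingarten map in the frame E
    for l = 1:r
        A = P{l}(1:2, 1:3); b = P{l}(1:2, 4);
        c = P{l}(3, 1:3).'; d = P{l}(3, 4);
        alpha = c.'*y + d;          % alpha_l(y)
        w = A*y + b;
        etal = eta(2*l-1 : 2*l);    % eta_l in R^2
        for i = 1:3
            J(2*l-1 : 2*l, i) = A(:, i)/alpha - c(i)*w/alpha^2;   % E_i, l-th block
            for j = 1:3
                Shat(i, j) = Shat(i, j) ...
                    + 2*c(i)*c(j)/alpha^3 * (etal.'*w) ...
                    - c(i)/alpha^2 * (etal.'*A(:, j)) ...
                    - c(j)/alpha^2 * (etal.'*A(:, i));
            end
        end
    end
    [~, R] = qr(J, 0);              % compact QR: orthonormalizes the frame
    S = R.' \ Shat / R;             % S_eta = R^{-T} * Shat * R^{-1}
    s = svd((eye(3) - S) * R);
    kappa = 1/s(3);                 % reciprocal of the third singular value
    end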
9.4. Numerical experiments. The computations below were implemented in
Matlab R2017b [49]. The code we used is provided as supplementary files accompany-
ing the arXiv version of this article. It uses functionality from Matlab’s optimization
toolbox. The experiments were performed on a computer running Ubuntu 18.04.3 LTS with an Intel Core i7-5600U CPU and 8GB of main memory.
We present a few numerical examples illustrating the behavior of the condition number of triangulation. The basic setup is described next. We take all 10 camera matrices Pi ∈ R3×4 from the “model house” data set of the Visual Geometry Group of the University of Oxford [66]. These cameras all point roughly in the same direction. In our experiments we reconstruct the point y := p + 1.510 v ∈ R3, where p = (−1.85213, −0.532959, −5.65752) ∈ R3 is one point of the house and v = (−0.29292, −0.08800, −0.95208) is the unit-norm vector that points from the center of the first camera to p. Given a data point a ∈ R2r, it is triangulated as follows. First, a linear triangulation method is applied, which finds the right singular vector corresponding to the smallest singular value of a matrix whose construction is described (for two views) in [37, Section 12.2]. The result is then used as a starting point for solving optimization problem (9.1) with Matlab's nonlinear least squares solver lsqnonlin. For this solver, the following settings were used: TolFun and TolX set to 10^{-28}, StepTolerance equal to 10^{-14}, and both MaxFunEvals and MaxIter set to 10^4. We provided the Jacobian to the algorithm, so that it does not have to be approximated numerically.
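A sketch of this refinement step is shown below. The residual function res and starting point y0 are placeholders of ours; the option names follow Matlab's optimoptions interface, where TolFun, TolX, MaxFunEvals, and MaxIter are the legacy names of FunctionTolerance, StepTolerance, MaxFunctionEvaluations, and MaxIterations.

    % Hypothetical call to lsqnonlin for problem (9.1); res is assumed to
    % return the stacked reprojection residuals mu_r(y) - a and, as second
    % output, their 2r-by-3 Jacobian.
    opts = optimoptions('lsqnonlin', ...
        'FunctionTolerance', 1e-28, 'StepTolerance', 1e-14, ...
        'MaxFunctionEvaluations', 1e4, 'MaxIterations', 1e4, ...
        'SpecifyObjectiveGradient', true);
    yest = lsqnonlin(@(y) res(y, a, P), y0, [], [], opts);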
Since the multiview manifold IMV is only a three-dimensional submanifold of R2r
and the computational complexity is linear in r, we do not focus on the computational
efficiency of computing κ[SGCPP]. In practical cases, with r ≪ 1000, the performance
is excellent. For example, the average time to compute the condition number for r =
1000 cameras was less than 0.1 seconds in our setup. This includes the time to compute
the frame E, the Weingarten map expressed with respect to E, the orthogonalization
of the frame, and the computation of the smallest singular value.
Experiment 1. In the first experiment, we provide numerical evidence for the
correctness of our theoretical results. We project y to the image planes of the r = 10
cameras: x = µr (y). A random normal direction η ∈ Nx IMV is sampled by taking a
random vector with i.i.d. standard normal entries, projecting it to the normal space,
and then scaling it to unit norm. We investigate the sensitivity of the points along
the ray a(t) := x − tη.
The theoretical condition number of the triangulation problem at a(t) can be
computed numerically for every t using the algorithm from subsection 9.3. We can
also estimate the condition number experimentally, for example by generating a large
number of small perturbations a(t) + E(t) with kE(t)k ≈ 0, solving (9.1) numerically
using lsqnonlin and then checking the distance to the true critical point y. However,
from the theory we know that the worst direction of infinitesimal perturbation is given by the left singular vector of (I − S−tη)R corresponding to the smallest (i.e., third) singular value, where R is as in subsection 9.3. Let u(t) ∈ R3 denote this vector, which contains the coordinates of the perturbation relative to the orthonormal basis Q = (Q1, Q2, Q3); see subsection 9.3. Then, it suffices to consider only a small (relative) perturbation in this direction; we took E(t) := 10^{-6} ‖a(t)‖ Qu(t). We solve (9.1) with input a(t) + E(t) and output yest(t) using lsqnonlin with the exact solution y of the unperturbed problem as starting point. The experimental estimate of the condition number is then κest(t) = ‖y − yest(t)‖ / ‖E(t)‖.
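In code, one trial of this estimate could look as follows. This is a sketch with assumed names of ours: St denotes the coefficient matrix S−tη, Q and R come from the compact QR factorization of the frame matrix J in subsection 9.3, and res and opts are as in the lsqnonlin sketch above.

    % Worst-case perturbation and experimental condition number estimate.
    a = x - t*eta;                       % ambient input on the ray
    [U, ~, ~] = svd((eye(3) - St) * R);  % St = Weingarten matrix S_{-t*eta}
    u = U(:, 3);                         % worst direction, coordinates in Q
    E = 1e-6 * norm(a) * (Q * u);        % small perturbation in R^{2r}
    yest = lsqnonlin(@(z) res(z, a + E, P), y, [], [], opts);
    kappa_est = norm(y - yest) / norm(E);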
In Figure 9.2 we show the experimental results for the above setup, where we chose t = 10^i ‖x‖ with 100 values of i uniformly spaced between −3 and 2. It is visually evident that the experimental results support the theory. Excluding the last three values of t, the arithmetic and geometric means of the condition number divided by the experimental estimate are approximately 1.006664 and 1.006497, respectively. This indicates a near-perfect match with the theory.
There appears to be one anomaly in Figure 9.2, however: the three crosses to the right of the singularity at t∗ = 79.64416‖x‖. What happens is that past the singularity, y is no longer a local minimizer of the optimization problem miny∈OMV ½‖a(t) − µ10(y)‖². This is because x is no longer a local minimizer of minx∈IMV ½‖a(t) − x‖². Since we are applying an optimization method, the computed local minimizer moves away from the nearby critical point and instead converges to a different point. Indeed, one can show that the Riemannian Hessians (see [2, Chapter 5.5]) of these problems are positive definite for small t, but become indefinite past the singularity t∗.

Figure 9.2. A comparison of theoretical and experimental data of the condition number of the triangulation problem with all cameras for several points along the ray x − tη. Herein, η is a randomly chosen unit-norm normal direction at x ∈ IMV. The numerically computed theoretical condition numbers are indicated by circles, while the experimentally estimated condition numbers, as described in the text, are plotted with crosses.

Figure 9.3. The GCPP condition number as a function of the relative distance t/‖x‖ from a specified x ∈ IMV and a randomly chosen unit-length normal direction η. Each of the four lines corresponds to a different number k of cameras taking the pictures, namely k = 2, 3, 5, 10.
Experiment 2. The next experiment illustrates how the condition number varies
with the distance t along a normal direction tη ∈ NIMV . We consider the setup from
the previous paragraph. The projection of the point y onto the image planes of the
first k cameras is xk := µk (y) = (P1 y, . . . , Pk y). A random unit-norm normal vector
ηk in Nxk IMV is chosen as described before. We investigate the sensitivity of the
perturbed inputs ak(t) := xk + tηk for t ∈ R. The condition number is computed for these inputs for k = 2, 3, 5, 10 and t = ±10^i ‖xk‖ for 10^4 values of i uniformly spaced between −3 and 4. The results are shown in Figure 9.3. We have 3 peaks for a fixed k because dim IMV = 3. The two-camera system has only two singularities in the range |t| < 10^4 ‖x2‖. These curves illustrate the continuity of κGCPP from Corollary 6.3.
Up to about |t| ≤ 10^{-2} ‖xk‖ we observe that curvature hardly affects the condition
number of the GCPP for the triangulation problem. Beyond this value, curvature
plays the dominant role: both the singularities and the tapering off for high relative
errors are caused by it. Moreover, the condition number decreases as the number of
cameras increases. This indicates that the use of minimal problems [45] in computer
vision could potentially introduce numerical instability.

Acknowledgements. The authors thank Nicolas Boumal for posing a question on unconstrained errors, which eventually started the project that led to this article. We are greatly indebted to Carlos Beltrán, who meticulously studied the manuscript and asked pertinent questions about the theory. Furthermore, the authors would like to thank Peter Bürgisser for hosting the second author at TU Berlin in August 2019, and Joeri Van der Veken for fruitful suggestions and pointers to the literature.
We thank two anonymous reviewers whose remarks enticed us to immensely improve the presentation of our results, specifically in Sections 1 and 3, and to include a discussion of computational aspects in Section 8. We also thank the editor, Anthony So, for adeptly handling the reviewing process.
Appendix A. Proofs of the technical results. We recall two classic results
that we could not locate in modern references in differential geometry. A common
theme in the three computational problems considered in Sections 4 to 6 is that they
involve the projection ΠI : NI → I, (x, η) 7→ x from the normal bundle NI to the
base of the bundle, the embedded submanifold I ⊂ Rn . This projection is smooth [47,
Corollary 10.36] and its derivative is characterized by the following classical result,
which dates back at least to Weyl [67].
Proposition A.1. Let (x, η) ∈ NI and (ẋ, η̇) ∈ T(x,η) NI. Then, PTx I (ẋ + η̇) =
Hη ẋ, where Hη is as in (H).
A useful consequence is that it allows us to characterize when Hη is singular.
Lemma A.2. Let (a, x) ∈ SCPP , η = a − x, and Hη be as in (H). Then, Hη is
invertible if and only if (a, x) ∈ WCPP .
Proof. Let (ȧ, ẋ) ∈ T(a,x) SCPP . Then, we have ẋ = (D(a,x) ΠI )(ȧ, ẋ) and ȧ =
(D(a,x) ΠRn )(ȧ, ẋ), so that Proposition A.1 yields the equality of operators

(A.1) PTx I D(a,x) ΠRn = Hη D(a,x) ΠI .

On the one hand, if (a, x) ∈ WCPP then the map D(a,x) ΠRn is invertible. By
multiplying with its inverse we get PTx I = Hη (D(a,x) ΠI )(D(a,x) ΠRn )−1 . As Hη is an
endomorphism on Tx I, which is the range of the left-hand side, the equality requires
Hη to be an automorphism, i.e., it is invertible.
On the other hand, if (a, x) ∉ WCPP, there is some (ȧ, ẋ) ≠ 0 in the kernel of D(a,x)ΠRn. Since ȧ = (D(a,x)ΠRn)(ȧ, ẋ) = 0, we must have ẋ ≠ 0. Consequently,
applying both sides of (A.1) to (ȧ, ẋ) gives 0 = Hη ẋ, which implies that Hη is singular.
This finishes the proof.
The next result follows almost directly from Proposition A.1 and appears several
times as an independent statement in the literature. The earliest reference we could
locate is Abatzoglou [1, Theorem 4.1], who presented it for C² embedded submanifolds of Rn in a local coordinate chart.
Corollary A.3. Let I ⊂ Rn be a Riemannian embedded submanifold. Then,
there exists a tubular neighborhood T of I such that the projection map PI : T →
I, a 7→ argminx∈I ka − xk2 is a smooth submersion with derivative

Da PI = Hη−1 PTx I ,

where x = PI (a), η = a − x, and Hη is given by (H).
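As an illustration we constructed (it is not part of the original text), this derivative formula can be checked numerically on the unit sphere in R3, where PI(a) = a/‖a‖ and the Weingarten map in the direction of the outward unit normal x is minus the identity on Tx I, so that Hη = (1 + t)·1 on Tx I for η = tx:

    % Numerical sanity check of Corollary A.3 on the unit sphere (our own example).
    n = 3; x = randn(n, 1); x = x/norm(x);
    t = 0.3; a = (1 + t)*x;                % ambient input, eta = a - x = t*x
    Ptan = eye(n) - x*x';                  % projection onto T_x I
    Heta = (1 + t)*eye(n);                 % acts as H_eta on T_x I
    Dtheory = Heta \ Ptan;                 % H_eta^{-1} P_{T_x I}
    % Central finite-difference derivative of a -> a/||a||:
    h = 1e-7; Dfd = zeros(n);
    for k = 1:n
        e = zeros(n, 1); e(k) = h;
        Dfd(:, k) = ((a + e)/norm(a + e) - (a - e)/norm(a - e)) / (2*h);
    end
    disp(norm(Dtheory - Dfd))              % should be of order 1e-8 or smaller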


In the next subsections, proofs are supplied for the main results.
A.1. Proofs for Section 4.
Proof of Lemma 4.2. Recall that ΠRn is a smooth map, so if its derivative is
nonsingular at a point (a, x) ∈ SCPP , then there exists an open neighborhood of
(a, x) where this property remains valid. Therefore, applying the inverse function
theorem [47, Theorem 4.5] at (a, x) ∈ WCPP , the fact that D(a,x) ΠRn is invertible
implies that there is a neighborhood N(a,x) of (a, x) in WCPP and a neighborhood of
a in Rn such that N(a,x) is diffeomorphic to (an open ball in) Rn. Hence, WCPP is open in SCPP, and thus it is an open submanifold of dimension n.
To show that WCPP is dense in SCPP, take any (a, x) ∈ SCPP \ WCPP. Let η = a − x. Then, (x, x) + (αη, 0) ∈ SCPP because η ∈ Nx I. By Lemma A.2, (x, x) + (αη, 0) ∈ WCPP with α ∈ (0, ∞) if and only if Hαη = 1 − Sαη = 1 − αSη is invertible. This fails only if 1/α equals one of the eigenvalues of the Weingarten map Sη. Consequently, (a, x) can be reached as the limit of a sequence (x, x) + (αn η, 0) with αn → 1, which is wholly contained in WCPP by taking the αn outside of the discrete set of reciprocal eigenvalues of Sη.
Applying [47, Proposition 5.1], we can conclude that WCPP is even an embedded
submanifold. This concludes the argument.
Proof of Theorem 4.3. Let (a, x) ∈ WCPP be arbitrary. On WCPP , the coor-
dinate projection ΠRn : SCPP → Rn has an invertible derivative. Consequently,
by the inverse function theorem [47, Theorem 4.5] there exist open neighborhoods X ⊂ SCPP of (a, x) and Y ⊂ Rn of a such that ΠRn |X : X → Y has a smooth
inverse function that we call φ(a,x) . Its derivative is Da φ(a,x) = (D(a,x) ΠRn )−1 . Con-
sider the smooth map Φ(a,x) := ΠI ◦ φ(a,x) , where ΠI : SCPP → I is the coor-
dinate projection. The CPP condition number at (a, x) ∈ WCPP is κ[Φ(a,x) ](a) =
κ[WCPP ](a, x) = kDa Φ(a,x) kRn →I . Since Da Φ(a,x) = (D(a,x) ΠI )(D(a,x) ΠRn )−1 , we
have that κ[WCPP ](a, x) = k(D(a,x) ΠI ) (D(a,x) ΠRn )−1 kRn →I . Now it follows from
Lemma A.2 and (A.1) that (D(a,x) ΠI )(D(a,x) ΠRn )−1 = Hη−1 PTx I . As the metrics on
I and Rn are identical, kHη−1 PTx I kRn →I = kHη−1 kI→I . This concludes the first part.
The second part is Lemma A.2.
A.2. Proofs for Section 5.
Proof of Lemma 5.1. By Corollary A.3 the projection PI : T → I is a smooth
map on T . Let us denote the graph of this map by V := {(a, PI (a)) ∈ T × I |
a ∈ T }. It is a smooth embedded submanifold of dimension dim T = dim NI = n;
see, e.g., [47, Proposition 5.4]. Let x = PI (a), η = a − x and (ȧ, ẋ) ∈ T(a,x) V
be a nonzero tangent vector. Because of Corollary A.3 it satisfies ẋ = Hη−1 PTx I ȧ.
Hence, (D(a,x)ΠRn)(ȧ, Hη−1 PTx I ȧ) = ȧ ≠ 0, so that its kernel is trivial. It follows that
V ⊂ WCPP . Since their dimensions match by Lemma 4.1, V is an open submanifold.
As WCPP is embedded by Lemma 4.2, V is embedded as well by [47, Proposition 5.1].
Moreover, by construction, we have
WAP = (V × O) ∩ (Rn × W) ⊂ Rn × I × O.
The first part of the proof is concluded by observing that the proof of Lemma 6.1
applies by replacing WCPP with V and WGCPP with WAP .
Finally, we show that WAP is dense in SAP . Let (x + η, y) ∈ SAP with x ∈ I and
η ∈ Nx I be arbitrary. Then PI (x + η) = x and (x, y) ∈ S. As W is an open dense
submanifold of S, there exists a sequence (xi , yi ) ∈ W such that limi→∞ (xi , yi ) =
(x, y). Consider (xi + ηi , yi ) ∈ WAP with ηi ∈ Nxi I and ηi → η chosen in such a way
that the first component lies in the tubular neighborhood; this is possible as T is an
open submanifold of Rn . Then, the limit of this sequence is (x + η, y). Q.E.D.
Proof of Theorem 5.2. For (a, y) ∈ WAP , let x = PI (a) and η = a − x. By (1.2),
κ[WAP ](a, y) = kDa (πO ◦ πI−1 ◦ PI )kRn →O . Since (x, y) ∈ W, we know that D(x,y) πI
is invertible. Moreover, by Corollary A.3 the derivative of the projection PI : T → I
at a is Da PI = Hη−1 PTx I . Therefore,

κ[WAP ](a, y) = k(D(x,y) πO )(D(x,y) πI )−1 Hη−1 kI→O .


For (a, y) ∈ SAP \ WAP, we have (x, y) ∉ W with x = PI(a). By definition of W,
this entails that D(x,y) πI is not invertible, concluding the proof.
A.3. Proofs for Section 6.
Proof of Lemma 6.1. Combining Assumption 1 and Lemma 4.2, it is seen that
WGCPP is realized as the intersection of two smoothly embedded product manifolds:
WGCPP = (WCPP × O) ∩ (Rn × W) ⊂ Rn × I × O.
Note that codim(WCPP × O) = dim I, by Lemma 4.1, and codim(Rn × W) = dim O, because dim W = dim S and, by Assumption 1, dim S = dim I. Thus, it
suffices to prove that the intersection is transversal [47, Theorem 6.30], because then
WGCPP would be an embedded submanifold of codimension dim I + dim O.
Consider a point (a, x, y) ∈ WGCPP . Transversality means that we need to show
T(a,x,y) (WCPP × O) + T(a,x,y) (Rn × W) = T(a,x,y) (Rn × I × O).
Fix an arbitrary (ȧ, ẋ, ẏ) ∈ Ta Rn × Tx I × Ty O. Then, it suffices to show there exist
((ȧ1 , ẋ1 ), ẏ1 ) ∈ T(a,x) WCPP × Ty O and (ȧ2 , (ẋ2 , ẏ2 )) ∈ Ta Rn × T(x,y) W such that
(ȧ, ẋ, ẏ) = (ȧ1 + ȧ2 , ẋ1 + ẋ2 , ẏ1 + ẏ2 ). It follows from Proposition A.1 and Lemma A.2
that ȧ1 = Hη ẋ1 + η̇, where η = a − x and where η̇ ∈ Nx I. The tangent vectors
ẏ1 ∈ Ty O, ȧ2 ∈ Ta Rn and ẋ1 ∈ Tx I can be chosen freely. Choosing ẋ1 = ẋ, ẋ2 = 0, ẏ1 = ẏ, ẏ2 = 0, ȧ1 = Hη ẋ1 (with η̇ = 0), and ȧ2 = ȧ − ȧ1 yields (ȧ, ẋ, ẏ). This
concludes the first part of the proof.
It remains to prove WGCPP is dense in SGCPP . For this, we recall that
SGCPP = (SCPP × O) ∩ (Rn × S).
Let (a, x, y) ∈ SGCPP and let η = a − x ∈ Nx I. Then, (a, x) ∈ SCPP and (x, y) ∈ S.
As W is an open dense submanifold of S by Assumption 1, there exists a sequence
(xi , yi ) ∈ W such that limi→∞ (xi , yi ) = (x, y). Moreover, there exists a sequence ηi → η with ηi ∈ Nxi I so that (xi + ηi , xi ) ∈ SCPP , because ΠI (SCPP ) = I. Since
WCPP is open dense in SCPP by Lemma 4.2, only finitely many elements of the
sequence are not in WCPP . We can pass to a subsequence (xj + ηj , xj ) ∈ WCPP . Now
(xj + ηj , xj , yj ) ∈ WGCPP and moreover its limit is (x + η, x, y) ∈ SGCPP . Q.E.D.
Proof of Theorem 6.2. For a well-posed triple (a, x, y) ∈ WGCPP we obtain from
(1.2) that κ[WGCPP ](a, x, y) = k(D(a,x,y) ΠO )(D(a,x,y) ΠRn )−1 kRn →O . We compute the
right-hand side of this equation. The situation appears as follows:

\[
\begin{array}{ccccc}
W_{CPP} & \xrightarrow{\ \Pi_{R^n}\ } & R^n & \xleftarrow{\ \Pi_{R^n}\ } & W_{GCPP} \\
\big\downarrow{\scriptstyle \Pi_I} & & & & \big\downarrow{\scriptstyle \Pi_O} \\
I & \xleftarrow{\ \pi_I\ } & W & \xrightarrow{\ \pi_O\ } & O
\end{array}
\]

We have in addition the projections (1 × πI) : WGCPP → WCPP, (a, x, y) 7→ (a, x), and (ΠI × 1) : WGCPP → W, (a, x, y) 7→ (x, y). With these we have
\[
(\Pi_{R^n} \circ (1 \times \pi_I))(a, x, y) = \Pi_{R^n}(a, x, y) \quad \text{and} \quad (\pi_O \circ (\Pi_I \times 1))(a, x, y) = \Pi_O(a, x, y).
\]
Taking derivatives, we get
\[
(D_{(a,x)}\Pi_{R^n})\, D_{(a,x,y)}(1 \times \pi_I) = D_{(a,x,y)}\Pi_{R^n} \quad \text{and} \quad (D_{(x,y)}\pi_O)\, D_{(a,x,y)}(\Pi_I \times 1) = D_{(a,x,y)}\Pi_O.
\]
Since (x, y) ∈ W, the derivative D(x,y)πI is invertible, and so D(a,x,y)(1 × πI) is also invertible. On the other hand, since (a, x) ∈ WCPP, the derivative D(a,x)ΠRn is invertible. Altogether we see that D(a,x,y)ΠRn is invertible, and that

(D(a,x,y) ΠO )(D(a,x,y) ΠRn )−1


−1
(D(a,x) ΠRn )−1 .

= (D(x,y) πO ) D(a,x,y) (ΠI × 1) D(a,x,y) (1 × πI )
The derivatives in the middle satisfy
\[
(\dot{x}, \dot{y}) = D_{(a,x,y)}(\Pi_I \times 1)\, \big(D_{(a,x,y)}(1 \times \pi_I)\big)^{-1} (\dot{a}, \dot{x}),
\]
while on the other hand (ẋ, ẏ) = (D(x,y)πI)−1 (D(a,x)ΠI)(ȧ, ẋ). This implies that the two linear maps on the right-hand side in the previous equations are equal. Hence,
\[
\text{(A.2)} \qquad (D_{(a,x,y)}\Pi_O)(D_{(a,x,y)}\Pi_{R^n})^{-1} = (D_{(x,y)}\pi_O)(D_{(x,y)}\pi_I)^{-1}(D_{(a,x)}\Pi_I)(D_{(a,x)}\Pi_{R^n})^{-1} = (D_{(x,y)}\pi_O)(D_{(x,y)}\pi_I)^{-1} H_\eta^{-1} P_{T_x I},
\]
where we used (A.1) and Lemma A.2. Taking spectral norms on both sides completes
the proof for points in WGCPP . Finally, for points outside WGCPP the proof follows
from the definition of WGCPP combined with Theorems 4.3 and 5.2.
Figure A.1. A sketch of the geometric construction in the proof of Theorem 7.3.
A.4. Proofs for Section 7.
Proof of Proposition 7.1. Taking the derivative of the objective function at a
point (x, y) ∈ W we see that the critical points satisfy ⟨a − x, D(x,y)πI(ẋ, ẏ)⟩ = 0 for all (ẋ, ẏ) ∈ T(x,y)W. By the definition of W, the derivative D(x,y)πI is surjective,
which implies that (D(x,y) πI )(T(x,y) W) = Tx I. Therefore, the condition of being a
critical point is equivalent to a − x ∈ Nx I.
Proof of Theorem 7.3. By assumption (a∗, x∗, y∗) ∈ WGCPP. In particular, this implies (a∗, x∗) ∈ WCPP, so that D(a∗,x∗)ΠRn : T(a∗,x∗)WCPP → Rn is invertible. By the inverse function theorem [47, Theorem 4.5], there is an open neighborhood U(a∗,x∗) ⊂ WCPP of (a∗, x∗) such that ΠRn restricts to a diffeomorphism on

Aa∗ = ΠRn(U(a∗,x∗)).

Let ΠRn−1 denote the inverse of ΠRn on Aa∗. Combining (A.1) and Lemma A.2, the derivative of ΠI ◦ ΠRn−1 is given by (D(a,x)ΠI)(D(a,x)ΠRn)−1 = Hη−1 PTx I and thus has constant rank on Aa∗ equal to dim I. This implies that

\[
\big(1 \times (D_{(x,y)}\pi_O)(D_{(x,y)}\pi_I)^{-1}\big)\,(D_{(a,x)}\Pi_I)(D_{(a,x)}\Pi_{R^n})^{-1} : T_a A_{a^*} \to T_{(x,y)} W
\]
has constant rank on Aa∗ equal to dim I = dim W, by Assumption 1. Consequently, there is a smooth function Φ : Aa∗ → W with Φ(a∗) = (x∗, y∗). The above showed that its derivative has constant rank equal to dim W, so Φ is a smooth submersion.
By [47, Proposition 4.28], Φ(Aa∗) ⊂ W is an open subset, so it is a submanifold of dimension dim I = dim W. Similarly, the derivative D(x,y)πI at (x, y) ∈ W has maximal rank equal to dim I, by definition of W. Hence, πI ◦ Φ is also a smooth submersion. After possibly passing to an open subset of Aa∗, we can thus assume that

N(x∗,y∗) := Φ(Aa∗) ⊂ W and Ix∗ = πI(N(x∗,y∗)) ⊂ I

are open submanifolds, each of dimension dim I, and that
(i) x∗ is the global minimizer of the distance to a∗ on Ix∗, and hence,
(ii) all x ∈ Ix∗ lie on the same side of Tx∗Ix∗, considering the tangent space Tx∗Ix∗ as an affine linear subspace of Rn with base point x∗.
Now let a ∈ Aa∗ be a point in the neighborhood of a∗. The restricted squared distance function da : N(x∗,y∗) → R, (x, y) 7→ ½‖a − πI(x, y)‖², which is minimized over N(x∗,y∗), is smooth. Its critical points satisfy

⟨a − πI(x, y), D(x,y)πI(ẋ, ẏ)⟩ = 0 for all (ẋ, ẏ) ∈ T(x,y)N(x∗,y∗).

Since (x∗, y∗) ∈ N(x∗,y∗) ⊂ W, the derivative D(x∗,y∗)πI of πI : W → I is invertible. Then, for all (x, y) ∈ N(x∗,y∗) the image of D(x,y)πI is the whole tangent space Tx I. Consequently, the critical points must satisfy a − x ⊥ Tx I.
By construction, for every a ∈ Aa∗ there is a unique (x, y) ∈ N(x∗,y∗) so that (a, x, y) ∈ WGCPP, namely (x, y) = Φ(a). This implies that (x, y) is the unique critical point of the squared distance function restricted to N(x∗,y∗).
It only remains to show that for all a ∈ Aa∗, the minimizer of da is attained in the interior of N(x∗,y∗). This entails that the unique critical point is also the unique global minimizer on N(x∗,y∗). We show that there exists a δ > 0 such that restricting Aa∗ to the open ball Bδ(a∗) of radius δ centered at a∗ in Rn yields the desired result.
Here is the geometric setup for the rest of the proof, depicted in Figure A.1: let a ∈ Bδ(a∗) and let (x, y) ∈ N(x∗,y∗) be the minimizer of da in the closure of N(x∗,y∗). Then, x is in the closure of Ix∗ by continuity of πI. Furthermore, let x′ = P(x), where P := PTx∗Ix∗ is the orthogonal projection onto the tangent space Tx∗Ix∗, considered as an affine linear subspace of Rn with base point x∗.
The image of Bδ(a∗) under P is an open ball Bδ(x∗) ∩ Tx∗Ix∗ ⊂ Tx∗Ix∗. The distance function h : Ix∗ → R, x 7→ ‖x − P(x)‖ measures the “height” of x above the tangent space Tx∗Ix∗. Now, we can bound the distance from a to x as follows:
\[
\begin{aligned}
\|a - x\| &= \|a - a^* + a^* - x^* + x^* - x' + x' - x\| \\
&\le \|a - a^*\| + \|a^* - x^*\| + \|x^* - x'\| + \|x' - x\| \\
&\le \|a^* - x^*\| + 2\delta + h(x) \\
&\le \|a^* - x^*\| + 2\delta + \max_{z \in P_\delta(x^*)} h(z), \qquad \text{(A.3)}
\end{aligned}
\]
where the first bound is the triangle inequality, and where Pδ(x∗) ⊂ Ix∗ is the preimage of Bδ(x∗) ∩ Tx∗Ix∗ under the projection P restricted to Ix∗.
Now let x ∈ ∂Ix∗ be a point in the boundary of Ix∗ and, again, x′ = P(x). We want to show that the distance ‖a − x‖ is strictly larger than (A.3). This would imply that the local minimizer of da is indeed contained in the interior of N(x∗,y∗). We have
\[
\|a - x\| = \|a - a^* + a^* - x^* + x^* - x' + x' - x\| \ge \|a^* - x^* + x^* - x' + x' - x\| - \|a - a^*\|.
\]
Note that x∗ = P(a∗). Hence we have the orthogonal decomposition
\[
a^* - x^* + x^* - x' + x' - x = (1 - P)(a^* - x) + P(a^* - x) = a^* - x.
\]
Plugging this into the above, we find
\[
\|a - x\| \ge \|(1 - P)(a^* - x) + P(a^* - x)\| - \delta = \sqrt{\|(1 - P)(a^* - x)\|^2 + \|P(a^* - x)\|^2} - \delta \ge \sqrt{\|(1 - P)(a^* - x)\|^2 + \varepsilon^2} - \delta,
\]
where ε = min_{z ∈ P(∂Ix∗)} ‖x∗ − z‖ > 0.
Observe that a∗ − x∗ points from x∗ to a∗ and likewise x′ − x points from x to x′. Therefore, ⟨a∗ − x∗, x′ − x⟩ ≥ 0, because both point away from the manifold Ix∗, which lies entirely on one side of the tangent space by assumption (ii) above; see also Figure A.1. Consequently, ‖(1 − P)(a∗ − x)‖ = ‖a∗ − x∗ + x′ − x‖ ≥ ‖a∗ − x∗‖, which implies
\[
\|a - x\| \ge \sqrt{\|a^* - x^*\|^2 + \varepsilon^2} - \delta = \|a^* - x^*\| + \frac{\varepsilon^2}{2\|a^* - x^*\|} - \delta + O(\varepsilon^4).
\]
Note that h is a continuous function with limx→x∗ h(x) = 0. Hence, as δ → 0 the maximum height maxz∈Pδ(x∗) h(z) tends to zero as well. Moreover, ε and ‖a∗ − x∗‖ are independent of δ. All this implies that we can choose δ > 0 sufficiently small so that
\[
\frac{\varepsilon^2}{2\|a^* - x^*\|} - \delta + O(\varepsilon^4) \ge 2\delta + \max_{z \in P_\delta(x^*)} h(z).
\]
It follows that for boundary points the distance ‖a − x‖ exceeds the bound (A.3), so that the minimizer of da must be contained in the interior of N(x∗,y∗).
The foregoing shows that on Bδ(a∗) ∩ Aa∗, the map ΠI ◦ ΠRn−1 equals the nonlinear projection PIx∗ onto the manifold Ix∗. Putting everything together, we obtain
\[
\rho_{(a^*, x^*, y^*)}(a) = (\Pi_O \circ \Pi_{R^n}^{-1})(a) = (\pi_O \circ \pi_I^{-1} \circ \Pi_I \circ \Pi_{R^n}^{-1})(a) = (\pi_O \circ \pi_I^{-1} \circ P_{I_{x^*}})(a),
\]
the second equality following from (A.2). The condition number of this smooth map is given by (1.2). Comparing with Theorems 5.2 and 6.2 concludes the proof.

REFERENCES

[1] T. J. Abatzoglou, The minimum norm projection on C² manifolds in Rn , Trans. Amer.


Math. Soc., 243 (1978), pp. 115–122.
[2] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds,
Princeton University Press, 2008.
[3] P.-A. Absil, R. Mahony, and J. Trumpf, An extrinsic look at the Riemannian Hessian, in
Geometric Science of Information, Lecture Notes in Computer Science, Springer, Berlin,
Heidelberg, 2013.
[4] D. Amelunxen and P. Bürgisser, A coordinate-free condition number for convex program-
ming, SIAM J. Optim., 22 (2012), pp. 1029–1041.
[5] J.-P. Aubin and H. Frankowska, Set-Valued Analysis, Birkhäuser Boston, reprint of the
1990 ed., 2009.
[6] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, eds., Templates for the
Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, PA,
2000.
[7] D. Bau and L. N. Trefethen, Numerical Linear Algebra, SIAM, 1997.
[8] A. Belloni and R. M. Freund, A geometric analysis of Renegar’s condition number, and its
interplay with conic curvature, Math. Program., Ser. A, 119 (2009), pp. 95–107.
[9] T. Bendory, Y. C. Eldar, and N. Boumal, Non-convex phase retrieval from STFT mea-
surements, IEEE Trans. Inform. Theory, 64 (2018), pp. 467–484.
[10] L. Blum, F. Cucker, M. Shub, and S. Smale, Complexity and Real Computation, Springer-
Verlag, New York, 1998.
[11] N. Boumal, Riemannian trust regions with finite-difference Hessian approximations are glob-
ally convergent, in Geometric Science of Information, F. Nielsen and F. Barbaresco, eds.,
vol. 2, 2015, pp. 467–475.
[12] N. Boumal, Nonconvex phase synchronization, SIAM J. Optim., 26 (2016), pp. 2355–2377.
[13] N. Boumal, An introduction to optimization on smooth manifolds. Available online, 2020.
[14] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre, Manopt, a Matlab toolbox for
optimization on manifolds, J. Mach. Learn. Res., 15 (2014), pp. 1455–1459.
[15] P. Breiding and N. Vannieuwenhoven, Convergence analysis of Riemannian Gauss–Newton
methods and its connection with the geometric condition number, Appl. Math. Letters, 78
(2018), pp. 42–50.
[16] , A Riemannian trust region method for the canonical tensor rank approximation prob-
lem, SIAM J. Optim., 28 (2018), pp. 2435–2465.
[17] P. Bürgisser and F. Cucker, Condition: The Geometry of Numerical Algorithms, Springer,
Heidelberg, 2013.
[18] N. Chernov and H. Ma, Least squares fitting of quadratic curves and surfaces, in Computer
Vision, S. Yoshida, ed., Nova Science Publishers, 2011.
[19] D. Cheung and F. Cucker, A new condition number for linear programming, Math. Program.,
91 (2001), pp. 163–174.
[20] D. Cheung, F. Cucker, and J. Peña, A condition number for multifold conic systems, SIAM
J. Optim., 19 (2008), pp. 261–280.
[21] A. Coulibaly and J.-P. Crouzeix, Condition numbers and error bounds in convex program-
ming, Math. Program., 116 (2009), pp. 79–113.
[22] F. Cucker and J. Peña, A primal-dual algorithm for solving polyhedral conic systems with a
finite-precision machine, SIAM J. Optim., 12 (2001), pp. 522–554.
[23] C. Da Silva and F. J. Herrmann, Optimization on the hierarchical Tucker manifold – appli-
cations to tensor completion, Linear Algebra Appl., 481 (2015), pp. 131–173.
[24] J. W. Demmel, The geometry of ill-conditioning, J. Complexity, 3 (1987), pp. 201–229.
[25] J. W. Demmel, On condition numbers and the distance to the nearest ill-posed problem, Numer. Math., 51 (1987), pp. 251–289.
[26] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, 1997.
[27] M. do Carmo, Riemannian Geometry, Birkhäuser, 1993.
[28] A. L. Dontchev and R. T. Rockafellar, Implicit Functions and Solution Mappings: A View from Variational Analysis, Springer Monographs in Mathematics, Springer, 2009.
[29] J. Draisma, E. Horobet, G. Ottaviani, B. Sturmfels, and R. Thomas, The Euclidean
distance degree of an algebraic variety, Found. Comput. Math., 16 (2016), pp. 99–149.
[30] M. Epelman and R. M. Freund, Condition number complexity of an elementary algorithm
for computing a reliable solution of a conic linear system, Math. Program., 88 (2000),
pp. 451–485.
[31] M. Epelman and R. M. Freund, A new condition measure, preconditioners, and relations between different measures of conditioning for conic linear systems, SIAM J. Optim., 12 (2002), pp. 627–655.
[32] O. Faugeras and Q.-T. Luong, The Geometry of Multiple Images, MIT Press, Cambridge, MA, 2001. The laws that govern the formation of multiple images of a scene and some of their applications, with contributions from Théo Papadopoulo.
[33] F. Feppon and P. F. J. Lermusiaux, The extrinsic geometry of dynamical systems tracking
nonlinear matrix projections, SIAM J. Matrix Anal. Appl., 40 (2019), pp. 814–844.
[34] R. M. Freund and F. Ordóñez, On an extension of condition number theory to nonconic
convex optimization, Math. Oper. Res., 30 (2005), pp. 173–194.
[35] R. M. Freund and J. R. Vera, Condition-based complexity of convex optimization in conic linear form via the ellipsoid algorithm, SIAM J. Optim., 10 (1999), pp. 155–176.
[36] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press,
4 ed., 2013.
[37] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge
University Press, Cambridge, second ed., 2003. With a foreword by Olivier Faugeras.
[38] G. Heidel and V. Schulz, A Riemannian trust-region method for low-rank tensor completion,
Numer. Linear Algebra Appl., 25 (2018).
[39] A. Heyden and K. Åström, Algebraic properties of multilinear constraints, Math. Meth. Appl.
Sci., 20 (1997), pp. 1135–1162.
[40] N. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, PA, 1996.
[41] M. W. Hirsch, Differential Topology, no. 33 in Graduate Text in Mathematics, Springer-Verlag,
1976.
[42] S. Holtz, T. Rohwedder, and R. Schneider, On manifolds of tensors of fixed TT-rank,
Numer. Math., 120 (2012), pp. 701–731.
[43] J. Humpherys, T. J. Jarvis, and E. J. Evans, Foundations of Applied Mathematics, SIAM,
2017.
[44] D. Kressner, M. Steinlechner, and B. Vandereycken, Low-rank tensor completion by
Riemannian optimization, BIT Numer. Math., 54 (2014), pp. 447–468.
[45] Z. Kúkelová, Algebraic Methods in Computer Vision, PhD thesis, Czech Technical University
in Prague, 2013.
[46] J. M. Lee, Riemannian Manifolds: Introduction to Curvature, Springer-Verlag, 1997.
[47] J. M. Lee, Introduction to Smooth Manifolds, Springer, New York, USA, 2 ed., 2013.
[48] Y. Liu and W. Wang, A revisit to least squares orthogonal distance fitting of parametric curves
and surfaces, in Advances in Geometric Modeling and Processing, F. Chen and B. Jüttler,
eds., Lecture Notes in Computer Science, 2008.
[49] MATLAB, R2017b. Natick, Massachusetts, 2017.
[50] S. Maybank, Theory of Reconstruction from Image Motion, vol. 28 of Springer Series in In-
formation Sciences, Springer-Verlag, Berlin, 1993.
[51] B. O’Neill, Semi-Riemannian Geometry, Academic Press, 1983.
[52] B. O'Neill, Elementary Differential Geometry, Elsevier, revised second ed., 2001.
[53] J. Peña and V. Roshchina, A data-independent distance to infeasibility for linear conic sys-
tems, SIAM J. Optim., 30 (2020), pp. 1049–1066.
[54] P. Petersen, Riemannian Geometry, vol. 171 of Graduate Texts in Mathematics, Springer,
New York, second ed., 2006.
[55] I. R. Porteous, The normal singularities of a submanifold, J. Differ. Geom., 5 (1971).
[56] J. Renegar, Incorporating condition measures into the complexity theory of linear program-
ming, SIAM J. Optim., 5 (1995), pp. 506–524.
[57] J. Renegar, Linear programming, complexity theory and elementary functional analysis, Math. Program., 70 (1995), pp. 279–351.
[58] J. R. Rice, A theory of condition, SIAM J. Numer. Anal., 3 (1966), pp. 287–310.
[59] M. Spivak, A Comprehensive Introduction to Differential Geometry, vol. 2, Publish or Perish,
Inc., 1999.
[60] M. Steinlechner, Riemannian optimization for high-dimensional tensor completion, SIAM
J. Sci. Comput., 38 (2016), pp. S461–S484.
[61] A. Sung-Joon, Geometric fitting of parametric curves and surfaces, J. Inf. Process. Syst., 4
(2008), pp. 153–158.
[62] R. Thom, Sur la théorie des enveloppes, J. Math. Pures Appl., 41 (1962), pp. 177–192.
[63] B. Vandereycken, Low-rank matrix completion by Riemannian optimization, SIAM J. Optim.,
23 (2013), pp. 1214–1236.
[64] J. R. Vera, Ill-posedness and the complexity of deciding existence of solutions to linear pro-
grams, SIAM J. Optim., 6 (1996), pp. 549–569.
[65] J. R. Vera, Geometric measures of convex sets and bounds on problem sensitivity and robustness for conic linear optimization, Math. Program., 147 (2014), pp. 47–79.
[66] Visual Geometry Group, University of Oxford, Model house data set, last accessed 28 September 2020. Available online at https://www.robots.ox.ac.uk/~vgg/data/data-mview.html.
[67] H. Weyl, On the volume of tubes, Amer. J. Math., 61 (1939), pp. 461–472.
[68] J. H. Wilkinson, Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood Cliffs,
New Jersey, 1963.
[69] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, Amen House, London, United Kingdom, 1965.
[70] H. Woźniakowski, Numerical stability for solving nonlinear equations, Numer. Math., 27
(1976/77), pp. 373–390.
