THE MULTI–FACILITY LOCATION PROBLEM: A DECOMPOSITION METHOD

CEM IYIGUN AND ADI BEN-ISRAEL
Abstract. A generalized Weiszfeld method is proposed for the multi–facility location problem. The problem is relaxed using probabilistic assignments, and is decomposed into single facility location problems that are coupled by these assignments and can be solved in parallel. The probabilistic assignments are updated at each iteration, using the distances to the current centers. The method thus iterates between assignment and center updates, with the probabilistic assignments gradually approaching the pure assignments to the computed centers. The method also provides intrinsic criteria for the quality of the solution, and for the optimal number of facilities to serve the given customers. A duality theorem allows verifying the optimality of the cluster centers by solving a dual problem. Numerical experience with several problems from the literature is presented.
1. Introduction
The Fermat–Weber (or single facility) location problem is to locate a facility that optimally serves a set of customers, given by their locations and weights, in the sense of minimizing the weighted sum of distances traveled by the customers, see [14] and [16]. A well-known method for solving the problem is the Weiszfeld method [44], a gradient method that expresses and updates the sought center as a convex combination of the data points.
The multi–facility location problem (MFLP) is to locate a (given) number of facilities to serve the customers as above. Each customer is assigned to a single facility, and the problem (also called location–allocation) is to determine the optimal locations of the facilities, as well as the optimal assignments of customers (assignment is not an issue in the single facility case).
In some MFLP cases the facility locations are constrained to lie in a given subset of the plane, in
particular a given subset of the customers locations. This constrained, or discrete, MFLP model was
first considered by Hakimi [20], and is often solved by mixed integer programming, see [39].
We study here the continuous, or unconstrained, MFLP model, where the facilities can be anywhere
in the plane. We also assume that the facilities can handle all the customers assigned to them. The
capacitated problem, where some facilities may have limited capacities, see, e.g., [45], will be treated
elsewhere using the results of [23].
The continuous MFLP is a clustering problem, where customers and facility locations correspond
to data points and centers, and the clustering criterion is the sum of distances from each data point to its
cluster center, to be minimized. We use interchangeably center = facility location, and cluster = the
set of customers assigned to a facility.
This paper solves the continuous MFLP using the probabilistic clustering approach of [4], and is a
continuation of [26]. We approximate the MFLP (which is NP-hard, [37]) by replacing rigid assignments
with probabilistic assignments, as in [5] and [12]. The resulting iteration is a natural generalization of the
Weiszfeld method [44] to several facilities.
The plan of the paper is as follows. Section 2 describes the multi–facility location problem. The special
case of a single facility is discussed in Section 3.
Section 4 introduces probabilistic assignments, with cluster membership probabilities that depend on
the distances involved. Section 5 introduces the probabilistic location problem (P.K), an approximation
of MFLP. The updates of the centers of (P.K) are explained in Section 6.
A generalized Weiszfeld algorithm for solving (P.K) is presented in Section 9. The cluster membership
probabilities are used in Section 10 to chart the territories of the facilities, a Voronoi diagram of the centers.
The gradient of the objective is modified in Section 11 to guarantee its existence everywhere. The modified
gradient is used to characterize optimality and to prove convergence, Theorem 1.
Section 12 gives a duality theory for (P.K), based on the geometric duality of Kuhn ([29], [30]). A
duality theorem, Theorem 2, allows verifying the optimality of any feasible solution (centers and their
assignments) by solving a dual problem.
Section 13 recalls two useful concepts from probabilistic clustering, the joint distance function (JDF)
and the classification uncertainty function (CUF), and applies them to the problem (P.K).
Section 14 presents numerical results from our method applied to several problems in the literature.
A proof of Theorem 2 is given in Appendix A. Appendix B shows the connection between the CUF
(introduced in Section 13) and the Kullback-Leibler entropic distance. The probabilistic model used here
is related to Luce’s choice axiom [36] in Appendix C.
Notation: We denote the index set $\{1, 2, \dots, K\}$ by $1{:}K$. For a vector $x = (x_j) \in \mathbb{R}^n$, $\|x\|$ denotes the Euclidean norm, $\|x\| := \left( \sum_{j=1}^{n} |x_j|^2 \right)^{1/2}$. The distance between the vectors $x$ and $y$ is denoted $d(x, y)$. The Euclidean distance
$$d(x, y) = \|x - y\| \eqno(1)$$
is used throughout.
2. The multi–facility location problem

Consider the data
$$\{X, W, K\}, \quad \text{where } X = \{x_i : i \in 1{:}N\} \subset \mathbb{R}^n,\; W = \{w_i > 0 : i \in 1{:}N\},\; \text{and } K \text{ is the number of facilities}. \eqno(2)$$
Typically, the points $\{x_i\}$ are the locations of customers¹, and the weights $\{w_i\}$ are their demands.
¹Most location problems are planar, i.e., $n = 2$, but the results below hold for general $n$, and are stated for $\mathbb{R}^n$.
The multi–facility location problem (MFLP), defined next, uses the data (2), and is to locate $K$ facilities, and assign customers to these facilities, so as to minimize the sum of weighted distances
$$\min_{c_1, c_2, \dots, c_K}\; \sum_{k=1}^{K} \sum_{x_i \in X_k} w_i\, d(x_i, c_k), \eqno{(L.K)}$$
where $\{c_k : k \in 1{:}K\}$ are the centers (facility locations), and $X_k$ is the cluster of customers assigned to the $k$-th facility. For $K = 1$ this reduces to the Fermat–Weber location problem, see § 3, where assignment is absent. The other extreme case, $K = N$ (every point is a center), is of no interest.
The MFLP (L.K) can be broken into two problems:
(a) the assignment problem: given K centers {ck }, assign each of the N points {xi } to the nearest
center, and
(b) the location problem: given the K clusters Xk formed in (a), find the center of each cluster (using,
say, the Weiszfeld method.)
A natural heuristic ([11], [33]) for solving (L.K) is to iterate between these two problems: the clusters
are updated (assignment), requiring updating the centers (location), then the points are re-assigned, etc.
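To make the heuristic concrete, here is a minimal sketch in Python (assuming Euclidean distances and the data conventions above; it is not the Algorithm 1 developed later in the paper). The initialization, iteration counts, and test data are illustrative assumptions.

```python
import numpy as np

def weiszfeld_step(X, w, c, eps=1e-12):
    # One Weiszfeld update: a convex combination of the points with
    # coefficients proportional to w_i / ||x_i - c||.
    d = np.maximum(np.linalg.norm(X - c, axis=1), eps)
    lam = w / d
    return (lam[:, None] * X).sum(axis=0) / lam.sum()

def alternating_heuristic(X, w, K, outer=50, inner=20, seed=0):
    # Iterate between (a) assignment to the nearest center and
    # (b) recentering each cluster by Weiszfeld steps.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(outer):
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = D.argmin(axis=1)                  # (a) assignment
        for k in range(K):                         # (b) location
            if (labels == k).any():
                for _ in range(inner):
                    C[k] = weiszfeld_step(X[labels == k], w[labels == k], C[k])
    d_near = np.linalg.norm(X - C[labels], axis=1)
    return C, labels, (w * d_near).sum()

# The four corners of the unit square (Example 1 below), unit weights:
X = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
C, labels, obj = alternating_heuristic(X, np.ones(4), K=2)
print(labels, round(obj, 4))   # may stop at a stable, non-optimal solution
```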
A solution {(Xk , ck ) : k ∈ 1:K} is stable if it cannot be improved by re–assigning any point xi to
another cluster. The heuristic method described above may terminate in a non–optimal stable solution,
declaring it to be optimal, as illustrated in Example 1.
Example 1. Given N = 4 points, x1 = (0, 0), x2 = (1, 0), x3 = (1, 1), x4 = (0, 1), and all weights wi = 1,
consider the problem (L.2), and three of its solutions:
Solution 1: Each cluster is a pair of adjacent points, see e.g. Fig. 1(a). The centers may lie anywhere
on the segments joining the points, and the value of the objective function is 2.
Solution 2: Unequal clusters with centers in opposite points, for example, c1 = (0, 0), c2 = (1, 1), see,
e.g., Fig. 1(b). The value of the objective is 2.
Solution 3: Unequal clusters, see, e.g., Fig. 1(b), with centers $c_1 = \left( \frac{1}{3+\sqrt{3}}, \frac{1}{3+\sqrt{3}} \right)$, $c_2 = (1, 1)$, and the value of the objective is $\tfrac{1}{2}(\sqrt{2}+\sqrt{6}) \approx 1.9318$.
Solution 3 is optimal, but Solutions 1 and 2 are stable: for example, in Solution 1, the point x2 is not
closer to the center of X1 than to the center of X2 , and should not be re–assigned (to get a better solution)
on the basis of distance information alone.
A fourth solution is where each cluster is a pair of opposite points, the centers are anywhere on the diagonals, and the objective function value is $2\sqrt{2}$. This solution is unstable, except in the (uninteresting) case where the two centers coincide at $(\tfrac{1}{2}, \tfrac{1}{2})$.
If a simple example like this has stable solutions that are not optimal, one would expect many such
solutions in bigger problems, say hundreds of customers and several facilities.
Figure 1. The points $x_1, x_2, x_3, x_4$ of Example 1: (a) a solution with clusters of adjacent points (Solution 1); (b) the unequal clusters of Solutions 2 and 3.
3. The single facility location problem

Given $\{X, W, 1\}$ as above, the Fermat–Weber location problem is to find a point $c \in \mathbb{R}^n$ minimizing the sum of weighted distances,
$$\min_{c \in \mathbb{R}^n}\; \sum_{i=1}^{N} w_i\, d(x_i, c), \eqno{(L.1)}$$
see [17], [35], [45] and their references.
If the points $\{x_i\}$ are not collinear, as is assumed throughout, the objective function
$$f(c) = \sum_{i=1}^{N} w_i\, d(x_i, c) \eqno(3)$$
is strictly convex, and the optimal center $c^*$, if not in $X$, is characterized by $\nabla f(c^*) = 0$, expressing it as a convex combination of the points $x_i$,
$$c^* = \sum_{i=1}^{N} \lambda_i^*\, x_i, \quad \text{with coefficients } \lambda_i^* = \frac{w_i / \|x_i - c^*\|}{\sum_{m=1}^{N} w_m / \|x_m - c^*\|} \text{ that depend on } c^*. \eqno(4)$$
The Weiszfeld method computes the optimal center as the limit of the iterations
$$c^{+} := T(c), \eqno(5)$$
where, for $c \notin X$,
$$T(c) := \sum_{i=1}^{N} \lambda_i(c)\, x_i, \quad \lambda_i(c) = \frac{w_i / \|x_i - c\|}{\sum_{m=1}^{N} w_m / \|x_m - c\|}. \eqno(6)$$
In order to extend $\nabla f(c)$ to all $c$, Kuhn [30] modified its definition as follows: $\nabla f(c) := -R(c)$, where
$$R(c) := \begin{cases} -\nabla f(c), & \text{if } c \notin X; \\[4pt] \max\{0,\, \|R^j\| - w_j\}\, \dfrac{R^j}{\|R^j\|}, & \text{if } c = x_j \in X, \end{cases} \eqno(7)$$
and
$$R^j := \sum_{i \neq j} \frac{w_i}{\|x_i - x_j\|}\,(x_i - x_j) \eqno(8)$$
is the resultant force of $N - 1$ forces of magnitude $w_i$ and direction $x_i - x_j$, $i \neq j$.
3.1. The results of Kuhn, [30]. The following properties of the mappings $R(\cdot)$, $T(\cdot)$, and the optimal center $c^*$ were proved by Kuhn [30]. By “$= c^*$” we mean “is optimal”.
(a) $c = c^* \iff R(c) = 0$.
(b) $c^* \in \operatorname{conv} X$ (the convex hull of $X$).
(c) If $c = c^*$ then $T(c) = c$. Conversely, if $c \notin X$ and $T(c) = c$, then $c = c^*$.
(d) If $T(c) \neq c$ then $f(T(c)) < f(c)$.
(e) $x_j = c^* \iff w_j \geq \|R^j\|$.
(f) If $x_j \neq c^*$, the direction of steepest descent of $f$ at $x_j$ is $R^j / \|R^j\|$.
(g) If $x_j \neq c^*$, there exists $\delta > 0$ such that
$$0 < \|c - x_j\| \leq \delta \implies \|T^s(c) - x_j\| > \delta \text{ for some } s.$$
(h) $$\lim_{c \to x_j} \frac{\|T(c) - x_j\|}{\|c - x_j\|} = \frac{\|R^j\|}{w_j}.$$
(i) For any $c^0$, if no $c^r := T^r(c^0) \in X$, then $\lim_{r \to \infty} c^r = c^*$.
These results are generalized in Theorem 1 to the case of several facilities.
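For instance, property (e) gives a simple numerical optimality test at a data point: compute $R^j$ from (8) and compare its norm with $w_j$. A sketch (with made-up data):

```python
import numpy as np

def resultant(X, w, j):
    # R^j of (8): resultant of the N-1 unit-direction forces of
    # magnitude w_i pulling x_j towards the other points.
    diffs = np.delete(X, j, axis=0) - X[j]
    dists = np.linalg.norm(diffs, axis=1)
    return (np.delete(w, j)[:, None] * diffs / dists[:, None]).sum(axis=0)

# A heavy customer among light ones (illustrative data):
X = np.array([[0., 0.], [1., 0.], [0., 1.], [-1., -1.]])
w = np.array([5., 1., 1., 1.])
Rj = resultant(X, w, j=0)
# By (e): x_0 is the optimal center iff w_0 >= ||R^0||.
print(np.linalg.norm(Rj), "<=", w[0], ":", np.linalg.norm(Rj) <= w[0])
```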
Kuhn [30] also claimed that, for all but denumerably many initial points, no iterate of (5) lands in $X$; this claim was refuted by Chandrasekaran and Tamir [9]. Convergence can be assured by modifying the algorithm (5)–(6) at a sticky, non–optimal, center that coincides with a data point $x_j$. Balas and Yu [3] suggested moving from $x_j$ in the direction $R^j$ (8) of steepest descent, assuring a decrease of the objective, and non–return to $x_j$ by (g) above. Vardi and Zhang [43] guaranteed leaving $x_j$ by adding to the objective function a quadratic term in the distances to the other data points.
a quadratic term in the distances to the other data points. Convergence was also addressed by Brimberg
[7], Drezner [15], Eckhardt [18], Katz [28] and others.
For $1 < K < N$, the problem (L.K) is NP-hard, [37]. It can be solved in time polynomial in $N$ for $K = 2$, see [13], and possibly for other given $K$.
4. Cluster membership probabilities

We relax the assignment problem in (L.K) by using probabilistic assignments, see § 5. First we require the concept of cluster membership probabilities, $p_k(x)$ = the probability that the point $x$ belongs to the cluster $X_k$, $k \in 1{:}K$, assumed to depend only on the distances $\{d(x, c_k) : k \in 1{:}K\}$ of the point $x$ from the $K$ centers. A reasonable assumption is³
$$w\, p_k(x)\, d(x, c_k) = D(x), \quad k \in 1{:}K, \eqno(9)$$
where $w$ is the weight of $x$, and $D(\cdot)$ is a function of $x$ that does not depend on $k$. $D(x)$ is called the joint distance function (JDF) at $x$.
Equations (9) are optimality conditions for the extremum problem
$$\min\left\{ w \sum_{k=1}^{K} p_k(x)^2\, d(x, c_k) \;:\; \sum_{k=1}^{K} p_k(x) = 1,\; p_k(x) > 0,\; k \in 1{:}K \right\} \eqno{(E)}$$
in the probabilities {p1 (x), · · · , pK (x)}. The squares of probabilities in the objective of (E) serve to smooth
the underlying non–smooth problem (L.K), see the seminal paper by Teboulle [41].
Since probabilities add to one, we get from (9)
$$p_k(x) = \frac{\prod_{j \neq k} d(x, c_j)}{\sum_{\ell=1}^{K} \prod_{m \neq \ell} d(x, c_m)}, \quad k \in 1{:}K, \eqno(10)$$
and the JDF
$$D(x) = \frac{w \prod_{j=1}^{K} d(x, c_j)}{\sum_{\ell=1}^{K} \prod_{m \neq \ell} d(x, c_m)} = \frac{w}{\sum_{j=1}^{K} 1/d(x, c_j)}\,. \eqno(11)$$

³There are other ways to model Assumption (A), e.g. [4], but the simple model (9) works well enough for our purposes.
Given the probabilistic assignments for each $x_i \in X$, we denote the set of $KN$ numbers $\{q_k(x_i)\}$ by
$$Q(X) := \{q_k(x_i) : k \in 1{:}K,\; i \in 1{:}N\}. \eqno(14)$$
Probabilistic assignments are assumed to have the following property: if a data point $x_j$ coincides with a center $c_k$, i.e. if $d(x_j, c_k) = 0$, then $q(x_j)$ is a pure assignment,
$$q_k(x_j) = 1, \quad q_\ell(x_j) = 0 \text{ for } \ell \neq k. \eqno(15)$$
For the cluster membership probabilities $\{p_k(x)\}$ this is assured by (10).
5. The probabilistic location problem

We use a probabilistic approximation of the problem (L.K) in the form
$$\min_{\substack{\{c_1, \dots, c_K\} \\ \{\{q_1(x_i), \dots, q_K(x_i)\} \,:\, i \in 1:N\}}}\; \sum_{k=1}^{K} \sum_{i=1}^{N} w_i\, q_k(x_i)\, d(x_i, c_k), \eqno{(P.K)}$$
with two sets of variables, the centers $\{c_k\}$ and the probabilistic assignments $\{q_k(x_i)\}$, that are updated iteratively. The problem (P.K) uses the same data, (2), as problem (L.K).
For any centers and probabilistic assignments, the objective function of (P.K) is an upper bound on the optimal value of (L.K),
$$\sum_{k=1}^{K} \sum_{i=1}^{N} w_i\, q_k(x_i)\, d(x_i, c_k) \geq \min\,(L.K), \eqno(16)$$
since for each $x_i$ the assignments $\{q_k(x_i)\}$ sum to one, so the inner sum is a convex combination of distances, each at least the distance to the nearest center. Therefore the optimal value of (P.K) is also an upper bound,
$$\min\,(P.K) \geq \min\,(L.K). \eqno(17)$$
The probabilistic assignments $\{q_k(x)\}$ used below are calculated by raising the cluster membership probabilities $\{p_k(x)\}$ to a power $\nu \geq 1$ and normalizing,
$$q_k(x_i) = \frac{p_k^{\nu}(x_i)}{\sum_{j=1}^{K} p_j^{\nu}(x_i)}, \quad i \in 1{:}N,\; k \in 1{:}K, \eqno{(18a)}$$
or, using (10),
$$q_k(x_i) = \frac{\prod_{j \neq k} d(x_i, c_j)^{\nu}}{\sum_{\ell=1}^{K} \prod_{m \neq \ell} d(x_i, c_m)^{\nu}}\,. \eqno{(18b)}$$
To record the dependence on $\{p_k(x)\}$ and the exponent $\nu$, we denote the probabilistic assignments (18a) by $\{p_k^{(\nu)}(x_i)\}$. In previous work ([4] and [23]–[26]) we used the exponent $\nu = 2$ without renormalization.
For a discussion of the exponent ν see § 8 below.
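In code, (10) and (18a)–(18b) reduce to normalizing reciprocal distances raised to the power $\nu$, since $p_k(x) \propto 1/d(x, c_k)$. A small sketch with made-up centers, illustrating how raising $\nu$ sharpens the assignments:

```python
import numpy as np

def memberships(x, C, nu=1.0, eps=1e-12):
    # nu = 1 gives the membership probabilities (10); nu > 1 gives the
    # assignments (18a)-(18b), since p_k is proportional to 1/d(x, c_k).
    d = np.maximum(np.linalg.norm(C - x, axis=1), eps)
    p = (1.0 / d) ** nu
    return p / p.sum()

x = np.array([1.0, 0.5])
C = np.array([[0., 0.], [3., 0.], [0., 4.]])   # illustrative centers
for nu in (1, 2, 6, 20):
    q = memberships(x, C, nu)
    print(nu, q.round(4), (q * np.linalg.norm(C - x, axis=1)).sum())
# The weighted distance sum decreases as nu grows (cf. Lemma 1 below),
# and the assignment approaches the pure assignment to the nearest center.
```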
5.1. Bounds for (L.K) with given centers. Given a set of K centers {c1 , . . . , cK }, not necessarily
optimal, the minimal value of (L.K) is obtained by assigning each xi to its nearest center,
$$m(c_1, \dots, c_K) = \sum_{i=1}^{N} w_i \min\{d(x_i, c_k) : k \in 1{:}K\}. \eqno(19)$$
This value (not necessarily optimal since the centers are arbitrary) can be bounded as follows.
Corollary 1. Given a set of centers {c1 , . . . , cK }, the corresponding cluster membership probabilities
{pk (xi ) : k ∈ 1:K, i ∈ 1:N } of (10), and the values of the JDF (11), {D(xi ) : i ∈ 1:N }, at the points
{xi : i ∈ 1:N },
$$\sum_{i=1}^{N} D(x_i) \;\leq\; m(c_1, \dots, c_K) \;\leq\; \sum_{i=1}^{N} \frac{D(x_i)}{\sum_{j=1}^{K} p_j^2(x_i)} \eqno{(20a)}$$
$$\leq\; K \sum_{i=1}^{N} D(x_i). \eqno{(20b)}$$
To prove the right inequality, note that for any probabilistic assignments $\{q_k(x_i) : i \in 1{:}N, k \in 1{:}K\}$,
$$\sum_{k=1}^{K} \sum_{i=1}^{N} w_i\, q_k(x_i)\, d(x_i, c_k) \geq m(c_1, \dots, c_K). \eqno(21)$$
In particular, for $q_k(x) = p_k^{(2)}(x)$,
$$\begin{aligned}
\text{LHS}(21) &= \sum_{k=1}^{K} \sum_{i=1}^{N} w_i\, \frac{p_k^2(x_i)}{\sum_{j=1}^{K} p_j^2(x_i)}\, d(x_i, c_k) \\
&= \sum_{i=1}^{N} \frac{w_i}{\sum_{j=1}^{K} p_j^2(x_i)} \sum_{k=1}^{K} p_k(x_i)^2\, d(x_i, c_k) \\
&= \sum_{i=1}^{N} \frac{1}{\sum_{j=1}^{K} p_j^2(x_i)} \left( \sum_{k=1}^{K} p_k(x_i) \right) D(x_i), \quad \text{by (9)}, \\
&= \sum_{i=1}^{N} \frac{D(x_i)}{\sum_{j=1}^{K} p_j^2(x_i)} = \text{RHS}(20a).
\end{aligned}$$
Finally, (20b) follows from $\sum_{j=1}^{K} p_j^2(x_i) \geq \frac{1}{K}$, for all $i$.
Note that the bounds (20a) become tighter as the $\{p_j(x_i)\}$ get closer to pure assignments.
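These bounds are easy to evaluate; a sketch, using the harmonic-mean form of the JDF from (11), with made-up data and centers:

```python
import numpy as np

def check_bounds(X, w, C):
    # Verify sum D(x_i) <= m(c_1,...,c_K) <= sum D(x_i)/sum_j p_j(x_i)^2
    # <= K sum D(x_i), i.e. (19)-(20b).
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)   # N x K
    p = (1.0 / d) / (1.0 / d).sum(axis=1, keepdims=True)        # (10)
    D = w / (1.0 / d).sum(axis=1)                               # (11)
    m = (w * d.min(axis=1)).sum()                               # (19)
    lower, upper = D.sum(), (D / (p ** 2).sum(axis=1)).sum()
    print(lower, "<=", m, "<=", upper, "<=", len(C) * D.sum())

rng = np.random.default_rng(1)
X, w = rng.random((15, 2)) * 10, np.ones(15)
C = rng.random((3, 2)) * 10                    # arbitrary, non-optimal centers
check_bounds(X, w, C)
```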
6. Updates of centers
Given probabilistic assignments $\{q_k(x_i)\}$, the objective function of (P.K) is a separable function of the centers,
$$f(c_1, \dots, c_K) := \sum_{k=1}^{K} f_k(c_k), \eqno{(22a)}$$
where
$$f_k(c) := \sum_{i=1}^{N} w_i\, q_k(x_i)\, \|x_i - c\|, \quad k \in 1{:}K. \eqno{(22b)}$$
The centers problem thus separates into $K$ problems of type (L.1), coupled by the probabilistic assignments $\{q_k(x_i)\}$. If the data points are not collinear, as we assume throughout, the centers problem has a unique optimal solution $\{c_k^* : k \in 1{:}K\}$.
The gradient of (22b), wherever it exists, is
$$\nabla f_k(c) = -\sum_{i=1}^{N} \frac{w_i\, q_k(x_i)}{\|x_i - c\|}\,(x_i - c), \quad k \in 1{:}K. \eqno(23)$$
Zeroing the gradients (23) we get, in analogy with the Weiszfeld method, the optimal centers $\{c_1^*, \dots, c_K^*\}$ as convex combinations of the data points,
$$c_k^* = \sum_{i=1}^{N} \lambda_k^*(x_i)\, x_i, \quad \text{with } \lambda_k^*(x_i) = \frac{w_i\, q_k(x_i)/\|x_i - c_k^*\|}{\sum_{m=1}^{N} w_m\, q_k(x_m)/\|x_m - c_k^*\|}\,. \eqno{(24a,b)}$$
This suggests the iteration maps
$$T_k(c) := \sum_{i=1}^{N} \lambda_{ki}(c)\, x_i, \quad \lambda_{ki}(c) = \frac{w_i\, q_k(x_i)/\|x_i - c\|}{\sum_{m=1}^{N} w_m\, q_k(x_m)/\|x_m - c\|}, \quad \text{for } c \notin X, \eqno{(25a)}$$
$$T_k(x_j) := x_j, \quad \text{for } x_j \in X, \eqno{(25b)}$$
and the center updates
$$c_k^{+} := T_k(c_k), \quad \text{for } k \in 1{:}K. \eqno{(26)}$$
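One pass of the updates (26), as a sketch (here q is an N × K array of assignments, e.g., computed from (18b); the demo values are illustrative):

```python
import numpy as np

def update_centers(X, w, q, C, eps=1e-12):
    # c_k+ = T_k(c_k) of (25a)-(26): a convex combination of the data
    # points with coefficients lambda_ki ~ w_i q_k(x_i) / ||x_i - c_k||.
    C_new = C.copy()
    for k in range(C.shape[0]):
        d = np.maximum(np.linalg.norm(X - C[k], axis=1), eps)
        lam = w * q[:, k] / d
        C_new[k] = (lam[:, None] * X).sum(axis=0) / lam.sum()
    return C_new

X = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
q = np.array([[.9, .1], [.6, .4], [.1, .9], [.4, .6]])   # illustrative
print(update_centers(X, np.ones(4), q, np.array([[.2, .3], [.8, .7]])))
```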
The gradient (23) is undefined (0/0) if c coincides with any of the data points. In order to define the
gradient everywhere, we modify its definition following Kuhn ([29]–[30]), and denote the modified gradient
by −Rk .
If a center $c_k \notin \{x_j : j \in 1{:}N\}$,
$$R_k(c_k) := \sum_{i=1}^{N} \frac{w_i\, q_k(x_i)}{\|x_i - c_k\|}\,(x_i - c_k); \eqno{(27a)}$$
if $c_k$ coincides with a data point $x_j$,
$$R_k(x_j) := \max\{\|R_{jk}\| - w_j\, q_k(x_j),\, 0\}\, \frac{R_{jk}}{\|R_{jk}\|}\,, \eqno{(27b)}$$
where
$$R_{jk} = \sum_{i \neq j} \frac{w_i\, q_k(x_i)}{\|x_i - x_j\|}\,(x_i - x_j), \eqno{(27c)}$$
and $q_k(x_j) = 1$ in (27b), by (15). Therefore, $R_k(x_j) = 0$ if $\|R_{jk}\| \leq w_j$; otherwise, $R_k(x_j)$ is a vector with magnitude $\|R_{jk}\| - w_j$ and direction $R_{jk}$. As in (8), the vector $R_{jk}$ is the resultant of $N - 1$ forces of magnitude $w_i\, q_k(x_i)$ and direction $x_i - x_j$, $i \neq j$.
As is typical of gradient methods, the iterations (26) make big steps at first, approaching their fixed points; then the iterations slow down, and the movement at each iteration is small. The “fast” iterations are few in number; the number of “slow” iterations is determined by the stopping rule.
Most progress towards solving the location problem occurs in the first few iterations. The slow iterations
at the end deal mainly with the assignment problem.
With a low value of $\nu$, such as $\nu = 2$, the distributions $\{p_k^{(\nu)}(x)\}$ may be far from pure assignments, and require rounding to the nearest integer, 0 or 1.
High values of the exponent $\nu$ in (18b), say $\nu > 10$, produce $\{p_k^{(\nu)}(x_i)\}$ that are close to pure assignments. For this reason, high values of $\nu$ are useful in the slow iterations at the end. In contrast, using high values of $\nu$ at the beginning may cause premature convergence to a solution that is not necessarily optimal.
This suggests increasing the exponent $\nu$ at each iteration. We use here the simple update
$$\nu^{+} := \nu + \Delta, \eqno(28)$$
where $\Delta > 0$ is the increment per iteration. If $\nu_0$ is the initial exponent, the $k$-th exponent is thus $\nu_0 + k\Delta$. This amounts to a simultaneous approximation of the location and assignment problems of § 2, concentrating on location first, then switching gradually to assignment.
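Pulling the pieces together — assignments (18b), center updates (26), and the exponent update (28) — the following sketch shows the resulting method. (Algorithm 1 itself is presented in Section 9 and fixes the details; the initialization, the guard against zero distances, and the iteration cap below are assumptions.)

```python
import numpy as np

def generalized_weiszfeld(X, w, K, nu0=5.0, delta=0.5, eps=1e-3,
                          max_iter=1000, seed=0):
    # Iterate between probabilistic assignments and center updates,
    # raising nu by delta each iteration (28); stop when the total
    # center movement sum_k d(c_k+, c_k) falls below eps.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)].astype(float)
    nu = nu0
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), 1e-12)
        r = d / d.min(axis=1, keepdims=True)   # scale to avoid overflow
        q = r ** (-nu)                         # assignments, as in (18b)
        q /= q.sum(axis=1, keepdims=True)
        C_new = np.empty_like(C)
        for k in range(K):                     # center updates (26)
            lam = w * q[:, k] / d[:, k]
            C_new[k] = (lam[:, None] * X).sum(axis=0) / lam.sum()
        move = np.linalg.norm(C_new - C, axis=1).sum()
        C, nu = C_new, nu + delta              # exponent update (28)
        if move < eps:
            break
    return C, q

# Example 1 data: the unit square, K = 2
X = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
C, q = generalized_weiszfeld(X, np.ones(4), K=2, seed=3)
print(C.round(4))
```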
Increasing the exponent $\nu$ decreases the value of the objective function (22), as shown in the following lemma.
Lemma 1. Let $\{c_1, \dots, c_K\}$ be any centers, $x$ any point, $\{d(x, c_k) : k \in 1{:}K\}$ the distances, not all equal, and $\{p_k^{(\nu)}(x) : k \in 1{:}K\}$ the probabilities in (18a). If $\nu_2 > \nu_1 \geq 1$ then
$$\sum_{k=1}^{K} p_k^{(\nu_2)}(x)\, d(x, c_k) < \sum_{k=1}^{K} p_k^{(\nu_1)}(x)\, d(x, c_k), \eqno(29)$$
and
$$\lim_{\nu \to \infty} \sum_{k=1}^{K} p_k^{(\nu)}(x)\, d(x, c_k) = \min\{d(x, c_k) : k \in 1{:}K\}. \eqno(30)$$
Proof. To prove (29), assume that the distances are ordered, $d(x, c_1) \leq d(x, c_2) \leq \cdots \leq d(x, c_K)$, and are not all equal. Raising the exponent from $\nu = \nu_1$ to $\nu = \nu_2$ increases the highest probabilities and decreases the lowest ones; in particular, there is an index $j$ such that
$$p_k^{(\nu_2)}(x) > p_k^{(\nu_1)}(x) \text{ for } k \leq j, \quad \text{and} \quad p_k^{(\nu_2)}(x) \leq p_k^{(\nu_1)}(x) \text{ for } k > j.$$
Let $\alpha_k := p_k^{(\nu_2)}(x) - p_k^{(\nu_1)}(x)$. Then $\alpha_k > 0$ for $k \leq j$, and $\alpha_k \leq 0$ for $k > j$. Also $\sum_{k=1}^{K} \alpha_k = 0$, and therefore
$$\sum_{k=1}^{j} \alpha_k = -\sum_{k=j+1}^{K} \alpha_k = \alpha, \text{ say.}$$
$$\therefore\quad \tfrac{1}{\alpha}\left( \text{LHS}(29) - \text{RHS}(29) \right) = \sum_{k=1}^{j} \left( \tfrac{\alpha_k}{\alpha} \right) d(x, c_k) - \sum_{k=j+1}^{K} \left( -\tfrac{\alpha_k}{\alpha} \right) d(x, c_k) < 0,$$
since $\sum_{k=1}^{j} (\alpha_k/\alpha)\, d(x, c_k)$ is a convex combination of low values, and $\sum_{k=j+1}^{K} (-\alpha_k/\alpha)\, d(x, c_k)$ is a convex combination of higher values.
To prove (30), let $M$ be the index set of minimal distances,
$$M := \{k \in 1{:}K : d(x, c_k) = \min\{d(x, c_\ell) : \ell \in 1{:}K\}\}.$$
Then
$$\lim_{\nu \to \infty} p_i^{(\nu)} = 0, \text{ for } i \notin M, \qquad \lim_{\nu \to \infty} p_i^{(\nu)} = \frac{1}{|M|}, \text{ for } i \in M,$$
where $|M|$ is the number of elements of $M$. Therefore
$$\lim_{\nu \to \infty} \sum_{k=1}^{K} p_k^{(\nu)}(x)\, d(x, c_k) = \sum_{k=1}^{K} \left( \lim_{\nu \to \infty} p_k^{(\nu)} \right) d(x, c_k) = \min\{d(x, c_k) : k \in 1{:}K\}. \qquad \square$$
Remark 2.
(a) Convergence can be established as in the Weiszfeld algorithm, by forcing a move away from non–optimal data points (where the algorithm gets stuck by (25b)), using the ideas of Balas and Yu [3], and Vardi and Zhang [43].
Example 2. The dependence of Algorithm 1 on the exponent ν used in (18b) is illustrated here for the
simple Example 1.
Fig. 2 shows three trajectories, starting at the same initial solution (a pair of close points at the center of the unit square $[0, 1]^2$), using a fixed exponent $\nu$ throughout the iterations, and converging to the three solutions of Example 1. For $\nu = 6$, $\nu = 12$, and $\nu = 18$, the iterates converged to Solutions 2, 3 and 1, respectively.
Table 1 shows the results for 1,000 initial solutions (each a pair of points chosen randomly in $[0, 1]^2$), using different values of $\nu$ (held constant throughout), and a stopping criterion of $\epsilon = 0.0001$. The table shows the percentage of cases ending with Solutions 1, 2 or 3 of Example 1, and the average number of iterations. The non–optimal Solution 2 is present only for small $\nu$, and occurs 100% of the time for $\nu = 6$. The optimal Solution 3 dominates for intermediate values of $\nu$ (it occurs 83% of the time for $\nu = 13$), and is present for all larger $\nu$. The non–optimal Solution 1 dominates for large $\nu$.
10. Territories of the facilities

Each center $c_k$ serves the customers in a territory that is closer to it than to any other center, see, e.g., [21]–[22]. This territory is expressed naturally using the membership probabilities (10).
For any two (distinct) centers $c_j, c_k$, the locus of the points $x$ with $p_j(x) = p_k(x)$ is, by (10), represented by the equation
$$\|x - c_j\| = \|x - c_k\|, \eqno(32)$$
Figure 3. (a) The 15 data points of Problem 9 in Table 2; (b) probability level sets and Voronoi diagram of the centers.
and is thus a hyperplane, the orthogonal bisector of the segment joining $c_j$ and $c_k$. On this hyperplane one is indifferent between the two centers. The portion of the hyperplane (32) where
$$p_j(x) = p_k(x) \geq p_\ell(x), \quad \ell \in 1{:}K, \eqno(33)$$
is either empty, bounded or unbounded. We call it the common boundary of the clusters $X_j$ and $X_k$.
The common boundaries constitute a Voronoi diagram, [2], of the centers. Each Voronoi cell is a polyhedron, and is the territory served best by its facility.
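Charting the territories is straightforward: on a grid, the largest membership probability of (10) selects the same center as the smallest distance, so nearest-center labels trace the Voronoi cells, and near-equality of the two smallest distances locates the common boundaries (32)–(33). A sketch with made-up centers:

```python
import numpy as np

# Territories (Voronoi cells) of three illustrative centers.
C = np.array([[10., 35.], [30., 10.], [45., 40.]])
xs, ys = np.meshgrid(np.linspace(0, 50, 200), np.linspace(0, 50, 200))
P = np.stack([xs, ys], axis=-1)                    # 200 x 200 x 2 grid
d = np.linalg.norm(P[..., None, :] - C, axis=-1)   # distances to centers
territory = d.argmin(axis=-1)                      # Voronoi labels
# Points near a common boundary: two smallest distances almost equal.
two = np.sort(d, axis=-1)
boundary = (two[..., 1] - two[..., 0]) < 0.25
print(territory.shape, boundary.sum())
```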
Example 3. Problem 9 in Table 2, due to Cooper [10], has N = 15 data points, see Figure 3(a), and K = 3
centers. Figure 3(b) shows the optimal centers, found by Algorithm 1, some level sets of the membership
probability for each cluster, and the Voronoi diagram of the centers, partitioning the plane into territories
served by each facility.
Figure 4. (a) The 50 data points of Problem 30 in Table 2; (b) probability level sets and Voronoi diagram of the centers.
Example 4. Figure 4(a) shows the 50 data points of Problem 30 in Table 2 below. This problem, with $K = 5$, was solved by Eilon et al. in [19, p. 83]. Figure 4(b) shows the optimal centers (found by Algorithm 1), some probability level sets for each cluster, and the common boundaries of the territories served by the facilities (two of the common boundaries are empty).
11. Convergence
Algorithm 1 decomposes the problem (P.K) into $K$ problems of type (L.1), coupled by the probabilistic assignments. The convergence proof of Kuhn [29]–[30] (with modifications, e.g., [3], [7] or [43]) can be adapted here.
While the algorithm converges from any initial set of centers $\{c_1, \cdots, c_K\}$, it does not necessarily converge to an optimal solution of the original problem (L.K).
Writing (25a) as
$$T_k(c) = \sum_{i=1}^{N} \lambda_{ki}\, x_i = \sum_{i=1}^{N} \lambda_{ki}\,(x_i - c) + c,$$
we get, from (27a),
$$T_k(c) = c + h_k(c)\, R_k(c), \quad \text{where } h_k(c) := \left( \sum_{m=1}^{N} \frac{w_m\, q_k(x_m)}{\|x_m - c\|} \right)^{-1}. \eqno(35)$$
Theorem 1. Let the data $\{X, W, K\}$, and a set $Q(X)$ of probabilistic assignments (14), be given.
(a) For any set of points $\{c_k : k \in 1{:}K\} \subset \mathbb{R}^n$, the condition⁷
$$R_k(c_k) = 0, \quad k \in 1{:}K, \eqno(36)$$
is necessary and sufficient for the points $\{c_1, \dots, c_K\}$ to minimize $f(c_1, \dots, c_K)$ in (22a).
(b) The optimal centers $\{c_k^* : k \in 1{:}K\}$ are in the convex hull of $X$.
(c) If the point $c$ is an optimal location of the $k$-th facility (for the given $Q(X)$), then it is a fixed point⁸ of $T_k$,
$$T_k(c) = c, \quad \text{for some } k \in 1{:}K. \eqno(37)$$
Conversely, if $c \notin X$ and satisfies (37), then $c$ is optimal.
(d) If $T_k(c) \neq c$, then the function $f_k(\cdot)$ of (22b) satisfies
$$f_k(T_k(c)) < f_k(c). \eqno(38)$$
(e) $x_j = c_k^*$ if and only if $w_j\, q_k(x_j) \geq \|R_{jk}\|$.
(f) If $x_j \neq c_k^*$, the direction of steepest descent of $f_k$ at $x_j$ is $R_{jk}/\|R_{jk}\|$.
(g) If $x_i \neq c_k^*$, there exists $\delta > 0$ such that
$$0 < \|x_i - c\| \leq \delta \;\text{ implies }\; \|x_i - T_k^{s-1}(c)\| \leq \delta \text{ and } \|x_i - T_k^{s}(c)\| > \delta$$
for some positive integer $s$.
(h) $$\lim_{c \to x_j} \frac{\|x_j - T_k(c)\|}{\|x_j - c\|} = \frac{\|R_{jk}\|}{w_j\, q_k(x_j)} \quad \text{for } k \in 1{:}K,\; j \in 1{:}N. \eqno(39)$$
(i) Given any $c_k^0$, $k \in 1{:}K$, define the sequence $\{c_k^r = T_k^r(c_k^0) : r = 1, 2, \dots\}$. If no $c_k^r$ is a data point, then $\lim_{r \to \infty} c_k^r = c_k^*$, for some optimal centers $\{c_1^*, \cdots, c_K^*\}$ (for the approximate problem (P.K)).
This theorem can be proved by adapting the proof of [26, Theorem 1] for probabilistic assignments. We
give a proof in the supplementary material section at the end.
Remark 3.
(a) If the assignments Q(X) are pure, then Theorem 1 reduces to the results (a)–(i) of § 3.1, for each of
the clusters defined by (13).
(b) Theorem 1(d) guarantees a decrease of the function $f_k$ by the iteration (26), but the function $f_k$ on the left side of (38) may be different from the function $f_k$ on the right side of (38) if the probabilistic assignments $\{q_k(x_i)\}$ are updated, as is the case in Algorithm 1. However, Lemma 1 guarantees a decrease if the $\{p_k^{(\nu)}(x_i)\}$ of (18b) are used, with the exponent $\nu$ raised at each iteration.
(c) Theorem 1(g) states that each non–optimal data point has a neighborhood that eventually repulses the
sequence of approximations. Therefore the iterates may not get arbitrarily close to any non–optimal data
point.
⁷Recall the definition of $R_k(\cdot)$ in (27a)–(27b), and note that $R_k(\cdot)$ and (22a)–(22b) use the same (given) probabilistic assignments $Q(X)$.
⁸Recall the definition in (25a)–(25b).
12. Duality

We abbreviate the probabilistic assignments $q_k(x_i)$ by $q_{ki}$, for $k \in 1{:}K$, $i \in 1{:}N$. A dual problem (D) for (P.K) is now given, following [29]. It uses the data
$$\{X, W, K, Q(X)\}, \eqno(40)$$
consisting of the data points $X = \{x_i\}$, the weights $W = \{w_i\}$, the number of facilities $K$, and the probabilistic assignments $Q(X) = \{q_{ki}\}$. The dual variables are $KN$ vectors $\{u_{ki} : k \in 1{:}K, i \in 1{:}N\}$, one for each center and data point. We denote the set of dual variables by $\mathcal{U}$.
The dual problem is:
$$\max\; g(\mathcal{U}) = \sum_{k=1}^{K} \sum_{i=1}^{N} u_{ki} \cdot x_i \eqno{(D)}$$
$$\text{s.t.} \quad \sum_{i=1}^{N} u_{ki} = 0, \quad k \in 1{:}K, \eqno(41)$$
$$\|u_{ki}\| \leq w_i\, q_{ki}, \quad i \in 1{:}N,\; k \in 1{:}K. \eqno(42)$$
Problem (D) is a generalization of the dual problem for the single facility location problem, see also [29],
[17, § 1.1.2], [27] and [42].
Remark 4.
(a) The dual problem (D) uses the same data as (P.K), namely the triple {X, W, K}, and in addition the
probabilistic assignments Q(X).
(b) Unlike problems (L.K) and (P.K) that have locally optimal solutions, the dual problem (D) has a
unique optimal value (for the given data (40)), and its optimal solutions form a closed convex set.
(c) The dual problem (D) has a natural geometric interpretation. Let the weight $w_i$ of the $i$-th customer be split among the facilities, so that the $k$-th facility gets the weight $q_{ki} w_i$. The vector $u_{ki}$ is interpreted as a force with direction $(x_i - c_k)$, and magnitude $\|u_{ki}\| \leq q_{ki} w_i$. The location of the $k$-th facility is then the equilibrium point of these forces, as expressed by the constraint (41). The objective of the problem (D) is
$$\sum_{k=1}^{K} \sum_{i=1}^{N} u_{ki} \cdot x_i = \sum_{k=1}^{K} \sum_{i=1}^{N} u_{ki} \cdot (x_i - c_k), \quad \text{by (41)},$$
and maximizing it is equivalent to minimizing the sum of the terms $u_{ki} \cdot (c_k - x_i)$, representing the work done by the force $u_{ki}$ along $c_k - x_i$.
Theorem 2.
(a) Let the data (40) be given. Then for any set of points $\{c_1, \dots, c_K\}$, and any set of feasible dual variables $\mathcal{U} = \{u_{ki}\}$, the inequality
$$g(\mathcal{U}) \leq f(c_1, \dots, c_K) \eqno(43)$$
holds, where $f(c_1, \dots, c_K)$ is the objective function (22a) of the primal problem (P.K).
(b) Given the data (40), and an optimal solution $\{c_1, \dots, c_K\}$ of the primal problem (P.K) (with the same data), there is a feasible solution $\mathcal{U} = \{u_{ki}\}$ of the dual problem (D) with
$$g(\mathcal{U}) = f(c_1, \dots, c_K). \eqno(44)$$
(c) Let $\mathcal{U}$ be an optimal solution of the dual problem (D). Then there exist $\{c_1, \dots, c_K\}$ such that (44) holds.
Remark 5.
(a) Theorem 2 establishes duality between the centers–problem of (P.K) and (D). Part (a) is weak duality, in the sense that any feasible solution $\mathcal{U}$ of (D) gives a lower bound for the optimal value of (P.K), and conversely, any set of centers $\{c_k\}$ for (P.K) gives an upper bound on the optimal value of (D). Parts (b)–(c) show that there is no duality gap.
(b) If $Q(X)$ are pure assignments, then Theorem 2(b) can be used to check the optimality of the centers for each of the clusters defined by (13). Indeed, if
$$\max\,(D) = f(c_1, \dots, c_K), \eqno(45)$$
then the centers $\{c_1, \cdots, c_K\}$ are optimal for their (not necessarily optimal) clusters, see, e.g., the following example.
Example 5. Consider the 4 points {x1 , · · · , x4 } of Example 1, and the 3 solutions given there (for K = 2),
starting with Solution 3.
Solution 3. Here the 2 clusters are X1 = {x1 , x2 , x4 }, X2 = {x3 }, see Fig. 1(b).
For $k \in 1{:}2$ and $i \in 1{:}4$ let $u_{ki}$ be the dual variables, and let $q_{ki}$ be given by the pure assignments: $q_{ki} = 1$ if $x_i$ is assigned to cluster $X_k$, and 0 otherwise. Then
$$q_{11} = q_{12} = q_{14} = q_{23} = 1, \quad \text{all other } q_{ki} = 0$$
($q_{13} = 0$ since $x_3$ does not belong to $X_1$, etc.), and therefore, by the constraints (42), the dual problem reduces to
$$\begin{aligned}
\max \quad & u_{11} \cdot x_1 + u_{12} \cdot x_2 + u_{14} \cdot x_4 + u_{23} \cdot x_3 = u_{12}[1] + u_{14}[2] + u_{23}[1] + u_{23}[2] \\
\text{s.t.} \quad & u_{11} + u_{12} + u_{14} = 0, \hspace{12em} (46)\\
& u_{23} = 0, \\
& \|u_{ki}\| \leq 1, \quad \text{for } ki = 11, 12, 14 \text{ and } 23,
\end{aligned}$$
where $u_{ki}[j]$ is the $j$-th component of $u_{ki}$, $j = 1, 2$. This problem has the unique solution
$$u_{11} = -\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad u_{12} = \frac{1}{4} \begin{pmatrix} \sqrt{2} + \sqrt{6} \\ \sqrt{2} - \sqrt{6} \end{pmatrix}, \quad u_{14} = \frac{1}{4} \begin{pmatrix} \sqrt{2} - \sqrt{6} \\ \sqrt{2} + \sqrt{6} \end{pmatrix}, \quad u_{23} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
and the optimal value is
$$u_{12}[1] + u_{14}[2] = \tfrac{1}{2}\left( \sqrt{2} + \sqrt{6} \right) \approx 1.931852,$$
Figure 5. Illustration of Example 5: The resultant of the forces u11 , u12 and u14 is zero
which equals the value of Solution 3 in Example 1. By Theorem 2, the centers in Solution 3 are optimal
for their clusters. Figure 5 illustrates that the equilibrium condition (46) is satisfied.
Solution 2. We have the same clusters, pure assignments, and dual problem (D) as in Solution 3, and get the same optimal value of (D), which does not satisfy (45):
$$\max\,(D) \approx 1.9318 < 2 = f(c_1, c_2),$$
and therefore the centers in Solution 2 are not optimal for their clusters.
Solution 1. The dual problem for Solution 1 is different, and its optimal value $g(\mathcal{U})$ satisfies (45), showing that the centers (chosen arbitrarily on the segments joining the paired points) are optimal for their clusters.
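The closed-form dual solution of Solution 3 is easy to check numerically — feasibility (42), the equilibrium condition (46), and the dual value:

```python
import numpy as np

s2, s6 = np.sqrt(2), np.sqrt(6)
u11 = -np.array([1, 1]) / s2
u12 = np.array([s2 + s6, s2 - s6]) / 4
u14 = np.array([s2 - s6, s2 + s6]) / 4
u23 = np.zeros(2)

# Equilibrium (46) and feasibility (42): unit bounds since all w_i q_ki = 1.
print(np.allclose(u11 + u12 + u14, 0))                         # True
print([round(np.linalg.norm(u), 6) for u in (u11, u12, u14)])  # all 1.0
# Dual objective: sum of u_ki . x_i over the assigned pairs.
x1, x2, x3, x4 = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
g = u11 @ x1 + u12 @ x2 + u14 @ x4 + u23 @ x3
print(g, (s2 + s6) / 2)                                        # both 1.931852...
```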
13. The joint distance function and the classification uncertainty function

The JDF $D(x)$ of (11) depends on the centers $\{c_k : k \in 1{:}K\}$, and has the following useful property: for optimal centers, most of the data points are contained in the lower level sets of $D(x)$, see [1], [25]. We call this property contour approximation of the data.
The JDF of the set $X = \{x_i : i \in 1{:}N\}$ is defined as the sum of the values $D(x)$ over $X$,
$$D(X) := \sum_{i=1}^{N} D(x_i). \eqno(47)$$
From the contour approximation property it follows that D(X) is a measure of the proximity of customers
to their respective facilities, and the lower the value of D(X), the better is the set of centers {ck }.
The JDF has the dimension of weighted distance. Normalizing it, we get the dimensionless function
$$E(x) = \frac{K\, D(x)}{w \left( \prod_{j=1}^{K} d(x, c_j) \right)^{1/K}}\,, \eqno(48)$$
with $0/0$ interpreted as zero. $E(x)$ is the harmonic mean of the distances $\{d(x, c_j) : j \in 1{:}K\}$ divided by their geometric mean. It follows that $0 \leq E(x) \leq 1$, with $E(x) = 0$ if any $d(x, c_j) = 0$, i.e. if $x$ is a cluster center, and $E(x) = 1$ if and only if the distances $d(x, c_j)$ (and therefore the probabilities $p_j(x)$) are all equal.
$E(x)$ can be written, using (10)–(11), as the geometric mean of the cluster membership probabilities (up to a constant),
$$E(x) = K \left( \prod_{j=1}^{K} p_j(x) \right)^{1/K}. \eqno(49)$$
In particular, for $K = 2$,
$$E(x) = \frac{2\sqrt{d(x, c_1)\, d(x, c_2)}}{d(x, c_1) + d(x, c_2)} = 2\sqrt{p_1(x)\, p_2(x)}\,. \eqno(50)$$
The function E(x) represents the uncertainty of classifying the point x, with higher [lower] values
of E(x) corresponding to higher [lower] uncertainty, see Appendix B. We call E(x) the classification
uncertainty function, abbreviated CUF, at x.
The CUF of the data set $X = \{x_i : i \in 1{:}N\}$ is defined as
$$E(X) := \frac{1}{N} \sum_{i=1}^{N} E(x_i). \eqno(51)$$
$E(X)$ is a monotone decreasing function of $K$, the number of clusters, decreasing from $E(X) = 1$ (for $K = 1$) to $E(X) = 0$ (for $K = N$, the trivial case where every data point is a separate cluster, with no uncertainty). Indeed, increasing the number of centers from $K$ to $K + 1$ allows more customers access to closer centers, with increased probabilities for these centers.
The “right” number $K$ of facilities to serve the given customers is, in general, determined by economic considerations (operating costs, etc.). An intrinsic criterion for determining the optimal $K$ is provided by the rate of decrease of $E(X)$, see e.g. Figure 6(d).
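A sketch of this criterion on made-up data: compute E(X) from (49) and (51) for K = 1, 2, . . . and look for the K past which the decrease flattens. (For brevity, the centers here come from a crude nearest-center/means iteration standing in for Algorithm 1.)

```python
import numpy as np

def cuf(X, C, eps=1e-12):
    # E(X) of (51): mean over the data of E(x) = K (prod_j p_j(x))^(1/K),
    # the normalized geometric mean of the memberships (49).
    d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), eps)
    p = (1.0 / d) / (1.0 / d).sum(axis=1, keepdims=True)
    E = len(C) * np.exp(np.log(p).mean(axis=1))   # stable geometric mean
    return E.mean()

# Three well-separated blobs (illustrative data):
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, (30, 2)) for m in ([0, 0], [6, 0], [3, 5])])
for K in range(1, 7):
    C = X[rng.choice(len(X), K, replace=False)]
    for _ in range(50):
        labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        C = np.array([X[labels == k].mean(axis=0) if (labels == k).any() else C[k]
                      for k in range(K)])
    print(K, round(cuf(X, C), 3))   # expect a sharp drop up to K = 3
```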
Example 6. Figure 6(a) shows the dataset of Example 3. This problem was solved for K = 3, and
Figure 6(b) shows the centers, and some level sets of the JDF (11), illustrating the contour approximation
property, namely that lower contours of the JDF capture most of the data.
Figure 6(c) shows some level sets of the CUF (49), with darker shades indicating higher uncertainty.
Uncertainty is minimal (E(x) = 0) if x is one of the centers. Note the patch of maximal uncertainty
(E(x) ≥ 0.99) in the middle, where one is indifferent between the three centers9.
The “right” number of clusters (if not known a priori) can be determined by solving the problem (P.K)
for values of K = 1, 2, · · · , calculating the CUF of the whole dataset (51), and stopping when the marginal
decrease of uncertainty is negligible. The results are plotted in Figure 6(d), showing that most of the uncertainty is reduced by $K = 3$, with a marginal reduction of uncertainty for higher values of $K$. If the data is amorphous, with no clear number of clusters, then the graph of $E(X)$ does not give a clue.
The “right” number of clusters can be determined similarly from the JDF $D(X)$, (47), computed for $K = 1, 2, \cdots$, see [24].

⁹In analogy with Lagrange points in astronomy, where the gravity pulls from several stars cancel.
Figure 6. (a) The dataset of Example 3; (b) the centers and some level sets of the JDF; (c) level sets of the CUF; (d) rate of decrease of E(X) as a function of K.
14. Numerical results

Tables 2–5 below present results for the 31 test problems considered in [6]. For each of these problems, Table 2 gives the source (references in the bibliography), the number $N$ of customers, the number $K$ of facilities, and the coordinates of the $N$ customers.

Table 2 (problems 8–31). Source, $N$, $K$, and customer coordinates.

 8   [40]         15   2
 9   [11], [40]   15   3    (5, 9) (5, 25) (5, 48) (13, 4) (12, 19)
10   [40]         15   4    (13, 39) (28, 37) (21, 45) (25, 50) (31, 9)
11   [40]         15   5    (39, 2) (39, 16) (45, 22) (41, 30) (49, 31)
12   [40]         15   6
13   [40]         20   2
14   [40]         20   3    same as problem 8, plus
15   [40]         20   4    (53, 8) (1, 34) (33, 8) (3, 26) (17, 9)
16   [40]         20   5
17   [40]         20   6
18   [40]         25   2
19   [40]         25   3    same as problem 13, plus
20   [40]         25   4    (53, 20) (24, 17) (40, 22) (22, 41) (7, 13)
21   [40]         25   5
22   [40]         25   6
23   [40]         30   2
24   [40]         30   3    same as problem 18, plus
25   [40]         30   4    (5, 17) (39, 3) (50, 50) (16, 40) (22, 45)
26   [40]         30   5
27   [19]         50   2    (1.33, 8.89) (1.89, 0.77) (9.27, 1.49) (9.46, 9.36) (9.20, 8.69)
28   [19]         50   3    (7.43, 1.61) (6.08, 1.34) (5.57, 4.60) (6.70, 2.77) (8.99, 2.45)
29   [19]         50   4    (8.93, 7.00) (8.60, 0.53) (4.01, 0.31) (3.34, 4.01) (6.75, 5.57)
30   [19]         50   5    (7.36, 4.03) (1.24, 6.69) (3.13, 1.92) (8.86, 8.74) (4.18, 3.74)
31   [38]         50   10   (2.22, 4.35) (0.88, 7.02) (8.53, 7.04) (6.49, 6.22) (4.53, 7.87)
                            (4.46, 7.91) (2.83, 9.88) (3.39, 5.65) (0.75, 4.98) (7.55, 5.79)
                            (8.45, 0.69) (3.33, 5.78) (6.27, 3.66) (7.31, 1.61) (6.37, 7.02)
                            (7.23, 7.05) (1.68, 6.45) (3.54, 7.06) (7.67, 4.17) (2.20, 1.12)
                            (3.57, 1.99) (7.34, 1.38) (6.58, 4.49) (5.00, 9.00) (6.63, 5.23)
                            (5.89, 8.06) (1.13, 5.25) (1.90, 8.35) (1.74, 1.37) (9.37, 6.44)
Each problem was solved 500 times, using random initializations. The results, for $\nu_0 = 5$, $\Delta = 0.5$ and $\epsilon = 0.001$, are reported in Table 3.
Column 2 (“Best known f ”) gives the best values of the objective in the literature ([6], [8]), which are assumed optimal.
Column 3 (“Best f ”) gives the best values obtained with Algorithm 1. For all 31 problems, Algorithm 1 obtained the optimal values. In problems 27–31, marked by “†” in Column 3, Algorithm 1 gave somewhat better results than the best known values, perhaps because of more precise cluster centers. In problems 19,
20 and 25, marked by a “*” in Column 3, getting the correct third decimal required a tolerance $\epsilon = 0.0001$.
Column 4 (“Frequency (%)”) gives the percentage of the runs in which Algorithm 1 gave the optimal solution. In problem 3 the best value was found in all runs.
Columns 5 and 6 give, respectively, the worst values found (“Max. f Value”), and their frequency (“Frequency (%)”).
Columns 7 and 8 give, respectively, the average values (“Average f Value”) of all 500 solutions obtained, and the average number of iterations.
(Algorithm 1 was tested using a program written in MATLAB R2010a; all experiments were conducted on a machine with a 2.53 GHz Intel Core 2 Duo processor.)
Table 4 reports the same information as Table 3, for $\Delta = 0.1$ and $\epsilon = 0.0001$. The 2nd column of Table 3 is not repeated.
Remark 6. A comparison of Table 3 (with $\Delta = 0.5$, $\epsilon = 0.001$) and Table 4 ($\Delta = 0.1$, $\epsilon = 0.0001$) shows that the smaller increment $\Delta$ requires more iterations, but also increases the frequency of optimal values. To prevent early termination, a small tolerance $\epsilon$ is suggested in conjunction with the small $\Delta$. This calls for further research, including other selections of the exponent $\nu$, in particular an adaptive control where the exponent $\nu$ is updated according to the change $\sum_{k=1}^{K} d(c_k^{+}, c_k)$ in Step 4 of Algorithm 1.
Example 7. This example uses data from Bongartz et al. [6, p. 308], consisting of 287 points, with different weights ranging from $w = 1$ to $w = 698$. The data points, shown in Fig. 7(a), are distinguished according to their weights. Customers with weight $< 20$ (called “minor customers”) are represented by dots. The squares show the “major customers”: the white squares correspond to customers with weights $20 \leq w < 100$, and the gray squares to those with weights $\geq 100$.
The problem was solved for K = 2, . . . , 5, 10 using 200 random initializations for each K. The results
are shown in Table 5. The optimal values in the 2nd column are in agreement with [8]. The 3rd and 5th
columns give, respectively, the best and average values found in 200 random initializations. The errors, in
columns 4 and 6, are relative to the optimal value. The last column gives the average number of iterations.
Example 8. To illustrate the robustness of Algorithm 1, see Remark 2(c), we solved Example 7 again just
for the 61 major customers. The same center locations were found in both cases. The solution for K = 3
is shown in Fig. 7(b), together with the major customers.
References
[1] M. Arav, Contour approximation of data and the harmonic mean, J. Math. Inequalities 2(2008), 161–167
[2] F. Aurenhammer, Voronoi diagrams - a survey of a fundamental geometric data structure, ACM Computing Surveys
23(1991), 345–405
[3] E. Balas and C-S. Yu, A note on the Weiszfeld–Kuhn algorithm for the general Fermat problem, Tech. Report, Carnegie–Mellon University, September 1982
[4] A. Ben–Israel and C. Iyigun, Probabilistic distance clustering, J. Classification 25(2008), 5–26
[5] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, New York, 1981
[6] I. Bongartz, P.H. Calamai and A.R. Conn, A projection method for ℓp norm location–allocation problems, Math. Programming 66(1994), 283–312
[7] J. Brimberg, Further notes on convergence of the Weiszfeld algorithm, Yugoslav J. Operations Research 13(2003), 199–206
[8] J. Brimberg, P. Hansen, N. Mladenović and E.D. Taillard, Improvements and comparison of heuristics for solving the
uncapacitated multisource Weber problem, Operations Research 48(2000), 444–460
[9] R. Chandrasekaran and A. Tamir, Open questions concerning Weiszfeld’s algorithm for the Fermat–Weber location
problem, Math. Programming 44(1989), 293–295
[10] L. Cooper, Location–allocation problems, Operations Research 11(1963), 331–343.
[11] L. Cooper, Heuristic methods for location–allocation problems, SIAM Review 6(1964), 37–53
[12] T. Drezner, Z. Drezner, The gravity p–median model, European J. Oper. Res. 179(2007), 1239–1251.
[13] Z. Drezner, The planar two–center and two–median problems, Transportation Science 18(1984), 351–361.
[14] Z. Drezner (editor), Location Decisions, Annals Oper. Res. 40(1992)
[15] Z. Drezner, A note on the Weber location problem, pp. 153–161 in [14]
[16] Z. Drezner and H.W. Hamacher (editors), Facility Location: Applications and Theory, Springer, 2002
Figure 7. (a) The 287 data points of Example 7, distinguished by weight; (b) the solution for K = 3, shown with the major customers.
[17] Z. Drezner, K. Klamroth, A. Schöbel and G.O. Wesolowsky, The Weber problem, Chapter 1 in [16]
[18] U. Eckhardt, Weber's problem and Weiszfeld's algorithm in general spaces, Math. Program. 18(1980), 186–196
[19] S. Eilon, C.D.T. Watson–Gandy and N. Christofides, Distribution Management: Mathematical Modelling and Practical
Analysis, Hafner, New York, 1971
[20] S.L. Hakimi, Optimum distribution of switching centers in a communication network and some related graph theoretic problems, Operations Research 13(1965), 462–475
[21] D.L. Huff, Defining and estimating a trading area, J. Marketing 28(1964), 34–38
[22] D.L. Huff, A programmed solution for approximating an optimum retail location, Land Economics 42(1966), 293–303.
[23] C. Iyigun and A. Ben–Israel, Probabilistic distance clustering adjusted for cluster size, Probability Engrg. Info. Sci.
22(2008), 1–19
[24] C. Iyigun and A. Ben–Israel, Probabilistic Distance Clustering, Algorithm and Applications, Chapter 2, Clustering
Challenges in Biological Networks, S. Butenko, W.A. Chaovalitwongse and P.M. Pardalos (Editors), World Scientific,
Singapore, 2009, ISBN-13 978-981-277-165-0
[25] C. Iyigun and A. Ben–Israel, Contour approximation of data: a duality theory, Lin. Algeb. and Appl. 430(2009), 2771–
2780
[26] C. Iyigun and A. Ben–Israel, A generalized Weiszfeld method for the multi–facility location problem, Operations Research
Letters 38(2010), 207–214
[27] W. Kaplan and W.H. Yang, Duality theorem for a generalized Fermat–Weber problem, Math. Programming 76(1997),
285–297
[28] I.N. Katz, Local convergence in Fermat’s problem, Math. Program. 6(1974), 89–104
[29] H.W. Kuhn, On a pair of dual nonlinear programs, in J. Abadie (ed.), Methods of Nonlinear Programming, Amsterdam,
North-Holland, (1967), 38–54
[30] H.W. Kuhn, A note on Fermat’s problem, Math. Programming 4(1973), 98–107
[31] S. Kullback. Information Theory and Statistics. J. Wiley, New York, 1959.
[32] S. Kullback and R.A. Leibler. On information and sufficiency. Annals Math. Statist., 22(1951), 79–86
[33] Y. Levin and A. Ben–Israel, A heuristic method for large-scale multi-facility location problems, Computers and Oper.
Res. 31(2004), 257–272
[34] R.F. Love and H. Juel, Properties and solution methods for large location–allocation problems, J. Operational Research
Society 33(1982), 443–452
[35] R.F. Love, J.G. Morris and G.O. Wesolowsky, Facilities Location: Models and Methods, North-Holland, 1988
[36] R.D. Luce, Individual Choice Behavior, Wiley, New York, 1959
[37] N. Megiddo and K.J. Supowit, On the complexity of some common geometric location problems, SIAM Journal on
Computing 13(1984), 182–196
[38] B.A. Murtagh and S.R. Niwattisyawong, An efficient method for the multi–depot location–allocation problem, Journal
of Operational Research Society 33 (1982) 629–634.
[39] C. ReVelle and R. Swain, Central facilities location, Geographical Analysis 2(1970), 30–42
[40] K.E. Rosing, An optimal method for solving the multi-Weber problems, European Journal of Operational Research 58
(1992) 414–426.
[41] M. Teboulle, A unified continuous optimization framework for center–based clustering methods, J. Machine Learning
8(2007), 65–102
[42] H. Üster and R.F. Love, Duality in constrained multi–facility location models, Naval Research Logistics Quart. 49(2002),
410–421
[43] Y. Vardi and C-H Zhang, A modified Weiszfeld algorithm for the Fermat–Weber location problem, Math. Programming
A90(2001), 559–566
[44] E. Weiszfeld, Sur le point par lequel la somme des distances de n points donnés est minimum, Tohoku Math. J. 43 (1937),
355–386
[45] G.O. Wesolowsky, The Weber problem: its history and perspectives, Location Science 1(1993) 5–23
[46] J.I. Yellott, Jr., Luce’s Choice Axiom, pp. 9094–9097 in International Encyclopedia of the Social & Behavioral Sciences
(N.J. Smelser and P.B. Baltes, editors), ISBN 0-08-043076-7, 2001.
Appendix A. Proof of Theorem 2

Finally, from (A4) and (A5) it follows that (A2) and (A3) are satisfied as equalities, proving (44).
Case 2. A center coincides with one of the data points, say
$$c_k = x_j, \quad \text{for some } k \in 1{:}K,\; j \in 1{:}N, \eqno{(A6)}$$
in which case (15) holds. Define
$$u_{ki} := \frac{w_i\, q_{ki}}{\|x_i - x_j\|}\,(x_i - x_j), \quad \text{for } i \neq j, \eqno{(A7a)}$$
$$u_{kj} := -\sum_{i \neq j} u_{ki}. \eqno{(A7b)}$$
Then $\sum_i u_{ki} = 0$ by definition, and $\|u_{ki}\| = w_i\, q_{ki}$ for all $i \neq j$, as in (A5). Next,
reproducing (A7a)–(A7b), and equality in (44) follows as in the proof of Theorem 2(b), Case 2.
Appendix C. Luce's choice axiom

The meaning of (C2b) is that the probabilities $p_k(x)$ do not depend on the scale of measurement, i.e., the function $f$ is homogeneous of degree 0. It follows that the probabilities $p_k(x)$ depend only on the ratios of the distances $\{d_k(x) : k \in 1{:}K\}$, as is assumed in what follows.
The symmetry of f , expressed by (C2c), guarantees that the probabilities {pk (x)} do not depend on the numbering
of the clusters.
For any nonempty subset $S \subset 1{:}K$, let
$$p_S(x) = \sum_{s \in S} p_s(x),$$
the probability that $x$ belongs to one of the clusters $\{X_s : s \in S\}$, and let $p_k(x|S)$ denote the conditional probability that $x$ belongs to the cluster $X_k$, given that it belongs to one of the clusters $\{X_s : s \in S\}$.
Since the probabilities $p_k(x)$ depend only on the ratios of the distances $\{d_k(x) : k \in 1{:}K\}$, and these ratios are unchanged in subsets $S$ of the index set $1{:}K$, it follows that for all $k$ and $S \subset 1{:}K$ with $k \in S$,
$$p_k(x) = p_k(x|S)\, p_S(x), \eqno{(C3)}$$
which is the choice axiom of Luce, [36, Axiom 1], and therefore, [46],
$$p_k(x|S) = \frac{v_k(x)}{\sum_{s \in S} v_s(x)}\,, \eqno{(C4)}$$
where $v_k(x)$ is a scale function; in particular,
$$p_k(x) = \frac{v_k(x)}{\sum_{s \in 1:K} v_s(x)}\,. \eqno{(C5)}$$
Assuming $v_k(x) \neq 0$ for all $k$, it follows that
$$p_k(x)\, v_k(x)^{-1} = \frac{1}{\sum_{s \in 1:K} v_s(x)}\,, \eqno{(C6)}$$
where the right hand side is a function of $x$, and does not depend on $k$.
Property (C2a) implies that the function $v_k(x)$ decreases as $d_k(x)$ increases. A simple choice is
$$v_k(x) = \frac{1}{w\, d_k(x)}\,, \eqno{(C7)}$$
for which (C6) gives
$$w\, p_k(x)\, d_k(x) = \frac{1}{\sum_{s \in 1:K} \dfrac{1}{d_s(x)}}\,, \eqno{(C8)}$$
in agreement with (9).
Appendix D. Proof of Theorem 1

The proof in [26, Appendix A] is adapted below for the case of probabilistic assignments (18a).
Proof of part (a). If $c_k$ is not one of the data points, then $-R_k(c_k)$ is the gradient (23) at $c_k$, and (36) is both necessary and sufficient for a minimum, by the convexity of $f_k$.
If $c_k$ coincides with a data point $x_j$, consider the change from $x_j$ to $x_j + t z$, where $\|z\| = 1$. Then
$$\frac{d}{dt} f_k(x_j + t z)\Big|_{t=0} = w_j\, q_k(x_j) - R_{jk} \cdot z. \eqno{(D1)}$$
The greatest decrease of $f_k$ is along $R_{jk}$, i.e., when
$$z = \frac{R_{jk}}{\|R_{jk}\|}\,, \quad \text{(proving part (f))}.$$
Proof of part (d). For a given $c$, define the weights
$$\theta_{ki} := \frac{w_i\, q_k(x_i)}{\|x_i - c\|}\,, \quad i \in 1{:}N, \eqno{(D2)}$$
and the function $g(c') := \sum_{i=1}^{N} \theta_{ki}\, \|x_i - c'\|^2$, whose minimizer is $T_k(c)$. Since $c \neq T_k(c)$,
$$g(T_k(c)) = \sum_{i=1}^{N} \theta_{ki}\, \|x_i - T_k(c)\|^2 < g(c) = \sum_{i=1}^{N} \theta_{ki}\, \|x_i - c\|^2 = f_k(c).$$
On the other hand,
$$g(T_k(c)) = \sum_{i=1}^{N} \theta_{ki} \left[ \|x_i - c\| + \left( \|x_i - T_k(c)\| - \|x_i - c\| \right) \right]^2 = f_k(c) + 2\left( f_k(T_k(c)) - f_k(c) \right) + \sum_{i=1}^{N} \theta_{ki} \left( \|x_i - T_k(c)\| - \|x_i - c\| \right)^2.$$
Combining these results,
$$2 f_k(T_k(c)) + \sum_{i=1}^{N} \theta_{ki} \left( \|x_i - T_k(c)\| - \|x_i - c\| \right)^2 < 2 f_k(c),$$
proving that $f_k(T_k(c)) < f_k(c)$.
Proof of part (e). Follows from part (a), the definition (27b),
$$R_k(x_j) := \max\{\|R_{jk}\| - w_j\, q_k(x_j),\, 0\}\, \frac{R_{jk}}{\|R_{jk}\|}\,,$$
and $q_k(x_j) = 1$ since $x_j = c_k$.
Proof of part (f). This was shown in part (a).
Proof of part (g).
$$T_k(c) - x_i = c + h_k(c)\, R_k(c) - x_i = h_k(c) \sum_{j \neq i} \theta_{kj}\,(x_j - c) + \left( h_k(c)\, \theta_{ki} - 1 \right)(x_i - c),$$
with $h_k(c)$ as in (35) and $\theta_{ki}$ as in (D2). Since $x_i$ is not optimal, we have, from (27c),
$$\left\| \sum_{j \neq i} \theta_{kj}\,(x_j - x_i) \right\| > w_i\, q_k(x_i),$$
from which it follows, for $c$ sufficiently close to $x_i$, that there is an $\epsilon > 0$ with
$$\|x_i - T_k(c)\| \geq (1 + \epsilon)\, \|x_i - c\|.$$
If $\|x_i - T_k(c)\| \leq \delta$, then
$$\|x_i - T_k^2(c)\| > (1 + \epsilon)^2\, \|x_i - c\|, \text{ etc.}$$
Since $\|x_i - c\| > 0$, $(1 + \epsilon)^t\, \|x_i - c\| > \delta$ for some positive integer $t$, and hence
$$\|x_i - T_k^s(c)\| > \delta \text{ for some positive integer } s, \quad \text{and} \quad \|x_i - T_k^{s-1}(c)\| \leq \delta.$$
Proof of part (h). Using (25a),
$$\frac{\lambda_{ki}}{\|x_j - c\|} = \frac{\dfrac{w_i\, q_k(x_i)}{\|x_i - c\|}}{\|x_j - c\| \displaystyle\sum_{m=1}^{N} \frac{w_m\, q_k(x_m)}{\|x_m - c\|}} = \frac{\dfrac{w_i\, q_k(x_i)}{\|x_i - c\|}}{w_j\, q_k(x_j) + \|x_j - c\| \displaystyle\sum_{m \neq j} \frac{w_m\, q_k(x_m)}{\|x_m - c\|}}\,,$$
$$\therefore\quad \lim_{c \to x_j} \frac{\lambda_{ki}}{\|x_j - c\|} = \frac{w_i\, q_k(x_i)}{w_j\, q_k(x_j)\, \|x_i - x_j\|}\,, \quad i \neq j.$$
Finally, by (D3),
$$\lim_{c \to x_j} \frac{\|T_k(c) - x_j\|}{\|x_j - c\|} = \frac{1}{w_j\, q_k(x_j)} \left\| \sum_{i \neq j} w_i\, q_k(x_i)\, \frac{x_i - x_j}{\|x_i - x_j\|} \right\| = \frac{\|R_{jk}\|}{w_j\, q_k(x_j)}\,, \quad \text{by (27c)}.$$
Proof of part (i). With the possible exception of $c_k^0$, the sequence $\{c_k^r\}$ lies in the convex hull of the data points, a compact set. By the Bolzano–Weierstrass theorem, there exists a convergent subsequence $\{c_k^\ell : \ell = 1, 2, \cdots\}$, such that $\lim_{\ell \to \infty} c_k^\ell = c_k^\infty$. We prove that $c_k^\infty = c_k^*$.
If $c_k^{\ell+1} = T_k(c_k^\ell) = c_k^\ell$ for some $\ell$, then the sequence repeats from that point and $c_k^\infty = c_k^\ell$. Since $c_k^\ell$ is not a data point, $c_k^\infty = c_k^*$ by Theorem 1(c).
Otherwise, by Theorem 1(d),
$$f_k(c_k^0) > f_k(c_k^1) > \cdots > f_k(c_k^r) > \cdots \geq f_k(c_k^*). \eqno{(D4)}$$
Hence
$$\lim_{\ell \to \infty} \left[ f_k(c_k^\ell) - f_k(T_k(c_k^\ell)) \right] = 0. \eqno{(D5)}$$
By continuity, we then have
$$f_k(c_k^\infty) - f_k(T_k(c_k^\infty)) = 0. \eqno{(D7)}$$
Therefore, by Theorem 1(d), $c_k^\infty = T_k(c_k^\infty)$. If $c_k^\infty$ is not a data point, then $c_k^\infty = c_k^*$ by Theorem 1(c). In any event, $c_k^\infty$ lies in the finite set of points $\{x_1, \dots, x_N, c_k^*\}$, where $c_k^*$ may be a data point.
The only case that remains is $c_k^\infty = x_j$ for some $j$. If $x_j \neq c_k^*$, we first isolate $x_j$ from the other data points (and from $c_k^*$ if it is not a data point) by a $\delta$–neighborhood that satisfies Theorem 1(g). Then it is clear that we can choose our subsequence $c_k^\ell \to x_j$ such that $\|x_j - T_k(c_k^\ell)\| > \delta$ for all $\ell$. This means that the ratio $\dfrac{\|x_j - T_k(c_k^\ell)\|}{\|x_j - c_k^\ell\|}$ is unbounded. However, this contradicts Theorem 1(h). Hence $x_j = c_k^*$, and the theorem is proved. $\square$
Cem Iyigun, Department of Industrial Engineering, Middle East Technical University, Ankara, Turkey
E-mail address: iyigun@ie.metu.edu.tr
Adi Ben-Israel, RUTCOR–Rutgers Center for Operations Research, Rutgers University, 640 Bartholomew
Rd., Piscataway, NJ 08854-8003, USA
E-mail address: adi.benisrael@gmail.com