Coresets for ordered weighted clustering
V Braverman, SHC Jiang… - … on Machine Learning, 2019 - proceedings.mlr.press
International Conference on Machine Learning, 2019•proceedings.mlr.press
We design coresets for Ordered k-Median, a generalization of classical clustering problems
such as k-Median and k-Center. Its objective function is defined via the Ordered Weighted
Averaging (OWA) paradigm of Yager (1988), where data points are weighted according to a
predefined weight vector, but in order of their contribution to the objective (distance from the
centers). A powerful data-reduction technique, called a coreset, is to summarize a point set $
X $ in $\mathbb {R}^ d $ into a small (weighted) point set $ X'$, such that for every set of $ k …
such as k-Median and k-Center. Its objective function is defined via the Ordered Weighted
Averaging (OWA) paradigm of Yager (1988), where data points are weighted according to a
predefined weight vector, but in order of their contribution to the objective (distance from the
centers). A powerful data-reduction technique, called a coreset, is to summarize a point set $
X $ in $\mathbb {R}^ d $ into a small (weighted) point set $ X'$, such that for every set of $ k …
Abstract
We design coresets for Ordered k-Median, a generalization of classical clustering problems such as k-Median and k-Center. Its objective function is defined via the Ordered Weighted Averaging (OWA) paradigm of Yager (1988), where data points are weighted according to a predefined weight vector, but in order of their contribution to the objective (distance from the centers). A powerful data-reduction technique, called a coreset, is to summarize a point set in into a small (weighted) point set , such that for every set of potential centers, the objective value of the coreset approximates that of within factor . When there are multiple objectives (weights), the above standard coreset might have limited usefulness, whereas in a simultaneous coreset, the above approximation holds for all weights (in addition to all centers). Our main result is a construction of a simultaneous coreset of size for Ordered k-Median. We validate our algorithm on a real geographical data set, and we find our coreset leads to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights.
proceedings.mlr.press