# Approximate clustering via core-sets

@inproceedings{Badoiu2002ApproximateCV, title={Approximate clustering via core-sets}, author={Mihai Badoiu and Sariel Har-Peled and Piotr Indyk}, booktitle={STOC '02}, year={2002} }

In this paper, we show that for several clustering problems one can extract a small set of points, so that using those core-sets enable us to perform approximate clustering efficiently. The surprising property of those core-sets is that their size is independent of the dimension.Using those, we present a (1+ ε)-approximation algorithms for the k-center clustering and k-median clustering problems in Euclidean space. The running time of the new algorithms has linear or near linear dependency on… Expand

#### Supplemental Presentations

#### 429 Citations

Clustering in High Dimensions

- Computer Science
- 2003

This thesis shows that, for several clustering problems, a small set of points can be extracted, so that, using these core-sets, the authors can approximate clustering efficiently and develops a (1 + e)-approximation algorithm for the 1-cylinder clustering problem. Expand

Efficient approximation algorithms for clustering point-sets

- Computer Science, Mathematics
- Comput. Geom.
- 2010

An O(m+nlogk)-time 3-approximation algorithm and an O(km)-time (1+3)-approximating algorithm, where m is the total number of input points and k is the number of clusters, for the k-center clustering problem on point-sets. Expand

On coresets for k-means and k-median clustering

- Mathematics, Computer Science
- STOC '04
- 2004

This paper shows the existence of small coresets for the problems of computing k-median/means clustering for points in low dimension, and improves the fastest known algorithms for (1+ε)-approximate k-means and k- median. Expand

Greedy Strategy Works for Clustering with Outliers and Coresets Construction

- Computer Science
- ArXiv
- 2019

It is shown that this greedy strategy actually can handle k-center/median/means clustering with outliers efficiently, in terms of qualities and complexities, and can be applied to speedup the popular density-based clustering approach DBSCAN. Expand

Linear Time Algorithms for Clustering Problems in Any Dimensions

- Mathematics, Computer Science
- ICALP
- 2005

This work generalizes the k-means algorithm and shows that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness), resulting in O(2(k/e)O(1)dn) time (1+e)-approximation algorithms for these problems. Expand

Fast and exact out-of-core k-means clustering

- Computer Science
- Fourth IEEE International Conference on Data Mining (ICDM'04)
- 2004

This paper presents a new algorithm which typically requires only one or a small number of passes on the entire dataset, and provably produces the same cluster centers as reported by the original k-means algorithm. Expand

Fast approximate spectral clustering

- Mathematics, Computer Science
- KDD
- 2009

This work develops a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data, and develops two concrete instances of this framework, one based on local k-means clustering (KASP) and onebased on random projection trees (RASP). Expand

Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

- Computer Science
- ESA
- 2019

It is shown that this greedy strategy actually can handle ordinary clustering with outliers efficiently, in terms of clustering quality and time complexity, and the greedy approach yields small coreset for the problem in doubling metrics, so as to reduce the time complexity significantly. Expand

A Structural Theorem for Center-Based Clustering in High-Dimensional Euclidean Space

- Mathematics, Computer Science
- LOD
- 2019

We prove that, for any finite set X in Euclidean space of any dimension and for any fixed \(\varepsilon >0\), there exists a polynomial-cardinality set of points in this space which can be… Expand

Fast and exact out-of-core and distributed k-means clustering

- Computer Science
- Knowledge and Information Systems
- 2005

This paper presents a new algorithm, called fast and exact k-means clustering (FEKM), which typically requires only one or a small number of passes on the entire dataset and provably produces the same cluster centres as reported by the original k-Means algorithm. Expand

#### References

SHOWING 1-10 OF 20 REFERENCES

Clustering Motion

- Mathematics, Computer Science
- Discret. Comput. Geom.
- 2004

A linear time algorithm is presented for computing a 2-approximation to the k-center clustering of a set of n points in ℝd, that slightly improves the algorithm of Feder and Greene, that runs in Θ(n log k) time (which is optimal in the algebraic decision tree model). Expand

Polynomial time approximation schemes for geometric k-clustering

- Computer Science, Mathematics
- Proceedings 41st Annual Symposium on Foundations of Computer Science
- 2000

This work deals with the problem of clustering data points, and gives polynomial time approximation schemes for this problem in several settings, including the binary cube (0, 1)/sup d/ with Hamming distance, and R/Sup d/ either with L/sup 1/ distance, or with L /sup 2/ distance. Expand

Approximation schemes for Euclidean k-medians and related problems

- Mathematics, Computer Science
- STOC '98
- 1998

An approximation scheme for the plane that for any c > 0 produces a solution of cost at most 1+ 1/c times the optimum and runs in time O(n) and generalizes to some problems related to k-median. Expand

Exact and Approximation Algorithms for Clustering

- Mathematics, Computer Science
- SODA '98
- 1998

An n^ O(k1-1/d) -time algorithm for solving the k -center problem in \realsd , under L∈fty - and L2 -metrics and extends to other metrics, and to the discrete k - center problem. Expand

Sublinear time approximate clustering

- Mathematics, Computer Science
- SODA '01
- 2001

This paper motivates and introduces a new model of clustering that is in the spirit of the “PAC (probably approximately correct)” learning model, and gives examples of efficient PAC-clustering algorithms. Expand

Testing of clustering

- Mathematics, Computer Science
- Proceedings 41st Annual Symposium on Foundations of Computer Science
- 2000

The benefit of the algorithms is that they construct an implicit representation of such clusterings in time independent of |X|, and the implicit representation can be used to answer queries concerning the cluster any given point belongs to. Expand

Clustering to Minimize the Maximum Intercluster Distance

- Computer Science, Mathematics
- Theor. Comput. Sci.
- 1985

An O(kn) approximation algorithm that guarantees solutions with an objective function value within two times the optimal solution value is presented and it is shown that this approximation algorithm succeeds as long as the set of points satisfies the triangular inequality. Expand

Derandomized dimensionality reduction with applications

- Mathematics, Computer Science
- SODA '02
- 2002

A procedure is provided that constructs such a mapping deterministically in time almost linear in the number of distances to preserve times the dimension of the original space and is used to obtain an efficient derandomization of several approximation algorithms based on semidefinite programming. Expand

Reductions among high dimensional proximity problems

- Mathematics, Computer Science
- SODA '01
- 2001

Improved running times for a wide range of approximate high dimensional proximity problems are presented by reduction to Nearest Neighbour queries by obtaining subquadratic running time for each. Expand

Approximate shape fitting via linearization

- Computer Science
- Proceedings 2001 IEEE International Conference on Cluster Computing
- 2001

The authors present a general and unified technique for solving a certain family of shape fitting problems without resorting to a careful and painful case by case analysis, as was previously done for those problems. Expand