Given a dataset Vof points from some metric space, a popular robust formulation of the k-center clustering problem requires to select k points (centers) of V which minimize the maximum distance of any point of V from its closest center, excluding the z most distant points (outliers) from the computation of the maximum. In this paper, we focus on an important constrained variant of the robust k-center problem, namely, the Robust Matroid Center (RMC) problem, where the set of returned centers are constrained to be an independent set of a matroid of rank k built on V. Instantiat-ing the problem with the partition matroid yields a formulation of the fair k-center problem, which has attracted the interest of the ML community in recent years. In this paper, we target accurate solutions of the RMC problem under general matroids, when confronted with large inputs. Specifically, we devise a coreset-based algorithm afford-ing efficient sequential, distributed (MapReduce) and streaming implementations. For any fixed e > 0 , the algorithm returns solutions featuring a (3 + e)-approximation ratio, which is a mere additive term e away from the 3-approximations achievable by the best known polynomial-time sequential algorithms. Moreover, the algorithm obliviously adapts to the intrinsic complexity of the dataset, captured by its doubling dimension D. For wide ranges of k, z, e, D , our MapReduce/streaming implementations require two rounds/one pass and substantially sublinear local/work ing memory. The theoretical results are complemented by an extensive set of experiments on real-world datasets, which provide clear evidence of the accuracy and efficiency of our algorithms and of their improved performance with respect to previous solutions.
Scalable and Space-Efficient Robust Matroid Center Algorithms
Ceccarello, M
;Pietracaprina, A;Pucci, G;
2023
Abstract
Given a dataset Vof points from some metric space, a popular robust formulation of the k-center clustering problem requires to select k points (centers) of V which minimize the maximum distance of any point of V from its closest center, excluding the z most distant points (outliers) from the computation of the maximum. In this paper, we focus on an important constrained variant of the robust k-center problem, namely, the Robust Matroid Center (RMC) problem, where the set of returned centers are constrained to be an independent set of a matroid of rank k built on V. Instantiat-ing the problem with the partition matroid yields a formulation of the fair k-center problem, which has attracted the interest of the ML community in recent years. In this paper, we target accurate solutions of the RMC problem under general matroids, when confronted with large inputs. Specifically, we devise a coreset-based algorithm afford-ing efficient sequential, distributed (MapReduce) and streaming implementations. For any fixed e > 0 , the algorithm returns solutions featuring a (3 + e)-approximation ratio, which is a mere additive term e away from the 3-approximations achievable by the best known polynomial-time sequential algorithms. Moreover, the algorithm obliviously adapts to the intrinsic complexity of the dataset, captured by its doubling dimension D. For wide ranges of k, z, e, D , our MapReduce/streaming implementations require two rounds/one pass and substantially sublinear local/work ing memory. The theoretical results are complemented by an extensive set of experiments on real-world datasets, which provide clear evidence of the accuracy and efficiency of our algorithms and of their improved performance with respect to previous solutions.File | Dimensione | Formato | |
---|---|---|---|
scalable-and-space-efficient-matroid-center-algorithms.pdf
accesso aperto
Tipologia:
Published (publisher's version)
Licenza:
Creative commons
Dimensione
3.25 MB
Formato
Adobe PDF
|
3.25 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.