A Multi-Aspect Permutation Test for Selecting the Optimum Number of Clusters
Ceccato, Riccardo; Barzizza, Elena; Biasetton, Nicolo; Disegna, Marta
2025
Abstract
Cluster analysis is a multivariate statistical technique used across many fields to detect groups of objects based on multivariate distance measures computed between them. Because it is an exploratory analysis, the outcome of a clustering algorithm depends heavily on several critical decisions made by the expert. A fundamental decision, widely discussed in the literature, especially for non-hierarchical clustering algorithms, concerns the identification of the final partition, i.e. the number of clusters. This paper introduces a permutation-based approach that reduces subjectivity in choosing the final partition and helps practitioners address this challenge. Specifically, it adopts an extension of the NonParametric Combination (NPC) methodology based on the idea of multi-aspect tests. The NPC method makes it possible to compare partitions using multiple clustering performance metrics at once, which facilitates the identification of the best final partition. A simulation study comprising four synthetic datasets is presented and discussed to demonstrate the potential of the proposed approach.
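To illustrate the general idea of combining several permutation tests into one, the following is a minimal sketch of a nonparametric combination step. It is not the procedure of the paper, whose details are not given in the abstract. It assumes a paired design in which each replicate yields the difference in several clustering metrics between two candidate partitions, uses sign flips as the permutation scheme, and uses Fisher's combining function, which is only one common choice. The function names (`partial_p_values`, `npc_fisher`, `sign_flip_rows`) and the toy numbers are hypothetical.

```python
import math
import random


def partial_p_values(stats):
    """Permutation p-value of every entry of `stats` within its own distribution.

    `stats[0]` is the observed statistic, the rest come from permutations.
    Larger values count as stronger evidence against the null hypothesis.
    """
    n = len(stats)
    return [sum(1 for s in stats if s >= t) / n for t in stats]


def npc_fisher(stat_rows):
    """Combine partial tests with Fisher's combining function.

    `stat_rows[b][k]` is the statistic of metric k under permutation b,
    with b = 0 holding the observed values.
    Returns (observed partial p-values, global p-value).
    """
    n_rows = len(stat_rows)
    n_metrics = len(stat_rows[0])
    columns = [[row[k] for row in stat_rows] for k in range(n_metrics)]
    p_columns = [partial_p_values(col) for col in columns]
    # Fisher: -2 * sum of log partial p-values, computed for every row.
    # A p-value is never 0 here because each statistic counts itself.
    combined = [
        -2.0 * sum(math.log(p_columns[k][b]) for k in range(n_metrics))
        for b in range(n_rows)
    ]
    global_p = sum(1 for c in combined if c >= combined[0]) / n_rows
    observed_partial = [p_columns[k][0] for k in range(n_metrics)]
    return observed_partial, global_p


def sign_flip_rows(diffs, n_perm, rng):
    """Observed and sign-flipped mean differences (assumed paired design).

    `diffs[i][k]` is the difference in metric k for replicate i. The same
    sign is applied to all metrics of a replicate, so the dependence
    between metrics is preserved across permutations.
    """
    n_metrics = len(diffs[0])

    def mean_stats(signs):
        return [
            sum(s * d[k] for s, d in zip(signs, diffs)) / len(diffs)
            for k in range(n_metrics)
        ]

    rows = [mean_stats([1] * len(diffs))]
    for _ in range(n_perm):
        rows.append(mean_stats([rng.choice((-1, 1)) for _ in diffs]))
    return rows


# Toy input: made-up differences in two metrics over 12 replicates.
rng = random.Random(0)
toy_diffs = [[0.05 + 0.01 * i, 0.8 + 0.1 * i] for i in range(12)]
rows = sign_flip_rows(toy_diffs, n_perm=999, rng=rng)
partial_p, global_p = npc_fisher(rows)
print(partial_p, global_p)
```

Applying the same permutation to every metric is what allows the combined test to account for the dependence between metrics without modelling it explicitly. A small global p-value would suggest that, taken jointly, the metrics favour one partition over the other.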




