Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, being the number of variables much higher than the number of observations. Here, we present a novel approach to clustering of microarray data via nonparametric density estimation, based on the following steps: (i) selection of relevant variables; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Applications on simulated and real data show promising results in comparison with those produced by two standard approaches, k-means and Mclust. In the simulation studies, our nonparametric approach shows performances comparable to those of models based on normality assumption, even in Gaussian settings. On the other hand, in two benchmarking real datasets, it outperforms the existing parametric approaches.
Clustering via nonparametric density estimation: an application to microarray data.
De Bin, Riccardo;Risso, Davide
2010
Abstract
Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, being the number of variables much higher than the number of observations. Here, we present a novel approach to clustering of microarray data via nonparametric density estimation, based on the following steps: (i) selection of relevant variables; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Applications on simulated and real data show promising results in comparison with those produced by two standard approaches, k-means and Mclust. In the simulation studies, our nonparametric approach shows performances comparable to those of models based on normality assumption, even in Gaussian settings. On the other hand, in two benchmarking real datasets, it outperforms the existing parametric approaches.File | Dimensione | Formato | |
---|---|---|---|
2010_3_20100416154030.pdf
accesso aperto
Licenza:
Accesso gratuito
Dimensione
596.81 kB
Formato
Adobe PDF
|
596.81 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.