A robust procedure for Gaussian graphical model search from microarray data with p larger than n

Castelo, R.; Roverato, Alberto

Learning large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations, that is, partial correlations between two variables given all the remaining ones. In the context of microarray data the number of variables exceeds the sample size, which precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, which are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities for deriving the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated on both simulated and real data.
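
The contrast between full-order and limited-order partial correlations can be illustrated with a minimal sketch. It is not the procedure of the paper: the function name `partial_corr`, the dimensions `n`, `p`, `q`, and the simulated data are all assumptions chosen for illustration. It relies only on the standard identity that the partial correlation of two variables given a set Q equals minus the corresponding off-diagonal entry of the inverse covariance matrix of the marginal on those variables, scaled by the diagonal entries.

```python
import numpy as np

def partial_corr(S, i, j, Q):
    """Partial correlation of variables i and j given the variables in Q,
    computed from the covariance matrix S restricted to {i, j} and Q.
    (Illustrative helper; the name is not taken from the paper.)"""
    idx = [i, j] + list(Q)
    # Inverse covariance (concentration) matrix of the marginal distribution.
    K = np.linalg.inv(S[np.ix_(idx, idx)])
    return -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])

# Simulated data with more variables than observations (assumed sizes).
rng = np.random.default_rng(0)
n, p, q = 20, 50, 3
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)

# The p x p sample covariance has rank at most n - 1 < p, so it cannot be
# inverted and sample full-order partial correlations are unavailable.
rank_S = np.linalg.matrix_rank(S)

# A limited-order partial correlation only needs a (q + 2) x (q + 2) marginal,
# which is invertible as long as q + 2 does not exceed n - 1.
Q = [2, 3, 4]
r = partial_corr(S, 0, 1, Q)

# Sanity check on a population covariance where variables 0 and 1 are
# conditionally independent given variable 2 (0.25 = 0.5 * 0.5).
S3 = np.array([[1.0, 0.25, 0.5],
               [0.25, 1.0, 0.5],
               [0.5, 0.5, 1.0]])
r_chain = partial_corr(S3, 0, 1, [2])
```

Here the conditioning set `Q` is fixed by hand. How such sets are chosen, and how the resulting quantities are combined into the non-rejection rate, is the subject of the paper and is not reproduced in this sketch.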