The identification of significant patterns, defined as patterns whose frequency significantly deviates from what is expected under a suitable null model of the data, is a key data mining task with application in several areas. We present PROMISE, an algorithm for identifying significant sequential patterns while guaranteeing that the probability that one or more false discoveries are reported in output (i.e., the Family-Wise Error Rate - FWER) is less than a user-defined threshold. PROMISE employs the Westfall-Young method to correct for multiple hypothesis testing, a more powerful method than the commonly used Bonferroni correction. PROMISE crucially hinges on the generation of (random) permuted datasets with features similar to the input dataset, for which we provide two efficient strategies. We also provide a rigorous analysis of one of such strategies, which is based on a properly defined swap operation, proving a rigorous bound on the number of swaps it requires. The results of our experimental evaluation show that PROMISE is an efficient method that allows the discovery of statistically significant sequential patterns from transactional datasets while properly controlling for false discoveries.
Permutation strategies for mining significant sequential patterns
Tonon A.;Vandin F.
2019
Abstract
The identification of significant patterns, defined as patterns whose frequency significantly deviates from what is expected under a suitable null model of the data, is a key data mining task with application in several areas. We present PROMISE, an algorithm for identifying significant sequential patterns while guaranteeing that the probability that one or more false discoveries are reported in output (i.e., the Family-Wise Error Rate - FWER) is less than a user-defined threshold. PROMISE employs the Westfall-Young method to correct for multiple hypothesis testing, a more powerful method than the commonly used Bonferroni correction. PROMISE crucially hinges on the generation of (random) permuted datasets with features similar to the input dataset, for which we provide two efficient strategies. We also provide a rigorous analysis of one of such strategies, which is based on a properly defined swap operation, proving a rigorous bound on the number of swaps it requires. The results of our experimental evaluation show that PROMISE is an efficient method that allows the discovery of statistically significant sequential patterns from transactional datasets while properly controlling for false discoveries.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.