GRosSo: Mining statistically robust patterns from a sequence of datasets

Tonon A.; Vandin F.
2020

Abstract

Pattern mining is a fundamental data mining task with applications in several domains. In this work, we consider the scenario in which we have a sequence of datasets generated by potentially different underlying generative processes, and we study the problem of mining statistically robust patterns, which are patterns whose probabilities of appearing in transactions drawn from such generative processes respect well-defined conditions. These conditions define the patterns of interest by describing how their probabilities evolve through the datasets in the sequence: they may, for example, increase, decrease, or stay stable. Due to the stochastic nature of the data, one cannot identify the exact set of statistically robust patterns by analyzing a sequence of samples (i.e., the datasets) taken from the generative processes, and must resort to approximations. We therefore propose GRosSo, an algorithm that finds a rigorous approximation of the statistically robust patterns which, with high probability, contains no false positives. We apply our framework to the mining of statistically robust sequential patterns. Our extensive evaluation on pseudo-artificial and real data shows that GRosSo provides high-quality approximations for this problem.
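To make the notion of a statistically robust pattern more concrete, the following is a minimal sketch of the idea for the "increasing probability" case. It is not GRosSo and does not reproduce the bounds used in the paper, which the abstract does not state. It assumes a fixed candidate set chosen before looking at the data, transactions simplified to plain sequences of items, and a Hoeffding bound with a union bound over all (pattern, dataset) pairs. All function names and parameters (contains, frequency, hoeffding_eps, robustly_increasing, delta) are hypothetical and introduced only for illustration.

```python
import math


def contains(transaction, pattern):
    """True if `pattern` is a (not necessarily contiguous) subsequence of `transaction`."""
    it = iter(transaction)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def frequency(dataset, pattern):
    """Fraction of transactions in `dataset` that contain `pattern`."""
    return sum(contains(t, pattern) for t in dataset) / len(dataset)


def hoeffding_eps(n, num_events, delta):
    """Deviation bound (assumption: Hoeffding plus union bound, not the paper's bound).

    With probability at least 1 - delta, all `num_events` empirical
    frequencies, each computed on n i.i.d. transactions, are within eps
    of the corresponding true probabilities.
    """
    return math.sqrt(math.log(2 * num_events / delta) / (2 * n))


def robustly_increasing(datasets, candidates, delta=0.05):
    """Candidates whose probability provably increases along the sequence.

    A pattern is reported only if the confidence intervals of consecutive
    datasets are strictly separated, so with probability at least 1 - delta
    the output contains no false positives (it may miss true patterns).
    """
    events = len(candidates) * len(datasets)
    eps = [hoeffding_eps(len(d), events, delta) for d in datasets]
    reported = []
    for pattern in candidates:
        freqs = [frequency(d, pattern) for d in datasets]
        if all(
            freqs[i + 1] - eps[i + 1] > freqs[i] + eps[i]
            for i in range(len(datasets) - 1)
        ):
            reported.append(pattern)
    return reported


# Toy usage: two datasets of 1000 transactions each.
first = [("b", "a")] * 1000
second = [("a", "c", "b")] * 1000
candidates = [("a", "b"), ("b", "a")]
result = robustly_increasing([first, second], candidates)
```

The check is deliberately conservative: a pattern whose observed frequencies rise only slightly, or that is observed in small datasets, is not reported, because the intervals overlap. This trades missed patterns for the guarantee on false positives described in the abstract.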
Proceedings - IEEE International Conference on Data Mining, ICDM
978-1-7281-8316-9
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3401587