MiSoSouP: Mining Interesting Subgroups with Sampling and Pseudodimension

We present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different interestingness measures, from a random sample of a transactional dataset. We describe a new formulation of these measures that makes it possible to approximate them using sampling. We then discuss how pseudodimension, a key concept from statistical learning theory, relates to the sample size needed to obtain a high-quality approximation of the most interesting subgroups. We prove an upper bound on the pseudodimension of the problem at hand, which results in small sample sizes. Our evaluation on real datasets shows that MiSoSouP outperforms state-of-the-art algorithms offering the same guarantees, and that it vastly speeds up the discovery of subgroups compared with analyzing the whole dataset.
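The idea of approximating a subgroup's interestingness from a random sample can be illustrated with a minimal sketch. This is not the MiSoSouP algorithm: it assumes one common interestingness measure (unusualness, taken here as the subgroup's generality times the difference between the target mean inside the subgroup and the overall target mean), and the dataset layout, the function names `unusualness` and `sample_estimate`, and the sample size are all hypothetical choices made for illustration. No sample-size guarantee is implied.

```python
import random


def unusualness(rows, description):
    """Generality times (target mean in the subgroup minus overall target mean).

    `rows` is a list of dicts with a binary "target" key (assumed layout);
    `description` maps attribute names to the values a row must match.
    """
    if not rows:
        return 0.0
    covered = [r for r in rows if all(r[k] == v for k, v in description.items())]
    if not covered:
        return 0.0
    overall_mean = sum(r["target"] for r in rows) / len(rows)
    subgroup_mean = sum(r["target"] for r in covered) / len(covered)
    generality = len(covered) / len(rows)
    return generality * (subgroup_mean - overall_mean)


def sample_estimate(rows, description, sample_size, seed=0):
    """Estimate the measure on a uniform random sample drawn with replacement."""
    rng = random.Random(seed)
    sample = [rng.choice(rows) for _ in range(sample_size)]
    return unusualness(sample, description)


# Toy dataset (hypothetical): attribute "a" is 1 on odd rows; the target is 1
# on three quarters of the rows with a == 1 and on one quarter of the rest.
DATA = [
    {"a": i % 2, "target": 1 if i % 8 in (0, 1, 3, 5) else 0}
    for i in range(10_000)
]

exact = unusualness(DATA, {"a": 1})
approx = sample_estimate(DATA, {"a": 1}, sample_size=2_000)
print(f"exact={exact:.4f} sample-based={approx:.4f}")
```

The sample-based value is computed from a fifth of the rows; how large the sample must be for a given accuracy is the question the pseudodimension bound in the paper addresses.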


M. Riondato; F. Vandin

2018



Year: 2018
Book title: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Conference title: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
DOI: https://dx.doi.org/10.1145/3219819.3219989
WOS code: WOS:000455346400219
Scopus code: 2-s2.0-85051567955
OpenAlex code: W2809301005
Appears in types: 04.01 - Conference proceedings contribution



Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3270081
