A Framework to Evaluate the Quality of Integrated Datasets

Evaluation is a bottleneck in data integration processes: it is performed by domain experts through onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all integrated tuples infeasible. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how representative one dataset is of another, giving an indication of how good the integration process is. The paper motivates and introduces the measure and provides extensive experimental evaluations that show the effectiveness and the efficiency of the approach.
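To make the idea of a word-frequency-based representativeness score more concrete, here is a minimal sketch in Python. It is not the measure defined in the paper, which this record does not describe. As a stand-in, it uses the overlap (histogram intersection) of the relative word frequencies of two datasets. The function names `word_frequencies` and `representativeness`, the tokenization rule, and the representation of records as dictionaries are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(records):
    """Relative word frequencies over all attribute values of the records.

    Assumption: each record is a dict mapping attribute names to values,
    and words are lowercase runs of alphanumeric characters.
    """
    counts = Counter()
    for record in records:
        for value in record.values():
            counts.update(re.findall(r"\w+", str(value).lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def representativeness(source, integrated):
    """Illustrative score in [0, 1]: overlap of the two word distributions.

    Stand-in for the paper's measure (hypothetical): 1.0 means identical
    relative word frequencies, 0.0 means no shared words.
    """
    p = word_frequencies(source)
    q = word_frequencies(integrated)
    return sum(min(freq, q.get(word, 0.0)) for word, freq in p.items())


if __name__ == "__main__":
    source = [{"name": "apple iphone", "brand": "apple"}]
    integrated = [{"title": "apple iphone", "maker": "samsung"}]
    print(representativeness(source, integrated))
```

Because a score of this kind needs no labeled matches, it can be computed over the whole integrated dataset, which is the property the abstract highlights for settings where inspecting every tuple by hand is infeasible.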

Del Buono, Francesco; Faggioli, Guglielmo; Paganelli, Matteo; Baraldi, Andrea; Guerra, Francesco; Ferro, Nicola

2022

Year: 2022
Journal in which the work is published: APPLIED COMPUTING REVIEW
DOI: https://dx.doi.org/10.1145/3584014.3584015
WOS code: WOS:000927582800001
OpenAlex code: W4319975904
Appears in types: 01.01 - Journal article

Files in this item:

File	Size	Format
3584014.3584015.pdf (open access; Type: Published (Publisher's Version of Record); License: Other)	2.15 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3471919
