On Finding Hubs in High Dimensions with Sampling

Hubs are a small number of points that frequently appear among the k-nearest neighbors (kNN) of many other points in a high-dimensional data set. Their effect, called the hubness phenomenon, degrades the performance of kNN-based models in high dimensions. We present SamHub, a simple sampling approach that efficiently identifies hubs with theoretical guarantees. Unlike previous work based on approximate kNN indexes, SamHub is generic: it applies to any distance measure and has a negligible additional memory footprint. Empirically, by sampling only 10% of the points, SamHub runs significantly faster and offers higher accuracy than existing hub detection methods on many real-world data sets with dot product, L1, L2, and dynamic time warping distances. Our ablation studies of SamHub on improving kNN-based classification show its potential for other high-dimensional data analysis tasks.
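The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of sampling-based hub detection, not the authors' SamHub implementation. It assumes that hubs are ranked by how often a point appears in the exact kNN lists of a uniformly sampled subset of query points, and that the count is rescaled by n/m to estimate the full k-occurrence. The names `estimate_hubs`, `l2`, `sample_frac`, and `num_hubs` are hypothetical and chosen for illustration.

```python
import heapq
import math
import random
from collections import Counter


def l2(a, b):
    """Euclidean distance; any other distance function can be passed instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_hubs(points, k, sample_frac=0.1, num_hubs=5, dist=l2, seed=0):
    """Return up to num_hubs (index, estimated k-occurrence) pairs.

    Assumption: only a random sample of points is used as queries, and
    each query's exact kNN is found by brute force over all points.
    """
    rng = random.Random(seed)
    n = len(points)
    m = max(1, int(sample_frac * n))      # number of sampled query points
    queries = rng.sample(range(n), m)     # sampling without replacement

    counts = Counter()
    for q in queries:
        # Exact k nearest neighbors of q, excluding q itself.
        neighbors = heapq.nsmallest(
            k,
            (i for i in range(n) if i != q),
            key=lambda i: dist(points[q], points[i]),
        )
        counts.update(neighbors)

    scale = n / m                         # rescale sampled counts to all n queries
    return [(i, c * scale) for i, c in counts.most_common(num_hubs)]
```

Because the distance is passed in as a plain function, the same sketch works with L1, dot-product-based, or dynamic time warping distances, and the only extra memory is the counter of occurrences. With `sample_frac=1.0` it reduces to exact brute-force k-occurrence counting.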


Dong H.; Zeng L.; Zhao Z.; Silvestri F.; Pham N.

2025



Year: 2025
Book title: Proc. 39th Annual AAAI Conference on Artificial Intelligence
Series: PROCEEDINGS OF THE ... AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE
Conference title: Annual AAAI Conference on Artificial Intelligence
DOI: https://dx.doi.org/10.1609/aaai.v39i11.33261
WOS ID: WOS:001477544600055
Scopus ID: 2-s2.0-105003911937
OpenAlex ID: W4409364918
Projects:
	Project title: Big-Mobility; Funding: Uni-Impresa
	Project title: AHeAD; Funding: MUR PRIN; Contract no.: 20174LF3T8
	Project title: Marsden Fund; Contract no.: MFP-UOA2226
Appears in types: 04.01 - Contribution in conference proceedings

Files in this item:

File	Size	Format
33261-Article Text-37329-1-2-20250410 (3).pdf (open access; type: Published (Publisher's Version of Record); license: free access)	540.35 kB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3553501
