This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) allows to choose query sets of a wide range of difficulty for real-world datasets. Moreover, the effect of different LID distributions on the running time performance of implementations is empirically studied. To this end, different visualization concepts are introduced that allow to get a more fine-grained overview of the inner workings of nearest neighbor search principles. The paper closes with remarks about the diversity of datasets commonly used for nearest neighbor search benchmarking. It is shown that such real-world datasets are not diverse: results on a single dataset predict results on all other datasets well.

The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

Ceccarello M.
2019

Abstract

This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) allows to choose query sets of a wide range of difficulty for real-world datasets. Moreover, the effect of different LID distributions on the running time performance of implementations is empirically studied. To this end, different visualization concepts are introduced that allow to get a more fine-grained overview of the inner workings of nearest neighbor search principles. The paper closes with remarks about the diversity of datasets commonly used for nearest neighbor search benchmarking. It is shown that such real-world datasets are not diverse: results on a single dataset predict results on all other datasets well.
2019
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
12th International Conference on Similarity Search and Applications, SISAP 2019
978-3-030-32046-1
978-3-030-32047-8
File in questo prodotto:
File Dimensione Formato  
978-3-030-32047-8_11.pdf

Accesso riservato

Tipologia: Published (Publisher's Version of Record)
Licenza: Accesso privato - non pubblico
Dimensione 1.06 MB
Formato Adobe PDF
1.06 MB Adobe PDF Visualizza/Apri   Richiedi una copia
1907.07387.pdf

accesso aperto

Tipologia: Accepted (AAM - Author's Accepted Manuscript)
Licenza: Creative commons
Dimensione 998.25 kB
Formato Adobe PDF
998.25 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3470333
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 15
  • ???jsp.display-item.citation.isi??? 11
  • OpenAlex 10
social impact