The semantic mismatch between query and document terms – i.e., the semantic gap – is a long-standing problem in Information Retrieval (IR). Two main linguistic features related to the semantic gap that can be exploited to improve retrieval are synonymy and polysemy. Recent works integrate knowledge from curated external resources into the learning process of neural language models to reduce the effect of the semantic gap. However, these knowledge-enhanced language models have been used in IR mostly for re-ranking. We propose the Semantic-Aware Neural Framework for IR (SAFIR), an unsupervised knowledge-enhanced neural framework explicitly tailored for IR. SAFIR jointly learns word, concept, and document representations from scratch. The learned representations encode both polysemy and synonymy to address the semantic gap. We investigate SAFIR application in the medical domain, where the semantic gap is prominent and there are many specialized and manually curated knowledge resources. The evaluation on shared test collections for medical retrieval shows the effectiveness of SAFIR to address the semantic gap.

SAFIR: A semantic-aware neural framework for IR

Agosti M.;Marchesin S.;Silvello G.
2021

Abstract

The semantic mismatch between query and document terms – i.e., the semantic gap – is a long-standing problem in Information Retrieval (IR). Two main linguistic features related to the semantic gap that can be exploited to improve retrieval are synonymy and polysemy. Recent works integrate knowledge from curated external resources into the learning process of neural language models to reduce the effect of the semantic gap. However, these knowledge-enhanced language models have been used in IR mostly for re-ranking. We propose the Semantic-Aware Neural Framework for IR (SAFIR), an unsupervised knowledge-enhanced neural framework explicitly tailored for IR. SAFIR jointly learns word, concept, and document representations from scratch. The learned representations encode both polysemy and synonymy to address the semantic gap. We investigate SAFIR application in the medical domain, where the semantic gap is prominent and there are many specialized and manually curated knowledge resources. The evaluation on shared test collections for medical retrieval shows the effectiveness of SAFIR to address the semantic gap.
2021
Proceedings of the 11th Italian Information Retrieval Workshop, September 13–15, 2021, Bari, Italy
IIR 2021
File in questo prodotto:
File Dimensione Formato  
paper13.pdf

accesso aperto

Tipologia: Published (Publisher's Version of Record)
Licenza: Creative commons
Dimensione 197.15 kB
Formato Adobe PDF
197.15 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3508123
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact