
Query Obfuscation for Information Retrieval Through Differential Privacy

Faggioli G.;Ferro N.
2024

Abstract

Protecting the privacy of a user querying an Information Retrieval (IR) system is of utmost importance. The problem is exacerbated when the IR system is not cooperative in satisfying the user’s privacy requirements. To address this, obfuscation techniques split the user’s sensitive query into multiple non-sensitive ones that can be safely transmitted to the IR system. To generate such queries, current approaches rely on lexical databases, such as WordNet, or heuristics of word co-occurrences. At the same time, advances in Natural Language Processing (NLP) have shown the power of Differential Privacy (DP) in releasing privacy-preserving text for completely different purposes, such as spam detection and sentiment analysis. We investigate for the first time whether DP mechanisms, originally designed for specific NLP tasks, can effectively be used in IR to obfuscate queries, and we assess their performance against state-of-the-art obfuscation techniques in IR. Our empirical evaluation shows that the Vickrey DP mechanism based on the Mahalanobis norm with privacy budget ε ∈ [10, 12.5] achieves state-of-the-art privacy protection and improved effectiveness. Furthermore, unlike previous approaches, which are essentially all-or-nothing, DP lets users tune their desired level of privacy protection by adjusting the privacy budget ε, offering a trade-off between effectiveness and privacy.
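The word-level DP mechanisms evaluated in the paper perturb a query term's embedding with calibrated noise and release the nearest vocabulary word. The sketch below illustrates a Mahalanobis-style metric-DP mechanism of this kind; the toy 2-D embeddings, the vocabulary, the mixing weight `lam`, and the Gamma-distributed noise magnitude are illustrative assumptions, not the paper's exact configuration (in particular, the Vickrey selection step between nearest neighbors is omitted).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D word embeddings (illustrative values only; a real system would use
# pretrained embeddings such as GloVe -- an assumption, not the paper's setup).
vocab = {
    "cancer":   np.array([0.9, 0.1]),
    "disease":  np.array([0.8, 0.2]),
    "illness":  np.array([0.7, 0.3]),
    "weather":  np.array([-0.6, 0.8]),
    "forecast": np.array([-0.5, 0.9]),
}
emb = np.stack(list(vocab.values()))

# Regularized covariance M = lam * Sigma + (1 - lam) * I, shaping the noise
# along the embedding distribution (the Mahalanobis idea); lam is a tunable
# mixing weight chosen here for illustration.
lam = 0.5
sigma = np.cov(emb, rowvar=False)
M = lam * sigma + (1.0 - lam) * np.eye(emb.shape[1])
M_sqrt = np.linalg.cholesky(M)

def mahalanobis_obfuscate(word, epsilon):
    """Perturb the word's embedding with Mahalanobis-shaped noise and
    return the nearest vocabulary word (metric-DP style release)."""
    d = emb.shape[1]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                      # uniform direction on the sphere
    r = rng.gamma(shape=d, scale=1.0 / epsilon) # noise magnitude ~ Gamma(d, 1/eps)
    z = vocab[word] + r * (M_sqrt @ u)
    # Release the vocabulary word closest to the perturbed embedding.
    return min(vocab, key=lambda w: np.linalg.norm(vocab[w] - z))
```

Smaller ε means larger noise and thus more distant (more private, less effective) replacement words; calling the mechanism repeatedly on the same sensitive term yields the multiple non-sensitive queries described in the abstract.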
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
46th European Conference on Information Retrieval, ECIR 2024
ISBN: 9783031560262, 9783031560279
Files for this record: no files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3524130
Citations
  • PMC: ND
  • Scopus: 4
  • Web of Science: 2
  • OpenAlex: ND