A Robustness Assessment of Query Performance Prediction (QPP) Methods Based on Risk-Sensitive Analysis

Faggioli G.; Ferro N.
2025

Abstract

Query Performance Prediction (QPP) estimates the effectiveness of Information Retrieval (IR) systems without relying on manual relevance judgments. A central challenge in QPP is its unstable performance, which can vary significantly across queries. In parallel, risk-sensitive evaluation in IR seeks to enhance robustness by minimizing performance variance and mitigating poor retrieval outcomes. Despite their commonalities and complementarity, existing research has yet to integrate these two perspectives, in particular by applying risk-sensitive metrics to make QPP evaluation more robust. Indeed, current QPP assessments, typically based on correlation measures and the sMARE framework, insufficiently address robustness and can lead to misleading conclusions. This paper proposes a novel risk-sensitive evaluation methodology for assessing QPP robustness. Through an empirical analysis on the Deep Learning'19, Deep Learning'20, and Robust'04 datasets, we demonstrate that high correlation does not necessarily imply robustness. Risk-aware metrics such as URisk, TRisk, and GeoRisk uncover critical variations in QPP performance, offering statistically sound insights with reduced variability. Our findings underscore the value of incorporating risk-sensitive evaluation into QPP, ultimately contributing to the development of more reliable and robust IR systems. Code: https://github.com/RicardoMarcal/qpp-risk-evaluator
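
For readers less familiar with the risk-sensitive metrics named in the abstract, the following minimal Python sketch illustrates URisk and TRisk as commonly defined in the risk-sensitive IR literature (Wang et al., 2012; Dinçer et al., 2014); it is not the paper's implementation. It assumes per-query quality scores for a QPP method and a baseline predictor (for QPP evaluation one might use, e.g., per-query sARE values), and all function and variable names are illustrative.

import numpy as np
from scipy import stats

def urisk(run_scores, baseline_scores, alpha=1.0):
    # URisk: mean per-query delta w.r.t. a baseline, with losses
    # amplified by a factor of (1 + alpha) to penalize risky behavior.
    delta = np.asarray(run_scores) - np.asarray(baseline_scores)
    adjusted = np.where(delta >= 0, delta, (1 + alpha) * delta)
    return adjusted.mean()

def trisk(run_scores, baseline_scores, alpha=1.0):
    # TRisk: one-sample t statistic of the alpha-adjusted deltas,
    # giving URisk an inferential reading (|TRisk| > ~2 is commonly
    # read as significant risk if negative, significant reward if positive).
    delta = np.asarray(run_scores) - np.asarray(baseline_scores)
    adjusted = np.where(delta >= 0, delta, (1 + alpha) * delta)
    t_stat, _ = stats.ttest_1samp(adjusted, popmean=0.0)
    return t_stat

# Hypothetical per-query scores for a QPP method and a baseline predictor.
qpp  = np.array([0.55, 0.40, 0.70, 0.20, 0.65])
base = np.array([0.50, 0.45, 0.60, 0.35, 0.60])
print(urisk(qpp, base, alpha=2.0), trisk(qpp, base, alpha=2.0))

GeoRisk (Dinçer et al., 2016), also named in the abstract, roughly combines a system's overall effectiveness with a normalized win/loss score derived from a chi-squared-style analysis into a single geometric quantity; its exact formulation, and how the paper adapts all three metrics to QPP, are omitted here.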
ICTIR 2025 - Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval
15th International Conference on Innovative Concepts and Theories in Information Retrieval, ICTIR 2025

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3563661
Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: not available
  • OpenAlex: not available