A Robustness Assessment of Query Performance Prediction (QPP) Methods Based on Risk-Sensitive Analysis

Faggioli G.; Ferro N.
2025

Abstract

Query Performance Prediction (QPP) estimates the effectiveness of Information Retrieval (IR) systems without relying on manual relevance judgments. A central challenge in QPP is its unstable performance, which can vary significantly across queries. In parallel, risk-sensitive evaluation in IR seeks to enhance robustness by minimizing performance variance and mitigating poor retrieval outcomes. Despite their commonalities and complementarity, existing research has yet to integrate these two perspectives, in particular by applying risk-sensitive metrics to make QPP evaluation more robust. Indeed, current QPP assessments, typically based on correlation measures and the sMARE framework, insufficiently address robustness and can lead to misleading conclusions. This paper proposes a novel risk-sensitive evaluation methodology for assessing QPP robustness. Through an empirical analysis on the Deep Learning'19, Deep Learning'20, and Robust'04 datasets, we demonstrate that high correlation does not necessarily imply robustness. Risk-aware metrics such as URisk, TRisk, and GeoRisk uncover critical variations in QPP performance, offering statistically sound insights with reduced variability. Our findings underscore the value of incorporating risk-sensitive evaluation into QPP, ultimately contributing to the development of more reliable and robust IR systems. Code: https://github.com/RicardoMarcal/qpp-risk-evaluator
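
For readers less familiar with the risk-sensitive metrics named in the abstract, the following minimal Python sketch illustrates URisk and TRisk as commonly defined in the risk-sensitive IR literature (Wang et al., 2012; Dinçer et al., 2014); it is not the paper's implementation. It assumes per-query quality scores for a QPP method and a baseline predictor (for QPP evaluation one might use, e.g., per-query sARE values), and all function and variable names are illustrative.

import numpy as np
from scipy import stats

def urisk(run_scores, baseline_scores, alpha=1.0):
    # URisk: mean per-query delta w.r.t. a baseline, with losses
    # amplified by a factor of (1 + alpha) to penalize risky behavior.
    delta = np.asarray(run_scores) - np.asarray(baseline_scores)
    adjusted = np.where(delta >= 0, delta, (1 + alpha) * delta)
    return adjusted.mean()

def trisk(run_scores, baseline_scores, alpha=1.0):
    # TRisk: one-sample t statistic of the alpha-adjusted deltas,
    # giving URisk an inferential reading (|TRisk| > ~2 is commonly
    # read as significant risk if negative, significant reward if positive).
    delta = np.asarray(run_scores) - np.asarray(baseline_scores)
    adjusted = np.where(delta >= 0, delta, (1 + alpha) * delta)
    t_stat, _ = stats.ttest_1samp(adjusted, popmean=0.0)
    return t_stat

# Hypothetical per-query scores for a QPP method and a baseline predictor.
qpp  = np.array([0.55, 0.40, 0.70, 0.20, 0.65])
base = np.array([0.50, 0.45, 0.60, 0.35, 0.60])
print(urisk(qpp, base, alpha=2.0), trisk(qpp, base, alpha=2.0))

GeoRisk (Dinçer et al., 2016), also named in the abstract, roughly combines a system's overall effectiveness with a normalized win/loss score derived from a chi-squared-style analysis into a single geometric quantity; its exact formulation, and how the paper adapts all three metrics to QPP, are omitted here.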
ICTIR 2025 - Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval
15th International Conference on Innovative Concepts and Theories in Information Retrieval, ICTIR 2025

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3563661
Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: not available
  • OpenAlex: not available