DataHunter at LongEval: Temporal Stability Analysis of Boolean and CamemBERT-Based Retrieval Systems

Ferro N.
2025

Abstract

This paper presents the participation of our team in Task 1 of the LongEval Lab at CLEF 2025, which investigates the temporal robustness of information retrieval (IR) systems. We compare a traditional Boolean query-based searcher with a neural reranking system based on CamemBERT, focusing on their effectiveness across six monthly web snapshots from March to August 2023. To assess whether observed differences are statistically significant and stable over time, we adopt a methodology inspired by the HIBALL team from CLEF 2023. We simulate realistic query-level variation by generating multiple observations per system and snapshot. We then apply two-way ANOVA and Tukey HSD tests to evaluate the impact of the system and the temporal dimension. Our results show that CamemBERT consistently outperforms Boolean retrieval, with statistically significant differences across all snapshots. We also observe a notable drop in performance for both systems in August, reflecting the impact of collection shift. These findings provide insights into the reliability and temporal stability of IR systems in evolving web environments.
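The two-way ANOVA step mentioned above can be illustrated with a minimal sketch. It assumes a balanced design (the same number of observations for every system and snapshot pair) and computes only the sums of squares and F statistics. The function name `two_way_anova`, the score values, and the noise level are hypothetical choices for illustration. They are not the paper's code or results.

```python
import random
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (system, snapshot) -> list of per-observation scores.
    Returns {effect: {"ss": ..., "df": ..., "f": ...}}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n_a, n_b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))
    if len(cells) != n_a * n_b or any(len(v) != n for v in cells.values()):
        raise ValueError("balanced design required")
    if n < 2:
        raise ValueError("need at least two observations per cell")

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for each main effect, the interaction, and the residual.
    ss = {
        "system": n_b * n * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "snapshot": n_a * n * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "interaction": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a in a_levels for b in b_levels
        ),
        "error": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {
        "system": n_a - 1,
        "snapshot": n_b - 1,
        "interaction": (n_a - 1) * (n_b - 1),
        "error": n_a * n_b * (n - 1),
    }
    ms_error = ss["error"] / df["error"]  # assumes non-zero residual variance

    table = {}
    for effect in ss:
        f_stat = None if effect == "error" else (ss[effect] / df[effect]) / ms_error
        table[effect] = {"ss": ss[effect], "df": df[effect], "f": f_stat}
    return table


# Synthetic demo data only: baselines and noise are made up for illustration.
random.seed(0)
baselines = {"boolean": 0.20, "camembert": 0.30}
snapshots = ["2023-03", "2023-04", "2023-05", "2023-06", "2023-07", "2023-08"]
demo_cells = {
    (system, month): [min(1.0, max(0.0, random.gauss(base, 0.05))) for _ in range(20)]
    for system, base in baselines.items()
    for month in snapshots
}
for effect, row in two_way_anova(demo_cells).items():
    print(effect, row)
```

Turning the F statistics into p-values requires the F distribution (for example `scipy.stats.f.sf`), and the pairwise Tukey HSD comparison requires the studentized range distribution (for example `scipy.stats.tukey_hsd`). Both are left out here to keep the sketch dependency-free.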
26th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2025
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3571887