DataHunter at LongEval: Temporal Stability Analysis of Boolean and CamemBERT-Based Retrieval Systems
Ferro N.
2025
Abstract
This paper presents the participation of our team in Task 1 of the LongEval Lab at CLEF 2025, which investigates the temporal robustness of information retrieval (IR) systems. We compare a traditional Boolean query-based searcher with a neural reranking system based on CamemBERT, focusing on their effectiveness across six monthly web snapshots from March to August 2023. To assess whether observed differences are statistically significant and stable over time, we adopt a methodology inspired by the HIBALL team from CLEF 2023. We simulate realistic query-level variation by generating multiple observations per system and snapshot. We then apply two-way ANOVA and Tukey HSD tests to evaluate the impact of the system and the temporal dimension. Our results show that CamemBERT consistently outperforms Boolean retrieval, with statistically significant differences across all snapshots. We also observe a notable drop in performance for both systems in August, reflecting the impact of collection shift. These findings provide insights into the reliability and temporal stability of IR systems in evolving web environments.




