Several recent studies have explored the interaction effects between topics, systems, corpora, and components when measuring retrieval effectiveness. However, all of these previous studies assume that a topic or information need is represented by a single query. In reality, users routinely reformulate queries to satisfy an information need. In recent years, there has been renewed interest in the notion of "query variations"which are essentially multiple user formulations for an information need. Like many retrieval models, some queries are highly effective while others are not. This is often an artifact of the collection being searched which might be more or less sensitive to word choice. Users rarely have perfect knowledge about the underlying collection, and so finding queries that work is often a trial-And-error process. In this work, we explore the fundamental problem of system interaction effects between collections, ranking models, and queries. To answer this important question, we formalize the analysis using ANalysis Of VAriance (ANOVA) models to measure multiple components effects across collections and topics by nesting multiple query variations within each topic. Our findings show that query formulations have a comparable effect size of the topic factor itself, which is known to be the factor with the greatest effect size in prior ANOVA studies. Both topic and formulation have a substantially larger effect size than any other factor, including the ranking algorithms and, surprisingly, even query expansion. This finding reinforces the importance of further research in understanding the role of query rewriting in IR related tasks.
Topic Difficulty: Collection and Query Formulation Effects
Faggioli G.
;Ferro N.
;
2022
Abstract
Several recent studies have explored the interaction effects between topics, systems, corpora, and components when measuring retrieval effectiveness. However, all of these previous studies assume that a topic or information need is represented by a single query. In reality, users routinely reformulate queries to satisfy an information need. In recent years, there has been renewed interest in the notion of "query variations"which are essentially multiple user formulations for an information need. Like many retrieval models, some queries are highly effective while others are not. This is often an artifact of the collection being searched which might be more or less sensitive to word choice. Users rarely have perfect knowledge about the underlying collection, and so finding queries that work is often a trial-And-error process. In this work, we explore the fundamental problem of system interaction effects between collections, ranking models, and queries. To answer this important question, we formalize the analysis using ANalysis Of VAriance (ANOVA) models to measure multiple components effects across collections and topics by nesting multiple query variations within each topic. Our findings show that query formulations have a comparable effect size of the topic factor itself, which is known to be the factor with the greatest effect size in prior ANOVA studies. Both topic and formulation have a substantially larger effect size than any other factor, including the ranking algorithms and, surprisingly, even query expansion. This finding reinforces the importance of further research in understanding the role of query rewriting in IR related tasks.File | Dimensione | Formato | |
---|---|---|---|
faggioli.pdf
Accesso riservato
Tipologia:
Published (Publisher's Version of Record)
Licenza:
Accesso privato - non pubblico
Dimensione
7.51 MB
Formato
Adobe PDF
|
7.51 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.