A formal theory to determine scale properties of evaluation measures

Ferrante M.; Ferro N.; Pontarollo S.
2019

Abstract

Evaluation measures are the basis for quantifying the performance of information access systems, and the way their values can be processed in statistical analyses depends on the scales on which the measures are defined. For example, mean and variance should be computed only for measures on an interval scale. We define a formal theory of evaluation measures, based on the representational theory of measurement, which allows us to determine whether and when measures are interval scales. We found that common set-based retrieval measures (namely Precision, Recall, and F-measure) are always interval scales in the case of binary relevance, whereas this does not hold in the multi-graded relevance case. Among rank-based retrieval measures (namely AP, gRBP, DCG, and ERR), only gRBP is an interval scale, and only when we choose a specific value of the parameter p and define a specific total order among systems; all the other measures are not interval scales.
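
As an informal illustration of the contrast drawn in the abstract, the following Python sketch enumerates every binary-relevance run of a fixed length and looks at the gaps between consecutive attained values of Precision and AP. This is not the formal criterion of the paper, which is stated in terms of the representational theory of measurement; equal spacing of the attained values is used here only as an intuition for why differences are meaningful on an interval scale. The function names, the run length `n = 3`, and the choice of setting the recall base equal to `n` for AP are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product


def precision(run):
    """Fraction of relevant documents in a binary-relevance run."""
    return Fraction(sum(run), len(run))


def average_precision(run, recall_base):
    """AP: sum of precision at each relevant rank, divided by the recall base."""
    hits = 0
    total = Fraction(0)
    for rank, rel in enumerate(run, start=1):
        if rel:
            hits += 1
            total += Fraction(hits, rank)
    return total / recall_base


def attained_gaps(measure, n):
    """Set of gaps between consecutive distinct values over all runs of length n."""
    values = sorted({measure(run) for run in product((0, 1), repeat=n)})
    return {b - a for a, b in zip(values, values[1:])}


n = 3  # assumed run length, chosen only to keep the enumeration small
precision_gaps = attained_gaps(precision, n)
# Assumption: the recall base is set to n for this toy example.
ap_gaps = attained_gaps(lambda run: average_precision(run, recall_base=n), n)

print("Precision gaps:", sorted(precision_gaps))
print("AP gaps:", sorted(ap_gaps))
```

In this toy setting, Precision attains equally spaced values (a single gap size), while AP attains values separated by several different gap sizes, which is consistent with the abstract's claim that Precision is an interval scale under binary relevance and AP is not.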
Proc. 27th Italian Symposium on Advanced Database Systems (SEBD 2019)
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3326098