A formal theory to determine scale properties of evaluation measures
Ferrante M.; Ferro N.; Pontarollo S.
2019
Abstract
Evaluation measures are the basis for quantifying the performance of information access systems, and the way their values can be processed in statistical analyses depends on the scales on which these measures are defined. For example, mean and variance should be computed only when relying on interval scales. We define a formal theory of evaluation measures, based on the representational theory of measurement, which allows us to determine whether and when measures are interval scales. We found that the common set-based retrieval measures (Precision, Recall, and F-measure) are always interval scales in the case of binary relevance, whereas this does not hold in the multi-graded relevance case. Among the rank-based retrieval measures (AP, gRBP, DCG, and ERR), only gRBP is an interval scale, and only when we choose a specific value of the parameter p and define a specific total order among systems; all the other measures are not interval scales.
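For readers unfamiliar with the set-based measures named in the abstract, the following is a minimal sketch of Precision, Recall, and the balanced F-measure under binary relevance, using their standard textbook definitions. The function names and document identifiers are hypothetical and are not taken from the paper; the sketch only computes the measures and does not test their scale properties.

```python
from fractions import Fraction


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (binary relevance)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return Fraction(0)  # convention assumed here for an empty result set
    return Fraction(len(retrieved & relevant), len(retrieved))


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved (binary relevance)."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return Fraction(0)  # convention assumed here for a topic with no relevant documents
    return Fraction(len(retrieved & relevant), len(relevant))


def f_measure(retrieved, relevant):
    """Balanced F-measure: harmonic mean of precision and recall."""
    p = precision(retrieved, relevant)
    r = recall(retrieved, relevant)
    if p + r == 0:
        return Fraction(0)
    return 2 * p * r / (p + r)


# Hypothetical example: four retrieved documents, three relevant ones.
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d1", "d3", "d5"}

print(precision(retrieved_docs, relevant_docs))  # 1/2
print(recall(retrieved_docs, relevant_docs))     # 2/3
print(f_measure(retrieved_docs, relevant_docs))  # 4/7
```

Exact fractions are used instead of floats so that differences between measure values, which are what matter when reasoning about interval scales, are not blurred by rounding.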