s-AWARE: Using crowd judgements in supervised measure-based methods for IR evaluation

Ferrante, M.; Ferro, N.; Piazzon, L.

Crowdsourcing has recently emerged as a cheap and fast alternative to the traditional document assessment process for ground truth creation. Early approaches use voting and/or classification techniques to combine crowd judgements into a single merged pool, which then serves as the reference in the evaluation process. Assessor-driven Weighted Averages for Retrieval Evaluation (AWARE) [3] instead takes a measure-based approach, focusing on optimizing the final evaluation measure without merging judgements at the pool level. s-AWARE extends AWARE with a set of supervised methods. We evaluate s-AWARE on several TREC collections and show that it outperforms state-of-the-art methods. Moreover, our results show that s-AWARE is an effective approach in the realistic scenario where each crowd assessor judges only a portion of the dataset.
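
The difference between pool-level merging and a measure-based combination can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the use of Average Precision as the evaluation measure, the strict majority-vote rule, and the fixed per-assessor weights. In particular, the weights are placeholders and do not reproduce how AWARE or s-AWARE actually derive them.

```python
from typing import List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average Precision of a ranked list against one set of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def majority_vote_pool(assessor_judgements: List[Set[str]]) -> Set[str]:
    """Pool-level merging: keep documents that a strict majority judged relevant."""
    all_docs = set().union(*assessor_judgements)
    threshold = len(assessor_judgements) / 2
    return {
        doc
        for doc in all_docs
        if sum(doc in judged for judged in assessor_judgements) > threshold
    }


def measure_based_score(
    ranking: List[str],
    assessor_judgements: List[Set[str]],
    weights: List[float],  # hypothetical weights, one per assessor
) -> float:
    """Measure-level combination: score the run once per assessor,
    then take the weighted average of those scores."""
    if len(weights) != len(assessor_judgements):
        raise ValueError("one weight per assessor is required")
    scores = [average_precision(ranking, judged) for judged in assessor_judgements]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Toy data: one ranked run and three crowd assessors (illustrative only).
ranking = ["d1", "d2", "d3", "d4"]
judgements = [{"d1", "d3"}, {"d1", "d2"}, {"d3"}]
weights = [0.5, 0.3, 0.2]

pooled_score = average_precision(ranking, majority_vote_pool(judgements))
weighted_score = measure_based_score(ranking, judgements, weights)
print(f"majority-vote pool AP: {pooled_score:.4f}")
print(f"weighted-average AP:   {weighted_score:.4f}")
```

In the first path the disagreement between assessors is resolved before any measure is computed; in the second, each assessor's judgements yield their own score and only the scores are combined, so the choice of weights determines how much each assessor influences the final value.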