
Doctron: A Web-based Collaborative Annotation Tool for Ground Truth Creation in IR

Ornella Irrera; Stefano Marchesin; Gianmaria Silvello
2025

Abstract

In Information Retrieval (IR), ground truth creation is a crucial yet resource-intensive task that relies on human experts to build test collections -- essential for training and evaluating retrieval models. Large-scale evaluation campaigns, such as TREC and CLEF, demand significant human effort to produce reliable, high-quality annotations. To ease this process, tailored annotation tools are pivotal in supporting assessors and reducing their workload. To this end, we introduce Doctron, a web-based, dockerized annotation tool designed to streamline ground truth creation for IR tasks. Doctron enables the annotation of both textual documents and images. It supports annotating textual passages, identifying relationships, tagging and linking entities, evaluating document relevance to a topic with graded labels, and performing object detection. It offers a collaborative environment where teams can work with defined user roles and permissions. The integration of Inter Annotator Agreement (IAA) measures helps identify inconsistencies between annotators, thereby ensuring the reliability and high quality of the annotated ground truth data.
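To illustrate the kind of IAA measure the abstract refers to, the sketch below computes Cohen's kappa over two annotators' graded relevance labels. This is a minimal, self-contained example: the abstract does not specify which agreement measures Doctron implements, so the choice of Cohen's kappa and the sample labels here are illustrative assumptions, not Doctron's actual code.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[l] * count_b[l]
              for l in set(labels_a) | set(labels_b)) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Hypothetical graded relevance judgments (0 = not relevant, 2 = highly relevant)
annotator_a = [2, 0, 1, 2, 0, 1, 2, 0]
annotator_b = [2, 0, 1, 1, 0, 1, 2, 1]
print(round(cohen_kappa(annotator_a, annotator_b), 3))  # → 0.636
```

A value near 1 indicates strong agreement, while values near 0 indicate agreement no better than chance; low scores flag topic/annotator pairs worth reviewing, which is how such measures surface the inconsistencies mentioned above.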
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3555926