Doctron: A Web-based Collaborative Annotation Tool for Ground Truth Creation in IR
Ornella Irrera; Stefano Marchesin; Gianmaria Silvello
2025
Abstract
In Information Retrieval (IR), ground truth creation is a crucial yet resource-intensive task that relies on human experts to build test collections, which are essential for training and evaluating retrieval models. Large-scale evaluation campaigns such as TREC and CLEF demand significant human effort to produce reliable, high-quality annotations. Tailored annotation tools are therefore pivotal for supporting assessors and reducing their workload. To this end, we introduce Doctron, a web-based, dockerized annotation tool designed to streamline ground truth creation for IR tasks. Doctron supports the annotation of both textual documents and images: users can annotate textual passages, identify relationships, tag and link entities, judge document relevance to a topic with graded labels, and perform object detection. It offers a collaborative environment in which teams work under defined user roles and permissions. Integrated Inter-Annotator Agreement (IAA) measures help identify inconsistencies between annotators, thereby supporting the reliability and quality of the resulting ground truth data.
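The abstract does not state which IAA measures Doctron computes. Purely as an illustration, the following Python sketch computes Cohen's kappa, one widely used agreement measure for two annotators, over hypothetical graded relevance labels (0 = not relevant, 1 = partially relevant, 2 = highly relevant). The function name `cohen_kappa` and the three-level label scale are assumptions made for this example, not part of Doctron's interface.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items.

    Illustrative sketch only: not taken from Doctron's code base.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same, non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one and the same label for every item; kappa is
        # undefined here, so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical graded relevance judgments for six documents on one topic.
    assessor_1 = [2, 1, 0, 2, 1, 0]
    assessor_2 = [2, 1, 1, 2, 0, 0]
    print(round(cohen_kappa(assessor_1, assessor_2), 3))  # prints 0.5
```

Here the two assessors agree on four of six documents (observed agreement 2/3), while chance agreement given their label distributions is 1/3, so kappa is 0.5. Low kappa values on a batch of judgments are the kind of inconsistency signal that IAA measures are meant to surface. Cohen's kappa treats the grades as unordered categories; for ordinal relevance scales, a weighted variant may be more appropriate.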