FbMultiLingMisinfo: Challenging Large-Scale Multilingual Benchmark for Misinformation Detection

Da San Martino, Giovanni;
2022

Abstract

According to recent research, geometric deep learning makes it possible to reach unprecedented accuracy in online misinformation detection. To fully leverage the social context of news, URL propagation paths in social networks are first represented as graphs and then classified using Graph Neural Network (GNN) models. Despite these remarkable efforts, researchers are still hampered by the scarcity of high-quality benchmark datasets, and as a result the efficacy of state-of-the-art approaches may be overestimated. So far, to obtain a sufficient number of labeled URLs, researchers have either sampled news from sources widely known to be reliable or unreliable, using distant supervision, or gathered pre-labeled URLs from third-party fact-checking websites. In the former case, the resulting datasets can be quite large, but they are also noisy and biased, since each piece of news is labeled as true or false according to the label of its source rather than being individually fact-checked. In the latter case, the assigned labels are more reliable, but the included news articles are usually in a single language and may reflect unknown editorial decisions. As a result, datasets of the latter type are typically small and homogeneous, and thus unrealistically easy for automatic fake news detection models. In this work, we present FbMultiLingMisinfo, a new multilingual benchmark dataset aimed at a more realistic evaluation of state-of-the-art misinformation detection models. The URLs in our dataset come from the Facebook Privacy-Protected Full URLs Data Set, which we augmented with their propagation paths on Twitter. Our experimental results show that, when GNN-based models are tested on FbMultiLingMisinfo, recent misinformation detection results are only partially confirmed. We further show that a sharp reduction in the training set size significantly reduces model accuracy on FbMultiLingMisinfo, but not on two other widely used benchmark datasets for fake news detection.
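As a rough illustration of the pipeline the abstract describes (propagation paths represented as graphs, then processed by a GNN), the following minimal sketch builds a toy propagation graph and applies one parameter-free neighbourhood-averaging step followed by mean pooling. It is not the authors' implementation: the function names, the toy cascade, and the two-dimensional user features are all assumptions made for illustration, and a real GNN would add learned weights, non-linearities, and a classifier on top of the graph embedding.

```python
from collections import defaultdict


def build_propagation_graph(root, shares):
    """Build an undirected adjacency list from (parent, child) share events.

    `root` is the account that first posted the URL; each pair in `shares`
    means that `child` reshared the URL from `parent`.
    """
    adjacency = defaultdict(set)
    adjacency[root]  # ensure the root exists even if nobody reshared
    for parent, child in shares:
        adjacency[parent].add(child)
        adjacency[child].add(parent)
    return adjacency


def mean_aggregate(adjacency, features):
    """One message-passing step without learned weights: replace each node's
    feature vector by the mean over itself and its neighbours."""
    updated = {}
    for node, neighbours in adjacency.items():
        group = [features[node]] + [features[n] for n in neighbours]
        dim = len(features[node])
        updated[node] = [sum(vec[i] for vec in group) / len(group)
                         for i in range(dim)]
    return updated


def graph_readout(features):
    """Mean-pool all node vectors into one fixed-size graph embedding."""
    vectors = list(features.values())
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Toy cascade (hypothetical): "a" posts a URL, "b" and "c" reshare it
# from "a", and "d" reshares it from "b".
graph = build_propagation_graph("a", [("a", "b"), ("a", "c"), ("b", "d")])

# Hypothetical 2-dimensional user features, invented for this example.
node_features = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.0, 1.0],
    "d": [1.0, 1.0],
}

hidden = mean_aggregate(graph, node_features)
embedding = graph_readout(hidden)  # one vector per propagation graph
```

In a full model, `embedding` would be passed to a trained classifier that outputs the label of the URL; stacking several aggregation steps lets information travel further along the propagation path.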
Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3537006