Online Social Networks (OSNs) rely on content moderation systems to ensure platform and user safety by preventing malicious activities, like the spread of harmful content. However, there is a growing consensus suggesting that such systems are unfair to historically marginalized individuals, fragile users, and minorities. Additionally, OSN policies are often hardcoded in AI-based violation classifiers, making personalized content moderation challenging. In addition, there is a need for more communication between users and platform administrators, especially in case of disagreement about a moderation decision. To address these issues, we propose integrating content moderation systems with Large Language Models (LLMs) to enhance support for personal content moderation and improve user-platform communication. We also evaluate the content moderation capabilities of GPT 3.5 and LLaMa 2, comparing them to commercial products, as well as discuss the limitations of our approach and the open research directions.

Integrating Content Moderation Systems with Large Language Models

Franco, Mirko
;
Gaggi, Ombretta;Palazzi, Claudio E.
2024

Abstract

Online Social Networks (OSNs) rely on content moderation systems to ensure platform and user safety by preventing malicious activities, like the spread of harmful content. However, there is a growing consensus suggesting that such systems are unfair to historically marginalized individuals, fragile users, and minorities. Additionally, OSN policies are often hardcoded in AI-based violation classifiers, making personalized content moderation challenging. In addition, there is a need for more communication between users and platform administrators, especially in case of disagreement about a moderation decision. To address these issues, we propose integrating content moderation systems with Large Language Models (LLMs) to enhance support for personal content moderation and improve user-platform communication. We also evaluate the content moderation capabilities of GPT 3.5 and LLaMa 2, comparing them to commercial products, as well as discuss the limitations of our approach and the open research directions.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3540876
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact