Integrating Content Moderation Systems with Large Language Models

Franco, Mirko; Gaggi, Ombretta; Palazzi, Claudio E.

doi:10.1145/3700789

Online Social Networks (OSNs) rely on content moderation systems to ensure platform and user safety by preventing malicious activities, such as the spread of harmful content. However, there is a growing consensus that such systems are unfair to historically marginalized individuals, fragile users, and minorities. Moreover, OSN policies are often hardcoded in AI-based violation classifiers, which makes personalized content moderation challenging. More communication is also needed between users and platform administrators, especially when they disagree about a moderation decision. To address these issues, we propose integrating content moderation systems with Large Language Models (LLMs) to better support personalized content moderation and improve user-platform communication. We also evaluate the content moderation capabilities of GPT-3.5 and LLaMa 2, compare them with commercial products, and discuss the limitations of our approach and open research directions.