ToxVI: a Multimodal LLM-based Framework for Generating Intervention in Toxic Code-Mixed Videos

Krishanu Maity; A. S. Poornash; Sriparna Saha; Kitsuchart Pasupa

doi:10.1145/3627673.3680004

dc.contributor.author	Krishanu Maity
dc.contributor.author	A. S. Poornash
dc.contributor.author	Sriparna Saha
dc.contributor.author	Kitsuchart Pasupa
dc.date.accessioned	2026-05-08T19:18:45Z
dc.date.issued	2024-10-20
dc.description.abstract	While considerable research has examined the detection of toxic content in text-based data, video content, particularly in languages other than English, has received less attention. Prior studies have primarily focused on building automated tools to identify online toxic speech but have often overlooked the crucial next steps of mitigating its impact and discouraging its future use. Automatically generating interventions that explain why certain content is inappropriate can discourage social media users from sharing such material. To bridge this research gap, we propose a novel task: generating interventions for toxic videos in code-mixed languages, which goes beyond existing text- and image-based methods for combating online toxicity. We introduce the Toxic Code-Mixed Intervention Video benchmark dataset (ToxCMI), comprising 1697 code-mixed toxic video utterances sourced from YouTube. Each utterance has been meticulously annotated for toxicity and severity and is accompanied by an intervention written in Hindi-English code-mixed language. We also develop ToxVI, a multimodal framework for generating Toxic Video appropriate Interventions that leverages Large Language Models (LLMs) and comprises three modules: a Modality module, a Cross-Modal Synchronization module, and a Generation module. Our experiments demonstrate that integrating multiple modalities from the videos significantly improves performance on the proposed task, and that ToxVI outperforms all baselines by a significant margin.
dc.identifier.doi	10.1145/3627673.3680004
dc.identifier.uri	https://dspace.kmitl.ac.th/handle/123456789/16691
dc.subject	Hate Speech and Cyberbullying Detection
dc.title	ToxVI: a Multimodal LLM-based Framework for Generating Intervention in Toxic Code-Mixed Videos
dc.type	Article