ToxVI: a Multimodal LLM-based Framework for Generating Intervention in Toxic Code-Mixed Videos

dc.contributor.author: Krishanu Maity
dc.contributor.author: A. S. Poornash
dc.contributor.author: Sriparna Saha
dc.contributor.author: Kitsuchart Pasupa
dc.date.accessioned: 2026-05-08T19:18:45Z
dc.date.issued: 2024-10-20
dc.description.abstract: While considerable research has addressed detecting toxic content in text-based data, video content, particularly in languages other than English, has received less attention. Prior studies have primarily focused on creating automated tools to identify online toxic speech but have often overlooked the crucial next steps of mitigating its impact and discouraging future use. Social media users can be discouraged from sharing such material by automatically generating interventions that explain why certain content is inappropriate. To bridge this research gap, we propose a new task: generating interventions for toxic videos in code-mixed languages, going beyond existing methods that focus on text and images to combat online toxicity. We introduce the Toxic Code-Mixed Intervention Video benchmark dataset (ToxCMI), comprising 1697 code-mixed toxic video utterances sourced from YouTube. Each utterance in this dataset has been annotated for toxicity and severity and is accompanied by an intervention written in Hindi-English code-mixed language. We have developed ToxVI, a multimodal framework designed for the task of generating Toxic Video-appropriate Interventions. It leverages Large Language Models (LLMs) and comprises three modules: a Modality module, a Cross-Modal Synchronization module, and a Generation module. Our experiments demonstrate that integrating multiple modalities from the videos significantly improves performance on the proposed task, and that ToxVI outperforms all the baselines by a significant margin.
dc.identifier.doi: 10.1145/3627673.3680004
dc.identifier.uri: https://dspace.kmitl.ac.th/handle/123456789/16691
dc.subject: Hate Speech and Cyberbullying Detection
dc.title: ToxVI: a Multimodal LLM-based Framework for Generating Intervention in Toxic Code-Mixed Videos
dc.type: Article
