HATE SPEECH DETECTION IN HAUSA CODE-MIXED TWEETS USING MACHINE LEARNING
Abstract
Code-mixed communication in Nigeria, which blends English, Nigerian Pidgin, Hausa, Yoruba, and Igbo, poses challenges for Natural Language Processing (NLP) systems, particularly for hate speech detection. Existing research focuses largely on high-resource languages, leaving code-mixed African data underexplored. This study applies Logistic Regression and Random Forest classifiers to identify hate speech in Hausa code-mixed tweets, using two datasets of annotated posts. Key preprocessing steps included text normalization and feature extraction using TF-IDF and Bag-of-Words representations. The Logistic Regression model with TF-IDF features outperformed Random Forest in accuracy, recall, and F1-score, while the optimized Random Forest model showed notable performance improvements. These findings indicate that machine learning can be effective for hate speech detection in low-resource languages such as Hausa.
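To make the pipeline described in the abstract concrete, the following is a minimal sketch of text normalization, TF-IDF feature extraction, and a logistic regression classifier, written in plain Python. It is not the authors' implementation: the function names, the smoothed IDF formula, the learning rate, the epoch count, and the six toy tweets (with `slurword` as a neutral placeholder token) are all assumptions chosen for illustration, and the Random Forest and Bag-of-Words variants are not shown.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase, strip URLs and @mentions, and return runs of letters.

    The pattern is Unicode-aware, so Hausa letters such as ɓ, ɗ and ƙ are kept.
    """
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[^\W\d_]+", text)


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(vec, weights, bias):
    """Predicted probability of the positive (hate speech) class."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))


def train_logreg(vectors, labels, epochs=300, lr=0.5):
    """Logistic regression fitted by stochastic gradient descent on log-loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = score(vec, weights, bias) - y  # gradient of log-loss w.r.t. the logit
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


# Toy placeholder data (NOT from the study's datasets); 1 = hate speech, 0 = not.
train_texts = [
    "Sannu da zuwa, welcome everyone @user",
    "Na gode sosai for the help https://example.com",
    "Ina son this town, very peaceful",
    "those people are slurword, they must leave",
    "slurword group, we hate them all",
    "they are slurword and we hate them",
]
train_labels = [0, 0, 0, 1, 1, 1]

token_lists = [normalize(t) for t in train_texts]
idf = fit_idf(token_lists)
vectors = [tfidf_vector(tokens, idf) for tokens in token_lists]
weights, bias = train_logreg(vectors, train_labels)


def predict_proba(text):
    """Probability that a new tweet is hate speech under the toy model."""
    return score(tfidf_vector(normalize(text), idf), weights, bias)
```

Calling `predict_proba` on a new string returns a value between 0 and 1 that can be thresholded (for example at 0.5) to obtain a label. In practice a library such as scikit-learn would typically replace the hand-written feature extraction and optimizer; the sketch only spells out the steps so that each stage named in the abstract is visible.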
License
Copyright (c) 2025 Science World Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.