HATE SPEECH DETECTION IN HAUSA CODE-MIXED TWEETS USING MACHINE LEARNING

Authors

  • Bashir Idris Sulaiman, Department of Secure Computing, Kaduna State University, Kaduna
  • Muhammad Aminu Ahmad, Department of Secure Computing, Kaduna State University, Kaduna

Abstract

Code-mixed communication in Nigeria, which blends English, Nigerian Pidgin, Hausa, Yoruba, and Igbo, poses challenges for Natural Language Processing (NLP) systems, particularly for hate speech detection. Existing research focuses mainly on high-resource languages, leaving code-mixed African data underexplored. This study applies logistic regression and random forest algorithms to identify hate speech in Hausa code-mixed tweets, using two datasets of annotated posts. Key preprocessing steps included text normalization and feature extraction using TF-IDF and Bag-of-Words. The logistic regression model with TF-IDF features outperformed random forest in accuracy, recall, and F1-score, although the optimized random forest model showed notable performance improvements. These findings indicate that machine learning is effective for hate speech detection in low-resource languages such as Hausa.
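The preprocessing described in the abstract can be illustrated with a minimal sketch. The abstract does not list the exact normalization rules or the TF-IDF variant, so the steps below (lowercasing, removing URLs, @mentions, and punctuation, and the weighting idf = ln(N / df) + 1) are assumptions, and the function names and sample posts are hypothetical placeholders rather than material from the study's datasets.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase, drop URLs, @mentions and punctuation, then split on whitespace.

    These normalization steps are assumed; the abstract does not enumerate them.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return text.split()


def bag_of_words(docs):
    """Return one Counter of raw term counts per document."""
    return [Counter(normalize(d)) for d in docs]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = ln(N / df) + 1,
    which is one common variant and an assumption here.
    """
    counts = bag_of_words(docs)
    n_docs = len(counts)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (n / total) * (math.log(n_docs / df[t]) + 1.0) for t, n in c.items()}
        )
    return weights


# Hypothetical code-mixed placeholder posts, not taken from the study's data.
posts = [
    "Ina son wannan movie sosai! https://example.com",
    "@user wannan game din was great",
    "Wannan abinci is not good",
]
bow = bag_of_words(posts)
tfidf = tf_idf(posts)
```

In this sketch a term that appears in every post, such as "wannan", receives a lower TF-IDF weight than a term unique to one post, which is the property that distinguishes TF-IDF from raw Bag-of-Words counts. Either representation could then be passed to a classifier such as logistic regression or random forest.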

Published

2025-12-29

Section

ARTICLES