COMPARATIVE ANALYSIS OF FEATURE EXTRACTION TECHNIQUES FOR SPAM DETECTION

Authors

  • Gbenga O. Ogunsanwo, Department of Computer Science, College of Science and Information Technology, Tai Solarin University of Education, Ogun State
  • Blessing C. Ngoka, Department of Computer Science, College of Science and Information Technology, Tai Solarin University of Education, Ogun State
  • Olumiywa O. Alaba, Department of Computer Science, College of Science and Information Technology, Tai Solarin University of Education, Ogun State
  • Ayokunle A. Omotunde, Department of Computer Science, Babcock University, Ilisan Remo, Ogun State

Abstract

The advent of smartphones has tremendously increased the spam rate in the communication sector. Developing a predictive model for spam detection plays a crucial role in enhancing online security, improving user experience, and protecting businesses from the various risks that come with spam. Feature extraction (FE) is an important stage in increasing the accuracy of such a model. This study therefore compared five FE techniques on a spam dataset. The dataset, obtained from the Kaggle repository, contains 5,574 SMS messages in English, each tagged as ham (legitimate) or spam. The five FE techniques were Bag of Words (BoW), Principal Component Analysis (PCA), TF-IDF, N-gram, and BERT, each paired with two classifiers: Support Vector Machine (SVM) and Logistic Regression (LR). The results showed that BERT-based FE generally yielded the highest accuracy in the experiments carried out; the best accuracies achieved by SVM and LR were 0.989 and 0.990, respectively. The study concluded that these accuracy results highlight the importance of choosing appropriate feature extraction techniques, and it recommends careful selection of FE methods to optimize model performance. Further work could use different datasets, different FE techniques, and different deep learning algorithms.
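To make the count-based techniques named in the abstract concrete, the following is a minimal sketch of BoW, word N-gram, and TF-IDF feature extraction using only the Python standard library. It is not the authors' implementation: the four messages are invented for illustration, the helper names (tokenize, ngrams, bag_of_words, tf_idf) are hypothetical, and the idf formula ln(N / df) is one common variant chosen here as an assumption. PCA, BERT, and the SVM and LR classifiers are omitted.

```python
import math
from collections import Counter

# Toy messages invented for illustration; they are NOT from the study's dataset.
messages = [
    "win a free prize now",
    "free entry win cash now",
    "are we meeting for lunch",
    "see you at lunch",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(docs, n=1):
    """Build a sorted vocabulary and one raw count vector per document."""
    term_lists = [ngrams(tokenize(d), n) for d in docs]
    vocab = sorted({t for terms in term_lists for t in terms})
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Weight unigram counts by idf = ln(N / df) (assumed variant; others exist)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, weighted

vocab, bow = bag_of_words(messages)                         # unigram BoW
bigram_vocab, bigram_counts = bag_of_words(messages, n=2)   # word bigrams
_, tfidf = tf_idf(messages)                                 # TF-IDF weights
```

Each row of bow, bigram_counts, or tfidf is a fixed-length feature vector that could be passed to a classifier. In this toy corpus, a term that occurs in only one message, such as "prize", receives a larger TF-IDF weight than one shared by two messages, such as "free".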

Published

2025-06-30

Section

ARTICLES