COMPARATIVE ANALYSIS OF FEATURE EXTRACTION TECHNIQUES FOR SPAM DETECTION
Abstract
The advent of smartphones has greatly increased the volume of spam in the communication sector. A predictive model for spam detection plays a crucial role in enhancing online security, improving user experience, and protecting businesses from the various risks that come with spam. Feature extraction (FE) is an important stage in improving the accuracy of such a model. This study therefore carried out a comparative analysis of five FE techniques on a spam dataset. The dataset, obtained from the Kaggle repository, contains 5,574 SMS messages in English, each tagged as ham (legitimate) or spam. The five FE techniques were bag of words (BoW), principal component analysis (PCA), term frequency-inverse document frequency (TF-IDF), N-gram, and BERT, each paired with two classifiers: support vector machine (SVM) and logistic regression (LR). The results showed that BERT-based FE generally led to the highest accuracy in the experiments carried out, with SVM and LR achieving best accuracies of 0.989 and 0.990, respectively. The study concluded that these accuracy results highlight the importance of choosing an appropriate feature extraction technique, and it recommends careful selection of FE methods to optimize model performance. Further work could explore different datasets, different FE techniques, and different deep learning algorithms.
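To make three of the FE techniques named in the abstract concrete, the following is a minimal sketch of BoW, N-gram, and TF-IDF features using only the Python standard library. It is not the study's implementation: the toy messages, the whitespace tokenizer, the function names, and the smoothed IDF variant are all assumptions chosen for illustration, and PCA, BERT, and the SVM and LR classifiers are omitted.

```python
import math
from collections import Counter

# Hypothetical toy messages; these are NOT taken from the study's dataset.
messages = [
    "win a free prize now",
    "are we meeting for lunch now",
    "free entry win cash",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return contiguous n-grams joined by a space; n=1 gives plain words."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(docs, n=1):
    """Build a sorted vocabulary and one count vector per document."""
    counts = [Counter(ngrams(tokenize(d), n)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]

def tf_idf(docs, n=1):
    """Weight raw counts by a smoothed inverse document frequency.

    Assumed variant: idf = ln((1 + N) / (1 + df)) + 1, where N is the
    number of documents and df the number of documents containing the term.
    """
    vocab, vectors = bag_of_words(docs, n)
    n_docs = len(docs)
    weighted = [[float(x) for x in v] for v in vectors]
    for j in range(len(vocab)):
        df = sum(1 for v in vectors if v[j] > 0)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        for row in weighted:
            row[j] *= idf
    return vocab, weighted

if __name__ == "__main__":
    vocab, counts = bag_of_words(messages)        # unigram BoW
    bi_vocab, bi_counts = bag_of_words(messages, n=2)  # bigram features
    tfidf_vocab, weights = tf_idf(messages)
    print(vocab)
    print(counts[0])
    print(bi_vocab)
    print([round(w, 3) for w in weights[0]])
```

In this sketch, a term that occurs in fewer messages receives a larger TF-IDF weight than one that occurs in many, which is the property that distinguishes TF-IDF from raw BoW counts. The resulting vectors could then be passed to any classifier.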
License
Copyright (c) 2025 Science World Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.