DEVELOPMENT OF AN ENHANCED NAIVE BAYES ALGORITHM FOR FAKE NEWS CLASSIFICATION
Abstract
The proliferation of fake news on social media has become a major concern in recent years, and a growing body of research focuses on understanding and detecting these false stories. Fake news spreads misinformation, fuels polarization and mistrust between groups, enables manipulation, damages reputations, erodes public trust in the media, interferes with democratic processes, and can have significant economic impact. It also makes it difficult for people to distinguish credible from non-credible sources of information. Researchers have proposed and deployed several conventional techniques to distinguish fake news from true news. More recently, machine learning techniques such as Random Forest (RF), Naive Bayes (NB), and Passive Aggressive (PA), among others, have been used for fake news detection. Naive Bayes has been shown to perform poorly because it assumes that features are independent, and it becomes computationally expensive when the sparse matrix generated from textual data is converted to a dense matrix before being passed to the algorithm. Against the backdrop of these limitations, we evaluated an enhanced Naive Bayes classifier on the BuzzFeed News dataset and calculated key metrics: Accuracy (ACC), Precision (PRE), Recall (REC), and F1 Score (F1). The results showed an accuracy of 99%, demonstrating the effectiveness of the model. We also compared the accuracy of the RF, NB, and PA classifiers with and without text pre-processing. The enhanced Naive Bayes model was the most effective at predicting fake and authentic news, reaching 99% accuracy when applied to the body feature matrix without pre-processing. When integrated with Gradient Boost, the Naive Bayes classifier outperformed both the Passive Aggressive and Random Forest classifiers in this study.
The Random Forest, Naive Bayes, and Passive Aggressive classifiers scored 80%, 69%, and 87%, respectively, while the new model scored 99%. Our approach contributes to ongoing efforts to combat misinformation on online platforms and to enhance the credibility of information dissemination.
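For readers unfamiliar with the four evaluation metrics named in this abstract, they can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch in Python, assuming fake news is treated as the positive class. The function name `classification_metrics` and the example labels are illustrative assumptions and are not taken from the study or its dataset.

```python
def classification_metrics(y_true, y_pred, positive="fake"):
    """Compute accuracy, precision, recall and F1 for binary labels.

    `positive` is the label treated as the positive class (an assumption
    of this sketch: fake news is the class of interest).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    # Guard each ratio against a zero denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"ACC": accuracy, "PRE": precision, "REC": recall, "F1": f1}


# Hypothetical labels for six articles (not data from the study).
y_true = ["fake", "fake", "real", "real", "fake", "real"]
y_pred = ["fake", "real", "real", "fake", "fake", "real"]
metrics = classification_metrics(y_true, y_pred)
```

Reporting precision, recall, and F1 alongside accuracy matters when the fake and authentic classes are imbalanced, because accuracy alone can look high for a model that rarely predicts the minority class.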