DEVELOPMENT OF DIGITAL FORENSIC TOOLS FOR MALICIOUS URL DETECTION USING MACHINE LEARNING TECHNIQUES
Abstract
The proliferation of malicious Uniform Resource Locators (URLs) poses a significant cybersecurity threat, enabling phishing, malware distribution, and data breaches. Traditional detection methods such as blacklisting struggle to keep pace with evolving threats. This study develops a digital forensic tool that uses machine learning (ML) to detect malicious URLs. Using a dataset of 450,176 URLs (79.8% benign, 23.2% malicious), we engineered lexical, host-based, and geographical features, including URL length, special character count, secure HTTP (HTTPS) usage, and URL region. Tree-based and ensemble ML models (Random Forest, Decision Tree, AdaBoost, Extra Trees) achieved very high classification performance (accuracy: 0.998, precision: 0.997, recall: 0.999, F1-score: 0.998), with only rare misclassifications on highly obfuscated or previously unseen URLs, substantially outperforming Gaussian Naive Bayes (accuracy: 0.775) and K-Nearest Neighbors (accuracy: 0.772). Although such near-perfect scores raise overfitting concerns, the tool shows strong potential for real-time URL filtering and forensic investigations. The framework supports proactive cybersecurity by helping to identify zero-day threats and by providing interpretable features for threat attribution, offering actionable insights for practitioners and policymakers.
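The lexical features named in the abstract (URL length, special character count, secure HTTP usage) can be illustrated with a minimal sketch. This is not the study's implementation: the function name `extract_lexical_features`, the feature keys, and the choice to count every non-alphanumeric character as "special" are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def extract_lexical_features(url: str) -> dict:
    """Compute a few simple lexical features from a raw URL string.

    Hypothetical helper: the feature definitions are illustrative
    assumptions, not the ones used in the study.
    """
    # urlparse only fills in the hostname when a scheme is present,
    # so prepend a placeholder scheme for parsing if none is given.
    parse_target = url if "://" in url else "http://" + url
    hostname = urlparse(parse_target).hostname or ""

    return {
        # Total number of characters in the URL as given.
        "url_length": len(url),
        # Assumption: any non-alphanumeric character counts as special.
        "special_char_count": sum(not ch.isalnum() for ch in url),
        # Number of decimal digits anywhere in the URL.
        "digit_count": sum(ch.isdigit() for ch in url),
        # True only when the original URL explicitly uses HTTPS.
        "uses_https": url.lower().startswith("https://"),
        # Length of the host part, e.g. "example.com".
        "hostname_length": len(hostname),
    }


if __name__ == "__main__":
    features = extract_lexical_features("https://example.com/login?user=1")
    for name, value in features.items():
        print(f"{name}: {value}")
```

A feature dictionary of this kind could be computed for each URL and assembled into a table before training a classifier; host-based and geographical features would require additional lookups that are not shown here.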
License
Copyright (c) 2026 Science World Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.