A CNN-BASED APPROACH FOR SPEAKER IDENTIFICATION USING MFCC FEATURES IN NOISY AND REAL-WORLD ENVIRONMENTS

Authors

  • Philip O. Odion
  • Pasha R. Adeiza, Department of Computer Science, Nigerian Defence Academy (NDA), Kaduna
  • Tijjani Abdullahi

Abstract

Speech extraction and recognition have become essential components of modern intelligent systems, especially in applications that require accurate speaker identification under real-world conditions. This study presents a deep learning-based approach to speech recognition for speaker identification using Convolutional Neural Networks (CNNs) trained on Mel-Frequency Cepstral Coefficients (MFCCs). The research combines locally collected speech data with an external benchmark dataset (Mikhailava et al., 2022) to evaluate model performance under varying acoustic conditions. A total of 630 audio samples were collected from 21 participants across diverse environments, including both clean and noisy recordings. On the local dataset, the proposed model achieved a training accuracy exceeding 95%, a validation accuracy of approximately 73%, and a test accuracy of 75.82%. Evaluation on the benchmark dataset produced a test accuracy of 100%, indicating strong model learning under controlled conditions. The results show that while high accuracy can be achieved with clean data, performance declines in real-world noisy environments due to variability in speech patterns, recording quality, and background interference. The study demonstrates that CNN-based models can effectively support speaker identification tasks, while highlighting the need for improved generalization, larger datasets, and enhanced noise-handling techniques for practical deployment.

Published

2026-06-30

Section

ARTICLES