A CNN-BASED APPROACH FOR SPEAKER IDENTIFICATION USING MFCC FEATURES IN NOISY AND REAL-WORLD ENVIRONMENTS

Philip O. Odion; Pasha R. Adeiza; Tijjani Abdullahi

Authors

Philip O. Odion
Pasha R. Adeiza, Department of Computer Science, Nigerian Defence Academy (NDA), Kaduna
Tijjani Abdullahi

Abstract

Speech extraction and recognition have become essential components of modern intelligent systems, especially in applications that require accurate speaker identification under real-world conditions. This study presents a deep learning-based approach to speaker identification using Convolutional Neural Networks (CNNs) trained on Mel-Frequency Cepstral Coefficients (MFCCs). It combines locally collected speech data with an external benchmark dataset (Mikhailava et al., 2022) to evaluate model performance under varying acoustic conditions. A total of 630 audio samples were collected from 21 participants across diverse environments, including both clean and noisy recordings. On the local dataset, the proposed model achieved a training accuracy exceeding 95%, a validation accuracy of approximately 73%, and a test accuracy of 75.82%. On the benchmark dataset, it achieved a test accuracy of 100%, indicating strong model learning under controlled conditions. The results show that although high accuracy can be achieved with clean data, performance declines in noisy real-world environments because of variability in speech patterns, recording quality, and background interference. The study demonstrates that CNN-based models can effectively support speaker identification tasks, while highlighting the need for improved generalization, larger datasets, and better noise-handling techniques for practical deployment.
