Spoken Persian digits recognition using deep learning

Article Type:
Research/Original Article (accredited journal rank)
Abstract:
Classification of isolated digits is a fundamental challenge for many speech classification systems. Previous work on spoken digits has been limited to the numbers 0 to 9. In this paper, we propose two deep learning-based models for recognizing spoken numbers in the range 0 to 599. The first model is a Convolutional Neural Network (CNN) that takes as input the Mel spectrogram computed from the audio data. The second model builds on recent advances in deep sequential models: a Transformer followed by a Long Short-Term Memory (LSTM) network and a classifier. We also collected a dataset of audio recordings contributed by 145 people, covering the numerical range from 0 to 599. Experimental results on the collected dataset indicate a validation accuracy of 98.03%.
Language:
Persian
Published:
Journal of Modeling in Engineering, Volume 21, Issue 74, 2023
Pages:
163 to 172
https://www.magiran.com/p2678834
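
The abstract states that the first model is a CNN fed with the Mel spectrogram of the audio, but gives no preprocessing details. The following is a minimal NumPy sketch of how such an input could be computed. The sample rate (16 kHz), FFT size (512), hop length (160), number of Mel bands (40), the HTK-style Mel formula, and all function names are assumptions chosen for illustration, not values or code taken from the paper.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (HTK-style formula, assumed)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels):
    """Build triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge points, evenly spaced on the mel scale.
    max_mel = float(hz_to_mel(sample_rate / 2.0))
    edges = mel_to_hz(np.linspace(0.0, max_mel, n_mels + 2))
    filters = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right.
        filters[m] = np.maximum(0.0, np.minimum(rising, falling))
    return filters


def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Return a log-mel spectrogram with shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        # Zero-pad very short clips so at least one frame exists.
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum per frame, shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = mel_filterbank(sample_rate, n_fft, n_mels) @ power.T
    # Small constant avoids log(0) on silent frames.
    return np.log(mel_power + 1e-10)


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone stands in for a recorded utterance.
    t = np.arange(16000) / 16000.0
    spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
    print(spec.shape)  # (40, 97)
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes a convolutional classifier a natural fit for this kind of input. In practice a library routine such as the one in librosa would usually replace the hand-written filterbank.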