Design of an intelligent system based on a computational cognitive model using the attention network task
Speech is one of the most effective ways to exchange information. Beyond the words and grammatical content, a speaker's voice carries additional information such as age, gender, and emotional state. Many studies have approached the emotional content of speech from various angles, and they show that emotion in speech is dynamic in nature; this dynamic character makes it difficult to extract the emotion hidden in an utterance. This study evaluates the implicit emotion in a message through emotional speech processing, using Mel-Frequency Cepstral Coefficient (MFCC) and Short-Time Fourier Transform (STFT) features.
The input data come from the Berlin Emotional Speech Database, which covers seven emotional states: anger, boredom, disgust, anxiety/fear, happiness, sadness, and a neutral state. MATLAB is used to read the audio files of the database, after which the MFCC and STFT features are extracted. The feature vector for each method is built from seven statistical values: minimum, maximum, mean, standard deviation, median, skewness, and kurtosis. These vectors are then used as input to an Artificial Neural Network, and the emotional states are recognized by training the network with functions based on different algorithms.
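The feature-construction step described above (seven summary statistics computed over a time-frequency representation) can be sketched as follows. This is a minimal Python illustration using SciPy, not the authors' MATLAB code; the sampling rate, window length, and the synthetic test tone standing in for a Berlin-database utterance are all assumptions, and the statistics are pooled over all STFT magnitude values.

```python
import numpy as np
from scipy.signal import stft
from scipy.stats import skew, kurtosis

def stft_feature_vector(signal, fs=16000, nperseg=512):
    """Seven summary statistics (min, max, mean, std, median,
    skewness, kurtosis) over the STFT magnitude spectrogram."""
    _, _, Z = stft(signal, fs=fs, nperseg=nperseg)  # Z: freq x time complex STFT
    mag = np.abs(Z).ravel()                         # pool all magnitude values
    return np.array([mag.min(), mag.max(), mag.mean(), mag.std(),
                     np.median(mag), skew(mag), kurtosis(mag)])

# Hypothetical input: a 1-second 440 Hz tone instead of a real utterance.
t = np.linspace(0, 1, 16000, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)
fv = stft_feature_vector(sig)
print(fv.shape)  # a 7-element feature vector, one value per statistic
```

A vector built this way (one per recording, per feature type) would then serve as the input pattern for the neural-network classifier; an analogous vector can be computed from MFCC frames instead of STFT magnitudes.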
The results show that the average recognition accuracy obtained with STFT features is higher and more robust than that obtained with MFCC features. Among the emotions, anger and sadness are recognized at the highest rates.
In conclusion, STFT features proved better suited than MFCC features for extracting the implicit emotion in speech.