Abstract: In this report we summarize the state of the art of speech emotion recognition from the signal
processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, we
observe that existing approaches to supervised machine learning lead to database-dependent
classifiers which cannot be applied to multi-language speech emotion recognition without additional training,
because they discriminate the emotion classes according to the language used for training. As experimental
results show that humans can perform language-independent categorisation, we drew a parallel between
machine recognition and the human cognitive process and tried to identify the sources of these divergent results. The
analysis suggests that the main difference is that human speech perception allows the extraction of language-independent
features, even though language-dependent features are incorporated at all levels of the speech signal
and play a strong discriminative role in human perception. Based on several results in related domains, we
further suggest that the cognitive process of emotion recognition is based on categorisation, assisted
by a hierarchical structure of emotional categories existing in the cognitive space of all humans. We
propose a strategy for developing language-independent machine emotion recognition, based on the
identification of language-independent speech features and the use of additional information from visual
(facial expression) features.
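The cross-corpus effect described above can be illustrated with a minimal sketch (not the authors' experimental setup): a supervised classifier is trained on one emotional speech corpus and evaluated both within that corpus and on a second corpus. The corpora, features, and the SVM choice below are synthetic stand-ins; a corpus-specific offset on the feature vectors loosely models language- and recording-dependent shifts in acoustic descriptors such as pitch and energy statistics.

```python
# Sketch of a cross-corpus evaluation showing database-dependent behaviour.
# All data are synthetic placeholders, not real emotional speech features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_CLASSES, N_FEATURES, N_SAMPLES = 4, 12, 400  # e.g. 4 emotion classes

# Shared class structure: each emotion class is a Gaussian cluster.
class_means = rng.normal(size=(N_CLASSES, N_FEATURES)) * 2.0

def make_corpus(corpus_offset, n=N_SAMPLES):
    """Hypothetical corpus: class clusters displaced by a corpus-specific offset."""
    y = rng.integers(0, N_CLASSES, size=n)
    X = class_means[y] + corpus_offset + rng.normal(size=(n, N_FEATURES))
    return X, y

# Two hypothetical corpora; the offset stands in for language/channel differences.
offset_a = np.zeros(N_FEATURES)
offset_b = rng.normal(size=N_FEATURES) * 3.0
X_a, y_a = make_corpus(offset_a)
X_b, y_b = make_corpus(offset_b)

# Train on corpus A only, as in a single-database supervised setup.
scaler = StandardScaler().fit(X_a)
clf = SVC(kernel="rbf", C=1.0).fit(scaler.transform(X_a), y_a)

# Within-corpus accuracy stays high; cross-corpus accuracy drops sharply.
X_a_test, y_a_test = make_corpus(offset_a, n=200)
print("within-corpus accuracy:", clf.score(scaler.transform(X_a_test), y_a_test))
print("cross-corpus accuracy :", clf.score(scaler.transform(X_b), y_b))
```

In this toy setting, normalising each corpus with its own statistics would largely remove the offset; with real speech data the mismatch is more complex, which is why the report argues for identifying language-independent features rather than relying on per-database normalisation alone.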
ACM Classification Keywords: I.2 Artificial Intelligence, I.2.0 Cognitive simulation, I.2.7 Natural Language
Processing - Speech recognition and synthesis
Link:
A COGNITIVE SCIENCE REASONING IN RECOGNITION OF EMOTIONS IN AUDIO-VISUAL SPEECH
Velina Slavova, Werner Verhelst, Hichem Sahli
http://www.foibg.com/ijitk/ijitk-vol02/ijitk02-4-p05.pdf