This repository provides an off-the-shelf arousal recognition API, which can be used to infer whether the speaker recorded in the input speech file conveys a low, mid, or high arousal level.
If you have conda installed (either Miniconda or Anaconda), you can set up the virtual environment required to run the API with
conda env create -f .env-ymls/FIPsustAGE.yml
You can then activate the FIPsustAGE environment with
source ./activate FIPsustAGE
src contains the back-end Python scripts of the API, including the generation and segmentation of the Mel-spectrogram representations of the input speech file, and their analysis with a Convolutional Neural Network.
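This README does not state how the back-end scripts segment the Mel-spectrogram. Purely as an illustration of the general idea, fixed-length segmentation along the time axis could be sketched as follows. The function name, the segment length, and the hop size are hypothetical and are not taken from the repository.

```python
def segment_bounds(n_frames, seg_len, hop):
    """Return (start, end) frame indices of fixed-length segments.

    Hypothetical helper: n_frames is the number of time frames in a
    spectrogram, seg_len the frames per segment, hop the step between
    segment starts. Trailing frames that do not fill a whole segment
    are dropped.
    """
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    # The last valid start is n_frames - seg_len; range() is empty
    # when the input is shorter than one segment.
    return [(start, start + seg_len)
            for start in range(0, n_frames - seg_len + 1, hop)]


# Example with made-up sizes: 10 frames, 4-frame segments, 50% overlap.
print(segment_bounds(10, 4, 2))
```

Each (start, end) pair would then select one slice of the spectrogram to pass to the network. The actual windowing used by the API may differ.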
Next, we provide an example of how to call the API from a Python program.
>>> import module_arousal as arousal
>>> arousal.API('[audioFilePath].wav')
('mid', '0.524')
The first element is the arousal level inferred by the model (low, mid, or high), and the second element is the probability score associated with that inference.
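In the example output, both elements are strings, so downstream code may want the score as a number. Below is a minimal sketch of a wrapper, assuming the return value always has the (level, score) shape shown in the example. The names ArousalResult and parse_arousal_result are hypothetical and not part of the repository.

```python
from typing import NamedTuple, Tuple

# The three arousal levels named in this README.
VALID_LEVELS = ("low", "mid", "high")


class ArousalResult(NamedTuple):
    level: str          # 'low', 'mid', or 'high'
    probability: float  # score converted from its string form


def parse_arousal_result(raw: Tuple[str, str]) -> ArousalResult:
    """Validate the level and convert the score string to a float."""
    level, score = raw
    if level not in VALID_LEVELS:
        raise ValueError(f"unexpected arousal level: {level!r}")
    return ArousalResult(level, float(score))


# With the FIPsustAGE environment active, the tuple would come from
#   import module_arousal as arousal
#   raw = arousal.API('[audioFilePath].wav')
# Here the example output from this README is used instead.
result = parse_arousal_result(("mid", "0.524"))
print(result.level, result.probability)  # prints: mid 0.524
```

Working with a float makes it straightforward to, for instance, ignore inferences below a chosen confidence threshold.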
If you use the code from this repository, you are kindly asked to cite the following paper.
A. Mallol-Ragolta and B. Schuller, “Coupling Sentiment and Arousal Analysis Towards an Affective Dialogue Manager,” IEEE Access, vol. 12, pp. 20654–20662, February 2024.
@article{Mallol-Ragolta24-CSA,
author={Adria Mallol-Ragolta and Björn Schuller},
title={{Coupling Sentiment and Arousal Analysis Towards an Affective Dialogue Manager}},
journal={IEEE Access},
volume = {12},
publisher = {IEEE},
year={2024},
month = {February},
pages = {20654--20662},
}
The code and the model weights are released under the MIT License.
The research and development of this toolkit has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 826506 (sustAGE).