Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Last update: Nov 12, 2022

Overview

Light-SERNet

This is the Tensorflow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition", submitted in ICASSP 2022.

In this paper, we propose an efficient and lightweight fully convolutional neural network(FCNN) for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps deep convolution blocks to extract high-level features, while ensuring sufficient separability. The extracted features are used to classify the emotion of the input speech segment. While our model has a smaller size than that of the state-of-the-art models, it achieves a higher performance on the IEMOCAP and EMO-DB datasets.

Run

1. Clone Repository

$ git clone https://github.com/AryaAftab/LIGHT-SERNET.git
$ cd LIGHT-SERNET/

2. Requirements

Tensorflow >= 2.3.0
Numpy >= 1.19.2
Tqdm >= 4.50.2
Matplotlib> = 3.3.1
Scikit-learn >= 0.23.2

$ pip install -r requirements.txt

3. Data:

Download EMO-DB and IEMOCAP(requires permission to access) datasets
extract them in data folder

4. Prepare datasets :

Use the following code to convert each dataset to the desired size(second):

$ python utils/segment/segment_dataset.py -dp data/{dataset_folder} -ip utils/DATASET_INFO.json -d {datasetname_in_jsonfile} -l {desired_size(seconds)}

For example, for EMO-DB Dataset :

$ python utils/segment/segment_dataset.py -dp data/EMO-DB -ip utils/DATASET_INFO.json -d EMO-DB -l 3

5. Set hyperparameters and training config :

You only need to change the constants in the hyperparameters.py to set the hyperparameters and the training config.

6. Strat training:

Use the following code to train the model on the desired dataset with the desired cost function.

Note 1: The database name is the name of the database folder after segmentation.
Note 2: The results for the confusion matrix are saved in the result folder.

$ python train.py -dn {dataset_name_after_segmentation} -ln {cost_function_name}

For example, for EMO-DB Dataset :

$ python train.py -dn EMO-DB_3s_Segmented -ln focal

Citation

If you find our code useful for your research, please consider citing:

@article{aftab2021light,
  title={Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition},
  author={Aftab, Arya and Morsali, Alireza and Ghaemmaghami, Shahrokh and Champagne, Benoit},
  journal={arXiv preprint arXiv:2110.03435},
  year={2021}
}

Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Related tags

Overview

Light-SERNet

Run

1. Clone Repository

2. Requirements

3. Data:

4. Prepare datasets :

5. Set hyperparameters and training config :

6. Strat training:

Citation

Owner

Arya Aftab

Official PyTorch implementation of "Preemptive Image Robustification for Protecting Users against Man-in-the-Middle Adversarial Attacks" (AAAI 2022)

Genetic feature selection module for scikit-learn

InterfaceGAN++: Exploring the limits of InterfaceGAN

RATCHET is a Medical Transformer for Chest X-ray Diagnosis and Reporting

SE-MSCNN: A Lightweight Multi-scaled Fusion Network for Sleep Apnea Detection Using Single-Lead ECG Signals

A C implementation for creating 2D voronoi diagrams

[NeurIPS 2021] "Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems"

Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

Dynamic Slimmable Network (CVPR 2021, Oral)

NLG evaluation via Statistical Measures of Similarity: BaryScore, DepthScore, InfoLM

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

A Fast Knowledge Distillation Framework for Visual Recognition

Examples of using f2py to get high-speed Fortran integrated with Python easily

Library for fast text representation and classification.

GANmouflage: 3D Object Nondetection with Texture Fields

ANEA: Distant Supervision for Low-Resource Named Entity Recognition

Self-Supervised Learning with Kernel Dependence Maximization

Code for layerwise detection of linguistic anomaly paper (ACL 2021)

Attempt at implementation of a simple GAN using Keras

K-Nearest Neighbor in Pytorch