This repository contains code and data for "On the Multimodal Person Verification Using Audio-Visual-Thermal Data"

Overview

trimodal_person_verification

This repository contains the code, and preprocessed dataset featured in "A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data".

Person verification is the general task of verifying person’s identity using various biometric characteristics. We study an approach to multimodal person verification using audio, visual, and thermal modalities. In particular, we implemented unimodal, bimodal, and trimodal verification systems using the state-of-the-art deep learning architectures and compared their performance under clean and noisy conditions.

Dependencies

pip install -r requirements.txt

Dataset

In this work, we utilized the SpeakingFaces dataset to train, validate, and test the person verification systems. SpeakingFaces is a publicly available multimodal dataset comprised of audio, visual, and thermal data streams. The preprocessed data used for our experiments can be downloaded from Google Drive.

The data directory contains the compressed version of the preprocessed data used for the reported experiments. For each utterance, only the first frame (visual and thermal) is selected. The train set is split into 5 parts that should be extracted into the same location.

The data/metadata subdirectory contains lists prepared for the train, validation, and test sets following the format of VoxCeleb. In particular, the train list contains the paths to the recordings and the corresponding subject identifiers. The validation and test lists consist of randomly generated positive and negative pairs. For each subject, the same number of positive and negative pairs were selected. In total, the numbers of pairs in the validation and test sets are 38,000 and 46,200, respectively.

Note, to run noisy training and evaluation, you should first download the MUSAN dataset.

See trainSpeakerNet.py for details on where the data should be stored.

Training examples : clean data

Unimodal models

python trainSpeakerNet.py --model ResNetSE34Multi --modality wav --log_input True --trainfunc angleproto --max_epoch 1500 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.01 --seed 1 --save_path exps/wav/exp1 
python trainSpeakerNet.py --model ResNetSE34Multi --modality rgb --log_input True --trainfunc angleproto --max_epoch 600 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.01 --seed 1 --save_path exps/rgb/exp1 
python trainSpeakerNet.py --model ResNetSE34Multi --modality thr --log_input True --trainfunc angleproto --max_epoch 600 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.01 --seed 1 --save_path exps/thr/exp1 

Multimodal models

python trainSpeakerNet.py --model ResNetSE34Multi --modality wavrgb --log_input True --trainfunc angleproto --max_epoch 600 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.01 --seed 1 --save_path exps/wavrgb/exp1 
python trainSpeakerNet.py --model ResNetSE34Multi --modality wavrgbthr --log_input True --trainfunc angleproto --max_epoch 600 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.1 --seed 1 --save_path exps/wavrgb/exp1 

Training examples : noisy data

Unimodal models

python trainSpeakerNet.py --model ResNetSE34Multi --modality wav --noisy_train True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --max_epoch 1500 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.001 --seed 1 --save_path exps/wav/exp2
python trainSpeakerNet.py --model ResNetSE34Multi --modality rgb --noisy_train True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --max_epoch 600 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.01 --seed 1 --save_path exps/rgb/exp2 
python trainSpeakerNet.py --model ResNetSE34Multi --modality thr --noisy_train True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --max_epoch 600 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.01 --seed 1 --save_path exps/thr/exp2 

Multimodal models

python trainSpeakerNet.py --model ResNetSE34Multi --modality wavrgb --noisy_train True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --max_epoch 600 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.01 --seed 1 --save_path exps/wavrgb/exp2 
python trainSpeakerNet.py --model ResNetSE34Multi --modality wavrgbthr --noisy_train True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --max_epoch 600 --batch_size 100 --nPerSpeaker 9 --max_frames 200 --eval_frames 200 --weight_decay 0.1 --seed 1 --save_path exps/wavrgb/exp2 

Evaluating pretrained models: clean test data

Unimodal models

python trainSpeakerNet.py --model ResNetSE34Multi --modality wav --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/wav/exp1 
python trainSpeakerNet.py --model ResNetSE34Multi --modality rgb --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt   --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/rgb/exp1 
python trainSpeakerNet.py --model ResNetSE34Multi --modality thr --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt   --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/thr/exp1 

Multimodal models

python trainSpeakerNet.py --model ResNetSE34Multi --modality wavrgb  --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt   --log_input True  --trainfunc angleproto --eval_frames 200 --save_path exps/wavrgb/exp1 
python trainSpeakerNet.py --model ResNetSE34Multi --modality wavrgbthr --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt   --log_input True  --trainfunc angleproto --eval_frames 200 --save_path exps/wavrgb/exp1 

Evaluating pretrained models: noisy test data

Unimodal models

python revalidate.py --model ResNetSE34Multi --modality wav --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/wav/exp2

python revalidate.py --model ResNetSE34Multi --modality wav --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt    --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/wav/exp2
python revalidate.py --model ResNetSE34Multi --modality rgb --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/rgb/exp2

python revalidate.py --model ResNetSE34Multi --modality rgb --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt    --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/rgb/exp2 
python revalidate.py --model ResNetSE34Multi --modality thr --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/thr/exp2

python revalidate.py --model ResNetSE34Multi --modality thr --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt    --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/thr/exp2 

Multimodal models

python revalidate.py --model ResNetSE34Multi --modality wavrgb --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/wavrgb/exp2

python revalidate.py --model ResNetSE34Multi --modality wavrgb --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt    --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/wavrgb/exp2 
python revalidate.py --model ResNetSE34Multi --modality wavrgbthr --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/wavrgbthr/exp2

python revalidate.py --model ResNetSE34Multi --modality wavrgbthr --eval True --valid_model True --test_path data/test --test_list data/metadata/test_list.txt    --noisy_eval True --p_noise 0.3 --snr 8 --log_input True --trainfunc angleproto --eval_frames 200 --save_path exps/wavrgb/exp2 
Owner
ISSAI
Institute of Smart Systems and Artificial Intelligence
ISSAI
U-Net Brain Tumor Segmentation

U-Net Brain Tumor Segmentation 🚀 :Feb 2019 the data processing implementation in this repo is not the fastest way (code need update, contribution is

Hao 448 Jan 02, 2023
Matthew Colbrook 1 Apr 08, 2022
QueryDet: Cascaded Sparse Query for Accelerating High-Resolution SmallObject Detection

QueryDet-PyTorch This repository is the official implementation of our paper: QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small O

Chenhongyi Yang 276 Dec 31, 2022
MetaDrive: Composing Diverse Scenarios for Generalizable Reinforcement Learning

MetaDrive: Composing Diverse Driving Scenarios for Generalizable RL [ Documentation | Demo Video ] MetaDrive is a driving simulator with the following

DeciForce: Crossroads of Machine Perception and Autonomy 276 Jan 04, 2023
This repository stores the code to reproduce the results published in "TiWS-iForest: Isolation Forest in Weakly Supervised and Tiny ML scenarios"

TinyWeaklyIsolationForest This repository stores the code to reproduce the results published in "TiWS-iForest: Isolation Forest in Weakly Supervised a

2 Mar 21, 2022
Digan - Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

DIGAN (ICLR 2022) Official PyTorch implementation of "Generating Videos with Dyn

Sihyun Yu 147 Dec 31, 2022
Source code for Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning Official implementation of ACC, described in the paper "Adaptively Calibrated C

3 Sep 16, 2022
Out-of-distribution detection using the pNML regret. NeurIPS2021

OOD Detection Load conda environment conda env create -f environment.yml or install requirements: while read requirement; do conda install --yes $requ

Koby Bibas 23 Dec 02, 2022
A collection of Google research projects related to Federated Learning and Federated Analytics.

Federated Research Federated Research is a collection of research projects related to Federated Learning and Federated Analytics. Federated learning i

Google Research 483 Jan 05, 2023
A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥

Lightning-Hydra-Template A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥 Click on Use this template to initialize new re

Hyunsoo Cho 1 Dec 20, 2021
PyContinual (An Easy and Extendible Framework for Continual Learning)

PyContinual (An Easy and Extendible Framework for Continual Learning) Easy to Use You can sumply change the baseline, backbone and task, and then read

Zixuan Ke 176 Jan 05, 2023
Deep Learning for Time Series Classification

Deep Learning for Time Series Classification This is the companion repository for our paper titled "Deep learning for time series classification: a re

Hassan ISMAIL FAWAZ 1.2k Jan 02, 2023
Groceries ARL: Association Rules (Birliktelik Kuralı)

Groceries_ARL Association Rules (Birliktelik Kuralı) Birliktelik kuralları, mark

Şebnem 5 Feb 08, 2022
Physics-Aware Training (PAT) is a method to train real physical systems with backpropagation.

Physics-Aware Training (PAT) is a method to train real physical systems with backpropagation. It was introduced in Wright, Logan G. & Onodera, Tatsuhiro et al. (2021)1 to train Physical Neural Networ

McMahon Lab 230 Jan 05, 2023
Generate fine-tuning samples & Fine-tuning the model & Generate samples by transferring Note On

UPMT Generate fine-tuning samples & Fine-tuning the model & Generate samples by transferring Note On See main.py as an example: from model import PopM

7 Sep 01, 2022
Manifold-Mixup implementation for fastai V2

Manifold Mixup Unofficial implementation of ManifoldMixup (Proceedings of ICML 19) for fast.ai (V2) based on Shivam Saboo's pytorch implementation of

Nestor Demeure 16 Jul 25, 2022
Boundary IoU API (Beta version)

Boundary IoU API (Beta version) Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov [arXiv] [Project] [BibTeX] This API is

Bowen Cheng 177 Dec 29, 2022
EvoJAX is a scalable, general purpose, hardware-accelerated neuroevolution toolkit

EvoJAX: Hardware-Accelerated Neuroevolution EvoJAX is a scalable, general purpose, hardware-accelerated neuroevolution toolkit. Built on top of the JA

Google 598 Jan 07, 2023
SeMask: Semantically Masked Transformers for Semantic Segmentation.

SeMask: Semantically Masked Transformers Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi This repo co

Picsart AI Research (PAIR) 186 Dec 30, 2022
Implementation for "Manga Filling Style Conversion with Screentone Variational Autoencoder" (SIGGRAPH ASIA 2020 issue)

Manga Filling with ScreenVAE SIGGRAPH ASIA 2020 | Project Website | BibTex This repository is for ScreenVAE introduced in the following paper "Manga F

30 Dec 24, 2022