
MPT

This is a Multi-modal Perception Tracker (MPT) for speaker tracking using both audio and visual modalities.

We provide the MATLAB & Python implementation for our AAAI 2022 paper: Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking.

Requirements

  • Python >= 3.6
  • PyTorch >= 1.7
  • MATLAB 2016

Data Preparation:

  • AV16.3: the original dataset, available at http://www.glat.info/ma/av16.3/.
  • MPTdata: the preprocessed data provided for the demo, available at MPTdata; run cat AAAI22_MPT.tar.gz.* | tar -zxv to unpack the files.

Descriptions:

  1. Audio Measurement: The MATLAB implementation of stGCF. The parameter files that the camera projection model depends on can be downloaded from the AV16.3 dataset. (A minimal GCF sketch follows this list.)
  2. Visual Measurement: A pre-trained Siamese network is employed to extract the response maps. The PyTorch implementation of the SiamFC tracker is described in the paper Fully-Convolutional Siamese Networks for Object Tracking. (See the response-map sketch below.)
  3. MPAtt Network: The implementation of the proposed network. avdataCombine.py is run first to integrate the audio and visual cues and normalize the data. (See the fusion sketch below.)
  4. PF: The tracker is based on an improved particle filter (PF) algorithm. (See the particle-filter sketch below.)
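
For orientation, here is a minimal, repo-independent sketch of a GCF-style audio measurement: GCC-PHAT is computed per microphone pair and summed at the TDOAs implied by a candidate 3-D point. The names gcc_phat and gcf_score and all parameters are illustrative assumptions, not the repository's stGCF code, which additionally pools over time and projects scores through the camera model.

```python
import numpy as np

def gcc_phat(sig_a, sig_b, n_fft=1024):
    """GCC-PHAT cross-correlation of two microphone signals, zero lag centered."""
    X = np.fft.rfft(sig_a, n=n_fft)
    Y = np.fft.rfft(sig_b, n=n_fft)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n_fft)
    return np.concatenate((cc[-n_fft // 2:], cc[:n_fft // 2]))

def gcf_score(point, mic_pos, signals, fs, c=343.0, n_fft=1024):
    """Sum GCC-PHAT values at the TDOAs implied by a candidate 3-D point."""
    score = 0.0
    for i in range(len(mic_pos)):
        for j in range(i + 1, len(mic_pos)):
            # TDOA for this mic pair if the source were at `point`
            tdoa = (np.linalg.norm(point - mic_pos[i])
                    - np.linalg.norm(point - mic_pos[j])) / c
            cc = gcc_phat(signals[i], signals[j], n_fft)
            lag = int(round(tdoa * fs)) + n_fft // 2
            if 0 <= lag < n_fft:
                score += cc[lag]
    return score
```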
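The visual measurement can be pictured as SiamFC-style cross-correlation: the template embedding is used as a matching kernel over the search-region embedding. The backbone below stands in for any embedding network and is an assumption; the 127/255 crop sizes follow the SiamFC paper.

```python
import torch
import torch.nn.functional as F

def response_map(backbone, exemplar, search):
    """SiamFC-style response map: correlate template and search embeddings.

    exemplar: (1, 3, 127, 127) template crop around the target.
    search:   (1, 3, 255, 255) search region in the current frame.
    """
    z = backbone(exemplar)        # (1, C, h, w) template features
    x = backbone(search)          # (1, C, H, W) search features
    # conv2d performs cross-correlation in PyTorch, so the template
    # features act directly as a matching kernel over the search features.
    return F.conv2d(x, z).squeeze()   # (H - h + 1, W - w + 1) response map
```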
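avdataCombine.py's exact processing is not reproduced here; as a hedged sketch, one common way to integrate per-frame audio and visual cues is to min-max normalize each measurement map and stack them as channels:

```python
import numpy as np

def normalize_map(m):
    """Min-max normalize a measurement map to [0, 1]."""
    m = np.asarray(m, dtype=np.float32)
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def combine_cues(audio_map, visual_map):
    """Stack normalized audio and visual maps as channels of one tensor."""
    return np.stack([normalize_map(audio_map), normalize_map(visual_map)])
```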
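Finally, a generic bootstrap particle filter cycle (predict, weight, resample) shows where the fused likelihood plugs in. likelihood_fn and motion_std are hypothetical placeholders; the repository's improved PF differs in its details.

```python
import numpy as np

def pf_step(particles, weights, likelihood_fn, motion_std=5.0, rng=np.random):
    """One predict-update-resample cycle of a bootstrap particle filter."""
    # Predict: propagate particles with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: reweight each particle by the fused audio-visual likelihood.
    weights = weights * np.array([likelihood_fn(p) for p in particles])
    weights = weights + 1e-12               # guard against all-zero weights
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```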

Citation

Please cite our paper if you find this repository useful in your research:

@inproceedings{li2022mpt,
  Title= {Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking},
  Author= {Li, Yidi and Liu, Hong and Tang, Hao},
  Booktitle= {AAAI},
  Year= {2022}
}

License

This project is licensed under the terms of the MIT license.
