
MPT

This is a Multi-modal Perception Tracker (MPT) for speaker tracking using both audio and visual modalities.

We provide the MATLAB & Python implementation for our AAAI 2022 paper: Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking.

Requirements

  • Python >= 3.6
  • PyTorch >= 1.7
  • MATLAB 2016

Data Preparation:

  • AV16.3: the original dataset, available at http://www.glat.info/ma/av16.3/.
  • MPTdata: the preprocessed data provided for the demo, available at MPTdata; run cat AAAI22_MPT.tar.gz.* | tar -zxv to unpack the files.

Descriptions:

  1. Audio Measurement: The MATLAB implementation of stGCF. The parameter files that the camera projection model depends on can be downloaded from the AV16.3 dataset. (A minimal GCF sketch follows this list.)
  2. Visual Measurement: A pre-trained Siamese network is employed to extract the response maps. The PyTorch implementation of the SiamFC tracker is described in the paper Fully-Convolutional Siamese Networks for Object Tracking. (See the response-map sketch below.)
  3. MPAtt Network: The implementation of the proposed network. avdataCombine.py is run first to integrate the audio and visual cues and normalize the data. (See the fusion sketch below.)
  4. PF: The tracker is based on an improved particle filter (PF) algorithm. (See the particle-filter sketch below.)
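
For orientation, here is a minimal, repo-independent sketch of a GCF-style audio measurement: GCC-PHAT is computed per microphone pair and summed at the TDOAs implied by a candidate 3-D point. The names gcc_phat and gcf_score and all parameters are illustrative assumptions, not the repository's stGCF code, which additionally pools over time and projects scores through the camera model.

```python
import numpy as np

def gcc_phat(sig_a, sig_b, n_fft=1024):
    """GCC-PHAT cross-correlation of two microphone signals, zero lag centered."""
    X = np.fft.rfft(sig_a, n=n_fft)
    Y = np.fft.rfft(sig_b, n=n_fft)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n_fft)
    return np.concatenate((cc[-n_fft // 2:], cc[:n_fft // 2]))

def gcf_score(point, mic_pos, signals, fs, c=343.0, n_fft=1024):
    """Sum GCC-PHAT values at the TDOAs implied by a candidate 3-D point."""
    score = 0.0
    for i in range(len(mic_pos)):
        for j in range(i + 1, len(mic_pos)):
            # TDOA for this mic pair if the source were at `point`
            tdoa = (np.linalg.norm(point - mic_pos[i])
                    - np.linalg.norm(point - mic_pos[j])) / c
            cc = gcc_phat(signals[i], signals[j], n_fft)
            lag = int(round(tdoa * fs)) + n_fft // 2
            if 0 <= lag < n_fft:
                score += cc[lag]
    return score
```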
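The visual measurement can be pictured as SiamFC-style cross-correlation: the template embedding is used as a matching kernel over the search-region embedding. The backbone below stands in for any embedding network and is an assumption; the 127/255 crop sizes follow the SiamFC paper.

```python
import torch
import torch.nn.functional as F

def response_map(backbone, exemplar, search):
    """SiamFC-style response map: correlate template and search embeddings.

    exemplar: (1, 3, 127, 127) template crop around the target.
    search:   (1, 3, 255, 255) search region in the current frame.
    """
    z = backbone(exemplar)        # (1, C, h, w) template features
    x = backbone(search)          # (1, C, H, W) search features
    # conv2d performs cross-correlation in PyTorch, so the template
    # features act directly as a matching kernel over the search features.
    return F.conv2d(x, z).squeeze()   # (H - h + 1, W - w + 1) response map
```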
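avdataCombine.py's exact processing is not reproduced here; as a hedged sketch, one common way to integrate per-frame audio and visual cues is to min-max normalize each measurement map and stack them as channels:

```python
import numpy as np

def normalize_map(m):
    """Min-max normalize a measurement map to [0, 1]."""
    m = np.asarray(m, dtype=np.float32)
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def combine_cues(audio_map, visual_map):
    """Stack normalized audio and visual maps as channels of one tensor."""
    return np.stack([normalize_map(audio_map), normalize_map(visual_map)])
```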
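Finally, a generic bootstrap particle filter cycle (predict, weight, resample) shows where the fused likelihood plugs in. likelihood_fn and motion_std are hypothetical placeholders; the repository's improved PF differs in its details.

```python
import numpy as np

def pf_step(particles, weights, likelihood_fn, motion_std=5.0, rng=np.random):
    """One predict-update-resample cycle of a bootstrap particle filter."""
    # Predict: propagate particles with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: reweight each particle by the fused audio-visual likelihood.
    weights = weights * np.array([likelihood_fn(p) for p in particles])
    weights = weights + 1e-12               # guard against all-zero weights
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```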

Citation

Please cite our paper if you find this repository useful in your research:

@inproceedings{li2022mpt,
  Title= {Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking},
  Author= {Li, Yidi and Liu, Hong and Tang, Hao},
  Booktitle= {AAAI},
  Year= {2022}
}

License

This project is licensed under the terms of the MIT license.
