Distributed Arcface Training in Pytorch

Related tags

Deep LearningMaske_FR
Overview

Distributed Arcface Training in Pytorch

This is a deep learning library that makes face recognition efficient, and effective, which can train tens of millions identity on a single server.

Requirements

How to Training

To train a model, run train.py with the path to the configs:

1. Single node, 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py configs/ms1mv3_r50

2. Multiple nodes, each node 8 GPUs:

Node 0:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr="ip1" --master_port=1234 train.py train.py configs/ms1mv3_r50

Node 1:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr="ip1" --master_port=1234 train.py train.py configs/ms1mv3_r50

3.Training resnet2060 with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py configs/ms1mv3_r2060.py

Model Zoo

  • The models are available for non-commercial research purposes only.
  • All models can be found in here.
  • Baidu Yun Pan: e8pw
  • onedrive

Performance on ICCV2021-MFR

ICCV2021-MFR testset consists of non-celebrities so we can ensure that it has very few overlap with public available face recognition training set, such as MS1M and CASIA as they mostly collected from online celebrities. As the result, we can evaluate the FAIR performance for different algorithms.

For ICCV2021-MFR-ALL set, TAR is measured on all-to-all 1:1 protocal, with FAR less than 0.000001(e-6). The globalised multi-racial testset contains 242,143 identities and 1,624,305 images.

For ICCV2021-MFR-MASK set, TAR is measured on mask-to-nonmask 1:1 protocal, with FAR less than 0.0001(e-4). Mask testset contains 6,964 identities, 6,964 masked images and 13,928 non-masked images. There are totally 13,928 positive pairs and 96,983,824 negative pairs.

Datasets backbone Training throughout Size / MB ICCV2021-MFR-MASK ICCV2021-MFR-ALL
MS1MV3 r18 - 91 47.85 68.33
Glint360k r18 8536 91 53.32 72.07
MS1MV3 r34 - 130 58.72 77.36
Glint360k r34 6344 130 65.10 83.02
MS1MV3 r50 5500 166 63.85 80.53
Glint360k r50 5136 166 70.23 87.08
MS1MV3 r100 - 248 69.09 84.31
Glint360k r100 3332 248 75.57 90.66
MS1MV3 mobilefacenet 12185 7.8 41.52 65.26
Glint360k mobilefacenet 11197 7.8 44.52 66.48

Performance on IJB-C and Verification Datasets

Datasets backbone IJBC(1e-05) IJBC(1e-04) agedb30 cfp_fp lfw log
MS1MV3 r18 92.07 94.66 97.77 97.73 99.77 log
MS1MV3 r34 94.10 95.90 98.10 98.67 99.80 log
MS1MV3 r50 94.79 96.46 98.35 98.96 99.83 log
MS1MV3 r100 95.31 96.81 98.48 99.06 99.85 log
MS1MV3 r2060 95.34 97.11 98.67 99.24 99.87 log
Glint360k r18-0.1 93.16 95.33 97.72 97.73 99.77 log
Glint360k r34-0.1 95.16 96.56 98.33 98.78 99.82 log
Glint360k r50-0.1 95.61 96.97 98.38 99.20 99.83 log
Glint360k r100-0.1 95.88 97.32 98.48 99.29 99.82 log

Speed Benchmark

Arcface Torch can train large-scale face recognition training set efficiently and quickly. When the number of classes in training sets is greater than 300K and the training is sufficient, partial fc sampling strategy will get same accuracy with several times faster training performance and smaller GPU memory. Partial FC is a sparse variant of the model parallel architecture for large sacle face recognition. Partial FC use a sparse softmax, where each batch dynamicly sample a subset of class centers for training. In each iteration, only a sparse part of the parameters will be updated, which can reduce a lot of GPU memory and calculations. With Partial FC, we can scale trainset of 29 millions identities, the largest to date. Partial FC also supports multi-machine distributed training and mixed precision training.

Image text

More details see speed_benchmark.md in docs.

1. Training speed of different parallel methods (samples / second), Tesla V100 32GB * 8. (Larger is better)

- means training failed because of gpu memory limitations.

Number of Identities in Dataset Data Parallel Model Parallel Partial FC 0.1
125000 4681 4824 5004
1400000 1672 3043 4738
5500000 - 1389 3975
8000000 - - 3565
16000000 - - 2679
29000000 - - 1855

2. GPU memory cost of different parallel methods (MB per GPU), Tesla V100 32GB * 8. (Smaller is better)

Number of Identities in Dataset Data Parallel Model Parallel Partial FC 0.1
125000 7358 5306 4868
1400000 32252 11178 6056
5500000 - 32188 9854
8000000 - - 12310
16000000 - - 19950
29000000 - - 32324

Evaluation ICCV2021-MFR and IJB-C

More details see eval.md in docs.

Test

We tested many versions of PyTorch. Please create an issue if you are having trouble.

  • torch 1.6.0
  • torch 1.7.1
  • torch 1.8.0
  • torch 1.9.0

Citation

@inproceedings{deng2019arcface,
  title={Arcface: Additive angular margin loss for deep face recognition},
  author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4690--4699},
  year={2019}
}
@inproceedings{an2020partical_fc,
  title={Partial FC: Training 10 Million Identities on a Single Machine},
  author={An, Xiang and Zhu, Xuhan and Xiao, Yang and Wu, Lan and Zhang, Ming and Gao, Yuan and Qin, Bin and
  Zhang, Debing and Fu Ying},
  booktitle={Arxiv 2010.05222},
  year={2020}
}
Complete the code of prefix-tuning in low data setting

Prefix Tuning Note: 作者在论文中提到使用真实的word去初始化prefix的操作(Initializing the prefix with activations of real words,significantly improves generation)。我在使用作者提供的

Andrew Zeng 4 Jul 11, 2022
Fuse radar and camera for detection

SAF-FCOS: Spatial Attention Fusion for Obstacle Detection using MmWave Radar and Vision Sensor This project hosts the code for implementing the SAF-FC

ChangShuo 18 Jan 01, 2023
Model search is a framework that implements AutoML algorithms for model architecture search at scale

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model a

Google 3.2k Dec 31, 2022
NAVER BoostCamp Final Project

CV 14조 final project Super Resolution and Deblur module Inference code & Pretrained weight Repo SwinIR Deblur 실행 방법 streamlit run WebServer/Server_SRD

JiSeong Kim 5 Sep 06, 2022
PyTorch implementation of residual gated graph ConvNets, ICLR’18

Residual Gated Graph ConvNets April 24, 2018 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbress

Xavier Bresson 112 Aug 10, 2022
A tensorflow implementation of an HMM layer

tensorflow_hmm Tensorflow and numpy implementations of the HMM viterbi and forward/backward algorithms. See Keras example for an example of how to use

Zach Dwiel 283 Oct 19, 2022
A PyTorch toolkit for 2D Human Pose Estimation.

PyTorch-Pose PyTorch-Pose is a PyTorch implementation of the general pipeline for 2D single human pose estimation. The aim is to provide the interface

Wei Yang 1.1k Dec 30, 2022
ViSD4SA, a Vietnamese Span Detection for Aspect-based sentiment analysis dataset

UIT-ViSD4SA PACLIC 35 General Introduction This repository contains the data of the paper: Span Detection for Vietnamese Aspect-Based Sentiment Analys

Nguyễn Thị Thanh Kim 5 Nov 13, 2022
Self-Supervised Learning of Event-based Optical Flow with Spiking Neural Networks

Self-Supervised Learning of Event-based Optical Flow with Spiking Neural Networks Work accepted at NeurIPS'21 [paper, video]. If you use this code in

TU Delft 43 Dec 07, 2022
Efficient Training of Visual Transformers with Small Datasets

Official codes for "Efficient Training of Visual Transformers with Small Datasets", NerIPS 2021.

Yahui Liu 112 Dec 25, 2022
Proof-Of-Concept Piano-Drums Music AI Model/Implementation

Rock Piano "When all is one and one is all, that's what it is to be a rock and not to roll." ---Led Zeppelin, "Stairway To Heaven" Proof-Of-Concept Pi

Alex 4 Nov 28, 2021
Simple codebase for flexible neural net training

neural-modular Simple codebase for flexible neural net training. Allows for seamless exchange of models, dataset, and optimizers. Uses hydra for confi

Jannik Kossen 7 Apr 05, 2022
It's a powerful version of linebot

CTPS-FINAL Linbot-sever.py 主程式 Algorithm.py 推薦演算法,媒合餐廳端資料與顧客端資料 config.ini 儲存 channel-access-token、channel-secret 資料 Preface 生活在成大將近4年,我們每天的午餐時間看著形形色色

1 Oct 17, 2022
Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships.

feature-set-comp Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships. Reposito

Trent Henderson 7 May 25, 2022
Pure python implementations of popular ML algorithms.

Minimal ML algorithms This repo includes minimal implementations of popular ML algorithms using pure python and numpy. The purpose of these notebooks

Alexis Gidiotis 3 Jan 10, 2022
FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

FIRA is a learning-based commit message generation approach, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically.

Van 21 Dec 30, 2022
Learning to Initialize Neural Networks for Stable and Efficient Training

GradInit This repository hosts the code for experiments in the paper, GradInit: Learning to Initialize Neural Networks for Stable and Efficient Traini

Chen Zhu 124 Dec 30, 2022
Modifications of the official PyTorch implementation of StyleGAN3. Let's easily generate images and videos with StyleGAN2/2-ADA/3!

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation of the NeurIPS 2021 paper Alias-Free Generative Adversarial Net

Diego Porres 185 Dec 24, 2022
Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction

Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction. arxiv This repository contains python scripts for tr

12 Dec 12, 2022
Revisiting Temporal Alignment for Video Restoration

Revisiting Temporal Alignment for Video Restoration [arXiv] Kun Zhou, Wenbo Li, Liying Lu, Xiaoguang Han, Jiangbo Lu We provide our results at Google

52 Dec 25, 2022