[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Overview

Versatile Multi-Modal Pre-Training for
Human-Centric Perception

Fangzhou Hong1  Liang Pan1  Zhongang Cai1,2,3  Ziwei Liu1*
1S-Lab, Nanyang Technological University  2SenseTime Research  3Shanghai AI Laboratory

Accepted to CVPR 2022 (Oral)

This repository contains the official implementation of Versatile Multi-Modal Pre-Training for Human-Centric Perception. For brevity, we name our method HCMoCo.


arXiv | Project Page | Dataset

Citation

If you find our work useful for your research, please consider citing the paper:

@article{hong2022hcmoco,
  title={Versatile Multi-Modal Pre-Training for Human-Centric Perception},
  author={Hong, Fangzhou and Pan, Liang and Cai, Zhongang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2203.13815},
  year={2022}
}

Updates

[03/2022] Code release!

[03/2022] HCMoCo is accepted to CVPR 2022 as an oral presentation 🥳!

Installation

We recommend using conda to manage the Python environment. The commands below are provided for your reference.

git clone [email protected]:hongfz16/HCMoCo.git
cd HCMoCo
conda create -n HCMoCo python=3.6
conda activate HCMoCo
conda install -c pytorch pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1
pip install -r requirements.txt

In addition to the steps above, if you want to run the PointNet++ experiments, remember to compile the PointNet++ operators:

cd pycontrast/networks/pointnet2
python setup.py install

Dataset Preparation

1. NTU RGB-D Dataset

This dataset is used for pre-training. Download the 'NTU RGB+D 60' dataset here and extract the data to pycontrast/data/NTURGBD/NTURGBD. The folder structure should look like:

./
├── ...
└── pycontrast/data/NTURGBD/
    ├──NTURGBD/
        ├── nturgb+d_rgb/
        ├── nturgb+d_depth_masked/
        ├── nturgb+d_skeletons/
        └── ...

Preprocess the raw data using the following two Python scripts, which produce calibrated RGB frames in nturgb+d_rgb_warped_correction and extracted skeleton information in nturgb+d_parsed_skeleton:

cd pycontrast/data/NTURGBD
python generate_skeleton_data.py
python preprocess_nturgbd.py

2. NTURGBD-Parsing-4K Dataset

This dataset is used for both pre-training and the depth human parsing task. Follow the instructions here to prepare the NTURGBD-Parsing-4K dataset.

3. MPII Human Pose Dataset

This dataset is used for pre-training. Download the 'MPII Human Pose Dataset' here and extract it to pycontrast/data/mpii. The folder structure should look like:

./
├── ...
└── pycontrast/data/mpii
    ├── annot/
    └── images/

4. COCO Keypoint Detection Dataset

This dataset is used for both pre-training and DensePose estimation. Download the COCO 2014 train/val images and annotations here and extract them to pycontrast/data/coco. The folder structure should look like:

./
├── ...
└── pycontrast/data/coco
    ├── annotations/
        └── *.json
    └── images/
        ├── train2014/
            └── *.jpg
        └── val2014/
            └── *.jpg

5. Human3.6M Dataset

This dataset is used for the RGB human parsing task. Download the Human3.6M dataset here and extract it under HRNet-Semantic-Segmentation/data/human3.6m. Use the provided script mp_parsedata.py to pre-process the raw data (a sketch follows the folder structure below). The folder structure should look like:

./
├── ...
└── HRNet-Semantic-Segmentation/data/human3.6m
    ├── protocol_1/
        ├── rgb
        └── seg
    ├── flist_2hz_train.txt
    ├── flist_2hz_eval.txt
    └── ...
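
The pre-processing step can then be run as sketched below. This is a minimal sketch assuming mp_parsedata.py sits next to the extracted data and reads it in place; check the script for its actual location and arguments.

cd HRNet-Semantic-Segmentation/data/human3.6m
# Assumed invocation: the script may instead expect input/output paths as
# arguments or use hard-coded defaults; inspect mp_parsedata.py first.
python mp_parsedata.py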

6. ITOP Dataset

This dataset is used for the depth 3D pose estimation task. Download the ITOP dataset here and extract it under A2J/data. Use the provided script data_preprocess.py to pre-process the raw data (a sketch follows the folder structure below). The folder structure should look like:

./
├── ...
└── A2J/data
    ├── side_train/
    ├── side_test/
    ├── itop_size_mean.npy
    ├── itop_size_std.npy
    ├── bounding_box_depth_train.pkl
    ├── itop_side_bndbox_test.mat
    └── ...
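
The pre-processing step can then be run as sketched below. This is a minimal sketch assuming data_preprocess.py operates on the extracted files above; its location within A2J and its arguments should be verified against the script itself.

cd A2J
# Assumed invocation: inspect data_preprocess.py for its expected arguments
# (e.g., paths to the raw ITOP files) before running.
python data_preprocess.py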

Model Zoo

TBA

HCMoCo Pre-train

Now we can start pre-training. We use Slurm to manage distributed training; you might need to modify the scripts mentioned below to match your own distributed training setup. HCMoCo is developed on top of the CMC repository, and the code for this part is provided under pycontrast.

1. First Stage

For the first stage, we only perform 'Sample-level modality-invariant representation learning' for 100 epochs. We provide training scripts for this stage under pycontrast/scripts/FirstStage. Specifically, we provide scripts for training with 'NTURGBD+MPII' (train_ntumpiirgbd2s_hrnet_w18.sh) and 'NTURGBD+COCO' (train_ntucocorgbd2s_hrnet_w18.sh).

cd pycontrast
sh scripts/FirstStage/train_ntumpiirgbd2s_hrnet_w18.sh

2. Second Stage

For the second stage, all three proposed learning targets in HCMoCo are used to continue training for another 100 epochs. We provide training scripts for this stage under pycontrast/scripts/SecondStage. The script names correspond to those of the first stage; an example is sketched below.
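
For example, to continue the 'NTURGBD+MPII' run, a minimal sketch (assuming the second-stage script mirrors the first-stage naming) is:

cd pycontrast
# Assumed script name, following the naming convention of the first stage.
sh scripts/SecondStage/train_ntumpiirgbd2s_hrnet_w18.sh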

3. Extract pre-trained weights

After the two-stage pre-training, we need to extract the pre-trained weights of the RGB/depth encoders for transferring to downstream tasks. Specifically, please refer to pycontrast/transfer_ckpt.py for extracting the pre-trained weights of the RGB encoder and pycontrast/transfer_ckpt_depth.py for those of the depth encoder; a sketch follows.
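
A minimal sketch of this step; the invocations below are assumptions, so check each script for how it locates the input checkpoint and where it writes the extracted weights.

cd pycontrast
# Assumed zero-argument invocations; the checkpoint paths may instead be
# passed as command-line arguments or edited inside the scripts.
python transfer_ckpt.py        # extract the RGB encoder weights
python transfer_ckpt_depth.py  # extract the depth encoder weights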

Evaluation on Downstream Tasks

1. DensePose Estimation

DensePose estimation is performed on the COCO dataset. Please refer to detectron2 for training and evaluating DensePose estimation. We provide our config files under DensePose-Config for your reference. Fill the config option MODEL.WEIGHTS with the path to the pre-trained weights; a launch sketch follows.
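
A minimal launch sketch, assuming a standard detectron2 checkout with the DensePose project; ${config_file} is a placeholder for one of our configs from DensePose-Config.

cd detectron2/projects/DensePose
# MODEL.WEIGHTS can be set in the config file or overridden on the
# command line, as shown here.
python train_net.py --config-file ${config_file} \
    MODEL.WEIGHTS ${path_to_pretrained_weights}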

2. RGB Human Parsing

RGB human parsing is performed on the Human3.6M dataset. We develop the RGB human parsing task based on the HRNet-Semantic-Segmentation repository and include our version in this repository. We provide a config template HRNet-Semantic-Segmentation/experiments/human36m/config-template.yaml. Remember to fill the config option MODEL.PRETRAINED with the path to the pre-trained weights. The training and evaluation commands are provided below.

cd HRNet-Semantic-Segmentation
# Training
python -m torch.distributed.launch \
  --nproc_per_node=2 \
  --master_port=${port} \
  tools/train.py \
      --cfg ${config_file}
# Evaluation
python tools/test.py \
    --cfg ${config_file} \
    TEST.MODEL_FILE ${path_to_trained_model}/best.pth \
    TEST.FLIP_TEST True \
    TEST.NUM_SAMPLES 0

3. Depth Human Parsing

Depth human parsing is performed on our proposed NTURGBD-Parsing-4K dataset. Similarly, the code for depth human parsing is developed based on the HRNet-Semantic-Segmentation repository. We provide a config template HRNet-Semantic-Segmentation/experiments/nturgbd_d/config-template.yaml. Please refer to the 'RGB Human Parsing' section above for detailed usage.

4. Depth 3D Pose Estimation

Depth 3D pose estimation is evaluated on the ITOP dataset. We develop the code based on the A2J repository. Since the original repository does not provide training code, we implemented it ourselves. The training and evaluation commands are provided below.

cd A2J
python main.py \
    --pretrained_pth ${path_to_pretrained_weights} \
    --output ${path_to_the_output_folder}

Experiments on the Versatility of HCMoCo

1. Cross-Modality Supervision

The experiments on the versatility of HCMoCo are evaluated on the NTURGBD-Parsing-4K dataset. For 'RGB->Depth' cross-modality supervision, please refer to pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgb_cmc1_other1.sh. For 'Depth->RGB' cross-modality supervision, please refer to pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_d_cmc1_other1.sh.

cd pycontrast
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgb_cmc1_other1.sh
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_d_cmc1_other1.sh

2. Missing-Modality Inference

Please refer to the provided script pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgbd_cmc1_other1.sh:

cd pycontrast
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgbd_cmc1_other1.sh

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements

This work is supported by NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

We thank the following repositories for their contributions to our implementation: CMC, HRNet-Semantic-Segmentation, SemGCN, PointNet2.PyTorch, and A2J.
