[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Overview

Versatile Multi-Modal Pre-Training for
Human-Centric Perception

Fangzhou Hong1  Liang Pan1  Zhongang Cai1,2,3Ziwei Liu1*
1S-Lab, Nanyang Technological University  2SenseTime Research  3Shanghai AI Laboratory

Accepted to CVPR 2022 (Oral)

This repository contains the official implementation of Versatile Multi-Modal Pre-Training for Human-Centric Perception. For brevity, we name our method HCMoCo.


arXivProject PageDataset

Citation

If you find our work useful for your research, please consider citing the paper:

@article{hong2022hcmoco,
  title={Versatile Multi-Modal Pre-Training for Human-Centric Perception},
  author={Hong, Fangzhou and Pan, Liang and Cai, Zhongang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2203.13815},
  year={2022}
}

Updates

[03/2022] Code release!

[03/2022] HCMoCo is accepted to CVPR 2022 for Oral presentation 🥳 !

Installation

We recommend using conda to manage the python environment. The commands below are provided for your reference.

git clone [email protected]:hongfz16/HCMoCo.git
cd HCMoCo
conda create -n HCMoCo python=3.6
conda activate HCMoCo
conda install -c pytorch pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1
pip install -r requirements.txt

Other than the above steps, if you want to run the PointNet++ experiments, please remember to compile the pointnet operators.

cd pycontrast/networks/pointnet2
python setup.py install

Dataset Preparation

1. NTU RGB-D Dataset

This dataset is for the pre-train process. Download the 'NTU RGB+D 60' dataset here. Extract the data to pycontrast/data/NTURGBD/NTURGBD. The folder structure should look like:

./
├── ...
└── pycontrast/data/NTURGBD/
    ├──NTURGBD/
        ├── nturgb+d_rgb/
        ├── nturgb+d_depth_masked/
        ├── nturgb+d_skeletons/
        └── ...

Preprocess the raw data using the following two python scripts which could produce calibrated RGB frames in nturgb+d_rgb_warped_correction and extracted skeleton information in nturgb+d_parsed_skeleton.

cd pycontrast/data/NTURGBD
python generate_skeleton_data.py
python preprocess_nturgbd.py

2. NTURGBD-Parsing-4K Dataset

This dataset is for both the pre-train process and depth human parsing task. Follow the instructions here for the preparation of NTURGBD-Parsing-4K dataset.

3. MPII Human Pose Dataset

This dataset is for the pre-train process. Download the 'MPII Human Pose Dataset' here. Extract them to pycontrast/data/mpii. The folder structure should look like:

./
├── ...
└── pycontrast/data/mpii
    ├── annot/
    └── images/

4. COCO Keypoint Detection Dataset

This dataset is for both the pre-train process and DensePose estimation. Download the COCO 2014 train/val images/annotations here. Extract them to pycontrast/data/coco. The folder structure should look like:

./
├── ...
└── pycontrast/data/coco
    ├── annotations/
        └── *.json
    └── images/
        ├── train2014/
            └── *.jpg
        └── val2014/
            └── *.jpg

5. Human3.6M Dataset

This dataset is for the RGB human parsing task. Download the Human3.6M dataset here and extract under HRNet-Semantic-Segmentation/data/human3.6m. Use the provided script mp_parsedata.py for the pre-processing of the raw data. The folder structure should look like:

./
├── ...
└── HRNet-Semantic-Segmentation/data/human3.6m
    ├── protocol_1/
        ├── rgb
        └── seg
    ├── flist_2hz_train.txt
    ├── flist_2hz_eval.txt
    └── ...

6. ITOP Dataset

This dataset is for the depth 3D pose estimation. Download the ITOP dataset here and extract under A2J/data. Use the provided script data_preprocess.py for the pre-processing of the raw data. The folder structure should look like:

./
├── ...
└── A2J/data
    ├── side_train/
    ├── side_test/
    ├── itop_size_mean.npy
    ├── itop_size_std.npy
    ├── bounding_box_depth_train.pkl
    ├── itop_side_bndbox_test.mat
    └── ...

Model Zoo

TBA

HCMoCo Pre-train

Finally, let's start the pre-training process. We use slurm to manage the distributed training. You might need to modify the below mentioned scripts according to your own distributed training method. We develop HCMoCo based on the CMC repository. The codes for this part are provided under pycontrast.

1. First Stage

For the first stage, we only perform 'Sample-level modality-invariant representation learning' for 100 epoch. We provide training scripts for this stage under pycontrast/scripts/FirstStage. Specifically, we provide the scripts for training with 'NTURGBD+MPII': train_ntumpiirgbd2s_hrnet_w18.sh and 'NTURGBD+COCO': train_ntucocorgbd2s_hrnet_w18.sh.

cd pycontrast
sh scripts/FirstStage/train_ntumpiirgbd2s_hrnet_w18.sh

2. Second Stage

For the second stage, all three proposed learning targets in HCMoCo are used to continue training for another 100 epoch. We provide training scripts for this stage under pycontrast/scripts/SecondStage. The naming of scripts are corresponding to that of the first stage.

3. Extract pre-trained weights

After the two-stage pre-training, we need to extract pre-trained weights of RGB/depth encoders for transfering to downstream tasks. Specifically, please refer to pycontrast/transfer_ckpt.py for extracting pre-trained weights of the RGB encoder and pycontrast/transfer_ckpt_depth.py for that of the depth encoder.

Evaluation on Downstream Tasks

1. DensePose Estimation

The DensePose estimation is performed on COCO dataset. Please refer to detectron2 for the training and evaluation of DensePose estimation. We provide our config files under DensePose-Config for your reference. Fill the config option MODEL.WEIGHTS with the path to the pre-trained weights.

2. RGB Human Parsing

The RGB human parsing is performed on Human3.6M dataset. We develop the RGB human parsing task based on the HRNet-Semantic-Segmentation repository and include the our version in this repository. We provide a config template HRNet-Semantic-Segmentation/experiments/human36m/config-template.yaml. Remember to fill the config option MODEL.PRETRAINED with the path to the pre-trained weights. The training and evaluation commands are provided below.

cd HRNet-Semantic-Segmentation
# Training
python -m torch.distributed.launch \
  --nproc_per_node=2 \
  --master_port=${port} \
  tools/train.py \
      --cfg ${config_file}
# Evaluation
python tools/test.py \
    --cfg ${config_file} \
    TEST.MODEL_FILE ${path_to_trained_model}/best.pth \
    TEST.FLIP_TEST True \
    TEST.NUM_SAMPLES 0

3. Depth Human Parsing

The depth human parsing is performed on our proposed NTURGBD-Parsing-4K dataset. Similarly, the code for depth human parsing is developed based on the HRNet-Semantic-Segmentation repository. We provide a config template HRNet-Semantic-Segmentation/experiments/nturgbd_d/config-template.yaml. Please refer to the above 'RGB Human Parsing' section for detailed usages.

4. Depth 3D Pose Estimation

The depth 3D pose estimation is evaluated on ITOP dataset. We develop the codes based on the A2J repository. Since the original repository does not provide the training codes, we implemented it by ourselves. The training and evaluation commands are provided below.

cd A2J
python main.py \
    --pretrained_pth ${path_to_pretrained_weights} \
    --output ${path_to_the_output_folder}

Experiments on the Versatility of HCMoCo

1. Cross-Modality Supervision

The experiments for the versatility of HCMoCo are evaluated on NTURGBD-Parsing-4K datasets. For the 'RGB->Depth' cross-modality supervision, please refer to pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgb_cmc1_other1.sh. For the 'Depth->RGB' cross-modality supervision, please refer to pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_d_cmc1_other1.sh.

cd pycontrast
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgb_cmc1_other1.sh
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_d_cmc1_other1.sh

2. Missing-Modality Inference

Please refer to the provided script pycontrast/scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgbd_cmc1_other1.sh

cd pycontrast
sh scripts/Versatility/train_ntusegrgbd2s_hrnet_w18_sup_rgbd_cmc1_other1.sh

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements

This work is supported by NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

We thank the following repositories for their contributions in our implementation: CMC, HRNet-Semantic-Segmentation, SemGCN, PointNet2.PyTorch, and A2J.

Owner
Fangzhou Hong
Ph.D. Student in [email protected]
Fangzhou Hong
Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

105 Nov 07, 2022
Human4D Dataset tools for processing and visualization

HUMAN4D: A Human-Centric Multimodal Dataset for Motions & Immersive Media HUMAN4D constitutes a large and multimodal 4D dataset that contains a variet

tofis 15 Nov 09, 2022
STEAL - Learning Semantic Boundaries from Noisy Annotations (CVPR 2019)

STEAL This is the official inference code for: Devil Is in the Edges: Learning Semantic Boundaries from Noisy Annotations David Acuna, Amlan Kar, Sanj

469 Dec 26, 2022
Code for "Learning Graph Cellular Automata"

Learning Graph Cellular Automata This code implements the experiments from the NeurIPS 2021 paper: "Learning Graph Cellular Automata" Daniele Grattaro

Daniele Grattarola 37 Oct 26, 2022
Laplace Redux -- Effortless Bayesian Deep Learning

Laplace Redux - Effortless Bayesian Deep Learning This repository contains the code to run the experiments for the paper Laplace Redux - Effortless Ba

Runa Eschenhagen 28 Dec 07, 2022
Physics-informed Neural Operator for Learning Partial Differential Equation

PINO Physics-informed Neural Operator for Learning Partial Differential Equation Abstract: Machine learning methods have recently shown promise in sol

107 Jan 02, 2023
This is the code of "Multi-view Contrastive Graph Clustering" in NeurlPS 2021.

MCGC Description This is the code of "Multi-view Contrastive Graph Clustering" in NeurlPS 2021. Datasets Results ACM DBLP IMDB Amazon photos Amazon co

31 Nov 14, 2022
Bringing Computer Vision and Flutter together , to build an awesome app !!

Bringing Computer Vision and Flutter together , to build an awesome app !! Explore the Directories Flutter · Machine Learning Table of Contents About

Padmanabha Banerjee 14 Apr 07, 2022
Manage the availability of workspaces within Frappe/ ERPNext (sidebar) based on user-roles

Workspace Permissions Manage the availability of workspaces within Frappe/ ERPNext (sidebar) based on user-roles. Features Configure foreach workspace

Patrick.St. 18 Sep 26, 2022
DeiT: Data-efficient Image Transformers

DeiT: Data-efficient Image Transformers This repository contains PyTorch evaluation code, training code and pretrained models for DeiT (Data-Efficient

Facebook Research 3.2k Jan 06, 2023
KoCLIP: Korean port of OpenAI CLIP, in Flax

KoCLIP This repository contains code for KoCLIP, a Korean port of OpenAI's CLIP. This project was conducted as part of Hugging Face's Flax/JAX communi

Jake Tae 100 Jan 02, 2023
Radar-to-Lidar: Heterogeneous Place Recognition via Joint Learning

radar-to-lidar-place-recognition This page is the coder of a pre-print, implemented by PyTorch. If you have some questions on this project, please fee

Huan Yin 37 Oct 09, 2022
Synthetic Scene Text from 3D Engines

Introduction UnrealText is a project that synthesizes scene text images using 3D graphics engine. This repository accompanies our paper: UnrealText: S

Shangbang Long 215 Dec 29, 2022
Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]

Knowledge-enhanced Contrastive Learning (KCL) Molecular Contrastive Learning with Chemical Element Knowledge Graph [ AAAI 2022 ]. We construct a Chemi

Fangyin 58 Dec 26, 2022
PyTorch implementation of "Continual Learning with Deep Generative Replay", NIPS 2017

pytorch-deep-generative-replay PyTorch implementation of Continual Learning with Deep Generative Replay, NIPS 2017 Results Continual Learning on Permu

Junsoo Ha 127 Dec 14, 2022
Introducing neural networks to predict stock prices

IntroNeuralNetworks in Python: A Template Project IntroNeuralNetworks is a project that introduces neural networks and illustrates an example of how o

Vivek Palaniappan 637 Jan 04, 2023
CVPR2021: Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Temporal Context Aggregation Network - Pytorch This repo holds the pytorch-version codes of paper: "Temporal Context Aggregation Network for Temporal

Zhiwu Qing 63 Sep 27, 2022
A texturizer that I just made. Nothing special here.

texturizer This is a little project that I did with an hour's time. It texturizes an image given a image and a texture to texturize it with. There is

1 Nov 11, 2021
Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data

1 Meta-FDMIxup Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data. (ACM MM 2021) paper News! the rep

Fu Yuqian 44 Nov 18, 2022
Official implementation for the paper "Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection"

Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection PyTorch code release of the paper "Attentive Prototypes for Sour

Deepti Hegde 23 Oct 17, 2022