[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Last update: Dec 15, 2022

Related tags

Overview

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Getting Started

Our codes are implemented and tested with python 3.6 and pytorch 1.5.

Install Pytorch following the official guide on Pytorch website.

And install the requirements using virtualenv or conda:

pip install -r requirements.txt

Data Preparation

Refer to data.md for instructions.

Training

Stage 1 training

Generally, you can use the distributed launch script of pytorch to start training.

For example, for a training on 2 nodes, 4 gpus each (2x4=8 gpus total): On node 0, run:

python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage1.yaml

On node 1, run:

python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=1 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage1.yaml

Otherwise, if you are using task scheduling system such as Slurm to submit your training tasks, you can refer to this script to start your training:

# training on 2 nodes, 4 gpus each (2x4=8 gpus total)
sh scripts/run.sh 2 4 configs/config_stage1.yaml

The checkpoint of training will be saved in [results/] by default. You are free to modify it in the config file.

Stage 2 training

Use the last checkpoint of stage 1 to initialize the model and starts training stage 2.

# On Node 0.
python -u -m torch.distributed.launch \
    --nnodes=2 \
    --node_rank=0 \
    --nproc_per_node=4 \
    --master_port=<MASTER_PORT> \
    --master_addr=<MASTER_NODE_ID> \
    --use_env \
    train.py --cfg configs/config_stage2.yaml --pretrained <PATH_TO_CHECKPOINT_FILE>

Similar on node 1.

Evaluation

To evaluate model on 3dpw test set:

python eval.py --cfg <PATH_TO_EXPERIMENT>/config.yaml --checkpoint <PATH_TO_EXPERIMENT>/model_best.pth.tar --eval_set 3dpw

Evaluation metric is Procrustes Aligned Mean Per Joint Position Error (PA-MPJPE) in mm.

Models	PA-MPJPE ↓	MPJPE ↓	PVE ↓	ACCEL ↓
HMR (w/o 3DPW)	81.3	130.0	-	37.4
SPIN (w/o 3DPW)	59.2	96.9	116.4	29.8
MEVA (w/ 3DPW)	54.7	86.9	-	11.6
VIBE (w/o 3DPW)	56.5	93.5	113.4	27.1
VIBE (w/ 3DPW)	51.9	82.9	99.1	23.4
ours (w/o 3DPW)	50.7	88.8	104.5	18.0
ours (w/ 3DPW)	45.7	79.1	92.6	17.6

Citation

@inproceedings{wan2021,
  title={Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation},
  author={Ziniu Wan, Zhengjia Li, Maoqing Tian, Jianbo Liu, Shuai Yi, Hongsheng Li},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}
}

[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Related tags

Overview

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Getting Started

Data Preparation

Training

Stage 1 training

Stage 2 training

Evaluation

Citation

Owner

ZiNiU WaN

Trajectory Variational Autoencder baseline for Multi-Agent Behavior challenge 2022

[CVPR'22] COAP: Learning Compositional Occupancy of People

Python implementation of Project Fluent

Distributed Asynchronous Hyperparameter Optimization better than HyperOpt.

A user-friendly research and development tool built to standardize RL competency assessment for custom agents and environments.

Enhancing Column Generation by a Machine-Learning-BasedPricing Heuristic for Graph Coloring

A Real-World Benchmark for Reinforcement Learning based Recommender System

Assginment for UofT CSC420: Intro to Image Understanding

TransCD: Scene Change Detection via Transformer-based Architecture

This code is a near-infrared spectrum modeling method based on PCA and pls

OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

PerfFuzz: Automatically Generate Pathological Inputs for C/C++ programs

Video Instance Segmentation using Inter-Frame Communication Transformers (NeurIPS 2021)

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Code for CPM-2 Pre-Train

Regression Metrics Calculation Made easy for tensorflow2 and scikit-learn

Topic Modelling for Humans

Lightweight mmm - Lightweight (Bayesian) Media Mix Model

Static Features Classifier - A static features classifier for Point-Could clusters using an Attention-RNN model