Code for ICCV2021 paper SPEC: Seeing People in the Wild with an Estimated Camera

Overview

SPEC: Seeing People in the Wild with an Estimated Camera [ICCV 2021]

Open In Colab report report

SPEC: Seeing People in the Wild with an Estimated Camera,
Muhammed Kocabas, Chun-Hao Paul Huang, Joachim Tesch, Lea Müller, Otmar Hilliges, Michael J. Black,
International Conference on Computer Vision (ICCV), 2021

Features

SPEC is a camera-aware human body pose and shape estimation method. It both predicts the camera parameters and SMPL body model for a given image. CamCalib predicts the camera parameters. SPEC uses these parameters to predict SMPL body model parameters.

This implementation:

  • has the demo code for SPEC and CamCalib implemented in PyTorch.
  • achieves SOTA results in SPEC-SYN and SPEC-MTP datasets.
  • shows how to perform evaluation on SPEC-SYN and SPEC-MTP datasets.

Updates

  • 13/10/2021: Demo and evaluation code is released.

Getting Started

SPEC has been implemented and tested on Ubuntu 18.04 with python >= 3.7. If you don't have a suitable device, try running our Colab demo.

Clone the repo:

git clone https://github.com/mkocabas/SPEC.git

Install the requirements using virtualenv or conda:

# pip
source scripts/install_pip.sh

# conda
source scripts/install_conda.sh

Running the Demo

SPEC

First, you need to download the required data (i.e our trained model and SMPL model parameters). It is approximately 1GB. To do this you can just run:

source scripts/prepare_data.sh

Then, running the demo is as simple as:

python scripts/spec_demo.py \
  --image_folder data/sample_images \
  --output_folder logs/spec/sample_images

Sample demo output:

Here the green line is the horizon obtained using estimated camera parameters. On the right, the ground plane is visualized to show how accurate the global translation is.

CamCalib

If you are only interested in estimating the camera parameters of an image, run the CamCalib demo:

python scripts/camcalib_demo.py \
  --img_folder <input image folder> \
  --out_folder <output folder> \
  --show # visualize the raw network predictions

This script outputs a pickle file which contains the predicted camera parameters for each input image along with an output image which visualizes the camera parameters as a horizon line. Pickle file contains:

'vfov' : vertical field of view in radians
'f_pix': focal length in pixels
'pitch': pitch in radians
'roll' : roll in radians

Google Colab

Training

Training instructions will follow soon.

Datasets

Pano360, SPEC-MTP, and SPEC-SYN are new datasets introduced in our paper. You can download them from the Downloads section of our project page.

For Pano360 dataset, we have released the Flickr image ids which can be used to download images using FlickrAPI. We have provided a download script in this repo. Some of the images will be missing due to users deleting their photos. In this case, you can also use scrape_and_download function provided in the script to find and download more photos.

After downloading the SPEC-SYN, SPEC-MTP, Pano360, and 3DPW datasets, the data folder should look like:

data/
├── body_models
│   └── smpl
├── camcalib
│   └── checkpoints
├── dataset_extras
├── dataset_folders
│   ├── 3dpw
│   ├── pano360
│   ├── spec-mtp
│   └── spec-syn
├── sample_images
└── spec
    └── checkpoints

Evaluation

You can evaluate SPEC on SPEC-SYN, SPEC-MTP, and 3DPW datasets by running:

python scripts/spec_eval.py \
  --cfg data/spec/checkpoints/spec_config.yaml \
  --opts DATASET.VAL_DS spec-syn_spec-mtp_3dpw-test-cam

Running this script should give results reported in this table:

W-MPJPE PA-MPJPE W-PVE
SPEC-MTP 124.3 71.8 147.1
SPEC-SYN 74.9 54.5 90.5
3DPW 106.7 53.3 124.7

Citation

@inproceedings{SPEC:ICCV:2021,
  title = {{SPEC}: Seeing People in the Wild with an Estimated Camera},
  author = {Kocabas, Muhammed and Huang, Chun-Hao P. and Tesch, Joachim and M\"uller, Lea and Hilliges, Otmar and Black, Michael J.},
  booktitle = {Proc. International Conference on Computer Vision (ICCV)},
  pages = {11035--11045},
  month = oct,
  year = {2021},
  doi = {},
  month_numeric = {10}
}

License

This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.

References

We indicate if a function or script is borrowed externally inside each file. Here are some great resources we benefit:

Consider citing these works if you use them in your project.

Contact

For questions, please contact [email protected]

For commercial licensing (and all related questions for business applications), please contact [email protected].

Comments
  • Translation of the camera

    Translation of the camera

    I'd like to know if it's possible to get a translation of the camera. I've found the code just for rotation and focal length estimation.

    On this image https://github.com/mkocabas/SPEC/blob/master/docs/assets/spec_gif.gif the camera is placed in the world space somehow. How did you do this? Thanks.

    opened by Dene33 4
  • Segmentation fault (core dumped)

    Segmentation fault (core dumped)

    Hi~ When I run the command python scripts/spec_demo.py --batch_size 1 --image_folder data/sample_images --output_folder logs/spec/sample_images, I got the error as this: Segmentation fault (core dumped) I have no idea about it, could you help me, please?

    opened by JinkaiZheng 2
  • There is no flickr_photo_ids.npy

    There is no flickr_photo_ids.npy

    Hello author:

    After unzipping the spec-github-data.zip, there is no data/dataset_folders/pano360/flickr_photo_ids.npy. I don't know whether I find the right place, could you help me?

    Best regards.

    opened by songxujay 1
  • video demo

    video demo

    It's a great job. Is the video demo not available now? `def main(args):

    demo_mode = args.mode
    
    if demo_mode == 'video':
        raise NotImplementedError
    elif demo_mode == 'webcam':
        raise NotImplementedError`
    
    opened by FatherPrime 1
  • Fix PARE requirement in requirement.txt

    Fix PARE requirement in requirement.txt

    Thanks for sharing this work! Please fix the requirement in requirement.txt file of PARE. It should be (currently missing "git+" prefix): git+https://github.com/mkocabas/PARE.git

    opened by Omri-CG 1
  • get error horizon line by camcalib_demo.py

    get error horizon line by camcalib_demo.py

    all example_image seem get error result by camcalib_demo.py, look like this: COCO_val2014_000000326555 it seems the predicted horizon line shift by model,how should I solve this problem?

    opened by hhhlllyyy 0
  • feet fit accuracy

    feet fit accuracy

    I've encountered frequent problems that SPEC is having to fit the model to feet correctly:

    image image

    Is this the same problem as mentioned in the VIBE discussion: https://github.com/mkocabas/VIBE/issues/24#issuecomment-596714558 ? Have you tried to include the OpenPose feet keypoint predictions to the loss?

    Thanks for the great work!

    opened by smidm 0
  • Dataset npz interpretation

    Dataset npz interpretation

    Dear Muhammed,

    Thank you for this work and the datasets! I'm trying to interpret the data in the SPEC-SYN npz data files, but I'm not sure what each key means. Is there documentation? They are the following: ['imgname', 'center', 'scale', 'pose', 'shape', 'part', 'mmpose_keypoints', 'openpose', 'openpose_gt', 'S', 'focal_length', 'cam_rotmat', 'cam_trans', 'cam_center', 'cam_pitch', 'cam_roll', 'cam_hfov', 'cam_int', 'camcalib_pitch', 'camcalib_roll', 'camcalib_vfov', 'camcalib_f_pix']

    At this point, I'd just like to plot the 24 SMPL joints on the image. Based on the array shape, I assume 'S' contains the joints. Is cam_rotmat the rotation from world space to camera space? Is the cam_trans the position of the camera or the top right part of the extrinsic matrix? I assume for this plotting exercise I can ignore everything except S, cam_rotmat, cam_trans and cam_int. Still for some reason the points end up at at very wrong places. Maybe 'S' is something else? Or am I using the camera params wrong?

    Thanks! Istvan

    opened by isarandi 5
  • Colab notebook inaccesible

    Colab notebook inaccesible

    Hi,I tried accesing the colab notebook but wasn't able to access it.Would you be able to kindly suggest another way to run inference as I am currently working on a windows based CPU system.

    opened by sparshgarg23 0
  • Error during single dataset evaluation

    Error during single dataset evaluation

    I tried running the evaluation only for the 3DPW dataset with the following command:

    python scripts/spec_eval.py --cfg data/spec/checkpoints/spec_config.yaml --opts DATASET.VAL_DS 3dpw-test-cam 
    

    But it gives the following error:

    TypeError: test_step() missing 1 required positional argument: 'dataloader_nb'
    

    Fixed it by giving a default value for dataloader_nb for the test_step function in trainer.py:

    def test_step(self, batch, batch_nb, dataloader_nb=0):
        return self.validation_step(batch, batch_nb, dataloader_nb)
    

    The error occurs because we don't append dataloader_idx to args in evaluation_loop.py:

    if multiple_test_loaders or multiple_val_loaders:
        args.append(dataloader_idx)
    
    opened by umariqb 1
Owner
Muhammed Kocabas
Muhammed Kocabas
Orchestrating Distributed Materials Acceleration Platform Tutorial

Orchestrating Distributed Materials Acceleration Platform Tutorial This tutorial for orchestrating distributed materials acceleration platform was pre

BIG-MAP 1 Jan 25, 2022
Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022

ATP: AMRize Then Parse! Enhancing AMR Parsing with PseudoAMRs Hi this is the source code of our paper "ATP: AMRize Then Parse! Enhancing AMR Parsing w

Chen Liang 13 Nov 23, 2022
the code for paper "Energy-Based Open-World Uncertainty Modeling for Confidence Calibration"

EOW-Softmax This code is for the paper "Energy-Based Open-World Uncertainty Modeling for Confidence Calibration". Accepted by ICCV21. Usage Commnd exa

Yezhen Wang 36 Dec 02, 2022
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
AISTATS 2019: Confidence-based Graph Convolutional Networks for Semi-Supervised Learning

Confidence-based Graph Convolutional Networks for Semi-Supervised Learning Source code for AISTATS 2019 paper: Confidence-based Graph Convolutional Ne

MALL Lab (IISc) 56 Dec 03, 2022
TVNet: Temporal Voting Network for Action Localization

TVNet: Temporal Voting Network for Action Localization This repo holds the codes of paper: "TVNet: Temporal Voting Network for Action Localization". P

hywang 5 Jul 26, 2022
Synthesizing Long-Term 3D Human Motion and Interaction in 3D in CVPR2021

Long-term-Motion-in-3D-Scenes This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D". Please ch

Jiashun Wang 76 Dec 13, 2022
FairMOT for Multi-Class MOT using YOLOX as Detector

FairMOT-X Project Overview FairMOT-X is a multi-class multi object tracker, which has been tailored for training on the BDD100K MOT Dataset. It makes

Jonathan Tan 33 Dec 28, 2022
Planar Prior Assisted PatchMatch Multi-View Stereo

ACMP [News] The code for ACMH is released!!! [News] The code for ACMM is released!!! About This repository contains the code for the paper Planar Prio

Qingshan Xu 127 Dec 31, 2022
Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J] (IEEE TCYB 2021, PyTorch Code)

Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all view

4 Nov 19, 2022
Code for paper " AdderNet: Do We Really Need Multiplications in Deep Learning?"

AdderNet: Do We Really Need Multiplications in Deep Learning? This code is a demo of CVPR 2020 paper AdderNet: Do We Really Need Multiplications in De

HUAWEI Noah's Ark Lab 915 Jan 01, 2023
Source code for our paper "Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash"

Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash Abstract: Apple recently revealed its deep perceptual hashing system NeuralHash to

<a href=[email protected]"> 11 Dec 03, 2022
QT Py Media Knob using rotary encoder & neopixel ring

QTPy-Knob QT Py USB Media Knob using rotary encoder & neopixel ring The QTPy-Knob features: Media knob for volume up/down/mute with "qtpy-knob.py" Cir

Tod E. Kurt 56 Dec 30, 2022
PyTorch IPFS Dataset

PyTorch IPFS Dataset IPFSDataset(Dataset) See the jupyter notepad to see how it works and how it interacts with a standard pytorch DataLoader You need

Jake Kalstad 2 Apr 13, 2022
Multi-modal Content Creation Model Training Infrastructure including the FACT model (AI Choreographer) implementation.

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021]. Overview This package contains the model implementation and training

Google Research 365 Dec 30, 2022
the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

G2S This is the official code for ICRA 2021 Paper: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation by Hemang

NeurAI 4 Jul 27, 2022
A vision library for performing sliced inference on large images/small objects

SAHI: Slicing Aided Hyper Inference A vision library for performing sliced inference on large images/small objects Overview Object detection and insta

Open Business Software Solutions 2.3k Jan 04, 2023
Architecture Patterns with Python (TDD, DDD, EDM)

architecture-traning Architecture Patterns with Python (TDD, DDD, EDM) Chapter 5. 높은 기어비와 낮은 기어비의 TDD 5.2 도메인 계층 테스트를 서비스 계층으로 옮겨야 하는가? 도메인 계층 테스트 def

minsung sim 2 Mar 04, 2022
Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

FLAME Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation, accepted at the 17th IEEE Internation Co

Neelabh Sinha 19 Dec 17, 2022
The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection Updates | Introduction | Results | Usage | Citation |

33 Jan 05, 2023