Code for Ditto: Building Digital Twins of Articulated Objects from Interaction

Last update: Dec 22, 2022

Overview

Ditto: Building Digital Twins of Articulated Objects from Interaction

Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu

CVPR 2022, Oral

Project | arxiv

News

2022-04-28: We released the data generation code of Ditto here.

Introduction

Ditto (Digital Twins of Articulated Objects) is a model that reconstructs the part-level geometry and articulation model of an articulated object given observations before and after an interaction. Specifically, we use a PointNet++ encoder to encode the input point cloud observations, and fuse the subsampled point features with a simple attention layer. We then use two independent decoders to propagate the fused point features into two sets of dense point features, one for geometry reconstruction and one for articulation estimation. We construct feature grids/planes by projecting and pooling the point features, and query local features from them. Conditioned on the local features, separate decoders predict occupancy, segmentation, and joint parameters with respect to the query points. Finally, we extract the explicit geometry and articulation model from the implicit decoders.
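To make the "project and pool" step more concrete, here is a minimal, dependency-free sketch of building one feature plane and querying it. It is an illustration only, not the code used in this repo: the function names, the grid resolution, and the assumption that coordinates are normalized to [0, 1) are all hypothetical, and the query uses a nearest-cell lookup where a real implementation would more likely interpolate between cells.

```python
from collections import defaultdict


def build_feature_plane(points, features, resolution=4, axes=(0, 1)):
    """Project points onto a 2D plane and mean-pool their features per cell.

    points: list of (x, y, z) tuples, assumed normalized to [0, 1).
    features: list of equal-length feature vectors, one per point.
    axes: which two coordinates span the plane (here x and y).
    Returns a dict mapping (i, j) cell indices to pooled feature vectors.
    """
    sums = {}
    counts = defaultdict(int)
    for point, feat in zip(points, features):
        # Drop the axis not in `axes` and discretize the remaining two.
        cell = tuple(min(int(point[a] * resolution), resolution - 1) for a in axes)
        if cell not in sums:
            sums[cell] = [0.0] * len(feat)
        sums[cell] = [s + v for s, v in zip(sums[cell], feat)]
        counts[cell] += 1
    # Mean pooling: divide each accumulated vector by its point count.
    return {cell: [s / counts[cell] for s in vec] for cell, vec in sums.items()}


def query_feature_plane(plane, query, feature_dim, resolution=4, axes=(0, 1)):
    """Return the pooled feature of the cell containing `query` (zeros if empty)."""
    cell = tuple(min(int(query[a] * resolution), resolution - 1) for a in axes)
    return plane.get(cell, [0.0] * feature_dim)


points = [(0.1, 0.1, 0.5), (0.2, 0.15, 0.9), (0.8, 0.8, 0.1)]
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
plane = build_feature_plane(points, features)
local = query_feature_plane(plane, (0.05, 0.2, 0.3), feature_dim=2)
print(local)
```

The first two points fall into the same cell, so their features are averaged; the query point lands in that cell and receives the pooled vector. A decoder would then take such local features, together with the query coordinates, as input.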

If you find our work useful in your research, please consider citing.

Installation

Create a conda environment and install required packages.

conda env create -f conda_env_gpu.yaml -n Ditto

You can change the PyTorch and CUDA versions in conda_env_gpu.yaml.

Build the ConvONets dependencies by running python scripts/convonet_setup.py build_ext --inplace.
Download the data, then unzip data.zip under the repo's root.

Training

# single GPU
python run.py experiment=Ditto_s2m

# multiple GPUs
python run.py trainer.gpus=4 +trainer.accelerator='ddp' experiment=Ditto_s2m

# multiple GPUs + wandb logging
python run.py trainer.gpus=4 +trainer.accelerator='ddp' logger=wandb logger.wandb.group=s2m experiment=Ditto_s2m
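The commands above pass Hydra-style overrides: a plain `key=value` changes an existing config entry, while a leading `+` adds an entry that is not yet in the config. The sketch below mimics that behavior on a nested dict to show what the multi-GPU command changes. It is not Hydra's parser; `apply_overrides` is a hypothetical helper, the default `trainer.gpus` value of 1 is an assumption, and config-group selection such as `experiment=Ditto_s2m` is deliberately not modeled.

```python
import copy


def apply_overrides(config, overrides):
    """Apply dotted `key=value` overrides to a nested dict and return a copy.

    A leading '+' on the key means "add a new entry"; without it, the key
    must already exist, otherwise a KeyError is raised.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        is_new = key.startswith("+")
        path = key.lstrip("+").split(".")
        # Very simplified value parsing: strip quotes, convert plain integers.
        value = raw.strip("'\"")
        if value.isdigit():
            value = int(value)
        node = result
        for part in path[:-1]:
            node = node.setdefault(part, {}) if is_new else node[part]
        if not is_new and path[-1] not in node:
            raise KeyError(f"{'.'.join(path)} not in config; prefix it with '+' to add it")
        node[path[-1]] = value
    return result


base = {"trainer": {"gpus": 1}}  # assumed default, for illustration only
cfg = apply_overrides(base, ["trainer.gpus=4", "+trainer.accelerator='ddp'"])
print(cfg)
```

This is why `trainer.accelerator` carries a `+` in the commands above while `trainer.gpus` does not: the former is being added to the trainer config, the latter already exists in it.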

Testing

# only a single GPU is supported
python run_test.py experiment=Ditto_s2m trainer.resume_from_checkpoint=/path/to/trained/model/

Demo

Here is a minimal demo that starts from multiview depth maps before and after interaction and ends with a reconstructed digital twin. To run the demo, you need to install this library for visualization.

We provide the posed depth images of a real-world laptop to run the demo. You can download them from here and put them under data. You can also run the demo on your own data, as long as it follows the same format.
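For readers preparing their own data, the sketch below shows one common way to turn a posed depth image into a world-frame point cloud with a pinhole camera model. It is a generic illustration, not the demo's loader: the intrinsics (`fx`, `fy`, `cx`, `cy`), the 4x4 camera-to-world pose layout, and the convention that zero depth means "no measurement" are all assumptions, so check them against the actual format of the provided data.

```python
def backproject_depth(depth, fx, fy, cx, cy, cam_to_world):
    """Back-project a depth image into world-frame 3D points.

    depth: 2D list of depth values indexed as depth[v][u]; values <= 0 are skipped.
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed convention).
    cam_to_world: 4x4 homogeneous transform as a nested list (assumed convention).
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # no valid measurement at this pixel
            # Pixel -> camera frame (homogeneous coordinates).
            cam = ((u - cx) * d / fx, (v - cy) * d / fy, d, 1.0)
            # Camera frame -> world frame using the first three rows of the pose.
            world = tuple(
                sum(cam_to_world[r][c] * cam[c] for c in range(4)) for r in range(3)
            )
            points.append(world)
    return points


depth = [[0.0, 2.0],
         [1.0, 0.0]]
pose = [[1.0, 0.0, 0.0, 1.0],  # identity rotation, translated by +1 along x
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
cloud = backproject_depth(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5, cam_to_world=pose)
print(cloud)
```

Merging the clouds from all views taken before the interaction, and separately all views taken after it, would give the two point cloud observations that the model takes as input.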

Data and pre-trained models

Data: here. Remember to cite Shape2Motion and Abbatematteo et al. as well as Ditto when using these datasets.

Pre-trained models: Shape2Motion dataset, Synthetic dataset.

Useful tips

To enable auto-completion for command-line options, run eval "$(python run.py -sc install=bash)" under the root directory.
To format code automatically before each commit, install the pre-commit hooks with pip install pre-commit; pre-commit install.

Related Repositories

Our code is based on the fantastic Lightning-Hydra-Template.
We use ConvONets as our backbone.

Citing

@inproceedings{jiang2022ditto,
   title={Ditto: Building Digital Twins of Articulated Objects from Interaction},
   author={Jiang, Zhenyu and Hsu, Cheng-Chun and Zhu, Yuke},
   booktitle={arXiv preprint arXiv:2202.08227},
   year={2022}
}

Owner

UT Robot Perception and Learning Lab
