
ManipulaTHOR: A Framework for Visual Object Manipulation

Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

(Oral Presentation at CVPR 2021)

(Project Page)--(Framework)--(Video)--(Slides)

We present ManipulaTHOR, a framework that facilitates visual manipulation of objects using a robotic arm. Our framework is built upon a physics engine and enables realistic interactions with objects while navigating through scenes and performing tasks. Object manipulation is an established research domain within the robotics community and poses several challenges including avoiding collisions, grasping, and long-horizon planning. Our framework focuses primarily on manipulation in visually rich and complex scenes, joint manipulation and navigation planning, and generalization to unseen environments and objects, challenges that are often overlooked. The framework provides a comprehensive suite of sensory information and motor functions enabling the development of robust manipulation agents.

This codebase is built on the AllenAct framework, and the majority of the core training algorithms and pipelines are borrowed from the AllenAct codebase.

Citation

If you find this project useful in your research, please consider citing:

   @inproceedings{ehsani2021manipulathor,
     title={ManipulaTHOR: A Framework for Visual Object Manipulation},
     author={Ehsani, Kiana and Han, Winson and Herrasti, Alvaro and VanderBilt, Eli and Weihs, Luca and Kolve, Eric and Kembhavi, Aniruddha and Mottaghi, Roozbeh},
     booktitle={CVPR},
     year={2021}
   }

Contents

πŸ’» Installation

To begin, clone this repository locally

git clone https://github.com/ehsanik/manipulathor.git

Here's a quick summary of the most important files/directories in this repository:

  • utils/*.py - Helper functions and classes including the visualization helpers.
  • projects/armpointnav_baselines
    • experiments/
      • ithor/armpointnav_*.py - Different baselines introduced in the paper. Each file in this folder corresponds to a row of a table in the paper.
      • *.py - The base configuration files which define experiment setup and hyperparameters for training.
    • models/*.py - A collection of Actor-Critic baseline models.
  • plugins/ithor_arm_plugin/ - A collection of Environments, Task Samplers and Task Definitions
    • ithor_arm_environment.py - The definition of the ManipulaTHOREnvironment that wraps the AI2THOR-based framework introduced in this work and enables an easy-to-use API (see the usage sketch after this list).
    • ithor_arm_constants.py - Constants used to define the task and parameters of the environment. These include the step size taken by the agent, the unique id of the THOR build we use, etc.
    • ithor_arm_sensors.py - Sensors which provide observations to our agents during training. E.g. the RGBSensor obtains RGB images from the environment and returns them for use by the agent.
    • ithor_arm_tasks.py - Definition of the ArmPointNav task, the reward definition and the function for calculating the goal achievement.
    • ithor_arm_task_samplers.py - Definition of the ArmPointNavTaskSampler task sampler. Initializing the sampler, reading the JSON files from the dataset, and randomly choosing a task are all defined in this file.
    • ithor_arm_viz.py - Utility functions for visualizing and logging the outputs of the models.
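
For instance, the ManipulaTHOREnvironment described above might be driven as in the following minimal sketch. The constructor arguments and method names here are assumptions for illustration; see ithor_arm_environment.py for the actual API.

    # Minimal usage sketch (assumed API; see ithor_arm_environment.py).
    from plugins.ithor_arm_plugin.ithor_arm_environment import ManipulaTHOREnvironment

    env = ManipulaTHOREnvironment()        # assumed default construction
    env.reset(scene_name="FloorPlan1")     # assumed reset signature
    env.step(action="MoveAhead")           # AI2-THOR-style action call (assumed)
    rgb = env.current_frame                # assumed accessor for the RGB observation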

You can then install requirements by running

pip install -r requirements.txt

Python 3.6+ 🐍. Each of the actions supports typing within Python.

AI2-THOR <43f62a0> 🧞. To ensure reproducible results, please install this version of AI2-THOR.

πŸ“ ArmPointNav Task Description

ArmPointNav addresses the problem of visual object manipulation: the task is to move an object between two locations in a scene. Operating in visually rich and complex environments, generalizing to unseen environments and objects, avoiding collisions with objects and structures in the scene, and visual planning to reach the destination are among the major challenges of this task. The example illustrates a sequence of actions taken by a virtual robot within the ManipulaTHOR environment to pick up a vase from the shelf and stack it on a plate on the countertop.

πŸ“Š Dataset

To study the task of ArmPointNav, we present the ArmPointNav Dataset (APND). It consists of 30 kitchen scenes in AI2-THOR that include more than 150 object categories (69 interactable object categories) with a variety of shapes, sizes, and textures. We use 12 pickupable categories as our target objects. We use 20 scenes for the training set, and the remaining scenes are evenly split between Val and Test. We train with 6 object categories and use the remaining 6 to test our model in the Novel-Obj setting. For more information on the dataset and how to download it, refer to Dataset Details.
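
As a small illustration of this split: AI2-THOR's 30 kitchen scenes are FloorPlan1 through FloorPlan30; which scene lands in which split below is an assumption for illustration, not the dataset's actual file list.

    # Illustrative train/val/test split over the 30 AI2-THOR kitchens.
    KITCHENS = [f"FloorPlan{i}" for i in range(1, 31)]
    TRAIN_SCENES = KITCHENS[:20]    # 20 scenes for training
    VAL_SCENES = KITCHENS[20:25]    # remaining 10 scenes split evenly...
    TEST_SCENES = KITCHENS[25:30]   # ...between Val and Test
    assert len(VAL_SCENES) == len(TEST_SCENES) == 5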

πŸ–ΌοΈ Sensory Observations

The types of sensors provided for this paper include:

  1. RGB images - having shape 224x224x3 and an FOV of 90 degrees.
  2. Depth maps - having shape 224x224 and an FOV of 90 degrees.
  3. Perfect egomotion - The agent knows the precise location of the object relative to its arm, as well as relative to the goal location.
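
A hypothetical experiment-config fragment mirroring this observation suite is sketched below; the class names and constructor arguments are assumptions, and the real sensor definitions live in plugins/ithor_arm_plugin/ithor_arm_sensors.py.

    # Hypothetical sensor list (assumed class names and arguments).
    from plugins.ithor_arm_plugin.ithor_arm_sensors import (
        RGBSensorThor,                   # assumed name
        DepthSensorThor,                 # assumed name
        RelativeAgentArmToObjectSensor,  # assumed name
        RelativeObjectToGoalSensor,      # assumed name
    )

    SENSORS = [
        RGBSensorThor(height=224, width=224),    # 224x224x3 RGB, 90-degree FOV
        DepthSensorThor(height=224, width=224),  # 224x224 depth, 90-degree FOV
        RelativeAgentArmToObjectSensor(),        # object pose relative to the arm
        RelativeObjectToGoalSensor(),            # object pose relative to the goal
    ]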

πŸƒ Allowed Actions

A total of 13 actions are available to our agents:

  1. Moving the agent
  • MoveAhead - Results in the agent moving ahead by 0.25m if doing so would not result in the agent colliding with something.

  • Rotate [Right/Left] - Results in the agent's body rotating 45 degrees in the desired direction.

  2. Moving the arm
  • Moving the wrist along axis [x, y, z] - Results in the arm moving along an axis (±x, ±y, ±z) by 0.05m.

  • Moving the height of the arm base [Up/Down] - Results in the base of the arm moving along the y-axis by 0.05m.

  3. Abstract Grasp
  • Picks up a target object. Only succeeds if the object is inside the arm's grasper.
  4. Done Action
  • This action finishes an episode. The agent must issue a Done action when it reaches the goal; otherwise the episode is considered a failure.
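
Enumerated as plain strings, the 13-action space above might look as follows; the actual action names are defined in ithor_arm_constants.py and may differ.

    # Sketch of the 13-action discrete space (names are assumptions).
    ACTIONS = (
        "MoveAhead",                         # move the agent ahead by 0.25m
        "RotateRight", "RotateLeft",         # rotate the body 45 degrees
        "MoveArmXP", "MoveArmXM",            # wrist +/- x by 0.05m
        "MoveArmYP", "MoveArmYM",            # wrist +/- y by 0.05m
        "MoveArmZP", "MoveArmZM",            # wrist +/- z by 0.05m
        "MoveArmBaseUp", "MoveArmBaseDown",  # arm base along the y-axis by 0.05m
        "PickUp",                            # abstract grasp
        "Done",                              # ends the episode
    )
    assert len(ACTIONS) == 13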

✨ Defining a New Task

In order to define a new task, redefine the reward, try a new model, or change the environment setup, check out our tutorial on defining a new task here.
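
For example, a custom reward could be sketched by subclassing the ArmPointNav task from ithor_arm_tasks.py and overriding AllenAct's per-step reward hook; the class name and the helper used below are assumptions for illustration, not the repo's exact API.

    # Hedged sketch: override the per-step reward of the ArmPointNav task.
    from plugins.ithor_arm_plugin.ithor_arm_tasks import ArmPointNavTask  # assumed name

    class BonusArmPointNavTask(ArmPointNavTask):
        def judge(self) -> float:
            """Per-step reward (AllenAct Task interface)."""
            reward = super().judge()         # keep the original reward shaping
            if self._took_pickup_action():   # hypothetical helper
                reward += 0.5                # illustrative bonus for a successful grasp
            return reward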

πŸ‹ Training An Agent

You can train a model with a specific experiment setup by running one of the experiments below:

python3 main.py -o experiment_output -s 1 -b projects/armpointnav_baselines/experiments/ithor/ <EXPERIMENT-NAME>

Where <EXPERIMENT-NAME> can be one of the options below:

armpointnav_no_vision -- No Vision Baseline
armpointnav_disjoint_depth -- Disjoint Model Ablation
armpointnav_rgb -- Our RGB Experiment
armpointnav_rgbdepth -- Our RGBD Experiment
armpointnav_depth -- Our Depth Experiment
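
For example, training the depth-only model from the paper would be:

python3 main.py -o experiment_output -s 1 -b projects/armpointnav_baselines/experiments/ithor/ armpointnav_depth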

πŸ’ͺ Evaluating A Pre-Trained Agent

To evaluate a pre-trained model (for example, to reproduce the numbers in the paper), add --mode test -c <WEIGHT-ADDRESS> to the end of the command you ran for training.

In order to reproduce the numbers in the paper, you need to download the pretrained models from here and extract them to pretrained_models. The full list of experiments and their corresponding trained weights can be found here.

python3 main.py -o experiment_output -s 1 -b projects/armpointnav_baselines/experiments/ithor/ <EXPERIMENT-NAME> --mode test -c <WEIGHT-ADDRESS>