ManipulaTHOR, a framework that facilitates visual manipulation of objects using a robotic arm

Overview

ManipulaTHOR: A Framework for Visual Object Manipulation

Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

(Oral Presentation at CVPR 2021)

(Project Page)--(Framework)--(Video)--(Slides)

We present ManipulaTHOR, a framework that facilitates visual manipulation of objects using a robotic arm. Our framework is built upon a physics engine and enables realistic interactions with objects while navigating through scenes and performing tasks. Object manipulation is an established research domain within the robotics community and poses several challenges including avoiding collisions, grasping, and long-horizon planning. Our framework focuses primarily on manipulation in visually rich and complex scenes, joint manipulation and navigation planning, and generalization to unseen environments and objects; challenges that are often overlooked. The framework provides a comprehensive suite of sensory information and motor functions enabling development of robust manipulation agents.

This code base is based on AllenAct framework and the majority of the core training algorithms and pipelines are borrowed from AllenAct code base.

Citation

If you find this project useful in your research, please consider citing:

   @inproceedings{ehsani2021manipulathor,
     title={ManipulaTHOR: A Framework for Visual Object Manipulation},
     author={Ehsani, Kiana and Han, Winson and Herrasti, Alvaro and VanderBilt, Eli and Weihs, Luca and Kolve, Eric and Kembhavi, Aniruddha and Mottaghi, Roozbeh},
     booktitle={CVPR},
     year={2021}
   }

Contents

💻 Installation

To begin, clone this repository locally

git clone https://github.com/ehsanik/manipulathor.git
See here for a summary of the most important files/directories in this repository

Here's a quick summary of the most important files/directories in this repository:

  • utils/*.py - Helper functions and classes including the visualization helpers.
  • projects/armpointnav_baselines
    • experiments/
      • ithor/armpointnav_*.py - Different baselines introduced in the paper. Each files in this folder corresponds to a row of a table in the paper.
      • *.py - The base configuration files which define experiment setup and hyperparameters for training.
    • models/*.py - A collection of Actor-Critic baseline models.
  • plugins/ithor_arm_plugin/ - A collection of Environments, Task Samplers and Task Definitions
    • ithor_arm_environment.py - The definition of the ManipulaTHOREnvironment that wraps the AI2THOR-based framework introduced in this work and enables an easy-to-use API.
    • itho_arm_constants.py - Constants used to define the task and parameters of the environment. These include the step size taken by the agent, the unique id of the the THOR build we use, etc.
    • ithor_arm_sensors.py - Sensors which provide observations to our agents during training. E.g. the RGBSensor obtains RGB images from the environment and returns them for use by the agent.
    • ithor_arm_tasks.py - Definition of the ArmPointNav task, the reward definition and the function for calculating the goal achievement.
    • ithor_arm_task_samplers.py - Definition of the ArmPointNavTaskSampler samplers. Initializing the sampler, reading the json files from the dataset and randomly choosing a task is defined in this file.
    • ithor_arm_viz.py - Utility functions for visualization and logging the outputs of the models.

You can then install requirements by running

pip install -r requirements.txt

Python 3.6+ 🐍 . Each of the actions supports typing within Python.

AI2-THOR <43f62a0> 🧞 . To ensure reproducible results, please install this version of the AI2THOR.

📝 ArmPointNav Task Description

ArmPointNav is the goal of addressing the problem of visual object manipulation, where the task is to move an object between two locations in a scene. Operating in visually rich and complex environments, generalizing to unseen environments and objects, avoiding collisions with objects and structures in the scene, and visual planning to reach the destination are among the major challenges of this task. The example illustrates a sequence of actions taken a by a virtual robot within the ManipulaTHOR environment for picking up a vase from the shelf and stack it on a plate on the countertop.

📊 Dataset

To study the task of ArmPointNav, we present the ArmPointNav Dataset (APND). This consists of 30 kitchen scenes in AI2-THOR that include more than 150 object categories (69 interactable object categories) with a variety of shapes, sizes and textures. We use 12 pickupable categories as our target objects. We use 20 scenes in the training set and the remaining is evenly split into Val and Test. We train with 6 object categories and use the remaining to test our model in a Novel-Obj setting. For more information on dataset, and how to download it refer to Dataset Details.

🖼️ Sensory Observations

The types of sensors provided for this paper include:

  1. RGB images - having shape 224x224x3 and an FOV of 90 degrees.
  2. Depth maps - having shape 224x224 and an FOV of 90 degrees.
  3. Perfect egomotion - We allow for agents to know precisely what the object location is relative to the agent's arm as well as to its goal location.

🏃 Allowed Actions

A total of 13 actions are available to our agents, these include:

  1. Moving the agent
  • MoveAhead - Results in the agent moving ahead by 0.25m if doing so would not result in the agent colliding with something.

  • Rotate [Right/Left] - Results in the agent's body rotating 45 degrees by the desired direction.

  1. Moving the arm
  • Moving the wrist along axis [x, y, z] - Results in the arm moving along an axis (±x,±y, ±z) by 0.05m.

  • Moving the height of the arm base [Up/Down] - Results in the base of the arm moving along y axis by 0.05m.

  1. Abstract Grasp
  • Picks up a target object. Only succeeds if the object is inside the arm grasper.
  1. Done Action
  • This action finishes an episode. The agent must issue a Done action when it reaches the goal otherwise the episode considers as a failure.

Defining a New Task

In order to define a new task, redefine the rewarding, try a new model, or change the enviornment setup, checkout our tutorial on defining a new task here.

🏋 Training An Agent

You can train a model with a specific experiment setup by running one of the experiments below:

python3 main.py -o experiment_output -s 1 -b projects/armpointnav_baselines/experiments/ithor/ <EXPERIMENT-NAME>

Where <EXPERIMENT-NAME> can be one of the options below:

armpointnav_no_vision -- No Vision Baseline
armpointnav_disjoint_depth -- Disjoint Model Ablation
armpointnav_rgb -- Our RGB Experiment
armpointnav_rgbdepth -- Our RGBD Experiment
armpointnav_depth -- Our Depth Experiment

💪 Evaluating A Pre-Trained Agent

To evaluate a pre-trained model, (for example to reproduce the numbers in the paper), you can add --mode test -c <WEIGHT-ADDRESS> to the end of the command you ran for training.

In order to reproduce the numbers in the paper, you need to download the pretrained models from here and extract them to pretrained_models. The full list of experiments and their corresponding trained weights can be found here.

python3 main.py -o experiment_output -s 1 -b projects/armpointnav_baselines/experiments/ithor/ <EXPERIMENT-NAME> --mode test -c <WEIGHT-ADDRESS>
[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation

RTD-Net (ICCV 2021) This repo holds the codes of paper: "Relaxed Transformer Decoders for Direct Action Proposal Generation", accepted in ICCV 2021. N

Multimedia Computing Group, Nanjing University 80 Nov 30, 2022
LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection

LiDAR Distillation Paper | Model LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection Yi Wei, Zibu Wei, Yongming Rao, Jiax

Yi Wei 75 Dec 22, 2022
The reference baseline of final exam for XMU machine learning course

Mini-NICO Baseline The baseline is a reference method for the final exam of machine learning course. Requirements Installation we use /python3.7 /torc

JoaquinChou 3 Dec 29, 2021
A transformer which can randomly augment VOC format dataset (both image and bbox) online.

VocAug It is difficult to find a script which can augment VOC-format dataset, especially the bbox. Or find a script needs complex requirements so it i

Coder.AN 1 Mar 05, 2022
Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

RARE Technologies 13.8k Jan 03, 2023
The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

Simple-DMA a simple Dual Memory Architecture for classifications. based on the paper Dual-Memory Deep Learning Architectures for Lifelong Learning of

1 Jan 27, 2022
Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution [arXiv 2021].

Christoph Reich 122 Dec 12, 2022
ICLR 2021, Fair Mixup: Fairness via Interpolation

Fair Mixup: Fairness via Interpolation Training classifiers under fairness constraints such as group fairness, regularizes the disparities of predicti

Ching-Yao Chuang 49 Nov 22, 2022
A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS).

UniNAS A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS). under development (which happens mostly on our internal Gi

Cognitive Systems Research Group 19 Nov 23, 2022
Evaluation toolkit of the informative tracking benchmark comprising 9 scenarios, 180 diverse videos, and new challenges.

Informative-tracking-benchmark Informative tracking benchmark (ITB) higher diversity. It contains 9 representative scenarios and 180 diverse videos. m

Xin Li 15 Nov 26, 2022
Image Captioning using CNN and Transformers

Image-Captioning Keras/Tensorflow Image Captioning application using CNN and Transformer as encoder/decoder. In particulary, the architecture consists

24 Dec 28, 2022
Full Stack Deep Learning Labs

Full Stack Deep Learning Labs Welcome! Project developed during lab sessions of the Full Stack Deep Learning Bootcamp. We will build a handwriting rec

Full Stack Deep Learning 1.2k Dec 31, 2022
Offical implementation for "Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation".

Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation (NeurIPS 2021) by Qiming Hu, Xiaojie Guo. Dependencies P

Qiming Hu 31 Dec 20, 2022
Resources related to our paper "CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain"

CLIN-X (CLIN-X-ES) & (CLIN-X-EN) This repository holds the companion code for the system reported in the paper: "CLIN-X: pre-trained language models a

Bosch Research 4 Dec 05, 2022
PyTorch implementation of Higher Order Recurrent Space-Time Transformer

Higher Order Recurrent Space-Time Transformer (HORST) This is the official PyTorch implementation of Higher Order Recurrent Space-Time Transformer. Th

13 Oct 18, 2022
Object Detection with YOLOv3

Object Detection with YOLOv3 Bu projede YOLOv3-608 modeli kullanılmıştır. Requirements Python 3.8 OpenCV Numpy Documentation Yolo ile ilgili detaylı b

Ayşe Konuş 0 Mar 27, 2022
Semantic Segmentation in Pytorch. Network include: FCN、FCN_ResNet、SegNet、UNet、BiSeNet、BiSeNetV2、PSPNet、DeepLabv3_plus、 HRNet、DDRNet

🚀 If it helps you, click a star! ⭐ Update log 2020.12.10 Project structure adjustment, the previous code has been deleted, the adjustment will be re-

Deeachain 269 Jan 04, 2023
GrabGpu_py: a scripts for grab gpu when gpu is free

GrabGpu_py a scripts for grab gpu when gpu is free. WaitCondition: gpu_memory

tianyuluan 3 Jun 18, 2022
YOLTv4 builds upon YOLT and SIMRDWN, and updates these frameworks to use the most performant version of YOLO, YOLOv4

YOLTv4 builds upon YOLT and SIMRDWN, and updates these frameworks to use the most performant version of YOLO, YOLOv4. YOLTv4 is designed to detect objects in aerial or satellite imagery in arbitraril

Adam Van Etten 161 Jan 06, 2023
Oriented Response Networks, in CVPR 2017

Oriented Response Networks [Home] [Project] [Paper] [Supp] [Poster] Torch Implementation The torch branch contains: the official torch implementation

ZhouYanzhao 217 Dec 12, 2022