ManipulaTHOR, a framework that facilitates visual manipulation of objects using a robotic arm

Overview

ManipulaTHOR: A Framework for Visual Object Manipulation

Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

(Oral Presentation at CVPR 2021)

(Project Page)--(Framework)--(Video)--(Slides)

We present ManipulaTHOR, a framework that facilitates visual manipulation of objects using a robotic arm. Our framework is built upon a physics engine and enables realistic interactions with objects while navigating through scenes and performing tasks. Object manipulation is an established research domain within the robotics community and poses several challenges including avoiding collisions, grasping, and long-horizon planning. Our framework focuses primarily on manipulation in visually rich and complex scenes, joint manipulation and navigation planning, and generalization to unseen environments and objects; challenges that are often overlooked. The framework provides a comprehensive suite of sensory information and motor functions enabling development of robust manipulation agents.

This code base is based on AllenAct framework and the majority of the core training algorithms and pipelines are borrowed from AllenAct code base.

Citation

If you find this project useful in your research, please consider citing:

   @inproceedings{ehsani2021manipulathor,
     title={ManipulaTHOR: A Framework for Visual Object Manipulation},
     author={Ehsani, Kiana and Han, Winson and Herrasti, Alvaro and VanderBilt, Eli and Weihs, Luca and Kolve, Eric and Kembhavi, Aniruddha and Mottaghi, Roozbeh},
     booktitle={CVPR},
     year={2021}
   }

Contents

💻 Installation

To begin, clone this repository locally

git clone https://github.com/ehsanik/manipulathor.git
See here for a summary of the most important files/directories in this repository

Here's a quick summary of the most important files/directories in this repository:

  • utils/*.py - Helper functions and classes including the visualization helpers.
  • projects/armpointnav_baselines
    • experiments/
      • ithor/armpointnav_*.py - Different baselines introduced in the paper. Each files in this folder corresponds to a row of a table in the paper.
      • *.py - The base configuration files which define experiment setup and hyperparameters for training.
    • models/*.py - A collection of Actor-Critic baseline models.
  • plugins/ithor_arm_plugin/ - A collection of Environments, Task Samplers and Task Definitions
    • ithor_arm_environment.py - The definition of the ManipulaTHOREnvironment that wraps the AI2THOR-based framework introduced in this work and enables an easy-to-use API.
    • itho_arm_constants.py - Constants used to define the task and parameters of the environment. These include the step size taken by the agent, the unique id of the the THOR build we use, etc.
    • ithor_arm_sensors.py - Sensors which provide observations to our agents during training. E.g. the RGBSensor obtains RGB images from the environment and returns them for use by the agent.
    • ithor_arm_tasks.py - Definition of the ArmPointNav task, the reward definition and the function for calculating the goal achievement.
    • ithor_arm_task_samplers.py - Definition of the ArmPointNavTaskSampler samplers. Initializing the sampler, reading the json files from the dataset and randomly choosing a task is defined in this file.
    • ithor_arm_viz.py - Utility functions for visualization and logging the outputs of the models.

You can then install requirements by running

pip install -r requirements.txt

Python 3.6+ 🐍 . Each of the actions supports typing within Python.

AI2-THOR <43f62a0> 🧞 . To ensure reproducible results, please install this version of the AI2THOR.

📝 ArmPointNav Task Description

ArmPointNav is the goal of addressing the problem of visual object manipulation, where the task is to move an object between two locations in a scene. Operating in visually rich and complex environments, generalizing to unseen environments and objects, avoiding collisions with objects and structures in the scene, and visual planning to reach the destination are among the major challenges of this task. The example illustrates a sequence of actions taken a by a virtual robot within the ManipulaTHOR environment for picking up a vase from the shelf and stack it on a plate on the countertop.

📊 Dataset

To study the task of ArmPointNav, we present the ArmPointNav Dataset (APND). This consists of 30 kitchen scenes in AI2-THOR that include more than 150 object categories (69 interactable object categories) with a variety of shapes, sizes and textures. We use 12 pickupable categories as our target objects. We use 20 scenes in the training set and the remaining is evenly split into Val and Test. We train with 6 object categories and use the remaining to test our model in a Novel-Obj setting. For more information on dataset, and how to download it refer to Dataset Details.

🖼️ Sensory Observations

The types of sensors provided for this paper include:

  1. RGB images - having shape 224x224x3 and an FOV of 90 degrees.
  2. Depth maps - having shape 224x224 and an FOV of 90 degrees.
  3. Perfect egomotion - We allow for agents to know precisely what the object location is relative to the agent's arm as well as to its goal location.

🏃 Allowed Actions

A total of 13 actions are available to our agents, these include:

  1. Moving the agent
  • MoveAhead - Results in the agent moving ahead by 0.25m if doing so would not result in the agent colliding with something.

  • Rotate [Right/Left] - Results in the agent's body rotating 45 degrees by the desired direction.

  1. Moving the arm
  • Moving the wrist along axis [x, y, z] - Results in the arm moving along an axis (±x,±y, ±z) by 0.05m.

  • Moving the height of the arm base [Up/Down] - Results in the base of the arm moving along y axis by 0.05m.

  1. Abstract Grasp
  • Picks up a target object. Only succeeds if the object is inside the arm grasper.
  1. Done Action
  • This action finishes an episode. The agent must issue a Done action when it reaches the goal otherwise the episode considers as a failure.

Defining a New Task

In order to define a new task, redefine the rewarding, try a new model, or change the enviornment setup, checkout our tutorial on defining a new task here.

🏋 Training An Agent

You can train a model with a specific experiment setup by running one of the experiments below:

python3 main.py -o experiment_output -s 1 -b projects/armpointnav_baselines/experiments/ithor/ <EXPERIMENT-NAME>

Where <EXPERIMENT-NAME> can be one of the options below:

armpointnav_no_vision -- No Vision Baseline
armpointnav_disjoint_depth -- Disjoint Model Ablation
armpointnav_rgb -- Our RGB Experiment
armpointnav_rgbdepth -- Our RGBD Experiment
armpointnav_depth -- Our Depth Experiment

💪 Evaluating A Pre-Trained Agent

To evaluate a pre-trained model, (for example to reproduce the numbers in the paper), you can add --mode test -c <WEIGHT-ADDRESS> to the end of the command you ran for training.

In order to reproduce the numbers in the paper, you need to download the pretrained models from here and extract them to pretrained_models. The full list of experiments and their corresponding trained weights can be found here.

python3 main.py -o experiment_output -s 1 -b projects/armpointnav_baselines/experiments/ithor/ <EXPERIMENT-NAME> --mode test -c <WEIGHT-ADDRESS>
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

Ajinkya Kulkarni 43 Nov 27, 2022
Code for EMNLP 2021 paper Contrastive Out-of-Distribution Detection for Pretrained Transformers.

Contra-OOD Code for EMNLP 2021 paper Contrastive Out-of-Distribution Detection for Pretrained Transformers. Requirements PyTorch Transformers datasets

Wenxuan Zhou 27 Oct 28, 2022
torchbearer: A model fitting library for PyTorch

Note: We're moving to PyTorch Lightning! Read about the move here. From the end of February, torchbearer will no longer be actively maintained. We'll

632 Dec 13, 2022
A TensorFlow Implementation of "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun.

Adversarial Video Generation This project implements a generative adversarial network to predict future frames of video, as detailed in "Deep Multi-Sc

Matt Cooper 704 Nov 26, 2022
This is an official implementation for "PlaneRecNet".

PlaneRecNet This is an official implementation for PlaneRecNet: A multi-task convolutional neural network provides instance segmentation for piece-wis

yaxu 50 Nov 17, 2022
List of awesome things around semantic segmentation 🎉

Awesome Semantic Segmentation List of awesome things around semantic segmentation 🎉 Semantic segmentation is a computer vision task in which we label

Dam Minh Tien 18 Nov 26, 2022
This is a vision-based 3d model manipulation and control UI

Manipulation of 3D Models Using Hand Gesture This program allows user to manipulation 3D models (.obj format) with their hands. The project support bo

Cortic Technology Corp. 43 Oct 23, 2022
Additional functionality for use with fastai’s medical imaging module

fmi Adding additional functionality to fastai's medical imaging module To learn more about medical imaging using Fastai you can view my blog Install g

14 Oct 31, 2022
[NeurIPS 2021 Spotlight] Code for Learning to Compose Visual Relations

Learning to Compose Visual Relations This is the pytorch codebase for the NeurIPS 2021 Spotlight paper Learning to Compose Visual Relations. Demo Imag

Nan Liu 88 Jan 04, 2023
Competitive Programming Club, Clinify's Official repository for CP problems hosting by club members.

Clinify-CPC_Programs This repository holds the record of the competitive programming club where the competitive coding aspirants are thriving hard and

Clinify Open Sauce 4 Aug 22, 2022
[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

MixFormer The official implementation of the CVPR 2022 paper MixFormer: End-to-End Tracking with Iterative Mixed Attention [Models and Raw results] (G

Multimedia Computing Group, Nanjing University 235 Jan 03, 2023
In this project we investigate the performance of the SetCon model on realistic video footage. Therefore, we implemented the model in PyTorch and tested the model on two example videos.

Contrastive Learning of Object Representations Supervisor: Prof. Dr. Gemma Roig Institutions: Goethe University CVAI - Computational Vision & Artifici

Dirk Neuhäuser 6 Dec 08, 2022
Repository for training material for the 2022 SDSC HPC/CI User Training Course

hpc-training-2022 Repository for training material for the 2022 SDSC HPC/CI Training Series HPC/CI Training Series home https://www.sdsc.edu/event_ite

sdsc-hpc-training-org 21 Jul 27, 2022
Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies

An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks Novel and high-performance medical ima

14 Dec 18, 2022
Hierarchical Metadata-Aware Document Categorization under Weak Supervision (WSDM'21)

Hierarchical Metadata-Aware Document Categorization under Weak Supervision This project provides a weakly supervised framework for hierarchical metada

Yu Zhang 53 Sep 17, 2022
Syllabus del curso IIC2115 - Programación como Herramienta para la Ingeniería 2022/I

IIC2115 - Programación como Herramienta para la Ingeniería Videos y tutoriales Tutorial CMD Tutorial Instalación Python y Jupyter Tutorial de git-GitH

21 Nov 09, 2022
AITUS - An atomatic notr maker for CYTUS

AITUS an automatic note maker for CYTUS. 利用AI根据指定乐曲生成CYTUS游戏谱面。 效果展示:https://www

GradiusTwinbee 6 Feb 24, 2022
A Python wrapper for Google Tesseract

Python Tesseract Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and "read" the text embedded i

Matthias A Lee 4.6k Jan 05, 2023
Official repository for the NeurIPS 2021 paper Get Fooled for the Right Reason: Improving Adversarial Robustness through a Teacher-guided curriculum Learning Approach

Get Fooled for the Right Reason Official repository for the NeurIPS 2021 paper Get Fooled for the Right Reason: Improving Adversarial Robustness throu

Sowrya Gali 1 Apr 25, 2022
a grammar based feedback fuzzer

Nautilus NOTE: THIS IS AN OUTDATE REPOSITORY, THE CURRENT RELEASE IS AVAILABLE HERE. THIS REPO ONLY SERVES AS A REFERENCE FOR THE PAPER Nautilus is a

Chair for Sys­tems Se­cu­ri­ty 158 Dec 28, 2022