RGB-stacking 🛑 🟩 🔷 for robotic manipulation

BLOG | PAPER | VIDEO

Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes,
Alex X. Lee*, Coline Devin*, Yuxiang Zhou*, Thomas Lampe*, Konstantinos Bousmalis*, Jost Tobias Springenberg*, Arunkumar Byravan, Abbas Abdolmaleki, Nimrod Gileadi, David Khosid, Claudio Fantacci, Jose Enrique Chen, Akhil Raju, Rae Jeong, Michael Neunert, Antoine Laurens, Stefano Saliceti, Federico Casarini, Martin Riedmiller, Raia Hadsell, Francesco Nori.
In Conference on Robot Learning (CoRL), 2021.

The RGB environment

This repository contains an implementation of the simulation environment described in the paper "Beyond Pick-and-Place: Tackling robotic stacking of diverse shapes". Note that this is a re-implementation of the environment, written to remove dependencies on internal libraries. As a result, not all of the features described in the paper are available yet. Notably, domain randomization is not included in this release. We also aim to provide reference performance metrics for trained policies on this environment in the near future.

In this environment, the agent controls a robot arm with a parallel gripper above a basket, which contains three objects: one red, one green, and one blue, hence the name RGB. The agent's task is to stack the red object on top of the blue object within 20 seconds, while the green object serves as an obstacle and distraction. The agent controls the robot using a 4D Cartesian controller. The controlled DOFs are x, y, z and rotation around the z axis. The simulation is a MuJoCo environment built using the Modular Manipulation (MoMa) framework.
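To make the control interface described above concrete, here is a minimal sketch of the four controlled DOFs and of the step budget implied by the 20-second limit. The `CartesianCommand` class and the `steps_per_episode` helper are hypothetical names used only for illustration, and the control rate is an assumed parameter that the text above does not specify.

```python
from dataclasses import dataclass

EPISODE_SECONDS = 20.0  # time limit stated in the text above


@dataclass(frozen=True)
class CartesianCommand:
    """Hypothetical container for the four controlled DOFs (not the repo's API)."""
    x: float
    y: float
    z: float
    yaw: float  # rotation around the z axis

    def as_tuple(self):
        # Order follows the description: x, y, z, rotation around z.
        return (self.x, self.y, self.z, self.yaw)


def steps_per_episode(control_hz: float) -> int:
    """Number of control steps in one episode at an ASSUMED control rate."""
    if control_hz <= 0:
        raise ValueError("control_hz must be positive")
    return int(round(EPISODE_SECONDS * control_hz))


if __name__ == "__main__":
    command = CartesianCommand(x=0.1, y=0.0, z=-0.05, yaw=0.2)
    print(command.as_tuple())
    print(steps_per_episode(10.0))  # 10 Hz is an example value, not a documented one
```

The environment defines its own action specification, which may contain entries beyond these four values (for example for the gripper), so treat the sketch as a reading aid rather than as the actual action layout.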

Corresponding method

The RGB-stacking paper "Beyond Pick-and-Place: Tackling robotic stacking of diverse shapes" also contains a description and thorough evaluation of our initial solution to both the 'Skill Mastery' task (training on the 5 designated test triplets and evaluating on them) and the 'Skill Generalization' task (training on triplets of training objects and evaluating on the 5 test triplets). Our approach was to first train a state-based policy in simulation via a standard RL algorithm (we used MPO). We then distilled the state-based policy into a vision-based policy via interactive distillation, using a domain-randomized version of the environment, and deployed the result to the robot via zero-shot sim-to-real transfer. Finally, we improved the policy further via offline RL on data collected by the sim-to-real policy (we used CRR). For details on our method and the results, please consult the paper.

Installing and visualizing the environment

Please ensure that you have a working MuJoCo200 installation and a valid MuJoCo licence.

  1. Clone this repository:

    git clone https://github.com/deepmind/rgb_stacking.git
    cd rgb_stacking
  2. Prepare a Python 3 environment - venv is recommended.

    python3 -m venv rgb_stacking_venv
    source rgb_stacking_venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Run the environment viewer:

    python -m rgb_stacking.main

Steps 2-4 can also be done by running the run.sh script:

./run.sh
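If the viewer fails to start, a quick sanity check of the Python side of the setup can help narrow things down. The sketch below uses only the standard library; the helper names (`in_virtualenv`, `is_installed`, `setup_report`) are illustrative and not part of this repository, and the script does not verify the MuJoCo installation or licence.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_installed(module_name: str) -> bool:
    """True when a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def setup_report() -> dict:
    """Collect the checks that correspond to steps 2-4 above."""
    return {
        "python3": sys.version_info.major == 3,
        "virtualenv": in_virtualenv(),
        "rgb_stacking": is_installed("rgb_stacking"),
    }


if __name__ == "__main__":
    for check, ok in setup_report().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```

Run it from the same directory and with the same interpreter that you use for step 4, since module lookup depends on the current working directory and the active environment.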

Specifying the object triplet

The default environment will load with Triplet 4 (see Sect. 3.2.1 in the paper). If you wish to use a different triplet you can use the following commands:

from rgb_stacking import environment

env = environment.rgb_stacking(object_triplet=NAME_OF_SET)

The possible NAME_OF_SET are:

  • rgb_test_triplet{i} where i is one of 1, 2, 3, 4, 5: Loads test triplet i.
  • rgb_test_random: Randomly loads one of the 5 test triplets.
  • rgb_train_random: Triplet composed of blocks from the training set.
  • rgb_heldout_random: Triplet composed of blocks from the held-out set.

For more information on the blocks and the possible options, please refer to the rgb_objects repository.
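Because the triplet is selected by a plain string, a typo only shows up when the environment is built. The following sketch builds and validates the names listed above before they are passed to `environment.rgb_stacking`. The helper functions are hypothetical and cover only the names mentioned in this README; the rgb_objects repository may define others.

```python
NUM_TEST_TRIPLETS = 5  # the README lists test triplets 1 to 5

# Randomized sets named in the list above.
RANDOM_SETS = ("rgb_test_random", "rgb_train_random", "rgb_heldout_random")


def triplet_name(i: int) -> str:
    """Return the name of test triplet i, following the rgb_test_triplet{i} pattern."""
    if i not in range(1, NUM_TEST_TRIPLETS + 1):
        raise ValueError(f"test triplet index must be in 1..{NUM_TEST_TRIPLETS}, got {i}")
    return f"rgb_test_triplet{i}"


def valid_set_names() -> list:
    """All set names mentioned in this README."""
    fixed = [triplet_name(i) for i in range(1, NUM_TEST_TRIPLETS + 1)]
    return fixed + list(RANDOM_SETS)


def check_set_name(name: str) -> str:
    """Return name unchanged if it is one of the listed sets, else raise."""
    if name not in valid_set_names():
        raise ValueError(f"unknown object set {name!r}; expected one of {valid_set_names()}")
    return name


if __name__ == "__main__":
    print(valid_set_names())
```

A checked name can then be used as `environment.rgb_stacking(object_triplet=check_set_name("rgb_test_triplet2"))`.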

Specifying the observation space

By default, the observations exposed by the environment are only the ones we used for training our state-based agents. To use another set of observations please use the following code snippet:

from rgb_stacking import environment

env = environment.rgb_stacking(
    observations=environment.ObservationSet.CHOSEN_SET)

The possible CHOSEN_SET are:

  • STATE_ONLY: Only the state observations, used for training expert policies from state in simulation (stage 1).
  • VISION_ONLY: Only image observations.
  • ALL: All observations.
  • INTERACTIVE_IMITATION_LEARNING: Pair of image observations and a subset of proprioception observations, used for interactive imitation learning (stage 2).
  • OFFLINE_POLICY_IMPROVEMENT: Pair of image observations and a subset of proprioception observations, used for the one-step offline policy improvement (stage 3).
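The list above ties three of the observation sets to the training stages of the method. A small lookup makes that mapping explicit. `ObservationSetName` below is a local stand-in that only mirrors the names listed above; it is not the library's `environment.ObservationSet`, and `observation_set_for_stage` is a hypothetical helper.

```python
import enum


class ObservationSetName(enum.Enum):
    """Local stand-in mirroring the names listed above (not the repo's enum)."""
    STATE_ONLY = "STATE_ONLY"
    VISION_ONLY = "VISION_ONLY"
    ALL = "ALL"
    INTERACTIVE_IMITATION_LEARNING = "INTERACTIVE_IMITATION_LEARNING"
    OFFLINE_POLICY_IMPROVEMENT = "OFFLINE_POLICY_IMPROVEMENT"


# Stage numbers as given in the list above.
STAGE_TO_SET = {
    1: ObservationSetName.STATE_ONLY,
    2: ObservationSetName.INTERACTIVE_IMITATION_LEARNING,
    3: ObservationSetName.OFFLINE_POLICY_IMPROVEMENT,
}


def observation_set_for_stage(stage: int) -> ObservationSetName:
    """Return the observation set the README associates with a training stage."""
    try:
        return STAGE_TO_SET[stage]
    except KeyError:
        raise ValueError(f"stage must be 1, 2 or 3, got {stage}") from None


if __name__ == "__main__":
    for stage in (1, 2, 3):
        print(stage, observation_set_for_stage(stage).name)
```

With the package installed, the matching member can be looked up by name, for example `getattr(environment.ObservationSet, observation_set_for_stage(2).name)`, and passed as the `observations` argument.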

Real RGB-Stacking Environment: CAD models and assembly instructions

The CAD model of the setup is available on Onshape.

We also provide the following documents for the assembly of the real cell:

  • Assembly instructions for the basket.
  • Assembly instructions for the robot.
  • Assembly instructions for the cell.
  • The bill of materials of all the necessary parts.
  • A wiring diagram of the cell.

The RGB-objects themselves can be 3D-printed using the STLs available in the rgb_objects repository.

Citing

If you use rgb_stacking in your work, please cite the accompanying paper:

@inproceedings{lee2021rgbstacking,
    title={Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes},
    author={Alex X. Lee and
            Coline Devin and
            Yuxiang Zhou and
            Thomas Lampe and
            Konstantinos Bousmalis and
            Jost Tobias Springenberg and
            Arunkumar Byravan and
            Abbas Abdolmaleki and
            Nimrod Gileadi and
            David Khosid and
            Claudio Fantacci and
            Jose Enrique Chen and
            Akhil Raju and
            Rae Jeong and
            Michael Neunert and
            Antoine Laurens and
            Stefano Saliceti and
            Federico Casarini and
            Martin Riedmiller and
            Raia Hadsell and
            Francesco Nori},
    booktitle={Conference on Robot Learning (CoRL)},
    year={2021},
    url={https://openreview.net/forum?id=U0Q8CrtBJxJ}
}