Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Overview

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation

License: MIT PWC

This repository is the pytorch implementation of our paper:

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation
Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira
International Conference on Robotics and Automation (ICRA), 2021

[Project Page] [arXiv] [GitHub]

Installation

Clone the current repository and required submodules:

git clone https://github.com/GT-RIPL/robo-vln
cd robo-vln
  
export robovln_rootdir=$PWD
    
git submodule init 
git submodule update

Habitat and Other Dependencies

Install robo-vln dependencies as follows:

conda create -n habitat python=3.6 cmake=3.14.0
cd $robovln_rootdir
python -m pip install -r requirements.txt

We use modified versions of Habitat-Sim and Habitat-API to support continuous control/action-spaces in Habitat Simulator. The details regarding continuous action spaces and converting discrete VLN dataset into continuous control formulation can be found in our paper. The specific commits of our modified Habitat-Sim and Habitat-API versions are mentioned below.

# installs both habitat-api and habitat_baselines
cd $robovln_rootdir/environments/habitat-lab
python -m pip install -r requirements.txt
python -m pip install -r habitat_baselines/rl/requirements.txt
python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
python setup.py develop --all
	
# Install habitat-sim
cd $robovln_rootdir/environments/habitat-sim
python setup.py install --headless --with-cuda

Data

Similar to Habitat-API, we expect a data folder (or symlink) with a particular structure in the top-level directory of this project.

Matterport3D

We utilize Matterport3D (MP3D) photo-realistic scene reconstructions to train and evaluate our agent. A total of 90 Matterport3D scenes are used for robo-vln. Here is the official Matterport3D Dataset download link and associated instructions: project webpage. To download the scenes needed for robo-vln, run the following commands:

# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/

Extract this data to data/scene_datasets/mp3d such that it has the form data/scene_datasets/mp3d/{scene}/{scene}.glb.

Dataset

The Robo-VLN dataset is a continuous control formualtion of the VLN-CE dataset by Krantz et al ported over from Room-to-Room (R2R) dataset created by Anderson et al. The details regarding converting discrete VLN dataset into continuous control formulation can be found in our paper.

Dataset Path to extract Size
robo_vln_v1.zip data/datasets/robo_vln_v1 76.9 MB

Robo-VLN Dataset

The dataset robo_vln_v1 contains the train, val_seen, and val_unseen splits.

  • train: 7739 episodes
  • val_seen: 570 episodes
  • val_unseen: 1224 episodes

Format of {split}.json.gz

{
    'episodes' = [
        {
            'episode_id': 4991,
            'trajectory_id': 3279,
            'scene_id': 'mp3d/JeFG25nYj2p/JeFG25nYj2p.glb',
            'instruction': {
                'instruction_text': 'Walk past the striped area rug...',
                'instruction_tokens': [2384, 1589, 2202, 2118, 133, 1856, 9]
            },
            'start_position': [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            'start_rotation': [0, 0.3332950713608026, 0, 0.9428225683587541],
            'goals': [
                {
                    'position': [3.360340118408203, 0.09358400106430054, 3.07817006111145], 
                    'radius': 3.0
                }
            ],
            'reference_path': [
                [10.257800102233887, 0.09358400106430054, -2.379739999771118], 
                [9.434900283813477, 0.09358400106430054, -1.3061100244522095]
                ...
                [3.360340118408203, 0.09358400106430054, 3.07817006111145],
            ],
            'info': {'geodesic_distance': 9.65537166595459},
        },
        ...
    ],
    'instruction_vocab': [
        'word_list': [..., 'orchids', 'order', 'orient', ...],
        'word2idx_dict': {
            ...,
            'orchids': 1505,
            'order': 1506,
            'orient': 1507,
            ...
        },
        'itos': [..., 'orchids', 'order', 'orient', ...],
        'stoi': {
            ...,
            'orchids': 1505,
            'order': 1506,
            'orient': 1507,
            ...
        },
        'num_vocab': 2504,
        'UNK_INDEX': 1,
        'PAD_INDEX': 0,
    ]
}
  • Format of {split}_gt.json.gz
{
    '4991': {
        'actions': [
          ...
          [-0.999969482421875, 1.0],
          [-0.9999847412109375, 0.15731772780418396],
          ...
          ],
        'forward_steps': 325,
        'locations': [
            [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            ...
            [-12.644463539123535, 0.1518409252166748, 4.2241311073303220]
        ]
    }
    ...
}

Depth Encoder Weights

Similar to VLN-CE, our learning-based models utilizes a depth encoder pretained on a large-scale point-goal navigation task i.e. DDPPO. We utilize depth pretraining by using the DDPPO features from the ResNet50 from the original paper. The pretrained network can be downloaded here. Extract the contents of ddppo-models.zip to data/ddppo-models/{model}.pth.

Training and reproducing results

We use run.py script to train and evaluate all of our baseline models. Use run.py along with a configuration file and a run type (either train or eval) to train or evaluate:

python run.py --exp-config path/to/config.yaml --run-type {train | eval}

For lists of modifiable configuration options, see the default task config and experiment config files.

Evaluating Models

All models can be evaluated using python run.py --exp-config path/to/config.yaml --run-type eval. The relevant config entries for evaluation are:

EVAL_CKPT_PATH_DIR  # path to a checkpoint or a directory of checkpoints
EVAL.USE_CKPT_CONFIG  # if True, use the config saved in the checkpoint file
EVAL.SPLIT  # which dataset split to evaluate on (typically val_seen or val_unseen)
EVAL.EPISODE_COUNT  # how many episodes to evaluate

If EVAL.EPISODE_COUNT is equal to or greater than the number of episodes in the evaluation dataset, all episodes will be evaluated. If EVAL_CKPT_PATH_DIR is a directory, one checkpoint will be evaluated at a time. If there are no more checkpoints to evaluate, the script will poll the directory every few seconds looking for a new one. Each config file listed in the next section is capable of both training and evaluating the model it is accompanied by.

Off-line Data Buffer

All our models require an off-line data buffer for training. To collect the continuous control dataset for both train and val_seen splits, run the following commands before training (Please note that it would take some time on a single GPU to store data. Please also make sure to dedicate around ~1.5 TB of hard-disk space for data collection):

Collect data buffer for train split:

python run.py --exp-config robo_vln_baselines/config/paper_configs/robovln_data_train.yaml --run-type train

Collect data buffer for val_seen split:

python run.py --exp-config robo_vln_baselines/config/paper_configs/robovln_data_val.yaml --run-type train 

CUDA

We use 2 GPUs to train our Hierarchical Model hierarchical_cma.yaml. To train the hierarchical model, dedicate 2 GPUs for training as follows:

CUDA_VISIBLE_DEVICES=0,1 python run.py --exp-config robo_vln_baselines/config/paper_configs/hierarchical_cma.yaml --run-type train

Models/Results From the Paper

Model val_seen SPL val_unseen SPL Config
Seq2Seq 0.34 0.30 seq2seq_robo.yaml
PM 0.27 0.24 seq2seq_robo_pm.yaml
CMA 0.25 0.25 cma.yaml
HCM (Ours) 0.43 0.40 hierarchical_cma.yaml
Legend
Seq2Seq Sequence-to-Sequence. Please see our paper on modification made to the model to match the continuous action spaces in robo-vln
PM Progress monitor
CMA Cross-Modal Attention model. Please see our paper on modification made to the model to match the continuous action spaces in robo-vln
HCM Hierarchical Cross-Modal Agent Module (The proposed hierarchical VLN model from our paper).

Pretrained Model

We provide pretrained model for our best Hierarchical Cross-Modal Agent (HCM). Pre-trained Model can be downloaded as follows:

Pre-trained Model Size
HCM_Agent.pth 691 MB

Citation

If you find this repository useful, please cite our paper:

@inproceedings{irshad2021hierarchical,
title={Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation},
author={Muhammad Zubair Irshad and Chih-Yao Ma and Zsolt Kira},
booktitle={Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year={2021},
url={https://arxiv.org/abs/2104.10674}
}

Acknowledgments

  • This code is built upon the implementation from VLN-CE
object recognition with machine learning on Respberry pi

Respberrypi_object-recognition object recognition with machine learning on Respberry pi line.py 建立一支與樹梅派連線的 linebot 使用此 linebot 遠端控制樹梅派拍照 config.ini l

1 Dec 11, 2021
A Python Library for Graph Outlier Detection (Anomaly Detection)

PyGOD is a Python library for graph outlier detection (anomaly detection). This exciting yet challenging field has many key applications, e.g., detect

PyGOD Team 757 Jan 04, 2023
Official Repsoitory for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]

Mish: Self Regularized Non-Monotonic Activation Function BMVC 2020 (Official Paper) Notes: (Click to expand) A considerably faster version based on CU

Xa9aX ツ 1.2k Dec 29, 2022
Prototype for Baby Action Detection and Classification

Baby Action Detection Table of Contents About Install Run Predictions Demo About An attempt to harness the power of Deep Learning to come up with a so

Shreyas K 30 Dec 16, 2022
Scheduling BilinearRewards

Scheduling_BilinearRewards Requirement Python 3 =3.5 Structure main.py This file includes the main function. For getting the results in Figure 1, ple

junghun.kim 0 Nov 25, 2021
This repository contains the entire code for our work "Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid Precoding"

Two-Timescale-DNN Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid Precoding This repository contains the entire code for our work

QiyuHu 3 Mar 07, 2022
[NIPS 2021] UOTA: Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration.

UOTA: Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration This repository is the official PyTorch implementation of UOT

6 Jun 29, 2022
This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution Network.

Lite-HRNet: A Lightweight High-Resolution Network Introduction This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution

HRNet 675 Dec 25, 2022
Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈

Rainbow 🌈 An implementation of Rainbow DQN which outperforms the paper's (Hessel et al. 2017) results on 40% of tested games while using 20x less dat

Dominik Schmidt 31 Dec 21, 2022
Implementation of CVPR'2022:Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors

Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors (CVPR 2022) Personal Web Pages | Paper | Project Page This repository contains

151 Dec 26, 2022
Non-Imaging Transient Reconstruction And TEmporal Search (NITRATES)

Non-Imaging Transient Reconstruction And TEmporal Search (NITRATES) This repo contains the full NITRATES pipeline for maximum likelihood-driven discov

13 Nov 08, 2022
G-NIA model from "Single Node Injection Attack against Graph Neural Networks" (CIKM 2021)

Single Node Injection Attack against Graph Neural Networks This repository is our Pytorch implementation of our paper: Single Node Injection Attack ag

Shuchang Tao 18 Nov 21, 2022
Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting

StochFuzz: A New Solution for Binary-only Fuzzing StochFuzz is a (probabilistically) sound and cost-effective fuzzing technique for stripped binaries.

Zhuo Zhang 164 Dec 05, 2022
Tensorflow implementation of Character-Aware Neural Language Models.

Character-Aware Neural Language Models Tensorflow implementation of Character-Aware Neural Language Models. The original code of author can be found h

Taehoon Kim 751 Dec 26, 2022
When BERT Plays the Lottery, All Tickets Are Winning

When BERT Plays the Lottery, All Tickets Are Winning Large Transformer-based models were shown to be reducible to a smaller number of self-attention h

Sai 16 Nov 10, 2022
Randstad Artificial Intelligence Challenge (powered by VGEN). Soluzione proposta da Stefano Fiorucci (anakin87) - primo classificato

Randstad Artificial Intelligence Challenge (powered by VGEN) Soluzione proposta da Stefano Fiorucci (anakin87) - primo classificato Struttura director

Stefano Fiorucci 1 Nov 13, 2021
[CVPR2021] Invertible Image Signal Processing

Invertible Image Signal Processing This repository includes official codes for "Invertible Image Signal Processing (CVPR2021)". Figure: Our framework

Yazhou XING 281 Dec 31, 2022
Binary Stochastic Neurons in PyTorch

Binary Stochastic Neurons in PyTorch http://r2rt.com/binary-stochastic-neurons-in-tensorflow.html https://github.com/pytorch/examples/tree/master/mnis

Onur Kaplan 54 Nov 21, 2022
Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

Fully Adversarial Mosaics (FAMOS) Pytorch implementation of the paper "Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Imag

Zalando Research 120 Dec 24, 2022