Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Related tags

Text Data & NLPembert
Overview

EmBERT: A Transformer Model for Embodied, Language-guided Visual Task Completion

We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion. Additionally, we bridge the gap between successful object-centric navigation models used for non-interactive agents and the language-guided visual task completion benchmark, ALFRED, by introducing object navigation targets for EmBERT training. We achieve competitive performance on the ALFRED benchmark, and EmBERT marks the first transformer-based model to successfully handle the long-horizon, dense, multi-modal histories of ALFRED, and the first ALFRED model to utilize object-centric navigation targets.

In this repository, we provide the entire codebase which is used for training and evaluating EmBERT performance on the ALFRED dataset. It's mostly based on AllenNLP and PyTorch-Lightning therefore it's inherently easily to extend.

Setup

We used Anaconda for our experiments. Please create an anaconda environment and then install the project dependencies with the following command:

pip install -r requirements.txt

As next step, we will download the ALFRED data using the script scripts/download_alfred_data.sh as follows:

sh scripts/donwload_alfred_data.sh json_feat

Before doing so, make sure that you have installed p7zip because is used to extract the trajectory files.

MaskRCNN fine-tuning

We provide the code to fine-tune a MaskRCNN model on the ALFRED dataset. To create the vision dataset, use the script scripts/generate_vision_dataset.sh. This will create the dataset splits required by the training process. After this, it's possible to run the model fine-tuning using:

PYTHONPATH=. python vision/finetune.py --batch_size 8 --gradient_clip_val 5 --lr 3e-4 --gpus 1 --accumulate_grad_batches 2 --num_workers 4 --save_dir storage/models/vision/maskrcnn_bs_16_lr_3e-4_epochs_46_7k_batches --max_epochs 46 --limit_train_batches 7000

We provide this code for reference however in our experiments we used the MaskRCNN model from MOCA which applies more sophisticated data augmentation techniques to improve performance on the ALFRED dataset.

ALFRED Visual Features extraction

MaskRCNN

The visual feature extraction script is responsible for generating the MaskRCNN features as well as orientation information for every bounding box. For the MaskrCNN model, we use the pretrained model from MOCA. You can download it from their GitHub page. First, we create the directory structure and then download the model weights:

mkdir -p storage/models/vision/moca_maskrcnn;
wget https://alfred-colorswap.s3.us-east-2.amazonaws.com/weight_maskrcnn.pt -O storage/models/vision/moca_maskrcnn/weight_maskrcnn.pt; 

We extract visual features for training trajectories using the following command:

sh scripts/generate_moca_maskrcnn.sh

You can refer to the actual extraction script scripts/generate_maskrcnn_horizon0.py for additional parameters. We executed this command on an p3.2xlarge instance with NVIDIA V100. This command will populate the directory storage/data/alfred/json_feat_2.1.0/ with the visual features for each trajectory step. In particular, the parameter --features_folder will specify the subdirectory (for each trajectory) that will contain the compressed NumPy files constituting the features. Each NumPy file has the following structure:

dict(
    box_features=np.array,
    roi_angles=np.array,
    boxes=np.array,
    masks=np.array,
    class_probs=np.array,
    class_labels=np.array,
    num_objects=int,
    pano_id=int
)

Data-augmentation procedure

In our paper, we describe a procedure to augment the ALFREd trajectories with object and corresponding receptacle information. In particular, we reply the trajectories and we make sure to track object and its receptacle during a subgoal. The data augmentation script will create a new trajectory file called ref_traj_data.json that mimics the same data structure of the original ALFRED dataset but adds to it a few fields for each action.

To start generating the refined data, use the following script:

PYTHONPATH=. python scripts/generate_landmarks.py 

EmBERT Training

Vocabulary creation

We use AllenNLP for training our models. Before starting the training we will generate the vocabulary for the model using the following command:

allennlp build-vocab training_configs/embert/embert_oscar.jsonnet storage/models/embert/vocab.tar.gz --include-package grolp

Training

First, we need to download the OSCAR checkpoint before starting the training process. We used a version of OSCAR which doesn't use object labels which can be freely downloaded following the instruction on GitHub. Make sure to download this file in the folder storage/models/pretrained using the following commands:

mkdir -p storage/models/pretrained/;
wget https://biglmdiag.blob.core.windows.net/oscar/pretrained_models/base-no-labels.zip -O storage/models/pretrained/oscar.zip;
unzip storage/models/pretrained/oscar.zip -d storage/models/pretrained/;
mv storage/models/pretrained/base-no-labels/ep_67_588997/pytorch_model.bin storage/models/pretrained/oscar-base-no-labels.bin;
rm storage/models/pretrained/oscar.zip;

A new model can be trained using the following command:

allennlp train training_configs/embert/embert_widest.jsonnet -s storage/models/alfred/embert --include-package grolp

When training for the first time, make sure to add to the previous command the following parameters: --preprocess --num_workers 4. This will make sure that the dataset is preprocessed and cached in order to speedup training. We run training using AWS EC2 instances p3.8xlarge with 16 workers on a single GPU per configuration.

The configuration file training_configs/embert/embert_widest.jsonnet contains all the parameters that you might be interested in if you want to change the way the model works or any reference to the actual features files. If you're interested in how to change the model itself, please refer to the model definition. The parameters in the constructor of the class will reflect the ones reported in the configuration file. In general, this project has been developed by using AllenNLP has a reference framework. We refer the reader to the official AllenNLP documentation for more details about how to structure a project.

EmBERT evaluation

We modified the original ALFRED evaluation script to make sure that the results are completely reproducible. Refer to the original repository for more information.

To run the evaluation on the valid_seen and valid_unseen you can use the provided script scripts/run_eval.sh in order to evaluate your model. The EmBERT trainer has different ways of saving checkpoints. At the end of the training, it will automatically save the best model in an archive named model.tar.gz in the destination folder (the one specified with -s). To evaluate it run the following command:

sh scripts/run_eval.sh <your_model_path>/model.tar.gz 

It's also possible to run the evaluation of a specific checkpoint. This can be done by running the previous command as follows:

sh scripts/run_eval.sh <your_model_path>/model-epoch=6.ckpt

In this way the evaluation script will load the checkpoint at epoch 6 in the path . When specifying a checkpoint directly, make sure that the folder contains both config.json file and vocabulary directory because they are required by the script to load all the correct model parameters.

Citation

If you're using this codebase please cite our work:

@article{suglia:embert,
  title={Embodied {BERT}: A Transformer Model for Embodied, Language-guided Visual Task Completion},
  author={Alessandro Suglia and Qiaozi Gao and Jesse Thomason and Govind Thattai and Gaurav Sukhatme},
  journal={arXiv},
  year={2021},
  url={https://arxiv.org/abs/2108.04927}
}
Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Welcome to Healthsea ✨ Create better access to health with spaCy. Healthsea is a pipeline for analyzing user reviews to supplement products by extract

Explosion 75 Dec 19, 2022
✨Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects.

✨A Python framework to explore, label, and monitor data for NLP projects

Recognai 1.5k Jan 02, 2023
Finally decent dictionaries based on Wiktionary for your beloved eBook reader.

eBook Reader Dictionaries Finally, decent dictionaries based on Wiktionary for your beloved eBook reader. Dictionaries Catalan 🚧 Ελληνικά (help welco

Mickaël Schoentgen 163 Dec 31, 2022
DaCy: The State of the Art Danish NLP pipeline using SpaCy

DaCy: A SpaCy NLP Pipeline for Danish DaCy is a Danish preprocessing pipeline trained in SpaCy. At the time of writing it has achieved State-of-the-Ar

Kenneth Enevoldsen 71 Jan 06, 2023
FewCLUE: 为中文NLP定制的小样本学习测评基准

FewCLUE: 为中文NLP定制的小样本学习测评基准

CLUE benchmark 387 Jan 04, 2023
Unsupervised Language Modeling at scale for robust sentiment classification

** DEPRECATED ** This repo has been deprecated. Please visit Megatron-LM for our up to date Large-scale unsupervised pretraining and finetuning code.

NVIDIA Corporation 1k Nov 17, 2022
Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline

Twitter-News-Summarizer Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline 1.) Extracts all tweets fr

Rohit Govindan 1 Jan 27, 2022
StarGAN - Official PyTorch Implementation

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

Yunjey Choi 5.1k Dec 30, 2022
Using context-free grammar formalism to parse English sentences to determine their structure to help computer to better understand the meaning of the sentence.

Sentance Parser Executing the Program Make sure Python 3.6+ is installed. Install requirements $ pip install requirements.txt Run the program:

Vaibhaw 12 Sep 28, 2022
Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

CIRPLANT This repository contains the code and pre-trained models for Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) For d

Zheyuan (David) Liu 29 Nov 17, 2022
TFIDF-based QA system for AIO2 competition

AIO2 TF-IDF Baseline This is a very simple question answering system, which is developed as a lightweight baseline for AIO2 competition. In the traini

Masatoshi Suzuki 4 Feb 19, 2022
A fast and easy implementation of Transformer with PyTorch.

FasySeq FasySeq is a shorthand as a Fast and easy sequential modeling toolkit. It aims to provide a seq2seq model to researchers and developers, which

宁羽 7 Jul 18, 2022
The SVO-Probes Dataset for Verb Understanding

The SVO-Probes Dataset for Verb Understanding This repository contains the SVO-Probes benchmark designed to probe for Subject, Verb, and Object unders

DeepMind 20 Nov 30, 2022
Code for paper: An Effective, Robust and Fairness-awareHate Speech Detection Framework

BiQQLSTM_HS Code and data for paper: Title: An Effective, Robust and Fairness-awareHate Speech Detection Framework. Authors: Guanyi Mou and Kyumin Lee

Guanyi Mou 2 Dec 27, 2022
Code for PED: DETR For (Crowd) Pedestrian Detection

Code for PED: DETR For (Crowd) Pedestrian Detection

36 Sep 13, 2022
HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

HuggingSound HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools. I have no intention of building a very complex tool here.

Jonatas Grosman 247 Dec 26, 2022
OpenChat: Opensource chatting framework for generative models

OpenChat is opensource chatting framework for generative models.

Hyunwoong Ko 427 Jan 06, 2023
nlpcommon is a python Open Source Toolkit for text classification.

nlpcommon nlpcommon, Python Text Tool. Guide Feature Install Usage Dataset Contact Cite Reference Feature nlpcommon is a python Open Source

xuming 3 May 29, 2022
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

44 Dec 31, 2022
An official repository for tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a University of Edinburgh master's course.

PMR computer tutorials on HMMs (2021-2022) This is a repository for computer tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a Univer

Vaidotas Šimkus 10 Dec 06, 2022