Learning to Draw: Emergent Communication through Sketching

Overview

Learning to Draw: Emergent Communication through Sketching

This is the official code for the paper "Learning to Draw: Emergent Communication through Sketching".

ArXivPapers With CodeGetting StartedGame setupsModel setupDatasets

About

We demonstrate that it is possible for a communication channel based on line drawing to emerge between agents playing a visual referential communication game. Furthermore we show that with a simple additional self-supervised loss that the drawings the agent produces are interpretable by humans.

Getting started

You'll need to install the required dependencies listed in requirements.txt. This includes installing the differentiable rasteriser from the DifferentiableSketching repository, and the source version of https://github.com/pytorchbearer/torchbearer:

pip install git+https://github.com/jonhare/DifferentiableSketching.git
pip install git+https://github.com/pytorchbearer/torchbearer.git
pip install -r requirements.txt

Once the dependencies are installed, you can run the commgame.py script to train and test models:

python commgame.py train [args]
python commgame.py test [args]

For example, to train a pair of agents on the original game using the STL10 dataset (which will be downloaded if required), you would run:

python commgame.py train --dataset STL10 --output stl10-original-model --sigma2 5e-4 --nlines 20 --learning-rate 0.0001 --imagenet-weights --freeze-vgg --imagenet-norm --epochs 250 --invert --batch-size 100

The options --sigma2 and --nlines control the thickness and number of lines respectively. --imagenet-weights uses the standard pretrained imagenet vgg16 weights (use --sin-weights for stylized imagenet weights). Finally, --freeze-vgg freezes the backbone CNN, --imagenet-norm specifies to apply the imagenet normalisation to images (this should be used when using either imagenet or stylized imagenet weights), and --invert draws black strokes on a white canvas.

The training scripts compute a running communication rate in addition to loss and this is displayed as training progresses. After each epoch a validation pass is performed and images of the sketches and sender inputs and receiver targets are saved to the output directory along with a model snapshot. The output directory also contains a log file with the training and validation statistics per epoch.

Example commands to run the experiments in the paper are given in commands.md

Further details on commandline arguments are given below.

Game setups

All the setups involve a referential game where the reciever tries to select the "correct" image from a pool on the basis of a "sketch" provided by the sender. The primary measure of success is the communication rate. The different command line arguments to control the different game variants are listed in the following subsections:

Havrylov and Titov's Original Game Setup

Sender sees one image; Reciever sees many, where one is exactly the same as sender.

Number of reciever images (target + distractors) is controlled by the batch-size. Number of sender images per iteration can also be controlled for completeness, but defaults to the same as batch size (e.g. each forward pass with a batch plays all possible game combinations using each of the images as a target).

arguments:
--batch-size
[--sender-images-per-iter]

Object-oriented Game Setup (same)

Sender sees one image; Reciever sees many, where one is exactly the same as sender and the others are all of different classes.

arguments:
--object-oriented same
[--num-targets]
[--sender-images-per-iter]

Object-oriented Game Setup (different)

Sender sees one image; Reciever sees many, each of different classes; one of the images is the same class as the sender, but is a completely different image).

arguments:
--object-oriented different 
[--num-targets]
[--sender-images-per-iter]
[--random-transform-sender]

Model setup

Sender

The "sender" consists of a backbone VGG16 CNN which translates the input image into a latent vector and a "decoder" with an MLP that projects the latent representation from the backbone to a set of drawing commands that are differentiably rendered into an image which is sent to the "reciever".

The backbone can optionally be initialised with pretrained weight and also optionally frozen (except for the final linear projection). The backbone, including linear projection can be shared between sender and reciever (default) or separate (--separate_encoders).

arguments:
[--freeze-vgg]
[--imagenet-weights --imagenet-norm] 
[--sin-weights --imagenet-norm] 
[--separate_encoders]

Receiver

The "receiver" consists of a backbone CNN which is used to convert visual inputs (both the images in the pool and the sketch) into a latent vector which is then transformed into a different latent representation by an MLP. These projected latent vectors are used for prediction and in the loss as described below.

The actual backbone CNN model architecture will be the same as the sender's. The backbone can optionally share parameters with the "sender" agent. Alternatively it can be initialised with pre-trained weights, and also optionally frozen.

arguments:
[--freeze-vgg]
[--imagenet-weights --imagenet-norm]
[--separate_encoders]

Datasets

  • MNIST
  • CIFAR-10 / CIFAR-100
  • TinyImageNet
  • CelebA (--image-size to control size; default 64px)
  • STL-10
  • Caltech101 (training data is balanced by supersampling with augmentation)

Datasets will be downloaded to the dataset root directory (default ./data) as required.

arguments: 
--dataset {CIFAR10,CelebA,MNIST,STL10,TinyImageNet,Caltech101}  
[--dataset-root]

Citation

If you find this repository useful for your research, please cite our paper using the following.

  @@inproceedings{
  mihai2021learning,
  title={Learning to Draw: Emergent Communication through Sketching},
  author={Daniela Mihai and Jonathon Hare},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=YIyYkoJX2eA}
  }
Gin provides a lightweight configuration framework for Python

Gin Config Authors: Dan Holtmann-Rice, Sergio Guadarrama, Nathan Silberman Contributors: Oscar Ramirez, Marek Fiser Gin provides a lightweight configu

Google 1.7k Jan 03, 2023
End-to-end image segmentation kit based on PaddlePaddle.

English | 简体中文 PaddleSeg PaddleSeg has released the new version including the following features: Our team won the 6.2k Jan 02, 2023

Blind Video Temporal Consistency via Deep Video Prior

deep-video-prior (DVP) Code for NeurIPS 2020 paper: Blind Video Temporal Consistency via Deep Video Prior PyTorch implementation | paper | project web

Chenyang LEI 272 Dec 21, 2022
Official Repository for "Robust On-Policy Data Collection for Data Efficient Policy Evaluation" (NeurIPS 2021 Workshop on OfflineRL).

Robust On-Policy Data Collection for Data-Efficient Policy Evaluation Source code of Robust On-Policy Data Collection for Data-Efficient Policy Evalua

Autonomous Agents Research Group (University of Edinburgh) 2 Oct 09, 2022
The code of "Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer".

Code data_preprocess.py: preprocess data for Dependent-T5. parameters.py: define parameters of Dependent-T5. train_tools.py: traning and evaluation co

1 Apr 21, 2022
Anomaly detection related books, papers, videos, and toolboxes

Anomaly Detection Learning Resources Outlier Detection (also known as Anomaly Detection) is an exciting yet challenging field, which aims to identify

Yue Zhao 6.7k Dec 31, 2022
The FIRST GANs-based omics-to-omics translation framework

OmiTrans Please also have a look at our multi-omics multi-task DL freamwork 👀 : OmiEmbed The FIRST GANs-based omics-to-omics translation framework Xi

Xiaoyu Zhang 6 Dec 14, 2022
Run Keras models in the browser, with GPU support using WebGL

**This project is no longer active. Please check out TensorFlow.js.** The Keras.js demos still work but is no longer updated. Run Keras models in the

Leon Chen 4.9k Dec 29, 2022
A modular, open and non-proprietary toolkit for core robotic functionalities by harnessing deep learning

A modular, open and non-proprietary toolkit for core robotic functionalities by harnessing deep learning Website • About • Installation • Using OpenDR

OpenDR 304 Dec 28, 2022
Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation

Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation This paper has been accepted and early accessed

Yun Liu 39 Sep 20, 2022
Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

SMU A Tensorflow Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE arXiv https://arxiv.org/abs/211

Fuhang 5 Jan 18, 2022
GPU-Accelerated Deep Learning Library in Python

Hebel GPU-Accelerated Deep Learning Library in Python Hebel is a library for deep learning with neural networks in Python using GPU acceleration with

Hannes Bretschneider 1.2k Dec 21, 2022
Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

advantage-weighted-regression Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, by Peng et al. (

Omar D. Domingues 1 Dec 02, 2021
Bayesian Deep Learning and Deep Reinforcement Learning for Object Shape Error Response and Correction of Manufacturing Systems

Bayesian Deep Learning for Manufacturing 2.0 (dlmfg) Object Shape Error Response (OSER) Digital Lifecycle Management - In Process Quality Improvement

Sumit Sinha 30 Oct 31, 2022
This repository contains code to run experiments in the paper "Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers."

Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers This repository contains code to run experiments in the paper "Signal Stre

0 Jan 19, 2022
Poisson Surface Reconstruction for LiDAR Odometry and Mapping

Poisson Surface Reconstruction for LiDAR Odometry and Mapping Surfels TSDF Our Approach Table: Qualitative comparison between the different mapping te

Photogrammetry & Robotics Bonn 305 Dec 21, 2022
Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Non-Metric Space Library (NMSLIB) Important Notes NMSLIB is generic but fast, see the results of ANN benchmarks. A standalone implementation of our fa

2.9k Jan 04, 2023
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
Real Time Object Detection and Classification using Yolo Algorithm.

Real time Object detection & Classification using YOLO algorithm. Real Time Object Detection and Classification using Yolo Algorithm. What is Object D

Ketan Chawla 1 Apr 17, 2022
A minimalist environment for decision-making in autonomous driving

highway-env A collection of environments for autonomous driving and tactical decision-making tasks An episode of one of the environments available in

Edouard Leurent 1.6k Jan 07, 2023