Text to image synthesis using thought vectors

Overview

This is an experimental TensorFlow implementation of synthesizing images from captions using Skip Thought Vectors. The images are synthesized using the GAN-CLS algorithm from the paper Generative Adversarial Text-to-Image Synthesis. This implementation is built on top of the excellent DCGAN in Tensorflow. The model architecture is shown below; the blue bars represent the Skip Thought Vectors for the captions.

Model architecture

Image source: Generative Adversarial Text-to-Image Synthesis paper
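
To make the conditioning concrete: the caption's (truncated) skip thought vector is compressed to the text-feature dimension through a learned projection and then concatenated with the noise vector to form the generator input. Below is a minimal NumPy shape sketch using the defaults listed under Usage (z_dim=100, t_dim=256, caption_vector_length=1024); the weights and the leaky-ReLU slope are illustrative placeholders, not the trained parameters:

    import numpy as np

    z_dim, t_dim, caption_vector_length = 100, 256, 1024  # defaults from Usage

    # Placeholder projection; in the real model these weights are learned.
    W = 0.02 * np.random.randn(caption_vector_length, t_dim)
    b = np.zeros(t_dim)

    caption = np.random.randn(caption_vector_length)  # truncated skip thought vector
    h = caption @ W + b
    t = np.maximum(0.2 * h, h)                        # leaky ReLU: compressed text feature

    z = np.random.randn(z_dim)                        # noise vector
    gen_input = np.concatenate([z, t])                # shape (356,), input to the DCGAN-style generator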

Requirements

Datasets

  • All the steps below for downloading the datasets and models can be performed automatically by running python download_datasets.py. Several gigabytes of files will be downloaded and extracted.
  • The model is currently trained on the flowers dataset. Download the images from this link and save them in Data/flowers/jpg. Also download the captions from this link. Extract the archive, copy the text_c10 folder and paste it in Data/flowers.
  • Download the pretrained models and vocabulary for skip thought vectors as per the instructions given here. Save the downloaded files in Data/skipthoughts.
  • Create the empty directories Data/samples, Data/val_samples and Data/Models inside Data. They will be used for saving the sampled images and the trained models (a small setup sketch follows this list).
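
A minimal sketch of that setup step (paths taken from the list above; Python 3):

    import os

    # Directories used for sampled images and saved model checkpoints.
    for d in ['Data/samples', 'Data/val_samples', 'Data/Models']:
        os.makedirs(d, exist_ok=True)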

Usage

  • Data Processing: Extract the skip thought vectors for the flowers dataset using:
    python data_loader.py --data_set="flowers"
  • Training

    • Basic usage: python train.py --data_set="flowers"
    • Options
      • z_dim: Noise Dimension. Default is 100.
      • t_dim: Text feature dimension. Default is 256.
      • batch_size: Batch Size. Default is 64.
      • image_size: Image dimension. Default is 64.
      • gf_dim: Number of conv filters in the first layer of the generator. Default is 64.
      • df_dim: Number of conv filters in the first layer of the discriminator. Default is 64.
      • gfc_dim: Dimension of generator units for the fully connected layer. Default is 1024.
      • caption_vector_length: Length of the caption vector. Default is 1024.
      • data_dir: Data Directory. Default is Data/.
      • learning_rate: Learning Rate. Default is 0.0002.
      • beta1: Momentum term for the Adam optimizer. Default is 0.5.
      • epochs: Max number of epochs. Default is 600.
      • resume_model: Resume training from a pretrained model path.
      • data_set: Data Set to train on. Default is flowers.
  • Generating Images from Captions

    • Write the captions in a text file and save it as Data/sample_captions.txt. Generate the skip thought vectors for these captions using:
    python generate_thought_vectors.py --caption_file="Data/sample_captions.txt"
    
    • Generate the Images for the thought vectors using:
    python generate_images.py --model_path=<path to the trained model> --n_images=8
    

    n_images specifies the number of images to be generated per caption. The generated images will be saved in Data/val_samples/. Run python generate_images.py --help for more options. (A sketch of the caption-encoding step follows this section.)
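
For reference, here is a hedged sketch of what the caption-encoding step might look like. It assumes the skip-thoughts reference implementation (the skipthoughts module with load_model and encode) is importable and already pointed at the files in Data/skipthoughts; the output file name is illustrative, not necessarily the one generate_images.py expects:

    import h5py
    import skipthoughts  # skip-thoughts reference implementation (assumed importable)

    # Read one caption per line.
    with open('Data/sample_captions.txt') as f:
        captions = [line.strip() for line in f if line.strip()]

    model = skipthoughts.load_model()               # load the pretrained encoder
    vectors = skipthoughts.encode(model, captions)  # one thought vector per caption

    # Store the vectors for later image generation (illustrative file name).
    with h5py.File('Data/sample_caption_vectors.hdf5', 'w') as f:
        f.create_dataset('vectors', data=vectors)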

Sample Images Generated

Following are images generated by the trained model from the captions listed below.

Captions (the corresponding generated-image grids are omitted here):

  • the flower shown has yellow anther red pistil and bright red petals
  • this flower has petals that are yellow, white and purple and has dark lines
  • the petals on this flower are white with a yellow center
  • this flower has a lot of small round pink petals.
  • this flower is orange in color, and has petals that are ruffled and rounded.
  • the flower has yellow petals and the center of it is brown

Implementation Details

  • Only the uni-skip vectors from the skip thought vectors are used. I have not tried training the model with combine-skip vectors.
  • The model was trained for around 200 epochs on a GPU. This took roughly 2-3 days.
  • The images generated are 64 x 64 in dimension.
  • While processing the batches before training, the images are flipped horizontally with a probability of 0.5.
  • The train-val split is 0.75.
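
For illustration, a small sketch of the two preprocessing choices above, random horizontal flips with probability 0.5 and a 75/25 train-val split; the function names are hypothetical, not the repository's:

    import numpy as np

    def flip_batch(images, rng=np.random):
        # Flip each image in an (N, H, W, C) batch horizontally with probability 0.5.
        mask = rng.rand(len(images)) < 0.5
        images[mask] = images[mask, :, ::-1, :]
        return images

    def train_val_split(items, ratio=0.75, rng=np.random):
        # Shuffle and split the caption-image pairs into train and validation sets.
        idx = rng.permutation(len(items))
        cut = int(ratio * len(items))
        return [items[i] for i in idx[:cut]], [items[i] for i in idx[cut:]]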

Pre-trained Models

  • Download the pretrained model from here and save it in Data/Models. Use this path for generating the images.

TODO

  • Train the model on the MS-COCO data set, and generate more generic images.
  • Try different embedding options for captions (other than skip thought vectors). Also try to train the caption embedding RNN along with the GAN-CLS model.

References

  • Generative Adversarial Text-to-Image Synthesis (Reed et al.)
  • Skip-Thought Vectors (Kiros et al.)
  • DCGAN in Tensorflow

Alternate Implementations

License

MIT

Owner
Paarth Neekhara
PhD student, Computer Science, UCSD