PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Last update: Dec 28, 2022

Overview

Make-A-Scene - PyTorch

Unofficial PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/pdf/2203.13131.pdf)

Figure 1. from paper

Note: this is a work in progress.

Everyone is welcome to contribute. Join the Discord channel: https://discord.gg/hCRMGRZkC6

We would love to open-source a trained model. However, it is a billion-parameter model, and training it requires a lot of compute. If you can provide computational resources, let us know.

Paper Description:

Make-A-Scene modifies the VQGAN framework. It relies heavily on semantic segmentation maps as extra conditioning, which gives more control over the generation process. Moreover, it also conditions on text. The main improvements are the following:

Segmentation condition: a separate VQ-VAE (VQ-SEG) is trained, with the loss modified to a weighted binary cross entropy (3.4)
VQGAN training (VQ-IMG) is extended by Face-Loss & Object-Loss (3.3 & 3.5)
Classifier-free guidance for the autoregressive transformer (3.7)
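
Two of the points above come down to short formulas, so here is a minimal sketch of both, not the code used in this repository. It assumes the weighted binary cross entropy simply scales the per-pixel loss inside a region of interest, and it sketches the guidance step in its classifier-free form, which mixes conditional and unconditional logits. The names `weighted_bce_loss`, `guided_logits`, `face_mask`, `face_weight` and `scale`, as well as the default values, are hypothetical and not taken from the paper or the repository.

```python
import torch
import torch.nn.functional as F


def weighted_bce_loss(logits, target, face_mask, face_weight=2.0):
    """Binary cross entropy with extra weight on selected pixels.

    logits, target: (B, C, H, W) per-channel segmentation logits and 0/1 labels.
    face_mask: (B, 1, H, W), 1 where more emphasis is wanted (assumed input).
    face_weight: assumed emphasis factor, not a value from the paper.
    """
    # Weight is 1 outside the mask and face_weight inside it.
    weight = (1.0 + (face_weight - 1.0) * face_mask).expand_as(logits)
    return F.binary_cross_entropy_with_logits(logits, target, weight=weight)


def guided_logits(cond_logits, uncond_logits, scale=3.0):
    """Classifier-free guidance on next-token logits.

    scale = 0 gives the unconditional logits, scale = 1 the conditional
    ones, and scale > 1 extrapolates further towards the condition.
    """
    return uncond_logits + scale * (cond_logits - uncond_logits)
```

At sampling time, a sketch like this would run the transformer twice per step, once with the text and scene tokens and once with them replaced by padding, and sample the next token from `guided_logits(...)`.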

Training Pipeline

Figure 6. from paper

What needs to be done?

See the individual folders for details.

Citation

@misc{https://doi.org/10.48550/arxiv.2203.13131,
  doi = {10.48550/ARXIV.2203.13131},
  url = {https://arxiv.org/abs/2203.13131},
  author = {Gafni, Oran and Polyak, Adam and Ashual, Oron and Sheynin, Shelly and Parikh, Devi and Taigman, Yaniv},
  title = {Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}


Owner

Casual GAN Papers
