Text-to-Image Translation (DALL-E) for TPU in PyTorch

A refactoring of Taming Transformers and DALLE-pytorch for TPU VMs, built on PyTorch Lightning.

Requirements

pip install -r requirements.txt

Data Preparation

Place any image dataset in an ImageNet-style directory structure (a root folder with at least one subfolder of images) so that it can be loaded with torchvision's ImageFolder.
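As a quick pre-flight check, the sketch below verifies that a directory has the layout described above: a root folder with at least one subfolder that holds image files. The helper names `find_class_folders` and `make_demo_dataset`, the folder names, and the extension list are illustrative assumptions and not part of this repository; ImageFolder performs its own discovery when the dataset is loaded.

```python
import os
import tempfile

# Assumed subset of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".webp")


def find_class_folders(root):
    """Return the sorted subfolders of `root` that contain at least one image file.

    Hypothetical helper for illustration only.
    """
    classes = []
    for entry in sorted(os.listdir(root)):
        folder = os.path.join(root, entry)
        if not os.path.isdir(folder):
            continue  # loose files in the root are not class folders
        # Walk the subfolder so that nested images are counted as well.
        has_image = any(
            name.lower().endswith(IMAGE_EXTENSIONS)
            for _, _, files in os.walk(folder)
            for name in files
        )
        if has_image:
            classes.append(entry)
    return classes


def make_demo_dataset(root):
    """Create a tiny placeholder layout: root/<class>/<image> (empty stand-in files)."""
    for cls, name in [("cats", "0001.jpg"), ("dogs", "0001.png")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        open(os.path.join(root, cls, name), "wb").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_dataset(tmp)
        print(find_class_folders(tmp))
```

An empty result means the directory does not yet match the expected layout, so it is worth fixing before starting a training run.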

Training VQVAEs

You can quickly test train_vae.py with randomly generated fake data:

python train_vae.py --use_tpus --fake_data

For actual training, provide directories for train_dir, val_dir, and log_dir:

python train_vae.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results]

Training DALL-E

python train_dalle.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results] --vae_path [pretrained vae] --bpe_path [pretrained bpe(optional)]
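The two training stages can also be scripted. The sketch below assembles the argument lists for both commands using only the flags shown in this README; the helper name `build_train_command` and every path in the example are hypothetical placeholders. It includes `--bpe_path` only when a value is given, reflecting that the flag is optional.

```python
import shlex


def build_train_command(script, train_dir, val_dir, log_dir,
                        vae_path=None, bpe_path=None):
    """Assemble an argv list for a training script.

    Hypothetical helper; it only uses flags that appear in this README.
    """
    cmd = [
        "python", script, "--use_tpus",
        "--train_dir", train_dir,
        "--val_dir", val_dir,
        "--log_dir", log_dir,
    ]
    if vae_path is not None:
        cmd += ["--vae_path", vae_path]  # pretrained VAE, used by train_dalle.py
    if bpe_path is not None:
        cmd += ["--bpe_path", bpe_path]  # optional pretrained BPE
    return cmd


if __name__ == "__main__":
    # All paths below are placeholders, not locations defined by this repository.
    vae_cmd = build_train_command(
        "train_vae.py", "data/train", "data/val", "logs/vae")
    dalle_cmd = build_train_command(
        "train_dalle.py", "data/train", "data/val", "logs/dalle",
        vae_path="path/to/pretrained_vae")
    for cmd in (vae_cmd, dalle_cmd):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

To launch a stage from Python rather than printing it, the list can be passed to `subprocess.run(cmd, check=True)`.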

TODO

Refactor Encoder and Decoder modules for better readability
Refactor VQVAE2
Add Net2Net Conditional Transformer for conditional image generation
Refactor, optimize, and merge DALL-E with Net2Net Conditional Transformer
Add Guided Diffusion + CLIP for image refinement
Add VAE converter for JAX to support dalle-mini
Add DALL-E colab notebook
Add RBGumbelQuantizer
Add HiT

ON-GOING

Test large dataset loading on TPU Pods
Change current DALL-E code to fully support latest updates from DALLE-pytorch

DONE

BibTeX

@misc{oord2018neural,
      title={Neural Discrete Representation Learning}, 
      author={Aaron van den Oord and Oriol Vinyals and Koray Kavukcuoglu},
      year={2018},
      eprint={1711.00937},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{razavi2019generating,
      title={Generating Diverse High-Fidelity Images with VQ-VAE-2}, 
      author={Ali Razavi and Aaron van den Oord and Oriol Vinyals},
      year={2019},
      eprint={1906.00446},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{ramesh2021zeroshot,
      title={Zero-Shot Text-to-Image Generation},
      author={Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
      year={2021},
      eprint={2102.12092},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Kim, Taehoon
