Refactoring dalle-pytorch and taming-transformers for TPU VM

Overview

Text-to-Image Translation (DALL-E) for TPU in Pytorch

Refactoring Taming Transformers and DALLE-pytorch for TPU VM with Pytorch Lightning

Requirements

pip install -r requirements.txt

Data Preparation

Place any image dataset with ImageNet-style directory structure (at least 1 subfolder) to fit the dataset into pytorch ImageFolder.

Training VQVAEs

You can easily test main.py with randomly generated fake data.

python train_vae.py --use_tpus --fake_data

For actual training provide specific directory for train_dir, val_dir, log_dir:

python train_vae.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results]

Training DALL-E

python train_dalle.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results] --vae_path [pretrained vae] --bpe_path [pretrained bpe(optional)]

TODO

  • Refactor Encoder and Decoder modules for better readability
  • Refactor VQVAE2
  • Add Net2Net Conditional Transformer for conditional image generation
  • Refactor, optimize, and merge DALL-E with Net2Net Conditional Transformer
  • Add Guided Diffusion + CLIP for image refinement
  • Add VAE converter for JAX to support dalle-mini
  • Add DALL-E colab notebook
  • Add RBGumbelQuantizer
  • Add HiT

ON-GOING

  • Test large dataset loading on TPU Pods
  • Change current DALL-E code to fully support latest updates from DALLE-pytorch

DONE

  • Add VQVAE, VQGAN, and Gumbel VQVAE(Discrete VAE), Gumbel VQGAN
  • Add VQVAE2
  • Add EMA update for Vector Quantization
  • Debug VAEs (Single TPU Node, TPU Pods, GPUs)
  • Resolve SIGSEGV issue with large TPU Pods pytorch-xla #3028
  • Add DALL-E
  • Debug DALL-E (Single TPU Node, TPU Pods, GPUs)
  • Add WebDataset support
  • Add VAE Image Logger by modifying pl_bolts TensorboardGenerativeModelImageSampler()
  • Add DALLE Image Logger by modifying pl_bolts TensorboardGenerativeModelImageSampler()
  • Add automatic checkpoint saver and resume for sudden (which happens a lot) TPU restart
  • Reimplement EMA VectorQuantizer with nn.Embedding
  • Add DALL-E colab notebook by afiaka87
  • Add Normed Vector Quantizer by GallagherCommaJack
  • Resolve SIGSEGV issue with large TPU Pods pytorch-xla #3068
  • Debug WebDataset functionality

BibTeX

@misc{oord2018neural,
      title={Neural Discrete Representation Learning}, 
      author={Aaron van den Oord and Oriol Vinyals and Koray Kavukcuoglu},
      year={2018},
      eprint={1711.00937},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
@misc{razavi2019generating,
      title={Generating Diverse High-Fidelity Images with VQ-VAE-2}, 
      author={Ali Razavi and Aaron van den Oord and Oriol Vinyals},
      year={2019},
      eprint={1906.00446},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@misc{ramesh2021zeroshot,
    title   = {Zero-Shot Text-to-Image Generation}, 
    author  = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year    = {2021},
    eprint  = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
Owner
Kim, Taehoon
Research Scientist & Machine Learning Engineer.
Kim, Taehoon
Memory Efficient Attention (O(sqrt(n)) for Jax and PyTorch

Memory Efficient Attention This is unofficial implementation of Self-attention Does Not Need O(n^2) Memory for Jax and PyTorch. Implementation is almo

Amin Rezaei 126 Dec 27, 2022
M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images

M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images This repo is the official implementation of paper "M2MRF: Man

12 Dec 14, 2022
Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F

Junjie Hu 13 Dec 10, 2022
DNA-RECON { Automatic Web Reconnaissance Tool }

ABOUT TOOL : DNA-RECON is an automatic web reconnaissance tool written in python. This tool made for reconnaissance and information gathering with an

NIKUNJ BHATT 25 Aug 11, 2021
A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body

DensePose: Dense Human Pose Estimation In The Wild Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos [densepose.org] [arXiv] [BibTeX] Dense human pos

Meta Research 6.4k Jan 01, 2023
A full pipeline AutoML tool for tabular data

HyperGBM Doc | 中文 We Are Hiring! Dear folks,we are offering challenging opportunities located in Beijing for both professionals and students who are k

DataCanvas 240 Jan 03, 2023
This repository contains the implementation of the following paper: Cross-Descriptor Visual Localization and Mapping

Cross-Descriptor Visual Localization and Mapping This repository contains the implementation of the following paper: "Cross-Descriptor Visual Localiza

Mihai Dusmanu 81 Oct 06, 2022
Elastic weight consolidation technique for incremental learning.

Overcoming-Catastrophic-forgetting-in-Neural-Networks Elastic weight consolidation technique for incremental learning. About Use this API if you dont

Shivam Saboo 89 Dec 22, 2022
Train emoji embeddings based on emoji descriptions.

emoji2vec This is my attempt to train, visualize and evaluate emoji embeddings as presented by Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko

Miruna Pislar 17 Sep 03, 2022
DeepCAD: A Deep Generative Network for Computer-Aided Design Models

DeepCAD This repository provides source code for our paper: DeepCAD: A Deep Generative Network for Computer-Aided Design Models Rundi Wu, Chang Xiao,

Rundi Wu 85 Dec 31, 2022
Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows

Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows This is the official implementation of the ICCV 2021 Paper "Probabilistic Mono

62 Nov 23, 2022
공공장소에서 눈만 돌리면 CCTV가 보인다는 말이 과언이 아닐 정도로 CCTV가 우리 생활에 깊숙이 자리 잡았습니다.

ObsCare_Main 소개 공공장소에서 눈만 돌리면 CCTV가 보인다는 말이 과언이 아닐 정도로 CCTV가 우리 생활에 깊숙이 자리 잡았습니다. CCTV의 대수가 급격히 늘어나면서 관리와 효율성 문제와 더불어, 곳곳에 설치된 CCTV를 개별 관제하는 것으로는 응급 상

5 Jul 07, 2022
Dense Gaussian Processes for Few-Shot Segmentation

DGPNet - Dense Gaussian Processes for Few-Shot Segmentation Welcome to the public repository for DGPNet. The paper is available at arxiv: https://arxi

37 Jan 07, 2023
An Open-Source Package for Information Retrieval.

OpenMatch An Open-Source Package for Information Retrieval. 😃 What's New Top Spot on TREC-COVID Challenge (May 2020, Round2) The twin goals of the ch

THUNLP 439 Dec 27, 2022
Introducing neural networks to predict stock prices

IntroNeuralNetworks in Python: A Template Project IntroNeuralNetworks is a project that introduces neural networks and illustrates an example of how o

Vivek Palaniappan 637 Jan 04, 2023
A pure PyTorch implementation of the loss described in "Online Segment to Segment Neural Transduction"

ssnt-loss ℹ️ This is a WIP project. the implementation is still being tested. A pure PyTorch implementation of the loss described in "Online Segment t

張致強 1 Feb 09, 2022
Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.

CSE-Autoloss Designing proper loss functions for vision tasks has been a long-standing research direction to advance the capability of existing models

Peidong Liu(刘沛东) 54 Dec 17, 2022
EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

EquiBind: geometric deep learning for fast predictions of the 3D structure in which a small molecule binds to a protein

Hannes Stärk 355 Jan 03, 2023
The code of paper 'Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection'

Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection Pytorch implemetation of paper 'Learning to Aggregate and Personalize

Tencent YouTu Research 136 Dec 29, 2022
PyTorch implementation of PSPNet

PSPNet with PyTorch Unofficial implementation of "Pyramid Scene Parsing Network" (https://arxiv.org/abs/1612.01105). This repository is just for caffe

Kazuto Nakashima 52 Nov 16, 2022