Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

Last update: Sep 11, 2022

Overview

VQGAN-CLIP-Docker

About

This is a stripped-down, minimal-dependency repository for running VQGAN+CLIP locally or in production.

For a Google Colab notebook, see the original repository.

Samples

Setup

Clone this repository and cd into it.

git clone https://github.com/kcosta42/VQGAN-CLIP-Docker.git
cd VQGAN-CLIP-Docker

Download a VQGAN model and put it in the ./models folder.

Dataset	Link
ImageNet (f=16), 16384	vqgan_imagenet_f16_16384

To run on a GPU, make sure CUDA is installed on your system (tested with CUDA 11.1+).

6 GB of VRAM is required to generate 256x256 images.
11 GB of VRAM is required to generate 512x512 images.
24 GB of VRAM is required to generate 1024x1024 images. (Untested)
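The VRAM figures above can be turned into a quick pre-flight check. This is a minimal sketch: `max_square_size` and `detect_vram_gb` are hypothetical helpers, not part of this repository, and the only PyTorch calls assumed are `torch.cuda.is_available()` and `torch.cuda.get_device_properties(0).total_memory`.

```python
# Hypothetical pre-flight check; not part of VQGAN-CLIP-Docker.
# The thresholds mirror the VRAM list above (the 1024x1024 entry is untested).
from typing import Optional

# (required VRAM in GB, square side length in pixels), largest first
VRAM_REQUIREMENTS = [(24, 1024), (11, 512), (6, 256)]


def max_square_size(vram_gb: float) -> Optional[int]:
    """Largest listed square size whose VRAM requirement fits, or None."""
    for required_gb, side in VRAM_REQUIREMENTS:
        if vram_gb >= required_gb:
            return side
    return None


def detect_vram_gb() -> Optional[float]:
    """Total memory of CUDA device 0 in GiB, or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    vram = detect_vram_gb()
    if vram is None:
        print("No usable CUDA device found; see the CPU instructions.")
    else:
        side = max_square_size(vram)
        if side is None:
            print(f"{vram:.1f} GB of VRAM is below the 6 GB minimum listed above.")
        else:
            print(f"{vram:.1f} GB of VRAM: up to {side}x{side} should fit.")
```

The memory reported by the driver is usually a little below a card's nominal size, so a "6 GB" card may come out just under the threshold; treat the result as a rough guide rather than a hard limit.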

Local

Install the Python requirements:

python3 -m pip install -r requirements.txt

To check whether you can run this on your GPU, run the following command; it must print True.

python3 -c "import torch; print(torch.cuda.is_available())"

Docker

Make sure Docker and docker-compose are installed. To use your GPU from inside Docker, you also need nvidia-docker.

A Makefile is provided for ease of use.

make build  # Build the docker image

Usage

Two configuration files are provided: ./configs/local.json and ./configs/docker.json. They work out of the box, but you may want to edit them to suit your needs. See the Configuration section for a description of each field.
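Rather than editing the shipped files, you can derive a per-run config from one of them. This is a minimal sketch using only the standard library; `load_config`, `with_overrides`, and `save_config` are hypothetical helpers, and the field names are taken from the Configuration table.

```python
# Hypothetical helpers for deriving a run-specific config from a shipped one.
# Field names (prompts, size, seed, ...) come from the Configuration table;
# output file names are only examples.
import json
from pathlib import Path


def load_config(path):
    """Read a JSON config file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def with_overrides(config, **overrides):
    """Return a shallow copy of config with the given fields replaced."""
    updated = dict(config)
    updated.update(overrides)
    return updated


def save_config(config, path):
    """Write config as indented JSON, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
```

For example, `save_config(with_overrides(load_config("./configs/local.json"), prompts=["a lighthouse at dusk"], seed=42), "./configs/my_run.json")` writes a new file (the name my_run.json is arbitrary) that can then be passed to the generate command with -c.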

The resulting generations can be found in the ./outputs folder.

GPU

To run locally:

python3 -m scripts.generate -c ./configs/local.json

To run with Docker:

make generate

CPU

To run locally:

DEVICE=cpu python3 -m scripts.generate -c ./configs/local.json

To run with Docker:

make generate-cpu
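To generate from several configs in one go, the local command can be wrapped in a small script. This is a sketch under stated assumptions: `build_command`, `build_env`, and `run_all` are hypothetical, it uses the current interpreter (`sys.executable`) in place of `python3`, and it must be run from the repository root so that `-m scripts.generate` resolves.

```python
# Hypothetical batch runner: repeats the local command from this section once
# per config file. Run it from the repository root so `-m scripts.generate`
# resolves. Setting DEVICE=cpu mirrors the CPU instructions above.
import os
import subprocess
import sys


def build_command(config_path, python=sys.executable):
    """argv for one run: <python> -m scripts.generate -c <config>."""
    return [python, "-m", "scripts.generate", "-c", str(config_path)]


def build_env(device=None, base_env=None):
    """Copy of the environment, with DEVICE set when a device is requested."""
    env = dict(os.environ if base_env is None else base_env)
    if device is not None:
        env["DEVICE"] = device
    return env


def run_all(config_paths, device=None, dry_run=False):
    """Run each config in order; check=True stops at the first failed run."""
    env = build_env(device)
    commands = [build_command(p) for p in config_paths]
    for cmd in commands:
        if dry_run:
            prefix = f"DEVICE={device} " if device else ""
            print(prefix + " ".join(cmd))
        else:
            subprocess.run(cmd, check=True, env=env)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_all(["./configs/local.json"], device="cpu", dry_run=True)
```

Passing the environment explicitly, rather than mutating `os.environ`, keeps the DEVICE setting scoped to the child processes, just like the `DEVICE=cpu` prefix does on the command line.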

Configuration

Argument	Type	Description
`prompts`	List[str]	Text prompts
`image_prompts`	List[FilePath]	Image prompts / target image paths
`max_iterations`	int	Number of iterations
`save_freq`	int	Save an image every `save_freq` iterations
`size`	[int, int]	Image size [width, height]
`init_image`	FilePath	Initial image
`init_noise`	str	Initial noise image ['gradient','pixels']
`init_weight`	float	Initial weight
`output_dir`	FilePath	Path to output directory
`models_dir`	FilePath	Path to models cache directory
`clip_model`	FilePath	CLIP model path or name
`vqgan_checkpoint`	FilePath	VQGAN checkpoint path
`vqgan_config`	FilePath	VQGAN config path
`noise_prompt_seeds`	List[int]	Noise prompt seeds
`noise_prompt_weights`	List[float]	Noise prompt weights
`step_size`	float	Learning rate
`cutn`	int	Number of cuts
`cut_pow`	float	Cut power
`seed`	int	Seed (-1 for a random seed)
`optimizer`	str	Optimizer ['Adam','AdamW','Adagrad','Adamax','DiffGrad','AdamP','RAdam']
`augments`	List[str]	Enabled augmentations ['Ji','Sh','Gn','Pe','Ro','Af','Et','Ts','Cr','Er','Re']
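Typos in a config (an unsupported optimizer, an unknown augment code) are easy to catch before a long run. The sketch below is a hypothetical validator built only from the table above; it does not reflect how scripts.generate actually parses or defaults fields, and the seed range used for -1 is an illustrative choice.

```python
# Hypothetical validator derived only from the Configuration table above;
# the real scripts.generate may accept, reject, or default fields differently.
import random

OPTIMIZERS = {"Adam", "AdamW", "Adagrad", "Adamax", "DiffGrad", "AdamP", "RAdam"}
AUGMENTS = {"Ji", "Sh", "Gn", "Pe", "Ro", "Af", "Et", "Ts", "Cr", "Er", "Re"}
INIT_NOISES = {"gradient", "pixels"}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    size = config.get("size")
    if size is not None and not (
        isinstance(size, (list, tuple))
        and len(size) == 2
        and all(isinstance(v, int) and v > 0 for v in size)
    ):
        problems.append("size must be [width, height] with positive integers")
    for key in ("max_iterations", "save_freq", "cutn"):
        value = config.get(key)
        if value is not None and not (isinstance(value, int) and value > 0):
            problems.append(f"{key} must be a positive integer")
    optimizer = config.get("optimizer")
    if optimizer is not None and optimizer not in OPTIMIZERS:
        problems.append(f"unknown optimizer: {optimizer}")
    init_noise = config.get("init_noise")
    if init_noise and init_noise not in INIT_NOISES:
        problems.append(f"unknown init_noise: {init_noise}")
    unknown_augments = sorted(set(config.get("augments") or []) - AUGMENTS)
    if unknown_augments:
        problems.append(f"unknown augments: {unknown_augments}")
    return problems


def resolve_seed(seed):
    """Apply the table's convention that -1 means 'pick a random seed'."""
    if seed == -1:
        return random.randint(0, 2**32 - 1)  # range chosen for illustration
    return seed
```

Returning a list of problems instead of raising on the first one lets you see every mistake in a config at once.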

Acknowledgments

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}

@misc{esser2020taming,
    title         = {Taming Transformers for High-Resolution Image Synthesis},
    author        = {Patrick Esser and Robin Rombach and Björn Ommer},
    year          = {2020},
    eprint        = {2012.09841},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

@misc{ramesh2021zeroshot,
    title         = {Zero-Shot Text-to-Image Generation},
    author        = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year          = {2021},
    eprint        = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

Owner

Kevin Costa