Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

VQGAN-CLIP-Docker

About

This is a stripped-down, minimal-dependency repository for running VQGAN+CLIP locally or in production.

For a Google Colab notebook, see the original repository.

Setup

Clone this repository and cd inside.

git clone https://github.com/kcosta42/VQGAN-CLIP-Docker.git
cd VQGAN-CLIP-Docker

Download a VQGAN model and put it in the ./models folder.

Dataset                   Link
ImageNet (f=16), 16384    vqgan_imagenet_f16_16384
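
Before a run, it can help to confirm that the model files are where the configuration expects them. The sketch below is only an illustration: the helper name missing_model_files is hypothetical, and the .ckpt and .yaml suffixes are an assumption based on the vqgan_checkpoint and vqgan_config fields, not something this README states.

```python
from pathlib import Path


def missing_model_files(models_dir="./models",
                        name="vqgan_imagenet_f16_16384",
                        suffixes=(".ckpt", ".yaml")):
    """Return the expected model files that are not present in models_dir.

    The suffixes are an assumption (checkpoint + config); adjust them to
    match the files you actually downloaded.
    """
    base = Path(models_dir)
    expected = [base / f"{name}{suffix}" for suffix in suffixes]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected model files found.")
```

An empty list means every expected file was found.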

For GPU support, make sure you have CUDA installed on your system (tested with CUDA 11.1+).

  • 6 GB of VRAM is required to generate 256x256 images.
  • 11 GB of VRAM is required to generate 512x512 images.
  • 24 GB of VRAM is required to generate 1024x1024 images. (Untested)
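
The VRAM figures above can be turned into a small lookup for choosing an image size. This is a minimal sketch, not part of the repository: the function name largest_square_size is hypothetical, and the thresholds are copied from the list above (the 1024x1024 figure is marked untested there).

```python
# VRAM needed (in GB) per square image side, taken from the list above.
# The 1024 entry is reported as untested.
VRAM_REQUIREMENTS_GB = {256: 6, 512: 11, 1024: 24}


def largest_square_size(vram_gb):
    """Return the largest listed image side that fits in vram_gb, or None."""
    fitting = [side for side, needed in VRAM_REQUIREMENTS_GB.items()
               if vram_gb >= needed]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for vram in (4, 8, 12, 24):
        print(f"{vram} GB -> {largest_square_size(vram)}")
```

A result of None means the GPU is below the smallest listed requirement, in which case running on CPU may be the only option.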

Local

Install the Python requirements

python3 -m pip install -r requirements.txt

To check whether you can run this on your GPU, run the following command. It must print True.

python3 -c "import torch; print(torch.cuda.is_available());"

Docker

Make sure you have docker and docker-compose installed. nvidia-docker is needed if you want to run this on your GPU through Docker.

A Makefile is provided for ease of use.

make build  # Build the docker image

Usage

Two configuration files are provided: ./configs/local.json and ./configs/docker.json. They are ready to use, but you may want to edit them to suit your needs. See the Configuration section for a description of each field.

The resulting generations can be found in the ./outputs folder.

GPU

To run locally:

python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate

CPU

To run locally:

DEVICE=cpu python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate-cpu
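
The DEVICE=cpu override used above suggests a simple selection rule: use the requested device if one is given, otherwise fall back to the GPU when it is available. How scripts.generate actually reads DEVICE is not shown in this README, so the sketch below is an assumption, and resolve_device is a hypothetical helper name.

```python
import os


def resolve_device(cuda_available, env=None):
    """Pick a device string.

    Honors the DEVICE environment variable when it is set and non-empty;
    otherwise returns "cuda" if cuda_available is true, else "cpu".
    """
    env = os.environ if env is None else env
    requested = env.get("DEVICE", "").strip().lower()
    if requested:
        return requested
    return "cuda" if cuda_available else "cpu"


if __name__ == "__main__":
    # In a real script, cuda_available would come from
    # torch.cuda.is_available(); it is hard-coded here to avoid the import.
    print(resolve_device(cuda_available=False))
```

Passing the environment as an argument keeps the function easy to test without modifying the real process environment.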

Configuration

Argument              Type            Description
prompts               List[str]       Text prompts
image_prompts         List[FilePath]  Image prompts / target image path
max_iterations        int             Number of iterations
save_freq             int             Save image iterations
size                  [int, int]      Image size (width, height)
init_image            FilePath        Initial image
init_noise            str             Initial noise image ['gradient','pixels']
init_weight           float           Initial weight
output_dir            FilePath        Path to output directory
models_dir            FilePath        Path to models cache directory
clip_model            FilePath        CLIP model path or name
vqgan_checkpoint      FilePath        VQGAN checkpoint path
vqgan_config          FilePath        VQGAN config path
noise_prompt_seeds    List[int]       Noise prompt seeds
noise_prompt_weights  List[float]     Noise prompt weights
step_size             float           Learning rate
cutn                  int             Number of cuts
cut_pow               float           Cut power
seed                  int             Seed (-1 for random seed)
optimizer             str             Optimiser ['Adam','AdamW','Adagrad','Adamax','DiffGrad','AdamP','RAdam']
augments              List[str]       Enabled augments ['Ji','Sh','Gn','Pe','Ro','Af','Et','Ts','Cr','Er','Re']
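
As an illustration of the table above, the following sketch checks a configuration dictionary against the listed types for a subset of fields. It is not the repository's own validation: check_config and its helper are hypothetical names, and the example values are made up for demonstration rather than taken from the shipped config files.

```python
import json

# Expected Python types for a subset of the fields in the table above.
FIELD_TYPES = {
    "prompts": list,
    "max_iterations": int,
    "save_freq": int,
    "size": list,
    "step_size": float,
    "cutn": int,
    "seed": int,
    "optimizer": str,
}
OPTIMIZERS = {"Adam", "AdamW", "Adagrad", "Adamax", "DiffGrad", "AdamP", "RAdam"}


def _matches(value, expected):
    """Type check that rejects booleans and lets JSON integers pass as floats."""
    if isinstance(value, bool):
        return False  # none of the checked fields is a boolean
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def check_config(config):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for name, expected in FIELD_TYPES.items():
        if name in config and not _matches(config[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    size = config.get("size")
    if isinstance(size, list):
        if len(size) != 2 or not all(_matches(v, int) for v in size):
            problems.append("size: expected [width, height] as two integers")
    optimizer = config.get("optimizer")
    if isinstance(optimizer, str) and optimizer not in OPTIMIZERS:
        problems.append(f"optimizer: unknown value {optimizer!r}")
    return problems


# Example values are illustrative only.
example = json.loads("""
{
  "prompts": ["a lighthouse at dusk"],
  "max_iterations": 250,
  "save_freq": 50,
  "size": [256, 256],
  "step_size": 0.1,
  "cutn": 32,
  "seed": -1,
  "optimizer": "Adam"
}
""")

if __name__ == "__main__":
    print(check_config(example))
```

Fields that are absent from the dictionary are skipped, so the same check can be run on a partial configuration.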

Acknowledgments

VQGAN+CLIP

Taming Transformers

CLIP

DALLE-PyTorch

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}

@misc{esser2020taming,
    title         = {Taming Transformers for High-Resolution Image Synthesis},
    author        = {Patrick Esser and Robin Rombach and Björn Ommer},
    year          = {2020},
    eprint        = {2012.09841},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

@misc{ramesh2021zeroshot,
    title         = {Zero-Shot Text-to-Image Generation},
    author        = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year          = {2021},
    eprint        = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

Owner
Kevin Costa
Machine Learning Engineer. Previously Student @ 42 Paris