EdiBERT, a generative model for image editing

Last update: Dec 07, 2022


EdiBERT is a generative model based on a bi-directional transformer and designed for image manipulation. The same EdiBERT model, obtained from a single training run, can be used on a wide variety of tasks.

We follow the implementation of Taming-Transformers (https://github.com/CompVis/taming-transformers). The main modifications are in taming/models/bert_transformer.py and scripts/sample_mask_likelihood_maximization.py.

Requirements

A suitable conda environment named edibert can be created and activated with:

conda env create -f environment.yaml
conda activate edibert

FFHQ

Download the FFHQ dataset (https://github.com/NVlabs/ffhq-dataset) and put it into data/ffhq/.

Training BERT

In the logs/ folder, download and extract the FFHQ VQGAN:

gdown --id '1P_wHLRfdzf1DjsAH_tG10GXk9NKEZqTg'
tar -xvzf 2021-04-23T18-19-01_ffhq_vqgan.tar.gz
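Where `tar` is not available (for example on some Windows setups), the same extraction can be sketched with Python's standard `tarfile` module. `extract_checkpoint` is a hypothetical helper, not part of the repository; it assumes the archive has already been downloaded with `gdown`, unpacks it into `logs/`, and rejects members whose paths would land outside that folder.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_checkpoint(archive: str, logs_dir: str = "logs") -> List[str]:
    """Hypothetical helper: extract a .tar.gz run folder into logs_dir (like `tar -xvzf`)."""
    logs = Path(logs_dir)
    logs.mkdir(parents=True, exist_ok=True)
    root = logs.resolve()
    with tarfile.open(archive, mode="r:gz") as tar:
        # Refuse members that would escape logs_dir (absolute paths or "..").
        for member in tar.getmembers():
            target = (logs / member.name).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        names = tar.getnames()
        tar.extractall(path=logs)
    return names
```

For example, `extract_checkpoint("logs/2021-04-23T18-19-01_ffhq_vqgan.tar.gz")` unpacks the VQGAN run folder next to the archive, as the `tar` command does when run inside logs/.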

Training on 1 GPU:

python main.py --base configs/ffhq_transformer_bert_2D.yaml -t True --gpus 0,

Training on 2 GPUs:

python main.py --base configs/ffhq_transformer_bert_2D.yaml -t True --gpus 0,1
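The trailing comma in `--gpus 0,` matters. The value is handed to PyTorch Lightning, whose 1.x releases (the ones Taming-Transformers was written against) read a string containing a comma as a list of GPU indices and a string without one as a GPU count, so `--gpus 0` would request no GPU at all. The sketch below is a simplified re-implementation of that rule for illustration only; `parse_gpus` is a hypothetical name, not a Lightning function.

```python
from typing import List, Union


def parse_gpus(spec: str) -> Union[List[int], int]:
    """Simplified sketch of how a Lightning-1.x-style --gpus string is interpreted."""
    spec = spec.strip()
    if spec == "-1":
        return -1  # use all available GPUs
    if "," in spec:
        # A comma means a list of device indices; empty entries are ignored.
        return [int(x) for x in spec.split(",") if x.strip()]
    # No comma means a number of GPUs to use.
    return int(spec)
```

So `0,` selects device 0, `0,1` selects devices 0 and 1, and a bare `0` means zero GPUs.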

Running pre-trained BERT on composite/scribble-edited images

In the logs/ folder, download and extract the FFHQ VQGAN:

gdown --id '1P_wHLRfdzf1DjsAH_tG10GXk9NKEZqTg'
tar -xvzf 2021-04-23T18-19-01_ffhq_vqgan.tar.gz

In the logs/ folder, download and extract the FFHQ BERT:

gdown --id '1YGDd8XyycKgBp_whs9v1rkYdYe4Oxfb3'
tar -xvzf 2021-10-14T16-32-28_ffhq_transformer_bert_2D.tar.gz

Alternatively, download the VQGAN and BERT folders manually and place them into logs/.

Then, launch the following script for composite images:

python scripts/sample_mask_likelihood_maximization.py -r logs/2021-10-14T16-32-28_ffhq_transformer_bert_2D/checkpoints/epoch=000019.ckpt \
--image_folder data/ffhq_collages/ --mask_folder data/ffhq_collages_masks/ --image_list data/ffhq_collages.txt --keep_img \
--dilation_sampling 1 -k 100 -t 1.0 --batch_size 5 --bert --epochs 2  \
--device 0 --random_order \
--mask_collage --collage_frequency 3 --gaussian_smoothing_collage
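Assuming `-k` and `-t` follow the Taming-Transformers sampling-script convention (top-k filtering and softmax temperature), each resampled token is drawn only from the `k` most likely codebook entries after dividing the logits by the temperature. The generic pure-Python sketch below shows that technique for a single position; `sample_top_k` is hypothetical and not taken from the repository.

```python
import math
import random
from typing import Optional, Sequence


def sample_top_k(logits: Sequence[float], k: int, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the k highest logits after temperature scaling."""
    if k < 1 or temperature <= 0:
        raise ValueError("k must be >= 1 and temperature > 0")
    rng = rng or random.Random()
    # Indices of the k largest logits (ties keep their original order).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax over the kept logits, shifted by the max for stability.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights itself.
    return rng.choices(top, weights=weights, k=1)[0]
```

Under this reading, `-k 100` restricts each draw to the 100 most likely entries, and `-t 1.0` leaves the model's distribution unscaled; lower temperatures would make the draws more deterministic.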

Then, launch the following script for scribble-edited images:

python scripts/sample_mask_likelihood_maximization.py -r logs/2021-10-14T16-32-28_ffhq_transformer_bert_2D/checkpoints/epoch=000019.ckpt \
--image_folder data/ffhq_edits/ --mask_folder data/ffhq_edits_masks/ --image_list data/ffhq_edits.txt --keep_img \
--dilation_sampling 1 -k 100 -t 1.0 --batch_size 5 --bert --epochs 2  \
--device 0 --random_order \
--mask_collage --collage_frequency 3 --gaussian_smoothing_collage
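The two commands above differ only in their data paths. A small Python wrapper can build either argument list; `build_command` is a hypothetical helper, not part of the repository, the flags are copied verbatim from the commands, and `dataset` is assumed to be either `collages` or `edits`.

```python
import sys
from typing import List

CKPT = ("logs/2021-10-14T16-32-28_ffhq_transformer_bert_2D/"
        "checkpoints/epoch=000019.ckpt")


def build_command(dataset: str, ckpt: str = CKPT) -> List[str]:
    """Argument list for sample_mask_likelihood_maximization.py on data/ffhq_<dataset>."""
    if dataset not in ("collages", "edits"):
        raise ValueError("dataset must be 'collages' or 'edits'")
    root = f"data/ffhq_{dataset}"
    return [
        sys.executable, "scripts/sample_mask_likelihood_maximization.py",
        "-r", ckpt,
        "--image_folder", f"{root}/", "--mask_folder", f"{root}_masks/",
        "--image_list", f"{root}.txt", "--keep_img",
        "--dilation_sampling", "1", "-k", "100", "-t", "1.0",
        "--batch_size", "5", "--bert", "--epochs", "2",
        "--device", "0", "--random_order",
        "--mask_collage", "--collage_frequency", "3", "--gaussian_smoothing_collage",
    ]
```

From the repository root, inside the activated edibert environment, `subprocess.run(build_command("edits"), check=True)` would launch the scribble-edit run; passing a list instead of a shell string avoids quoting issues with the `=` in the checkpoint filename.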

The samples can then be found in logs/my_model/samples/. Here, the --batch_size argument corresponds to the number of EdiBERT generations per image.

Notebooks for playing with completion/denoising with BERT

Notebooks for image denoising and image inpainting can also be found in the main folder.
