High-Resolution Image Synthesis with Latent Diffusion Models

Overview

Latent Diffusion Models

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

Model Zoo

Pretrained Autoencoding Models

rec2

Model FID vs val PSNR PSIM Link Comments
f=4, VQ (Z=8192, d=3) 0.58 27.43 +/- 4.26 0.53 +/- 0.21 https://ommer-lab.com/files/latent-diffusion/vq-f4.zip
f=4, VQ (Z=8192, d=3) 1.06 25.21 +/- 4.17 0.72 +/- 0.26 https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1 no attention
f=8, VQ (Z=16384, d=4) 1.14 23.07 +/- 3.99 1.17 +/- 0.36 https://ommer-lab.com/files/latent-diffusion/vq-f8.zip
f=8, VQ (Z=256, d=4) 1.49 22.35 +/- 3.81 1.26 +/- 0.37 https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip
f=16, VQ (Z=16384, d=8) 5.15 20.83 +/- 3.61 1.73 +/- 0.43 https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1
f=4, KL 0.27 27.53 +/- 4.54 0.55 +/- 0.24 https://ommer-lab.com/files/latent-diffusion/kl-f4.zip
f=8, KL 0.90 24.19 +/- 4.19 1.02 +/- 0.35 https://ommer-lab.com/files/latent-diffusion/kl-f8.zip
f=16, KL (d=16) 0.87 24.08 +/- 4.22 1.07 +/- 0.36 https://ommer-lab.com/files/latent-diffusion/kl-f16.zip
f=32, KL (d=64) 2.04 22.27 +/- 3.93 1.41 +/- 0.40 https://ommer-lab.com/files/latent-diffusion/kl-f32.zip

Get the models

Running the following script downloads und extracts all available pretrained autoencoding models.

bash scripts/download_first_stages.sh

The first stage models can then be found in models/first_stage_models/

Pretrained LDMs

Datset Task Model FID IS Prec Recall Link Comments
CelebA-HQ Unconditional Image Synthesis LDM-VQ-4 (200 DDIM steps, eta=0) 5.11 (5.11) 3.29 0.72 0.49 https://ommer-lab.com/files/latent-diffusion/celeba.zip
FFHQ Unconditional Image Synthesis LDM-VQ-4 (200 DDIM steps, eta=1) 4.98 (4.98) 4.50 (4.50) 0.73 0.50 https://ommer-lab.com/files/latent-diffusion/ffhq.zip
LSUN-Churches Unconditional Image Synthesis LDM-KL-8 (400 DDIM steps, eta=0) 4.02 (4.02) 2.72 0.64 0.52 https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip
LSUN-Bedrooms Unconditional Image Synthesis LDM-VQ-4 (200 DDIM steps, eta=1) 2.95 (3.0) 2.22 (2.23) 0.66 0.48 https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip
ImageNet Class-conditional Image Synthesis LDM-VQ-8 (200 DDIM steps, eta=1) 7.77(7.76)* /15.82** 201.56(209.52)* /78.82** 0.84* / 0.65** 0.35* / 0.63** https://ommer-lab.com/files/latent-diffusion/cin.zip *: w/ guiding, classifier_scale 10 **: w/o guiding, scores in bracket calculated with script provided by ADM
Conceptual Captions Text-conditional Image Synthesis LDM-VQ-f4 (100 DDIM steps, eta=0) 16.79 13.89 N/A N/A https://ommer-lab.com/files/latent-diffusion/text2img.zip finetuned from LAION
OpenImages Super-resolution N/A N/A N/A N/A N/A https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip BSR image degradation
OpenImages Layout-to-Image Synthesis LDM-VQ-4 (200 DDIM steps, eta=0) 32.02 15.92 N/A N/A https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip
Landscapes (finetuned 512) Semantic Image Synthesis LDM-VQ-4 (100 DDIM steps, eta=1) N/A N/A N/A N/A https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip

Get the models

The LDMs listed above can jointly be downloaded and extracted via

bash scripts/download_models.sh

The models can then be found in models/ldm/ .

Sampling with unconditional models

We provide a first script for sampling from our unconditional models. Start it via

CUDA_VISIBLE_DEVICES=<GPU_ID> python scripts/sample_diffusion.py -r models/ldm/<model_spec>/model.ckpt -l <logdir> -n <\#samples> --batch_size <batch_size> -c <\#ddim steps> -e <\#eta> 

Coming Soon...

inpainting

Comments

Owner
CompVis Heidelberg
Computer Vision research group at the Ruprecht-Karls-University Heidelberg
CompVis Heidelberg
Implementation of the ICCV'21 paper Temporally-Coherent Surface Reconstruction via Metric-Consistent Atlases

Temporally-Coherent Surface Reconstruction via Metric-Consistent Atlases [Papers 1, 2][Project page] [Video] The implementation of the papers Temporal

56 Nov 21, 2022
CoANet: Connectivity Attention Network for Road Extraction From Satellite Imagery

CoANet: Connectivity Attention Network for Road Extraction From Satellite Imagery This paper (CoANet) has been published in IEEE TIP 2021. This code i

Jie Mei 53 Dec 03, 2022
NeurIPS 2021, self-supervised 6D pose on category level

SE(3)-eSCOPE video | paper | website Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation Xiaolong Li, Yijia Weng,

Xiaolong 63 Nov 22, 2022
The CLRS Algorithmic Reasoning Benchmark

Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms.

DeepMind 251 Jan 05, 2023
PointPillars inference with TensorRT

A project demonstrating how to use CUDA-PointPillars to deal with cloud points data from lidar.

NVIDIA AI IOT 315 Dec 31, 2022
Unified MultiWOZ evaluation scripts for the context-to-response task.

MultiWOZ Context-to-Response Evaluation Standardized and easy to use Inform, Success, BLEU ~ See the paper ~ Easy-to-use scripts for standardized eval

Tomáš Nekvinda 38 Dec 13, 2022
Pytorch Implementation of Residual Vision Transformers(ResViT)

ResViT Official Pytorch Implementation of Residual Vision Transformers(ResViT) which is described in the following paper: Onat Dalmaz and Mahmut Yurt

ICON Lab 41 Dec 08, 2022
Simple image captioning model - CLIP prefix captioning.

Simple image captioning model - CLIP prefix captioning.

688 Jan 04, 2023
The devkit of the nuScenes dataset.

nuScenes devkit Welcome to the devkit of the nuScenes and nuImages datasets. Overview Changelog Devkit setup nuImages nuImages setup Getting started w

Motional 1.6k Jan 05, 2023
A system for quickly generating training data with weak supervision

Programmatically Build and Manage Training Data Announcement The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI applicat

Snorkel Team 5.4k Jan 02, 2023
An image processing project uses Viola-jones technique to detect faces and then use SIFT algorithm for recognition.

Attendance_System An image processing project uses Viola-jones technique to detect faces and then use LPB algorithm for recognition. Face Detection Us

8 Jan 11, 2022
Supervised multi-SNE (S-multi-SNE): Multi-view visualisation and classification

S-multi-SNE Supervised multi-SNE (S-multi-SNE): Multi-view visualisation and classification A repository containing the code to reproduce the findings

Theodoulos Rodosthenous 3 Apr 15, 2022
training script for space time memory network

Trainig Script for Space Time Memory Network This codebase implemented training code for Space Time Memory Network with some cyclic features. Requirem

Yuxi Li 100 Dec 20, 2022
StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking

StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking Datasets You can download datasets that have been pre-pr

25 May 29, 2022
a short visualisation script for pyvideo data

PyVideo Speakers A CLI that visualises repeat speakers from events listed in https://github.com/pyvideo/data Not terribly efficient, but you know. Ins

Katie McLaughlin 3 Nov 24, 2021
The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

SGRAF PyTorch implementation for AAAI2021 paper of “Similarity Reasoning and Filtration for Image-Text Matching”. It is built on top of the SCAN and C

Ronnie_IIAU 149 Dec 22, 2022
This is a collection of all challenges in HKCERT CTF 2021

香港網絡保安新生代奪旗挑戰賽 2021 (HKCERT CTF 2021) This is a collection of all challenges (and writeups) in HKCERT CTF 2021 Challenges ID Chinese name Name Score S

10 Jan 27, 2022
Example-custom-ml-block-keras - Custom Keras ML block example for Edge Impulse

Custom Keras ML block example for Edge Impulse This repository is an example on

Edge Impulse 8 Nov 02, 2022
Fully Convolutional Refined Auto Encoding Generative Adversarial Networks for 3D Multi Object Scenes

Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi Object Scenes This repository contains the source code for Full

Yu Nishimura 106 Nov 21, 2022
The BCNet related data and inference model.

BCNet This repository includes the some source code and related dataset of paper BCNet: Learning Body and Cloth Shape from A Single Image, ECCV 2020,

81 Dec 12, 2022