v objective diffusion inference code for PyTorch.

Last update: Dec 30, 2022

Overview

v-diffusion-pytorch

v objective diffusion inference code for PyTorch, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman).

The models are denoising diffusion probabilistic models (https://arxiv.org/abs/2006.11239), which are trained to reverse a gradual noising process, allowing the models to generate samples from the learned data distributions starting from random noise. DDIM-style deterministic sampling (https://arxiv.org/abs/2010.02502) is also supported. The models are also trained on continuous timesteps. They use the 'v' objective from Progressive Distillation for Fast Sampling of Diffusion Models (https://openreview.net/forum?id=TIdIXIpzhoI).
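The difference between stochastic (DDPM) and deterministic (DDIM) sampling can be sketched as a single reverse step. The version below is a minimal illustration on plain Python scalars, following the noise-scale formula from the DDIM paper rewritten in terms of alpha and sigma with alpha^2 + sigma^2 = 1. The function name `reverse_step`, its argument names, and the example noise levels are assumptions made for illustration; this is not the repository's sampling code.

```python
import math
import random


def reverse_step(pred, eps, alpha, sigma, alpha_next, sigma_next, eta, noise):
    """One reverse step from noise level (alpha, sigma) to (alpha_next, sigma_next).

    pred  -- current estimate of the clean sample
    eps   -- current estimate of the noise
    eta   -- 0 gives a deterministic DDIM step, 1 a stochastic DDPM-like step
    noise -- a fresh standard normal draw, which only matters when eta > 0
    """
    # Standard deviation of the fresh noise injected at this step.
    ddim_sigma = (eta * math.sqrt(sigma_next**2 / sigma**2)
                  * math.sqrt(1 - alpha**2 / alpha_next**2))
    # The rest of the noise budget at the next level is filled with predicted noise.
    adjusted_sigma = math.sqrt(sigma_next**2 - ddim_sigma**2)
    return pred * alpha_next + eps * adjusted_sigma + noise * ddim_sigma


# Example noise levels on the unit circle (assumed values, for illustration only).
alpha, sigma = math.cos(0.5 * math.pi / 2), math.sin(0.5 * math.pi / 2)
alpha_next, sigma_next = math.cos(0.4 * math.pi / 2), math.sin(0.4 * math.pi / 2)

x_ddim = reverse_step(0.3, -1.2, alpha, sigma, alpha_next, sigma_next,
                      eta=0.0, noise=random.gauss(0, 1))
x_ddpm = reverse_step(0.3, -1.2, alpha, sigma, alpha_next, sigma_next,
                      eta=1.0, noise=random.gauss(0, 1))
```

With eta set to 0 the fresh noise term vanishes, so repeated runs give the same result; with eta above 0 part of the predicted noise is replaced by newly drawn noise while the total noise level at the next step stays the same.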

Thank you to stability.ai for compute to train these models!

Dependencies

PyTorch (installation instructions)
requests, tqdm (install with pip install)
CLIP (https://github.com/openai/CLIP), and its additional pip-installable dependencies: ftfy, regex. If you git clone --recursive this repo, it should fetch CLIP automatically.

Model checkpoints:

CC12M 256x256, SHA-256 63946d1f6a1cb54b823df818c305d90a9c26611e594b5f208795864d5efe0d1f

A 602M parameter CLIP conditioned model trained on Conceptual 12M for 3.1M steps.
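Since a SHA-256 digest is published for the checkpoint, a downloaded file can be checked against it before use. The following is a small sketch using only the standard library; the helper names and the example path `checkpoints/cc12m_1.pth` are assumptions, not something the repository provides.

```python
import hashlib

# SHA-256 listed above for the CC12M 256x256 checkpoint.
CC12M_1_SHA256 = "63946d1f6a1cb54b823df818c305d90a9c26611e594b5f208795864d5efe0d1f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, expected=CC12M_1_SHA256):
    """True if the file at `path` has the expected SHA-256 digest."""
    return sha256_of_file(path) == expected.lower()


# Hypothetical usage; the file name is an assumption:
# print(checkpoint_matches("checkpoints/cc12m_1.pth"))
```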

Sampling

Example

If the model checkpoints are stored in checkpoints/, the following will generate an image:

./clip_sample.py "the rise of consciousness" --model cc12m_1 --seed 0

If they are somewhere else, you need to specify the path to the checkpoint with --checkpoint.

CLIP conditioned/guided sampling

usage: clip_sample.py [-h] [--images [IMAGE ...]] [--batch-size BATCH_SIZE]
                      [--checkpoint CHECKPOINT] [--clip-guidance-scale CLIP_GUIDANCE_SCALE]
                      [--device DEVICE] [--eta ETA] [--model {cc12m_1}] [-n N] [--seed SEED]
                      [--steps STEPS]
                      [prompts ...]

prompts: the text prompts to use. Relative weights for text prompts can be specified by putting the weight after a colon, for example: "the rise of consciousness:0.5".

--batch-size: sample this many images at a time (default 1)

--checkpoint: manually specify the model checkpoint file

--clip-guidance-scale: how strongly the result should match the text prompt (default 500). If set to 0, the cc12m_1 model will still be CLIP conditioned and sampling will go faster and use less memory.

--device: the PyTorch device name to use (default autodetects)

--eta: set to 0 for deterministic (DDIM) sampling, 1 (the default) for stochastic (DDPM) sampling, and in between to interpolate between the two. DDIM is preferred for low numbers of timesteps.

--images: the image prompts to use (local files or HTTP(S) URLs). Relative weights for image prompts can be specified by putting the weight after a colon, for example: "image_1.png:0.5".

--model: specify the model to use (default cc12m_1)

-n: sample until this many images are sampled (default 1)

--seed: specify the random seed (default 0)

--steps: specify the number of diffusion timesteps (default is 1000, can lower for faster but lower quality sampling)
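The "prompt:weight" convention used for text and image prompts can be illustrated with a short parser. This is a sketch only: `split_prompt_weight` is a hypothetical helper, the default weight of 1.0 is an assumption, and the script's own parsing may differ. It treats the text after the last colon as a weight only when it parses as a number, so a plain URL such as "https://example.com/a.png" is left intact.

```python
def split_prompt_weight(prompt, default_weight=1.0):
    """Split 'text:weight' into (text, weight); fall back to the default weight."""
    text, sep, tail = prompt.rpartition(":")
    if sep:
        try:
            return text, float(tail)
        except ValueError:
            # The part after the last colon is not a number (e.g. a URL path).
            pass
    return prompt, default_weight


for p in ["the rise of consciousness:0.5", "image_1.png:0.5",
          "https://example.com/a.png", "https://example.com/a.png:2"]:
    print(split_prompt_weight(p))
```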

Comments

Generated images are completely black?! 😵 What am I doing wrong?
Hello, I am on Windows 10, and my GPU is a PNY Nvidia GTX 1660 Ti 6 GB. I installed V-Diffusion like so:

conda create --name v-diffusion python=3.8

conda activate v-diffusion

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch (as per the PyTorch website instructions)

pip install requests tqdm

The problem is that when I launch the cfg_sample.py or clip_sample.py command lines, the generated images are completely black, although the inference process seems to run nicely and without errors.

Things I've tried:

installing previous pytorch version with conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

removing V-Diffusion conda environment completely and recreating it anew

uninstalling nvidia drivers and performing a new clean driver install (I tried both Nvidia Studio drivers and Nvidia Game Ready drivers)

uninstalling and reinstalling Conda completely

But nothing helped... and at this point I don't know what else to try...

The only interesting piece of information I could gather is that for some reason this problem also happens with another text-to-image project called Big Sleep where similar to V-Diffusion the inference process appears to run correctly but the generated images are all black.

I think there must be some simple detail I'm overlooking, which is making me go insane... 😵 Please let me know if you think you can help! Thanks!
opened by illtellyoulater 10
what does this line mean in README?

A weight of 1 will sample images that match the prompt roughly as well as images usually match prompts like that in the training set.

I can't wrap my head around this sentence. Could you please explain it with different wording? Thanks!

opened by illtellyoulater 2
AttributeError: module 'torch' has no attribute 'special'

torch version: 1.8.1+cu111

python ./cfg_sample.py "the rise of consciousness":5 -n 4 -bs 4 --seed 0
Using device: cuda:0
Traceback (most recent call last):
  File "./cfg_sample.py", line 154, in <module>
    main()
  File "./cfg_sample.py", line 148, in main
    run_all(args.n, args.batch_size)
  File "./cfg_sample.py", line 136, in run_all
    steps = utils.get_spliced_ddpm_cosine_schedule(t)
  File "C:\Users\m\Desktop\v-diffusion-pytorch\diffusion\utils.py", line 75, in get_spliced_ddpm_cosine_schedule
    ddpm_part = get_ddpm_schedule(big_t + ddpm_crossover - cosine_crossover)
  File "C:\Users\m\Desktop\v-diffusion-pytorch\diffusion\utils.py", line 65, in get_ddpm_schedule
    log_snr = -torch.special.expm1(1e-4 + 10 * ddpm_t**2).log()
AttributeError: module 'torch' has no attribute 'special'

opened by tempdeltavalue 2
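The traceback suggests that the installed PyTorch build (1.8.1) does not expose a torch.special namespace, so a newer PyTorch release that includes it should avoid the error. Where upgrading is not possible, one possible workaround is to fall back to the top-level torch.expm1 function. The lookup can be sketched as below; `pick_expm1` is a hypothetical helper, and a stand-in object is used instead of importing PyTorch so the example runs anywhere.

```python
import math
from types import SimpleNamespace


def pick_expm1(torch_module):
    """Return torch.special.expm1 when available, else the top-level expm1."""
    special = getattr(torch_module, "special", None)
    if special is not None and hasattr(special, "expm1"):
        return special.expm1
    return torch_module.expm1


# Stand-in for an older build without the `special` namespace (illustration only).
old_torch = SimpleNamespace(expm1=math.expm1)
expm1 = pick_expm1(old_torch)
print(expm1(1e-4))

# With PyTorch installed, the call would be: expm1 = pick_expm1(torch)
```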
Add github action to automatically push to pypi on Release x.y.z commit

You need to create a token at https://pypi.org/manage/account/token/ and add it at https://github.com/crowsonkb/v-diffusion-pytorch/settings/secrets/actions/new under the name PYPI_PASSWORD.

The release will be triggered when you name your commit Release x.y.z. I advise changing the version in setup.cfg in that same commit.

opened by rom1504 0
[Question] What's the meaning of these equations in sample and cfg_model_fn (from sample.py)
Hello, thank you for your great work! I have a small question about this part of sample.py:

# Get the model output (v, the predicted velocity)
with torch.cuda.amp.autocast():
    v = model(x, ts * steps[i], **extra_args).float()

# Predict the noise and the denoised image
pred = x * alphas[i] - v * sigmas[i]
eps = x * sigmas[i] + v * alphas[i]

What do these equations mean, and where do they come from?
opened by zhangquanwei962 0
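One way to see where the two equations come from: if the noised sample is x = alpha * x0 + sigma * noise with alpha^2 + sigma^2 = 1, and the 'v' target is v = alpha * noise - sigma * x0, then x * alpha - v * sigma recovers x0 and x * sigma + v * alpha recovers the noise. The check below uses plain Python scalars; the cosine mapping from t to (alpha, sigma) and all function names are assumptions for illustration rather than the repository's code.

```python
import math


def t_to_alpha_sigma(t):
    """Assumed cosine schedule: t in [0, 1] maps to a point on the unit circle."""
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)


def v_target(x0, noise, alpha, sigma):
    """The 'v' target for a clean sample x0 and the noise mixed into it."""
    return alpha * noise - sigma * x0


def pred_and_eps(x, v, alpha, sigma):
    """Recover the clean-sample and noise estimates from x and v."""
    pred = x * alpha - v * sigma
    eps = x * sigma + v * alpha
    return pred, eps


x0, noise = 0.8, -0.3
alpha, sigma = t_to_alpha_sigma(0.6)
x = alpha * x0 + sigma * noise          # forward noising
v = v_target(x0, noise, alpha, sigma)   # what a perfect model would output
pred, eps = pred_and_eps(x, v, alpha, sigma)
print(pred, eps)  # equal to x0 and noise up to floating-point rounding
```

In the sampler the model output is only an estimate of v, so pred and eps are estimates of the clean image and of the noise at that step.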
Images don’t seem to evolve with each iteration

Thanks for sharing such an amazing repo!

I am testing a prompt like OpenAI's “an astronaut riding a horse in a photorealistic style” for comparison, but the iterations seem to be stuck on the same image.

This is my first test, so it could very well be that I am doing something wrong. Results and settings are attached below…

opened by alelordelo 0
[Question] Questions about `zero_embed` and `weights`
Thanks for this great work. I have recently become interested in using diffusion models to generate images iteratively. I found your script cfg_sample.py to be a nice implementation and decided to learn from it. However, because I'm new to this field, I've run into a few things that are hard for me to understand. It would be great if you could provide some hints or suggestions. Thank you! My questions are listed below. They are about the script cfg_sample.py.

I noticed that the code uses zero_embed as one of the features for conditioning. What is its purpose? Is it designed to allow the case where no prompt is given as input?

I also noticed that the weight of zero_embed is computed as 1 - sum(weights). I think the 1 is there to make the weights sum to one, but the weight of zero_embed could then be a negative number. Should the weights be normalized before the intermediate noise maps are weighted?

Thanks very much!!
opened by Karbo123 4
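One reading of the weighting described in this question: the model is evaluated once with the "zero" (unconditional) embedding and once per prompt, and the outputs are combined with weights that sum to one. With a single prompt of weight w this is the usual guidance form uncond + w * (cond - uncond), so a negative weight on the zero embedding simply means extrapolating past the conditional output. The sketch below uses scalars, and `combine_guided` is a hypothetical name; it is not the code from cfg_sample.py.

```python
def combine_guided(v_uncond, v_conds, weights):
    """Weighted sum of model outputs, with the unconditional weight set to 1 - sum(weights)."""
    uncond_weight = 1 - sum(weights)
    return uncond_weight * v_uncond + sum(w * v for w, v in zip(weights, v_conds))


v_uncond, v_cond = 0.2, 0.5

# Weight 1: the unconditional weight is 0, so the result is the conditional output.
print(combine_guided(v_uncond, [v_cond], [1.0]))

# Weight 5: the unconditional weight is -4, which equals v_uncond + 5 * (v_cond - v_uncond).
print(combine_guided(v_uncond, [v_cond], [5.0]))
```

Under this reading the weights do not need to be normalized: the zero-embedding weight already absorbs whatever is needed for the total to be one.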
Metrics on WikiArt model

Hi!

I wanted to thank you for your work, especially since without you DiscoDiffusion wouldn't exist!

Still, I was wondering if you had the metrics (Precision, Recall, FID and Inception Score) for the 256x256 WikiArt model?

opened by Maxim-Durand 0
Any idea on how to attach a clip model to a 64x64 unconditional model from openai/improved-diffusion?

Hey! I love your work and have been following it for a while. I have finetuned a 64x64 unconditional model from openai/improved-diffusion (checkpoint).

I was curious whether you could lend any insight on how to connect CLIP guidance to my model. I have tried repurposing your notebook (https://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj#scrollTo=1YwMUyt9LHG1); however, past 100 steps my model seems to diverge.

I think this may be because too much noise is being added for the smaller image size. How might I fix this?

opened by DeepTitan 0

Releases(v0.0.2)

v0.0.2(Nov 20, 2022)


Owner

Katherine Crowson

AI/generative artist.

