CLIP+FFT text-to-image

Overview

Aphantasia

Open In Colab

This is a text-to-image tool, part of the artwork of the same name.
Based on CLIP model, with FFT parameterizer from Lucent library as a generator.
Tested on Python 3.7 with PyTorch 1.7.1.

Aphantasia is the inability to visualize mental images, the deprivation of visual dreams.
The image in the header is generated by the tool from this word.

Features

  • generating massive detailed textures, a la deepdream
  • fast convergence!
  • fullHD/4K resolutions and above
  • complex queries:
    • text and/or image as main prompts
    • additional text prompts for fine details and to subtract (avoid) topics
    • criteria inversion (show "the opposite")
  • continuous mode to process phrase lists (e.g. illustrating lyrics)
  • saving/loading parameters to resume processing
  • selectable CLIP model

Setup CLIP et cetera:

pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/Po-Hsun-Su/pytorch-ssim

Operations

  • Generate an image from the text prompt (set the size as you wish):
python clip_fft.py -t "the text" --size 1280-720
  • Reproduce an image:
python clip_fft.py -i theimage.jpg --sync 0.01

--sync X argument (X = from 0 to 1) enables SSIM loss to keep the composition and details of the original image.

You can combine both text and image prompts.
Use --translate option to process non-English languages.

  • Set more specific query like this:
python clip_fft.py -t "macro figures" -t2 "micro details" -t0 "avoid this" --size 1280-720 
  • Other options:
    --model M selects one of the released CLIP models: ViT-B/32 (default), RN50, RN50x4, RN101.
    --overscan mode processes double-padded image to produce more uniform (and probably seamlessly tileable) textures. Omit it, if you need more centered composition.
    --steps N sets iterations count. 50-100 is enough for a starter; 500-1000 would elaborate it more thoroughly.
    --samples N sets amount of the image cuts (samples), processed at one step. With more samples you can set fewer iterations for similar result (and vice versa). 200/200 is a good guess. NB: GPU memory is mostly eaten by this count (not resolution)!
    --fstep N tells to save every Nth frame (useful with high iterations, default is 1).
    --contrast X may be needed for new ResNet models (they tend to burn the colors).
    --noise X adds some noise to the parameters, possibly making composition less clogged (in a degree).
    --lrate controls learning rate. The range is quite wide (tested from 0.01 to 10, try less/more).
    --invert negates the whole criteria, if you fancy checking "totally opposite".
    --save_pt myfile.pt will save FFT parameters, to resume for next query with --resume myfile.pt.
    --verbose ('on' by default) enables some printouts and realtime image preview.

Continuous mode

Open In Colab

  • Make video from a text file, processing it line by line in one shot:
python illustra.py -i mysong.txt --size 1280-720 --length 155

This will first generate and save images for every text line (with sequences and training videos, as in single-image mode above), then render final video from those (mixing images in FFT space) of the length duration in seconds.

By default, every frame is produced independently (randomly initiated). Instead, --keep all starts each generation from the average of previous runs; on practice that means similar compositions and smoother transitions. --keep last amplifies that smoothness by starting generation close to the last run, but that can make imagery getting stuck. This behaviour heavily depends on the input, so test with your prompts and see what's better in your case.

  • Make video from a directory with saved *.pt snapshots (just interpolate them):
python interpol.py -i mydir --length 155

Credits

CLIP, the paper
Copyright (c) 2021 OpenAI

Thanks to Ryan Murdock, Jonathan Fly and Hannu Toyryla for ideas.

Comments
  • Invalid Syntax when trying to run the first time

    Invalid Syntax when trying to run the first time

    I followed all the instructions and still I'm getting this error -

    aphantasia git:(master) python clip_fft.py -t "the text" --size 1280-720 
      File "clip_fft.py", line 112
        Ys = [torch.randn(*Y.shape).cuda() for Y in [Yl_in, *Yh_in]]
    
    opened by GonrasK 10
  • SSIM Alternative: DISTS

    SSIM Alternative: DISTS

    I'm trying DISTS as an alternative to SSIM and so far it's... It works to make the supplied image show up in the results. I don't know a whole lot about it though, to be frank it's just something different that works. I haven't decided if it's of any worth yet.

    To get it running in Aphantasia I just needed to !pip install dists-pytorch and in clip_fft.py add from DISTS_pytorch import DISTS and replace ssim_loss = ssim.SSIM(window_size = 11) with ssim_loss = DISTS(require_grad=True, batch_average=True)

    The two images to compare might need to be blurred before comparison to focus more on shapes.

    opened by torridgristle 7
  • allow aphantasia to be pip installed

    allow aphantasia to be pip installed

    Enjoying these interesting drawing modules which are runnable from my pixray library. I've made some changes to the file locations and added a setup.py which allows aphantasia to be pip installed - which makes using the code as a library easier.

    I'm posting these changes as a pull request for review, but note I haven't had much experience myself packaging python libraries and so would welcome any feedback if there are better ways to achieve this. Also haven't tested the notebooks against this change and the likely would need their imports adjusted as well. But running the notebooks in google colab could potentially be a bit easier as well as you can also use pip now from the colab to install the library.

    opened by dribnet 5
  • clip_fft.py won't start

    clip_fft.py won't start

    I installed requirements.txt and git+https://github.com/openai/CLIP.git and after that I ran

    python clip_fft.py -t "city" -t2 "gradient" --size 1280-720

    And after that I got the error

    c:\etc\aphantasia-master>python clip_fft.py -t "city" -t2 "gradient" --size 1280-720
    Traceback (most recent call last):
      File "clip_fft.py", line 23, in <module>
        from utils import slice_imgs, derivat, sim_func, basename, img_list, img_read, plot_text, txt_clean, checkout, old_torch
      File "c:\etc\aphantasia-master\utils.py", line 13, in <module>
        from kornia.filters.sobel import spatial_gradient
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\kornia\__init__.py", line 19, in <module>    from kornia import jit
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\kornia\jit\__init__.py", line 9, in <module>
        spatial_soft_argmax2d = torch.jit.script(K.geometry.spatial_soft_argmax2d)
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\jit\__init__.py", line 1290, in script
        fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(obj))
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\jit\_recursive.py", line 568, in try_compile_fn
        return torch.jit.script(fn, _rcb=rcb)
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\jit\__init__.py", line 1290, in script
        fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(obj))
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\jit\_recursive.py", line 568, in try_compile_fn
        return torch.jit.script(fn, _rcb=rcb)
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\jit\__init__.py", line 1290, in script
        fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(obj))
    RuntimeError:
    Unknown type name 'torch.dtype':
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\kornia\utils\grid.py", line 12
            normalized_coordinates: bool = True,
            device: Optional[torch.device] = torch.device('cpu'),
            dtype: torch.dtype = torch.float32) -> torch.Tensor:
                   ~~~~~~~~~~~ <--- HERE
        """Generates a coordinate grid for an image.
    'create_meshgrid' is being compiled since it was called from 'spatial_expectation2d'
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\kornia\geometry\subpix\dsnt.py", line 100
        # Create coordinates grid.
        grid: torch.Tensor = create_meshgrid(height, width, normalized_coordinates, input.device)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        grid = grid.to(input.dtype)
    'spatial_expectation2d' is being compiled since it was called from 'spatial_soft_argmax2d'
      File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\kornia\geometry\subpix\spatial_soft_argmax.py", line 516
        """
        input_soft: torch.Tensor = dsnt.spatial_softmax2d(input, temperature)
        output: torch.Tensor = dsnt.spatial_expectation2d(input_soft, normalized_coordinates)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        return output
    

    How can I make the program work?

    opened by Niotferdi 4
  • init_image support

    init_image support

    Do any of the drawing modules have support for initialising from an image? I looked but didn't see any inverse FFT code currently in the codebase. If not this might be an interesting feature to consider adding for any models that would easily support such an operation.

    opened by dribnet 4
  • How do you upload a photo to work with it?

    How do you upload a photo to work with it?

    NameError Traceback (most recent call last) in () 12 text = translator.translate(text, dest='en').text 13 if upload_image: ---> 14 uploaded = files.upload()

    NameError: name 'files' is not defined

    opened by raul2297 4
  • ReadTimeOutError when installing OpenAi

    ReadTimeOutError when installing OpenAi

    In the instructions: pip install git+https://github.com/openai/CLIP.git results in the following error:

    Collecting git+https://github.com/openai/CLIP.git Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-ham_skxz Running command git clone -q https://github.com/openai/CLIP.git /tmp/pip-req-build-ham_skxz Requirement already satisfied: ftfy in /home/steven/anaconda3/lib/python3.8/site-packages (from clip==1.0) (6.0.1) Requirement already satisfied: regex in /home/steven/anaconda3/lib/python3.8/site-packages (from clip==1.0) (2020.6.8) Requirement already satisfied: tqdm in /home/steven/anaconda3/lib/python3.8/site-packages (from clip==1.0) (4.47.0) Collecting torch~=1.7.1 WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/1d/a9/f349273a0327fdf20a73188c9c3aa7dbce68f86fad422eadd366fd2ed7a0/torch-1.7.1-cp38-cp38-manylinux1_x86_64.whl WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/1d/a9/f349273a0327fdf20a73188c9c3aa7dbce68f86fad422eadd366fd2ed7a0/torch-1.7.1-cp38-cp38-manylinux1_x86_64.whl WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/1d/a9/f349273a0327fdf20a73188c9c3aa7dbce68f86fad422eadd366fd2ed7a0/torch-1.7.1-cp38-cp38-manylinux1_x86_64.whl WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/1d/a9/f349273a0327fdf20a73188c9c3aa7dbce68f86fad422eadd366fd2ed7a0/torch-1.7.1-cp38-cp38-manylinux1_x86_64.whl WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)")': /packages/1d/a9/f349273a0327fdf20a73188c9c3aa7dbce68f86fad422eadd366fd2ed7a0/torch-1.7.1-cp38-cp38-manylinux1_x86_64.whl ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Max retries exceeded with url: /packages/1d/a9/f349273a0327fdf20a73188c9c3aa7dbce68f86fad422eadd366fd2ed7a0/torch-1.7.1-cp38-cp38-manylinux1_x86_64.whl (Caused by ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)"))

    opened by squinonescolon 4
  • Alternate Subtraction Method, Faster

    Alternate Subtraction Method, Faster

    I was trying out ways of manipulating the encoded text and one that I tried was subtracting encoded text from the encoded text prompt. I tried four renders for each and they look about the same, except the one that changes the encoded text had less of the subtract prompt which suggests to me that it's more effective at subtracting a prompt. Also it ends up using just the one txt_enc rather than 2, and just the one cosine similarity.

    Prompt: "a photo of a human face" and Negative: "a photo of a face"

    Subtracting Subtract's txt_enc0 from text_enc resulted in these enc_sub

    Existing negative method what uses cosine similarity with the image and negative prompt for loss resulted in these enc_neg

    And for fun, using subtract to increase the difference between the two by txt_enc + (txt_enc - text_enc0) resulted in these enc_subdiff

    The encoded text and images seem to be explorable like latent space.

    opened by torridgristle 4
  • Doesn't work with PyTorch 1.8

    Doesn't work with PyTorch 1.8

    PyTorch recently had a 1.8 release, bringing much better support for backing torch.cuda tensors with AMD GPUs.

    However, clip_fft.py at least hasn't been ported to PyTorch 1.8 yet.

    In particular, it still uses the deprecated and now removed pytorch.irfft, which needs to be replaced with calls to methods in the torch.fft namespace to work on PyTorch 1.8.

    Unfortunately, the PR that removed support for the old methods doesn't provide a recipe for translating calls that can be executed by someone who doesn't understand the finer points of FFTs. It seems to me that the square-root-of-a-bunch-of-stuff normalization method of the old function isn't available as any of the normalization modes of torch.fft.irfft, and I'm not sure of the number of dimensions involved here, or whether we have the input versus the output sizes handy.

    opened by interfect 4
  • [Feature] Learning Rate Modified by Steps

    [Feature] Learning Rate Modified by Steps

    I've experimented with a learning rate that changes as the steps increase due to seeing Aphantasia develop a general image very quickly, but then slowing down to make small details. I believe that my proposed alternative puts more focus on larger shapes, and less on details.

    I expose the learning_rate variable and add a learning_rate_max variable in the Generate cell, remove the optimizer = torch.optim.Adam(params, learning_rate) line and instead add this to def train(i):

    learning_rate_new = learning_rate + (i / steps) * (learning_rate_max - learning_rate)
    optimizer_new = torch.optim.Adam(params, learning_rate_new)
    

    With this, I find that a learning_rate of 0.0001 and a learning_rate_max of 0.008 at the highest value works well, for 300-400 steps and about 50 samples at least.

    opened by torridgristle 4
  • TypeError: 'float' object is not subscriptable

    TypeError: 'float' object is not subscriptable

    Seems like something broke the IllusTrip3D.ipynb in the new update. Settings of possibly relevant parameters: zoom = 0.0005, shift = 0, animate_them = False

    Here is the stack trace:

    using fast aug transforms
     using RGB method, 95 samples
     ref text:  ethereal cosmology
     ref style:  
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-8-09ff3d020318> in <module>()
        229 pbar = ProgressBar(glob_steps)
        230 for i in range(count):
    --> 231   process(i)
        232 
        233 HTML(makevid(tempdir))
    
    1 frames
    <ipython-input-6-4929578fc46a> in depth_transform(img_t, img_np, depth_infer, depth_mask, size, depthX, scale, shift, colors, depth_dir, save_num)
         46     dY = 100. * shift[1] / size[0]
         47     # dZ = movement direction: 1 away (zoom out), 0 towards (zoom in), 0.5 stay
    ---> 48     dZ = 0.5 + 23. * (scale[0]-1)
         49     # dZ += 0.5 * float(math.sin(((save_num % 70)/70) * math.pi * 2))
         50 
    
    TypeError: 'float' object is not subscriptable
    
    opened by Akyshnik 3
Releases(v2.5)
Owner
vadim epstein
vadim epstein
Seg-Torch for Image Segmentation with Torch

Seg-Torch for Image Segmentation with Torch This work was sparked by my personal research on simple segmentation methods based on deep learning. It is

Eren Gölge 37 Dec 12, 2022
Happywhale - Whale and Dolphin Identification Silver🥈 Solution (26/1588)

Kaggle-Happywhale Happywhale - Whale and Dolphin Identification Silver 🥈 Solution (26/1588) 竞赛方案思路 图像数据预处理-标志性特征图片裁剪:首先根据开源的标注数据训练YOLOv5x6目标检测模型,将训练集

Franxx 20 Nov 14, 2022
Code release for NeurIPS 2020 paper "Co-Tuning for Transfer Learning"

CoTuning Official implementation for NeurIPS 2020 paper Co-Tuning for Transfer Learning. [News] 2021/01/13 The COCO 70 dataset used in the paper is av

THUML @ Tsinghua University 35 Sep 23, 2022
Black box hyperparameter optimization made easy.

BBopt BBopt aims to provide the easiest hyperparameter optimization you'll ever do. Think of BBopt like Keras (back when Theano was still a thing) for

Evan Hubinger 70 Nov 03, 2022
113 Nov 28, 2022
Python Interview Questions

Python Interview Questions Clone the code to your computer. You need to understand the code in main.py and modify the content in if __name__ =='__main

ClassmateLin 575 Dec 28, 2022
This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap

Sophia Althammer 33 Aug 26, 2022
Open standard for machine learning interoperability

Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides

Open Neural Network Exchange 13.9k Dec 30, 2022
Stream images from a connected camera over MQTT, view using Streamlit, record to file and sqlite

mqtt-camera-streamer Summary: Publish frames from a connected camera or MJPEG/RTSP stream to an MQTT topic, and view the feed in a browser on another

Robin Cole 183 Dec 16, 2022
GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery

GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery This is the code to the paper: Gradient-Based Learn

3 Feb 15, 2022
商品推荐系统

商品top50推荐系统 问题建模 本项目的数据集给出了15万左右的用户以及12万左右的商品, 以及对应的经过脱敏处理的用户特征和经过预处理的商品特征,旨在为用户推荐50个其可能购买的商品。 推荐系统架构方案 本项目采用传统的召回+排序的方案。

107 Dec 29, 2022
Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image

Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image This repository is an implementation of the method described in the following pap

21 Dec 15, 2022
A torch implementation of "Pixel-Level Domain Transfer"

Pixel Level Domain Transfer A torch implementation of "Pixel-Level Domain Transfer". based on dcgan.torch. Dataset The dataset used is "LookBook", fro

Fei Xia 260 Sep 02, 2022
Official Repository of NeurIPS2021 paper: PTR

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning Figure 1. Dataset Overview. Introduction A critical aspect of human vis

Yining Hong 32 Jun 02, 2022
A repo that contains all the mesh keys needed for mesh backend, along with a code example of how to use them in python

Mesh-Keys A repo that contains all the mesh keys needed for mesh backend, along with a code example of how to use them in python Have been seeing alot

Joseph 53 Dec 13, 2022
"3D Human Texture Estimation from a Single Image with Transformers", ICCV 2021

Texformer: 3D Human Texture Estimation from a Single Image with Transformers This is the official implementation of "3D Human Texture Estimation from

XiangyuXu 193 Dec 05, 2022
Sharpness-Aware Minimization for Efficiently Improving Generalization

Sharpness-Aware-Minimization-TensorFlow This repository provides a minimal implementation of sharpness-aware minimization (SAM) (Sharpness-Aware Minim

Sayak Paul 54 Dec 08, 2022
Implicit Model Specialization through DAG-based Decentralized Federated Learning

Federated Learning DAG Experiments This repository contains software artifacts to reproduce the experiments presented in the Middleware '21 paper "Imp

Operating Systems and Middleware Group 5 Oct 16, 2022
Official implementation for "Low-light Image Enhancement via Breaking Down the Darkness"

Low-light Image Enhancement via Breaking Down the Darkness by Qiming Hu, Xiaojie Guo. 1. Dependencies Python3 PyTorch=1.0 OpenCV-Python, TensorboardX

Qiming Hu 30 Jan 01, 2023