CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

Overview

CLIP-GEN

[简体中文][English]

本项目在萤火二号集群上用 PyTorch 实现了论文 《CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP》。

clip-gen

CLIP-GEN 是一个 Language-Free 的文本生成图像的方法,它不依赖图文训练样本,通过预训练 CLIP 模型的强大表征能力,只需要图片数据就可以训练出一个文本生成图像的模型。该方法的基本原理是:CLIP-GEN 首先会训练一个 VQ-GAN,把图片映射到离散空间;然后再训练一个 GPT 模型,把 CLIP embedding 映射到 VQ-GAN 的离散空间;由于在 CLIP 中,文本和图像共享一个特征空间,在 inference 的时候我们就可以通过同样的方法把文本映射到 VQ-GAN 的离散空间,然后 decode 为 RGB 图像。

Requirements

  • hfai (to be released soon)
  • torch>=1.8

Training

支持的数据集:coco, imagenet, googlecc

  1. 下载 CLIP 预训练模型

    下载 CLIP 后放至 pretrained/clip_vit_b32.pt,该预训练模型来自 OpenAI.

  2. 在 COCO 上训练 VQGAN

    提交任务至萤火集群:

    hfai python train_vqgan.py --ds coco -- -n 1 -p 30

    本地运行:

    python train_vqgan.py --ds coco
  3. 在 COCO 上训练 Conditional GPT

    提交任务至萤火集群:

    hfai python train_gpt.py --ds coco --vqgan_ckpt /path/to/vqgan/ckpt -- -n 4 -p 30

    本地运行:

    python train_gpt.py --ds coco --vqgan_ckpt /path/to/vqgan/ckpt

Demo

下载在 COCO 上训练好的 VQGANGPT 模型,分别放到 pretrained/vqgan_coco.ptpretrained/gpt_coco.pt;然后运行:

python demo.py --text "A city bus driving on the city street" --out "bus.jpg"

NOTE: demo 的运行不依赖 hfai,用户可以在装有 PyTorch 的环境下直接使用

Samples

下面是一些文本生成图像的样本:

tower bus living train skiing

References

Citation

@article{wang2022clip,
  title={CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP},
  author={Wang, Zihao and Liu, Wei and He, Qian and Wu, Xinglong and Yi, Zili},
  journal={arXiv preprint arXiv:2203.00386},
  year={2022}
}

TODO

  • 预训练模型
  • FFRecord 数据
You might also like...
Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized
Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

VQGAN-CLIP-Docker About Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized This is a stripped and minimal dependency repository for running loca

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

Text2Art is an AI art generator powered with VQGAN + CLIP and CLIPDrawer models
Text2Art is an AI art generator powered with VQGAN + CLIP and CLIPDrawer models

Text2Art is an AI art generator powered with VQGAN + CLIP and CLIPDrawer models. You can easily generate all kind of art from drawing, painting, sketch, or even a specific artist style just using a text input. You can also specify the dimensions of the image. The process can take 3-20 mins and the results will be emailed to you.

A 1.3B text-to-image generation model trained on 14 million image-text pairs
A 1.3B text-to-image generation model trained on 14 million image-text pairs

minDALL-E on Conceptual Captions minDALL-E, named after minGPT, is a 1.3B text-to-image generation model trained on 14 million image-text pairs for no

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

train-CLIP 📎 A PyTorch Lightning solution to training CLIP from scratch. Goal ⚽ Our aim is to create an easy to use Lightning implementation of OpenA

[IJCAI-2021] A benchmark of data-free knowledge distillation from paper
[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

DataFree A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation" Authors: Gongfa

Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus
Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

free-duolingo-plus duolingo account creator that uses your invite code to get yo

Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021
Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021

🐯 SynthTIGER: Synthetic Text Image GEneratoR Official implementation of SynthTIGER | Paper | Datasets Moonbin Yim1, Yoonsik Kim1, Han-cheol Cho1, Sun

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.
The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

SuperGen The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding. Requirements Before running, you

Comments
  • "nn.TransformerEncoderLayer" is adopted to construct the "conditonal transformer" in your paper.

    Thanks for your great work.

    I noticed that you utilize "nn.TransformerEncoderLayer" when constructing "conditional transformer". Since it is used to predict the next token index, I am wondering whether the decoder of transformer is more appropriate for the construction of your conditional transformer? or what's the reason that you don't adopt "nn.TransformerdecoderLayer" ?

    Because of the structure of "nn.TransformerEncoderLayer" is simpler or more concise than that of "nn.TransformerDEcoderLayer" ?

    opened by fido20160817 0
  • Add Web Demo & Docker environment

    Add Web Demo & Docker environment

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model, view it here: https://replicate.com/hfailab/clip-gen. You can find the docker file under the tab ‘run model with docker’.

    We have added some examples to the page, but do claim the page so you can own the page, customise the Example gallery as you like, push any future update to the web demo, and we'll feature it on our website and tweet about it too. You can find the 'Claim this model' button on the top of the page. Any member of the HFAiLab organization on GitHub can claim the model ~ When the page is claimed, it will be automatically linked to the arXiv website as well (under “Demos”).

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

    opened by chenxwh 0
A 3D Dense mapping backend library of SLAM based on taichi-Lang designed for the aerial swarm.

TaichiSLAM This project is a 3D Dense mapping backend library of SLAM based Taichi-Lang, designed for the aerial swarm. Intro Taichi is an efficient d

XuHao 230 Dec 19, 2022
A python script to dump all the challenges locally of a CTFd-based Capture the Flag.

A python script to dump all the challenges locally of a CTFd-based Capture the Flag. Features Connects and logins to a remote CTFd instance. Dumps all

Podalirius 77 Dec 07, 2022
PyTorch Implementation of Backbone of PicoDet

PicoDet-Backbone PyTorch Implementation of Backbone of PicoDet Original Implementation is implemented on PaddlePaddle. Example picodet_l_backbone = ES

Yonghye Kwon 7 Jul 12, 2022
Semantic Image Synthesis with SPADE

Semantic Image Synthesis with SPADE New implementation available at imaginaire repository We have a reimplementation of the SPADE method that is more

NVIDIA Research Projects 7.3k Jan 07, 2023
A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching.

LPM_Python A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching. The code is established ac

AoxiangFan 11 Nov 07, 2022
EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation

EFENet EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation Code is a bit messy now. I woud clean up soon. For training the EF

Yaping Zhao 19 Nov 05, 2022
Dataset used in "PlantDoc: A Dataset for Visual Plant Disease Detection" accepted in CODS-COMAD 2020

PlantDoc: A Dataset for Visual Plant Disease Detection This repository contains the Cropped-PlantDoc dataset used for benchmarking classification mode

Pratik Kayal 109 Dec 29, 2022
Repository to run object detection on a model trained on an autonomous driving dataset.

Autonomous Driving Object Detection on the Raspberry Pi 4 Description of Repository This repository contains code and instructions to configure the ne

Ethan 51 Nov 17, 2022
Nest - A flexible tool for building and sharing deep learning modules

Nest - A flexible tool for building and sharing deep learning modules Nest is a flexible deep learning module manager, which aims at encouraging code

ZhouYanzhao 41 Oct 10, 2022
Next-gen Rowhammer fuzzer that uses non-uniform, frequency-based patterns.

Blacksmith Rowhammer Fuzzer This repository provides the code accompanying the paper Blacksmith: Scalable Rowhammering in the Frequency Domain that is

Computer Security Group @ ETH Zurich 173 Nov 16, 2022
:boar: :bear: Deep Learning based Python Library for Stock Market Prediction and Modelling

bulbea "Deep Learning based Python Library for Stock Market Prediction and Modelling." Table of Contents Installation Usage Documentation Dependencies

Achilles Rasquinha 1.8k Jan 05, 2023
Lowest memory consumption and second shortest runtime in NTIRE 2022 challenge on Efficient Super-Resolution

FMEN Lowest memory consumption and second shortest runtime in NTIRE 2022 on Efficient Super-Resolution. Our paper: Fast and Memory-Efficient Network T

33 Dec 01, 2022
Implementation for the EMNLP 2021 paper "Interactive Machine Comprehension with Dynamic Knowledge Graphs".

Interactive Machine Comprehension with Dynamic Knowledge Graphs Implementation for the EMNLP 2021 paper. Dependencies apt-get -y update apt-get instal

Xingdi (Eric) Yuan 19 Aug 23, 2022
Tensorflow Implementation of Pixel Transposed Convolutional Networks (PixelTCN and PixelTCL)

Pixel Transposed Convolutional Networks Created by Hongyang Gao, Hao Yuan, Zhengyang Wang and Shuiwang Ji at Texas A&M University. Introduction Pixel

Hongyang Gao 95 Jul 24, 2022
Spectrum is an AI that uses machine learning to generate Rap song lyrics

Spectrum Spectrum is an AI that uses deep learning to generate rap song lyrics. View Demo Report Bug Request Feature Open In Colab About The Project S

39 Dec 16, 2022
This is the PyTorch implementation of GANs N’ Roses: Stable, Controllable, Diverse Image to Image Translation

Official PyTorch repo for GAN's N' Roses. Diverse im2im and vid2vid selfie to anime translation.

1.1k Jan 01, 2023
Official implementation for "Low-light Image Enhancement via Breaking Down the Darkness"

Low-light Image Enhancement via Breaking Down the Darkness by Qiming Hu, Xiaojie Guo. 1. Dependencies Python3 PyTorch=1.0 OpenCV-Python, TensorboardX

Qiming Hu 30 Jan 01, 2023
Repository for Driving Style Recognition algorithms for Autonomous Vehicles

Driving Style Recognition Using Interval Type-2 Fuzzy Inference System and Multiple Experts Decision Making Created by Iago Pachêco Gomes at USP - ICM

Iago Gomes 9 Nov 28, 2022
PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection?

PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Toyota Research Institute - Machine Learning 364 Dec 27, 2022
Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

GradTTS Unofficial Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arxiv) About this repo This is an unoffic

HeyangXue1997 103 Dec 23, 2022