Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Overview

glide-finetune

Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset.

Installation

git clone https://github.com/afiaka87/glide-finetune.git
cd glide-finetune/
python3 -m venv .venv # create a virtual environment to keep global install clean.
source .venv/bin/activate
(.venv) # optionally install pytorch manually for your own specific env first...
(.venv) python -m pip install -r requirements.txt

Usage

(.venv) python glide-finetune.py 
    --data_dir=./data \
    --batch_size=1 \
    --grad_acc=1 \
    --guidance_scale=4.0 \
    --learning_rate=2e-5 \
    --dropout=0.1 \
    --timestep_respacing=1000 \
    --side_x=64 \
    --side_y=64 \
    --resume_ckpt='' \
    --checkpoints_dir='./glide_checkpoints/' \
    --use_fp16 \
    --device='' \
    --freeze_transformer \
    --freeze_diffusion \
    --weight_decay=0.0 \
    --project_name='glide-finetune'

Known issues:

  • batching isn't handled in the dataloader
  • NaN/Inf errors
  • Resizing doesn't handle non-square aspect ratios properly
  • some of the code is messy, needs refactoring.
Comments
  • Fixed a couple of minor issues

    Fixed a couple of minor issues

    • Pinned webdataset version to work with python 3.7 which is the version being used in Colab, Kaggle. A new version of this module is releaed few days back which only works with 3.8/9
    • Fixed an issue with data_dir arg not getting picked up.
    opened by vanga 1
  • Fix NameError when using --data_dir

    Fix NameError when using --data_dir

    Hello and thank you for your great work.

    Right now using a local data folder with --data_dir results in

    Traceback (most recent call last):
      File "/content/glide-finetune/train_glide.py", line 292, in <module>
        data_dir=data_dir,
    NameError: name 'data_dir' is not defined
    

    This PR fixes that.

    opened by tillfalko 0
  • mention mpi4py dependency

    mention mpi4py dependency

    mpi4py installation will fail unless the user has this package installed. Since MPI is not a ubiquitous dependency it should probably be mentioned. Edit: Since torch==1.10.1 is a requirement, and torch versions come with their own cuda versions (torch 1.10.1 uses cuda 10.2), I don't see a reason not to just include bitsandbytes-cuda102 in requirements.txt.

    $ py -m venv .venv
    $ source .venv/bin/activate
    $ pip install torch==1.10.1
    Collecting torch==1.10.1
      Downloading torch-1.10.1-cp39-cp39-manylinux1_x86_64.whl (881.9 MB)
         |████████████████████████████████| 881.9 MB 15 kB/s
    Collecting typing-extensions
      Downloading typing_extensions-4.0.1-py3-none-any.whl (22 kB)
    Installing collected packages: typing-extensions, torch
    Successfully installed torch-1.10.1 typing-extensions-4.0.1
    $ py -c "import torch; print(torch.__version__)"
    1.10.1+cu102
    
    opened by tillfalko 0
  • Fixed half precision optimizer bug

    Fixed half precision optimizer bug

    Problem

    In half precision, after the first iteration nan values start appearing regardless of input data or gradients since the adam optimizer breaks in float16. The discussion for that can be viewed here.

    Solution

    This can be fixed by setting the eps variable to 1e-4 instead of the default 1e-8. This is the only thing this pr does

    opened by isamu-isozaki 0
  • Training on half precision leads to nan values

    Training on half precision leads to nan values

    I was training my model and I noticed that after just the first iteration I was running into nan values. As it turns out my gradients and input values/images were all normal but the adam optimizer by pytorch does has some weird behavior on float16 precision where it produces nans probably because of a divide by 0 error. A discussion can be found below

    https://discuss.pytorch.org/t/adam-half-precision-nans/1765/4

    I hear changing the epison parameter for the adam weights parameter when on half precisions works but I haven't tested it yet. Will make one once I tested.

    And also let me say thanks for this repo. I wanted to fine tune the glide model and this made it so much easier.

    opened by isamu-isozaki 1
  • Where is the resume_ckpt

    Where is the resume_ckpt

    Hi, thanks for your job.

    I noticed to finetune the glide, we should have a base_model, namely "resume_ckpt". --resume_ckpt 'ckpt_to_resume_from.pt'
    Where can we get this model? Because I find Glide also didn't provide any checkpoint. Thanks for your help.

    opened by zhaobingbingbing 0
Releases(v0.0.1)
  • v0.0.1(Feb 20, 2022)

    Having some experience with finetuning GLIDE on laion/alamy, etc. I think this code works great now and hope as many people can use it as possible. Please file bugs - I know there may be a few.

    New additions:

    • dataloader for LAION400M
    • dataloader for alamy
    • train the upsample model instead of just the base model
    • (early) code for training the released noisy CLIP. still a WIP.
    Source code(tar.gz)
    Source code(zip)
Owner
Clay Mullis
Software engineer working with multi-modal deep learning.
Clay Mullis
Pytorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

StackGAN-v2 StackGAN-v1: Tensorflow implementation StackGAN-v1: Pytorch implementation Inception score evaluation Pytorch implementation for reproduci

Han Zhang 809 Dec 16, 2022
Half Instance Normalization Network for Image Restoration

HINet Half Instance Normalization Network for Image Restoration, based on https://github.com/megvii-model/HINet. Dependencies NumPy PyTorch, preferabl

Holy Wu 4 Jun 06, 2022
Python code to generate art with Generative Adversarial Network

GAN_Canvas_Maker Generating Art using Generative Adversarial Network (GAN) Python code to generate art with Generative Adversarial Network: https://to

Jonny Banana 10 Aug 22, 2022
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation This repo is the official implementation of "MHFormer: Multi-Hypothesis Transforme

Vegetabird 281 Jan 07, 2023
Official Pytorch Implementation for Splicing ViT Features for Semantic Appearance Transfer presenting Splice

Splicing ViT Features for Semantic Appearance Transfer [Project Page] Splice is a method for semantic appearance transfer, as described in Splicing Vi

Omer Bar Tal 253 Jan 06, 2023
Code for CVPR2019 paper《Unequal Training for Deep Face Recognition with Long Tailed Noisy Data》

Unequal-Training-for-Deep-Face-Recognition-with-Long-Tailed-Noisy-Data. This is the code of CVPR 2019 paper《Unequal Training for Deep Face Recognition

Zhong Yaoyao 68 Jan 07, 2023
A library for researching neural networks compression and acceleration methods.

A library for researching neural networks compression and acceleration methods.

Intel Labs 100 Dec 29, 2022
Losslandscapetaxonomy - Taxonomizing local versus global structure in neural network loss landscapes

Taxonomizing local versus global structure in neural network loss landscapes Int

Yaoqing Yang 8 Dec 30, 2022
PPO Lagrangian in JAX

PPO Lagrangian in JAX This repository implements PPO in JAX. Implementation is tested on the safety-gym benchmark. Usage Install dependencies using th

Karush Suri 2 Sep 14, 2022
The world's largest toxicity dataset.

The Toxicity Dataset by Surge AI Saving the internet is fun. Combing through thousands of online comments to build a toxicity dataset isn't. That's wh

Surge AI 134 Dec 19, 2022
Pytorch implementation of OCNet series and SegFix.

openseg.pytorch News 2021/09/14 MMSegmentation has supported our ISANet and refer to ISANet for more details. 2021/08/13 We have released the implemen

openseg-group 1.1k Dec 23, 2022
A library for answering questions using data you cannot see

A library for computing on data you do not own and cannot see PySyft is a Python library for secure and private Deep Learning. PySyft decouples privat

OpenMined 8.5k Jan 02, 2023
FOSS Digital Asset Distribution Platform built on Frappe.

Digistore FOSS Digital Assets Marketplace. Distribute digital assets, like a pro. Video Demo Here Features Create, attach and list digital assets (PDF

Mohammad Hussain Nagaria 30 Dec 08, 2022
Old Photo Restoration (Official PyTorch Implementation)

Bringing Old Photo Back to Life (CVPR 2020 oral)

Microsoft 11.3k Dec 30, 2022
Simple Dynamic Batching Inference

Simple Dynamic Batching Inference 解决了什么问题? 众所周知,Batch对于GPU上深度学习模型的运行效率影响很大。。。 是在Inference时。搜索、推荐等场景自带比较大的batch,问题不大。但更多场景面临的往往是稀碎的请求(比如图片服务里一次一张图)。 如果

116 Jan 01, 2023
机器学习、深度学习、自然语言处理等人工智能基础知识总结。

说明 机器学习、深度学习、自然语言处理基础知识总结。 目前主要参考李航老师的《统计学习方法》一书,也有一些内容例如XGBoost、聚类、深度学习相关内容、NLP相关内容等是书中未提及的。

Peter 445 Dec 12, 2022
Python codes for Lite Audio-Visual Speech Enhancement.

Lite Audio-Visual Speech Enhancement (Interspeech 2020) Introduction This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE

Shang-Yi Chuang 85 Dec 01, 2022
Object Detection Projekt in GKI WS2021/22

tfObjectDetection Object Detection Projekt with tensorflow in GKI WS2021/22 Docker Container: docker run -it --name --gpus all -v path/to/project:p

Tim Eggers 1 Jul 18, 2022
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 463 Dec 09, 2022
Anonymize BLM Protest Images

Anonymize BLM Protest Images This repository automates @BLMPrivacyBot, a Twitter bot that shows the anonymized images to help keep protesters safe. Us

Stanford Machine Learning Group 40 Oct 13, 2022