Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Last update: Oct 13, 2022

Related tags

Overview

glide-finetune

Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset.

Installation

git clone https://github.com/afiaka87/glide-finetune.git
cd glide-finetune/
python3 -m venv .venv # create a virtual environment to keep global install clean.
source .venv/bin/activate
(.venv) # optionally install pytorch manually for your own specific env first...
(.venv) python -m pip install -r requirements.txt

Usage

(.venv) python glide-finetune.py 
    --data_dir=./data \
    --batch_size=1 \
    --grad_acc=1 \
    --guidance_scale=4.0 \
    --learning_rate=2e-5 \
    --dropout=0.1 \
    --timestep_respacing=1000 \
    --side_x=64 \
    --side_y=64 \
    --resume_ckpt='' \
    --checkpoints_dir='./glide_checkpoints/' \
    --use_fp16 \
    --device='' \
    --freeze_transformer \
    --freeze_diffusion \
    --weight_decay=0.0 \
    --project_name='glide-finetune'

Known issues:

batching isn't handled in the dataloader
NaN/Inf errors
Resizing doesn't handle non-square aspect ratios properly
some of the code is messy, needs refactoring.

Comments

Fixed a couple of minor issues
Pinned webdataset version to work with python 3.7 which is the version being used in Colab, Kaggle. A new version of this module is releaed few days back which only works with 3.8/9

Fixed an issue with data_dir arg not getting picked up.
opened by vanga 1
Fix NameError when using --data_dir
Hello and thank you for your great work.

Right now using a local data folder with --data_dir results in

Traceback (most recent call last): File "/content/glide-finetune/train_glide.py", line 292, in <module> data_dir=data_dir, NameError: name 'data_dir' is not defined

This PR fixes that.
opened by tillfalko 0

mention mpi4py dependency

mpi4py installation will fail unless the user has this package installed. Since MPI is not a ubiquitous dependency it should probably be mentioned. Edit: Since torch==1.10.1 is a requirement, and torch versions come with their own cuda versions (torch 1.10.1 uses cuda 10.2), I don't see a reason not to just include bitsandbytes-cuda102 in requirements.txt.

$ py -m venv .venv
$ source .venv/bin/activate
$ pip install torch==1.10.1
Collecting torch==1.10.1
  Downloading torch-1.10.1-cp39-cp39-manylinux1_x86_64.whl (881.9 MB)
     |████████████████████████████████| 881.9 MB 15 kB/s
Collecting typing-extensions
  Downloading typing_extensions-4.0.1-py3-none-any.whl (22 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.10.1 typing-extensions-4.0.1
$ py -c "import torch; print(torch.__version__)"
1.10.1+cu102

opened by tillfalko 0

Fixed half precision optimizer bug

Problem

In half precision, after the first iteration nan values start appearing regardless of input data or gradients since the adam optimizer breaks in float16. The discussion for that can be viewed here.

Solution

This can be fixed by setting the eps variable to 1e-4 instead of the default 1e-8. This is the only thing this pr does

opened by isamu-isozaki 0
Training on half precision leads to nan values

I was training my model and I noticed that after just the first iteration I was running into nan values. As it turns out my gradients and input values/images were all normal but the adam optimizer by pytorch does has some weird behavior on float16 precision where it produces nans probably because of a divide by 0 error. A discussion can be found below

https://discuss.pytorch.org/t/adam-half-precision-nans/1765/4

I hear changing the epison parameter for the adam weights parameter when on half precisions works but I haven't tested it yet. Will make one once I tested.

And also let me say thanks for this repo. I wanted to fine tune the glide model and this made it so much easier.

opened by isamu-isozaki 1
Where is the resume_ckpt

Hi, thanks for your job.

I noticed to finetune the glide, we should have a base_model, namely "resume_ckpt". --resume_ckpt 'ckpt_to_resume_from.pt'
Where can we get this model? Because I find Glide also didn't provide any checkpoint. Thanks for your help.

opened by zhaobingbingbing 0

Releases(v0.0.1)

v0.0.1(Feb 20, 2022)
Having some experience with finetuning GLIDE on laion/alamy, etc. I think this code works great now and hope as many people can use it as possible. Please file bugs - I know there may be a few.

New additions:

dataloader for LAION400M

dataloader for alamy

train the upsample model instead of just the base model

(early) code for training the released noisy CLIP. still a WIP.

Source code(tar.gz)
Source code(zip)

Owner

Clay Mullis

Software engineer working with multi-modal deep learning.

GitHub Repository

Simple image captioning model - CLIP prefix captioning.

CLIP prefix captioning. Inference Notebook: 🥳 New: 🥳 Our technical papar is finally out! Official implementation for the paper "ClipCap: CLIP Prefix

688 Jan 04, 2023

Deep Compression for Dense Point Cloud Maps.

DEPOCO This repository implements the algorithms described in our paper Deep Compression for Dense Point Cloud Maps. How to get started (using Docker)

67 Dec 06, 2022

PyTorch implementation of MulMON

MulMON This repository contains a PyTorch implementation of the paper: Learning Object-Centric Representations of Multi-object Scenes from Multiple Vi

16 Nov 03, 2022

PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE).

GRACE The official PyTorch implementation of deep GRAph Contrastive rEpresentation learning (GRACE). For a thorough resource collection of self-superv

186 Dec 27, 2022

LaneAF: Robust Multi-Lane Detection with Affinity Fields

LaneAF: Robust Multi-Lane Detection with Affinity Fields This repository contains Pytorch code for training and testing LaneAF lane detection models i

155 Dec 17, 2022

OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages

OCR-Streamlit-App OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages OCR app gets an image a

5 Apr 05, 2022

Unet network with mean teacher for altrasound image segmentation

5 Nov 21, 2022

Official implementation of NeuralFusion: Online Depth Map Fusion in Latent Space

NeuralFusion This is the official implementation of NeuralFusion: Online Depth Map Fusion in Latent Space. We provide code to train the proposed pipel

53 Jan 01, 2023

NeurIPS-2021: Neural Auto-Curricula in Two-Player Zero-Sum Games.

NAC Official PyTorch implementation of NAC from the paper: Neural Auto-Curricula in Two-Player Zero-Sum Games. We release code for: Gradient based ora

19 Nov 11, 2022

Official implementation of "Learning Proposals for Practical Energy-Based Regression", 2021.

ebms_proposals Official implementation (PyTorch) of the paper: Learning Proposals for Practical Energy-Based Regression, 2021 [arXiv] [project]. Fredr

10 Oct 22, 2022

Real-time Object Detection for Streaming Perception, CVPR 2022

StreamYOLO Real-time Object Detection for Streaming Perception Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Sun Jian Real-time Object Detection

237 Dec 27, 2022

This project is the PyTorch implementation of our CVPR 2022 paper:

Requirements and Dependency Install PyTorch with CUDA (for GPU). (Experiments are validated on python 3.8.11 and pytorch 1.7.0) (For visualization if

23 Nov 29, 2022

[ICCV 2021] Deep Hough Voting for Robust Global Registration

Deep Hough Voting for Robust Global Registration, ICCV, 2021 Project Page | Paper | Video Deep Hough Voting for Robust Global Registration Junha Lee1,

57 Nov 28, 2022

The official implementation of Equalization Loss for Long-Tailed Object Recognition (CVPR 2020) based on Detectron2

Equalization Loss for Long-Tailed Object Recognition Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, Junjie Yan ⚠️ We re

197 Dec 25, 2022

KwaiRec: A Fully-observed Dataset for Recommender Systems (Density: Almost 100%)

KuaiRec: A Fully-observed Dataset for Recommender Systems (Density: Almost 100%) KuaiRec is a real-world dataset collected from the recommendation log

70 Dec 28, 2022

Weakly Supervised Text-to-SQL Parsing through Question Decomposition

Weakly Supervised Text-to-SQL Parsing through Question Decomposition The official repository for the paper "Weakly Supervised Text-to-SQL Parsing thro

14 Dec 19, 2022

A Multi-modal Perception Tracker (MPT) for speaker tracking using both audio and visual modalities

MPT A Multi-modal Perception Tracker (MPT) for speaker tracking using both audio and visual modalities. Implementation for our AAAI 2022 paper: Multi-

4 May 08, 2022

Official implementation of the NeurIPS'21 paper 'Conditional Generation Using Polynomial Expansions'.

Conditional Generation Using Polynomial Expansions Official implementation of the conditional image generation experiments as described on the NeurIPS

4 Aug 07, 2022

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021 We propose a cross encoder model (LTR_CrossEncoder) for information retrieval, re-retrie

7 Jan 12, 2022

Local-Global Stratified Transformer for Efficient Video Recognition

DualFormer This repo is the implementation of our manuscript entitled "Local-Global Stratified Transformer for Efficient Video Recognition". Our model

19 Dec 07, 2022