Python script to preprocess images of all Pokémon to finetune ruDALL-E

Overview

ai-generated-pokemon-rudalle

Python script to preprocess images of all Pokémon (the "official artwork" of each Pokémon via PokéAPI) into a format that can be used to finetune ruDALL-E with the finetuning example Colab Notebook linked in that repo. This workflow was used to create a model whose AI-generated Pokémon went viral (10k+ retweets on Twitter and 30k+ upvotes on Reddit).

My modified Colab Notebook that I used to finetune the model on Pokémon is here. The Notebook is released purely for demonstration/authentication purposes, and no support will be given on how to use it because it is incredibly messy and embarrassing, but it may contain a few ideas that are useful for future generation work. Some notes on how the process works are included below, with the opportunity to reproduce or improve it.

The script outputs two things: an images folder with all the preprocessed images, and a data_desc.csv file that contains the image path and Russian caption pairs for finetuning. Some examples of the preprocessed input images are present in the images folder, along with the final data_desc.csv.
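As a rough illustration of the second output, the sketch below writes image path and caption pairs to a CSV file using only the standard library. The function name `write_data_desc` and the column names `name` and `caption` are assumptions made for this example; check build_image_dataset.py and the finetuning Notebook for the header they actually use.

```python
import csv
from pathlib import Path


def write_data_desc(rows, csv_path="data_desc.csv"):
    """Write (image path, Russian caption) pairs to a CSV file.

    The header names below are assumed for illustration only.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "caption"])  # assumed column names
        for image_path, caption in rows:
            writer.writerow([image_path, caption])
    return Path(csv_path)


if __name__ == "__main__":
    # Hypothetical row; the real captions are the translated type captions.
    write_data_desc([("images/1.png", "<Russian caption here>")])
```

Opening the file with `encoding="utf-8"` matters here because the captions are Cyrillic text.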

The model used is not included in this repo because it's currently too large (~3GB) to distribute (will add the model to Hugging Face at some point).

Preprocessing Script Notes

  • The GraphQL interface to PokéAPI is used because it can return the type information plus the IDs of all Pokémon in a single request. As a bonus, the returned IDs include the alternate forms of Pokémon (e.g. Mega), which would not be found just by incrementing IDs.
  • ruDALL-E requires 256x256px, RGB input images. In this case the source images from PokéAPI are conveniently both square and larger than 256x256, so they downsample nicely. Since the images have transparency (RGBA), they are composited onto a white background.
  • The translation service used is Yandex, which apparently has decent rate limits, plus as a Russian company its translations from English to Russian should theoretically be better.
  • The captions (which are later translated into Russian) are determined by type. For example, a Grass/Poison type will have the caption A Grass-type and Poison-type Pokémon, which is then translated into Russian. In theory, this improves the finetuning process by allowing ruDALL-E to notice trends, and it can also be leveraged at generation time to control the output (e.g. prompt with A Grass-type Pokémon and have ruDALL-E generate only Grass-type Pokémon).
  • Due to potential rate limits on translation, translations are cached at runtime by Pokémon type(s), so the API is pinged only once per type combination.
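
The caption and caching notes above can be sketched roughly as follows. This is a minimal illustration, not the code from build_image_dataset.py: `build_caption`, `make_cached_translator`, and the `translate_fn` argument are hypothetical names, and `translate_fn` stands in for whatever call performs the English-to-Russian translation.

```python
def build_caption(types):
    """Build the English caption from a list of type names."""
    parts = [f"{t.capitalize()}-type" for t in types]
    return "A " + " and ".join(parts) + " Pokémon"


def make_cached_translator(translate_fn):
    """Wrap translate_fn so each type combination is translated only once.

    translate_fn is a placeholder: any callable that takes an English
    string and returns its translation.
    """
    cache = {}

    def translate(types):
        key = tuple(types)  # lists are not hashable, tuples are
        if key not in cache:
            cache[key] = translate_fn(build_caption(types))
        return cache[key]

    return translate


if __name__ == "__main__":
    calls = []

    def fake_translate(text):
        # Stand-in for the real translation service; records each request.
        calls.append(text)
        return text

    translate = make_cached_translator(fake_translate)
    translate(["grass", "poison"])
    translate(["grass", "poison"])  # served from the cache, no new request
    print(len(calls))
```

Keying the cache on the type combination rather than on the individual Pokémon keeps the number of translation requests small, since many Pokémon share the same types.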

Finetuning and Generation Notes

  • The model used above was trained for 12 epochs (4.5 hours on a P100), at a max learning rate of 1e-5. The pct_start param of the OneCycleLR scheduler was set to 0.1 so that learning rate decay happens faster. Despite that, the model converged quickly.
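
To show what a smaller pct_start does, here is a simplified pure-Python sketch of a one-cycle schedule with cosine warmup and cosine decay. It is not PyTorch's OneCycleLR implementation, and step boundaries may differ slightly from it. The `div_factor` and `final_div_factor` values are assumptions (the source only states the max learning rate and pct_start), and `one_cycle_lr` is a name made up for this example.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=1e-5, pct_start=0.1,
                 div_factor=25.0, final_div_factor=1e4):
    """Approximate learning rate at a given step of a one-cycle schedule."""
    initial_lr = max_lr / div_factor          # assumed starting LR
    min_lr = initial_lr / final_div_factor    # assumed final LR
    warmup_steps = max(1, round(pct_start * total_steps))

    def cos_interp(start, end, frac):
        # Cosine interpolation: frac=0 gives start, frac=1 gives end.
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * frac))

    if step < warmup_steps:
        return cos_interp(initial_lr, max_lr, step / warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps)
    frac = min(1.0, (step - warmup_steps) / decay_steps)
    return cos_interp(max_lr, min_lr, frac)


if __name__ == "__main__":
    total = 1000
    for pct in (0.1, 0.3):
        lrs = [one_cycle_lr(s, total, pct_start=pct) for s in range(total)]
        print(pct, lrs.index(max(lrs)))  # step at which the peak LR is reached
```

With pct_start at 0.1, the schedule spends about 10% of the steps warming up and the remaining 90% decaying, so the decay phase begins earlier than it would with a larger pct_start.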

  • It is very difficult to find finetuning parameters for ruDALL-E that give the expected results. Too little training and the output images will be too incoherent; too much training and the model will overfit, output the source images, and ignore any text prompts. In the social media posts above, the model is slightly overfit, and attempts at using text prompts to control generation failed. But overfitting is not necessarily a bad thing as long as the model avoids verbatim output.

Usage

You can install the dependencies via:

pip3 install Pillow requests translatepy tqdm

Then run build_image_dataset.py

Getting the images into the ruDALL-E finetuning Colab Notebook is up to the user, but the recommended way to do so is to ZIP the generated images folder (~42 MB!), upload it to Colab (or upload to Google Drive and copy it into the Notebook from there), and unzip the folder in Colab itself via !unzip.
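
If you prefer to create the ZIP from Python rather than by hand, a minimal sketch using the standard library follows. The function name `zip_images_folder` and its default paths are assumptions for this example; they are not part of build_image_dataset.py.

```python
import shutil
from pathlib import Path


def zip_images_folder(images_dir="images", archive_stem="images"):
    """Create <archive_stem>.zip containing the images folder itself."""
    images_path = Path(images_dir).resolve()
    archive = shutil.make_archive(
        str(archive_stem),
        "zip",
        root_dir=str(images_path.parent),  # archive paths are relative to this
        base_dir=images_path.name,         # keep the top-level folder name
    )
    return Path(archive)


if __name__ == "__main__":
    print(zip_images_folder())
```

Because the archive keeps the top-level folder name, running !unzip on it in Colab recreates the images folder rather than scattering the files into the working directory.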

Maintainer/Creator

Max Woolf (@minimaxir)

Max's open-source projects are supported by his Patreon and GitHub Sponsors. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

MIT
