🇰🇷 Text to Image in Korean

Overview

KoDALLE


KoDALLE utilizes a pretrained language model's token embedding layer and position embedding layer as DALLE's text encoder.
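The idea can be sketched in plain Python: a GPT-style text encoder looks up each token in a token-embedding table (wte), looks up each position in a position-embedding table (wpe), and sums the two. In the real model those tables come from klue/roberta-large (32000 × 1024); the tiny random tables below are illustrative only.

```python
# Toy sketch of reusing pretrained wte/wpe tables as a text encoder.
# Dimensions are shrunk for illustration; real values are 32000/128/1024.
import random

VOCAB_SIZE, MAX_POS, DIM = 100, 16, 4

random.seed(0)
wte = [[random.random() for _ in range(DIM)] for _ in range(VOCAB_SIZE)]  # token table
wpe = [[random.random() for _ in range(DIM)] for _ in range(MAX_POS)]    # position table

def encode(token_ids):
    """Sum token and position embeddings for each input position."""
    return [
        [t + p for t, p in zip(wte[tok], wpe[pos])]
        for pos, tok in enumerate(token_ids)
    ]

hidden = encode([7, 42, 3])  # hypothetical token ids
print(len(hidden), len(hidden[0]))  # 3 positions, DIM values each
```

Because the tables are copied from a language model already trained on large Korean corpora, the DALLE decoder starts from embeddings that carry meaningful Korean token semantics instead of random initialization.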

Background

  • Training a DALLE model from scratch demands a large paired dataset of images and captions. For example, OpenAI's DALLE was trained on more than 250 million text-image pairs.
  • If the dataset is not large enough or is limited to a specific domain, the vocabulary of the trained DALLE model is insufficient. For instance, the 1 million text captions of the K-Fashion dataset consist of only about 300 unique tokens.
  • Therefore, inference with such a DALLE model can be problematic when the given sentence query is unrelated to the captions it was originally trained on.
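The limited-vocabulary point can be illustrated with a quick count: domain captions built from a fixed template produce many captions but very few unique tokens, so a tokenizer fitted only on them cannot cover open-ended queries. The captions and attribute values below are hypothetical English stand-ins for the Korean K-Fashion data.

```python
# Toy illustration: templated domain captions reuse a tiny vocabulary.
from itertools import product

colors = ["sky-blue", "white", "navy", "khaki"]
items = ["blouse", "coat", "skirt", "t-shirt"]
fits = ["normal", "loose", "skinny", "wide"]

captions = [
    f"the {item} is {color} with a {fit} fit"
    for color, item, fit in product(colors, items, fits)
]

# Unique whitespace-separated tokens across every caption.
vocab = {tok for cap in captions for tok in cap.split()}
print(len(captions), len(vocab))  # 64 captions, only 17 unique tokens
```

A query using any token outside this tiny vocabulary would be out-of-vocabulary for a tokenizer trained solely on these captions, which is the failure mode a pretrained language model's tokenizer avoids.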

KoDALLE's Results on a Small Fashion Dataset

|                    | OpenAI's DALLE            | KoDALLE of HappyFace                      |
|--------------------|---------------------------|-------------------------------------------|
| Train Dataset Size | 250 million pairs         | 0.8 million pairs                         |
| #Params            | 12 billion                | 428 million                               |
| #Layers            | 64 layers                 | 16 layers                                 |
| Computing Resource | 1024 x V100 16GB          | 1 x V100 32GB                             |
| Text Encoder       | 16384 vocab x 512 dim BPE | 32000 vocab x 1024 dim klue/roberta-large |
| Image Encoder      | VQVAE                     | VQGAN                                     |
| Optimizer          | AdamW                     | AdamW                                     |
| Learning Rate      | 4.5e-5                    | 3.0e-5                                    |
| Weight Decay       | 4.5e-3                    | 3.0e-3                                    |
| LR Scheduler       | ReduceLROnPlateau         | -                                         |

The team constructed a text-to-fashion-design DALLE model for the Korean language with fewer than 100k sampled text-image pairs.

Caption: 하의에서 색상은 스카이블루이다. 상의에서 기장은 롱이다. 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 실크이다. 프린트에는 무지이다. 넥라인은 브이넥이다. 핏은 노멀 (For the bottom, the color is sky blue. For the top, the length is long, the color is white, the category is blouse, the detail is shirring, the sleeves are short, the material is silk, the print is plain, the neckline is V-neck, and the fit is normal.)
Generated Image: (image)
Caption: 아우터는 색상이 카키 소재가 우븐 핏이 루즈인 코트이다. 하의는 색상이 네이비 소재가 데님 핏이 스키니인 청바지이다. (The outerwear is a khaki, woven, loose-fit coat. The bottom is a pair of navy, denim, skinny-fit jeans.)
Generated Image: (image)
Caption: 하의에서 기장은 발목이다. 색상은 블루이다. 카테고리는 스커트이다. 소재에는 데님이다. 핏은 와이드이다. 상의에서 색상은 화이트이다. 카테고리는 블라우스이다. 디테일에는 셔링이다. 소매기장은 반팔이다. 소재에는 우븐이다. (For the bottom, the length is ankle, the color is blue, the category is skirt, the material is denim, and the fit is wide. For the top, the color is white, the category is blouse, the detail is shirring, the sleeves are short, and the material is woven.)
Generated Image: (image)
Caption: 상의에서 기장은 노멀이다. 상의에서 색상은 화이트이다. 상의에서 서브색상은 블랙이다. 상의에서 카테고리는 티셔츠이다. 상의에서 소매기장은 반팔이다. 상의에서 소재에는 저지이다. 상의에서 프린트에는 레터링이다. 상의에서 넥라인은 라운드넥이다. 상의에서 핏은 루즈이다. (For the top, the length is normal, the color is white, the sub-color is black, the category is t-shirt, the sleeves are short, the material is jersey, the print is lettering, the neckline is round neck, and the fit is loose.)
Generated Image: (image)

Methodology

Experiments were conducted with the embedding layers of several pretrained Korean Transformer models. The team selected klue/roberta-large as the baseline for the repository, considering the size of the model.

KoDALLE uses klue/roberta-large's wpe and wte layers and is trainable in a 16GB-GPU Google Colab environment. The hyperparameters related to DALLE's model size are as follows.

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8
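For reference, these hyperparameters line up with the constructor arguments of lucidrains' dalle-pytorch. This is only a sketch of that mapping, not the repository's actual training code: the keyword names follow that library's public API, and the repository may wire things differently.

```python
# Sketch: instantiating a DALLE of the size listed above with
# dalle-pytorch (assumes `pip install dalle-pytorch`; downloads a
# pretrained VQGAN on first run).
from dalle_pytorch import DALLE, VQGanVAE

vae = VQGanVAE()            # pretrained VQGAN as the image encoder
dalle = DALLE(
    vae=vae,
    num_text_tokens=32000,  # VOCAB_SIZE (klue/roberta-large vocab)
    text_seq_len=128,       # TEXT_SEQ_LEN
    dim=1024,               # MODEL_DIM
    depth=2,                # DEPTH
    heads=8,                # HEADS
    dim_head=64,            # DIM_HEAD
    attn_types=('full',),   # ATTN_TYPES
)
```

The small DEPTH relative to OpenAI's 64 layers is what keeps the model trainable on a single 16GB GPU at batch size 32.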

Significance

  • Offers promising results for training from scratch on domain-specific, small-sized datasets.
  • Introduces a solution for making domain-specific DALLE & CLIP models robust to input sentences.
  • Recommends an adequate text-to-image model size for a given computation resource.
  • Suggests an effortless method of creating DALLE & CLIP models for one's own language when a pretrained language model is available.

WIP

  • Add an image-caption reranker (EfficientNet + klue/roberta-large).
  • Train the model with 500k text-image pairs.
  • Modularize the Python code.
  • Update the inference code.
  • Update FID and IS metrics on the test and validation datasets.
Comments
  • Koclip apply in KoDALLE

    Changes

    add) model.py

    A file that modifies the code so that Hyunsu's KoCLIP works with DALLE Roberta.

    Needs revision against the model.py in the dev branch.

    add) generate.ipynb

    Code for verifying that KoCLIP works.

    opened by JoonHong-Kim 1
  • add: KoCLIP codes

    Changes:

    refactor) clipmodel.py

    • Updated CLIPModel to the final version
    • Moved it into the clip folder

    add) clip/train_clip.py

    • Code used to train the CLIP model

    add) clip/dataloader.py

    • Dataloader function used to train the CLIP model
    opened by shawnhyeonsoo 0
  • add skip_sample in TextImageDataset

    Changes

    modify) loader.py

    • Handle the error raised in TextImageDataset when the data for a text or image is missing
    • Use the skip_sample function to skip to a random or the next index when an error occurs
    • Based on a modification of the existing train_dalle_gpt_roberta.py
    opened by jjonhwa 0
Releases: v0.1.0-beta