Official implementation for "Style Transformer for Image Inversion and Editing" (CVPR 2022)

Overview

Style Transformer for Image Inversion and Editing (CVPR2022)

https://arxiv.org/abs/2203.07932

Existing GAN inversion methods fail to provide latent codes for reliable reconstruction and flexible editing simultaneously. This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN which is not only with less distortions, but also of high quality and flexibility for editing. The proposed model employs a CNN encoder to provide multi-scale image features as keys and values. Meanwhile it regards the style code to be determined for different layers of the generator as queries. It first initializes query tokens as learnable parameters and maps them into $W^+$ space. Then the multi-stage alternate self- and cross-attention are utilized, updating queries with the purpose of inverting the input by the generator. Moreover, based on the inverted code, we investigate the reference- and label-based attribute editing through a pretrained latent classifier, and achieve flexible image-to-image translation with high quality results. Extensive experiments are carried out, showing better performances on both inversion and editing tasks within StyleGAN.


Our style transformer proposes novel multi-stage style transformer in w+ space to invert image accurately, and we characterize the image editing in StyleGAN into label-based and reference-based, and use a non-linear classifier to generate the editing vector.

Getting Started

Prerequisites

  • Ubuntu 16.04
  • NVIDIA GPU + CUDA CuDNN
  • Python 3

Pretrained Models

We provide the pre-trained models of inversion for face and car domains.

Training

Preparing Datasets

Update configs/paths_config.py with the necessary data paths and model paths for training and inference.

dataset_paths = {
    'train_data': '/path/to/train/data'
    'test_data': '/path/to/test/data',
}

Preparing Generator

We use rosinality's StyleGAN2 implementation. You can download the 256px pretrained model in the project and put it in the directory /pretrained_models.

Training Inversion Model

python scripts/train.py \
--dataset_type=ffhq_encode \
--exp_dir=results/train_style_transformer \
--batch_size=8 \
--test_batch_size=8 \
--val_interval=5000 \
--save_interval=10000 \
--stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt

Inference

python scripts/inference.py \
--exp_dir=results/infer_style_transformer \
--checkpoint_path=results/train_style_transformer/checkpoints/best_model.pt \
--data_path=/test_data \
--test_batch_size=8 \

Citation

If you use this code for your research, please cite

@article{hu2022style,
  title={Style Transformer for Image Inversion and Editing},
  author={Hu, Xueqi and Huang, Qiusheng and Shi, Zhengyi and Li, Siyuan and Gao, Changxin and Sun, Li and Li, Qingli},
  journal={arXiv preprint arXiv:2203.07932},
  year={2022}
}
Owner
Xueqi Hu
Xueqi Hu
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

[CVPRW 2021] - Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

Anirudh S Chakravarthy 6 May 03, 2022
Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift (ICCV 2021)

Π-NAS This repository provides the evaluation code of our submitted paper: Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training

Jiqi Zhang 18 Aug 18, 2022
This repository is the official implementation of Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning (NeurIPS21).

Core-tuning This repository is the official implementation of ``Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regular

vanint 18 Dec 17, 2022
Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

Couler What is Couler? Couler aims to provide a unified interface for constructing and managing workflows on different workflow engines, such as Argo

Couler Project 781 Jan 03, 2023
Code for ICCV2021 paper SPEC: Seeing People in the Wild with an Estimated Camera

SPEC: Seeing People in the Wild with an Estimated Camera [ICCV 2021] SPEC: Seeing People in the Wild with an Estimated Camera, Muhammed Kocabas, Chun-

Muhammed Kocabas 187 Dec 26, 2022
这是一个yolo3-tf2的源码,可以用于训练自己的模型。

YOLOV3:You Only Look Once目标检测模型在Tensorflow2当中的实现 目录 性能情况 Performance 所需环境 Environment 文件下载 Download 训练步骤 How2train 预测步骤 How2predict 评估步骤 How2eval 参考资料

Bubbliiiing 68 Dec 21, 2022
Unsupervised Feature Loss (UFLoss) for High Fidelity Deep learning (DL)-based reconstruction

Unsupervised Feature Loss (UFLoss) for High Fidelity Deep learning (DL)-based reconstruction Official github repository for the paper High Fidelity De

28 Dec 16, 2022
A PyTorch-centric hybrid classical-quantum machine learning framework

torchquantum A PyTorch-centric hybrid classical-quantum dynamic neural networks framework. News Add a simple example script using quantum gates to do

MIT HAN Lab 400 Jan 02, 2023
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask R-CNN for Object Detection and Segmentation This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bound

Matterport, Inc 22.5k Jan 04, 2023
My usage of Real-ESRGAN to upscale anime, some test and results in the test_img folder

anime upscaler My usage of Real-ESRGAN to upscale anime, I hope to use this on a proper GPU cuz doing this on CPU is completely shit 😂 , I even tried

Shangar Muhunthan 29 Jan 07, 2023
This repository provides the code for MedViLL(Medical Vision Language Learner).

MedViLL This repository provides the code for MedViLL(Medical Vision Language Learner). Our proposed architecture MedViLL is a single BERT-based model

SuperSuperMoon 39 Jan 05, 2023
Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 130+ Indicators

Pandas TA - A Technical Analysis Library in Python 3 Pandas Technical Analysis (Pandas TA) is an easy to use library that leverages the Pandas package

Kevin Johnson 3.2k Jan 09, 2023
Implementation of SwinTransformerV2 in TensorFlow.

SwinTransformerV2-TensorFlow A TensorFlow implementation of SwinTransformerV2 by Microsoft Research Asia, based on their official implementation of Sw

Phan Nguyen 2 May 30, 2022
Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label.

Tensorflow-Mobile-Generic-Object-Localizer Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label. Ori

Ibai Gorordo 11 Nov 15, 2022
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped

CSWin-Transformer This repo is the official implementation of "CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows". Th

Microsoft 409 Jan 06, 2023
Keras like implementation of Deep Learning architectures from scratch using numpy.

Mini-Keras Keras like implementation of Deep Learning architectures from scratch using numpy. How to contribute? The project contains implementations

MANU S PILLAI 5 Oct 10, 2021
Tensorflow-seq2seq-tutorials - Dynamic seq2seq in TensorFlow, step by step

seq2seq with TensorFlow Collection of unfinished tutorials. May be good for educational purposes. 1 - simple sequence-to-sequence model with dynamic u

Matvey Ezhov 1k Dec 17, 2022
Deduplicating Training Data Makes Language Models Better

Deduplicating Training Data Makes Language Models Better This repository contains code to deduplicate language model datasets as descrbed in the paper

Google Research 431 Dec 27, 2022
Waymo motion prediction challenge 2021: 3rd place solution

Waymo motion prediction challenge 2021: 3rd place solution 📜 Technical report 🗨️ Presentation 🎉 Announcement 🛆Motion Prediction Channel Website 🛆

158 Jan 08, 2023
End-to-end face detection, cropping, norm estimation, and landmark detection in a single onnx model

onnx-facial-lmk-detector End-to-end face detection, cropping, norm estimation, and landmark detection in a single onnx model, model.onnx. Demo You can

atksh 42 Dec 30, 2022