Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

Overview

NÜWA - Pytorch (wip)

Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch. This repository will be populated in the case that Microsoft does not open source the code by end of December. It may also contain an extension into video and audio, using a dual decoder approach.

DeepReader

Citations

@misc{wu2021nuwa,
    title   = {N\"UWA: Visual Synthesis Pre-training for Neural visUal World creAtion}, 
    author  = {Chenfei Wu and Jian Liang and Lei Ji and Fan Yang and Yuejian Fang and Daxin Jiang and Nan Duan},
    year    = {2021},
    eprint  = {2111.12417},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
Comments
  • Question about generated videos?

    Question about generated videos?

    There are a lot of negative numbers and very small decimals (like 5e-1). But the loss degrades normally when training. Is that a normal situation? How can I make the result visible?

    opened by Fitzwong 0
  • Why the video does not pass through the encoder?

    Why the video does not pass through the encoder?

    Hi! lucidrains. Thanks for providing a great repo which is convenient to understand the NUWA paper.
    I have a question as follows: In the NUWA paper, we can see that the inputs of the Encoder are caption tokens (caption condition) and the video tokens (3DNA condition). So, in my eye, the video tokens sequence should fully self-attend in the Encoder, right? And then, the outputs condition the Decoder. The Decoder provided by you is as following. 截屏2022-05-12 上午11 07 12. It has causal self-attention and text-condition as we expected. But from the definition in paper, the condition contains the text-condition and 3DNA condition, and these two condition the Decoder. Is my opinion right? I am just curious about the condition in the NUWA paper. The Encoder in your repo is only the Text-Encoder, but the video does not pass through the encoder to condition the Encoder.

    Looking forward to your reply! Thanks!

    opened by Wang-Xiaodong1899 0
  • Questions about function forward() in NUWA please.

    Questions about function forward() in NUWA please.

    I'm confused me that, in function forward() of class NUWA, the ground-truth video is fed to transformer and calculate the output video, which is different from function generate().

    frame_embeddings = self.video_transformer(
                frame_embeddings,  # calculated from ground-truth video
                context = text_embeds,
                context_mask = text_mask
            )
    

    So when training NUWA, the loss comes from logits. But the logits are not only from text, but ground-truth video (only one transformer layer, different from the auto-regressive model in generate function). Is that some kind of cheating when training? Or should I generate logits in the same way as in generate(), and then calculate loss to train?

    opened by Fitzwong 1
  • Type of dataset for training VQ-GAN

    Type of dataset for training VQ-GAN

    Hi,

    First, thanks a lot for the amazing work! I have one question regarding the training of the VQ-GAN, do you recommend training it on a dataset similar to the dataset the nuwa model will be trained? What I mean is, if I want to train nuwa to generate sport videos based on text, do I need to also train the VQ-GAN on a sport dataset?

    Thanks a lot

    opened by antonibigata 0
  • Pseudocode for 3DNA?

    Pseudocode for 3DNA?

    me no comprendai le complex einops 😢

    Can someone give the 3DNA pseudocode to illustrate what's going on 🤗

    (Also how did lucidrains bang out thousands of lines of code in a few weeks - is he confirmed to be human? 🤔)

    opened by neel04 4
Releases(0.7.7a)
Owner
Phil Wang
Working with Attention. It's all we need
Phil Wang
Parasite: a tool allowing you to compress and decompress files, to reduce their size

🦠 Parasite 🦠 Parasite is a tool written in Python3 allowing you to "compress" any file, reducing its size. ⭐ Features ⭐ + Fast + Good optimization,

Billy 30 Nov 25, 2022
Official implementation for ICDAR 2021 paper "Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer"

Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer Description Convert offline handwritten mathematical expressi

Wenqi Zhao 87 Dec 27, 2022
Wanli Li and Tieyun Qian: Exploit a Multi-head Reference Graph for Semi-supervised Relation Extraction, IJCNN 2021

MRefG Wanli Li and Tieyun Qian: "Exploit a Multi-head Reference Graph for Semi-supervised Relation Extraction", IJCNN 2021 1. Requirements To reproduc

万理 5 Jul 26, 2022
Turi Create simplifies the development of custom machine learning models.

Quick Links: Installation | Documentation | WWDC 2019 | WWDC 2018 Turi Create Check out our talks at WWDC 2019 and at WWDC 2018! Turi Create simplifie

Apple 10.9k Jan 01, 2023
Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation".

PixelTransformer Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation". Project Page Installation Please insta

Shubham Tulsiani 24 Dec 17, 2022
Modification of convolutional neural net "UNET" for image segmentation in Keras framework

ZF_UNET_224 Pretrained Model Modification of convolutional neural net "UNET" for image segmentation in Keras framework Requirements Python 3.*, Keras

209 Nov 02, 2022
we propose a novel deep network, named feature aggregation and refinement network (FARNet), for the automatic detection of anatomical landmarks.

Feature Aggregation and Refinement Network for 2D Anatomical Landmark Detection Overview Localization of anatomical landmarks is essential for clinica

aoyueyuan 0 Aug 28, 2022
ICON: Implicit Clothed humans Obtained from Normals

ICON: Implicit Clothed humans Obtained from Normals arXiv, December 2021. Yuliang Xiu · Jinlong Yang · Dimitrios Tzionas · Michael J. Black Table of C

Yuliang Xiu 1.1k Dec 30, 2022
Efficient Sparse Attacks on Videos using Reinforcement Learning

EARL This repository provides a simple implementation of the work "Efficient Sparse Attacks on Videos using Reinforcement Learning" Example: Demo: Her

12 Dec 05, 2021
Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis"

Beyond the Spectrum Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis" by Yang He, Ning Yu, Margret Keu

Yang He 27 Jan 07, 2023
Code and models for "Pano3D: A Holistic Benchmark and a Solid Baseline for 360 Depth Estimation", OmniCV Workshop @ CVPR21.

Pano3D A Holistic Benchmark and a Solid Baseline for 360o Depth Estimation Pano3D is a new benchmark for depth estimation from spherical panoramas. We

Visual Computing Lab, Information Technologies Institute, Centre for Reseach and Technology Hellas 50 Dec 29, 2022
Pytorch implementation of PTNet for high-resolution and longitudinal infant MRI synthesis

Pyramid Transformer Net (PTNet) Project | Paper Pytorch implementation of PTNet for high-resolution and longitudinal infant MRI synthesis. PTNet: A Hi

Xuzhe Johnny Zhang 6 Jun 08, 2022
Numerical Methods with Python, Numpy and Matplotlib

Numerical Bric-a-Brac Collections of numerical techniques with Python and standard computational packages (Numpy, SciPy, Numba, Matplotlib ...). Diffe

Vincent Bonnet 10 Dec 20, 2021
PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

SelfReg PyTorch official implementation of Self-supervised Contrastive Regularization for Domain Generalization (SelfReg, https://arxiv.org/abs/2104.0

64 Dec 16, 2022
This repository is the official implementation of Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning (NeurIPS21).

Core-tuning This repository is the official implementation of ``Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regular

vanint 18 Dec 17, 2022
GLNet for Memory-Efficient Segmentation of Ultra-High Resolution Images

GLNet for Memory-Efficient Segmentation of Ultra-High Resolution Images Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-

VITA 298 Dec 12, 2022
A module that used for encrypt code which includes RSA and AES

软件加密模块 requirement: Crypto,pycryptodome,pyqt5 本地加密信息为随机字符串 使用说明 命令行参数 -h 帮助 -checkWorking 检查是否能正常工作,后接1确认指令 -checkEndDate 检查截至日期,后接1确认指令 -activateCode

2 Sep 27, 2022
A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code.

2021: A Year Full of Amazing AI papers- A Review 📌 A curated list of the latest breakthroughs in AI by release date with a clear video explanation, l

Louis-François Bouchard 2.9k Dec 31, 2022
How to Leverage Multimodal EHR Data for Better Medical Predictions?

How to Leverage Multimodal EHR Data for Better Medical Predictions? This repository contains the code of the paper: How to Leverage Multimodal EHR Dat

13 Dec 13, 2022
Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Single Optical Path

Keyhole Imaging Code & Dataset Code associated with the paper "Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Singl

Stanford Computational Imaging Lab 20 Feb 03, 2022