Single-Shot Motion Completion with Transformer

Last update: Dec 29, 2022

👉 [Preprint] 👈

Abstract

Motion completion is a challenging and long-discussed problem of great significance in film and game applications. Most previous methods handle the different motion completion scenarios (in-betweening, in-filling, and blending) with case-by-case designs. In this work, we propose a simple but effective method that solves multiple motion completion problems under a unified framework and achieves new state-of-the-art accuracy under multiple evaluation settings. Inspired by the recent success of attention-based models, we treat completion as a sequence-to-sequence prediction problem. Our method consists of two modules: a standard transformer encoder with self-attention that learns long-range dependencies of input motions, and a trainable mixture embedding module that models temporal information and discriminates keyframes. Our method runs in a non-autoregressive manner and predicts multiple missing frames within a single forward propagation in real time. Finally, we show the effectiveness of our method in music-dance applications.
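
The single-pass idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (the code has not been released): the function names, the two-type keyframe embedding, the zero-masking of missing frames, the single attention layer, and the random untrained weights are all assumptions made for illustration. It only shows how a position embedding plus a keyframe-type embedding can be added to the frame tokens, and how self-attention then produces every missing frame at once rather than one frame at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(num_frames, pose_dim, model_dim):
    """Random, untrained weights; all names here are hypothetical."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "w_in": w(pose_dim, model_dim),    # pose -> token projection
        "pos": w(num_frames, model_dim),   # one embedding per time step
        "type": w(2, model_dim),           # 0 = missing frame, 1 = keyframe
        "w_q": w(model_dim, model_dim),
        "w_k": w(model_dim, model_dim),
        "w_v": w(model_dim, model_dim),
        "w_out": w(model_dim, pose_dim),   # token -> pose projection
    }

def mixture_embedding(keyframe_mask, pos_table, type_table):
    """Sum of a temporal embedding and a keyframe-type embedding."""
    positions = np.arange(len(keyframe_mask))
    types = keyframe_mask.astype(int)
    return pos_table[positions] + type_table[types]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all frames."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights

def complete_motion(frames, keyframe_mask, params):
    """Predict all missing frames in one forward pass (no autoregression)."""
    mask = keyframe_mask[:, None]
    tokens = (frames * mask) @ params["w_in"]   # missing frames are zeroed
    tokens = tokens + mixture_embedding(keyframe_mask, params["pos"], params["type"])
    hidden, weights = self_attention(tokens, params["w_q"], params["w_k"], params["w_v"])
    predicted = (hidden + tokens) @ params["w_out"]   # residual, then decode
    # Keyframes are kept as given; only missing frames take the prediction.
    return np.where(mask, frames, predicted), weights

# Toy example: 8 frames of a 6-dimensional pose, first and last are keyframes.
num_frames, pose_dim, model_dim = 8, 6, 16
frames = rng.normal(size=(num_frames, pose_dim))
keyframe_mask = np.zeros(num_frames, dtype=bool)
keyframe_mask[[0, -1]] = True

params = init_params(num_frames, pose_dim, model_dim)
completed, attention = complete_motion(frames, keyframe_mask, params)
print(completed.shape, attention.shape)
```

Because every frame attends to every other frame in the same pass, the cost of filling a gap does not grow with one model call per missing frame, which is the property the abstract refers to as single-shot prediction.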

State-of-the-art on Lafan1 dataset

With the help of the Transformer, we achieve a new state-of-the-art result on the Lafan1 dataset.

Method (length = 30)	L2Q	L2P	NPSS
Zero-Vel	1.51	6.60	0.2318
Interp.	0.98	2.32	0.2013
ERD-QV	0.69	1.28	0.1328
Ours	0.61	1.10	0.1222
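
For readers unfamiliar with the two simplest baselines in the table, the sketch below shows one plausible reading of them on joint positions only: Zero-Vel holds the last known frame across the gap, and Interp. linearly interpolates between the two surrounding keyframes. The function names and the plain mean-L2 error are assumptions for illustration; the benchmark's L2P metric is computed on normalized global positions, and rotations would need quaternion interpolation, neither of which is reproduced here.

```python
import numpy as np

def zero_velocity(positions, start, end):
    """Hold the keyframe at `start` for every frame strictly inside the gap."""
    out = positions.copy()
    out[start + 1:end] = positions[start]
    return out

def linear_interp(positions, start, end):
    """Linearly blend between the keyframes at `start` and `end`."""
    out = positions.copy()
    steps = np.arange(1, end - start)                  # 1 .. gap - 1
    alpha = (steps / (end - start))[:, None, None]     # blend weight per frame
    out[start + 1:end] = (1 - alpha) * positions[start] + alpha * positions[end]
    return out

def mean_l2(pred, target):
    """Mean over frames of the L2 norm of the flattened position error."""
    diff = (pred - target).reshape(len(pred), -1)
    return float(np.linalg.norm(diff, axis=1).mean())

# Toy ground truth: one joint moving at constant speed along x for 5 frames.
# Shape is (frames, joints, 3).
ground_truth = np.zeros((5, 1, 3))
ground_truth[:, 0, 0] = np.arange(5)

start, end = 0, 4                    # keyframes; frames 1..3 are missing
gap = slice(start + 1, end)

zv_error = mean_l2(zero_velocity(ground_truth, start, end)[gap], ground_truth[gap])
li_error = mean_l2(linear_interp(ground_truth, start, end)[gap], ground_truth[gap])
print(zv_error, li_error)
```

On this constant-velocity toy motion, interpolation recovers the gap exactly while holding the last frame does not; real motion is rarely linear over 30 frames, which is why both baselines trail the learned methods in the table.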

Some results (keyframes are shown in blue):

Dance Infilling on Anidance Dataset

We also evaluate our method on the Anidance dataset:

Infilling on the test set (black skeletons are the keyframes):

(From Left to Right: Ours, Interp. and Ground Truth)

Infilling on random keyframes (keyframes are randomly chosen from the test set and arranged in random order to simulate an in-the-wild scenario):

(From Left to Right: Ours, Interp. and Ground Truth)

Dance blending

Our method can also work on complex dance movement completion:

Code

Coming soon

Citation

@misc{duan2021singleshot,
      title={Single-Shot Motion Completion with Transformer}, 
      author={Yinglin Duan and Tianyang Shi and Zhengxia Zou and Yenan Lin and Zhehui Qian and Bohan Zhang and Yi Yuan},
      year={2021},
      eprint={2103.00776},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

FuxiCV
