PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

Overview

StyleSpeech - PyTorch Implementation

PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation.

Status (2021.06.13)

  • StyleSpeech (naive branch)
  • Meta-StyleSpeech (main branch)

Quickstart

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

Inference

You have to download pretrained models and put them in output/ckpt/LibriTTS/.

For English single-speaker TTS, run

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --ref_audio path/to/reference_audio.wav --restore_step 200000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml

The generated utterances will be put in output/result/. Your synthesized speech will have ref_audio's style.

Batch Inference

Batch inference is also supported, try

python3 synthesize.py --source preprocessed_data/LibriTTS/val.txt --restore_step 200000 --mode batch -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml

to synthesize all utterances in preprocessed_data/LibriTTS/val.txt. This can be viewed as a reconstruction of validation datasets referring to themselves for the reference style.

Controllability

The pitch/volume/speaking rate of the synthesized utterances can be controlled by specifying the desired pitch/energy/duration ratios. For example, one can increase the speaking rate by 20 % and decrease the volume by 20 % by

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step 200000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml --duration_control 0.8 --energy_control 0.8

Note that the controllability is originated from FastSpeech2 and not a vital interest of StyleSpeech.

Training

Datasets

The supported datasets are

  • LibriTTS: a multi-speaker English dataset containing 585 hours of speech by 2456 speakers.
  • (will be added more)

Preprocessing

First, run

python3 prepare_align.py config/LibriTTS/preprocess.yaml

for some preparations.

In this implementation, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences.

Download the official MFA package and run

./montreal-forced-aligner/bin/mfa_align raw_data/LibriTTS/ lexicon/librispeech-lexicon.txt english preprocessed_data/LibriTTS

or

./montreal-forced-aligner/bin/mfa_train_and_align raw_data/LibriTTS/ lexicon/librispeech-lexicon.txt preprocessed_data/LibriTTS

to align the corpus and then run the preprocessing script.

python3 preprocess.py config/LibriTTS/preprocess.yaml

Training

Train your model with

python3 train.py -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml

As described in the paper, the script will start from pre-training the naive model until meta_learning_warmup steps and then meta-train the model for additional steps via episodic training.

TensorBoard

Use

tensorboard --logdir output/log/LibriTTS

to serve TensorBoard on your localhost.

Implementation Issues

  1. Use 22050Hz sampling rate instead of 16kHz.
  2. Add one fully connected layer at the beginning of Mel-Style Encoder to upsample input mel-spectrogram from 80 to 128.
  3. The model size including meta-learner is 28.197M.
  4. Use a maximum 16 batch size on training instead of 48 or 20 mainly due to the lack of memory capacity with a single 24GiB TITAN-RTX. This can be achieved by the following script to filter out data longer than max_seq_len:
    python3 filelist_filtering.py -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml
    
    This will generate train_filtered.txt in the same location of train.txt.
  5. Since the total batch size is decreased, the number of training steps is doubled compared to the original paper.
  6. Use HiFi-GAN instead of MelGAN for vocoding.

Citation

@misc{lee2021stylespeech,
  author = {Lee, Keon},
  title = {StyleSpeech},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/StyleSpeech}}
}

References

Comments
  • What is the perfermance compared with Adaspeech

    What is the perfermance compared with Adaspeech

    Thank you for your great work and share. Your work looks differ form adaspeech and NAUTILUS. You use GANs which i did not see in other papers regarding adaptative TTS. Have you compare this method with adaspeech1/2? how about the mos and similarity?

    opened by Liujingxiu23 10
  • The size of tensor a (xx) must match the size of tensor b (yy)

    The size of tensor a (xx) must match the size of tensor b (yy)

    Hi I try to run your project. I use cuda 10.1, all requirements are installed (with torch 1.8.1), all models are preloaded. But i have an error: python3 synthesize.py --text "Hello world" --restore_step 200000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml --duration_control 0.8 --energy_control 0.8 --ref_audio ref.wav

    Removing weight norm...
    Raw Text Sequence: Hello world
    Phoneme Sequence: {HH AH0 L OW1 W ER1 L D}
    Traceback (most recent call last):
      File "synthesize.py", line 268, in <module>
        synthesize(model, args.restore_step, configs, vocoder, batchs, control_values)
      File "synthesize.py", line 152, in synthesize
        d_control=duration_control
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(input, *kwargs)
      File "/usr/local/work/model/StyleSpeech.py", line 144, in forward
        d_control,
      File "/usr/local/work/model/StyleSpeech.py", line 91, in G
        output, mel_masks = self.mel_decoder(output, style_vector, mel_masks)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(input, kwargs)
      File "/usr/local/work/model/modules.py", line 307, in forward
        enc_seq = self.mel_prenet(enc_seq, mask)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(input, kwargs)
      File "/usr/local/work/model/modules.py", line 259, in forward
        x = x.masked_fill(mask.unsqueeze(-1), 0)
    RuntimeError: The size of tensor a (44) must match the size of tensor b (47) at non-singleton dimension 1
    
    opened by DiDimus 9
  • VCTK datasets

    VCTK datasets

    Hi, I note your paper evaluates the models' performance on VCTK datasets, but I not see the process file about VCTK. Hence, could you share the files, thank you very much.

    opened by XXXHUA 7
  • training error

    training error

    Thanks for your sharing!

    I tried both naive and main branches using your checkpoints, it seems the former one is much better. So I trained AISHELL3 models with small changes on your code and the synthesized waves are good for me.

    However when I add my own data into AISHELL3, some error occurred: Training: 0%| | 3105/900000 [32:05<154:31:49, 1.61it/s] Epoch 2: 69%|██████████████████████▏ | 318/459 [05:02<02:14, 1.05it/s] File "train.py", line 211, in main(args, configs) File "train.py", line 87, in main output = model(*(batch[2:])) File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward return self.module(*inputs[0], **kwargs[0]) File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/workspace/StyleSpeech-naive/model/StyleSpeech.py", line 83, in forward ) = self.variance_adaptor( File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/workspace/StyleSpeech-naive/model/modules.py", line 404, in forward x = x + pitch_embedding RuntimeError: The size of tensor a (52) must match the size of tensor b (53) at non-singleton dimension 1

    I only replaced two speakers and preprocessed data the same as the in readme.

    Do you have any advice for this error ? Any suggestion is appreciated.

    opened by MingZJU 6
  • the synthesis result is bad when using pretrain model

    the synthesis result is bad when using pretrain model

    hello sir, thanks for your sharing.

    i meet a problem when i using pretrain model to synthsize demo file. the effect of synthesized wav is so bad.

    do you konw what problem happened?

    pretrain_model: output/ckpt/LibriTTS_meta_learner/200000.pth.tar ref_audio: ref_audio.zip demo_txt: {Promises are often like the butterfly, which disappear after beautiful hover. No matter the ending is perfect or not, you cannot disappear from my world.} demo_wav:demo.zip

    opened by mnfutao 4
  • Maybe style_prototype can instead of ref_mel?

    Maybe style_prototype can instead of ref_mel?

    hello @keonlee9420 , thanks for your contribution on StyleSpeech. When I read your paper and source code, I think that the style_prototype (which is an embedding matrix) maybe can instread of the ref_mel, because there is a CE-loss between style_prototype and style_vector, which can control this embedding matrix close to style. In short, we can give a speaker id to synthesize this speaker's wave. Is it right?

    opened by forwiat 3
  • architecture shows bad results

    architecture shows bad results

    Hi, i have completely repeated your steps for learning. During training, style speech loss fell down, but after learning began, meta style speech loss began to grow up. Can you help with training the model? I can describe my steps in more detail.

    opened by e0xextazy 2
  • UnboundLocalError: local variable 'pitch' referenced before assignment

    UnboundLocalError: local variable 'pitch' referenced before assignment

    Hi, when I run preprocessor.py, I have this problem: /preprocessor.py", line 92, in build_from_path if len(pitch) > 0: UnboundLocalError: local variable 'pitch' referenced before assignment When I try to add a global declaration to the function, it shows NameError: name 'pitch' is not defined How should this be resolved? I would be grateful if I could get your guidance soon.

    opened by Summerxu86 0
  • How can I improve the synthesized results?

    How can I improve the synthesized results?

    I have trained the model for 200k steps, and still, the synthesised results are extremely bad. loss_curve This is what my loss curve looks like. Can you help me with what can I do now to improve my synthesized audio results?

    opened by sanjeevani279 1
  • RuntimeError: Error(s) in loading state_dict for Stylespeech

    RuntimeError: Error(s) in loading state_dict for Stylespeech

    Hi @keonlee9420, I am getting the following error, while running the naive branch :

    Traceback (most recent call last):
      File "synthesize.py", line 242, in <module>
        model = get_model(args, configs, device, train=False)
      File "/home/azureuser/aditya_workspace/stylespeech_keonlee_naive/utils/model.py", line 21, in get_model
        model.load_state_dict(ckpt["model"], strict=True)
      File "/home/azureuser/aditya_workspace/keonlee/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for StyleSpeech:
    	Missing key(s) in state_dict: "D_t.mel_linear.0.fc_layer.fc_layer.linear.weight_orig", "D_t.mel_linear.0.fc_layer.fc_layer.linear.weight", "D_t.mel_linear.0.fc_layer.fc_layer.linear.weight_u", "D_t.mel_linear.0.fc_layer.fc_layer.linear.weight_orig", "D_t.mel_linear.0.fc_layer.fc_layer.linear.weight_u", "D_t.mel_linear.0.fc_layer.fc_layer.linear.weight_v", "D_t.mel_linear.1.fc_layer.fc_layer.linear.weight_orig", "D_t.mel_linear.1.fc_layer.fc_layer.linear.weight", "D_t.mel_linear.1.fc_layer.fc_layer.linear.weight_u", "D_t.mel_linear.1.fc_layer.fc_layer.linear.weight_orig", "D_t.mel_linear.1.fc_layer.fc_layer.linear.weight_u", "D_t.mel_linear.1.fc_layer.fc_layer.linear.weight_v", "D_t.discriminator_stack.0.fc_layer.fc_layer.linear.weight_orig", "D_t.discriminator_stack.0.fc_layer.fc_layer.linear.weight", "D_t.discriminator_stack.0.fc_layer.fc_layer.linear.weight_u", "D_t.discriminator_stack.0.fc_layer.fc_layer.linear.weight_orig", "D_t.discriminator_stack.0.fc_layer.fc_layer.linear.weight_u", "D_t.discriminator_stack.0.fc_layer.fc_layer.linear.weight_v", "D_t.discriminator_stack.1.fc_layer.fc_layer.linear.weight_orig", "D_t.discriminator_stack.1.fc_layer.fc_layer.linear.weight", "D_t.discriminator_stack.1.fc_layer.fc_layer.linear.weight_u", "D_t.discriminator_stack.1.fc_layer.fc_layer.linear.weight_orig", "D_t.discriminator_stack.1.fc_layer.fc_layer.linear.weight_u", "D_t.discriminator_stack.1.fc_layer.fc_layer.linear.weight_v", "D_t.discriminator_stack.2.fc_layer.fc_layer.linear.weight_orig", "D_t.discriminator_stack.2.fc_layer.fc_layer.linear.weight", "D_t.discriminator_stack.2.fc_layer.fc_layer.linear.weight_u", "D_t.discriminator_stack.2.fc_layer.fc_layer.linear.weight_orig", "D_t.discriminator_stack.2.fc_layer.fc_layer.linear.weight_u", "D_t.discriminator_stack.2.fc_layer.fc_layer.linear.weight_v", "D_t.final_linear.fc_layer.fc_layer.linear.weight_orig", "D_t.final_linear.fc_layer.fc_layer.linear.weight", "D_t.final_linear.fc_layer.fc_layer.linear.weight_u", "D_t.final_linear.fc_layer.fc_layer.linear.weight_orig", "D_t.final_linear.fc_layer.fc_layer.linear.weight_u", "D_t.final_linear.fc_layer.fc_layer.linear.weight_v", "D_s.fc_1.fc_layer.fc_layer.linear.weight_orig", "D_s.fc_1.fc_layer.fc_layer.linear.weight", "D_s.fc_1.fc_layer.fc_layer.linear.weight_u", "D_s.fc_1.fc_layer.fc_layer.linear.weight_orig", "D_s.fc_1.fc_layer.fc_layer.linear.weight_u", "D_s.fc_1.fc_layer.fc_layer.linear.weight_v", "D_s.spectral_stack.0.fc_layer.fc_layer.linear.weight_orig", "D_s.spectral_stack.0.fc_layer.fc_layer.linear.weight", "D_s.spectral_stack.0.fc_layer.fc_layer.linear.weight_u", "D_s.spectral_stack.0.fc_layer.fc_layer.linear.weight_orig", "D_s.spectral_stack.0.fc_layer.fc_layer.linear.weight_u", "D_s.spectral_stack.0.fc_layer.fc_layer.linear.weight_v", "D_s.spectral_stack.1.fc_layer.fc_layer.linear.weight_orig", "D_s.spectral_stack.1.fc_layer.fc_layer.linear.weight", "D_s.spectral_stack.1.fc_layer.fc_layer.linear.weight_u", "D_s.spectral_stack.1.fc_layer.fc_layer.linear.weight_orig", "D_s.spectral_stack.1.fc_layer.fc_layer.linear.weight_u", "D_s.spectral_stack.1.fc_layer.fc_layer.linear.weight_v", "D_s.temporal_stack.0.conv_layer.conv_layer.conv.weight_orig", "D_s.temporal_stack.0.conv_layer.conv_layer.conv.weight", "D_s.temporal_stack.0.conv_layer.conv_layer.conv.weight_u", "D_s.temporal_stack.0.conv_layer.conv_layer.conv.bias", "D_s.temporal_stack.0.conv_layer.conv_layer.conv.weight_orig", "D_s.temporal_stack.0.conv_layer.conv_layer.conv.weight_u", "D_s.temporal_stack.0.conv_layer.conv_layer.conv.weight_v", "D_s.temporal_stack.1.conv_layer.conv_layer.conv.weight_orig", "D_s.temporal_stack.1.conv_layer.conv_layer.conv.weight", "D_s.temporal_stack.1.conv_layer.conv_layer.conv.weight_u", "D_s.temporal_stack.1.conv_layer.conv_layer.conv.bias", "D_s.temporal_stack.1.conv_layer.conv_layer.conv.weight_orig", "D_s.temporal_stack.1.conv_layer.conv_layer.conv.weight_u", "D_s.temporal_stack.1.conv_layer.conv_layer.conv.weight_v", "D_s.slf_attn_stack.0.w_qs.linear.weight_orig", "D_s.slf_attn_stack.0.w_qs.linear.weight", "D_s.slf_attn_stack.0.w_qs.linear.weight_u", "D_s.slf_attn_stack.0.w_qs.linear.weight_orig", "D_s.slf_attn_stack.0.w_qs.linear.weight_u", "D_s.slf_attn_stack.0.w_qs.linear.weight_v", "D_s.slf_attn_stack.0.w_ks.linear.weight_orig", "D_s.slf_attn_stack.0.w_ks.linear.weight", "D_s.slf_attn_stack.0.w_ks.linear.weight_u", "D_s.slf_attn_stack.0.w_ks.linear.weight_orig", "D_s.slf_attn_stack.0.w_ks.linear.weight_u", "D_s.slf_attn_stack.0.w_ks.linear.weight_v", "D_s.slf_attn_stack.0.w_vs.linear.weight_orig", "D_s.slf_attn_stack.0.w_vs.linear.weight", "D_s.slf_attn_stack.0.w_vs.linear.weight_u", "D_s.slf_attn_stack.0.w_vs.linear.weight_orig", "D_s.slf_attn_stack.0.w_vs.linear.weight_u", "D_s.slf_attn_stack.0.w_vs.linear.weight_v", "D_s.slf_attn_stack.0.layer_norm.weight", "D_s.slf_attn_stack.0.layer_norm.bias", "D_s.slf_attn_stack.0.fc.linear.weight_orig", "D_s.slf_attn_stack.0.fc.linear.weight", "D_s.slf_attn_stack.0.fc.linear.weight_u", "D_s.slf_attn_stack.0.fc.linear.weight_orig", "D_s.slf_attn_stack.0.fc.linear.weight_u", "D_s.slf_attn_stack.0.fc.linear.weight_v", "D_s.fc_2.fc_layer.fc_layer.linear.weight_orig", "D_s.fc_2.fc_layer.fc_layer.linear.weight", "D_s.fc_2.fc_layer.fc_layer.linear.weight_u", "D_s.fc_2.fc_layer.fc_layer.linear.weight_orig", "D_s.fc_2.fc_layer.fc_layer.linear.weight_u", "D_s.fc_2.fc_layer.fc_layer.linear.weight_v", "D_s.V.fc_layer.fc_layer.linear.weight", "D_s.w_b_0.fc_layer.fc_layer.linear.weight", "D_s.w_b_0.fc_layer.fc_layer.linear.bias", "style_prototype.weight".
    	Unexpected key(s) in state_dict: "speaker_emb.weight".
    

    Can you help with this, seems like the pre-trained weights are old and do not conform to the current architecture.

    opened by sirius0503 1
  • time dimension doesn't match

    time dimension doesn't match

    ^MTraining: 0%| | 0/200000 [00:00<?, ?it/s] ^MEpoch 1: 0%| | 0/454 [00:00<?, ?it/s]^[[APrepare training ... Number of StyleSpeech Parameters: 28197333 Removing weight norm... Traceback (most recent call last): File "train.py", line 224, in main(args, configs) File "train.py", line 98, in main output = (None, None, model((batch[2:-5]))) File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward return self.module(*inputs[0], **kwargs[0]) File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 144, in forward d_control, File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 88, in G d_control, File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/modules.py", line 417, in forward x = x + pitch_embedding RuntimeError: The size of tensor a (132) must match the size of tensor b (130) at non-singleton dimension 1 ^MTraining: 0%| | 1/200000 [00:02<166:02:12, 2.99s/it]

    I think it might because of mfa I used. As mentioned in https://montreal-forced-aligner.readthedocs.io/en/latest/getting_started.html, I installed mfa through conda.

    Then I used mfa align raw_data/LibriTTS lexicon/librispeech-lexicon.txt english preprocessed_data/LibriTTS instead of the way you showed. But I can't find a way to run it as the way you showed, because I installed mfa through conda.

    opened by MingjieChen 24
Releases(v1.0.2)
Owner
Keon Lee
Expressive Speech Synthesis | Conversational AI | Open-domain Dialog | NLP | Generative Models | Empathic Computing | HCI
Keon Lee
This repository is a series of notebooks that show solutions for the projects at Dataquest.io.

Dataquest Project Solutions This repository is a series of notebooks that show solutions for the projects at Dataquest.io. Of course, there are always

Dataquest 1.1k Dec 30, 2022
An SE(3)-invariant autoencoder for generating the periodic structure of materials

Crystal Diffusion Variational AutoEncoder This software implementes Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic st

Tian Xie 94 Dec 10, 2022
Fit Fast, Explain Fast

FastExplain Fit Fast, Explain Fast Installing pip install fast-explain About FastExplain FastExplain provides an out-of-the-box tool for analysts to

8 Dec 15, 2022
Code for "On the Effects of Batch and Weight Normalization in Generative Adversarial Networks"

Note: this repo has been discontinued, please check code for newer version of the paper here Weight Normalized GAN Code for the paper "On the Effects

Sitao Xiang 182 Sep 06, 2021
Efficient semidefinite bounds for multi-label discrete graphical models.

Low rank solvers #################################### benchmark/ : folder with the random instances used in the paper. ############################

1 Dec 08, 2022
[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

PointDSC repository PyTorch implementation of PointDSC for CVPR'2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency",

153 Dec 14, 2022
gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks.

gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks. It is built on top of the OpenAI G

Robin Henry 99 Dec 12, 2022
PyTorch code for Composing Partial Differential Equations with Physics-Aware Neural Networks

FInite volume Neural Network (FINN) This repository contains the PyTorch code for models, training, and testing, and Python code for data generation t

Cognitive Modeling 20 Dec 18, 2022
FeTaQA: Free-form Table Question Answering

FeTaQA: Free-form Table Question Answering FeTaQA is a Free-form Table Question Answering dataset with 10K Wikipedia-based {table, question, free-form

Language, Information, and Learning at Yale 40 Dec 13, 2022
Code for EMNLP2021 paper "Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training"

VoCapXLM Code for EMNLP2021 paper Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training Environment DockerFile: dancingso

Bo Zheng 15 Jul 28, 2022
Learning Modified Indicator Functions for Surface Reconstruction

Learning Modified Indicator Functions for Surface Reconstruction In this work, we propose a learning-based approach for implicit surface reconstructio

4 Apr 18, 2022
Official implementation of "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers"

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers Figure 1: Performance of SegFormer-B0 to SegFormer-B5. Project page

NVIDIA Research Projects 1.4k Dec 31, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20.2k Jan 05, 2023
[NeurIPS-2021] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

MosaicKD Code for NeurIPS-21 paper "Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data" 1. Motivation Natural images share common l

ZJU-VIPA 37 Nov 10, 2022
We will see a basic program that is basically a hint to brute force attack to crack passwords. In other words, we will make a program to Crack Any Password Using Python. Show some ❤️ by starring this repository!

Crack Any Password Using Python We will see a basic program that is basically a hint to brute force attack to crack passwords. In other words, we will

Ananya Chatterjee 11 Dec 03, 2022
The Fundamental Clustering Problems Suite (FCPS) summaries 54 state-of-the-art clustering algorithms, common cluster challenges and estimations of the number of clusters as well as the testing for cluster tendency.

FCPS Fundamental Clustering Problems Suite The package provides over sixty state-of-the-art clustering algorithms for unsupervised machine learning pu

9 Nov 27, 2022
The implementation of DeBERTa

DeBERTa: Decoding-enhanced BERT with Disentangled Attention This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Dis

Microsoft 1.2k Jan 06, 2023
Pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering".

TRAnsformer Routing Networks (TRAR) This is an official implementation for ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visu

Ren Tianhe 49 Nov 10, 2022
一个目标检测的通用框架(不需要cuda编译),支持Yolo全系列(v2~v5)、EfficientDet、RetinaNet、Cascade-RCNN等SOTA网络。

一个目标检测的通用框架(不需要cuda编译),支持Yolo全系列(v2~v5)、EfficientDet、RetinaNet、Cascade-RCNN等SOTA网络。

Haoyu Xu 203 Jan 03, 2023
A Temporal Extension Library for PyTorch Geometric

Documentation | External Resources | Datasets PyTorch Geometric Temporal is a temporal (dynamic) extension library for PyTorch Geometric. The library

Benedek Rozemberczki 1.9k Jan 07, 2023