PortaSpeech - PyTorch Implementation

Overview

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech.

Model Size

Module                 Normal   Small   Normal (paper)   Small (paper)
Total                  34.3M    9.6M    21.8M            6.7M
LinguisticEncoder      14M      3.4M    -                -
VariationalGenerator   11M      2.8M    -                -
FlowPostNet            9.3M     3.4M    -                -

Quickstart

In the rest of this document, DATASET refers to the name of a dataset, such as LJSpeech.

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

A Dockerfile is also provided for Docker users.

Inference

You have to download the pretrained models and put them in output/ckpt/DATASET/.

For a single-speaker TTS, run

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single --dataset DATASET

The generated utterances will be put in output/result/.

Batch Inference

Batch inference is also supported. Try

python3 synthesize.py --source preprocessed_data/DATASET/val.txt --restore_step RESTORE_STEP --mode batch --dataset DATASET

to synthesize all utterances in preprocessed_data/DATASET/val.txt.

Controllability

The speaking rate of the synthesized utterances can be controlled by specifying the desired duration ratio. For example, one can shorten all predicted durations by 20%, and so speed up the speech, with

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single --dataset DATASET --duration_control 0.8

Please note that this controllability originates from FastSpeech2 and is not a core focus of PortaSpeech.
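To illustrate what a duration ratio does, here is a minimal sketch. It assumes FastSpeech2-style length regulation, where each phoneme's predicted frame count is multiplied by the ratio and rounded to whole frames. The helper name scale_durations and the sample frame counts are hypothetical and are not taken from this repository.

```python
def scale_durations(durations, duration_control=1.0):
    """Scale per-phoneme frame counts by a ratio (hypothetical helper)."""
    # Round to whole frames and never return a negative frame count.
    return [max(int(round(d * duration_control)), 0) for d in durations]


predicted = [5, 10, 20]                    # made-up frame counts per phoneme
faster = scale_durations(predicted, 0.8)   # ratio below 1.0 shortens the utterance
print(sum(predicted), sum(faster))
```

A ratio below 1.0 shortens every phoneme, a ratio above 1.0 stretches it, and 1.0 leaves the prediction unchanged.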

Training

Datasets

The supported dataset is

  • LJSpeech: a single-speaker English dataset consisting of 13100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.

Preprocessing

Run

python3 prepare_align.py --dataset DATASET

for some preparations.

For the forced alignment, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Pre-extracted alignments for the datasets are provided here. You have to unzip the files into preprocessed_data/DATASET/TextGrid/. Alternatively, you can run the aligner yourself.

After that, run the preprocessing script by

python3 preprocess.py --dataset DATASET

Training

Train your model with

python3 train.py --dataset DATASET

Useful options:

  • To use Automatic Mixed Precision, append the --use_amp argument to the above command.
  • The trainer assumes single-node multi-GPU training. To use specific GPUs, specify CUDA_VISIBLE_DEVICES= at the beginning of the above command.

TensorBoard

Use

tensorboard --logdir output/log

to serve TensorBoard on your localhost.

Notes

  • For the vocoder, HiFi-GAN and MelGAN are supported.
  • A convolution layer and a residual layer are added to VariationalGenerator to match the shapes of the conditioner and the output.
  • ReLU activation and LayerNorm are not used in VariationalGenerator, so that the word-to-phoneme alignment of LinguisticEncoder converges.
  • Absolute positional encoding is used in LinguisticEncoder instead of relative positional encoding.
  • The model will be extended to multi-speaker TTS.

Citation

Please cite this repository using "Cite this repository" in the About section (top right of the main page).

Comments
  • Speed in CPU

    Hi, thank you very much for your work and for sharing it. In the paper, the proposed method is compared with many other methods in terms of MOS, parameter size, and speed. You computed the RTF on GPU; did you also compare the RTF / speed when running on CPU?

    opened by Liujingxiu23 6
  • Weird sound from the beginning of the sentence "hello"

    Hi, thanks for your contribution to TTS, it's such great work! It seems perfect for most sentences when running python3 synthesize.py --text "MY_SENTENCE" --restore_step 125000 --mode single --dataset LJSpeech, but when I tried a sentence with "hello" at the front, the sound of "hello" became long and weird. Here is the mel-spectrogram of "Hello, glad to see you."; a large area on the left clearly represents the word "hello". I checked your training data in preprocessed_data/LJSpeech/train.txt and couldn't find the word "hello" in it.

    Is this problem caused merely by the quantity of phonemes for the word, am I doing something wrong, or is it something else? Anything would help, thank you.

    opened by johnkuan506 5
  • I hear echo & background noise on the DEMO wavs

    This can happen because of blurring in the mel-spectrogram (in most cases the start of a sound should have a sharp edge), which can in turn be caused by bad attention. I would recommend trying diagonal guided attention (DGA) during training (https://arxiv.org/abs/1710.08969).

    opened by creotiv 4
  • A run Problem(LJSpeech)

    File "preprocess.py", line 20, in <module>
      preprocessor.build_from_path()
    File "D:\UW-Detection\PortaSpeech\preprocessor\preprocessor.py", line 129, in build_from_path
      n_frames += n
    UnboundLocalError: local variable 'n' referenced before assignment

    I hit this problem with NATSpeech (BiaoBei dataset), and I hit it again with this code on the LJSpeech dataset. I tried using global, but that did not work ......

    opened by yanzhuangzhuang-beep 2
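An UnboundLocalError of this kind generally means a variable is assigned on only some code paths and is then read on a path where it was never assigned. The sketch below shows one way to structure such a loop so that the counter is only touched after a successful assignment. It is a hypothetical illustration, not the repository's code: count_frames, fake_process, and the sample data are invented, and the assumption that utterances without an alignment file are skipped is a guess.

```python
def count_frames(utterances, process):
    """Sum frame counts, skipping utterances that could not be processed."""
    n_frames = 0
    for utt in utterances:
        result = process(utt)   # hypothetical: returns None when skipped
        if result is None:
            continue            # skip before reading n, so n is never unbound
        _, n = result
        n_frames += n
    return n_frames


def fake_process(utt):
    # Hypothetical stand-in: utterances without an alignment are skipped.
    if utt["textgrid"] is None:
        return None
    return utt["name"], utt["frames"]


data = [
    {"name": "a", "textgrid": None, "frames": 0},
    {"name": "b", "textgrid": "b.TextGrid", "frames": 120},
]
total = count_frames(data, fake_process)
```

If every utterance ends up skipped, the total stays at 0, which may be a hint that the alignment files are not where the preprocessing step expects them.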
  • Inference issue

    Basically, I tried to run it in the Google Colab

    1st cell

    %cd /content/
    !git clone https://github.com/keonlee9420/PortaSpeech
    %cd /content/PortaSpeech/
    !pip install -r /content/PortaSpeech/requirements.txt
    

    2nd

    id_big = '1VTotGmE42a19bevwgQ9mhPkXzQvKzl8q'
    id_small = '1Y0IGlc4zJ7XN5sh4aPWLTeQ80D9ZhfbB'
    
    !mkdir /content/PortaSpeech/output/
    !mkdir /content/PortaSpeech/output/ckpt/
    !mkdir /content/PortaSpeech/output/ckpt/DATASET/
    %cd /content/PortaSpeech/output/ckpt/DATASET/
    !gdown --id $id_big 
    !gdown --id $id_small 
    %cd /content/PortaSpeech
    

    3rd

    %cd /content/PortaSpeech
    !python3 synthesize.py --text "Moved to Site-19 1993. Origin is as of yet unknown. It is constructed from concrete and rebar with traces of Krylon brand spray paint." \
                            --restore_step 125000 --mode single --dataset DATASET
    

    and this is what I've got:

    /content/PortaSpeech
    [nltk_data] Downloading package averaged_perceptron_tagger to
    [nltk_data]     /root/nltk_data...
    [nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
    [nltk_data] Downloading package cmudict to /root/nltk_data...
    [nltk_data]   Unzipping corpora/cmudict.zip.
    2021-10-26 10:57:51.803863: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
    Traceback (most recent call last):
      File "synthesize.py", line 138, in <module>
        args.dataset)
      File "/content/PortaSpeech/utils/tools.py", line 19, in get_configs_of
        os.path.join(config_dir, "preprocess.yaml"), "r"), Loader=yaml.FullLoader)
    FileNotFoundError: [Errno 2] No such file or directory: './config/DATASET/preprocess.yaml'
    

    What exactly is this 'preprocess.yaml'?

    opened by dobrosketchkun 2
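The traceback shows the script looking for ./config/DATASET/preprocess.yaml, which suggests that DATASET was passed literally instead of being replaced by a real dataset name such as LJSpeech. Below is a small sketch for listing the usable names, assuming each dataset has its own sub-directory under the config folder, as the path in the error implies. The helper available_datasets is hypothetical and not part of the repository; the demo runs on a temporary directory that stands in for ./config.

```python
import os
import tempfile


def available_datasets(config_root):
    """List sub-directories of config_root that contain a preprocess.yaml."""
    names = []
    for name in sorted(os.listdir(config_root)):
        if os.path.isfile(os.path.join(config_root, name, "preprocess.yaml")):
            names.append(name)
    return names


# Demo on a throwaway directory tree that stands in for ./config.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "LJSpeech"))
    open(os.path.join(root, "LJSpeech", "preprocess.yaml"), "w").close()
    found = available_datasets(root)

print(found)
```

Any name returned by such a check would be a candidate value for --dataset and for the output/ckpt/ sub-directory.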
  • missing keys

    Traceback (most recent call last):
      File "synthesize.py", line 153, in <module>
        model = get_model(args, configs, device, train=False)
      File "/content/PortaSpeech/utils/model.py", line 21, in get_model
        model.load_state_dict(ckpt["model"])
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for PortaSpeech:
    Missing key(s) in state_dict: "linguistic_encoder.phoneme_encoder.attn_layers.3.emb_rel_k", "linguistic_encoder.phoneme_encoder.attn_layers.3.emb_rel_v", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_q.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_q.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_k.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_k.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_v.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_v.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_o.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_o.bias", "linguistic_encoder.phoneme_encoder.norm_layers_1.3.gamma", "linguistic_encoder.phoneme_encoder.norm_layers_1.3.beta", "linguistic_encoder.phoneme_encoder.ffn_layers.3.conv.weight", "linguistic_encoder.phoneme_encoder.ffn_layers.3.conv.bias", "linguistic_encoder.phoneme_encoder.norm_layers_2.3.gamma", "linguistic_encoder.phoneme_encoder.norm_layers_2.3.beta", "linguistic_encoder.word_encoder.attn_layers.3.emb_rel_k", "linguistic_encoder.word_encoder.attn_layers.3.emb_rel_v", "linguistic_encoder.word_encoder.attn_layers.3.conv_q.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_q.bias", "linguistic_encoder.word_encoder.attn_layers.3.conv_k.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_k.bias", "linguistic_encoder.word_encoder.attn_layers.3.conv_v.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_v.bias", 
"linguistic_encoder.word_encoder.attn_layers.3.conv_o.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_o.bias", "linguistic_encoder.word_encoder.norm_layers_1.3.gamma", "linguistic_encoder.word_encoder.norm_layers_1.3.beta", "linguistic_encoder.word_encoder.ffn_layers.3.conv.weight", "linguistic_encoder.word_encoder.ffn_layers.3.conv.bias", "linguistic_encoder.word_encoder.norm_layers_2.3.gamma", "linguistic_encoder.word_encoder.norm_layers_2.3.beta", "variational_generator.flow.flows.0.enc.in_layers.3.bias", "variational_generator.flow.flows.0.enc.in_layers.3.weight_g", "variational_generator.flow.flows.0.enc.in_layers.3.weight_v", "variational_generator.flow.flows.0.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.0.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.0.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.2.enc.in_layers.3.bias", "variational_generator.flow.flows.2.enc.in_layers.3.weight_g", "variational_generator.flow.flows.2.enc.in_layers.3.weight_v", "variational_generator.flow.flows.2.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.2.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.2.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.4.enc.in_layers.3.bias", "variational_generator.flow.flows.4.enc.in_layers.3.weight_g", "variational_generator.flow.flows.4.enc.in_layers.3.weight_v", "variational_generator.flow.flows.4.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.4.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.4.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.6.enc.in_layers.3.bias", "variational_generator.flow.flows.6.enc.in_layers.3.weight_g", "variational_generator.flow.flows.6.enc.in_layers.3.weight_v", "variational_generator.flow.flows.6.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.6.enc.res_skip_layers.3.weight_g", 
"variational_generator.flow.flows.6.enc.res_skip_layers.3.weight_v", "variational_generator.dec_wn.in_layers.3.bias", "variational_generator.dec_wn.in_layers.3.weight_g", "variational_generator.dec_wn.in_layers.3.weight_v", "variational_generator.dec_wn.res_skip_layers.3.bias", "variational_generator.dec_wn.res_skip_layers.3.weight_g", "variational_generator.dec_wn.res_skip_layers.3.weight_v", "postnet.flows.24.logs", "postnet.flows.24.bias", "postnet.flows.25.weight", "postnet.flows.26.start.bias", "postnet.flows.26.start.weight_g", "postnet.flows.26.start.weight_v", "postnet.flows.26.end.weight", "postnet.flows.26.end.bias", "postnet.flows.26.cond_layer.bias", "postnet.flows.26.cond_layer.weight_g", "postnet.flows.26.cond_layer.weight_v", "postnet.flows.26.wn.in_layers.0.bias", "postnet.flows.26.wn.in_layers.0.weight_g", "postnet.flows.26.wn.in_layers.0.weight_v", "postnet.flows.26.wn.in_layers.1.bias", "postnet.flows.26.wn.in_layers.1.weight_g", "postnet.flows.26.wn.in_layers.1.weight_v", "postnet.flows.26.wn.in_layers.2.bias", "postnet.flows.26.wn.in_layers.2.weight_g", "postnet.flows.26.wn.in_layers.2.weight_v", "postnet.flows.26.wn.res_skip_layers.0.bias", "postnet.flows.26.wn.res_skip_layers.0.weight_g", "postnet.flows.26.wn.res_skip_layers.0.weight_v", "postnet.flows.26.wn.res_skip_layers.1.bias", "postnet.flows.26.wn.res_skip_layers.1.weight_g", "postnet.flows.26.wn.res_skip_layers.1.weight_v", "postnet.flows.26.wn.res_skip_layers.2.bias", "postnet.flows.26.wn.res_skip_layers.2.weight_g", "postnet.flows.26.wn.res_skip_layers.2.weight_v", "postnet.flows.27.logs", "postnet.flows.27.bias", "postnet.flows.28.weight", "postnet.flows.29.start.bias", "postnet.flows.29.start.weight_g", "postnet.flows.29.start.weight_v", "postnet.flows.29.end.weight", "postnet.flows.29.end.bias", "postnet.flows.29.cond_layer.bias", "postnet.flows.29.cond_layer.weight_g", "postnet.flows.29.cond_layer.weight_v", "postnet.flows.29.wn.in_layers.0.bias", 
"postnet.flows.29.wn.in_layers.0.weight_g", "postnet.flows.29.wn.in_layers.0.weight_v", "postnet.flows.29.wn.in_layers.1.bias", "postnet.flows.29.wn.in_layers.1.weight_g", "postnet.flows.29.wn.in_layers.1.weight_v", "postnet.flows.29.wn.in_layers.2.bias", "postnet.flows.29.wn.in_layers.2.weight_g", "postnet.flows.29.wn.in_layers.2.weight_v", "postnet.flows.29.wn.res_skip_layers.0.bias", "postnet.flows.29.wn.res_skip_layers.0.weight_g", "postnet.flows.29.wn.res_skip_layers.0.weight_v", "postnet.flows.29.wn.res_skip_layers.1.bias", "postnet.flows.29.wn.res_skip_layers.1.weight_g", "postnet.flows.29.wn.res_skip_layers.1.weight_v", "postnet.flows.29.wn.res_skip_layers.2.bias", "postnet.flows.29.wn.res_skip_layers.2.weight_g", "postnet.flows.29.wn.res_skip_layers.2.weight_v", "postnet.flows.30.logs", "postnet.flows.30.bias", "postnet.flows.31.weight", "postnet.flows.32.start.bias", "postnet.flows.32.start.weight_g", "postnet.flows.32.start.weight_v", "postnet.flows.32.end.weight", "postnet.flows.32.end.bias", "postnet.flows.32.cond_layer.bias", "postnet.flows.32.cond_layer.weight_g", "postnet.flows.32.cond_layer.weight_v", "postnet.flows.32.wn.in_layers.0.bias", "postnet.flows.32.wn.in_layers.0.weight_g", "postnet.flows.32.wn.in_layers.0.weight_v", "postnet.flows.32.wn.in_layers.1.bias", "postnet.flows.32.wn.in_layers.1.weight_g", "postnet.flows.32.wn.in_layers.1.weight_v", "postnet.flows.32.wn.in_layers.2.bias", "postnet.flows.32.wn.in_layers.2.weight_g", "postnet.flows.32.wn.in_layers.2.weight_v", "postnet.flows.32.wn.res_skip_layers.0.bias", "postnet.flows.32.wn.res_skip_layers.0.weight_g", "postnet.flows.32.wn.res_skip_layers.0.weight_v", "postnet.flows.32.wn.res_skip_layers.1.bias", "postnet.flows.32.wn.res_skip_layers.1.weight_g", "postnet.flows.32.wn.res_skip_layers.1.weight_v", "postnet.flows.32.wn.res_skip_layers.2.bias", "postnet.flows.32.wn.res_skip_layers.2.weight_g", "postnet.flows.32.wn.res_skip_layers.2.weight_v", "postnet.flows.33.logs", 
"postnet.flows.33.bias", "postnet.flows.34.weight", "postnet.flows.35.start.bias", "postnet.flows.35.start.weight_g", "postnet.flows.35.start.weight_v", "postnet.flows.35.end.weight", "postnet.flows.35.end.bias", "postnet.flows.35.cond_layer.bias", "postnet.flows.35.cond_layer.weight_g", "postnet.flows.35.cond_layer.weight_v", "postnet.flows.35.wn.in_layers.0.bias", "postnet.flows.35.wn.in_layers.0.weight_g", "postnet.flows.35.wn.in_layers.0.weight_v", "postnet.flows.35.wn.in_layers.1.bias", "postnet.flows.35.wn.in_layers.1.weight_g", "postnet.flows.35.wn.in_layers.1.weight_v", "postnet.flows.35.wn.in_layers.2.bias", "postnet.flows.35.wn.in_layers.2.weight_g", "postnet.flows.35.wn.in_layers.2.weight_v", "postnet.flows.35.wn.res_skip_layers.0.bias", "postnet.flows.35.wn.res_skip_layers.0.weight_g", "postnet.flows.35.wn.res_skip_layers.0.weight_v", "postnet.flows.35.wn.res_skip_layers.1.bias", "postnet.flows.35.wn.res_skip_layers.1.weight_g", "postnet.flows.35.wn.res_skip_layers.1.weight_v", "postnet.flows.35.wn.res_skip_layers.2.bias", "postnet.flows.35.wn.res_skip_layers.2.weight_g", "postnet.flows.35.wn.res_skip_layers.2.weight_v". size mismatch for linguistic_encoder.abs_position_enc: copying a param with shape torch.Size([1, 1001, 128]) from checkpoint, the shape in current model is torch.Size([1, 1001, 192]).

    opened by AK391 2
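The pattern in this error (whole layers with index 3 missing, plus a 128 versus 192 width mismatch) is what one would expect if the checkpoint and the model configuration describe different model sizes; that reading is an assumption and is not confirmed in the thread. A minimal sketch for comparing the two sides before calling load_state_dict follows. It works on plain {name: shape} mappings so that it runs without PyTorch; with PyTorch, such a mapping could be built as {k: tuple(v.shape) for k, v in state_dict.items()}. The helper diff_state_dicts and the second sample shape are hypothetical.

```python
def diff_state_dicts(ckpt_shapes, model_shapes):
    """Compare two {parameter name: shape tuple} mappings."""
    # Parameters the model expects but the checkpoint does not provide.
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    # Parameters the checkpoint provides but the model does not have.
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    # Parameters present on both sides with different shapes.
    mismatched = {
        k: (ckpt_shapes[k], model_shapes[k])
        for k in ckpt_shapes
        if k in model_shapes and ckpt_shapes[k] != model_shapes[k]
    }
    return missing, unexpected, mismatched


ckpt = {"linguistic_encoder.abs_position_enc": (1, 1001, 128)}
model = {
    "linguistic_encoder.abs_position_enc": (1, 1001, 192),
    "postnet.flows.24.logs": (1, 80, 1),  # made-up shape for illustration
}
missing, unexpected, mismatched = diff_state_dicts(ckpt, model)
print(missing, unexpected, mismatched)
```

Many missing keys together with shape mismatches point to an architecture or configuration difference rather than a corrupted file.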
  • About def get_mask_from_lengths(lengths, max_len=None):

    Thank you very much!

    import torch

    def get_mask_from_lengths(lengths, max_len=None):
        batch_size = lengths.shape[0]
        if max_len is None:
            max_len = torch.max(lengths).item()

        # ids[i, j] = j for every sequence i in the batch
        ids = torch.arange(0, max_len).unsqueeze(0).expand(batch_size, -1).to(lengths.device)
        # mask is True at padded positions (index >= sequence length)
        mask = ids >= lengths.unsqueeze(1).expand(-1, max_len)

        # inverted: True at valid (non-padded) positions
        return ~mask

    In PortaSpeech, the return is ~mask, while in DiffGAN-TTS it is mask. I want to know the difference between them!

    opened by qw1260497397 0
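Judging from the quoted function alone, the two variants differ only in polarity: mask is True at padded positions (index >= length), while ~mask is True at valid positions. The pure-Python sketch below restates both conventions on nested lists so that the outputs are easy to inspect; padding_mask and valid_mask are hypothetical names, and the code is not taken from either repository.

```python
def padding_mask(lengths, max_len=None):
    """True where a position is padding (index >= sequence length)."""
    if max_len is None:
        max_len = max(lengths)
    return [[j >= n for j in range(max_len)] for n in lengths]


def valid_mask(lengths, max_len=None):
    """Logical inverse: True where a position holds real data."""
    return [[not p for p in row] for row in padding_mask(lengths, max_len)]


pad = padding_mask([1, 3])    # plays the role of mask
valid = valid_mask([1, 3])    # plays the role of ~mask
print(pad)
print(valid)
```

Either convention can work as long as every consumer of the mask (for example a masked fill or an attention mask) assumes the same polarity.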
  • RuntimeError: The size of tensor a (256) must match the size of tensor b (45) at non-singleton dimension 2

    Hi @keonlee9420, a (256) is encoder_hidden and b (45) is the first max_time of src_seq in word_level_pooling. I would very much like to know how to solve this problem. Thank you very much!

    I notice that b (45) varies.

    opened by qw1260497397 0
  • small(320000.pth.tar) weights incompatibility

    2022-11-11 22:31:08.004017: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll

    Device of PortaSpeech: cpu
    Traceback (most recent call last):
      File "synthesize.py", line 153, in <module>
        model = get_model(args, configs, device, train=False)
      File "D:\projects\PortaSpeech\utils\model.py", line 21, in get_model
        model.load_state_dict(ckpt["model"])
      File "C:\ProgramData\Miniconda3\envs\tts_env\lib\site-packages\torch\nn\modules\module.py", line 1223, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for PortaSpeech:
    Missing key(s) in state_dict: "linguistic_encoder.phoneme_encoder.attn_layers.3.emb_rel_k", "linguistic_encoder.phoneme_encoder.attn_layers.3.emb_rel_v", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_q.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_q.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_k.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_k.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_v.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_v.bias", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_o.weight", "linguistic_encoder.phoneme_encoder.attn_layers.3.conv_o.bias", "linguistic_encoder.phoneme_encoder.norm_layers_1.3.gamma", "linguistic_encoder.phoneme_encoder.norm_layers_1.3.beta", "linguistic_encoder.phoneme_encoder.ffn_layers.3.conv.weight", "linguistic_encoder.phoneme_encoder.ffn_layers.3.conv.bias", "linguistic_encoder.phoneme_encoder.norm_layers_2.3.gamma", "linguistic_encoder.phoneme_encoder.norm_layers_2.3.beta", "linguistic_encoder.word_encoder.attn_layers.3.emb_rel_k", "linguistic_encoder.word_encoder.attn_layers.3.emb_rel_v", "linguistic_encoder.word_encoder.attn_layers.3.conv_q.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_q.bias", "linguistic_encoder.word_encoder.attn_layers.3.conv_k.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_k.bias", "linguistic_encoder.word_encoder.attn_layers.3.conv_v.weight", 
"linguistic_encoder.word_encoder.attn_layers.3.conv_v.bias", "linguistic_encoder.word_encoder.attn_layers.3.conv_o.weight", "linguistic_encoder.word_encoder.attn_layers.3.conv_o.bias", "linguistic_encoder.word_encoder.norm_layers_1.3.gamma", "linguistic_encoder.word_encoder.norm_layers_1.3.beta", "linguistic_encoder.word_encoder.ffn_layers.3.conv.weight", "linguistic_encoder.word_encoder.ffn_layers.3.conv.bias", "linguistic_encoder.word_encoder.norm_layers_2.3.gamma", "linguistic_encoder.word_encoder.norm_layers_2.3.beta", "variational_generator.flow.flows.0.enc.in_layers.3.bias", "variational_generator.flow.flows.0.enc.in_layers.3.weight_g", "variational_generator.flow.flows.0.enc.in_layers.3.weight_v", "variational_generator.flow.flows.0.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.0.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.0.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.2.enc.in_layers.3.bias", "variational_generator.flow.flows.2.enc.in_layers.3.weight_g", "variational_generator.flow.flows.2.enc.in_layers.3.weight_v", "variational_generator.flow.flows.2.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.2.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.2.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.4.enc.in_layers.3.bias", "variational_generator.flow.flows.4.enc.in_layers.3.weight_g", "variational_generator.flow.flows.4.enc.in_layers.3.weight_v", "variational_generator.flow.flows.4.enc.res_skip_layers.3.bias", "variational_generator.flow.flows.4.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.4.enc.res_skip_layers.3.weight_v", "variational_generator.flow.flows.6.enc.in_layers.3.bias", "variational_generator.flow.flows.6.enc.in_layers.3.weight_g", "variational_generator.flow.flows.6.enc.in_layers.3.weight_v", "variational_generator.flow.flows.6.enc.res_skip_layers.3.bias", 
"variational_generator.flow.flows.6.enc.res_skip_layers.3.weight_g", "variational_generator.flow.flows.6.enc.res_skip_layers.3.weight_v", "variational_generator.dec_wn.in_layers.3.bias", "variational_generator.dec_wn.in_layers.3.weight_g", "variational_generator.dec_wn.in_layers.3.weight_v", "variational_generator.dec_wn.res_skip_layers.3.bias", "variational_generator.dec_wn.res_skip_layers.3.weight_g", "variational_generator.dec_wn.res_skip_layers.3.weight_v", "postnet.flows.24.logs", "postnet.flows.24.bias", "postnet.flows.25.weight", "postnet.flows.26.start.bias", "postnet.flows.26.start.weight_g", "postnet.flows.26.start.weight_v", "postnet.flows.26.end.weight", "postnet.flows.26.end.bias", "postnet.flows.26.cond_layer.bias", "postnet.flows.26.cond_layer.weight_g", "postnet.flows.26.cond_layer.weight_v", "postnet.flows.26.wn.in_layers.0.bias", "postnet.flows.26.wn.in_layers.0.weight_g", "postnet.flows.26.wn.in_layers.0.weight_v", "postnet.flows.26.wn.in_layers.1.bias", "postnet.flows.26.wn.in_layers.1.weight_g", "postnet.flows.26.wn.in_layers.1.weight_v", "postnet.flows.26.wn.in_layers.2.bias", "postnet.flows.26.wn.in_layers.2.weight_g", "postnet.flows.26.wn.in_layers.2.weight_v", "postnet.flows.26.wn.res_skip_layers.0.bias", "postnet.flows.26.wn.res_skip_layers.0.weight_g", "postnet.flows.26.wn.res_skip_layers.0.weight_v", "postnet.flows.26.wn.res_skip_layers.1.bias", "postnet.flows.26.wn.res_skip_layers.1.weight_g", "postnet.flows.26.wn.res_skip_layers.1.weight_v", "postnet.flows.26.wn.res_skip_layers.2.bias", "postnet.flows.26.wn.res_skip_layers.2.weight_g", "postnet.flows.26.wn.res_skip_layers.2.weight_v", "postnet.flows.27.logs", "postnet.flows.27.bias", "postnet.flows.28.weight", "postnet.flows.29.start.bias", "postnet.flows.29.start.weight_g", "postnet.flows.29.start.weight_v", "postnet.flows.29.end.weight", "postnet.flows.29.end.bias", "postnet.flows.29.cond_layer.bias", "postnet.flows.29.cond_layer.weight_g", "postnet.flows.29.cond_layer.weight_v", 
"postnet.flows.29.wn.in_layers.0.bias", "postnet.flows.29.wn.in_layers.0.weight_g", "postnet.flows.29.wn.in_layers.0.weight_v", "postnet.flows.29.wn.in_layers.1.bias", "postnet.flows.29.wn.in_layers.1.weight_g", "postnet.flows.29.wn.in_layers.1.weight_v", "postnet.flows.29.wn.in_layers.2.bias", "postnet.flows.29.wn.in_layers.2.weight_g", "postnet.flows.29.wn.in_layers.2.weight_v", "postnet.flows.29.wn.res_skip_layers.0.bias", "postnet.flows.29.wn.res_skip_layers.0.weight_g", "postnet.flows.29.wn.res_skip_layers.0.weight_v", "postnet.flows.29.wn.res_skip_layers.1.bias", "postnet.flows.29.wn.res_skip_layers.1.weight_g", "postnet.flows.29.wn.res_skip_layers.1.weight_v", "postnet.flows.29.wn.res_skip_layers.2.bias", "postnet.flows.29.wn.res_skip_layers.2.weight_g", "postnet.flows.29.wn.res_skip_layers.2.weight_v", "postnet.flows.30.logs", "postnet.flows.30.bias", "postnet.flows.31.weight", "postnet.flows.32.start.bias", "postnet.flows.32.start.weight_g", "postnet.flows.32.start.weight_v", "postnet.flows.32.end.weight", "postnet.flows.32.end.bias", "postnet.flows.32.cond_layer.bias", "postnet.flows.32.cond_layer.weight_g", "postnet.flows.32.cond_layer.weight_v", "postnet.flows.32.wn.in_layers.0.bias", "postnet.flows.32.wn.in_layers.0.weight_g", "postnet.flows.32.wn.in_layers.0.weight_v", "postnet.flows.32.wn.in_layers.1.bias", "postnet.flows.32.wn.in_layers.1.weight_g", "postnet.flows.32.wn.in_layers.1.weight_v", "postnet.flows.32.wn.in_layers.2.bias", "postnet.flows.32.wn.in_layers.2.weight_g", "postnet.flows.32.wn.in_layers.2.weight_v", "postnet.flows.32.wn.res_skip_layers.0.bias", "postnet.flows.32.wn.res_skip_layers.0.weight_g", "postnet.flows.32.wn.res_skip_layers.0.weight_v", "postnet.flows.32.wn.res_skip_layers.1.bias", "postnet.flows.32.wn.res_skip_layers.1.weight_g", "postnet.flows.32.wn.res_skip_layers.1.weight_v", "postnet.flows.32.wn.res_skip_layers.2.bias", "postnet.flows.32.wn.res_skip_layers.2.weight_g", "postnet.flows.32.wn.res_skip_layers.2.weight_v", 
"postnet.flows.33.logs", "postnet.flows.33.bias", "postnet.flows.34.weight", "postnet.flows.35.start.bias", "postnet.flows.35.start.weight_g", "postnet.flows.35.start.weight_v", "postnet.flows.35.end.weight", "postnet.flows.35.end.bias", "postnet.flows.35.cond_layer.bias", "postnet.flows.35.cond_layer.weight_g", "postnet.flows.35.cond_layer.weight_v", "postnet.flows.35.wn.in_layers.0.bias", "postnet.flows.35.wn.in_layers.0.weight_g", "postnet.flows.35.wn.in_layers.0.weight_v", "postnet.flows.35.wn.in_layers.1.bias", "postnet.flows.35.wn.in_layers.1.weight_g", "postnet.flows.35.wn.in_layers.1.weight_v", "postnet.flows.35.wn.in_layers.2.bias", "postnet.flows.35.wn.in_layers.2.weight_g", "postnet.flows.35.wn.in_layers.2.weight_v", "postnet.flows.35.wn.res_skip_layers.0.bias", "postnet.flows.35.wn.res_skip_layers.0.weight_g", "postnet.flows.35.wn.res_skip_layers.0.weight_v", "postnet.flows.35.wn.res_skip_layers.1.bias", "postnet.flows.35.wn.res_skip_layers.1.weight_g", "postnet.flows.35.wn.res_skip_layers.1.weight_v", "postnet.flows.35.wn.res_skip_layers.2.bias", "postnet.flows.35.wn.res_skip_layers.2.weight_g", "postnet.flows.35.wn.res_skip_layers.2.weight_v". size mismatch for linguistic_encoder.abs_position_enc: copying a param with shape torch.Size([1, 1001, 128]) from checkpoint, the shape in current model is torch.Size([1, 1001, 192]). size mismatch for linguistic_encoder.kv_position_enc: copying a param with shape torch.Size([1, 1001, 128]) from checkpoint, the shape in current model is torch.Size([1, 1001, 192]). size mismatch for linguistic_encoder.q_position_enc: copying a param with shape torch.Size([1, 1001, 128]) from checkpoint, the shape in current model is torch.Size([1, 1001, 192]). size mismatch for linguistic_encoder.src_emb.weight: copying a param with shape torch.Size([361, 128]) from checkpoint, the shape in current model is torch.Size([361, 192]). 
size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.0.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). 
size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.1.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). 
size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.phoneme_encoder.attn_layers.2.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). 
size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.0.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.0.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.1.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.1.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.2.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_1.2.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.0.conv.weight: copying a param with shape torch.Size([128, 128, 3]) from checkpoint, the shape in current model is torch.Size([192, 192, 5]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.0.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.1.conv.weight: copying a param with shape torch.Size([128, 128, 3]) from checkpoint, the shape in current model is torch.Size([192, 192, 5]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.1.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). 
size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.2.conv.weight: copying a param with shape torch.Size([128, 128, 3]) from checkpoint, the shape in current model is torch.Size([192, 192, 5]). size mismatch for linguistic_encoder.phoneme_encoder.ffn_layers.2.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.0.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.0.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.1.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.1.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.2.gamma: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.phoneme_encoder.norm_layers_2.2.beta: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). 
size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.0.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.emb_rel_v: copying a param with shape torch.Size([1, 9, 64]) from checkpoint, the shape in current model is torch.Size([1, 9, 96]). 
size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_q.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_q.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_k.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_k.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_v.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_v.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_o.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([192, 192, 1]). size mismatch for linguistic_encoder.word_encoder.attn_layers.1.conv_o.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for linguistic_encoder.word_encoder.attn_layers.2.emb_rel_k: copying a param with shape torch.Size([1, 9, 64]) from...`

    opened by ironmann250 0
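The log in this issue reports two kinds of problems: keys for layer index 3 that the checkpoint does not contain, and tensors whose channel size is 128 in the checkpoint but 192 in the current model. One plausible reading, not confirmed in the thread, is that the checkpoint was trained with a different model configuration (for example a smaller variant) than the one selected at synthesis time. A minimal sketch for summarizing such differences before calling `load_state_dict` follows. The helper name `diff_state_dicts` is hypothetical, and it works on plain name-to-shape mappings so it runs without PyTorch.

```python
def diff_state_dicts(checkpoint_shapes, model_shapes):
    """Compare two name -> shape mappings and report how they differ."""
    # Keys the model expects but the checkpoint lacks ("Missing key(s)").
    missing = sorted(k for k in model_shapes if k not in checkpoint_shapes)
    # Keys the checkpoint carries but the model does not define.
    unexpected = sorted(k for k in checkpoint_shapes if k not in model_shapes)
    # Keys present on both sides whose shapes disagree ("size mismatch").
    mismatched = {
        k: (tuple(checkpoint_shapes[k]), tuple(model_shapes[k]))
        for k in model_shapes
        if k in checkpoint_shapes
        and tuple(checkpoint_shapes[k]) != tuple(model_shapes[k])
    }
    return missing, unexpected, mismatched


# The first two entries on each side are copied from the log above. The shape
# given for ffn_layers.3 is an assumption: the log only says the key is missing.
checkpoint_shapes = {
    "linguistic_encoder.src_emb.weight": (361, 128),
    "linguistic_encoder.phoneme_encoder.ffn_layers.0.conv.weight": (128, 128, 3),
}
model_shapes = {
    "linguistic_encoder.src_emb.weight": (361, 192),
    "linguistic_encoder.phoneme_encoder.ffn_layers.0.conv.weight": (192, 192, 5),
    "linguistic_encoder.phoneme_encoder.ffn_layers.3.conv.weight": (192, 192, 5),
}

missing, unexpected, mismatched = diff_state_dicts(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in sorted(mismatched.items()):
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With real tensors, the two mappings could be built as `{name: tuple(t.shape) for name, t in state_dict.items()}` from the loaded checkpoint and from `model.state_dict()`.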
  • RuntimeError: Found dtype Long but expected Float

      File "train.py", line 122, in main
        model_update(model, step, G_loss, optG_fs2)
      File "train.py", line 77, in model_update
        loss = (loss / grad_acc_step).backward()
      File "C:\Users\12604\Anaconda3\envs\pytorch\lib\site-packages\torch\tensor.py", line 221, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "C:\Users\12604\Anaconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py", line 132, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: Found dtype Long but expected Float

    This problem occurs when the loss is back-propagated. How can I solve it? This is the dtype of the loss: (image)

    opened by qw1260497397 1
  • A question about the output of Phoneme Encoding

    After the linguistic encoder is implemented, the text is fed into the character embedding layer, and the output contains NaN values. How can I solve this problem?

    image

    opened by qw1260497397 2
  • Multi-speaker TTS

    Dear sir,

    First of all, I really appreciate your contribution in this amazing repo! However, it would be perfect if you could add multi-speaker TTS here. I can see that spker_emb is not used at the moment. May I know when you might consider this and extend the ability of this impressive model?

    Thanks,

    Max

    opened by manhph2211 1
  • The meaning of inputs[11:] in model.loss.py

    Hi @keonlee9420, I cannot understand the meaning of inputs[11:] in model.loss.py:

        def forward(self, inputs, predictions, step):
            (
                mel_targets,
                *_,
            ) = inputs[11:]

    Thank you very much!

    opened by qw1260497397 4
Releases(v0.2.0)
Owner
Keon Lee
Expressive Speech Synthesis | Conversational AI | Open-domain Dialog | NLP | Generative Models | Empathic Computing | HCI