Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Overview

Word2Wave

Word2Wave is a simple method for text-controlled GAN audio generation. You can either follow the setup instructions below and use the source code and CLI provided in this repo, or experiment with the Colab notebook provided. Note that, in both cases, you will need to train a WaveGAN model first. You can also hear some examples here.

Colab playground: Open In Colab

Setup

First, clone the repository:

git clone https://www.github.com/ilaria-manco/word2wave

Create a virtual environment and install the requirements:

cd word2wave
python3 -m venv /path/to/venv/
source /path/to/venv/bin/activate
pip install -r requirements.txt

WaveGAN generator

Word2Wave requires a pre-trained WaveGAN generator. In my experiments, I trained my own on the Freesound Loop Dataset (FSL10K), using this implementation. To download the dataset, run:

$ wget "https://zenodo.org/record/3967852/files/FSL10K.zip?download=1" -O FSL10K.zip

and then train a model following the instructions in the WaveGAN repo. Once trained, place the model in the wavegan folder:

πŸ“‚wavegan
  β”— πŸ“œgan_.tar
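
As a rough sketch of this staging step (assuming you unpacked the dataset next to the repo and that your training run produced a checkpoint called gan_fsl10k.tar; both the checkpoint name and the wavegan-pytorch output path below are placeholders):

$ # unpack the dataset downloaded above before training
$ unzip FSL10K.zip -d FSL10K
$ # once WaveGAN training is done, copy the checkpoint into the folder Word2Wave expects
$ mkdir -p wavegan
$ cp /path/to/wavegan-pytorch/checkpoints/gan_fsl10k.tar wavegan/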

Pre-trained COALA encoders

You'll need to download the pre-trained weights for the COALA tag and audio encoders from the official repo. Note that the repo provides weights for the model trained with different configurations (e.g. different weightings of the loss components); for more details, refer to the original code and paper. To download the model weights, run the following commands (or the equivalent for your chosen model configuration):

$ wget https://raw.githubusercontent.com/xavierfav/coala/master/saved_models/dual_ae_c/audio_encoder_epoch_200.pt
$ wget https://raw.githubusercontent.com/xavierfav/coala/master/saved_models/dual_ae_c/tag_encoder_epoch_200.pt

Once downloaded, place them in the coala/models folder:

πŸ“‚coala
 ┣ πŸ“‚models
   ┣ πŸ“‚dual_ae_c
     ┣ πŸ“œaudio_encoder_epoch_200.pt
     β”— πŸ“œtag_encoder_epoch_200.pt
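
If you ran the wget commands above from the repository root, the following is enough to create that layout:

$ # create the expected folder and move the downloaded encoder weights into it
$ mkdir -p coala/models/dual_ae_c
$ mv audio_encoder_epoch_200.pt tag_encoder_epoch_200.pt coala/models/dual_ae_c/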

How to use

For text-to-audio generation using the default parameters, simply run:

$ python main.py "text prompt" --wavegan_path <path/to/wavegan/model> --output_dir <path/to/output/dir>
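
As a concrete (hypothetical) example, reusing the placeholder checkpoint name from above and writing results to a generated/ folder:

$ # generate audio for the prompt "water drops" and save the output under generated/
$ python main.py "water drops" --wavegan_path wavegan/gan_fsl10k.tar --output_dir generated/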

Citations

Some of the code in this repo is adapted from the official COALA repo and @mostafaelaraby's PyTorch implementation of the WaveGAN model.

@inproceedings{donahue2018adversarial,
  title={Adversarial Audio Synthesis},
  author={Donahue, Chris and McAuley, Julian and Puckette, Miller},
  booktitle={International Conference on Learning Representations},
  year={2018}
}
@article{favory2020coala,
  title={Coala: Co-aligned autoencoders for learning semantically enriched audio representations},
  author={Favory, Xavier and Drossos, Konstantinos and Virtanen, Tuomas and Serra, Xavier},
  journal={arXiv preprint arXiv:2006.08386},
  year={2020}
}

Comments
  • Colab notebook: Where are weights?

    Thanks for sharing the notebook. Could it perhaps be documented a bit more, to make it easier for new users (like me) to get it working without crashing? I can't figure out how to fill in the missing information about where to find the weights.

    Below is a log of my run...

    !nvidia-smi -L
    GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-5218d88a-592a-b7c2-d10c-ff61031ab247)
    

    Mount your drive

    Mounted at /content/drive
    

    Install Word2Wave, import necessary packages

    Cloning into 'word2wave'...
    remote: Enumerating objects: 349, done.
    remote: Counting objects: 100% (349/349), done.
    remote: Compressing objects: 100% (311/311), done.
    remote: Total 349 (delta 185), reused 81 (delta 33), pack-reused 0
    Receiving objects: 100% (349/349), 1.10 MiB | 5.21 MiB/s, done.
    Resolving deltas: 100% (185/185), done.
    /content/word2wave
    Requirement already satisfied: matplotlib>=2.2.4 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 1)) (3.2.2)
    Requirement already satisfied: numpy>=1.16.3 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 2)) (1.19.5)
    Collecting librosa==0.6.3
      Downloading librosa-0.6.3.tar.gz (1.6 MB)
         |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.6 MB 5.0 MB/s 
    Collecting pescador>=2.0.1
      Downloading pescador-2.1.0.tar.gz (20 kB)
    Requirement already satisfied: torch>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 5)) (1.10.0+cu111)
    Requirement already satisfied: tqdm>=4.32.1 in /usr/local/lib/python3.7/dist-packages (from -r requirements.txt (line 6)) (4.62.3)
    Collecting numba==0.49.0
      Downloading numba-0.49.0-cp37-cp37m-manylinux2014_x86_64.whl (3.6 MB)
         |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3.6 MB 36.1 MB/s 
    Collecting torchaudio==0.8.1
      Downloading torchaudio-0.8.1-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)
         |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.9 MB 64.3 MB/s 
    Requirement already satisfied: audioread>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa==0.6.3->-r requirements.txt (line 3)) (2.1.9)
    Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa==0.6.3->-r requirements.txt (line 3)) (1.4.1)
    Requirement already satisfied: scikit-learn!=0.19.0,>=0.14.0 in /usr/local/lib/python3.7/dist-packages (from librosa==0.6.3->-r requirements.txt (line 3)) (1.0.1)
    Requirement already satisfied: joblib>=0.12 in /usr/local/lib/python3.7/dist-packages (from librosa==0.6.3->-r requirements.txt (line 3)) (1.1.0)
    Requirement already satisfied: decorator>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from librosa==0.6.3->-r requirements.txt (line 3)) (4.4.2)
    Requirement already satisfied: six>=1.3 in /usr/local/lib/python3.7/dist-packages (from librosa==0.6.3->-r requirements.txt (line 3)) (1.15.0)
    Requirement already satisfied: resampy>=0.2.0 in /usr/local/lib/python3.7/dist-packages (from librosa==0.6.3->-r requirements.txt (line 3)) (0.2.2)
    Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba==0.49.0->-r requirements.txt (line 7)) (57.4.0)
    Collecting llvmlite<=0.33.0.dev0,>=0.31.0.dev0
      Downloading llvmlite-0.32.1-cp37-cp37m-manylinux1_x86_64.whl (20.2 MB)
         |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 20.2 MB 1.3 MB/s 
    Collecting torch>=1.1.0
      Downloading torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl (804.1 MB)
         |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 804.1 MB 2.6 kB/s 
    Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.1.0->-r requirements.txt (line 5)) (3.10.0.2)
    Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.2.4->-r requirements.txt (line 1)) (2.8.2)
    Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.2.4->-r requirements.txt (line 1)) (3.0.6)
    Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.2.4->-r requirements.txt (line 1)) (1.3.2)
    Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.2.4->-r requirements.txt (line 1)) (0.11.0)
    Requirement already satisfied: pyzmq>=15.0 in /usr/local/lib/python3.7/dist-packages (from pescador>=2.0.1->-r requirements.txt (line 4)) (22.3.0)
    Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn!=0.19.0,>=0.14.0->librosa==0.6.3->-r requirements.txt (line 3)) (3.0.0)
    Building wheels for collected packages: librosa, pescador
      Building wheel for librosa (setup.py) ... done
      Created wheel for librosa: filename=librosa-0.6.3-py3-none-any.whl size=1573336 sha256=fff7ac07e9d03aa008fcf3f1f369f8acffdb27082d75ffead993dfeb62fb468d
      Stored in directory: /root/.cache/pip/wheels/de/c1/94/619fb8b04ee1f567115662d26650677ecf79bc7d8e462d21f8
      Building wheel for pescador (setup.py) ... done
      Created wheel for pescador: filename=pescador-2.1.0-py3-none-any.whl size=21104 sha256=4a7aaeaff3c65a1913ee3bad1cbd83c1c6e541790056b38a2848a31f5568f4e9
      Stored in directory: /root/.cache/pip/wheels/f0/e3/c6/32d30d5eb5292dac352d2fca4ebf393aa94e09b9b8b4b0f341
    Successfully built librosa pescador
    Installing collected packages: llvmlite, numba, torch, torchaudio, pescador, librosa
      Attempting uninstall: llvmlite
        Found existing installation: llvmlite 0.34.0
        Uninstalling llvmlite-0.34.0:
          Successfully uninstalled llvmlite-0.34.0
      Attempting uninstall: numba
        Found existing installation: numba 0.51.2
        Uninstalling numba-0.51.2:
          Successfully uninstalled numba-0.51.2
      Attempting uninstall: torch
        Found existing installation: torch 1.10.0+cu111
        Uninstalling torch-1.10.0+cu111:
          Successfully uninstalled torch-1.10.0+cu111
      Attempting uninstall: torchaudio
        Found existing installation: torchaudio 0.10.0+cu111
        Uninstalling torchaudio-0.10.0+cu111:
          Successfully uninstalled torchaudio-0.10.0+cu111
      Attempting uninstall: librosa
        Found existing installation: librosa 0.8.1
        Uninstalling librosa-0.8.1:
          Successfully uninstalled librosa-0.8.1
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    torchvision 0.11.1+cu111 requires torch==1.10.0, but you have torch 1.8.1 which is incompatible.
    torchtext 0.11.0 requires torch==1.10.0, but you have torch 1.8.1 which is incompatible.
    kapre 0.3.6 requires librosa>=0.7.2, but you have librosa 0.6.3 which is incompatible.
    Successfully installed librosa-0.6.3 llvmlite-0.32.1 numba-0.49.0 pescador-2.1.0 torch-1.8.1 torchaudio-0.8.1
    /usr/local/lib/python3.7/dist-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
    Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
      from numba.decorators import jit as optional_jit
    /usr/local/lib/python3.7/dist-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
    Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
      from numba.decorators import jit as optional_jit
    

    But then the part that says...

    Copy the pre-trained WaveGAN and COALA weights from drive

    drive_path:  "/content/drive/<path/to/word2wave_files/>" 
    

    ...it's not clear from the notebook what to enter in this string.

    I see further up in the output that it seems to have installed into /content/word2wave, but when I type that into the prompt, I get

    cp: cannot stat '/content/word2wavewavegan': No such file or directory
    cp: cannot stat '/content/word2wavecoala': No such file or directory
    mv: cannot stat '/content/word2wave/coala/coala/': No such file or directory
    

    Looking around on Drive, I see...

    !ls /content/drive
    
    MyDrive  Shareddrives
    

    If I just ignore the above errors and try to run the notebook with no changes, then when it comes to the part for generating with "firework", I see:

    NameError                                 Traceback (most recent call last)
    <ipython-input-9-cf00c4445550> in <module>()
          9 id2tag = json.load(open('/content/word2wave/coala/id2token_top_1000.json', 'rb'))
         10 
    ---> 11 check_text_input(text)
    
    <ipython-input-7-0cff9a5913d2> in check_text_input(text)
         28 
         29 def check_text_input(text):
    ---> 30   _, words_in_dict, words_not_in_dict = word2wave.tokenize_text(text)
         31   if not words_in_dict:
         32       raise Exception("All the words in the text prompt are out-of-vocabulary, please try with another prompt")
    
    NameError: name 'word2wave' is not defined
    
    opened by drscotthawley 4
  • train/valid splits used for FSL10K

    It looks like the WaveGAN code in the wavegan-pytorch repo you used assumes that the audio files are split into train and valid subdirectories, but the FSL10K dataset doesn't seem to have any information about standard splits on their Zenodo page or in their paper. Do you have any information about the train/valid split you used?

    opened by ecooper7 0
  • Error with pip install -r requirements.txt

    I'm getting these errors with the command pip install -r requirements.txt

      ERROR: Command errored out with exit status 1:
       command: 'C:\Users\Computer\anaconda3\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Computer\\AppData\\Local\\Temp\\pip-install-vqe1knc0\\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\\setup.py'"'"'; __file__='"'"'C:\\Users\\Computer\\AppData\\Local\\Temp\\pip-install-vqe1knc0\\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\Computer\AppData\Local\Temp\pip-wheel-a4bxxl_i'
           cwd: C:\Users\Computer\AppData\Local\Temp\pip-install-vqe1knc0\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\
      Complete output (24 lines):
      running bdist_wheel
      C:\Users\Computer\anaconda3\python.exe C:\Users\Computer\AppData\Local\Temp\pip-install-vqe1knc0\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\ffi\build.py
      Trying generator 'Visual Studio 14 2015 Win64'
      Traceback (most recent call last):
        File "C:\Users\Computer\AppData\Local\Temp\pip-install-vqe1knc0\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\ffi\build.py", line 192, in <module>
          main()
        File "C:\Users\Computer\AppData\Local\Temp\pip-install-vqe1knc0\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\ffi\build.py", line 180, in main
          main_win32()
        File "C:\Users\Computer\AppData\Local\Temp\pip-install-vqe1knc0\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\ffi\build.py", line 89, in main_win32
          generator = find_win32_generator()
        File "C:\Users\Computer\AppData\Local\Temp\pip-install-vqe1knc0\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\ffi\build.py", line 77, in find_win32_generator
          try_cmake(cmake_dir, build_dir, generator)
        File "C:\Users\Computer\AppData\Local\Temp\pip-install-vqe1knc0\llvmlite_e671eaa75a104e25a13f7826ddcf3a51\ffi\build.py", line 28, in try_cmake
          subprocess.check_call(['cmake', '-G', generator, cmake_dir])
        File "C:\Users\Computer\anaconda3\lib\subprocess.py", line 368, in check_call
          retcode = call(*popenargs, **kwargs)
        File "C:\Users\Computer\anaconda3\lib\subprocess.py", line 349, in call
          with Popen(*popenargs, **kwargs) as p:
        File "C:\Users\Computer\anaconda3\lib\subprocess.py", line 951, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "C:\Users\Computer\anaconda3\lib\subprocess.py", line 1420, in _execute_child
          hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
      FileNotFoundError: [WinError 2] The system cannot find the file specified
      error: command 'C:\\Users\\Computer\\anaconda3\\python.exe' failed with exit code 1
      ----------------------------------------
      ERROR: Failed building wheel for llvmlite
      Running setup.py clean for llvmlite
    Successfully built numba
    Failed to build llvmlite
    Installing collected packages: llvmlite, numba, torch, resampy, audioread, torchaudio, pescador, librosa
      Attempting uninstall: llvmlite
        Found existing installation: llvmlite 0.37.0
    ERROR: Cannot uninstall 'llvmlite'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
    opened by Redivh 0