FireFlyer Record file format, writer and reader for DL training samples.

Overview

FFRecord

The FFRecord format is a simple format for storing a sequence of binary records developed by HFAiLab, which supports random access and Linux Asynchronous Input/Output (AIO) read.

File Format

Storage Layout:

+-----------------------------------+---------------------------------------+
|         checksum                  |             N                         |
+-----------------------------------+---------------------------------------+
|         checksums                 |           offsets                     |
+---------------------+---------------------+--------+----------------------+
|      sample 1       |      sample 2       | ....   |      sample N        |
+---------------------+---------------------+--------+----------------------+

Fields:

field size (bytes) description
checksum 4 CRC32 checksum of metadata
N 8 number of samples
checksums 4 * N CRC32 checksum of each sample
offsets 8 * N byte offset of each sample
sample i offsets[i + 1] - offsets[i] data of the i-th sample

Get Started

Requirements

Install

pip3 install ffrecord

Usage

We provide ffrecord.FileWriter and ffrecord.FileReader for reading and writing, respectively.

Write

To create a FileWriter object, you need to specify a file name and the total number of samples. And then you could call FileWriter.write_one() to write a sample to the FFRecord file. It accepts bytes or bytearray as input and appends the data to the end of the opened file.

from ffrecord import FileWriter


def serialize(sample):
    """ Serialize a sample to bytes or bytearray

    You could use anything you like to serialize the sample.
    Here we simply use pickle.dumps().
    """
    return pickle.dumps(sample)


samples = [i for i in range(100)]  # anything you would like to store
fname = 'test.ffr'
n = len(samples)  # number of samples to be written
writer = FileWriter(fname, n)

for i in range(n):
    data = serialize(samples[i])  # data should be bytes or bytearray
    writer.write_one(data)

writer.close()

Read

To create a FileReader object, you only need to specify the file name. And then you could call FileWriter.read() to read multiple samples from the FFReocrd file. It accepts a list of indices as input and outputs the corresponding samples data.

The reader would validate the checksum before returning the data if check_data = True.

from ffrecord import FileReader


def deserialize(data):
    """ deserialize bytes data

    The deserialize method should be paired with the serialize method above.
    """
    return pickle.loads(data)


fname = 'test.ffr'
reader = FileReader(fname, check_data=True)
print(f'Number of samples: {reader.n}')

indices = [3, 6, 0, 10]      # indices of each sample
data = reader.read(indices)  # return a list of bytes data

for i in range(n):
    sample = deserialize(data[i])
    # do what you want

reader.close()

Dataset and DataLoader for PyTorch

We also provide ffrecord.torch.Dataset and ffrecord.torch.DataLoader for PyTorch users to train models using FFRecord.

Different from torch.utils.data.Dataset which accepts an index as input and returns one sample, ffrecord.torch.Dataset accepts a batch of indices as input and returns a batch of samples. One advantage of ffrecord.torch.Dataset is that it could read a batch of data at a time using Linux AIO.

We first read a batch of bytes data from the FFReocrd file and then pass the bytes data to process() function. Users need to inherit from ffrecord.torch.Dataset and define their custom process() function.

Pipline:   indices ----------------------------> bytes -------------> samples
                      reader.read(indices)               process()

For example:

class CustomDataset(ffrecord.torch.Dataset):

    def __init__(self, fname, check_data=True, transform=None):
        super().__init__(fname, check_data)
        self.transform = transform

    def process(self, indices, data):
        # deserialize data
        samples = [pickle.loads(b) for b in data]

        # transform data
        if self.transform:
            samples = [self.transform(s) for s in samples]
        return samples

dataset = CustomDataset('train.ffr')
indices = [3, 4, 1, 0]
samples = dataset[indices]

ffrecord.torch.Dataset could be combined with ffrecord.torch.DataLoader just like PyTorch.

dataset = CustomDataset('train.ffr')
loader = ffrecord.torch.DataLoader(dataset,
                                   batch_size=16,
                                   shuffle=True,
                                   num_workers=8)

for i, batch in enumerate(loader):
    # training model
You might also like...
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Word2Wave is a simple method for text-controlled GAN audio generation. You can either follow the setup instructions below and use the source code and CLI provided in this repo or you can have a play around in the Colab notebook provided. Note that, in both cases, you will need to train a WaveGAN model first

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!
Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

背景 安装教程 快速上手 (一)预训练模型 (二)机器翻译 (三)文本分类 TenTrans 进阶 1. 多语言机器翻译 2. 跨语言预训练 背景 TrenTrans是一个统一的端到端的多语言多任务预训练平台,支持多种预训练方式,以及序列生成和自然语言理解任务。 安装教程 git clone git

A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format
A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format

RITA DSL This is a language, loosely based on language Apache UIMA RUTA, focused on writing manual language rules, which compiles into either spaCy co

The Sudachi synonym dictionary in Solar format.

solr-sudachi-synonyms The Sudachi synonym dictionary in Solar format. Summary Run a script that checks for updates to the Sudachi dictionary every hou

Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages
Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, msg systems ag 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 German 1.2.3 Polish 1

Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

Tevatron Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models. The toolkit has a modularized

Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages
Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, Explosion AI 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 French 1.2.3 German 1.2

Comments
  • install error

    install error

    When I install ffrecord with python setup.py install, it failed with the following errors:

    running install
    running bdist_egg
    running egg_info
    creating ffrecord.egg-info
    writing ffrecord.egg-info/PKG-INFO
    writing dependency_links to ffrecord.egg-info/dependency_links.txt
    writing requirements to ffrecord.egg-info/requires.txt
    writing top-level names to ffrecord.egg-info/top_level.txt
    writing manifest file 'ffrecord.egg-info/SOURCES.txt'
    reading manifest file 'ffrecord.egg-info/SOURCES.txt'
    writing manifest file 'ffrecord.egg-info/SOURCES.txt'
    installing library code to build/bdist.linux-x86_64/egg
    running install_lib
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    creating build/lib.linux-x86_64-3.7/ffrecord
    copying ffrecord/fileio.py -> build/lib.linux-x86_64-3.7/ffrecord
    copying ffrecord/__init__.py -> build/lib.linux-x86_64-3.7/ffrecord
    copying ffrecord/utils.py -> build/lib.linux-x86_64-3.7/ffrecord
    creating build/lib.linux-x86_64-3.7/ffrecord/torch
    copying ffrecord/torch/__init__.py -> build/lib.linux-x86_64-3.7/ffrecord/torch
    copying ffrecord/torch/dataset.py -> build/lib.linux-x86_64-3.7/ffrecord/torch
    copying ffrecord/torch/dataloader.py -> build/lib.linux-x86_64-3.7/ffrecord/torch
    running build_ext
    -- The C compiler identification is GNU 7.5.0
    -- The CXX compiler identification is GNU 7.5.0
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: /usr/bin/cc - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++ - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Found PythonInterp: /opt/conda/bin/python (found version "3.7.10") 
    -- Found PythonLibs: /opt/conda/lib/libpython3.7m.so
    -- Performing Test HAS_CPP14_FLAG
    -- Performing Test HAS_CPP14_FLAG - Success
    -- Performing Test HAS_CPP11_FLAG
    -- Performing Test HAS_CPP11_FLAG - Success
    -- Performing Test HAS_LTO_FLAG
    -- Performing Test HAS_LTO_FLAG - Success
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /root/ffrecord/build/temp.linux-x86_64-3.7
    [ 20%] Building CXX object CMakeFiles/_ffrecord_cpp.dir/reader.cpp.o
    [ 40%] Building CXX object CMakeFiles/_ffrecord_cpp.dir/writer.cpp.o
    [ 60%] Building CXX object CMakeFiles/_ffrecord_cpp.dir/utils.cpp.o
    [ 80%] Building CXX object CMakeFiles/_ffrecord_cpp.dir/bindings.cpp.o
    /root/ffrecord/ffrecord/src/bindings.cpp: In member function ‘void ffrecord::WriterWrapper::write_one_wrapper(const pybind11::buffer&)’:
    /root/ffrecord/ffrecord/src/bindings.cpp:22:44: error: passing ‘const pybind11::buffer’ as ‘this’ argument discards qualifiers [-fpermissive]
             py::buffer_info info = buf.request();
                                                ^
    In file included from /usr/include/pybind11/cast.h:13:0,
                     from /usr/include/pybind11/attr.h:13,
                     from /usr/include/pybind11/pybind11.h:36,
                     from /root/ffrecord/ffrecord/src/bindings.cpp:1:
    /usr/include/pybind11/pytypes.h:832:17: note:   in call to ‘pybind11::buffer_info pybind11::buffer::request(bool)’
         buffer_info request(bool writable = false) {
                     ^~~~~~~
    /root/ffrecord/ffrecord/src/bindings.cpp: In member function ‘std::vector<pybind11::array> ffrecord::ReaderWrapper::read_batch_wrapper(const std::vector<long int>&)’:
    /root/ffrecord/ffrecord/src/bindings.cpp:41:59: error: invalid conversion from ‘void (*)(void*)’ to ‘void (*)(PyObject*) {aka void (*)(_object*)}’ [-fpermissive]
                 auto capsule = py::capsule(b.data, free_buffer);
                                                               ^
    In file included from /usr/include/pybind11/cast.h:13:0,
                     from /usr/include/pybind11/attr.h:13,
                     from /usr/include/pybind11/pybind11.h:36,
                     from /root/ffrecord/ffrecord/src/bindings.cpp:1:
    /usr/include/pybind11/pytypes.h:734:14: note:   initializing argument 2 of ‘pybind11::capsule::capsule(const void*, void (*)(PyObject*))’
         explicit capsule(const void *value, void (*destruct)(PyObject *) = nullptr)
                  ^~~~~~~
    /root/ffrecord/ffrecord/src/bindings.cpp: In member function ‘pybind11::array ffrecord::ReaderWrapper::read_one_wrapper(int64_t)’:
    /root/ffrecord/ffrecord/src/bindings.cpp:49:55: error: invalid conversion from ‘void (*)(void*)’ to ‘void (*)(PyObject*) {aka void (*)(_object*)}’ [-fpermissive]
             auto capsule = py::capsule(b.data, free_buffer);
                                                           ^
    In file included from /usr/include/pybind11/cast.h:13:0,
                     from /usr/include/pybind11/attr.h:13,
                     from /usr/include/pybind11/pybind11.h:36,
                     from /root/ffrecord/ffrecord/src/bindings.cpp:1:
    /usr/include/pybind11/pytypes.h:734:14: note:   initializing argument 2 of ‘pybind11::capsule::capsule(const void*, void (*)(PyObject*))’
         explicit capsule(const void *value, void (*destruct)(PyObject *) = nullptr)
                  ^~~~~~~
    /root/ffrecord/ffrecord/src/bindings.cpp: In member function ‘pybind11::array_t<long int> ffrecord::ReaderWrapper::get_offsets(int)’:
    /root/ffrecord/ffrecord/src/bindings.cpp:55:58: error: invalid user-defined conversion from ‘ffrecord::ReaderWrapper::get_offsets(int)::<lambda(void*)>’ to ‘void (*)(PyObject*) {aka void (*)(_object*)}’ [-fpermissive]
             auto capsule = py::capsule(v.data(), [](void*) {});
                                                              ^
    /root/ffrecord/ffrecord/src/bindings.cpp:55:54: note: candidate is: ffrecord::ReaderWrapper::get_offsets(int)::<lambda(void*)>::operator void (*)(void*)() const <near match>
             auto capsule = py::capsule(v.data(), [](void*) {});
                                                          ^
    /root/ffrecord/ffrecord/src/bindings.cpp:55:54: note:   no known conversion from ‘void (*)(void*)’ to ‘void (*)(PyObject*) {aka void (*)(_object*)}’
    In file included from /usr/include/pybind11/cast.h:13:0,
                     from /usr/include/pybind11/attr.h:13,
                     from /usr/include/pybind11/pybind11.h:36,
                     from /root/ffrecord/ffrecord/src/bindings.cpp:1:
    /usr/include/pybind11/pytypes.h:734:14: note:   initializing argument 2 of ‘pybind11::capsule::capsule(const void*, void (*)(PyObject*))’
         explicit capsule(const void *value, void (*destruct)(PyObject *) = nullptr)
                  ^~~~~~~
    /root/ffrecord/ffrecord/src/bindings.cpp: In member function ‘pybind11::array_t<unsigned int> ffrecord::ReaderWrapper::get_checksums(int)’:
    /root/ffrecord/ffrecord/src/bindings.cpp:61:58: error: invalid user-defined conversion from ‘ffrecord::ReaderWrapper::get_checksums(int)::<lambda(void*)>’ to ‘void (*)(PyObject*) {aka void (*)(_object*)}’ [-fpermissive]
             auto capsule = py::capsule(v.data(), [](void*) {});
                                                              ^
    /root/ffrecord/ffrecord/src/bindings.cpp:61:54: note: candidate is: ffrecord::ReaderWrapper::get_checksums(int)::<lambda(void*)>::operator void (*)(void*)() const <near match>
             auto capsule = py::capsule(v.data(), [](void*) {});
                                                          ^
    /root/ffrecord/ffrecord/src/bindings.cpp:61:54: note:   no known conversion from ‘void (*)(void*)’ to ‘void (*)(PyObject*) {aka void (*)(_object*)}’
    In file included from /usr/include/pybind11/cast.h:13:0,
                     from /usr/include/pybind11/attr.h:13,
                     from /usr/include/pybind11/pybind11.h:36,
                     from /root/ffrecord/ffrecord/src/bindings.cpp:1:
    /usr/include/pybind11/pytypes.h:734:14: note:   initializing argument 2 of ‘pybind11::capsule::capsule(const void*, void (*)(PyObject*))’
         explicit capsule(const void *value, void (*destruct)(PyObject *) = nullptr)
                  ^~~~~~~
    /root/ffrecord/ffrecord/src/bindings.cpp: At global scope:
    /root/ffrecord/ffrecord/src/bindings.cpp:67:16: error: expected constructor, destructor, or type conversion before ‘(’ token
     PYBIND11_MODULE(_ffrecord_cpp, m) {
                    ^
    CMakeFiles/_ffrecord_cpp.dir/build.make:117: recipe for target 'CMakeFiles/_ffrecord_cpp.dir/bindings.cpp.o' failed
    make[2]: *** [CMakeFiles/_ffrecord_cpp.dir/bindings.cpp.o] Error 1
    CMakeFiles/Makefile2:82: recipe for target 'CMakeFiles/_ffrecord_cpp.dir/all' failed
    make[1]: *** [CMakeFiles/_ffrecord_cpp.dir/all] Error 2
    Makefile:90: recipe for target 'all' failed
    make: *** [all] Error 2
    Traceback (most recent call last):
      File "setup.py", line 24, in <module>
        ext_modules=[cpp_module]
      File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/opt/conda/lib/python3.7/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/opt/conda/lib/python3.7/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
        self.do_egg_install()
      File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py", line 109, in do_egg_install
        self.run_command('bdist_egg')
      File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 164, in run
        cmd = self.call_command('install_lib', warn_dir=0)
      File "/opt/conda/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
        self.run_command(cmdname)
      File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install_lib.py", line 11, in run
        self.build()
      File "/opt/conda/lib/python3.7/distutils/command/install_lib.py", line 107, in build
        self.run_command('build_ext')
      File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/root/ffrecord/cmake_build.py", line 118, in build_extension
        ["cmake", "--build", "."] + build_args, cwd=self.build_temp
      File "/opt/conda/lib/python3.7/subprocess.py", line 363, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '--build', '.']' returned non-zero exit status 2.
    
    bug install 
    opened by jimchenhub 3
  • Error of 0' failed. Number of submitted requests: -22"">

    Error of "RuntimeError: 'ns > 0' failed. Number of submitted requests: -22"

    I apply the sample code from README, but an error occurred in data = self.reader.read(indices) of the __getitem__ method in ffrecord.torch.dataset module. The following are more detailed error messages:


    -- Process 1 terminated with the following error:
    Traceback (most recent call last):
      File "xxxx/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
        fn(i, *args)
      File "xxxx.py", line 172, in worker
        trainer.train(args, gpu_id, rank, train_loader, model, optimizer, scheduler, train_sampler)
      File "xxxx.py", line 39, in train
        for step, batch in enumerate(loader):
      File "xxxx/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
        data = self._next_data()
      File "xxxx/python3.8/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
        return self._process_data(data)
      File "xxxx/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
        data.reraise()
      File "xxxx/python3.8/site-packages/site-packages/torch/_utils.py", line 457, in reraise
        raise exception
    RuntimeError: Caught RuntimeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "xxxx/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
        data = fetcher.fetch(index)
      File "xxxx/python3.8/site-packages/ffrecord-1.3.2+35c6863-py3.8-linux-x86_64.egg/ffrecord/torch/dataloader.py", line 151, in fetch
        data = self.dataset[indexes]
      File "xxx.py", line 34, in __getitem__
        data = self.reader.read(indices)
    RuntimeError: 'ns > 0' failed. Number of submitted requests: -22
    Error in std::vector<ffrecord::MemBlock> ffrecord::FileReader::read_batch(const std::vector<long int>&) at xxx/ffrecord/ffrecord/src/reader.cpp line 225
    

    What might be the cause of this error?

    opened by xlxwalex 7
CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT This repo provides the code for reproducing the experiments in CodeBERT: A Pre-Trained Model for Programming and Natural Languages. CodeBERT

Microsoft 1k Jan 03, 2023
A large-scale (194k), Multiple-Choice Question Answering (MCQA) dataset designed to address realworld medical entrance exam questions.

MedMCQA MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering A large-scale, Multiple-Choice Question Answe

MedMCQA 24 Nov 30, 2022
Graph4nlp is the library for the easy use of Graph Neural Networks for NLP

Graph4NLP Graph4NLP is an easy-to-use library for R&D at the intersection of Deep Learning on Graphs and Natural Language Processing (i.e., DLG4NLP).

Graph4AI 1.5k Dec 23, 2022
100+ Chinese Word Vectors 上百种预训练中文词向量

Chinese Word Vectors 中文词向量 中文 This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse),

embedding 10.4k Jan 09, 2023
Code for Text Prior Guided Scene Text Image Super-Resolution

Code for Text Prior Guided Scene Text Image Super-Resolution

82 Dec 26, 2022
A Chinese to English Neural Model Translation Project

ZH-EN NMT Chinese to English Neural Machine Translation This project is inspired by Stanford's CS224N NMT Project Dataset used in this project: News C

Zhenbang Feng 29 Nov 26, 2022
A modular Karton Framework service that unpacks common packers like UPX and others using the Qiling Framework.

Unpacker Karton Service A modular Karton Framework service that unpacks common packers like UPX and others using the Qiling Framework. This project is

c3rb3ru5 45 Jan 05, 2023
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).

Rebiber: A tool for normalizing bibtex with official info. We often cite papers using their arXiv versions without noting that they are already PUBLIS

(Bill) Yuchen Lin 2k Jan 01, 2023
Two-stage text summarization with BERT and BART

Two-Stage Text Summarization Description We experiment with a 2-stage summarization model on CNN/DailyMail dataset that combines the ability to filter

Yukai Yang (Alexis) 6 Oct 22, 2022
Spert NLP Relation Extraction API deployed with torchserve for inference

URLMask Python program for Linux users to change a URL to ANY domain. A program than can take any url and mask it to any domain name you like. E.g. ne

Zichu Chen 1 Nov 24, 2021
Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP

Pretrain and Fine-tune a T5 model with Flax on GCP This tutorial details how pretrain and fine-tune a FlaxT5 model from HuggingFace using a TPU VM ava

Gabriele Sarti 41 Nov 18, 2022
texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Jörg Thalheim 70 Dec 26, 2022
CJK computer science terms comparison / 中日韓電腦科學術語對照 / 日中韓のコンピュータ科学の用語対照 / 한·중·일 전산학 용어 대조

CJK computer science terms comparison This repository contains the source code of the website. You can see the website from the following link: Englis

Hong Minhee (洪 民憙) 88 Dec 23, 2022
Implementation of TTS with combination of Tacotron2 and HiFi-GAN

Tacotron2-HiFiGAN-master Implementation of TTS with combination of Tacotron2 and HiFi-GAN for Mandarin TTS. Inference In order to inference, we need t

SunLu Z 7 Nov 11, 2022
Open source code for AlphaFold.

AlphaFold This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP

DeepMind 9.7k Jan 02, 2023
AudioCLIP Extending CLIP to Image, Text and Audio

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

458 Jan 02, 2023
Pytorch-Named-Entity-Recognition-with-BERT

BERT NER Use google BERT to do CoNLL-2003 NER ! Train model using Python and Inference using C++ ALBERT-TF2.0 BERT-NER-TENSORFLOW-2.0 BERT-SQuAD Requi

Kamal Raj 1.1k Dec 25, 2022
🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 🤗 Transformers provides thousands of pretrained models to perform tasks o

Hugging Face 77.3k Jan 03, 2023
Production First and Production Ready End-to-End Keyword Spotting Toolkit

Production First and Production Ready End-to-End Keyword Spotting Toolkit

223 Jan 02, 2023
Binaural Speech Synthesis

Binaural Speech Synthesis This repository contains code to train a mono-to-binaural neural sound renderer. If you use this code or the provided datase

Facebook Research 135 Dec 18, 2022