Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

Overview

HiFi-GAN+

This project is an unofficial implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All You Need by Jiaqi Su, Yunyun Wang, Adam Finkelstein, and Zeyu Jin.

The model takes a band-limited audio signal (usually 8/16/24kHz) and attempts to reconstruct the high frequency components needed to restore a full-band signal at 48kHz. This is useful for upsampling low-rate outputs from upstream tasks like text-to-speech, voice conversion, etc. or enhancing audio that was filtered to remove high frequency noise. For more information, please see this blog post.
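
To make the size of the task concrete, the sketch below works out which frequency band is absent from each input rate. It relies only on the Nyquist limit (a signal sampled at rate fs can represent content up to fs / 2); the function name `missing_band_hz` is hypothetical and is not part of this library.

```python
from typing import Tuple


def missing_band_hz(source_rate: int, target_rate: int = 48000) -> Tuple[float, float]:
    """Return the (low, high) edges in Hz of the band the source cannot represent.

    Hypothetical helper for illustration only; it is not part of hifi_gan_bwe.
    """
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    source_nyquist = source_rate / 2
    target_nyquist = target_rate / 2
    # If the source already covers the target bandwidth, the band is empty
    # (low == high).
    low = min(source_nyquist, target_nyquist)
    return low, target_nyquist


for rate in (8000, 16000, 24000):
    low, high = missing_band_hz(rate)
    print(f"{rate} Hz input: {low:.0f} Hz to {high:.0f} Hz must be reconstructed")
```

For an 8kHz input, for example, everything between 4kHz and 24kHz has to be synthesized by the model, which is why the lower input rates are the harder cases.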

Usage

The example below uses a pretrained HiFi-GAN+ model to upsample a 1 second 24kHz sawtooth to 48kHz.

import torch
from hifi_gan_bwe import BandwidthExtender

model = BandwidthExtender.from_pretrained("hifi-gan-bwe-10-42890e3-vctk-48kHz")

fs = 24000
x = torch.full([fs], 261.63 / fs).cumsum(-1) % 1.0 - 0.5
y = model(x, fs)
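
The one-line sawtooth construction in the example is compact, so here is a rough pure-Python equivalent that spells out each step. It is only an illustration of what the tensor expression computes, not code from the library: the function name `sawtooth` is hypothetical, and Python floats are 64-bit while the tensor above is 32-bit, so the values will differ slightly.

```python
from itertools import accumulate
from typing import List


def sawtooth(freq_hz: float, sample_rate: int, num_samples: int) -> List[float]:
    """Naive sawtooth in [-0.5, 0.5), mirroring the torch one-liner step by step."""
    # torch.full([fs], f / fs): every sample advances the phase by f / fs cycles.
    step = freq_hz / sample_rate
    # .cumsum(-1): running sum of the steps gives the phase (in cycles) per sample.
    phases = accumulate([step] * num_samples)
    # % 1.0 wraps the phase into [0, 1); subtracting 0.5 centers the wave on zero.
    return [(phase % 1.0) - 0.5 for phase in phases]


fs = 24000
x = sawtooth(261.63, fs, fs)  # 261.63 Hz is approximately middle C
print(len(x), min(x), max(x))
```

A sawtooth is a convenient test signal because it is rich in harmonics, so the upper harmonics that a 24kHz signal cannot hold are exactly what the model is expected to fill in.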

There is a Gradio demo on Hugging Face Spaces where you can upload audio clips and run the model. You can also run the model on Colab with this notebook.

Running with pipx

The HiFi-GAN+ library can be run directly from PyPI if you have the pipx application installed. The following script uses a hosted pretrained model to upsample an MP3 file to 48kHz. The input audio can be in any format supported by the audioread library, and the output can be in any format supported by soundfile.

pipx run --python=python3.9 hifi-gan-bwe \
  hifi-gan-bwe-10-42890e3-vctk-48kHz \
  input.mp3 \
  output.wav

Running in a Virtual Environment

If you have a Python 3.9 virtual environment installed, you can install the HiFi-GAN+ library into it and run synthesis, training, etc. using it.

pip install hifi-gan-bwe

hifi-synth hifi-gan-bwe-10-42890e3-vctk-48kHz input.mp3 output.wav
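
If you have many files to upsample, one option is to drive the hifi-synth command from a small script. The sketch below assumes the argument order shown above (model name, input path, output path); the helper names `build_synth_commands` and `run_all` are hypothetical and not part of the library.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

MODEL = "hifi-gan-bwe-10-42890e3-vctk-48kHz"


def build_synth_commands(
    inputs: Iterable[str], out_dir: str, model: str = MODEL
) -> List[List[str]]:
    """Build one hifi-synth argument list per input file (hypothetical helper)."""
    commands = []
    for src in inputs:
        # Write each result as <out_dir>/<input stem>.wav
        dst = Path(out_dir) / (Path(src).stem + ".wav")
        commands.append(["hifi-synth", model, str(src), str(dst)])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_synth_commands(["a.mp3", "clips/b.flac"], "out")
for cmd in commands:
    print(" ".join(cmd))
# Call run_all(commands) once hifi-gan-bwe is installed and the output directory exists.
```

Building the argument lists separately from running them makes it easy to print and check the commands before any audio is processed.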

Pretrained Models

The following models can be loaded with BandwidthExtender.from_pretrained and used for audio upsampling. You can also download the model file from the link and use it offline.

| Name | Sample Rate | Parameters | Wandb Metrics | Notes |
|---|---|---|---|---|
| hifi-gan-bwe-10-42890e3-vctk-48kHz | 48kHz | 1M | bwe-10-42890e3 | Same as bwe-05, but uses bandlimited interpolation for upsampling, for reduced noise and aliasing. Uses the same parameters as resampy's kaiser_best mode. |
| hifi-gan-bwe-11-d5f542d-vctk-8kHz-48kHz | 48kHz | 1M | bwe-11-d5f542d | Same as bwe-10, but trained only on 8kHz sources, for specialized upsampling. |
| hifi-gan-bwe-12-b086d8b-vctk-16kHz-48kHz | 48kHz | 1M | bwe-12-b086d8b | Same as bwe-10, but trained only on 16kHz sources, for specialized upsampling. |
| hifi-gan-bwe-13-59f00ca-vctk-24kHz-48kHz | 48kHz | 1M | bwe-13-59f00ca | Same as bwe-10, but trained only on 24kHz sources, for specialized upsampling. |
| hifi-gan-bwe-05-cd9f4ca-vctk-48kHz | 48kHz | 1M | bwe-05-cd9f4ca | Trained for 200K iterations on the VCTK speech dataset with noise augmentation from the DNS Challenge dataset. |

Training

If you want to train your own model, you can use any of the methods above to install/run the library or fork the repo and run the script commands locally. The following commands are supported:

| Name | Description |
|---|---|
| hifi-train | Start a new training run. Pass in a name for the run. |
| hifi-clone | Clone an existing training run at a given checkpoint or at the latest checkpoint. |
| hifi-export | Optimize a model for inference and export it to a PyTorch model file (.pt). |
| hifi-synth | Run model inference using a trained model on a source audio file. |

For example, you might start a new training run called bwe-01 with the following command:

hifi-train 01

To train a model, you will first need to download the VCTK and DNS Challenge datasets. By default, these datasets are assumed to be in the ./data/vctk and ./data/dns directories. See train.py for how to specify your own training data directories. If you want to use a custom training dataset, you can implement a dataset wrapper in datasets.py.

The training scripts use wandb.ai for experiment tracking and visualization. Wandb metrics can be disabled by passing --no_wandb to the training script. All of my own experiment results are publicly available at wandb.ai/brentspell/hifi-gan-bwe.

Each training run is identified by a name and a git hash (ex: bwe-01-8abbca9). The git hash is used for simple experiment tracking, reproducibility, and model provenance. Using git to manage experiments also makes it easy to change model hyperparameters by simply changing the code, making a commit, and starting the training run. This is why there is no hyperparameter configuration file in the project, since I often end up having to change the code anyway to run interesting experiments.
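
The naming scheme can be sketched as follows. This is an illustration of the convention described above, not the project's actual code: the helper names are hypothetical, and the "bwe" prefix and seven-character hash are assumptions inferred from the example identifier bwe-01-8abbca9.

```python
import re
import subprocess
from typing import Dict


def current_git_hash() -> str:
    """Return the abbreviated hash of HEAD; must be run inside a git checkout."""
    out = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"], text=True)
    return out.strip()


def make_run_id(name: str, git_hash: str, prefix: str = "bwe") -> str:
    """Compose a run identifier such as bwe-01-8abbca9 (prefix is an assumption)."""
    return f"{prefix}-{name}-{git_hash}"


# Assumed layout: <prefix>-<name>-<hex git hash>
RUN_ID_PATTERN = re.compile(
    r"^(?P<prefix>[a-z]+)-(?P<name>[^-]+)-(?P<hash>[0-9a-f]{7,40})$"
)


def parse_run_id(run_id: str) -> Dict[str, str]:
    """Split a run identifier back into its prefix, name, and git hash."""
    match = RUN_ID_PATTERN.match(run_id)
    if match is None:
        raise ValueError(f"not a recognized run identifier: {run_id!r}")
    return match.groupdict()


run_id = make_run_id("01", "8abbca9")
print(run_id, parse_run_id(run_id))
# In a checkout, make_run_id("01", current_git_hash()) would use the real commit.
```

Because the hash is embedded in the name, checking out that commit recovers the exact code and hyperparameters that produced the run.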

Development

Setup

The following script creates a virtual environment using pyenv for the project and installs dependencies.

pyenv install 3.9.10
pyenv virtualenv 3.9.10 hifi-gan-bwe
pip install -r requirements.txt

If you want to run the hifi-* scripts described above in development, you can install the package locally:

pip install -e .

You can then run the tests, formatters, linters, and type checker as follows:

pytest --cov=hifi_gan_bwe
black .
isort --profile=black .
flake8 .
mypy .

These checks are also included in the pre-commit configuration for the project, so you can set them up to run automatically on commit by running

pre-commit install

Acknowledgements

The original research on the HiFi-GAN+ model is not my own, and all credit goes to the paper's authors. I also referred to kan-bayashi's excellent Parallel WaveGAN implementation, specifically the WaveNet module. If you use this code, please cite the original paper:

@inproceedings{su2021bandwidth,
  title={Bandwidth extension is all you need},
  author={Su, Jiaqi and Wang, Yunyun and Finkelstein, Adam and Jin, Zeyu},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={696--700},
  year={2021},
  organization={IEEE},
  url={https://doi.org/10.1109/ICASSP39728.2021.9413575},
}

License

Copyright © 2022 Brent M. Spell

Licensed under the MIT License (the "License"). You may not use this package except in compliance with the License. You may obtain a copy of the License at

https://opensource.org/licenses/MIT

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
