The official repository for Audio ALBERT

Last update: Dec 11, 2022

Overview

AALBERT

This is the official repository of AALBERT, a PyTorch Lightning reimplementation of the paper "Audio ALBERT: A Lite BERT for Self-Supervised Learning of Audio Representation". The original code is in the AlbertNew branch of the s3prl repo. In the paper, we propose Audio ALBERT, which achieves performance comparable to massive pre-trained networks on downstream tasks while having 91% fewer parameters.

Dependencies

Python 3.8
Computing power (a high-end GPU) and memory (both system RAM and GPU RAM) are extremely important if you'd like to train your own model.
Required packages and their uses are listed in requirements.txt.
pip install -r requirements.txt

Pretrain Stage

We use LibriSpeech as our pretraining dataset. You can download the dataset from this link.

Stage 1: set the dataset path to your local dataset path:

AALBERT: config path: upstream/aalbert/pretrain_config.yaml

    line 16: datarc:
            {Your dataset key name}: {your local dataset path}

Mockingjay: config path: upstream/mockingjay/pretrain_config.yaml

    line 16: datarc:
            {Your dataset key name}: {your local dataset path}
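
Before launching a long run, it can help to confirm that every path listed under `datarc` actually exists. The sketch below is a minimal, standard-library-only check. It assumes the YAML config has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`) and that every `datarc` entry maps a dataset key name to a local path, as the snippet above suggests. The function name `missing_dataset_paths` and the sample keys are hypothetical and not part of this repository.

```python
import os
import tempfile


def missing_dataset_paths(config):
    """Return {key: path} for every datarc entry whose path does not exist.

    Assumes `config` is the parsed pretrain_config.yaml and that `datarc`
    maps dataset key names to local paths (an assumption based on the
    README snippet, not on the repository code).
    """
    datarc = config.get("datarc") or {}
    return {
        key: path
        for key, path in datarc.items()
        if not os.path.exists(str(path))
    }


# Usage with a throwaway directory standing in for a real dataset folder.
with tempfile.TemporaryDirectory() as existing_dir:
    example_config = {
        "datarc": {
            # Hypothetical key names; the first path exists, the second does not.
            "librispeech_train": existing_dir,
            "librispeech_dev": os.path.join(existing_dir, "not_there"),
        }
    }
    missing = missing_dataset_paths(example_config)

print(missing)
```

An empty result means every configured path was found; any entry that is returned should be corrected in the config before pretraining starts.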

Stage 2: run pretraining script

python run_pretrain.py -n aalbert_pretrained -u aalbert
- -n : experiment name
- -u : upstream model {two options: aalbert / mockingjay}
- The model is saved in the result folder after the pretraining stage finishes.
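
If you want to pretrain both upstream models from a script, the command above can be assembled programmatically. This is a small sketch, not part of the repository: `build_pretrain_command` is a hypothetical helper that uses only the `-n` and `-u` flags documented above and rejects upstream names other than `aalbert` and `mockingjay`. The experiment names it generates are illustrative.

```python
import shlex
import sys

# The two upstream options listed for the -u flag.
UPSTREAMS = ("aalbert", "mockingjay")


def build_pretrain_command(experiment_name, upstream):
    """Build the argument list for run_pretrain.py (hypothetical helper)."""
    if upstream not in UPSTREAMS:
        raise ValueError(f"upstream must be one of {UPSTREAMS}, got {upstream!r}")
    return [sys.executable, "run_pretrain.py", "-n", experiment_name, "-u", upstream]


for upstream in UPSTREAMS:
    command = build_pretrain_command(f"{upstream}_pretrained", upstream)
    # Print instead of executing. To actually start pretraining, pass the list
    # to subprocess.run(command, check=True) from the repository root.
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell-quoting problems if an experiment name contains spaces.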

Downstream Stage

Here, we use VoxCeleb1 speaker classification as our downstream task. You can download the dataset from its official website.

After pretraining, we can extract features from the pretrained model for different downstream tasks.

Stage 1: modify dataset path to your local dataset path
- voxceleb1_speaker: config path: downstream/voxceleb1_speaker/train_config.yaml
```
line  9: datarc:
line 10:    file_path: {your dataset folder path}
line 11:    meta_path: {your label file path}
```

Stage 2: run downstream script

voxceleb1_speaker:

python run_downstream.py \
-c downstream/voxceleb1_speaker/train_config.yaml \
-g result/pretrain/{your_pretrained_model_folder}/model_config.yaml  \
-t result/pretrain/{your_pretrained_model_folder}/pretrained_config.yaml \
-u aalbert \
-d voxceleb1_speaker \
-k result/pretrained/{your_pretrained_model_folder}/checkpoints/{checkpoint_you_want_to_use.ckpt} \
-n voxceleb1_result

-n: experiment name
-c: downstream training config
-g: pretrained model config
-t: pretraining config of the pretrained model
-u: upstream model {two options: aalbert / mockingjay}
-d: downstream task name
-k: model checkpoint path
-f: whether to fine-tune the pretrained model, default=False
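
The `-g`, `-t`, and `-k` arguments all point inside a folder written by the pretraining stage, so they can be derived from one path. The sketch below assumes that `model_config.yaml`, `pretrained_config.yaml`, and the `checkpoints/` directory sit together in a single run folder, as the placeholders in the command above suggest; verify this against your own `result` directory. It also assumes every task keeps its config at `downstream/{task}/train_config.yaml`, which is shown here only for `voxceleb1_speaker`. `build_downstream_command`, the run folder, and the checkpoint file name are hypothetical. The `-f` flag is left out because its exact command-line form is not shown here.

```python
from pathlib import PurePosixPath


def build_downstream_command(run_dir, checkpoint_name, upstream="aalbert",
                             task="voxceleb1_speaker",
                             experiment_name="voxceleb1_result"):
    """Assemble run_downstream.py arguments from one pretraining run folder."""
    run_dir = PurePosixPath(run_dir)
    return [
        "python", "run_downstream.py",
        "-c", f"downstream/{task}/train_config.yaml",   # downstream training config
        "-g", str(run_dir / "model_config.yaml"),       # pretrained model config
        "-t", str(run_dir / "pretrained_config.yaml"),  # pretraining config
        "-u", upstream,                                 # upstream model name
        "-d", task,                                     # downstream task name
        "-k", str(run_dir / "checkpoints" / checkpoint_name),
        "-n", experiment_name,
    ]


# Hypothetical run folder and checkpoint file name.
command = build_downstream_command("result/pretrain/aalbert_pretrained", "last.ckpt")

# Print one flag/value pair per line, joined with shell line continuations.
print(" \\\n".join(" ".join(command[i:i + 2]) for i in range(0, len(command), 2)))
```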

Owner

pohan
