Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model. The model is trained on the 35-word Speech Commands dataset.

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture

Download the dataset

To download and extract the dataset into ./data, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir -p data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
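As a quick sanity check after extraction, the keyword folders can be counted. The following is a minimal sketch, not part of this repository: it assumes the archive unpacks into one folder per keyword plus a `_background_noise_` folder, and the helper name `list_keywords` is hypothetical.

```python
from pathlib import Path


def list_keywords(data_dir):
    """Return the sorted names of the keyword folders under data_dir.

    Assumption: every directory whose name does not start with "_" holds
    clips for one keyword; plain files and "_background_noise_" are skipped.
    """
    root = Path(data_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith("_")
    )
```

Calling `len(list_keywords("./data"))` should report 35 if the dataset was extracted as described above.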

Setup virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run the following command, replacing each ${...} placeholder with your own value:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
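For readers who want to see how these flags fit together, here is a minimal argparse sketch that mirrors the flag names above. It is an illustration only: the actual train.py may declare its arguments differently, and the types chosen here (and the absence of defaults) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the same flag names as the training command.

    Assumption: sizes and counts are integers, lr and weight_decay are
    floats, and the directory flags are plain path strings.
    """
    parser = argparse.ArgumentParser(description="Keyword spotting training flags (sketch)")
    parser.add_argument("--data_dir", type=str, help="Path to data directory")
    parser.add_argument("--logdir", type=str, help="Path to log directory")
    parser.add_argument("--num_layers", type=int, help="Number of sequential encoder layers")
    parser.add_argument("--d_model", type=int, help="Dimension of the encoder layers")
    parser.add_argument("--num_heads", type=int, help="Number of heads in multi-head attention")
    parser.add_argument("--mlp_dim", type=int, help="Dimension of MLP layers")
    parser.add_argument("--lr", type=float, help="Learning rate")
    parser.add_argument("--weight_decay", type=float, help="Weight decay")
    parser.add_argument("--batch_size", type=int, help="Batch size")
    parser.add_argument("--epochs", type=int, help="Number of epochs")
    parser.add_argument("--save_dir", type=str, help="Directory to save the model weights")
    return parser


# Illustrative values only; they are not recommended hyperparameters.
example_args = build_parser().parse_args(
    ["--data_dir", "./data", "--num_layers", "12", "--num_heads", "3", "--lr", "0.001"]
)
```

Flags that are not passed come back as `None` in this sketch, which makes it easy to spot a missing setting before starting a long run.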

To track training metrics with TensorBoard, run:

tensorboard --logdir ${Path to log directory}

Predicting keyword of audio file

To predict the keyword spoken in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}
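Before running a prediction, it can help to confirm that the audio file resembles the training clips. The Speech Commands clips are mono, 16 kHz WAV files of at most one second. The sketch below uses only the standard library; whether test.py itself requires this format is an assumption, and both helper names are hypothetical.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def looks_like_speech_command(path, expected_rate=16000, max_seconds=1.0):
    """Check that a clip is mono, at the expected rate, and short enough.

    Assumption: the prediction script expects input shaped like the
    training clips; the defaults mirror the Speech Commands format.
    """
    channels, rate, duration = describe_wav(path)
    return channels == 1 and rate == expected_rate and duration <= max_seconds
```

A file that fails this check may still load, but resampling or trimming it first is a reasonable precaution.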

Owner

Intelligent Machines Limited
