Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model. The model is trained on the 35-word Speech Commands dataset.

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture
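KWT is built from stacked transformer encoder layers, and the core operation inside each layer is self-attention. The plain-Python sketch below illustrates single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. It is a hedged illustration, not code from this repository, and the helper names (`matmul`, `transpose`, `softmax`, `scaled_dot_product_attention`) are hypothetical. Each row of the input matrices is one token; for audio, tokens are typically derived from time frames of a spectrogram.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Normalise a row of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    q, k: (tokens x d_k); v: (tokens x d_v). Each output row is a weighted
    average of the rows of v, weighted by how well that query matches each key.
    """
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # (tokens x tokens) similarity matrix
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)
```

With multiple heads (`--num_heads`), the same computation runs on separate learned projections of the tokens and the per-head results are concatenated.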

Download the dataset

To download and extract the dataset, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
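Once extracted, the dataset contains one folder per keyword, a `_background_noise_` folder, and two text files, `validation_list.txt` and `testing_list.txt`, that list the clips reserved for evaluation. The sketch below shows one way to build the train/validation/test split from those lists with only the standard library. It assumes the archive was extracted directly into `./data` as above; `split_speech_commands` is a hypothetical helper and not necessarily how `train.py` loads the data.

```python
import os
from typing import Dict, List, Set


def split_speech_commands(data_dir: str) -> Dict[str, List[str]]:
    """Assign every keyword clip to train/validation/test using the dataset's own lists."""

    def read_list(name: str) -> Set[str]:
        with open(os.path.join(data_dir, name)) as f:
            return {line.strip() for line in f if line.strip()}

    validation = read_list("validation_list.txt")
    testing = read_list("testing_list.txt")
    splits: Dict[str, List[str]] = {"train": [], "validation": [], "test": []}

    for label in sorted(os.listdir(data_dir)):
        label_dir = os.path.join(data_dir, label)
        # Skip plain files (the lists, LICENSE, README) and _background_noise_.
        if not os.path.isdir(label_dir) or label.startswith("_"):
            continue
        for fname in sorted(os.listdir(label_dir)):
            if not fname.endswith(".wav"):
                continue
            rel = f"{label}/{fname}"  # same relative form used in the list files
            if rel in validation:
                splits["validation"].append(rel)
            elif rel in testing:
                splits["test"].append(rel)
            else:
                splits["train"].append(rel)
    return splits
```

For example, `split_speech_commands("./data")["train"]` would return the relative paths of every clip that appears in neither list.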

Set up a virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
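The sketch below mirrors the training flags with `argparse` and estimates how `--num_layers`, `--d_model` and `--mlp_dim` determine the size of the encoder stack. It rests on stated assumptions: the argument types, the check that `--d_model` is divisible by `--num_heads` (the usual convention for multi-head attention), and the helpers `build_parser`, `parse_and_check` and `encoder_params` are all hypothetical. The count covers only the encoder layers (Q/K/V/output projections with biases, two LayerNorms and a two-layer MLP per layer), not the input projection, embeddings or classifier head.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Flag names copied from the train.py command; types are assumptions.
    p = argparse.ArgumentParser(description="Sketch of the KWT training flags")
    for name in ("--data_dir", "--logdir", "--save_dir"):
        p.add_argument(name, required=True)
    for name in ("--num_layers", "--d_model", "--num_heads", "--mlp_dim",
                 "--batch_size", "--epochs"):
        p.add_argument(name, type=int, required=True)
    for name in ("--lr", "--weight_decay"):
        p.add_argument(name, type=float, required=True)
    return p


def parse_and_check(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed constraint: each head gets d_model / num_heads dimensions.
    if args.d_model % args.num_heads != 0:
        parser.error("--d_model must be divisible by --num_heads")  # raises SystemExit
    return args


def encoder_params(num_layers: int, d_model: int, mlp_dim: int) -> int:
    """Trainable weights in a stack of standard transformer encoder layers."""
    d, m = d_model, mlp_dim
    attention = 4 * (d * d + d)     # Q, K, V and output projections with biases
    norms = 2 * (2 * d)             # two LayerNorms, each with scale and offset
    mlp = d * m + m + m * d + d     # Dense(d -> m) followed by Dense(m -> d)
    return num_layers * (attention + norms + mlp)
```

For example, `encoder_params(12, 64, 256)` gives 599,808 weights: each layer contributes 16,640 for attention, 256 for the two LayerNorms and 33,088 for the MLP. Doubling `--d_model` roughly quadruples the attention term, while `--num_layers` scales the total linearly.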

To track your training metrics, point TensorBoard at the log directory:

tensorboard --logdir ${Path to log directory}

Predicting the keyword of an audio file

To predict the keyword spoken in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}
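Speech Commands clips are 16 kHz, 16-bit mono WAV files roughly one second long or shorter, so an input clip typically has to be padded or trimmed to a fixed length before feature extraction. The sketch below does this with the standard `wave` module. `load_clip` is a hypothetical helper, and the expected format for `--file_path` (16 kHz, 16-bit mono) is an assumption based on the training data, not a documented requirement of `test.py`.

```python
import struct
import wave
from typing import List

SAMPLE_RATE = 16000         # Speech Commands sampling rate
CLIP_SAMPLES = SAMPLE_RATE  # one second of audio


def load_clip(path: str) -> List[float]:
    """Read a 16-bit mono WAV and return exactly one second of samples in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        if wf.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected {SAMPLE_RATE} Hz, got {wf.getframerate()}")
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    # WAV PCM data is little-endian signed 16-bit; scale to floats.
    samples = [s / 32768.0 for s in struct.unpack(f"<{count}h", raw)]
    # Trim long clips and zero-pad short ones so every input has the same length.
    samples = samples[:CLIP_SAMPLES]
    samples.extend([0.0] * (CLIP_SAMPLES - len(samples)))
    return samples
```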

Owner

Intelligent Machines Limited
