Multistream Convolutional Neural Network (CNN)

A multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recognition tasks. To achieve this robustness, it processes input speech at diverse temporal resolutions by applying a different dilation rate to the convolutional layers in each of several parallel streams. The dilation rates are selected from multiples of the sub-sampling rate of 3 frames. Each stream stacks TDNN-F layers (a variant of 1D CNN), and the output embedding vectors from the streams are concatenated and then projected to the final layer.
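To make the role of the dilation rate concrete, here is a minimal pure-Python sketch, not the recipe's implementation. It assumes a plain 3-tap convolution per layer in place of a real TDNN-F layer (the factorized bottleneck is omitted), scalar features per frame, and illustrative dilation rates of 3, 6 and 9; the final projection is also left out. The function names are hypothetical.

```python
def dilated_conv1d(frames, kernel, dilation):
    """Same-length 1D convolution with a centered kernel and zero padding."""
    half = len(kernel) // 2
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps at t-d, t, t+d for 3 taps
            if 0 <= idx < len(frames):
                acc += weight * frames[idx]
        out.append(acc)
    return out


def multistream_forward(frames, dilations, num_layers, kernel=(1.0, 1.0, 1.0)):
    """Run one stack of dilated layers per stream, then concatenate per frame."""
    streams = []
    for dilation in dilations:
        hidden = list(frames)
        for _ in range(num_layers):
            hidden = dilated_conv1d(hidden, kernel, dilation)
        streams.append(hidden)
    # One vector per frame, holding one value from each stream.
    return [list(per_frame) for per_frame in zip(*streams)]


def receptive_field(dilation, num_layers, kernel_size=3):
    """Number of input frames spanned by one output frame of a stream."""
    return 1 + num_layers * (kernel_size - 1) * dilation


if __name__ == "__main__":
    subsampling = 3
    # Assumed values: multiples of the sub-sampling rate, chosen for illustration.
    dilations = [subsampling * m for m in (1, 2, 3)]
    impulse = [0.0] * 81
    impulse[40] = 1.0
    outputs = multistream_forward(impulse, dilations, num_layers=2)
    for i, d in enumerate(dilations):
        active = [t for t, frame in enumerate(outputs) if frame[i] != 0.0]
        span = active[-1] - active[0] + 1
        print(f"dilation={d}: span={span}, expected={receptive_field(d, 2)}")
```

Feeding a single impulse through each stream shows that, for the same number of layers, a larger dilation rate spreads the response over a wider window of frames, which is the sense in which the streams see the input at different resolutions.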

References

Multistream CNN for Robust Acoustic Modeling [paper]

  @inproceedings{han2021multistream-cnn,
    title={Multistream CNN for Robust Acoustic Modeling},
    author={Kyu J. Han and Jing Pan and Venkata Krishna Naveen Tadala and Tao Ma and Dan Povey},
    booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year={2021}
}

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition [paper]

  @inproceedings{pan2020asapp-asr,
    title={ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition},
    author={Jing Pan and Joshua Shapiro and Jeremy Wohlwend and Kyu J. Han and Tao Lei and Tao Ma},
    booktitle={Interspeech},
    year={2020}
}

Installation

Please follow the original Kaldi build sequence, as below.

>> cd tools; make; cd ../src; ./configure; make clean; make -j clean depend; make -j all

Recipes and Results

LibriSpeech

>> egs/librispeech/s5/local/chain/run_multistream_cnn_1a.sh

WER (%)
System              dev-clean  dev-other  test-clean  test-other
tdnn_1d             3.29       8.71       3.80        8.76
multistream_cnn_1a  3.20       7.68       3.54        7.87

Fisher-SWBD

>> egs/fisher_swbd/s5/local/chain/run_multistream_cnn_1a.sh

WER (%)
System              eval2000  swbd  callhm
tdnn_7d             12.6      8.8   16.3
multistream_cnn_1a  12.6      9.2   15.7
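The two results tables in this section can be compared more directly as relative changes against the TDNN baselines. The sketch below assumes the reported numbers are word error rates in percent (lower is better); the values are copied from the tables, and the helper name is hypothetical.

```python
LIBRISPEECH = {
    "tdnn_1d": {"dev-clean": 3.29, "dev-other": 8.71,
                "test-clean": 3.80, "test-other": 8.76},
    "multistream_cnn_1a": {"dev-clean": 3.20, "dev-other": 7.68,
                           "test-clean": 3.54, "test-other": 7.87},
}

FISHER_SWBD = {
    "tdnn_7d": {"eval2000": 12.6, "swbd": 8.8, "callhm": 16.3},
    "multistream_cnn_1a": {"eval2000": 12.6, "swbd": 9.2, "callhm": 15.7},
}


def relative_reductions(table, baseline, system):
    """Relative reduction in percent per test set; positive means `system` is lower."""
    return {
        test_set: 100.0 * (base - table[system][test_set]) / base
        for test_set, base in table[baseline].items()
    }


if __name__ == "__main__":
    for name, table, baseline in (
        ("LibriSpeech", LIBRISPEECH, "tdnn_1d"),
        ("Fisher-SWBD", FISHER_SWBD, "tdnn_7d"),
    ):
        reductions = relative_reductions(table, baseline, "multistream_cnn_1a")
        for test_set, pct in reductions.items():
            print(f"{name:12s} {test_set:11s} {pct:+.1f}%")
```

Under that reading, multistream_cnn_1a is lower than tdnn_1d on all four LibriSpeech sets, with the larger relative gains on the two "other" sets. On Fisher-SWBD it is unchanged on eval2000, higher on swbd, and lower on callhm.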

Owner

ASAPP Research
