A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

Last update: Dec 02, 2022

Overview

torch-cif

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.

Usage

def cif_function(
    input: Tensor,
    alpha: Tensor,
    beta: float = 1.0,
    padding_mask: Optional[Tensor] = None,
    target_lengths: Optional[Tensor] = None,
    max_output_length: Optional[int] = None,
    eps: float = 1e-4,
) -> Tuple[Tensor, Tensor, Tensor]:
    r""" A batched computation implementation of continuous integrate and fire (CIF)
    https://arxiv.org/abs/1905.11235

    Args:
        input (Tensor): (N, S, C) Input features to be integrated.
        alpha (Tensor): (N, S) Weights corresponding to each element in the
            input. Expected to be the output of a sigmoid function.
        beta (float): The threshold used to determine firing.
        padding_mask (Tensor, optional): (N, S) A binary mask representing
            padded elements in the input.
        target_lengths (Tensor, optional): (N,) Desired length of the targets
            for each sample in the minibatch.
        max_output_length (int, optional): The maximum valid output length used
            in inference. The alpha is scaled down if the sum exceeds this value.
        eps (float, optional): Epsilon to prevent underflow for divisions.
            Default: 1e-4

    Returns: Tuple (output, feat_lengths, alpha_sum)
        output (Tensor): (N, T, C) The output integrated from the source.
        feat_lengths (Tensor): (N,) The output length for each element in batch.
        alpha_sum (Tensor): (N,) The sum of alpha for each element in batch.
            Can be used to compute the quantity loss.
    """

Note

ℹ️ This is a WIP project. The implementation is still being tested.

This implementation uses cumsum and floor to determine the firing positions, and uses scatter to merge the weighted source features.
Run the tests with python test.py (requires pip install expecttest).
Feel free to contact me if there are bugs in the code.
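
The cumsum-and-floor idea mentioned in the note can be illustrated on a single sequence. The sketch below is not the repository's code: it is plain Python (no batching, no scatter), the name cif_sketch is hypothetical, it assumes every weight satisfies 0 <= alpha[i] <= beta so a frame fires at most once, and it simply discards the unfired tail.

```python
import math
from itertools import accumulate
from typing import List, Sequence, Tuple


def cif_sketch(
    features: Sequence[Sequence[float]],
    alpha: Sequence[float],
    beta: float = 1.0,
) -> Tuple[List[List[float]], float]:
    """Integrate-and-fire for one sequence (illustrative, unbatched).

    Assumes 0 <= alpha[i] <= beta, so each frame crosses at most one
    firing boundary. Returns (fired outputs, sum of alpha).
    """
    csum = list(accumulate(alpha))              # running sum of the weights
    dim = len(features[0])
    n_fired = math.floor(csum[-1] / beta)       # number of completed outputs
    out = [[0.0] * dim for _ in range(n_fired + 1)]  # +1 slot for the tail

    prev = 0.0
    for x, c in zip(features, csum):
        left = math.floor(prev / beta)          # output slot before this frame
        right = math.floor(c / beta)            # output slot after this frame
        if right == left:
            parts = [(left, c - prev)]          # no firing: whole weight stays
        else:
            # Firing: split the weight at the boundary right * beta.
            parts = [(left, right * beta - prev), (right, c - right * beta)]
        for slot, w in parts:
            for d in range(dim):
                out[slot][d] += w * x[d]        # the "scatter-add" step
        prev = c

    return out[:n_fired], csum[-1]              # drop the unfired tail


feats = [[1.0], [2.0], [3.0], [4.0], [5.0]]
weights = [0.5, 0.75, 0.25, 0.5, 0.5]
output, alpha_sum = cif_sketch(feats, weights)
```

Here the running sum is 0.5, 1.25, 1.5, 2.0, 2.5, so two outputs fire and the last 0.5 of accumulated weight is left over. A batched version would compute the same left and right indices for all frames at once and accumulate the weighted features with a scatter-add.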

Reference

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

Owner

張致強
