Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using PyTorch

Last update: Nov 30, 2022

Overview

A PyTorch implementation of the paper SYNTHESIZER: Rethinking Self-Attention in Transformer Models.

Reference

Paper URL
Authors: Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
Google Research

Method

1. Dense Synthesizer
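
The Dense Synthesizer drops query-key dot products. A small feed-forward network maps each token directly to a row of attention logits, B_i = W2 ReLU(W1 X_i + b1) + b2, and the output is Y = softmax(B) G(X), where G is a value projection. Below is a minimal single-head sketch of that formula. DenseSynthesizerSketch is a hypothetical name, not the repository's SynthesizerDense, and the layer widths follow the paper's description rather than this code base.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseSynthesizerSketch(nn.Module):
    """Hypothetical single-head dense synthesizer (paper formulation, not the repo's class)."""

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_model)     # W1, b1: d -> d
        self.w2 = nn.Linear(d_model, seq_len)     # W2, b2: d -> l, one logit per key position
        self.value = nn.Linear(d_model, d_model)  # G(X): value projection

    def forward(self, x: torch.Tensor):
        # x: (batch, l, d). Each token predicts its own row of l logits.
        scores = self.w2(F.relu(self.w1(x)))      # (batch, l, l)
        attn = torch.softmax(scores, dim=-1)
        return attn @ self.value(x), attn


torch.manual_seed(0)
x = torch.randn(2, 32, 64)                        # (batch, sentence_length, channel_dim)
out, attn = DenseSynthesizerSketch(d_model=64, seq_len=32)(x)
print(out.shape, attn.shape)  # torch.Size([2, 32, 64]) torch.Size([2, 32, 32])
```

Because the sequence length is baked into the output width of W2, a module like this only accepts inputs of that exact length. This explains why the repository's constructors take sentence_length as an argument.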

2. Fixed Random Synthesizer

3. Random Synthesizer
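
The Fixed Random and Random Synthesizers share one idea: the attention logits are a single l x l matrix R that does not depend on the input at all, so Y = softmax(R) G(X). The two variants differ only in whether R is trained. Below is a hedged sketch covering both with a hypothetical `fixed` flag. The class is illustrative and is not the repository's SynthesizerRandom, although it mirrors the `fixed=True` usage shown below.

```python
import torch
import torch.nn as nn


class RandomSynthesizerSketch(nn.Module):
    """Hypothetical random synthesizer: attention = softmax(R), independent of the input."""

    def __init__(self, d_model: int, seq_len: int, fixed: bool = False):
        super().__init__()
        r = torch.randn(seq_len, seq_len)
        if fixed:
            # Fixed Random Synthesizer: a buffer is saved with the model but never trained
            self.register_buffer("r", r)
        else:
            # Random Synthesizer: R is a trainable weight updated by the optimizer
            self.r = nn.Parameter(r)
        self.value = nn.Linear(d_model, d_model)  # G(X)

    def forward(self, x: torch.Tensor):
        # (1, l, l): one attention map broadcast over the whole batch by matmul
        attn = torch.softmax(self.r, dim=-1).unsqueeze(0)
        return attn @ self.value(x), attn


def n_trainable(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters() if p.requires_grad)


x = torch.randn(2, 32, 64)
trainable = RandomSynthesizerSketch(64, 32)
frozen = RandomSynthesizerSketch(64, 32, fixed=True)
out, attn = trainable(x)
print(out.shape, attn.shape)  # torch.Size([2, 32, 64]) torch.Size([1, 32, 32])
print(n_trainable(trainable) - n_trainable(frozen))  # 1024 (the 32 x 32 matrix R)
```

The batch dimension of 1 matches the [1, 32, 32] attention maps in the Output section. This sketch stores the fixed R as a buffer, so parameters() excludes it. The identical n_params reported for random_synthesizer and random_synthesizer_fix suggests the repository still registers R as a parameter in the fixed case, for example with requires_grad=False.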

4. Factorized Dense Synthesizer
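
The Factorized Dense Synthesizer replaces the single d -> l projection with two narrow projections of widths a and b, where a * b = l. It tiles both outputs up to length l and multiplies them elementwise. The paper asks for the two tilings to be composed so that values do not repeat in blocks. This sketch assumes one way to do that: `repeat` for one factor and `repeat_interleave` for the other, which makes each row an outer product of the two vectors, flattened. The class name and that tiling choice are assumptions, not the repository's FactorizedSynthesizerDense. The values a = 8, b = 4 are simply one split of l = 32.

```python
import torch
import torch.nn as nn


class FactorizedDenseSynthesizerSketch(nn.Module):
    """Hypothetical factorized dense synthesizer: two narrow projections with a * b == l."""

    def __init__(self, d_model: int, a: int, b: int):
        super().__init__()
        self.a, self.b = a, b
        self.f_a = nn.Linear(d_model, a)          # F_A: d -> a
        self.f_b = nn.Linear(d_model, b)          # F_B: d -> b
        self.value = nn.Linear(d_model, d_model)  # G(X)

    def forward(self, x: torch.Tensor):
        assert self.a * self.b == x.size(1), "a * b must equal the sequence length"
        A = self.f_a(x)                           # (batch, l, a)
        B = self.f_b(x)                           # (batch, l, b)
        # Position k of each row holds A[k % a] * B[k // a], so every (A, B) pair
        # appears exactly once instead of forming repeated blocks.
        scores = A.repeat(1, 1, self.b) * B.repeat_interleave(self.a, dim=-1)  # (batch, l, l)
        attn = torch.softmax(scores, dim=-1)
        return attn @ self.value(x), attn


x = torch.randn(2, 32, 64)
out, attn = FactorizedDenseSynthesizerSketch(64, a=8, b=4)(x)
print(out.shape, attn.shape)  # torch.Size([2, 32, 64]) torch.Size([2, 32, 32])
```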

5. Factorized Random Synthesizer
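
The Factorized Random Synthesizer makes the random logits low rank: R = R1 R2^T, with R1 and R2 each of shape l x k. This reduces the random parameters from l * l to 2 * l * k. Below is a hedged sketch with a hypothetical class name and an arbitrary rank k = 4. The Output section shows a per-example [2, 32, 32] map for FactorizedSynthesizerRandom, so the repository's version differs from this batch-shared sketch, at least in output shape.

```python
import torch
import torch.nn as nn


class FactorizedRandomSynthesizerSketch(nn.Module):
    """Hypothetical factorized random synthesizer: R = R1 @ R2^T with R1, R2 of shape (l, k)."""

    def __init__(self, d_model: int, seq_len: int, k: int):
        super().__init__()
        self.r1 = nn.Parameter(torch.randn(seq_len, k))
        self.r2 = nn.Parameter(torch.randn(seq_len, k))
        self.value = nn.Linear(d_model, d_model)  # G(X)

    def forward(self, x: torch.Tensor):
        # Rank-k logits, still independent of x, broadcast over the batch as (1, l, l)
        attn = torch.softmax(self.r1 @ self.r2.T, dim=-1).unsqueeze(0)
        return attn @ self.value(x), attn


x = torch.randn(2, 32, 64)
model = FactorizedRandomSynthesizerSketch(64, seq_len=32, k=4)
out, attn = model(x)
print(out.shape, attn.shape)  # torch.Size([2, 32, 64]) torch.Size([1, 32, 32])
print(model.r1.numel() + model.r2.numel())  # 256 parameters instead of 32 * 32 = 1024
```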

6. Mixture of Synthesizers
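
A Mixture of Synthesizers adds the logits of several synthesizing functions, weighted by learnable coefficients that sum to 1, before a single softmax: Y = softmax(alpha_1 S_1(X) + ... + alpha_N S_N(X)) G(X). The sketch below mixes a random synthesizer with scaled dot-product attention and obtains the alphas from a softmax over two learnable logits. The class name, the pairing, and the softmax parameterization are assumptions for illustration. The repository's reported count for mixture_synthesizer (3149824) equals the vanilla count plus one 32 x 32 matrix, which suggests it may not learn the mixing weights at all.

```python
import torch
import torch.nn as nn


class MixtureSynthesizerSketch(nn.Module):
    """Hypothetical mixture of a random synthesizer and dot-product attention."""

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)              # G(X), shared by both terms
        self.r = nn.Parameter(torch.randn(seq_len, seq_len))  # random synthesizer logits
        self.mix_logits = nn.Parameter(torch.zeros(2))        # learnable mixing weights
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor):
        dot = self.query(x) @ self.key(x).transpose(-2, -1) * self.scale  # (batch, l, l)
        alpha = torch.softmax(self.mix_logits, dim=0)         # alpha_1 + alpha_2 == 1
        scores = alpha[0] * self.r + alpha[1] * dot           # R broadcasts over the batch
        attn = torch.softmax(scores, dim=-1)                  # mix logits, then one softmax
        return attn @ self.value(x), attn


x = torch.randn(2, 32, 64)
model = MixtureSynthesizerSketch(64, 32)
out, attn = model(x)
print(out.shape, attn.shape)  # torch.Size([2, 32, 64]) torch.Size([2, 32, 32])
```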

Usage

import torch

from synthesizer import Transformer, SynthesizerDense, SynthesizerRandom, FactorizedSynthesizerDense, FactorizedSynthesizerRandom, MixtureSynthesizers, get_n_params, calculate_flops


def main():
    batch_size, channel_dim, sentence_length = 2, 1024, 32
    x = torch.randn([batch_size, sentence_length, channel_dim])

    vanilla = Transformer(channel_dim)
    out, attention_map = vanilla(x)
    print(out.size(), attention_map.size())
    n_params, flops = get_n_params(vanilla), calculate_flops(vanilla.children())
    print('vanilla, n_params: {}, flops: {}'.format(n_params, flops))

    dense_synthesizer = SynthesizerDense(channel_dim, sentence_length)
    out, attention_map = dense_synthesizer(x)
    print(out.size(), attention_map.size())
    n_params, flops = get_n_params(dense_synthesizer), calculate_flops(dense_synthesizer.children())
    print('dense_synthesizer, n_params: {}, flops: {}'.format(n_params, flops))

    random_synthesizer = SynthesizerRandom(channel_dim, sentence_length)
    out, attention_map = random_synthesizer(x)
    print(out.size(), attention_map.size())
    n_params, flops = get_n_params(random_synthesizer), calculate_flops(random_synthesizer.children())
    print('random_synthesizer, n_params: {}, flops: {}'.format(n_params, flops))

    random_synthesizer_fix = SynthesizerRandom(channel_dim, sentence_length, fixed=True)
    out, attention_map = random_synthesizer_fix(x)
    print(out.size(), attention_map.size())
    n_params, flops = get_n_params(random_synthesizer_fix), calculate_flops(random_synthesizer_fix.children())
    print('random_synthesizer_fix, n_params: {}, flops: {}'.format(n_params, flops))

    factorized_synthesizer_random = FactorizedSynthesizerRandom(channel_dim)
    out, attention_map = factorized_synthesizer_random(x)
    print(out.size(), attention_map.size())
    n_params, flops = get_n_params(factorized_synthesizer_random), calculate_flops(
        factorized_synthesizer_random.children())
    print('factorized_synthesizer_random, n_params: {}, flops: {}'.format(n_params, flops))

    factorized_synthesizer_dense = FactorizedSynthesizerDense(channel_dim, sentence_length)
    out, attention_map = factorized_synthesizer_dense(x)
    print(out.size(), attention_map.size())
    n_params, flops = get_n_params(factorized_synthesizer_dense), calculate_flops(
        factorized_synthesizer_dense.children())
    print('factorized_synthesizer_dense, n_params: {}, flops: {}'.format(n_params, flops))

    mixture_synthesizer = MixtureSynthesizers(channel_dim, sentence_length)
    out, attention_map = mixture_synthesizer(x)
    print(out.size(), attention_map.size())
    n_params, flops = get_n_params(mixture_synthesizer), calculate_flops(mixture_synthesizer.children())
    print('mixture_synthesizer, n_params: {}, flops: {}'.format(n_params, flops))


if __name__ == '__main__':
    main()

Output

torch.Size([2, 32, 1024]) torch.Size([2, 32, 32])
vanilla, n_params: 3148800, flops: 3145729
torch.Size([2, 32, 1024]) torch.Size([2, 32, 32])
dense_synthesizer, n_params: 1083456, flops: 1082370
torch.Size([2, 32, 1024]) torch.Size([1, 32, 32])
random_synthesizer, n_params: 1050624, flops: 1048577
torch.Size([2, 32, 1024]) torch.Size([1, 32, 32])
random_synthesizer_fix, n_params: 1050624, flops: 1048577
torch.Size([2, 32, 1024]) torch.Size([2, 32, 32])
factorized_synthesizer_random, n_params: 1066000, flops: 1064961
torch.Size([2, 32, 1024]) torch.Size([2, 32, 32])
factorized_synthesizer_dense, n_params: 1061900, flops: 1060865
torch.Size([2, 32, 1024]) torch.Size([2, 32, 32])
mixture_synthesizer, n_params: 3149824, flops: 3145729
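
The n_params column can be checked by hand, because an nn.Linear(n_in, n_out) with bias holds n_in * n_out + n_out parameters. The sketch below assumes every projection is such a layer. The breakdowns for the dense and factorized dense variants were worked backwards from the reported totals and are an assumption, not read from the repository's source. factorized_synthesizer_random is left out because its breakdown is not clear from the total alone. The flops column comes from the repository's calculate_flops helper and is not reproduced here.

```python
def linear(n_in: int, n_out: int) -> int:
    """Parameters in an nn.Linear(n_in, n_out) with bias: weight plus bias."""
    return n_in * n_out + n_out


d, l = 1024, 32       # channel_dim, sentence_length from the Usage example
value = linear(d, d)  # value projection present in every variant: 1,049,600

counts = {
    "vanilla": 3 * linear(d, d),                                          # query, key, value
    "dense_synthesizer": linear(d, l) + linear(l, l) + value,             # inferred d -> l -> l
    "random_synthesizer": l * l + value,                                  # one l x l matrix R
    "factorized_synthesizer_dense": linear(d, 8) + linear(d, 4) + value,  # inferred a=8, b=4
    "mixture_synthesizer": 3 * linear(d, d) + l * l,                      # vanilla plus R
}
for name, n in counts.items():
    print(name, n)
```

Each printed value matches the corresponding n_params line above. random_synthesizer_fix reports the same count as random_synthesizer because R is still counted when it is fixed.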

Owner

Myeongjun Kim
