Implementation of Multistream Transformers in Pytorch

Last update: Jul 26, 2022

Overview

Multistream Transformers

Implementation of Multistream Transformers in Pytorch.

This repository deviates slightly from the paper: instead of using the skip connection across all streams, it uses attention pooling across all tokens in the same position. In my experiments, this produced the best results when the number of streams is greater than 2.
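To make the pooling idea concrete, here is a minimal, framework-free sketch of attention pooling over the per-stream vectors that share one token position. It is an illustration under stated assumptions, not the code used in this repository: the function name `attention_pool`, the single fixed `query` vector, and the use of scaled dot-product scores are all hypothetical choices. A real implementation would presumably use batched tensor operations and learned projections.

```python
import math

def attention_pool(streams, query):
    """Pool one position's per-stream vectors into a single vector.

    streams: list of S vectors (one per stream), each a list of D floats.
    query:   list of D floats (hypothetical; a real model would learn it
             or derive it from the hidden states).
    """
    dim = len(query)
    # Scaled dot-product score between the query and each stream's vector.
    scores = [
        sum(q * v for q, v in zip(query, vec)) / math.sqrt(dim)
        for vec in streams
    ]
    # Numerically stable softmax over the stream axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the stream vectors gives the pooled representation.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, streams))
        for i in range(dim)
    ]
    return pooled, weights

# Three streams holding different 4-dimensional vectors for the same position.
streams = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
query = [2.0, 0.0, 0.0, 0.0]
pooled, weights = attention_pool(streams, query)
```

Because the weights come from a softmax over streams, they sum to 1, and the stream most aligned with the query (the first one here) contributes most to the pooled vector.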

Install

$ pip install multistream-transformers

Usage

import torch
from multistream_transformers import MultistreamTransformer

model = MultistreamTransformer(
    num_tokens = 256,         # vocabulary size (number of distinct tokens)
    dim = 512,                # model dimension
    depth = 4,                # number of layers
    causal = True,            # autoregressive or not
    max_seq_len = 1024,       # maximum sequence length
    num_streams = 2           # number of streams - 1 would make it a regular transformer
)

x = torch.randint(0, 256, (2, 1024))
mask = torch.ones((2, 1024)).bool()

logits = model(x, mask = mask) # (2, 1024, 256)
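The example above passes an all-ones mask, which fits a batch with no padding. For variable-length sequences, a padding mask could be built from the sequence lengths, as in the sketch below. This assumes that `True` marks real tokens to attend to and that sequences are padded on the right. The README does not state either convention, and `padding_mask` is a hypothetical helper, not part of the package.

```python
def padding_mask(lengths, max_seq_len):
    """Return one row of booleans per sequence.

    True marks a real token, False marks right-side padding
    (assumed convention, not confirmed by the README).
    """
    return [
        [pos < length for pos in range(max_seq_len)]
        for length in lengths
    ]

# Two sequences of lengths 5 and 3, padded to length 6.
mask_rows = padding_mask([5, 3], max_seq_len=6)
```

The nested list can then be converted with `torch.tensor(mask_rows, dtype=torch.bool)` before being passed as `mask`.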

Citations

@misc{burtsev2021multistream,
    title   = {Multi-Stream Transformers}, 
    author  = {Mikhail Burtsev and Anna Rumshisky},
    year    = {2021},
    eprint  = {2107.10342},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}


Releases (0.0.4)

0.0.4 (Jul 31, 2021)

0.0.3 (Jul 31, 2021)

0.0.2 (Jul 30, 2021)

0.0.1 (Jul 30, 2021)

Owner

Phil Wang
