Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.

Overview

Linear Transformers Are Secretly Fast Weight Programmers

This repository contains the code accompanying the paper Linear Transformers Are Secretly Fast Weight Programmers which is published at ICML'21. It also contains the logs of all synthetic experiments.

Synthetic Experiments

Requirements

$ cat req.txt 
jupyter==1.0.0
pandas==1.0.1
seaborn==0.10.0
torch==1.6.0
matplotlib==3.1.3
numpy==1.17.2
pip3 install -r req.txt

Rerun Experiments

Logs are provided in the synthetic/logs folder. The files in that folder are a result of running the following commands:

Setting 1 (capacity):

python3 main.py --begin=20 --end=600 --step=20 --attn_name=softmax --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=linear --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=2 --update_rule=sum

python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=3 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=64 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=128 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=512 --update_rule=sum

Setting 2 (update rule):

python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=sum --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=tanh --update_rule=fwm --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=fwm --replace

python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=2 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=linear --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=favor --attn_arg=64 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=favor --attn_arg=128 --update_rule=ours --replace

Generate figures from the logs using the following notebooks:

synthetic/setting1_generate_figure.ipynb
synthetic/setting2_generate_figure.ipynb

Language Modelling & Machine Translation

The toolkit and scripts for language modeling experiments can be found at IDSIA/lmtool-fwms.

For machine translation experiments, we ported the different attention functions implemented in the language modeling toolkit to the multi-head attention implementation in FAIRSEQ.

Citation

@inproceedings{schlag2021linear,
      title={Linear Transformers Are Secretly Fast Weight Programmers}, 
      author={Imanol Schlag and Kazuki Irie and J\"urgen Schmidhuber},
      booktitle={Proc. Int. Conf. on Machine Learning (ICML)},
      address = {Virtual only},
      month = jul,
      year={2021}
}
Owner
Imanol Schlag
Imanol Schlag
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

Tencent 633 Dec 28, 2022
Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics.

Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics. Jury offers a smooth and easy-to-use interface. It uses datasets for underlying metric computa

Open Business Software Solutions 129 Jan 06, 2023
Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

Twitch Revenues Bu script'i kullanarak istediğiniz yayıncıların, Twitch'den sızdırılan 125 GB'lik veriye dayanarak, 2019-2021 arası aylık gelirlerini

4 Nov 11, 2021
German Text-To-Speech Engine using Tacotron and Griffin-Lim

jotts JoTTS is a German text-to-speech engine using tacotron and griffin-lim. The synthesizer model has been trained on my voice using Tacotron1. Due

padmalcom 6 Aug 28, 2022
Chinese Pre-Trained Language Models (CPM-LM) Version-I

CPM-Generate 为了促进中文自然语言处理研究的发展,本项目提供了 CPM-LM (2.6B) 模型的文本生成代码,可用于文本生成的本地测试,并以此为基础进一步研究零次学习/少次学习等场景。[项目首页] [模型下载] [技术报告] 若您想使用CPM-1进行推理,我们建议使用高效推理工具BMI

Tsinghua AI 1.4k Jan 03, 2023
An implementation of WaveNet with fast generation

pytorch-wavenet This is an implementation of the WaveNet architecture, as described in the original paper. Features Automatic creation of a dataset (t

Vincent Herrmann 858 Dec 27, 2022
Mesh TensorFlow: Model Parallelism Made Easier

Mesh TensorFlow - Model Parallelism Made Easier Introduction Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying

1.3k Dec 26, 2022
Guide to using pre-trained large language models of source code

Large Models of Source Code I occasionally train and publicly release large neural language models on programs, including PolyCoder. Here, I describe

Vincent Hellendoorn 947 Dec 28, 2022
COVID-19 Chatbot with Rasa 2.0: open source conversational AI

COVID-19 chatbot implementation with Rasa open source 2.0, conversational AI framework.

Aazim Parwaz 1 Dec 23, 2022
Code and dataset for the EMNLP 2021 Finding paper "Can NLI Models Verify QA Systems’ Predictions?"

Code and dataset for the EMNLP 2021 Finding paper "Can NLI Models Verify QA Systems’ Predictions?"

Jifan Chen 22 Oct 21, 2022
A cross platform OCR Library based on PaddleOCR & OnnxRuntime

A cross platform OCR Library based on PaddleOCR & OnnxRuntime

RapidOCR Team 767 Jan 09, 2023
Korean extractive summarization. 2021 AI 텍스트 요약 온라인 해커톤 화성갈끄니까팀 코드

korean extractive summarization 2021 AI 텍스트 요약 온라인 해커톤 화성갈끄니까팀 코드 Leaderboard Notice Text Summarization with Pretrained Encoders에 나오는 bertsumext모델(ext

3 Aug 10, 2022
Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields [project page][paper][cite] Geometry-Consistent Neural Shape Represe

Yifan Wang 100 Dec 19, 2022
Textlesslib - Library for Textless Spoken Language Processing

textlesslib Textless NLP is an active area of research that aims to extend NLP t

Meta Research 379 Dec 27, 2022
Almost State-of-the-art Text Generation library

Ps: we are adding transformer model soon Text Gen 🐐 Almost State-of-the-art Text Generation library Text gen is a python library that allow you build

Emeka boris ama 63 Jun 24, 2022
🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

A hyper-fast, safe Python module to read and write JSON data. Works as a drop-in replacement for Python's built-in json module. This is alpha software

Matthias 479 Jan 01, 2023
Code for the ACL 2021 paper "Structural Guidance for Transformer Language Models"

Structural Guidance for Transformer Language Models This repository accompanies the paper, Structural Guidance for Transformer Language Models, publis

International Business Machines 10 Dec 14, 2022
This repository contains Python scripts for extracting linguistic features from Filipino texts.

Filipino Text Linguistic Feature Extractors This repository contains scripts for extracting linguistic features from Filipino texts. The scripts were

Joseph Imperial 1 Oct 05, 2021
Random Directed Acyclic Graph Generator

DAG_Generator Random Directed Acyclic Graph Generator verison1.0 简介 工作流通常由DAG(有向无环图)来定义,其中每个计算任务$T_i$由一个顶点(node,task,vertex)表示。同时,任务之间的每个数据或控制依赖性由一条加权

Livion 17 Dec 27, 2022
Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products.

Leah Pathan Khan 2 Jan 12, 2022