PyTorch implementation of Attention Is All You Need

Last update: Dec 07, 2022

Overview

A PyTorch Implementation of the Transformer: Attention Is All You Need

Our implementation is largely based on the TensorFlow implementation.
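The central building block of the Transformer is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The NumPy sketch below shows that computation, plus the optional causal mask that the decoder's self-attention uses to stop a position from attending to later positions. It is only an illustration, not code taken from this repository or from the TensorFlow implementation. The function name and the -1e9 masking constant are choices made for this example.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v).
    If causal is True, position i may only attend to positions <= i
    (the decoder self-attention mask); this assumes len_q == len_k.
    Returns the attended values and the attention weights.
    """
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if causal:
        len_q, len_k = scores.shape[-2], scores.shape[-1]
        # True above the diagonal, i.e. for "future" key positions.
        future = np.triu(np.ones((len_q, len_k), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Usage: self-attention over 4 positions with 8-dimensional vectors.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(x, x, x, causal=True)
```

Under the causal mask, the first position can only attend to itself. So its weight row is [1, 0, 0, 0] and its output equals its own value vector. Multi-head attention runs several of these in parallel on learned projections of Q, K and V.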

Requirements

NumPy >= 1.11.1
PyTorch >= 0.3.0
nltk
tensorboard-pytorch (build from source)

Why This Project?

I'm new to PyTorch, so I have been implementing some projects with it to learn. Recently I read the paper "Attention Is All You Need" and was impressed by the idea, so I implemented it. I got results similar to those of the original TensorFlow implementation.

Differences with the original paper

I don't intend to replicate the paper exactly. Rather, I aim to implement the main ideas in the paper and verify them in a SIMPLE and QUICK way. As a result, some parts of my code differ from the paper. Among them:

I used the IWSLT 2016 de-en dataset rather than the WMT dataset, because the former is much smaller and requires no special preprocessing.
For simplicity, I built the vocabulary from words, not subwords. You can of course try BPE or WordPiece instead.
I used a learned (parameterized) positional encoding. The paper used a sinusoidal formula, but Noam Shazeer, one of the authors, says both work. See the discussion on Reddit.
The paper varied the learning rate with the global step (a linear warmup followed by inverse-square-root decay). I fixed the learning rate at 0.0001 simply because training was fast enough on the small dataset (only a couple of hours on a single GTX 1060!).
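For comparison, the paper's fixed encoding is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A learned encoding replaces this fixed table with a trainable (maxlen, d_model) table indexed by position, which in PyTorch would typically be an nn.Embedding. The NumPy sketch below builds both kinds of table. The function names and the initialization scale of the "learned" table are hypothetical, because NumPy has no training, and the sinusoidal version assumes an even d_model.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Fixed encoding from the paper (assumes an even d_model):
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    pos = np.arange(max_len)[:, None]           # (max_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]   # (1, d_model // 2)
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def learned_positional_table(max_len, d_model, seed=0):
    """Hypothetical stand-in for a parameterized encoding: a randomly
    initialized (max_len, d_model) table that a framework would register
    as a trainable parameter and update by backpropagation."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, d_model ** -0.5, size=(max_len, d_model))

def add_positions(token_embeddings, pe):
    """token_embeddings: (seq_len, d_model); add the first seq_len rows of pe."""
    return token_embeddings + pe[: token_embeddings.shape[0]]
```

Both variants are used the same way. The table row for each position is added to the token embedding. The only difference is whether that table is fixed by a formula or learned.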

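The paper's schedule is lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), with warmup_steps = 4000 and d_model = 512 for the base model. The plain-Python sketch below contrasts it with the constant learning rate used here. The function names are illustrative only.

```python
def noam_learning_rate(step, d_model=512, warmup_steps=4000):
    """Schedule from the paper: linear warmup, then decay ~ 1/sqrt(step)."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

def fixed_learning_rate(step, lr=1e-4):
    """What this project does instead: a constant learning rate of 0.0001."""
    return lr

# The peak is reached at step == warmup_steps, where both terms of min() agree.
peak = noam_learning_rate(4000)  # equals (512 * 4000) ** -0.5, roughly 7.0e-4
```

If you want to try the schedule in PyTorch, torch.optim.lr_scheduler.LambdaLR multiplies the optimizer's base learning rate by a user-supplied function of its step counter. With a base learning rate of 1.0, it could wrap noam_learning_rate. Treat this as a suggestion, not something this repository does.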
File description

hyperparams.py includes all the hyperparameters.
prepro.py creates vocabulary files for the source and the target.
data_load.py contains functions for loading and batching data.
modules.py has all the building blocks for the encoder/decoder networks.
train.py defines the model and runs training.
eval.py is for evaluation.
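A word-level vocabulary like the one prepro.py produces can be sketched in a few lines of standard-library Python. This is not the repository's code. The special-token names (<PAD>, <UNK>, <S>, </S>), the min_count threshold and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical special tokens; the repository's actual names may differ.
SPECIALS = ["<PAD>", "<UNK>", "<S>", "</S>"]

def build_vocab(sentences, min_count=1):
    """Word-level vocabulary: whitespace-split tokens, most frequent first.

    Returns (word2idx, idx2word). Words seen fewer than min_count times are
    dropped and will map to <UNK> at lookup time.
    """
    counts = Counter(word for s in sentences for word in s.split())
    words = [w for w, c in counts.most_common()
             if c >= min_count and w not in SPECIALS]
    idx2word = SPECIALS + words
    word2idx = {w: i for i, w in enumerate(idx2word)}
    return word2idx, idx2word

def encode(sentence, word2idx):
    """Map words to ids (unknown words -> <UNK>) and append </S>."""
    unk = word2idx["<UNK>"]
    return [word2idx.get(w, unk) for w in sentence.split()] + [word2idx["</S>"]]
```

Because every word is its own token, rare words fall back to <UNK>. Subword schemes such as BPE or WordPiece avoid that at the cost of an extra preprocessing step.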

Training

STEP 1. Download the IWSLT 2016 German–English parallel corpus and extract it to the corpora/ folder.

wget -qO- https://wit3.fbk.eu/archive/2016-01//texts/de/en/de-en.tgz | tar xz; mv de-en corpora
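Loading the extracted corpus and batching it (the job of data_load.py) might look roughly like the sketch below. It assumes that the training files (for example corpora/train.tags.de-en.de and corpora/train.tags.de-en.en) hold one sentence per line, aligned between the two languages, with XML-like metadata lines such as <url> or <talkid> interleaved. Please check that assumption against the extracted files. The function names, maxlen filtering and pad_id=0 are illustrative choices, not taken from the repository.

```python
def load_parallel(src_lines, tgt_lines):
    """Pair up aligned source/target lines, skipping blank and tag lines."""
    pairs = []
    for s, t in zip(src_lines, tgt_lines):
        s, t = s.strip(), t.strip()
        if not s or not t or s.startswith("<") or t.startswith("<"):
            continue  # assumed metadata line (e.g. <url>...</url>) or empty
        pairs.append((s, t))
    return pairs

def filter_and_pad(pairs, maxlen, pad_id=0):
    """pairs: list of (src_ids, tgt_ids). Keep pairs where both sides fit in
    maxlen tokens, then right-pad each side to exactly maxlen with pad_id."""
    def pad(seq):
        return seq + [pad_id] * (maxlen - len(seq))
    src, tgt = [], []
    for s, t in pairs:
        if len(s) <= maxlen and len(t) <= maxlen:
            src.append(pad(s))
            tgt.append(pad(t))
    return src, tgt

def batches(src, tgt, batch_size):
    """Yield aligned (src_batch, tgt_batch) slices; the last may be smaller."""
    for i in range(0, len(src), batch_size):
        yield src[i:i + batch_size], tgt[i:i + batch_size]
```

Filtering each pair as a unit keeps source and target aligned. Dropping long sentences also keeps the padded batches small, which suits a quick experiment on a single GPU.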

STEP 2. Adjust the hyperparameters in hyperparams.py if necessary.
STEP 3. Run prepro.py to generate the vocabulary files in the preprocessed/ folder.
STEP 4. Run train.py, or download the pretrained weights, put them into the './models/' folder, and set eval_epoch in hyperparams.py to 18.
STEP 5. View the loss and accuracy in TensorBoard:

tensorboard --logdir runs

Evaluation

Run eval.py.

Results

I got a BLEU score of 16.7 (the TensorFlow implementation got 17.14). Recall that I trained on a small dataset with a limited vocabulary. Some of the evaluation results are as follows; details are available in the results folder.

source: Ich bin nicht sicher was ich antworten soll
expected: I'm not really sure about the answer
got: I'm not sure what I'm going to answer

source: Was macht den Unterschied aus
expected: What makes his story different
got: What makes a difference

source: Vielen Dank
expected: Thank you
got: Thank you

source: Das ist ein Baum
expected: This is a tree
got: So this is a tree
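nltk is listed in the requirements, so the BLEU score above was probably computed with NLTK, but that is an assumption. The dependency-free sketch below shows what corpus-level BLEU measures: clipped n-gram precisions for n = 1..4 summed over the whole corpus, combined by a geometric mean and multiplied by a brevity penalty. It assumes one reference per sentence and no smoothing. Scores from real tools can differ depending on tokenization and smoothing. BLEU is often reported multiplied by 100, as in the score above.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis.

    references, hypotheses: lists of token lists, aligned by index.
    Clipped n-gram matches and totals are summed over the whole corpus
    before computing precisions.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero n-gram precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Scoring a two-word pair such as "Thank you" on its own gives 0 under this definition, because it has no 3-grams or 4-grams. This is why BLEU is accumulated over the whole test set rather than averaged sentence by sentence.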
