PyTorch implementation of the Transformer in Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Last update: Feb 27, 2022

Overview

Transformer-PyTorch

A PyTorch implementation of the Transformer from the paper Attention Is All You Need, in both Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm) variants.

Pre-LN applies LayerNorm to the input of each sublayer, whereas Post-LN applies it after the residual connection. The architecture proposed in the paper is Post-LN; however, the official implementation has since been changed to the Pre-LN version. Experimental results show that the Pre-LN Transformer converges faster, does not need learning-rate warm-up, and is less sensitive to hyperparameters. For more detail on the differences between the two, see the paper On Layer Normalization in the Transformer Architecture.
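The difference between the two variants comes down to where LayerNorm sits relative to the residual addition. Below is a minimal sketch of one residual sublayer wrapper. The class name SublayerConnection, the dropout placement, and the toy feed-forward block are illustrative assumptions and are not taken from this repository's code; only the flag name pre_lnorm matches the one mentioned under Usage.

```python
import torch
import torch.nn as nn


class SublayerConnection(nn.Module):
    """Residual wrapper around one sublayer (attention or feed-forward).

    Hypothetical helper for illustration; not this repository's class.
    """

    def __init__(self, d_model: int, dropout: float = 0.1, pre_lnorm: bool = False):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.pre_lnorm = pre_lnorm

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        if self.pre_lnorm:
            # Pre-LN: normalize the sublayer input; the residual path is left untouched.
            return x + self.dropout(sublayer(self.norm(x)))
        # Post-LN: add the residual first, then normalize the sum.
        return self.norm(x + self.dropout(sublayer(x)))


# Toy feed-forward sublayer using the sizes listed under Evaluation (512 / 2048).
feed_forward = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))

x = torch.randn(2, 10, 512)  # (batch, sequence length, hidden size)
pre_out = SublayerConnection(512, pre_lnorm=True)(x, feed_forward)
post_out = SublayerConnection(512, pre_lnorm=False)(x, feed_forward)
```

Because a Pre-LN block never normalizes its own output, Pre-LN stacks commonly apply one extra LayerNorm after the last layer. Whether this repository does so is not stated here.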

A STAR would be so nice if you like it!

Dataset

The small English-German dataset from the WMT 2016 multimodal task (Multi30k), loaded through torchtext.

Prerequisites

Python3
PyTorch >= 1.2.0
torchtext
spacy
nltk
tqdm

Implementation Notes

Beam search is not supported.
Label smoothing is not implemented.
BPE (byte-pair encoding) is not used.

Usage

Run transformer.ipynb to download the dataset and train the model.
Set the pre_lnorm flag to choose between the Pre-LN and Post-LN variants.

Evaluation

Parameter settings
- hidden size: 512
- feed forward size: 2048
- number of heads: 8
- number of layers: 6
- warm-up steps: 2000
- batch size: 128
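
The warm-up setting above refers to a learning-rate schedule. As a rough sketch, the schedule from Attention Is All You Need raises the rate linearly for the first warm-up steps and then decays it with the inverse square root of the step number. Whether the notebook uses exactly this formula is an assumption, and the function name noam_lr is hypothetical.

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 2000) -> float:
    """Learning rate at a given step, following the schedule in the original paper.

    Defaults mirror the hidden size (512) and warm-up (2000) listed above.
    Assumption: this is the schedule the notebook uses.
    """
    step = max(step, 1)  # avoid raising zero to a negative power at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The rate rises until step 2000, then decays.
rates = [noam_lr(s) for s in (1, 1000, 2000, 4000, 8000)]
```

With PyTorch, a function like this can be passed to torch.optim.lr_scheduler.LambdaLR as lr_lambda, with the optimizer's base learning rate set to 1.0.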

Generated Examples

Here's an example from the test data:

source
- eine frau verwendet eine bohrmaschine während ein mann sie fotografiert .
gold
- a woman uses a drill while another man takes her picture .
inference
- a woman uses an electric drill as a man takes a picture .

TODO

Label smoothing
Attention visualization

Owner

Jared Wang
