PyTorch implementation of the Transformer in Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Overview

Transformer-PyTorch

A PyTorch implementation of the Transformer from the paper Attention is All You Need in both Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Pre-LN applies LayerNorm to the input of every sublayers instead of the residual connection part in Post-LN. The proposed model architecture in the paper was in Post-LN, however the official implementation has been changed into Pre-LN version. The experiment result shows that Pre-LN transformer converges faster while doesn't even need warming up, and is less sensitive to hyperparameters. For more detail about the difference between them, check out the paper On Layer Normalization in the Transformer Architecture.

A STAR would be so nice if you like it!

Dataset

The English-German small-dataset WMT 2016 multimodal task from torchtext.

Prerequisites

  • Python3
  • PyTorch >= 1.2.0
  • torchtext
  • spacy
  • nltk
  • tqdm

Implementation Notes

  • Beam search is not supported.
  • Label smoothing is not implemented.
  • BPE is not adapted.

Usage

  • Run transformer.ipynb to download dataset and train the model.
  • Change the flag pre_lnorm to determine which to use.

Evaluation

  • Parameter settings
    • hidden size: 512
    • feed forward size: 2048
    • num head: 8
    • layer: 6
    • warm-up: 2000
    • batch size: 128

Generated Examples

Here's an example from test data:

  • source
    • eine frau verwendet eine bohrmaschine während ein mann sie fotografiert .
  • gold
    • a woman uses a drill while another man takes her picture .
  • inference
    • a woman uses an electric drill as a man takes a picture .

TODO

  • Label smoothing
  • Attention visualization

References

Owner
Jared Wang
CS @ UIUC
Jared Wang
Caffe-like explicit model constructor. C(onfig)Model

cmodel Caffe-like explicit model constructor. C(onfig)Model Installation pip install git+https://github.com/bonlime/cmodel Usage In order to allow usi

1 Feb 18, 2022
Supporting code for the paper "Dangers of Bayesian Model Averaging under Covariate Shift"

Dangers of Bayesian Model Averaging under Covariate Shift This repository contains the code to reproduce the experiments in the paper Dangers of Bayes

Pavel Izmailov 25 Sep 21, 2022
Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation

Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation [Arxiv] [Video] Evaluation code for Unrestricted Facial Geometry Reconstr

Matan Sela 242 Dec 30, 2022
Offical implementation for "Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation".

Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation (NeurIPS 2021) by Qiming Hu, Xiaojie Guo. Dependencies P

Qiming Hu 31 Dec 20, 2022
U-Net Brain Tumor Segmentation

U-Net Brain Tumor Segmentation 🚀 :Feb 2019 the data processing implementation in this repo is not the fastest way (code need update, contribution is

Hao 448 Jan 02, 2023
A Python module for parallel optimization of expensive black-box functions

blackbox: A Python module for parallel optimization of expensive black-box functions What is this? A minimalistic and easy-to-use Python module that e

Paul Knysh 426 Dec 08, 2022
Source code for our paper "Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations"

Source code for our paper "Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations" this repository is maintained by bo

Yuhan Liu 24 Nov 29, 2022
DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

DeepLM DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021) Run Please install th

Jingwei Huang 130 Dec 02, 2022
A Partition Filter Network for Joint Entity and Relation Extraction EMNLP 2021

EMNLP 2021 - A Partition Filter Network for Joint Entity and Relation Extraction

zhy 127 Jan 04, 2023
Open CV - Convert a picture to look like a cartoon sketch in python

Use the video https://www.youtube.com/watch?v=k7cVPGpnels for initial learning.

Sammith S Bharadwaj 3 Jan 29, 2022
Pytorch implementation of RED-SDS (NeurIPS 2021).

Recurrent Explicit Duration Switching Dynamical Systems (RED-SDS) This repository contains a reference implementation of RED-SDS, a non-linear state s

Abdul Fatir 10 Dec 02, 2022
Betafold - AlphaFold with tunings

BetaFold We (hegelab.org) craeted this standalone AlphaFold (AlphaFold-Multimer,

2 Aug 11, 2022
This is the official PyTorch implementation for "Mesa: A Memory-saving Training Framework for Transformers".

Mesa: A Memory-saving Training Framework for Transformers This is the official PyTorch implementation for Mesa: A Memory-saving Training Framework for

Zhuang AI Group 105 Dec 06, 2022
Towards Understanding Quality Challenges of the Federated Learning: A First Look from the Lens of Robustness

FL Analysis This repository contains the code and results for the paper "Towards Understanding Quality Challenges of the Federated Learning: A First L

3 Oct 17, 2022
A graph-to-sequence model for one-step retrosynthesis and reaction outcome prediction.

Graph2SMILES A graph-to-sequence model for one-step retrosynthesis and reaction outcome prediction. 1. Environmental setup System requirements Ubuntu:

29 Nov 18, 2022
GraphLily: A Graph Linear Algebra Overlay on HBM-Equipped FPGAs

GraphLily: A Graph Linear Algebra Overlay on HBM-Equipped FPGAs GraphLily is the first FPGA overlay for graph processing. GraphLily supports a rich se

Cornell Zhang Research Group 39 Dec 13, 2022
Code for Subgraph Federated Learning with Missing Neighbor Generation (NeurIPS 2021)

To run the code Unzip the package to your local directory; Run 'pip install -r requirements.txt' to download required packages; Open file ~/nips_code/

32 Dec 26, 2022
pytorch, hand(object) detect ,yolo v5,手检测

YOLO V5 物体检测,包括手部检测。 项目介绍 手部检测 手部检测示例如下 : 视频示例: 项目配置 作者开发环境: Python 3.7 PyTorch = 1.5.1 数据集 手部检测数据集 该项目数据集采用 TV-Hand 和 COCO-Hand (COCO-Hand-Big 部分) 进

Eric.Lee 11 Dec 20, 2022
Classification Modeling: Probability of Default

Credit Risk Modeling in Python Introduction: If you've ever applied for a credit card or loan, you know that financial firms process your information

Aktham Momani 2 Nov 07, 2022
Semantic segmentation task for ADE20k & cityscapse dataset, based on several models.

semantic-segmentation-tensorflow This is a Tensorflow implementation of semantic segmentation models on MIT ADE20K scene parsing dataset and Cityscape

HsuanKung Yang 83 Oct 13, 2022