🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

Overview

PyTorch implementation of OpenAI's Finetuned Transformer Language Model

This is a PyTorch implementation of the TensorFlow code provided with OpenAI's paper "Improving Language Understanding by Generative Pre-Training" by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.

This implementation comprises a script to load in the PyTorch model the weights pre-trained by the authors with the TensorFlow implementation.

Transformer Language Model

The model classes and loading script are located in model_pytorch.py.

The names of the modules in the PyTorch model follow the names of the Variable in the TensorFlow implementation. This implementation tries to follow the original code as closely as possible to minimize the discrepancies.

This implementation thus also comprises a modified Adam optimization algorithm as used in OpenAI's paper with:

Requirements

To use the model it-self by importing model_pytorch.py, you just need:

  • PyTorch (version >=0.4)

To run the classifier training script in train.py you will need in addition:

  • tqdm
  • sklearn
  • spacy
  • ftfy
  • pandas

You can download the weights of the OpenAI pre-trained version by cloning Alec Radford's repo and placing the model folder containing the pre-trained weights in the present repo.

Using the pre-trained model as a Transformer Language Model

The model can be used as a transformer language model with OpenAI's pre-trained weights as follow:

from model_pytorch import TransformerModel, load_openai_pretrained_model, DEFAULT_CONFIG

args = DEFAULT_CONFIG
model = TransformerModel(args)
load_openai_pretrained_model(model)

This model generates Transformer's hidden states. You can use the LMHead class in model_pytorch.py to add a decoder tied with the weights of the encoder and get a full language model. You can also use the ClfHead class in model_pytorch.py to add a classifier on top of the transformer and get a classifier as described in OpenAI's publication. (see an example of both in the __main__ function of train.py)

To use the positional encoder of the transformer, you should encode your dataset using the encode_dataset() function of utils.py. Please refer to the beginning of the __main__ function in train.py to see how to properly define the vocabulary and encode your dataset.

Fine-tuning the pre-trained model on a classification task

This model can also be integrated in a classifier as detailed in OpenAI's paper. An example of fine-tuning on the ROCStories Cloze task is included with the training code in train.py

The ROCStories dataset can be downloaded from the associated website.

As with the TensorFlow code, this code implements the ROCStories Cloze Test result reported in the paper which can be reproduced by running:

python -m spacy download en
python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]

First experiments on the ROCStories test set

Finetuning the PyTorch model for 3 Epochs on ROCStories takes 10 minutes to run on a single NVidia K-80.

The single run test accuracy of this PyTorch version is 85.84%, while the authors reports a median accuracy with the TensorFlow code of 85.8% and the paper reports a best single run accuracy of 86.5%.

The authors implementations uses 8 GPU and can thus accomodate a batch of 64 samples while the present implementation is single GPU and is in consequence limited to 20 instances on a K80 for memory reasons. In our test, increasing the batch size from 8 to 20 samples increased the test accuracy by 2.5 points. A better accuracy may be obtained by using a multi-GPU setting (not tried yet).

The previous SOTA on the ROCStories dataset is 77.6% ("Hidden Coherence Model" of Chaturvedi et al. published in "Story Comprehension for Predicting What Happens Next" EMNLP 2017, which is a very nice paper too!)

Owner
Hugging Face
The AI community building the future.
Hugging Face
[NeurIPS 2021] Well-tuned Simple Nets Excel on Tabular Datasets

[NeurIPS 2021] Well-tuned Simple Nets Excel on Tabular Datasets Introduction This repo contains the source code accompanying the paper: Well-tuned Sim

52 Jan 04, 2023
Type4Py: Deep Similarity Learning-Based Type Inference for Python

Type4Py: Deep Similarity Learning-Based Type Inference for Python This repository contains the implementation of Type4Py and instructions for re-produ

Software Analytics Lab 45 Dec 15, 2022
Pytorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning

T2I_CL This is the official Pytorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning Requirements Linux Python

42 Dec 31, 2022
Facebook Research 605 Jan 02, 2023
This is the repo for Uncertainty Quantification 360 Toolkit.

UQ360 The Uncertainty Quantification 360 (UQ360) toolkit is an open-source Python package that provides a diverse set of algorithms to quantify uncert

International Business Machines 207 Dec 30, 2022
Determined: Deep Learning Training Platform

Determined: Deep Learning Training Platform Determined is an open-source deep learning training platform that makes building models fast and easy. Det

Determined AI 2k Dec 31, 2022
Code repository for "Reducing Underflow in Mixed Precision Training by Gradient Scaling" presented at IJCAI '20

Reducing Underflow in Mixed Precision Training by Gradient Scaling This project implements the gradient scaling method to improve the performance of m

Ruizhe Zhao 5 Apr 14, 2022
Auto-Encoding Score Distribution Regression for Action Quality Assessment

DAE-AQA It is an open source program reference to paper Auto-Encoding Score Distribution Regression for Action Quality Assessment. 1.Introduction DAE

13 Nov 16, 2022
Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme (NeurIPS2021)

Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme (NeurIPS2021) Overview Prerequisites Linux Pytho

Shaojie Li 34 Mar 31, 2022
"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOLQ: Segmenting Objects by Learning Queries This repository is an official implementation of the paper SOLQ: Segmenting Objects by Learning Queries.

MEGVII Research 179 Jan 02, 2023
A general 3D Object Detection codebase in PyTorch.

Det3D is the first 3D Object Detection toolbox which provides off the box implementations of many 3D object detection algorithms such as PointPillars, SECOND, PIXOR, etc, as well as state-of-the-art

Benjin Zhu 1.4k Jan 05, 2023
Pre-trained models for a Cascaded-FCN in caffe and tensorflow that segments

Cascaded-FCN This repository contains the pre-trained models for a Cascaded-FCN in caffe and tensorflow that segments the liver and its lesions out of

300 Nov 22, 2022
Reference code for the paper CAMS: Color-Aware Multi-Style Transfer.

CAMS: Color-Aware Multi-Style Transfer Mahmoud Afifi1, Abdullah Abuolaim*1, Mostafa Hussien*2, Marcus A. Brubaker1, Michael S. Brown1 1York University

Mahmoud Afifi 36 Dec 04, 2022
A PyTorch implementation of SlowFast based on ICCV 2019 paper "SlowFast Networks for Video Recognition"

SlowFast A PyTorch implementation of SlowFast based on ICCV 2019 paper SlowFast Networks for Video Recognition. Requirements Anaconda PyTorch conda in

Hao Ren 8 Dec 23, 2022
Neural Caption Generator with Attention

Neural Caption Generator with Attention Tensorflow implementation of "Show

Taeksoo Kim 510 Nov 30, 2022
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

Jittor: a Just-in-time(JIT) deep learning framework Quickstart | Install | Tutorial | Chinese Jittor is a high-performance deep learning framework bas

2.7k Jan 03, 2023
Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style

Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style [NeurIPS 2021] Official code to reproduce the results and data p

Yash Sharma 27 Sep 19, 2022
CUP-DNN is a deep neural network model used to predict tissues of origin for cancers of unknown of primary.

CUP-DNN CUP-DNN is a deep neural network model used to predict tissues of origin for cancers of unknown of primary. The model was trained on the expre

1 Oct 27, 2021
PyTorch implementation for 3D human pose estimation

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach This repository is the PyTorch implementation for the network presented in:

Xingyi Zhou 579 Dec 22, 2022
An Active Automata Learning Library Written in Python

AALpy An Active Automata Learning Library AALpy is a light-weight active automata learning library written in pure Python. You can start learning auto

TU Graz - SAL Dependable Embedded Systems Lab (DES Lab) 78 Dec 30, 2022