Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Code in both PyTorch and TensorFlow

Overview

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

  • The source code is in the tf/ folder, supporting (1) single-node multi-gpu training, and (2) multi-host TPU training.
  • Besides the source code, we also provide pretrained "TensorFlow" models with state-of-the-art (SoTA) performances reported in the paper.
  • Please refer to tf/README.md for details.

PyTorch

  • The source code is in the pytorch/ folder, supporting single-node multi-gpu training via the module nn.DataParallel.
  • Please refer to pytorch/README.md for details.

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. Transformer-XL is also the first to break through the 1.0 barrier on char-level language modeling. Below is a summary.

Method enwiki8 text8 One Billion Word WT-103 PTB (w/o finetuning)
Previous Best 1.06 1.13 23.7 20.5 55.5
Transformer-XL 0.99 1.08 21.8 18.3 54.5

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)

Owner
Zhilin Yang
Zhilin Yang
AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations. Each modality’s augmentations are contained within its own sub-l

Facebook Research 4.6k Jan 09, 2023
XViT - Space-time Mixing Attention for Video Transformer

XViT - Space-time Mixing Attention for Video Transformer This is the official implementation of the XViT paper: @inproceedings{bulat2021space, title

Adrian Bulat 33 Dec 23, 2022
Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems

AequeVox Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems README under development. Python Packages Required

Sai Sathiesh 2 Aug 28, 2022
Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020

XDVioDet Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020. The proj

peng 64 Dec 12, 2022
一个免费开源一键搭建的通用验证码识别平台,大部分常见的中英数验证码识别都没啥问题。

captcha_server 一个免费开源一键搭建的通用验证码识别平台,大部分常见的中英数验证码识别都没啥问题。 使用方法 python = 3.8 以上环境 pip install -r requirements.txt -i https://pypi.douban.com/simple gun

Sml2h3 189 Dec 02, 2022
Creative Applications of Deep Learning w/ Tensorflow

Creative Applications of Deep Learning w/ Tensorflow This repository contains lecture transcripts and homework assignments as Jupyter Notebooks for th

Parag K Mital 1.5k Dec 30, 2022
ML model to classify between cats and dogs

Cats-and-dogs-classifier This is my first ML model which can classify between cats and dogs. Here the accuracy is around 75%, however , the accuracy c

Sharath V 4 Aug 20, 2021
An implementation of EWC with PyTorch

EWC.pytorch An implementation of Elastic Weight Consolidation (EWC), proposed in James Kirkpatrick et al. Overcoming catastrophic forgetting in neural

Ryuichiro Hataya 166 Dec 22, 2022
[NeurIPS 2021] Source code for the paper "Qu-ANTI-zation: Exploiting Neural Network Quantization for Achieving Adversarial Outcomes"

Qu-ANTI-zation This repository contains the code for reproducing the results of our paper: Qu-ANTI-zation: Exploiting Quantization Artifacts for Achie

Secure AI Systems Lab 8 Mar 26, 2022
This library is a location of the LegacyLogger for PyTorch Lightning.

neptune-contrib Documentation See neptune-contrib documentation site Installation Get prerequisites python versions 3.5.6/3.6 are supported Install li

neptune.ai 26 Oct 07, 2021
[BMVC 2021] Official PyTorch Implementation of Self-supervised learning of Image Scale and Orientation Estimation

Self-Supervised Learning of Image Scale and Orientation Estimation (BMVC 2021) This is the official implementation of the paper "Self-Supervised Learn

Jongmin Lee 17 Nov 10, 2022
This repository for project that can Automate Number Plate Recognition (ANPR) in Morocco Licensed Vehicles. 💻 + 🚙 + 🇲🇦 = 🤖 🕵🏻‍♂️

MoroccoAI Data Challenge (Edition #001) This Reposotory is result of our work in the comepetiton organized by MoroccoAI in the context of the first Mo

SAFOINE EL KHABICH 14 Oct 31, 2022
Implementation of ICCV21 paper: PnP-DETR: Towards Efficient Visual Analysis with Transformers

Implementation of ICCV 2021 paper: PnP-DETR: Towards Efficient Visual Analysis with Transformers arxiv This repository is based on detr Recently, DETR

twang 113 Dec 27, 2022
LLVIP: A Visible-infrared Paired Dataset for Low-light Vision

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision Project | Arxiv | Abstract It is very challenging for various visual tasks such as image

CVSM Group - email: <a href=[email protected]"> 377 Jan 07, 2023
[ICCV 2021] Official Tensorflow Implementation for "Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions"

KPAC: Kernel-Sharing Parallel Atrous Convolutional block This repository contains the official Tensorflow implementation of the following paper: Singl

Hyeongseok Son 50 Dec 29, 2022
2D&3D human pose estimation

Human Pose Estimation Papers [CVPR 2016] - 201511 [IJCAI 2016] - 201602 Other Action Recognition with Joints-Pooled 3D Deep Convolutional Descriptors

133 Jan 02, 2023
Trash Sorter Extraordinaire is a software which efficiently detects the different types of waste in a pile of random trash through feeding it pictures or videos.

Trash-Sorter-Extraordinaire Trash Sorter Extraordinaire is a software which efficiently detects the different types of waste in a pile of random trash

Rameen Mahmood 1 Nov 07, 2021
Generating Band-Limited Adversarial Surfaces Using Neural Networks

Generating Band-Limited Adversarial Surfaces Using Neural Networks This is the official repository of the technical report that was published on arXiv

3 Jul 26, 2022
RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

RMNet: Equivalently Removing Residual Connection from Networks This repository is the official implementation of "RMNet: Equivalently Removing Residua

184 Jan 04, 2023
Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)

Self-Supervised Multi-Frame Monocular Scene Flow 3D visualization of estimated depth and scene flow (overlayed with input image) from temporally conse

Visual Inference Lab @TU Darmstadt 85 Dec 22, 2022