Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Overview

Introduction

Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, Funnel-Transformer usually has a higher capacity given the same FLOPs. In addition, with a decoder, Funnel-Transformer is able to recover the token-level deep representation for each token from the reduced hidden sequence, which enables standard pretraining.

For a detailed description of technical details and experimental results, please refer to our paper:

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Zihang Dai*, Guokun Lai*, Yiming Yang, Quoc V. Le

(*: equal contribution)

Preprint 2020

Source Code

Data Download

  • The corresponding source code and instructions are in the data-scrips folder, which specifies how to access the raw data we used in this work.

TensorFlow

  • The corresponding source code is in the tensorflow folder, which was developed and exactly used for TPU pretraining & finetuning as presented in the paper.
  • The TensorFlow funetuning code mainly supports TPU finetuining on GLUE benchmark, text classification, SQuAD and RACE.
  • Please refer to tensorflow/README.md for details.

PyTorch

  • The source code is in the pytorch folder, which only serves as an example PyTorch implementation of Funnel-Transformer.
  • Hence, the PyTorch code only supports GPU finetuning for the GLUE benchmark & text classification.
  • Please refer to pytorch/README.md for details.

Pretrained models

Model Size PyTorch TensorFlow TensorFlow-Full
B10-10-10H1024 Link Link Link
B8-8-8H1024 Link Link Link
B6-6-6H768 Link Link Link
B6-3x2-3x2H768 Link Link Link
B4-4-4H768 Link Link Link

Each .tar.gz file contains three items:

  • A TensorFlow or PyTorch checkpoint (model.ckpt-* or model.ckpt.pt) checkpoint containing the pre-trained weights (Note: The TensorFlow checkpoint actually corresponds to 3 files).
  • A Word Piece model (vocab.uncased.txt) used for (de)tokenization.
  • A config file (net_config.json or net_config.pytorch.json) which specifies the hyperparameters of the model.

You also can use download_all_ckpts.sh to download all checkpoints mentioned above.

For how to use the pretrained models, please refer to tensorflow/README.md or pytorch/README.md respectively.

Results

glue-dev

qa

Owner
GUOKUN LAI
GUOKUN LAI
Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

Michael Petrochuk 2.1k Jan 01, 2023
ChainKnowledgeGraph, 产业链知识图谱包括A股上市公司、行业和产品共3类实体

ChainKnowledgeGraph, 产业链知识图谱包括A股上市公司、行业和产品共3类实体,包括上市公司所属行业关系、行业上级关系、产品上游原材料关系、产品下游产品关系、公司主营产品、产品小类共6大类。 上市公司4,654家,行业511个,产品95,559条、上游材料56,824条,上级行业480条,下游产品390条,产品小类52,937条,所属行业3,946条。

liuhuanyong 415 Jan 06, 2023
BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese

Table of contents Introduction Using BARTpho with fairseq Using BARTpho with transformers Notes BARTpho: Pre-trained Sequence-to-Sequence Models for V

VinAI Research 58 Dec 23, 2022
Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Semantic Segmentation".

Dual Path Learning for Domain Adaptation of Semantic Segmentation Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Sema

27 Dec 22, 2022
wxPython app for converting encodings, modifying and fixing SRT files

Subtitle Converter Program za obradu srt i txt fajlova. Requirements: Python version 3.8 wxPython version 4.1.0 or newer Libraries: srt, PyDispatcher

4 Nov 25, 2022
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
A Streamlit web app that generates Rick and Morty stories using GPT2.

Rick and Morty Story Generator This project uses a pre-trained GPT2 model, which was fine-tuned on Rick and Morty transcripts, to generate new stories

₸ornike 33 Oct 13, 2022
The code for two papers: Feedback Transformer and Expire-Span.

transformer-sequential This repo contains the code for two papers: Feedback Transformer Expire-Span The training code is structured for long sequentia

Meta Research 125 Dec 25, 2022
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost LOVE is accpeted by ACL22 main conference as a long pape

Lihu Chen 32 Jan 03, 2023
Sploitus - Command line search tool for sploitus.com. Think searchsploit, but with more POCs

Sploitus Command line search tool for sploitus.com. Think searchsploit, but with

watchdog2000 5 Mar 07, 2022
Code Implementation of "Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction".

Span-ASTE: Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction ***** New March 31th, 2022: Scikit-Style API for Easy Usage *****

Chia Yew Ken 111 Dec 23, 2022
Fast, DB Backed pretrained word embeddings for natural language processing.

Embeddings Embeddings is a python package that provides pretrained word embeddings for natural language processing and machine learning. Instead of lo

Victor Zhong 212 Nov 21, 2022
STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs

STonKGs STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs. This multimodal Transformer combin

STonKGs 27 Aug 11, 2022
PyJPBoatRace: Python-based Japanese boatrace tools 🚤

pyjpboatrace :speedboat: provides you with useful tools for data analysis and auto-betting for boatrace.

5 Oct 29, 2022
Various capabilities for static malware analysis.

Malchive The malchive serves as a compendium for a variety of capabilities mainly pertaining to malware analysis, such as scripts supporting day to da

MITRE Cybersecurity 64 Nov 22, 2022
Named Entity Recognition API used by TEI Publisher

TEI Publisher Named Entity Recognition API This repository contains the API used by TEI Publisher's web-annotation editor to detect entities in the in

e-editiones.org 14 Nov 15, 2022
Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

537 Jan 05, 2023
Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph",

K-BERT Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph", which is implemented based on the UER framework. R

Weijie Liu 834 Jan 09, 2023
TLA - Twitter Linguistic Analysis

TLA - Twitter Linguistic Analysis Tool for linguistic analysis of communities TLA is built using PyTorch, Transformers and several other State-of-the-

Tushar Sarkar 47 Aug 14, 2022