tldr-transformers

The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

Models: GPT-*, *BERT*, Adapter-*, *T5, etc.

BERT and T5 (art from the original papers)

Each set of notes includes links to the paper, the original code implementation (if available) and the Huggingface 🤗 implementation.

Here is an example: t5.

The transformers papers are presented somewhat chronologically below. Go to the " 👉 Notes 👈 " column below to find the notes for each paper.

This repo also includes a single table quantifying the differences across the transformer papers.

Quick Note
Motivation
Papers::Transformer Papers
Papers::1 Table To Rule Them All
Papers::Alignment Papers
Papers::Scaling Law Papers
Papers::LM Memorization Papers
Papers::Limited Label Learning Papers
How To Contribute
How To Point Out Errors
Citation
License

Quick_Note

This is not an intro to deep learning in NLP. If you are looking for that, I recommend one of the following: Fast AI's course, one of the Coursera courses, or maybe this old thing. Come here after that.

Motivation

With the explosion of papers on all things Transformers over the past few years, it seems useful to catalog the salient features, results, and insights of each paper in a digestible format. Hence this repo.

Models

Model	Year	Institute	Paper	👉 Notes 👈	Original Code	Huggingface 🤗	Other Repo
Transformer	2017	Google	Attention is All You Need	Skipped, too many good write-ups: Harvard NLP Group Jay Alammar Lilian Weng Something old		?
GPT-2	2019	OpenAI	Language Models are Unsupervised Multitask Learners	To-Do	X	X
GPT-J-6B	2021	EleutherAI	GPT-J-6B: 6B Jax-Based Transformer (public GPT-3)	X	here	x	x
BERT	2018	Google	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	BERT notes	here	here
DistilBERT	2019	Huggingface	DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter	DistilBERT notes		here
ALBERT	2019	Google/Toyota	ALBERT: A Lite BERT for Self-supervised Learning of Language Representations	ALBERT notes	here	here
RoBERTa	2019	Facebook	RoBERTa: A Robustly Optimized BERT Pretraining Approach	RoBERTa notes	here	here
BART	2019	Facebook	BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension	BART notes	here	here
T5	2019	Google	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	T5 notes	here	here
Adapter-BERT	2019	Google	Parameter-Efficient Transfer Learning for NLP	Adapter-BERT notes	here	-	here
Megatron-LM	2019	NVIDIA	Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism	Megatron notes	here	-	here
Reformer	2020	Google	Reformer: The Efficient Transformer	Reformer notes		here
ByT5	2021	Google	ByT5: Towards a token-free future with pre-trained byte-to-byte models	ByT5 notes	here	here
CLIP	2021	OpenAI	Learning Transferable Visual Models From Natural Language Supervision	CLIP notes	here	here
DALL-E	2021	OpenAI	Zero-Shot Text-to-Image Generation	DALL-E notes	here	-
Codex	2021	OpenAI	Evaluating Large Language Models Trained on Code	Codex notes	X	-

BigTable

All of the table summaries from the notes above, collapsed into one really big table: here.

Alignment

Paper	Year	Institute	👉 Notes 👈	Codes
Fine-Tuning Language Models from Human Preferences	2019	OpenAI	To-Do	None

Scaling

Paper	Year	Institute	👉 Notes 👈	Codes
Scaling Laws for Neural Language Models	2020	OpenAI	To-Do	None

Memorization

Paper	Year	Institute	👉 Notes 👈	Codes
Extracting Training Data from Large Language Models	2021	Google et al.	To-Do	None
Deduplicating Training Data Makes Language Models Better	2021	Google et al.	To-Do	None

FewLabels

Paper	Year	Institute	👉 Notes 👈	Codes
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP	2021	GIT/UNC	To-Do	None
Learning with fewer labeled examples	2021	Kevin Murphy & Colin Raffel (Preprint: "Probabilistic Machine Learning", Chapter 19)	Worth a read, won't summarize here.	None

Contribute

If you are interested in contributing to this repo, feel free to do the following:

Fork the repo.
Create a Draft PR with the paper of interest (to prevent "in-flight" issues).
Use the suggested template to write your "tl;dr". If it's an architecture paper, you may also want to add to the larger table here.
Submit your PR.

Errata

Some of the information here is undoubtedly incorrect. If you spot an error, please open an Issue and point it out.

Citation

@misc{cliff-notes-transformers,
  author = {Thompson, Will},
  url = {https://github.com/will-thompson-k/cliff-notes-transformers},
  year = {2021}
}

For the notes above, I've linked the original papers.

License

MIT
