The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

Overview

tldr-transformers

Models: GPT-*, *BERT*, Adapter-*, *T5, etc.

BERT and T5 (art from the original papers)

Each set of notes includes links to the paper, the original code implementation (if available), and the Huggingface 🤗 implementation.

Here is an example: t5.

The transformer papers are presented somewhat chronologically below. Go to the "👉 Notes 👈" column to find the notes for each paper.

This repo also includes a single table quantifying the differences across transformer papers.

Quick_Note

This is not an intro to deep learning in NLP. If you are looking for that, I recommend one of the following: Fast AI's course, one of the Coursera courses, or maybe this old thing. Come here after that.

Motivation

With the explosion of papers on all things Transformers over the past few years, it seems useful to catalog the salient features/results/insights of each paper in a digestible format. Hence this repo.

Models

Model Year Institute Paper 👉 Notes 👈 Original Code Huggingface 🤗 Other Repo
Transformer 2017 Google Attention is All You Need Skipped, too many good write-ups: ?
GPT-3 2020 OpenAI Language Models are Few-Shot Learners To-Do X X
GPT-J-6B 2021 EleutherAI GPT-J-6B: 6B Jax-Based Transformer (public GPT-3) X here x x
BERT 2018 Google BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding BERT notes here here
DistilBERT 2019 Huggingface DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter DistilBERT notes here
ALBERT 2019 Google/Toyota ALBERT: A Lite BERT for Self-supervised Learning of Language Representations ALBERT notes here here
RoBERTa 2019 Facebook RoBERTa: A Robustly Optimized BERT Pretraining Approach RoBERTa notes here here
BART 2019 Facebook BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension BART notes here here
T5 2019 Google Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer T5 notes here here
Adapter-BERT 2019 Google Parameter-Efficient Transfer Learning for NLP Adapter-BERT notes here - here
Megatron-LM 2019 NVIDIA Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Megatron notes here - here
Reformer 2020 Google Reformer: The Efficient Transformer Reformer notes here
byT5 2021 Google ByT5: Towards a token-free future with pre-trained byte-to-byte models ByT5 notes here here
CLIP 2021 OpenAI Learning Transferable Visual Models From Natural Language Supervision CLIP notes here here
DALL-E 2021 OpenAI Zero-Shot Text-to-Image Generation DALL-E notes here -
Codex 2021 OpenAI Evaluating Large Language Models Trained on Code Codex notes X -

BigTable

All of the table summaries found above are collapsed into one really big table here.

Alignment

Paper Year Institute 👉 Notes 👈 Codes
Fine-Tuning Language Models from Human Preferences 2019 OpenAI To-Do None

Scaling

Paper Year Institute 👉 Notes 👈 Codes
Scaling Laws for Neural Language Models 2020 OpenAI To-Do None

Memorization

Paper Year Institute 👉 Notes 👈 Codes
Extracting Training Data from Large Language Models 2021 Google et al. To-Do None
Deduplicating Training Data Makes Language Models Better 2021 Google et al. To-Do None

FewLabels

Paper Year Institute 👉 Notes 👈 Codes
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP 2021 GIT/UNC To-Do None
Learning with fewer labeled examples 2021 Kevin Murphy & Colin Raffel (Preprint: "Probabilistic Machine Learning", Chapter 19) Worth a read, won't summarize here. None

Contribute

If you are interested in contributing to this repo, feel free to do the following:

  1. Fork the repo.
  2. Create a Draft PR with the paper of interest (to prevent "in-flight" issues).
  3. Use the suggested template to write your "tl;dr". If it's an architecture paper, you may also want to add to the larger table here.
  4. Submit your PR.

Errata

Undoubtedly there is information that is incorrect here. Please open an Issue and point it out.

Citation

@misc{cliff-notes-transformers,
  author = {Thompson, Will},
  url = {https://github.com/will-thompson-k/cliff-notes-transformers},
  year = {2021}
}

For the notes above, I've linked the original papers.

License

MIT

Owner
Will Thompson