A quick recipe to learn all about Transformers

Overview

Transformers Recipe

Transformers have accelerated the development of new techniques and models for natural language processing (NLP) tasks. While it has mostly been used for NLP tasks, it is now seeing heavy adoption to address computer vision tasks as well. That makes it a very important concept to understand and be able to apply.

I am aware that a lot of machine learning and NLP students and practitioners are keen on learning about transformers. Therefore, I have prepared this recipe of resources and study materials to help guide students interested in learning about the world of Transformers.

To begin with, I have prepared a few links to materials that I used to better understand and implement transformer models from scratch.

This recipe will also allow me to easily continue to update the study materials needed to learning about Transformers.

๐Ÿง  High-level Introduction

First, try to get a very high-level introduction about transformers. Some references worth looking at:

๐Ÿ”— Transformers From Scratch (Brandon Rohrer)

๐Ÿ”— How Transformers work in deep learning and NLP: an intuitive introduction (AI Summer)

๐Ÿ”— Deep Learning for Language Understanding (DeepMind)

๐ŸŽจ The Illustrated Transformer

Jay Alammar's illustrated explanations are exceptional. Once you get that high-level understanding of transformers, you can jump into this popular detailed and illustrated explanation of transformers:

๐Ÿ”— http://jalammar.github.io/illustrated-transformer/

Figure source: http://jalammar.github.io/illustrated-transformer/

๐Ÿ”– Technical Summary

At this point, you may be looking for a technical summary and overview of transformers. Lilian Weng's blog posts are a gem and provide concise technical explanations/summaries:

๐Ÿ”— https://lilianweng.github.io/lil-log/2020/04/07/the-transformer-family.html

Figure source: https://lilianweng.github.io/lil-log/2020/04/07/the-transformer-family.html

๐Ÿ‘ฉ๐Ÿผโ€๐Ÿ’ป Implementation

After the theory, it's important to test the knowledge. I typically prefer to understand things in more detail so I prefer to implement algorithms from scratch. For implementing transformers, I mainly relied on this tutorial:

๐Ÿ”— https://nlp.seas.harvard.edu/2018/04/03/attention.html

(Google Colab | GitHub)

Figure source: https://nlp.seas.harvard.edu/2018/04/03/attention.html

๐Ÿ“„ Attention Is All You Need

This paper by Vaswani et al. introduced the Transformer architecture. Read it after you have a high-level understanding and want to get into the details. Pay attention to other references in the paper for diving deep.

๐Ÿ”— https://arxiv.org/pdf/1706.03762v5.pdf

Figure source: https://arxiv.org/pdf/1706.03762v5.pdf

๐Ÿ‘ฉ๐Ÿผโ€๐Ÿ’ป Applying Transformers

After some time studying and understanding the theory behind transformers, you may be interested in applying them to different NLP projects or research. At this time, your best bet is the Transformers library by HuggingFace.

๐Ÿ”— https://github.com/huggingface/transformers

The Hugging Face Team is also publishing a new book on NLP with Transformers, so you might want to check that out here.


Feel free to suggest study material. In the next update, I am looking to add a more comprehensive collection of Transformer applications and papers. In addition, a code implementation for easy experimentation is coming as well. Stay tuned!

To get regular updates on new ML and NLP resources, follow me on Twitter.

Owner
DAIR.AI
Democratizing Artificial Intelligence Research, Education, and Technologies
DAIR.AI
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learni

220 Dec 31, 2022
CNN visualization tool in TensorFlow

tf_cnnvis A blog post describing the library: https://medium.com/@falaktheoptimist/want-to-look-inside-your-cnn-we-have-just-the-right-tool-for-you-ad

InFoCusp 778 Jan 02, 2023
Official repository for "Exploiting Session Information in BERT-based Session-aware Sequential Recommendation", SIGIR 2022 short.

Session-aware BERT4Rec Official repository for "Exploiting Session Information in BERT-based Session-aware Sequential Recommendation", SIGIR 2022 shor

Jamie J. Seol 22 Dec 13, 2022
Pytorch Implementation for (STANet+ and STANet)

Pytorch Implementation for (STANet+ and STANet) V2-Weakly Supervised Visual-Auditory Saliency Detection with Multigranularity Perception (arxiv), pdf:

GuotaoWang 14 Nov 29, 2022
๐Ÿ† The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval ๐Ÿ† The 1st Place Submission to AICity Challenge 2021 Natural

82 Dec 29, 2022
NHS AI Lab Skunkworks project: Long Stayer Risk Stratification

NHS AI Lab Skunkworks project: Long Stayer Risk Stratification A pilot project for the NHS AI Lab Skunkworks team, Long Stayer Risk Stratification use

NHSX 21 Nov 14, 2022
Implement some metaheuristics and cost functions

Metaheuristics This repot implement some metaheuristics and cost functions. Metaheuristics JAYA Implement Jaya optimizer without constraints. Cost fun

Adri1G 1 Mar 23, 2022
YOLOv7 - Framework Beyond Detection

๐Ÿ”ฅ๐Ÿ”ฅ๐Ÿ”ฅ๐Ÿ”ฅ YOLO with Transformers and Instance Segmentation, with TensorRT acceleration! ๐Ÿ”ฅ๐Ÿ”ฅ๐Ÿ”ฅ

JinTian 3k Jan 01, 2023
Encoding Causal Macrovariables

Encoding Causal Macrovariables Data Natural climate data ('El Nino') Self-generated data ('Simulated') Experiments Detecting macrovariables through th

Benedikt Hรถltgen 3 Jul 31, 2022
Betafold - AlphaFold with tunings

BetaFold We (hegelab.org) craeted this standalone AlphaFold (AlphaFold-Multimer,

2 Aug 11, 2022
Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation

SUCP Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation () Direct Friends (i.e., users who follow each o

Kosar 8 Nov 26, 2022
KIND: an Italian Multi-Domain Dataset for Named Entity Recognition

KIND (Kessler Italian Named-entities Dataset) KIND is an Italian dataset for Named-Entity Recognition. It contains more than one million tokens with t

Digital Humanities 5 Jun 21, 2022
A graph adversarial learning toolbox based on PyTorch and DGL.

GraphWar: Arms Race in Graph Adversarial Learning NOTE: GraphWar is still in the early stages and the API will likely continue to change. ๐Ÿš€ Installat

Jintang Li 54 Jan 05, 2023
novel deep learning research works with PaddlePaddle

Research ๅ‘ๅธƒๅŸบไบŽ้ฃžๆกจ็š„ๅ‰ๆฒฟ็ ”็ฉถๅทฅไฝœ๏ผŒๅŒ…ๆ‹ฌCVใ€NLPใ€KGใ€STDM็ญ‰้ข†ๅŸŸ็š„้กถไผš่ฎบๆ–‡ๅ’Œๆฏ”่ต›ๅ† ๅ†›ๆจกๅž‹ใ€‚ ็›ฎๅฝ• ่ฎก็ฎ—ๆœบ่ง†่ง‰(Computer Vision) ่‡ช็„ถ่ฏญ่จ€ๅค„็†(Natrual Language Processing) ็Ÿฅ่ฏ†ๅ›พ่ฐฑ(Knowledge Graph) ๆ—ถ็ฉบๆ•ฐๆฎๆŒ–ๆŽ˜(Spa

1.5k Dec 29, 2022
Just-Now - This Is Just Now Login Friendlist Cloner Tools

JUST NOW LOGIN FRIENDLIST CLONER TOOLS Install $ apt update $ apt upgrade $ apt

MAHADI HASAN AFRIDI 21 Mar 09, 2022
Weakly Supervised End-to-End Learning (NeurIPS 2021)

WeaSEL: Weakly Supervised End-to-end Learning This is a PyTorch-Lightning-based framework, based on our End-to-End Weak Supervision paper (NeurIPS 202

Auton Lab, Carnegie Mellon University 131 Jan 06, 2023
Simple torch.nn.module implementation of Alias-Free-GAN style filter and resample

Alias-Free-Torch Simple torch module implementation of Alias-Free GAN. This repository including Alias-Free GAN style lowpass sinc filter @filter.py A

์ด์ค€ํ˜(Junhyeok Lee) 64 Dec 22, 2022
Attendance Monitoring with Face Recognition using Python

Attendance Monitoring with Face Recognition using Python A python GUI integrated attendance system using face recognition to take attendance. In this

Vaibhav Rajput 2 Jun 21, 2022
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

Jiwoon Ahn 337 Dec 15, 2022
hipCaffe: the HIP port of Caffe

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Cent

ROCm Software Platform 126 Dec 05, 2022