In this workshop we explore state-of-the-art NLP transformers such as T5 and BERT, then build a model using the Hugging Face Transformers framework.

Last update: Apr 13, 2022

Overview

Transformers are all you need

Table of Contents

The workshop is divided into four parts:

- Introduction to Transformers and the hype around them
- A sneak peek at the theory behind Transformers
- Quick tour of the Hugging Face framework
- Lab
  - Fine-tune a translation model

You can always open the notebooks on Google Colab (no need to install anything); you just need a stable internet connection:

- Fine-tune a translation model
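
As a rough idea of what the translation lab starts from, here is a minimal sketch of running a pretrained T5 checkpoint before any fine-tuning. It is not the notebook's code: the helper names `add_task_prefix` and `translate`, the `t5-small` checkpoint and the English-to-French direction are all assumptions chosen for illustration. T5 checkpoints expect a task prefix in front of each input, which is what the first helper adds.

```python
def add_task_prefix(sentences, prefix="translate English to French: "):
    """Prepend the T5 task prefix to every (stripped) source sentence."""
    return [prefix + sentence.strip() for sentence in sentences]


def translate(sentences, checkpoint="t5-small"):
    """Translate sentences with a pretrained seq2seq checkpoint (assumed: t5-small).

    The heavy imports live inside the function so that the prefix helper
    above can be used without transformers or PyTorch installed.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # Tokenize the prefixed inputs as one padded batch of PyTorch tensors.
    batch = tokenizer(add_task_prefix(sentences), return_tensors="pt", padding=True)

    # Generate target token ids, then decode them back to text.
    output_ids = model.generate(**batch, max_new_tokens=40)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate(["The workshop starts at noon."])` downloads the checkpoint on first use, so it needs an internet connection, just like the Colab notebooks.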

How to get started

Fork this repository
Create a branch named after yourself
Go through the notebook and complete all tasks
Submit a pull request

Homework exercise

Your task is to fine-tune a classification model:

- Use the Hugging Face transformers and datasets libraries.
- Fine-tune the model on one of the classification tasks of the GLUE benchmark (CoLA, to be specific).
- Use a checkpoint from the Hub ("distilbert-base-uncased", for example).
- Once finished, submit a pull request to this repo, and make sure to place your .ipynb file in the submissions folder (YOUR_NAME.ipynb).
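
One possible starting point for the homework is sketched below. It is not a reference solution: the function names (`matthews_corrcoef`, `compute_metrics`, `fine_tune_cola`), the output directory and the hyperparameters are assumptions to be tuned. The metric is the Matthews correlation coefficient, which is the usual score for CoLA, written here in plain Python so it can be checked without any dependency.

```python
import math


def matthews_corrcoef(labels, preds):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def compute_metrics(eval_pred):
    """Turn (logits, labels) into a metrics dict, as the Trainer expects."""
    logits, labels = eval_pred
    # Argmax over the class dimension, one row of logits per example.
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return {"matthews_correlation": matthews_corrcoef(list(labels), preds)}


def fine_tune_cola(checkpoint="distilbert-base-uncased", output_dir="cola-distilbert"):
    """Fine-tune a Hub checkpoint on GLUE CoLA (hyperparameters are assumptions)."""
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    raw = load_dataset("glue", "cola")
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    def tokenize(batch):
        # CoLA examples are single sentences stored in the "sentence" column.
        return tokenizer(batch["sentence"], truncation=True)

    encoded = raw.map(tokenize, batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=encoded["train"],
        eval_dataset=encoded["validation"],
        data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # pads each batch dynamically
        compute_metrics=compute_metrics,
    )
    trainer.train()
    return trainer.evaluate()
```

In a notebook, calling `fine_tune_cola()` trains the model and returns the validation metrics; on Colab, a GPU runtime keeps the run short.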

Useful resources: text_classification

Owner

Aymen Berriche
