Deep learning for NLP crash course at ABBYY.

Last update: Dec 18, 2022

Overview

Deep NLP Course at ABBYY

I'm gradually updating and translating the notebooks right now. Stay tuned.

Materials

Week 1: Introduction

Sentiment analysis on the IMDB movie review dataset: a short overview of classical machine learning for NLP + an indecently brief intro to Keras.

Russian version:

Updated English version:

Week 2: Word Embeddings: Part 1

Meet word embeddings: an unsupervised method that captures some fun relationships between words.
Phrase similarity with a word embeddings model + word-based machine translation without parallel data (with MUSE word embeddings).
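
A minimal sketch of phrase similarity with word embeddings, assuming a phrase is represented by the average of its word vectors and compared by cosine similarity. The tiny 3-dimensional vectors in `EMBEDDINGS` are hypothetical toy values, not real embeddings; a real notebook would load pretrained vectors instead.

```python
import math

# Hypothetical toy embeddings; real ones would come from a pretrained model.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "movie": [0.1, 0.9, 0.2],
    "film": [0.2, 0.8, 0.3],
    "bad": [-0.9, 0.1, 0.0],
}


def phrase_vector(phrase):
    """Average the vectors of the in-vocabulary words of a phrase."""
    vectors = [EMBEDDINGS[w] for w in phrase.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError(f"no known words in {phrase!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def phrase_similarity(p1, p2):
    return cosine(phrase_vector(p1), phrase_vector(p2))


print(phrase_similarity("good movie", "great film"))  # close to 1
print(phrase_similarity("good movie", "bad film"))    # much lower
```

Averaging is the simplest composition and ignores word order, so "good not bad" and "bad not good" get the same vector; that limitation is part of what motivates the sequence models in later weeks.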

Russian version:

Updated English version:

Week 3: Word Embeddings: Part 2

Introduction to PyTorch. Implementation of a toy linear regression in pure NumPy and in PyTorch. Implementations of CBoW, skip-gram, negative sampling and structured word2vec models.
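
A minimal sketch of the "pure NumPy" half of that exercise, assuming linear regression is fitted by full-batch gradient descent on the mean squared error. The function name `fit_linear_regression`, the learning rate and the synthetic data are illustrative choices, not taken from the course notebook.

```python
import numpy as np


def fit_linear_regression(X, y, lr=0.1, n_steps=1000):
    """Fit y ~ X @ w + b by full-batch gradient descent on the mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        error = X @ w + b - y                   # residuals, shape (n_samples,)
        grad_w = 2.0 / n_samples * X.T @ error  # dMSE/dw, derived by hand
        grad_b = 2.0 / n_samples * error.sum()  # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data with known parameters, so the fit can be checked.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([3.0, -2.0]), 0.5
y = X @ true_w + true_b + 0.01 * rng.normal(size=200)

w, b = fit_linear_regression(X, y)
print(w, b)  # should be close to [3, -2] and 0.5
```

In the PyTorch version, the hand-derived `grad_w` and `grad_b` would instead be computed by autograd after calling `loss.backward()`.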

Russian version:

Updated English version:

Week 4: Convolutional Neural Networks

Introduction to convolutional networks. Relations between convolutions and n-grams. A simple surname detector based on character-level convolutions + fun visualizations.
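
One way to see the relation between convolutions and n-grams: a width-3 convolution over one-hot encoded characters, with a filter equal to the one-hot encoding of a particular trigram, scores 3 exactly where that trigram occurs, and max-pooling over time turns this into "the trigram is present". This is a minimal sketch with a hand-set filter for the hypothetical trigram "ova"; a trained network learns soft versions of such filters instead.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CHAR_TO_ID = {c: i for i, c in enumerate(ALPHABET)}


def one_hot(text):
    """Encode a lowercase ASCII string as a (len(text), len(ALPHABET)) one-hot matrix."""
    matrix = np.zeros((len(text), len(ALPHABET)))
    for pos, ch in enumerate(text):
        matrix[pos, CHAR_TO_ID[ch]] = 1.0
    return matrix


def conv1d(inputs, kernel):
    """Valid 1D convolution (cross-correlation, as in deep learning libraries).

    inputs: (seq_len, emb_dim), kernel: (width, emb_dim) -> (seq_len - width + 1,)
    """
    width = kernel.shape[0]
    return np.array([
        np.sum(inputs[i:i + width] * kernel)
        for i in range(inputs.shape[0] - width + 1)
    ])


# Hand-set filter that fires on the character trigram "ova".
ova_filter = one_hot("ova")

for surname in ["ivanova", "smith"]:
    scores = conv1d(one_hot(surname), ova_filter)
    # Max-pooling over time: the maximum reaches 3 only if the full trigram occurs.
    print(surname, scores.tolist(), scores.max())
```

Partial matches still get partial credit (the window "iva" scores 2), which is how a convolution generalizes beyond exact n-gram lookup.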

Russian version:

Updated English version:

Week 5: RNNs: Part 1

RNNs for text classification. A simple RNN implementation + memorization test. Surname detector in a multilingual setup: a character-level LSTM classifier.
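
A minimal sketch of what a "simple RNN" computes, assuming the plain Elman recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The class name `SimpleRNN`, the uniform initialization and the sizes are illustrative assumptions; only the forward pass is shown, without training.

```python
import numpy as np


class SimpleRNN:
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        self.W_xh = rng.uniform(-scale, scale, size=(hidden_size, input_size))
        self.W_hh = rng.uniform(-scale, scale, size=(hidden_size, hidden_size))
        self.b_h = np.zeros(hidden_size)
        self.hidden_size = hidden_size

    def forward(self, inputs):
        """inputs: (seq_len, input_size) -> all hidden states, (seq_len, hidden_size)."""
        h = np.zeros(self.hidden_size)
        states = []
        for x_t in inputs:
            # The same weights are reused at every time step.
            h = np.tanh(self.W_xh @ x_t + self.W_hh @ h + self.b_h)
            states.append(h)
        return np.stack(states)


rnn = SimpleRNN(input_size=4, hidden_size=8)
sequence = np.random.default_rng(1).normal(size=(5, 4))
states = rnn.forward(sequence)
print(states.shape)  # (5, 8)
```

For classification, the last state `states[-1]` would typically be fed to a linear layer followed by a softmax. How much of the early input survives in that last state is exactly what a memorization test probes.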

Russian version:

Updated English version:

Week 6: RNNs: Part 2

RNNs for sequence labelling. Part-of-speech tagger implementations based on word embeddings and character-level word embeddings.

Russian version:

Week 7: Language Models: Part 1

Character-level language model for generating Russian troll tweets: a fixed-window model via convolutions and an RNN model.
Simple conditional language model: surname generation given the source language.
Plus the Toxic Comment Classification Challenge, to apply your skills to a real-world problem.

Russian version:

Week 8: Language Models: Part 2

Word-level language model for poetry generation. Toy examples of transfer learning and multi-task learning applied to language models.

Russian version:

Week 9: Seq2seq

Seq2seq for machine translation and image captioning. Byte-pair encoding, beam search and other useful stuff for machine translation.
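
A minimal sketch of learning byte-pair encoding merges, following the merge-learning procedure from Sennrich et al. (2016): repeatedly count adjacent symbol pairs weighted by word frequency and merge the most frequent one. The toy word counts follow the well-known example from that paper; the function names are illustrative.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[left, right] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of `pair` with the merged symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Return the list of learned merges and the re-segmented vocabulary."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Words are pre-split into characters; "</w>" marks the end of a word.
corpus = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, segmented = learn_bpe(corpus, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(segmented)
```

At translation time, the learned merges are applied in the same order to split unseen words into known subword units, which is what lets a fixed vocabulary cover rare words.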

Russian version:

Week 10: Seq2seq with Attention

Seq2seq with attention for machine translation and image captioning.
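
A minimal sketch of one attention step in a seq2seq decoder, assuming Luong-style dot-product scoring (Bahdanau-style additive scoring is a common alternative, and the course notebooks may use either). The function name `dot_attention` and the toy vectors are illustrative.

```python
import numpy as np


def softmax(x):
    x = x - x.max()  # subtract the max for numerical stability
    exp = np.exp(x)
    return exp / exp.sum()


def dot_attention(decoder_state, encoder_states):
    """Dot-product attention for a single decoder step.

    decoder_state: (hidden,), encoder_states: (src_len, hidden)
    Returns the context vector (hidden,) and the attention weights (src_len,).
    """
    scores = encoder_states @ decoder_state  # one alignment score per source position
    weights = softmax(scores)                # a distribution over source positions
    context = weights @ encoder_states       # weighted sum of encoder states
    return context, weights


encoder_states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
decoder_state = np.array([5.0, 0.0])
context, weights = dot_attention(decoder_state, encoder_states)
print(weights.round(3))
print(context.round(3))
```

The context vector is then combined with the decoder state to predict the next token, so each output step can look at a different part of the source sentence (or image regions, for captioning).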

Russian version:

Week 11: Transformers & Text Summarization

Implementation of the Transformer model for text summarization. Discussion of Pointer-Generator Networks for text summarization.
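
The core building block of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, with a causal mask in decoder self-attention so that a position cannot look at future tokens. This is a minimal single-head sketch; the learned query/key/value projections, multiple heads and the rest of the layer are omitted, and the function name is illustrative.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V, causal=False):
    """softmax(Q K^T / sqrt(d_k)) V.

    Q: (tgt_len, d_k), K: (src_len, d_k), V: (src_len, d_v) -> (tgt_len, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        # Forbid position i from attending to positions j > i.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; masked entries become exactly 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 positions, model dimension 8
output, weights = scaled_dot_product_attention(x, x, x, causal=True)
print(output.shape)          # (4, 8)
print(np.round(weights, 2))  # upper triangle is all zeros
```

The division by sqrt(d_k) keeps the dot products from growing with the dimension, which would otherwise push the softmax into saturated, low-gradient regions.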

Russian version:

Week 12: Dialogue Systems: Part 1

Goal-oriented dialogue systems. Implementation of a multi-task model: an intent classifier and a token tagger for the dialogue manager.

Russian version:

Week 13: Dialogue Systems: Part 2

General conversation dialogue systems and DSSMs. Implementation of a question answering model on the SQuAD dataset and a chit-chat model on the OpenSubtitles dataset.

Russian version:

Week 14: Pretrained Models

Pretrained models for various tasks: Universal Sentence Encoder for sentence similarity, ELMo for sequence tagging (with a bit of CRF), BERT for SWAG (reasoning about plausible continuations).

Russian version:

Final Presentation

NLP Summary: a summary of cool stuff that did and didn't appear in the course.

Owner

Dan Anastasyev
