nlp-tutorial

nlp-tutorial is a tutorial for those studying NLP (Natural Language Processing) with PyTorch. Most of the models are implemented in fewer than 100 lines of code (excluding comments and blank lines).

[08-14-2020] The old TensorFlow v1 code is archived in the archive folder. For beginner readability, only PyTorch version 1.0 or higher is supported.

Curriculum - (Example Purpose)

1. Basic Embedding Model

1-1. NNLM (Neural Network Language Model) - Predict Next Word
- Paper - A Neural Probabilistic Language Model (2003)
- Colab - NNLM.ipynb
1-2. Word2Vec (Skip-gram) - Embed Words and Show Graph
- Paper - Distributed Representations of Words and Phrases and their Compositionality (2013)
- Colab - Word2Vec.ipynb
1-3. FastText (Application Level) - Sentence Classification
- Paper - Bag of Tricks for Efficient Text Classification (2016)
- Colab - FastText.ipynb
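
As a rough illustration of the skip-gram idea behind 1-2, the sketch below builds (center, context) training pairs from a tokenized sentence. It is plain Python rather than PyTorch, and the function name `skipgram_pairs`, the `window` parameter, and the sample sentence are assumptions made for this example, not code taken from Word2Vec.ipynb.

```python
def skipgram_pairs(tokens, window=1):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                # left edge of the context window
        hi = min(len(tokens), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                         # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "i like dog".split()  # hypothetical toy sentence
print(skipgram_pairs(sentence, window=1))
```

In a skip-gram model, each pair would typically become one training example: the center word is the input and the context word is the prediction target.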

2. CNN (Convolutional Neural Network)

2-1. TextCNN - Binary Sentiment Classification
- Paper - Convolutional Neural Networks for Sentence Classification (2014)
- Colab - TextCNN.ipynb

3. RNN (Recurrent Neural Network)

3-1. TextRNN - Predict Next Step
- Paper - Finding Structure in Time (1990)
- Colab - TextRNN.ipynb
3-2. TextLSTM - Autocomplete
- Paper - Long Short-Term Memory (1997)
- Colab - TextLSTM.ipynb
3-3. Bi-LSTM - Predict Next Word in Long Sentence
- Colab - Bi_LSTM.ipynb
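
For readers new to recurrent models, here is a minimal sketch of a simple (Elman-style) recurrent step, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), written in plain Python. The names `rnn_step`, `run_rnn`, and the toy weights are assumptions for illustration only; the notebooks above presumably use PyTorch's built-in recurrent layers instead.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = b[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(w_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, h0, w_xh, w_hh, b):
    """Feed a sequence through the cell and return the final hidden state."""
    h = h0
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h


# Hypothetical toy setup: 1-dimensional input and hidden state.
final_h = run_rnn([[1.0], [1.0]], [0.0], w_xh=[[1.0]], w_hh=[[1.0]], b=[0.0])
print(final_h)
```

The final hidden state summarizes the whole sequence, which is what a next-step predictor would pass to its output layer.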

4. Attention Mechanism

4-1. Seq2Seq - Change Word
- Paper - Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (2014)
- Colab - Seq2Seq.ipynb
4-2. Seq2Seq with Attention - Translate
- Paper - Neural Machine Translation by Jointly Learning to Align and Translate (2014)
- Colab - Seq2Seq(Attention).ipynb
4-3. Bi-LSTM with Attention - Binary Sentiment Classification
- Colab - Bi_LSTM(Attention).ipynb
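
The core of an attention mechanism can be sketched in a few lines: score each encoder state against a query, normalize the scores with softmax, and take the weighted sum. The sketch below uses dot-product scoring for brevity, which is an assumption of this example; the paper listed under 4-2 uses an additive score function, and the notebooks may differ. The names `softmax` and `attend` are also made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return (attention weights, context vector) using dot-product scores."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Hypothetical toy data: two encoder states, query aligned with the first one.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(weights, context)
```

The weights sum to 1, so the context vector is a convex combination of the value vectors, weighted toward the states most similar to the query.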

5. Model based on Transformer

5-1. The Transformer - Translate
- Paper - Attention Is All You Need (2017)
- Colab - Transformer.ipynb, Transformer(Greedy_decoder).ipynb
5-2. BERT - Classify Next Sentence & Predict Masked Tokens
- Paper - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
- Colab - BERT.ipynb
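
Because a Transformer has no recurrence, it needs explicit position information. The sketch below computes the sinusoidal positional encoding described in Attention Is All You Need, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), in plain Python. The function name `positional_encoding` and the toy sizes are assumptions for this example, and the notebooks above may build the table differently.

```python
import math


def positional_encoding(n_positions, d_model):
    """Return an n_positions x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(n_positions):
        row = []
        for j in range(d_model):
            # Dimensions 2i and 2i+1 share the same frequency.
            angle = pos / (10000 ** (2 * (j // 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = positional_encoding(3, 4)  # hypothetical toy sizes
for row in pe:
    print([round(v, 4) for v in row])
```

Each row would be added to the token embedding at that position before the first encoder or decoder layer.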

Dependencies

Python 3.5+
PyTorch 1.0.0+

Author

Tae Hwan Jung (Jeff Jung) @graykode
Author Email: [email protected]
Acknowledgements to mojitok for the NLP Research Internship.
