Train GPT-3 model on V100(16GB Mem) Using improved Transformer.

Last update: Sep 11, 2022

Related tags

Text Data & NLP gpt

Overview

Pytorch GPT-X

My Own Pytorch GPT-X

1. Abstract

Train GPT-3 model on V100(16GB Mem) Using improved Transformer.

2. Model

Transformer

Additional Module

① Rezero

Rezero Is All You Need link

② Explicit Sparse Transformer

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection link

③ Macaron Architecture

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View link

④ RealFormer, Residual Attention

RealFormer link

Train

DeepSpeed

TODO

~~ReZero~~
RealFormer, Residual Attention
~~Macaron architectures~~
~~Macaron architectures - layer Scale 0.5~~
~~Explicit Sparse Transformer~~
torch lightning
Deepspeed train on single GPU
Deepspeed parallel trainig on 2 V100 GPU with 16GB Memory

Parameter For Few-shot

The 175B parameter model is very large, but a large model is needed for Few-Shot Learning. So this repository try to use DeepSpeed for training extremely big model.

GPT-3 Config

model_name	n_params	n_layer	d_model	n_heads	d_head	batch_size	learning_rate
GPT-3 175B	175B	96	12288	96	128	3.2M	0.6 x 10^-4
GPT-3 13B	13B	40	5140	40	128	2M	1.0 x 10^-4
GPT-3 6.7B	6.7B	32	4096	32	128	2M	1.2 x 10^-4
GPT-3 2.7B	2.7B	32	25560	32	80	1M	1.6 x 10^-4

References

Transformer

lucidrains/x-transformers

DeepSpeed

ReZero

/majumderb/rezero

Explicit Sparse Transformer

x-transformer: explicit_sparse_transformer

Macaron Architecrue

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

Train GPT-3 model on V100(16GB Mem) Using improved Transformer.

Related tags

Overview

Pytorch GPT-X

1. Abstract

2. Model

Transformer

Additional Module

① Rezero

② Explicit Sparse Transformer

③ Macaron Architecture

④ RealFormer, Residual Attention

Train

DeepSpeed

TODO

Parameter For Few-shot

GPT-3 Config

References

Owner

Seonghwan Kim

a chinese segment base on crf

A complete NLP guideline for enthusiasts

Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

100+ Chinese Word Vectors 上百种预训练中文词向量

A python package for deep multilingual punctuation prediction.

This project deals with a simplified version of a more general problem of Aspect Based Sentiment Analysis.

Proquabet - Convert your prose into proquints and then you essentially have Vogon poetry

An Explainable Leaderboard for NLP

Converts text into a PDF of handwritten notes

NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

This project consists of data analysis and data visualization (done using python)of all IPL seasons from 2008 to 2019 and answering the most asked questions about the IPL.

Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container.

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

An implementation of the Pay Attention when Required transformer

CCQA A New Web-Scale Question Answering Dataset for Model Pre-Training

Speech Recognition for Uyghur using Speech transformer

A BERT-based reverse dictionary of Korean proverbs

NSFW A chatbot based on GPT2-chitchat

Train GPT-3 model on V100(16GB Mem) Using improved Transformer.

Related tags

Overview

Pytorch GPT-X

1. Abstract

2. Model

Transformer

Additional Module

① Rezero

② Explicit Sparse Transformer

③ Macaron Architecture

④ RealFormer, Residual Attention

Train

DeepSpeed

TODO

Parameter For Few-shot

GPT-3 Config

References

Owner

Seonghwan Kim

a chinese segment base on crf

A complete NLP guideline for enthusiasts

Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

100+ Chinese Word Vectors 上百种预训练中文词向量

A python package for deep multilingual punctuation prediction.

This project deals with a simplified version of a more general problem of Aspect Based Sentiment Analysis.

Proquabet - Convert your prose into proquints and then you essentially have Vogon poetry

An Explainable Leaderboard for NLP

Converts text into a PDF of handwritten notes

NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

This project consists of data analysis and data visualization (done using python)of all IPL seasons from 2008 to 2019 and answering the most asked questions about the IPL.

Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container.

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

An implementation of the Pay Attention when Required transformer

CCQA A New Web-Scale Question Answering Dataset for Model Pre-Training

Speech Recognition for Uyghur using Speech transformer

A BERT-based reverse dictionary of Korean proverbs

**NSFW** A chatbot based on GPT2-chitchat

NSFW A chatbot based on GPT2-chitchat