Train a GPT-3 model on a V100 (16 GB memory) using an improved Transformer.

Pytorch GPT-X

My Own Pytorch GPT-X

1. Abstract

Train a GPT-3 model on a V100 (16 GB memory) using an improved Transformer.

2. Model

Transformer

Additional Module

① ReZero

ReZero is All You Need: Fast Convergence at Large Depth link
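The idea behind ReZero can be sketched in a few lines: each residual branch is multiplied by a learnable scalar that starts at zero, so every block begins as an identity map. The snippet below is a minimal sketch, not this repository's implementation; the class name `ReZeroBlock` and the toy feed-forward sublayer are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ReZeroBlock(nn.Module):
    """Residual block computing x + alpha * sublayer(x), with alpha initialised to 0.

    Hypothetical helper for illustration; not taken from the repository.
    """

    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        # One learnable scalar per block; starting at zero makes the block an identity map.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)


torch.manual_seed(0)
# Toy feed-forward sublayer; an attention sublayer would be wrapped the same way.
block = ReZeroBlock(nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16)))
x = torch.randn(2, 5, 16)  # (batch, seq_len, d_model)
y = block(x)               # equals x at initialisation because alpha == 0
```

Because the scalar replaces the usual normalisation of the residual branch, no LayerNorm appears in this sketch.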

② Explicit Sparse Transformer

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection link
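Explicit selection keeps only the largest attention scores for each query and masks the rest before the softmax. The function below is a minimal sketch under stated assumptions: the name `explicit_sparse_attention` and the `top_k` argument are hypothetical, and the causal mask a GPT decoder would also need is left out for brevity.

```python
import math
import torch


def explicit_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention that keeps only the top_k scores per query.

    Hypothetical helper for illustration; causal masking is intentionally omitted.
    """
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    top_k = min(top_k, scores.size(-1))
    # The k-th largest score in each row acts as the selection threshold.
    threshold = scores.topk(top_k, dim=-1).values[..., -1:]
    # Everything below the threshold is masked out before the softmax.
    scores = scores.masked_fill(scores < threshold, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


torch.manual_seed(0)
q = torch.randn(1, 2, 6, 8)  # (batch, heads, seq_len, d_head)
k = torch.randn(1, 2, 6, 8)
v = torch.randn(1, 2, 6, 8)
out, weights = explicit_sparse_attention(q, k, v, top_k=3)
```

Each row of `weights` still sums to one, but only the selected keys receive non-zero probability.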

③ Macaron Architecture

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View link
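A Macaron layer sandwiches self-attention between two feed-forward sublayers, each applied with a residual scale of 0.5. The sketch below shows one way to write that, assuming pre-LayerNorm placement and a causal mask for GPT-style decoding; the class name `MacaronLayer` and its arguments are hypothetical and may differ from the repository's code.

```python
import torch
import torch.nn as nn


class MacaronLayer(nn.Module):
    """FFN (x0.5) -> causal self-attention -> FFN (x0.5), each with a residual connection.

    Hypothetical layer for illustration; pre-LayerNorm placement is an assumption.
    """

    def __init__(self, d_model, n_heads, d_ff, ffn_scale=0.5):
        super().__init__()
        self.ffn_scale = ffn_scale
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.ffn1 = self._ffn(d_model, d_ff)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn2 = self._ffn(d_model, d_ff)

    @staticmethod
    def _ffn(d_model, d_ff):
        return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        seq_len = x.size(1)
        # Boolean causal mask: True marks positions a query may NOT attend to.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        x = x + self.ffn_scale * self.ffn1(self.norm1(x))  # first half-step FFN
        h = self.norm2(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                                   # full-step attention
        x = x + self.ffn_scale * self.ffn2(self.norm3(x))  # second half-step FFN
        return x


torch.manual_seed(0)
layer = MacaronLayer(d_model=32, n_heads=4, d_ff=64)
x = torch.randn(2, 7, 32)  # (batch, seq_len, d_model)
y = layer(x)
```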

④ RealFormer, Residual Attention

RealFormer: Transformer Likes Residual Attention link
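Residual attention adds the previous layer's pre-softmax attention scores to the current layer's scores and hands the sum on to the next layer. The following is a minimal sketch of that data flow; the function name `residual_attention` and the `prev_scores` argument are hypothetical, and masking and multi-layer plumbing are left out.

```python
import math
import torch


def residual_attention(q, k, v, prev_scores=None):
    """Attention whose pre-softmax scores accumulate across layers.

    Hypothetical helper for illustration; returns the output and the scores to pass on.
    """
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if prev_scores is not None:
        # Skip connection on the attention scores from the previous layer.
        scores = scores + prev_scores
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, scores


torch.manual_seed(0)
shape = (1, 2, 5, 8)  # (batch, heads, seq_len, d_head)
q1, k1, v1 = (torch.randn(shape) for _ in range(3))
q2, k2, v2 = (torch.randn(shape) for _ in range(3))

out1, scores1 = residual_attention(q1, k1, v1)                       # first layer: no residual yet
out2, scores2 = residual_attention(q2, k2, v2, prev_scores=scores1)  # second layer reuses scores1
```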

Train

DeepSpeed
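A DeepSpeed run is driven by a JSON configuration. The snippet below builds a small example of one as a sketch only: the keys are standard DeepSpeed options, but every value here (batch size, accumulation steps, ZeRO stage, CPU offload) is an illustrative assumption and not this repository's actual settings.

```python
import json

# Illustrative values only; tune them for the model size and the 16 GB V100.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # samples per GPU per forward pass
    "gradient_accumulation_steps": 16,     # emulate a larger batch on limited memory
    "fp16": {"enabled": True},             # mixed-precision training
    "zero_optimization": {
        "stage": 2,                                # partition optimizer states and gradients
        "offload_optimizer": {"device": "cpu"},    # keep optimizer states in host memory
    },
}

ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```

A configuration like this can be saved to a file, or the dictionary can be passed through the `config` argument of `deepspeed.initialize`.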

TODO

  • ReZero
  • RealFormer, Residual Attention
  • Macaron architectures
  • Macaron architectures - layer Scale 0.5
  • Explicit Sparse Transformer
  • PyTorch Lightning
  • DeepSpeed training on a single GPU
  • DeepSpeed parallel training on 2 V100 GPUs with 16 GB memory each

Parameter For Few-shot

The 175B-parameter model is very large, but few-shot learning needs a model of that scale. This repository therefore tries to use DeepSpeed to train extremely large models.

GPT-3 Config

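| model_name | n_params | n_layer | d_model | n_heads | d_head | batch_size (tokens) | learning_rate |
|---|---|---|---|---|---|---|---|
| GPT-3 175B | 175B | 96 | 12288 | 96 | 128 | 3.2M | 0.6 x 10^-4 |
| GPT-3 13B | 13B | 40 | 5140 | 40 | 128 | 2M | 1.0 x 10^-4 |
| GPT-3 6.7B | 6.7B | 32 | 4096 | 32 | 128 | 2M | 1.2 x 10^-4 |
| GPT-3 2.7B | 2.7B | 32 | 2560 | 32 | 80 | 1M | 1.6 x 10^-4 |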
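The configurations above can be sanity-checked with a rough estimate, which also shows why a single 16 GB V100 needs help from DeepSpeed. The sketch below uses two common approximations that are assumptions here: about 12 x n_layer x d_model^2 non-embedding parameters per model, and about 16 bytes of training state per parameter for mixed-precision Adam (fp16 weights and gradients plus fp32 master weights, momentum, and variance). It takes d_model = 2560 for the 2.7B model, which matches 32 heads x 80 dimensions per head; the function names are hypothetical.

```python
# (n_layer, d_model) per model
GPT3_CONFIGS = {
    "GPT-3 2.7B": (32, 2560),
    "GPT-3 6.7B": (32, 4096),
    "GPT-3 175B": (96, 12288),
}


def approx_params(n_layer, d_model):
    """Rough non-embedding parameter count.

    Per layer: 4*d^2 for the attention projections plus 8*d^2 for a 4x-wide feed-forward block.
    """
    return 12 * n_layer * d_model ** 2


def training_state_gib(n_params, bytes_per_param=16):
    """Approximate memory for weights, gradients, and Adam states (activations excluded)."""
    return n_params * bytes_per_param / 2 ** 30


for name, (n_layer, d_model) in GPT3_CONFIGS.items():
    n = approx_params(n_layer, d_model)
    print(f"{name}: ~{n / 1e9:.1f}B params, ~{training_state_gib(n):.1f} GiB of training state")
```

Under these assumptions even the 2.7B configuration needs roughly 37.5 GiB of training state, more than twice the memory of one V100, which is the gap that optimizer-state partitioning and offloading are meant to close.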

References

Transformer

DeepSpeed

ReZero

Explicit Sparse Transformer

Macaron Architecture

Owner
Seonghwan Kim