GPT, but made only out of gMLPs

Last update: Dec 01, 2022

Overview

GPT - gMLP

This repository attempts to crack long-context autoregressive language modeling (GPT) using variations of gMLPs. Specifically, it contains a variant that applies gMLP within local sliding windows. The hope is to stretch a single GPU far enough to train context lengths of 4096 and above efficiently and well.

GPT is technically a misnomer here, since the architecture contains no attention (and hence no transformer) at all.
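
To make the "gMLP within local windows" idea concrete, here is a minimal sketch of a causal spatial gating step restricted to windows. It assumes non-overlapping windows, a single lower-triangular mixing matrix shared by all windows, and a bias initialised to ones; the function name `windowed_causal_gate` and these choices are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F

def windowed_causal_gate(x, weight, bias, window):
    """Hypothetical sketch of a gMLP-style spatial gate applied per local window.

    x:      (batch, seq_len, dim), dim must be even
    weight: (window, window) mixing matrix over positions inside a window
    bias:   (window,) per-position bias
    """
    batch, seq_len, dim = x.shape
    assert seq_len % window == 0, "sketch assumes seq_len is a multiple of window"

    # Split channels: one half passes through, the other half becomes the gate.
    res, gate = x.chunk(2, dim=-1)
    gate = F.layer_norm(gate, gate.shape[-1:])

    # Group positions into non-overlapping windows (an assumption of this sketch).
    gate = gate.reshape(batch, seq_len // window, window, dim // 2)

    # Lower-triangular weights: position i only mixes positions j <= i in its window.
    causal_weight = torch.tril(weight)
    gate = torch.einsum("ij,bwjd->bwid", causal_weight, gate)
    gate = gate + bias.view(1, 1, window, 1)

    return res * gate.reshape(batch, seq_len, dim // 2)

torch.manual_seed(0)
x = torch.randn(2, 8, 6)
weight = torch.randn(4, 4) * 0.01   # near-zero init, so the gate starts close to the bias
bias = torch.ones(4)
out = windowed_causal_gate(x, weight, bias, window=4)
print(out.shape)  # torch.Size([2, 8, 3])
```

Because the mixing matrix is only `window x window`, the cost of the position-mixing step grows with the window size rather than with the full sequence length, which is the motivation for using small windows at long context lengths.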

Install

$ pip install g-mlp-gpt

Usage

import torch
from g_mlp_gpt import gMLPGPT

model = gMLPGPT(
    num_tokens = 20000,
    dim = 512,
    depth = 4,
    seq_len = 1024,
    window = (128, 256, 512, 1024) # window sizes for each depth
)

x = torch.randint(0, 20000, (1, 1000))
logits = model(x) # (1, 1000, 20000)
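
For autoregressive training, logits of this shape are typically scored against the input shifted by one position. The sketch below shows that standard next-token cross-entropy; the helper name `next_token_loss` is hypothetical, and a small all-zero tensor stands in for `model(x)` so the snippet runs without the package. The repository's own training code may differ.

```python
import math
import torch
import torch.nn.functional as F

def next_token_loss(logits, tokens):
    """Cross-entropy between the prediction at position t and the token at t + 1.

    logits: (batch, seq_len, vocab), tokens: (batch, seq_len)
    """
    pred = logits[:, :-1]     # predictions for positions 0 .. n-2
    target = tokens[:, 1:]    # the tokens that actually follow them
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))

vocab = 50                                # small stand-in vocabulary
tokens = torch.randint(0, vocab, (1, 16))
logits = torch.zeros(1, 16, vocab)        # stand-in for model(tokens): uniform predictions
loss = next_token_loss(logits, tokens)

# Uniform predictions give a loss of log(vocab), a useful sanity check at initialisation.
print(abs(loss.item() - math.log(vocab)) < 1e-4)
```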

16k context length

import torch
from g_mlp_gpt import gMLPGPT

model = gMLPGPT(
    num_tokens = 20000,
    dim = 512,
    seq_len = 16384,
    depth = 12,
    reversible = True,
    # one entry per layer; a (window, axial) pair sets both the window size and the axial factor
    window = (128, 128, 256, 512, 1024, 1024, (2048, 2), (2048, 2), (4096, 4), (4096, 4), (8192, 8), (8192, 8))
).cuda()

x = torch.randint(0, 20000, (1, 16384)).cuda()
logits = model(x) # (1, 16384, 20000)
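
The comment "window sizes for each depth" suggests one window entry per layer, so a long schedule is easy to get out of step with `depth`. The helper below is a hypothetical pre-flight check, not part of the package. It assumes plain integer window sizes, and it treats "window no larger than `seq_len`" and "window divides `seq_len`" as requirements, which are assumptions of this sketch rather than documented constraints.

```python
def check_window_schedule(window, depth, seq_len):
    """Return a list of human-readable problems; an empty list means the schedule looks consistent."""
    problems = []
    if len(window) != depth:
        problems.append(f"expected {depth} window sizes, got {len(window)}")
    for layer, size in enumerate(window):
        if size > seq_len:
            problems.append(f"layer {layer}: window {size} exceeds seq_len {seq_len}")
        elif seq_len % size != 0:
            # Assumption: windows should tile the sequence evenly.
            problems.append(f"layer {layer}: window {size} does not divide seq_len {seq_len}")
    return problems

ok = check_window_schedule((128, 256, 512, 1024), depth=4, seq_len=1024)
bad = check_window_schedule((128, 256, 512, 1024), depth=8, seq_len=1024)
print(ok)   # []
print(bad)  # ['expected 8 window sizes, got 4']
```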

Citations

@misc{liu2021pay,
    title   = {Pay Attention to MLPs}, 
    author  = {Hanxiao Liu and Zihang Dai and David R. So and Quoc V. Le},
    year    = {2021},
    eprint  = {2105.08050},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

Releases (0.0.15)

0.0.15 (May 25, 2021)
0.0.14 (May 25, 2021)
0.0.12 (May 24, 2021)
0.0.11 (May 23, 2021)
0.0.10 (May 23, 2021)
0.0.9 (May 21, 2021)
0.0.8 (May 21, 2021)
0.0.7 (May 21, 2021)
0.0.6 (May 21, 2021)
0.0.5 (May 20, 2021)
0.0.4 (May 20, 2021)
0.0.3 (May 20, 2021)
0.0.2 (May 20, 2021)
0.0.1 (May 20, 2021)

Owner

Phil Wang
