Versatile Generative Language Model

Last update: Dec 02, 2022

Overview

Versatile Generative Language Model

This is the implementation of the paper:

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning. Zhaojiang Lin, Andrea Madotto, Pascale Fung Findings of EMNLP 2020 [PDF]

If you use any source codes or datasets included in this toolkit in your work, please cite the following paper. The bibtex is listed below:

@article{lin2020exploring,
  title={Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning},
  author={Lin, Zhaojiang and Madotto, Andrea and Fung, Pascale},
  journal={arXiv preprint arXiv:2004.03829},
  year={2020}
}

Abstract

Fine-tuning pre-trained generative language models to down-stream language generation tasks have shown promising results. However, it comes with the cost of having a single, large, model for each task, which is not ideal in low-memory/power scenarios (e.g., mobile). In this work, we propose an effective way for fine-tuning multiple down-stream generation tasks simultaneously using a single, large pre-trained model. The experiments in five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.

Versatile Generative Language Model (VLM):

Versatile Language Model (VLM) is composed of three components: a pre-trained language model back-bone (e.g., GPT-2), and two kinds of specialized parameters for each generation task such as low-rank residual adapters and task embeddings.

Dependency

Check the packages needed or simply run the command

❱❱❱ pip install -r requirements.txt

Experiments

Dataset

Download the preprocessed datasets

Reproducibility

We provide the trained checkpoint of our VLM.

Test model: choose one task from (mt, summarization, dialogue, qa, nlg].

❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path

Fine tune GPT-2

Train machine translation:

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json

Test machine translation:

❱❱❱ python ./evaluate.py --task mt --no_sample --max_history=2 --model_checkpoint runs/$model_checkpoint

Check run.sh to run other tasks

VLM train Adapters and Task embeddings

Train machine translation without knowledge distillation

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005

Train machine translation using sentence level knowledge distillation:

❱❱❱ python ./sentence_distiller.py --task mt --max_history=2 --model_checkpoint runs/$fully_finetuned_gpt2_checkpoint --no_sample

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005 --distillation

Test machine traslation:

❱❱❱ python ./evaluate.py --task mt --no_sample --adapter_bottleneck 300 --model_checkpoint runs/$model_checkpoint

Check run.sh to run other tasks

Combine all the adapters and task embedding into single model

Line 68 of combine_all.py to provide the list of checkpoint

❱❱❱ python combine_all.py

Test to see if the result is same

❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path

The above scripts illustrate how to train VLM continuously when tasks arrive sequentially.

Multitask training VLM

When all the tasks available at the same time.

❱❱❱ python ./train_vlm.py --gradient_accumulation_steps=16 --train_batch_size=1 --valid_batch_size=1 --n_epochs 3

Acknowledgement

This repository is implemented base on Huggingface

Versatile Generative Language Model

Related tags

Overview

Versatile Generative Language Model

Abstract

Versatile Generative Language Model (VLM):

Dependency

Experiments

Acknowledgement

Owner

Zhaojiang Lin

Streamlit component for TensorBoard, TensorFlow's visualization toolkit

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Paddle-Adversarial-Toolbox (PAT) is a Python library for Deep Learning Security based on PaddlePaddle.

Discovering Dynamic Salient Regions with Spatio-Temporal Graph Neural Networks

Wav2Vec for speech recognition, classification, and audio classification

A Broad Study on the Transferability of Visual Representations with Contrastive Learning

Subdivision-based Mesh Convolutional Networks

Baseline inference Algorithm for the STOIC2021 challenge.

A simple software for capturing human body movements using the Kinect camera.

This repo is about implementing different approaches of pose estimation and also is a sub-task of the smart hospital bed project :smile:

Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks

Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)

Code repository for our paper regarding the L3D dataset.

Video Autoencoder: self-supervised disentanglement of 3D structure and motion

This is an implementation of PIFuhd based on Pytorch

A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

Official implementation of Unfolded Deep Kernel Estimation for Blind Image Super-resolution.

Unofficial PyTorch code for BasicVSR

A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code.

Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system