Versatile Generative Language Model

Overview

Versatile Generative Language Model

License: MIT

This is the implementation of the paper:

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning. Zhaojiang Lin, Andrea Madotto, Pascale Fung Findings of EMNLP 2020 [PDF]

If you use any source codes or datasets included in this toolkit in your work, please cite the following paper. The bibtex is listed below:

@article{lin2020exploring,
  title={Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning},
  author={Lin, Zhaojiang and Madotto, Andrea and Fung, Pascale},
  journal={arXiv preprint arXiv:2004.03829},
  year={2020}
}

Abstract

Fine-tuning pre-trained generative language models to down-stream language generation tasks have shown promising results. However, it comes with the cost of having a single, large, model for each task, which is not ideal in low-memory/power scenarios (e.g., mobile). In this work, we propose an effective way for fine-tuning multiple down-stream generation tasks simultaneously using a single, large pre-trained model. The experiments in five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.

Versatile Generative Language Model (VLM):

Versatile Language Model (VLM) is composed of three components: a pre-trained language model back-bone (e.g., GPT-2), and two kinds of specialized parameters for each generation task such as low-rank residual adapters and task embeddings.

Dependency

Check the packages needed or simply run the command

❱❱❱ pip install -r requirements.txt

Experiments

Dataset

Download the preprocessed datasets

Reproducibility

We provide the trained checkpoint of our VLM.

Test model: choose one task from (mt, summarization, dialogue, qa, nlg].

❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path

Fine tune GPT-2

Train machine translation:

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json

Test machine translation:

❱❱❱ python ./evaluate.py --task mt --no_sample --max_history=2 --model_checkpoint runs/$model_checkpoint

Check run.sh to run other tasks

VLM train Adapters and Task embeddings

Train machine translation without knowledge distillation

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005

Train machine translation using sentence level knowledge distillation:

❱❱❱ python ./sentence_distiller.py --task mt --max_history=2 --model_checkpoint runs/$fully_finetuned_gpt2_checkpoint --no_sample
❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005 --distillation

Test machine traslation:

❱❱❱ python ./evaluate.py --task mt --no_sample --adapter_bottleneck 300 --model_checkpoint runs/$model_checkpoint

Check run.sh to run other tasks

Combine all the adapters and task embedding into single model

Line 68 of combine_all.py to provide the list of checkpoint

❱❱❱ python combine_all.py

Test to see if the result is same

❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path

The above scripts illustrate how to train VLM continuously when tasks arrive sequentially.

Multitask training VLM

When all the tasks available at the same time.

❱❱❱ python ./train_vlm.py --gradient_accumulation_steps=16 --train_batch_size=1 --valid_batch_size=1 --n_epochs 3

Acknowledgement

This repository is implemented base on Huggingface

Owner
Zhaojiang Lin
Ph.D. Candidate - NLP - Deep Learning
Zhaojiang Lin
PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M2HSE) PyTorch code fo

Xinlei-Pei 6 Dec 23, 2022
《Dual-Resolution Correspondence Network》(NeurIPS 2020)

Dual-Resolution Correspondence Network Dual-Resolution Correspondence Network, NeurIPS 2020 Dependency All dependencies are included in asset/dualrcne

Active Vision Laboratory 45 Nov 21, 2022
This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by Divam Gupta, Wei Pu, Trenton Tabor, Jeff Schneider

SBEVNet: End-to-End Deep Stereo Layout Estimation This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by D

Divam Gupta 19 Dec 17, 2022
The official implementation of ICCV paper "Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds".

Box-Aware Tracker (BAT) Pytorch-Lightning implementation of the Box-Aware Tracker. Box-Aware Feature Enhancement for Single Object Tracking on Point C

Kangel Zenn 5 Mar 26, 2022
Optical Character Recognition + Instance Segmentation for russian and english languages

Распознавание рукописного текста в школьных тетрадях Соревнование, проводимое в рамках олимпиады НТО, разработанное Сбером. Платформа ODS. Результаты

Gerasimov Maxim 21 Dec 19, 2022
Pytorch for Segmentation

Pytorch for Semantic Segmentation This repo has been deprecated currently and I will not maintain it. Meanwhile, I strongly recommend you can refer to

ycszen 411 Nov 22, 2022
Implementation of ICLR 2020 paper "Revisiting Self-Training for Neural Sequence Generation"

Self-Training for Neural Sequence Generation This repo includes instructions for running noisy self-training algorithms from the following paper: Revi

Junxian He 45 Dec 31, 2022
Feature board for ERPNext

ERPNext Feature Board Feature board for ERPNext Development Prerequisites k3d kubectl helm bench Install K3d Cluster # export K3D_FIX_CGROUPV2=1 # use

Revant Nandgaonkar 16 Nov 09, 2022
Logistic Bandit experiments. Official code for the paper "Jointly Efficient and Optimal Algorithms for Logistic Bandits".

Code for the paper Jointly Efficient and Optimal Algorithms for Logistic Bandits, by Louis Faury, Marc Abeille, Clément Calauzènes and Kwang-Sun Jun.

Faury Louis 1 Jan 22, 2022
Jarvis Project is a basic virtual assistant that uses TensorFlow for learning.

Jarvis_proyect Jarvis Project is a basic virtual assistant that uses TensorFlow for learning. Latest version 0.1 Features: Good morning protocol Tell

Anze Kovac 3 Aug 31, 2022
Neural network-based build time estimation for additive manufacturing

Neural network-based build time estimation for additive manufacturing Oh, Y., Sharp, M., Sprock, T., & Kwon, S. (2021). Neural network-based build tim

Yosep 1 Nov 15, 2021
Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets

Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets (including obl

Azavea 1.7k Dec 22, 2022
Replication Code for "Self-Supervised Bug Detection and Repair" NeurIPS 2021

Self-Supervised Bug Detection and Repair This is the reference code to replicate the research in Self-Supervised Bug Detection and Repair in NeurIPS 2

Microsoft 85 Dec 24, 2022
Code for "Searching for Efficient Multi-Stage Vision Transformers"

Searching for Efficient Multi-Stage Vision Transformers This repository contains the official Pytorch implementation of "Searching for Efficient Multi

Yi-Lun Liao 62 Oct 25, 2022
Python Implementation of Chess Playing AI with variable difficulty

Chess AI with variable difficulty level implemented using the MiniMax AB-Pruning Algorithm

Ali Imran 7 Feb 20, 2022
Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022)

Stratified Transformer for 3D Point Cloud Segmentation Xin Lai*, Jianhui Liu*, Li Jiang, Liwei Wang, Hengshuang Zhao, Shu Liu, Xiaojuan Qi, Jiaya Jia

DV Lab 195 Jan 01, 2023
Pointer networks Tensorflow2

Pointer networks Tensorflow2 原文:https://arxiv.org/abs/1506.03134 仅供参考与学习,内含代码备注 环境 tensorflow==2.6.0 tqdm matplotlib numpy 《pointer networks》阅读笔记 应用场景

HUANG HAO 7 Oct 27, 2022
Optimal Adaptive Allocation using Deep Reinforcement Learning in a Dose-Response Study

Optimal Adaptive Allocation using Deep Reinforcement Learning in a Dose-Response Study Supplementary Materials for Kentaro Matsuura, Junya Honda, Imad

Kentaro Matsuura 4 Nov 01, 2022
OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers (NeurIPS 2021)

OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers (NeurIPS 2021) This is an PyTorch implementation of OpenMatc

Vision and Learning Group 38 Dec 26, 2022
PyTorch implementation of DARDet: A Dense Anchor-free Rotated Object Detector in Aerial Images

DARDet PyTorch implementation of "DARDet: A Dense Anchor-free Rotated Object Detector in Aerial Images", [pdf]. Highlights: 1. We develop a new dense

41 Oct 23, 2022