The official GitHub repository for Towards Continual Knowledge Learning of Language Models

Last update: Jan 07, 2023

Overview

Towards Continual Knowledge Learning of Language Models

This is the official GitHub repository for the paper Towards Continual Knowledge Learning of Language Models.

In order to reproduce our results, take the following steps:

1. Create conda environment and install requirements

conda create -n ckl python=3.8 && conda activate ckl
pip install -r requirements.txt

Also make sure to install the PyTorch build that matches your CUDA version and environment. Refer to https://pytorch.org/ for details.

# For CUDA 10.x
pip3 install torch torchvision torchaudio
# For CUDA 11.x
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
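
After installation, it can help to confirm that the installed PyTorch build actually sees the GPU. The snippet below is a minimal sketch and not part of this repository: the helper name `describe_torch_install` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install():
    """Return a small report on the installed PyTorch build (hypothetical helper)."""
    # Avoid a hard failure if PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return {
            "installed": False,
            "torch": None,
            "cuda_build": None,
            "cuda_available": False,
        }

    import torch

    return {
        "installed": True,
        "torch": torch.__version__,        # a "+cu111" suffix marks a CUDA 11.1 wheel
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_install().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False on a GPU machine, the installed wheel most likely does not match the local CUDA version.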

2. Download the data used for the experiments.

To download only the CKL benchmark dataset:

python download_ckl_data.py

To download ALL of the data used for the experiments (required to reproduce results):

python download_all_data.py

To download the (continually pretrained) model checkpoints of the main experiment (required to reproduce results):

python download_model_checkpoints.py

For the other experimental settings, such as multiple CKL phases and GPT-2, we do not separately provide continually pretrained model checkpoints.

3. Reproducing Experimental Results

We provide all the configs needed to reproduce the zero-shot results of our paper. Model checkpoints are provided only for the main experimental setting (full_setting); they can be downloaded with the command above.

configs
├── full_setting
│   ├── evaluation
│   │   ├── invariantLAMA
│   │   │   ├── t5_baseline.json
│   │   │   ├── t5_kadapters.json
│   │   │   ├── ...
│   │   ├── newLAMA
│   │   ├── newLAMA_easy
│   │   ├── updatedLAMA
│   ├── training
│   │   ├── t5_baseline.json
│   │   ├── t5_kadapters.json
│   │   ├── ...
├── GPT2
│   ├── ...
├── kilt
│   ├── ...
├── small_setting
│   ├── ...
├── split
│   ├── ...

Components in each configuration file

input_length (int) : the input sequence length
output_length (int) : the output sequence length
num_train_epochs (int) : number of training epochs
output_dir (string) : the directory to save the model checkpoints
dataset (string) : the dataset to perform zero-shot evaluation or continual pretraining
dataset_version (string) : the version of the dataset ['full', 'small', 'debug']
train_batch_size (int) : batch size used for training
learning_rate (float) : learning rate used for training
model (string) : model name in huggingface models (https://huggingface.co/models)
method (string) : method being used ['baseline', 'kadapter', 'lora', 'mixreview', 'modular_small', 'recadam']
freeze_level (int) : how much of the model to freeze during training (0 for none, 1 for freezing only the encoder, 2 for freezing all of the parameters)
gradient_accumulation_steps (int) : gradient accumulation used to match the global training batch of each method
ngpu (int) : number of gpus used for the run
num_workers (int) : number of workers for the Dataloader
resume_from_checkpoint (string) : null by default. directory to model checkpoint if resuming from checkpoint
accelerator (string) : 'ddp' by default. the pytorch lightning accelerator to be used.
use_deepspeed (bool) : false by default. Currently not extensively tested.
CUDA_VISIBLE_DEVICES (string) : gpu devices that are made available for this run (e.g. "0,1,2,3", "0")
wandb_log (bool) : whether to log experiment through wandb
wandb_project (string) : project name of wandb
wandb_run_name (string) : the name of this training run
mode (string) : 'pretrain' for all configs
use_lr_scheduling (bool) : true if using learning rate scheduling
check_validation (bool) : true for evaluation (no training)
checkpoint_path (string) : path to the model checkpoint that is used for evaluation
output_log (string) : directory to log evaluation results to
split_num (int) : default is 1. more than 1 if there are multiple CKL phases
split (int) : which CKL phase it is
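
Before launching a long run, a config file can be sanity-checked against the fields documented above. The following is a rough sketch rather than code from this repository: `check_config`, `EXPECTED_TYPES`, and `ALLOWED_VALUES` are hypothetical names, only a subset of the keys is covered, and the sketch does not assume that every key is mandatory. The values in the usage example are illustrative placeholders, not settings taken from the provided configs.

```python
import json

# Expected Python type for a subset of the documented keys.
EXPECTED_TYPES = {
    "input_length": int,
    "output_length": int,
    "num_train_epochs": int,
    "output_dir": str,
    "dataset": str,
    "dataset_version": str,
    "train_batch_size": int,
    "model": str,
    "method": str,
    "freeze_level": int,
    "gradient_accumulation_steps": int,
    "ngpu": int,
    "num_workers": int,
    "accelerator": str,
    "use_deepspeed": bool,
    "CUDA_VISIBLE_DEVICES": str,
    "mode": str,
}

# Keys whose allowed values are enumerated in the list above.
ALLOWED_VALUES = {
    "dataset_version": ("full", "small", "debug"),
    "method": ("baseline", "kadapter", "lora", "mixreview", "modular_small", "recadam"),
    "freeze_level": (0, 1, 2),
}


def check_config(config):
    """Return a list of problems found in config; an empty list means none were found."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            continue  # missing keys are not treated as errors in this sketch
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            problems.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
    for key, allowed in ALLOWED_VALUES.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: {config[key]!r} is not one of {allowed}")
    return problems


if __name__ == "__main__":
    # Illustrative values only.
    example = json.loads('{"method": "kadapter", "freeze_level": 0, "ngpu": 4}')
    print(check_config(example) or "no problems found")
```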

This is an example of running the invariantLAMA zero-shot evaluation for the continually pretrained t5_kadapters model:

python run.py --config configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json
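
To evaluate every method on one benchmark, the same command can be repeated for each config file in a directory. The sketch below is an assumption about how one might script this, not a utility shipped with the repository: `build_eval_commands` is a hypothetical helper, and it only assembles the `run.py --config` invocation shown above.

```python
import sys
from pathlib import Path


def build_eval_commands(config_dir, python=sys.executable):
    """Return one `run.py --config <file>` command per JSON file, sorted by file name."""
    config_dir = Path(config_dir)
    return [
        [python, "run.py", "--config", path.as_posix()]
        for path in sorted(config_dir.glob("*.json"))
    ]


if __name__ == "__main__":
    import subprocess

    for cmd in build_eval_commands("configs/full_setting/evaluation/invariantLAMA"):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing evaluation
```

Running the evaluations one after another keeps each run on the GPUs listed in its own config.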

This is an example of performing continual pretraining on CC-RecentNews (the main experiment) with t5_kadapters:

python run.py --config configs/full_setting/training/t5_kadapters.json
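
When a training config is moved to a machine with a different number of GPUs, `gradient_accumulation_steps` is the natural setting to adjust so that the global training batch stays the same. The sketch below assumes that `train_batch_size` is the per-GPU batch size and that the global batch equals `train_batch_size * ngpu * gradient_accumulation_steps`. This is the usual convention for distributed data parallel training, but it is not stated in this README, so treat it as an assumption. Both function names are hypothetical and the numbers are illustrative.

```python
def global_batch_size(train_batch_size, ngpu, gradient_accumulation_steps):
    """Global batch per optimizer step, assuming train_batch_size is per GPU."""
    return train_batch_size * ngpu * gradient_accumulation_steps


def accumulation_steps_for(target_global_batch, train_batch_size, ngpu):
    """Accumulation steps needed to reach target_global_batch on ngpu GPUs."""
    per_forward_pass = train_batch_size * ngpu
    if target_global_batch % per_forward_pass != 0:
        raise ValueError(
            f"{target_global_batch} is not a multiple of {per_forward_pass}; "
            "adjust train_batch_size or ngpu"
        )
    return target_global_batch // per_forward_pass


if __name__ == "__main__":
    # Illustrative numbers: keep the same global batch when going from 4 GPUs to 2.
    target = global_batch_size(train_batch_size=16, ngpu=4, gradient_accumulation_steps=3)
    print(target, accumulation_steps_for(target, train_batch_size=16, ngpu=2))
```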

Reference

@article{jang2021towards,
  title={Towards Continual Knowledge Learning of Language Models},
  author={Jang, Joel and Ye, Seonghyeon and Yang, Sohee and Shin, Joongbo and Han, Janghoon and Kim, Gyeonghun and Choi, Stanley Jungkyu and Seo, Minjoon},
  journal={arXiv preprint arXiv:2110.03215},
  year={2021}
}
