The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

Overview

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data

This repository provides the implementation details for the ACL 2021 main conference paper:

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data. [paper]

1. Data Preparation

In this work, we carried out persona-based dialogue generation experiments under a persona-dense scenario (English PersonaChat) and a persona-sparse scenario (Chinese PersonalDialog), with the assistance of a series of auxiliary inference datasets. Here we summarize the key information of these datasets and provide the links to download these datasets if they are directly accessible.

2. How to Run

The setup.sh script contains the necessary dependencies to run this project. Simply run ./setup.sh would install these dependencies. Here we take the English PersonaChat dataset as an example to illustrate how to run the dialogue generation experiments. Generally, there are three steps, i.e., tokenization, training and inference:

  • Preprocessing

     python preprocess.py --dataset_type convai2 \
     --trainset ./data/ConvAI2/train_self_original_no_cands.txt \
     --testset ./data/ConvAI2/valid_self_original_no_cands.txt \
     --nliset ./data/ConvAI2/ \
     --encoder_model_name_or_path ./pretrained_models/bert/bert-base-uncased/ \
     --max_source_length 64 \
     --max_target_length 32
    

    We have provided some data examples (dozens of lines) at the ./data directory to show the data format. preprocess.py reads different datasets and tokenizes the raw data into a series of vocab IDs to facilitate model training. The --dataset_type could be either convai2 (for English PersonaChat) or ecdt2019 (for Chinese PersonalDialog). Finally, the tokenized data will be saved as a series of JSON files.

  • Model Training

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --do_train \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder2_model ./pretrained_models/bert/bert-base-uncased/ \
     --save_model_path checkpoints/ConvAI2/bertoverbert --dataset_type convai2 \
     --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --learning_rate 7e-6 \
     --batch_size 32
    

    Here we initialize encoder and both decoders from the same downloaded BERT checkpoint. And more parameter settings could be found at bertoverbert.py.

  • Evaluations

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --dataset_type convai2 \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/  \
     --do_evaluation --do_predict \
     --eval_epoch 7
    

    Empirically, in the PersonaChat experiment with default hyperparameter settings, the best-performing checkpoint should be found between epoch 5 and epoch 9. If the training procedure goes fine, there should be some results like:

     Perplexity on test set is 21.037 and 7.813.
    

    where 21.037 is the ppl from the first decoder and 7.813 is the final ppl from the second decoder. And the generated results is redirected to test_result.tsv, here is a generated example from the above checkpoint:

     persona:i'm terrified of scorpions. i am employed by the us postal service. i've a german shepherd named barnaby. my father drove a car for nascar.
     query:sorry to hear that. my dad is an army soldier.
     gold:i thank him for his service.
     response_from_d1:that's cool. i'm a train driver.
     response_from_d2:that's cool. i'm a bit of a canadian who works for america.  
    

    where d1 and d2 are the two BERT decoders, respectively.

  • Computing Infrastructure:

    • The released codes were tested on NVIDIA Tesla V100 32G and NVIDIA PCIe A100 40G GPUs. Notice that with a batch_size=32, the BoB model will need at least 20Gb GPU resources for training.

MISC

  • Build upon 🤗 Transformers.

  • Bibtex:

      @inproceedings{song-etal-2021-bob,
          title = "BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data",
          author = "Haoyu Song, Yan Wang, Kaiyan Zhang, Wei-Nan Zhang, Ting Liu",
          booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL-2021)",
          month = "Aug",
          year = "2021",
          address = "Online",
          publisher = "Association for Computational Linguistics",
      }
      
  • Email: [email protected].

Convolutional Neural Network to detect deforestation in the Amazon Rainforest

Convolutional Neural Network to detect deforestation in the Amazon Rainforest This project is part of my final work as an Aerospace Engineering studen

5 Feb 17, 2022
MMdet2-based reposity about lightweight detection model: Nanodet, PicoDet.

Lightweight-Detection-and-KD MMdet2-based reposity about lightweight detection model: Nanodet, PicoDet. This repo also includes detection knowledge di

Egqawkq 12 Jan 05, 2023
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information This repository contains code, model, dataset for ChineseBERT at ACL2021. Ch

413 Dec 01, 2022
Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).

A Self-Supervised Descriptor for Image Copy Detection (SSCD) This is the open-source codebase for "A Self-Supervised Descriptor for Image Copy Detecti

Meta Research 68 Jan 04, 2023
Deep Reinforcement Learning for Keras.

Deep Reinforcement Learning for Keras What is it? keras-rl implements some state-of-the art deep reinforcement learning algorithms in Python and seaml

Keras-RL 0 Dec 15, 2022
This repository contains numerical implementation for the paper Intertemporal Pricing under Reference Effects: Integrating Reference Effects and Consumer Heterogeneity.

This repository contains numerical implementation for the paper Intertemporal Pricing under Reference Effects: Integrating Reference Effects and Consumer Heterogeneity.

Hansheng Jiang 6 Nov 18, 2022
Temporal-Relational CrossTransformers

Temporal-Relational Cross-Transformers (TRX) This repo contains code for the method introduced in the paper: Temporal-Relational CrossTransformers for

83 Dec 12, 2022
A Re-implementation of the paper "A Deep Learning Framework for Character Motion Synthesis and Editing"

What is This This is a simple re-implementation of the paper "A Deep Learning Framework for Character Motion Synthesis and Editing"(1). Only Sections

102 Dec 14, 2022
Using deep learning model to detect breast cancer.

Breast-Cancer-Detection Breast cancer is the most frequent cancer among women, with around one in every 19 women at risk. The number of cases of breas

1 Feb 13, 2022
PyJokes - Joking around with Python library pyjokes

Hi, it's Muhaimin again 👋 This is something unorthodox but cool. Don't forget t

Muhaimin A. Salay Kanton 1 Feb 02, 2022
An unofficial PyTorch implementation of a federated learning algorithm, FedAvg.

Federated Averaging (FedAvg) in PyTorch An unofficial implementation of FederatedAveraging (or FedAvg) algorithm proposed in the paper Communication-E

Seok-Ju Hahn 123 Jan 06, 2023
Official code repository for ICCV 2021 paper: Gravity-Aware Monocular 3D Human Object Reconstruction

GraviCap Official code repository for ICCV 2021 paper: Gravity-Aware Monocular 3D Human Object Reconstruction. Gravity-Aware Monocular 3D Human-Object

Rishabh Dabral 15 Dec 09, 2022
Code base for "On-the-Fly Test-time Adaptation for Medical Image Segmentation"

On-the-Fly Adaptation Official Pytorch Code base for On-the-Fly Test-time Adaptation for Medical Image Segmentation Paper Introduction One major probl

Jeya Maria Jose 17 Nov 10, 2022
Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer

VidLanKD Implementation of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer by Zineng Tang, Jaemin Cho, Hao Tan, Mohi

Zineng Tang 54 Dec 20, 2022
Unsupervised Pre-training for Person Re-identification (LUPerson)

LUPerson Unsupervised Pre-training for Person Re-identification (LUPerson). The repository is for our CVPR2021 paper Unsupervised Pre-training for Per

143 Dec 24, 2022
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

Skeleton Aware Multi-modal Sign Language Recognition By Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li and Yun Fu. Smile Lab @ Northeastern

Isen (Songyao Jiang) 128 Dec 08, 2022
A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

OutliersSlidingWindows A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows" Dataset generatio

PaoloPellizzoni 0 Jan 05, 2022
Goal of the project : Detecting Temporal Boundaries in Sign Language videos

MVA RecVis course final project : Goal of the project : Detecting Temporal Boundaries in Sign Language videos. Sign language automatic indexing is an

Loubna Ben Allal 6 Dec 21, 2022
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

郭飞 3.7k Jan 03, 2023
Code for "Universal inference meets random projections: a scalable test for log-concavity"

How to use this repository This repository contains code to replicate the results of "Universal inference meets random projections: a scalable test fo

Robin Dunn 0 Nov 21, 2021