The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

Overview

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data

This repository provides the implementation details for the ACL 2021 main conference paper:

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data. [paper]

1. Data Preparation

In this work, we carried out persona-based dialogue generation experiments under a persona-dense scenario (English PersonaChat) and a persona-sparse scenario (Chinese PersonalDialog), with the assistance of a series of auxiliary inference datasets. Here we summarize the key information of these datasets and provide the links to download these datasets if they are directly accessible.

2. How to Run

The setup.sh script contains the necessary dependencies to run this project. Simply run ./setup.sh would install these dependencies. Here we take the English PersonaChat dataset as an example to illustrate how to run the dialogue generation experiments. Generally, there are three steps, i.e., tokenization, training and inference:

  • Preprocessing

     python preprocess.py --dataset_type convai2 \
     --trainset ./data/ConvAI2/train_self_original_no_cands.txt \
     --testset ./data/ConvAI2/valid_self_original_no_cands.txt \
     --nliset ./data/ConvAI2/ \
     --encoder_model_name_or_path ./pretrained_models/bert/bert-base-uncased/ \
     --max_source_length 64 \
     --max_target_length 32
    

    We have provided some data examples (dozens of lines) at the ./data directory to show the data format. preprocess.py reads different datasets and tokenizes the raw data into a series of vocab IDs to facilitate model training. The --dataset_type could be either convai2 (for English PersonaChat) or ecdt2019 (for Chinese PersonalDialog). Finally, the tokenized data will be saved as a series of JSON files.

  • Model Training

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --do_train \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder_model ./pretrained_models/bert/bert-base-uncased/ \
     --decoder2_model ./pretrained_models/bert/bert-base-uncased/ \
     --save_model_path checkpoints/ConvAI2/bertoverbert --dataset_type convai2 \
     --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --learning_rate 7e-6 \
     --batch_size 32
    

    Here we initialize encoder and both decoders from the same downloaded BERT checkpoint. And more parameter settings could be found at bertoverbert.py.

  • Evaluations

     CUDA_VISIBLE_DEVICES=0 python bertoverbert.py --dumped_token ./data/ConvAI2/convai2_tokenized/ \
     --dataset_type convai2 \
     --encoder_model ./pretrained_models/bert/bert-base-uncased/  \
     --do_evaluation --do_predict \
     --eval_epoch 7
    

    Empirically, in the PersonaChat experiment with default hyperparameter settings, the best-performing checkpoint should be found between epoch 5 and epoch 9. If the training procedure goes fine, there should be some results like:

     Perplexity on test set is 21.037 and 7.813.
    

    where 21.037 is the ppl from the first decoder and 7.813 is the final ppl from the second decoder. And the generated results is redirected to test_result.tsv, here is a generated example from the above checkpoint:

     persona:i'm terrified of scorpions. i am employed by the us postal service. i've a german shepherd named barnaby. my father drove a car for nascar.
     query:sorry to hear that. my dad is an army soldier.
     gold:i thank him for his service.
     response_from_d1:that's cool. i'm a train driver.
     response_from_d2:that's cool. i'm a bit of a canadian who works for america.  
    

    where d1 and d2 are the two BERT decoders, respectively.

  • Computing Infrastructure:

    • The released codes were tested on NVIDIA Tesla V100 32G and NVIDIA PCIe A100 40G GPUs. Notice that with a batch_size=32, the BoB model will need at least 20Gb GPU resources for training.

MISC

  • Build upon 🤗 Transformers.

  • Bibtex:

      @inproceedings{song-etal-2021-bob,
          title = "BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data",
          author = "Haoyu Song, Yan Wang, Kaiyan Zhang, Wei-Nan Zhang, Ting Liu",
          booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL-2021)",
          month = "Aug",
          year = "2021",
          address = "Online",
          publisher = "Association for Computational Linguistics",
      }
      
  • Email: [email protected].

This repo holds codes of the ICCV21 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.

VAC_CSLR This repo holds codes of the paper: Visual Alignment Constraint for Continuous Sign Language Recognition.(ICCV 2021) [paper] Prerequisites Th

Yuecong Min 64 Dec 19, 2022
MazeRL is an application oriented Deep Reinforcement Learning (RL) framework

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework, addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications ra

EnliteAI GmbH 222 Dec 24, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo

Hugging Face 77.2k Jan 02, 2023
Asynchronous Advantage Actor-Critic in PyTorch

Asynchronous Advantage Actor-Critic in PyTorch This is PyTorch implementation of A3C as described in Asynchronous Methods for Deep Reinforcement Learn

Reiji Hatsugai 38 Dec 12, 2022
Implementation detail for paper "Multi-level colonoscopy malignant tissue detection with adversarial CAC-UNet"

Multi-level-colonoscopy-malignant-tissue-detection-with-adversarial-CAC-UNet Implementation detail for our paper "Multi-level colonoscopy malignant ti

CVSM Group - email: <a href=[email protected]"> 84 Nov 22, 2022
PyTorch implementation of Glow

glow-pytorch PyTorch implementation of Glow, Generative Flow with Invertible 1x1 Convolutions (https://arxiv.org/abs/1807.03039) Usage: python train.p

Kim Seonghyeon 433 Dec 27, 2022
Learning to Segment Instances in Videos with Spatial Propagation Network

Learning to Segment Instances in Videos with Spatial Propagation Network This paper is available at the 2017 DAVIS Challenge website. Check our result

Jingchun Cheng 145 Sep 28, 2022
SPT_LSA_ViT - Implementation for Visual Transformer for Small-size Datasets

Vision Transformer for Small-Size Datasets Seung Hoon Lee and Seunghyun Lee and Byung Cheol Song | Paper Inha University Abstract Recently, the Vision

Lee SeungHoon 87 Jan 01, 2023
Framework for evaluating ANNS algorithms on billion scale datasets.

Billion-Scale ANN http://big-ann-benchmarks.com/ Install The only prerequisite is Python (tested with 3.6) and Docker. Works with newer versions of Py

Harsha Vardhan Simhadri 132 Dec 24, 2022
Accelerating BERT Inference for Sequence Labeling via Early-Exit

Sequence-Labeling-Early-Exit Code for ACL 2021 paper: Accelerating BERT Inference for Sequence Labeling via Early-Exit Requirement: Please refer to re

李孝男 23 Oct 14, 2022
Easily benchmark PyTorch model FLOPs, latency, throughput, max allocated memory and energy consumption

⏱ pytorch-benchmark Easily benchmark model inference FLOPs, latency, throughput, max allocated memory and energy consumption Install pip install pytor

Lukas Hedegaard 21 Dec 22, 2022
Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation

SUCP Leveraging Social Influence based on Users Activity Centers for Point-of-Interest Recommendation () Direct Friends (i.e., users who follow each o

Kosar 8 Nov 26, 2022
Realtime Face Anti Spoofing with Face Detector based on Deep Learning using Tensorflow/Keras and OpenCV

Realtime Face Anti-Spoofing Detection 🤖 Realtime Face Anti Spoofing Detection with Face Detector to detect real and fake faces Please star this repo

Prem Kumar 86 Aug 03, 2022
v objective diffusion inference code for PyTorch.

v-diffusion-pytorch v objective diffusion inference code for PyTorch, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman). The

Katherine Crowson 635 Dec 30, 2022
PyTorch implementation of "Optimization Planning for 3D ConvNets"

Optimization-Planning-for-3D-ConvNets Code for the ICML 2021 paper: Optimization Planning for 3D ConvNets. Authors: Zhaofan Qiu, Ting Yao, Chong-Wah N

Zhaofan Qiu 2 Jan 12, 2022
A repository for benchmarking neural vocoders by their quality and speed.

License The majority of VocBench is licensed under CC-BY-NC, however portions of the project are available under separate license terms: Wavenet, Para

Meta Research 177 Dec 12, 2022
This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".

BiCAT This is our TensorFlow implementation for the paper: "BiCAT: Sequential Recommendation with Bidirectional Chronological Augmentation of Transfor

John 15 Dec 06, 2022
Anime Face Detector using mmdet and mmpose

Anime Face Detector This is an anime face detector using mmdetection and mmpose. (To avoid copyright issues, I use generated images by the TADNE model

198 Jan 07, 2023
Object Detection Projekt in GKI WS2021/22

tfObjectDetection Object Detection Projekt with tensorflow in GKI WS2021/22 Docker Container: docker run -it --name --gpus all -v path/to/project:p

Tim Eggers 1 Jul 18, 2022
This framework implements the data poisoning method found in the paper Adversarial Examples Make Strong Poisons

Adversarial poison generation and evaluation. This framework implements the data poisoning method found in the paper Adversarial Examples Make Strong

31 Nov 01, 2022