Source code of SIGIR2021 Paper 'One Chatbot Per Person: Creating Personalized Chatbots based on Implicit Profiles'

Overview

DHAP

Source code of SIGIR2021 Long Paper:

One Chatbot Per Person: Creating Personalized Chatbots based on Implicit User Profiles .

Preinstallation

First, install the python packages in your Python3 environment:

  git clone https://github.com/zhengyima/DHAP.git DHAP
  cd DHAP
  pip install -r requirements.txt

Then, you should download the pre-trained word embeddings to initialize the model training. We provide two word embeddings in the Google Drive:

  • sgns.weibo.bigram-char, folloing Li et al., Chinese word embeddings pre-trained on Weibo. Google Drive
  • Fasttext embeddings, English word embedding pre-trained on Reddit set. Google Drive

You can pre-train your own embeddings(with the same format, i.e., the standard txt format), and use it in the model.

After downloading, you should put the embedding file to the path EMB_FILE.

Data

You should provide the dialogue history of users for training the model. For convenience, we provide a very small subset of PChatbot in the data/ as the demo data. In the direcotry, each user's dialogue history is saved in one text file. Each line in the file should contain post text, user id of post, post timestamp, response text, user id of response, response timestamp, _, _ , with tab as the seperator.

You can refer to seq2seq/dataset/perdialogDatasets.py for more details about the data processing.

If you are interested in the dataset PChatbot, please go to its official repository for more details.

Model Training

We provide a shell script scripts/train_chat.sh to start model pre-training. You should modify the DATA_DIR and EMB_FILE to your own paths. Then, you can start training by the following command:

bash scripts/train_chat.sh

The hyper-parameters are defined and set in the configParser.py.

After training, the trained checkpoints are saved in outputs. The inferenced result is saved in RESULT_FILE, which you define in bash scripts/train_chat.sh

Evaluating

For calculating varities of evaluation metrics(e.g. BLEU, P-Cover...), we provide a shell script scripts/eval.sh. You should modify the EMB_FILE to your own path, then evaluate the results by the following command:

bash scripts/eval.sh

Citations

If our code helps you, please cite our work by:

@inproceedings{DBLP:conf/sigir/madousigir21,
     author = {Zhengyi Ma and Zhicheng Dou and Yutao Zhu Hanxun Zhong and Ji-Rong Wen}, 
     title = {One Chatbot Per Person: Creating Personalized Chatbots based onImplicit User Profiles}, 
     booktitle = {Proceedings of the {SIGIR} 2021}, 
     publisher = {{ACM}}, 
     year = {2021}, 
     url = {https://doi.org/10.1145/3404835.3462828}, 
     doi = {10.1145/3404835.3462828}}

Links

Owner
ZYMa
Master candidate. IR and NLP.
ZYMa
Chinese clinical named entity recognition using pre-trained BERT model

Chinese clinical named entity recognition (CNER) using pre-trained BERT model Introduction Code for paper Chinese clinical named entity recognition wi

Xiangyang Li 109 Dec 14, 2022
Code for Fold2Seq paper from ICML 2021

[ICML2021] Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design Environment file: environment.yml Data and Feat

International Business Machines 43 Dec 04, 2022
Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

SwinTextSpotter This is the pytorch implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text R

mxin262 183 Jan 03, 2023
AI Toolkit for Healthcare Imaging

Medical Open Network for AI MONAI is a PyTorch-based, open-source framework for deep learning in healthcare imaging, part of PyTorch Ecosystem. Its am

Project MONAI 3.7k Jan 07, 2023
Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR)

This is the official implementation of our paper Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR), which has been accepted by WSDM2022.

Yongchun Zhu 81 Dec 29, 2022
🔥 Cannlytics-powered artificial intelligence 🤖

Cannlytics AI 🔥 Cannlytics-powered artificial intelligence 🤖 🏗️ Installation 🏃‍♀️ Quickstart 🧱 Development 🦾 Automation 💸 Support 🏛️ License ?

Cannlytics 3 Nov 11, 2022
[EMNLP 2021] Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

RoSTER The source code used for Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training, p

Yu Meng 60 Dec 30, 2022
Our solution for SSN Invente 2021's Hackathon

Our solution for SSN Invente 2021's Hackathon. To help maitain godowns in a pristine and safe condition using raspberry pi.

1 Jan 12, 2022
Bayesian Image Reconstruction using Deep Generative Models

Bayesian Image Reconstruction using Deep Generative Models R. Marinescu, D. Moyer, P. Golland For technical inquiries, please create a Github issue. F

Razvan Valentin Marinescu 51 Nov 23, 2022
A Strong Baseline for Image Semantic Segmentation

A Strong Baseline for Image Semantic Segmentation Introduction This project is an open source semantic segmentation toolbox based on PyTorch. It is ba

Clark He 49 Sep 20, 2022
[PAMI 2020] Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation

Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation This repository contains the source code for

Yun-Chun Chen 60 Nov 25, 2022
FastFace: Lightweight Face Detection Framework

Light Face Detection using PyTorch Lightning

Ömer BORHAN 75 Dec 05, 2022
Python Environment for Bayesian Learning

Pebl is a python library and command line application for learning the structure of a Bayesian network given prior knowledge and observations. Pebl in

Abhik Shah 103 Jul 14, 2022
RL and distillation in CARLA using a factorized world model

World on Rails Learning to drive from a world on rails Dian Chen, Vladlen Koltun, Philipp Krähenbühl, arXiv techical report (arXiv 2105.00636) This re

Dian Chen 131 Dec 16, 2022
StarGAN - Official PyTorch Implementation (CVPR 2018)

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

Yunjey Choi 5.1k Jan 04, 2023
This is the repo for Uncertainty Quantification 360 Toolkit.

UQ360 The Uncertainty Quantification 360 (UQ360) toolkit is an open-source Python package that provides a diverse set of algorithms to quantify uncert

International Business Machines 207 Dec 30, 2022
This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by Divam Gupta, Wei Pu, Trenton Tabor, Jeff Schneider

SBEVNet: End-to-End Deep Stereo Layout Estimation This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by D

Divam Gupta 19 Dec 17, 2022
This repository contains the accompanying code for Deep Virtual Markers for Articulated 3D Shapes, ICCV'21

Deep Virtual Markers This repository contains the accompanying code for Deep Virtual Markers for Articulated 3D Shapes, ICCV'21 Getting Started Get sa

KimHyomin 45 Oct 07, 2022
Learning Dense Representations of Phrases at Scale (Lee et al., 2020)

DensePhrases DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time. While it efficiently searches th

Princeton Natural Language Processing 540 Dec 30, 2022
Official implementation for: Blended Diffusion for Text-driven Editing of Natural Images.

Blended Diffusion for Text-driven Editing of Natural Images Blended Diffusion for Text-driven Editing of Natural Images Omri Avrahami, Dani Lischinski

328 Dec 30, 2022