My TensorFlow implementation of "A Neural Conversational Model", a deep-learning-based chatbot

Deep Q&A

Join the chat at https://gitter.im/chatbot-pilots/DeepQA

Presentation

This work tries to reproduce the results of A Neural Conversational Model (aka the Google chatbot). It uses an RNN (seq2seq model) for sentence prediction. It is implemented in Python with TensorFlow.

The corpus-loading part of the program is inspired by the Torch neuralconvo project from macournoyer.

For now, DeepQA supports the following dialog corpora:

To speed up training, it's also possible to use pre-trained word embeddings (thanks to Eschnou). More info here.

Installation

The program requires the following dependencies (easy to install using pip: pip3 install -r requirements.txt):

  • python 3.5
  • tensorflow (tested with v1.0)
  • numpy
  • CUDA (for using GPU)
  • nltk (natural language toolkit, used to tokenize the sentences)
  • tqdm (for the nice progress bars)

You might also need to download additional data to make nltk work.

python3 -m nltk.downloader punkt

The Cornell dataset is already included. For the other datasets, look at the readme files in their respective folders (inside data/).

The web interface requires some additional packages:

  • django (tested with 1.10)
  • channels
  • Redis (see here)
  • asgi_redis (at least 1.0)

A Docker installation is also available. More detailed instructions here.

Running

Chatbot

To train the model, simply run main.py. Once trained, you can test the results with main.py --test (results generated in 'save/model/samples_predictions.txt') or main.py --test interactive (more fun).

Here are some flags which could be useful. For more help and options, use python main.py -h:

  • --modelTag <name>: gives a name to the current model so you can tell models apart when testing/training.
  • --keepAll: use this flag when training if, when testing, you want to see the predictions at different steps (it can be interesting to see the program change its name and age as training progresses). Warning: it can quickly take a lot of storage space if you don't increase the --saveEvery option.
  • --filterVocab 20 or --vocabularySize 30000: limit the vocabulary size to optimize performance and memory usage. Words used fewer than 20 times are replaced by the <unknown> token, and a maximum vocabulary size is set.
  • --verbose: when testing, will print the sentences as they are computed.
  • --playDataset: show some dialogue samples from the dataset (can be used together with --createDataset if this is the only action you want to perform).
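
The vocabulary filtering described for --filterVocab and --vocabularySize can be sketched as follows. This is a minimal illustration and not the project's actual code: the function names build_vocab and replace_rare are hypothetical, and sentences are assumed to be already tokenized into lists of words.

```python
from collections import Counter

UNKNOWN = "<unknown>"  # token name taken from the flag description above


def build_vocab(sentences, min_count=20, max_size=30000):
    """Keep words seen at least `min_count` times, capped at `max_size` words."""
    counts = Counter(word for sentence in sentences for word in sentence)
    # most_common() sorts by decreasing frequency, so slicing keeps the most frequent words
    frequent = [word for word, count in counts.most_common() if count >= min_count]
    return set(frequent[:max_size])


def replace_rare(sentence, vocab):
    """Replace every out-of-vocabulary word with the unknown token."""
    return [word if word in vocab else UNKNOWN for word in sentence]


corpus = [["hi", "there"], ["hi", "you"], ["hi", "bob"]]
vocab = build_vocab(corpus, min_count=2, max_size=10)
print(replace_rare(["hi", "bob"], vocab))  # ['hi', '<unknown>']
```

A smaller vocabulary shrinks both the embedding matrix and the output projection, which is where the memory savings would come from.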

To visualize the computational graph and the cost with TensorBoard, just run tensorboard --logdir save/.

By default, the network architecture is a standard encoder/decoder with two LSTM layers (hidden size of 256) and an embedding size of 32 for the vocabulary. The network is trained using Adam. The maximum sentence length is set to 10 words, but can be increased.
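
To make the role of the maximum sentence length concrete, here is a rough sketch of how a question/answer pair could be turned into fixed-length encoder and decoder sequences. It is an assumption-laden illustration, not the project's implementation: the token strings, the make_pair helper, and the choice to drop over-long pairs (rather than truncate them) are all hypothetical.

```python
PAD, GO, EOS = "<pad>", "<go>", "<eos>"  # hypothetical token names


def pad_to(tokens, length):
    """Right-pad a token list with PAD up to `length`."""
    return tokens + [PAD] * (length - len(tokens))


def make_pair(question, answer, max_length=10):
    """Return (encoder_input, decoder_sequence), or None if a sentence is too long."""
    if len(question) > max_length or len(answer) > max_length:
        return None  # assumption: over-long pairs are skipped
    encoder_input = pad_to(question, max_length)
    # The decoder sequence needs two extra slots for the go and eos tokens
    decoder_sequence = pad_to([GO] + answer + [EOS], max_length + 2)
    return encoder_input, decoder_sequence


print(make_pair(["how", "are", "you", "?"], ["fine", "."], max_length=5))
```

Raising the maximum length lets the model see longer sentences, at the cost of longer unrolled sequences and slower training.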

Web interface

Once the model is trained, it's possible to chat with it using a more user-friendly interface. The server looks for the model copied to save/model-server/model.ckpt. The first time you want to use it, you'll need to configure it with:

export CHATBOT_SECRET_KEY="my-secret-key"
cd chatbot_website/
python manage.py makemigrations
python manage.py migrate

Then, to launch the server locally, use the following commands:

cd chatbot_website/
redis-server &  # Launch Redis in background
python manage.py runserver

After launch, the interface should be available on http://localhost:8000/. If you want to deploy the program on a server, use python manage.py runserver 0.0.0.0 instead. More info here.

Results

Surprisingly, it's possible to get some results after only 1 or 2 hours of training (on a GeForce GT 740M) by drastically reducing the input sequence to 5 words and the output to 3 (plus the go and eos tokens) and by using a small embedding size (something like 25). Since then, I have modified the code so that the output length has to match the input length, but you can still reproduce the original results using this version. Of course, the network won't be really chatty:

Q: Hi
A: Hi.

Q: What is your name ?
A: Laura.

Q: What does that mean ?
A: I dunno.

Q: How old are you ?
A: thirty-five.

Q: Will Google hire me ?
A: No.

Q: Tell me the alphabet
A: Fuck you.

Q: That's not nice
A: Yeah.

Here are some cases where it fails:

Q: Who is Laura ?
A: My brother.

Q: Say 'goodbye'
A: Alright.

Q: What is cooking ?
A: A channel.

Q: Can you say no ?
A: No.

Q: Two plus two
A: Manny...

With longer sentences, the network is much slower to train. After 250000 iterations with maxLength=10, the network still mainly gives evasive answers (answering most of the open-ended questions with 'I don't know'), but it seems to have integrated some basic semantics:

Q: When are the fireworks ?
A: Tomorrow.

Q: What color ?
A: Gray.

Q: Which apple ?
A: That's the first one.

Q: What time is it ?
A: Eleven o'clock.

Q: Who is Skywalker ?
A: She's a freelance bio-exorcist.

Q: Who is John ?
A: The old man.

Q: Who is Jenny ?
A: The man who killed herself.

Q: When were you born ?
A: Soon.

As shown by the last questions, the network still confuses genders and the present/future. I also tried some deeper philosophical questions, with more or less success.

Q: What is the purpose of being intelligent ?
A: The man are obsolete.

Q: What is immoral ?
A: The truth.

Q: What is morality ?
A: The worst.

Q: What is the purpose of existence ?
A: It 's private.

Q: What happens if machines can think ?
A: I think I'll just lie with you.

The model I trained is quite limited by the embedding size I chose, by its own size, and by the size of the training corpus. Its 'thought vector' is also probably too small to answer questions of the following kind:

Q: My name is David. What is my name ?
A: Debbie.

Q: My name is John. What is my name ?
A: Carl Smith.

Q: How many legs does a cat have ?
A: Four.

Q: How many legs does a spider have ?
A: Four.

Q: Do you prefer football or soccer ?
A: Whiskey.

It also seems to overfit, as it sometimes just pops out sentences from its training set that are unrelated to the question. Adding some dropout should reduce the issue.

Q: I play tennis. What do I play ?
A: I'm going to get laid.

Q: What color is the sky ?
A: One of the deputies is everywhere.

Pretrained model

You can find a pre-trained model here, trained on the default corpus. To use it:

  1. Extract the zip file inside DeepQA/save/
  2. Copy the preprocessed dataset from save/model-pretrainedv2/dataset-cornell-old-lenght10-filter0-vocabSize0.pkl to data/samples/.
  3. Run ./main.py --modelTag pretrainedv2 --test interactive.

Thanks to Nicholas C., here (original) are some additional pre-trained models (compatible with TF 1.2) for diverse datasets. The folder also contains the pre-processed datasets for Cornell, OpenSubtitles, Ubuntu and Scotus (to move inside data/samples/). Those are required if you don't want to process the datasets yourself.

If you have a high-end GPU, don't hesitate to play with the hyper-parameters/corpus to train a better model. From my experiments, it seems that the learning rate and dropout rate have the most impact on the results. Also, if you want to share your models, don't hesitate to contact me and I'll add them here.

Improvements

In addition to trying larger/deeper models, there are a lot of small improvements which could be tested. Don't hesitate to send a pull request if you implement one of them. Here are some ideas:

  • For now, the predictions are deterministic (the network just takes the most likely output), so when answering a question, the network will always give the same answer. By adding a sampling mechanism, the network could give more diverse (and maybe more interesting) answers. The easiest way to do that is to sample the next predicted word from the SoftMax probability distribution. By combining that with the loop_function argument of tf.nn.seq2seq.rnn_decoder, it shouldn't be too difficult to add. After that, it should be possible to play with the SoftMax temperature to get more conservative or exotic predictions.
  • Adding attention could potentially improve the predictions, especially for longer sentences. It should be straightforward by replacing embedding_rnn_seq2seq with embedding_attention_seq2seq in model.py.
  • Having more data usually doesn't hurt. Training on a bigger corpus should be beneficial. The Reddit comments dataset seems to be the biggest for now (and is too big for this program to support). Another trick to artificially increase the dataset size when creating the corpus could be to split the sentences of each training sample. For example, from the sample Q:Sentence 1. Sentence 2. => A:Sentence X. Sentence Y. we could generate 3 new samples: Q:Sentence 1. Sentence 2. => A:Sentence X., then Q:Sentence 2. => A:Sentence X. Sentence Y., and finally Q:Sentence 2. => A:Sentence X. Warning: other combinations like Q:Sentence 1. => A:Sentence X. won't work because they would break the transition 2 => X which links the question to the answer.
  • The testing curve should really be monitored as done in my other music generation project. This would greatly help to see the impact of dropout on overfitting. For now it's just done empirically by manually checking the testing prediction at different training steps.
  • For now, the questions are independent of each other. To link questions together, a straightforward way would be to feed all previous questions and answers to the encoder before giving the answer. Some caching could be done on the final encoder state to avoid recomputing it each time. To improve the accuracy, the network should be retrained on entire dialogues instead of just individual QA pairs. Also, when feeding the previous dialogue to the encoder, new tokens <Q> and <A> could be added so the encoder knows when the interlocutor is changing. I'm not sure, though, that the simple seq2seq model would be sufficient to capture long-term dependencies between sentences. Adding a bucket system to group similar input lengths together could greatly improve training speed.
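
As an illustration of the sampling idea from the first item in the list above, here is a framework-independent sketch of drawing the next word from a temperature-scaled SoftMax instead of always taking the most likely one. It uses only the Python standard library and is not wired into the TensorFlow decoder; the function names are hypothetical, and `logits` stands for the decoder's raw output scores over the vocabulary.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a low temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_word(logits):
    """Deterministic choice: index of the highest score (the current behaviour)."""
    return max(range(len(logits)), key=logits.__getitem__)


def sample_word(logits, temperature=1.0, rng=random):
    """Stochastic choice: draw an index according to the SoftMax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [0.1, 2.0, 0.5]
print(greedy_word(logits))                   # always 1
print(sample_word(logits, temperature=1.5))  # varies from run to run
```

Temperatures below 1 push the samples toward the greedy answer, while temperatures above 1 flatten the distribution and give more exotic predictions.
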
Owner
Conchylicultor