The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Our released data can be found at this link. Make sure the following steps are adopted to use our codes.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
Data and a Twitter bot for the EPA's DOCUMERICA (1972-1977) program.

documerica This repository holds JSON(L) artifacts and a few scripts related to managing archival data from the EPA's DOCUMERICA program. Contents: Ma

William Woodruff 2 Oct 27, 2021
๐€ ๐ฆ๐จ๐๐ฎ๐ฅ๐š๐ซ ๐“๐ž๐ฅ๐ž๐ ๐ซ๐š๐ฆ ๐†๐ซ๐จ๐ฎ๐ฉ ๐ฆ๐š๐ง๐š๐ ๐ž๐ฆ๐ž๐ง๐ญ ๐›๐จ๐ญ ๐ฐ๐ข๐ญ๐ก ๐ฎ๐ฅ๐ญ๐ข๐ฆ๐š๐ญ๐ž ๐Ÿ๐ž๐š๐ญ๐ฎ๐ซ๐ž๐ฌ !!

๐‡๐จ๐ฐ ๐“๐จ ๐ƒ๐ž๐ฉ๐ฅ๐จ๐ฒ For easiest way to deploy this Bot click on the below button ๐Œ๐š๐๐ž ๐๐ฒ ๐’๐ฎ๐ฉ๐ฉ๐จ๐ซ๐ญ ๐†๐ซ๐จ๐ฎ๐ฉ ๐’๐จ๐ฎ๐ซ๐œ๐ž๐ฌ ๐†๐ž๐ง๐ž?

Mukesh Solanki 4 Oct 18, 2021
Salmanul Farisx Bot With Python

Salman_Farisx_Bot How To Deploy Video Subscribe YouTube Channel Added Features Imdb posters for autofilter. Imdb rating for autofilter. Custom caption

1 Dec 23, 2021
Petit webhook manager by moi (wassim)

Webhook Manager By wassim oubliez pas de โญ le projet Installations il te faut python sinon quand tu va lancer le start.bat sa va tout installer tout s

wassim 9 Jul 08, 2021
A.I and game for gomoku, working only on windows

Gomoku (A.I of gomoku) The goal of the project is to create an artificial intelligence of gomoku. Goals Beat the opponent. Requirements Python 3.7+ Wo

Luis Rosario 13 Jun 20, 2021
Scuttlecrab.py - Python Version of Scuttle Crab Bot

____ _ _ _ ____ _ / ___| ___ _ _| |_|

Fabrizo 4 Jul 08, 2022
Autofilter with imdb bot || broakcast , imdb poster and imdb rating

LuciferMoringstar_Robot How To Deploy Video Subscribe YouTube Channel Added Features Imdb posters for autofilter. Imdb rating for autofilter. Custom c

Muhammed 127 Dec 29, 2022
Catinthebox - Awesome bot for Mastodon

Cat In The Box :3 Description Awesome bot for Mastodon Requirements python pip g

satanist 0 Jan 19, 2022
Aula-API - a school system widely used in Denmark, as you can see and read about in the python file

Information : Hello, thank you for reading this first of all. This is a Aula-API

Binary.club 2 May 28, 2022
Python functions to run WASS stereo wave processing executables, and load and post process WASS output files.

wass-pyfuns Python functions to run the WASS stereo wave processing executables, and load and post process the WASS output files. General WASS (Waves

Mika Malila 3 May 13, 2022
A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

Trevor Miller 2 Sep 14, 2021
Bot Realm of the Mad God Exalt (ROTMG). (Auto_nexus, Auto_HP, Auto_Loot)

Bot_ROTMG Bot Realm of the Mad God Exalt (ROTMG). (Auto_nexus, Auto_HP, Auto_Loot) *Este projeto visa apenas o aprendizado, quem faz mal uso รฉ o รบnico

Guilherme Silva Uchoa 1 Oct 30, 2021
Una herramienta para transmitir mensajes automรกticamente a mรบltiples grupos de chat

chat-broadcast Una herramienta para transmitir mensajes automรกticamente a mรบltiples grupos de chat Setup Librerรญas Necesitas Python 3 con la librerรญa

Seguimos 2 Jan 09, 2022
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

1 Jan 19, 2022
A multipurpose Telegram Bot writen in Python for mirroring files

Deepak Clouds Mirror Deepak Clouds Torrent is a multipurpose Telegram Bot writen in Python for mirroring files on the Internet to our beloved Google D

MR.SHAGGY 0 Dec 19, 2021
Kali Kush - Account Nuker Tool

Kali Kush - Account Nuker Tool This is a discord tool made by me, and SSL :) antho#1731 How to use? pip3 install -r requirements.txt -py kalikush.py -

ryan 3 Dec 21, 2021
Simple Discord bot which logs several events in your server

logging-bot Simple Discord bot which logs several events in your server, including: Message Edits Message Deletes Role Adds Role Removes Member joins

1 Feb 14, 2022
A simple Spamming software made in python

Spam-qlk Warning!!! 'I' am not responsible for the 'damage or harm' caused by this 'Software'!!! Use at your own risk!!! Input the message. After you

Aditya kumar 1 Nov 30, 2021
Automatically check for free Anmeldung appointments.

Berlin Anmeldung Appointments (Python) This Python script will automatically check for free Anmeldung appointments in Berlin, and find them for you. T

Martรญn Aberastegue 6 May 19, 2022
Spotify Web API client for Python 3

Welcome to the GitHub repository of Tekore! We provide a client for the Spotify Web API for Python, complete with all available endpoints and authenti

Felix Hildรฉn 186 Dec 22, 2022