The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Overview

Easy-to-use toolkit for retrieval-based Chatbot

Our released data can be found at this link. Make sure the following steps are adopted to use our codes.

How to Use

  1. Init the repo

    Before using the repo, please run the following command to init:

    # create the necessay folders
    python init.py
    
    # prepare the environment
    # if some package cannot be installed, just google and install it from other ways
    pip install -r requirements.txt
  2. train the model

    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  3. test the model [rerank]

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
  4. test the model [recal]

    # different recall_modes are available: q-q, q-r
    ./scripts/test_recall.sh <dataset_name> <model_name> <cuda_id>
  5. inference the responses and save into the faiss index

    Somethings inference will missing data samples, please use the 1 gpu (faiss-gpu search use 1 gpu quickly)

    It should be noted that: 1. For writer dataset, use extract_inference.py script to generate the inference.txt 2. For other datasets(douban, ecommerce, ubuntu), just cp train.txt inference.txt. The dataloader will automatically read the test.txt to supply the corpus.

    # work_mode=response, inference the response and save into faiss (for q-r matching) [dual-bert/dual-bert-fusion]
    # work_mode=context, inference the context to do q-q matching
    # work_mode=gray, inference the context; read the faiss(work_mode=response has already been done), search the topk hard negative samples; remember to set the BERTDualInferenceContextDataloader in config/base.yaml
    ./scripts/inference.sh <dataset_name> <model_name> <cuda_ids>

    If you want to generate the gray dataset for the dataset:

    # 1. set the mode as the **response**, to generate the response faiss index; corresponding dataset name: BERTDualInferenceDataset;
    ./scripts/inference.sh <dataset_name> response <cuda_ids>
    
    # 2. set the mode as the **gray**, to inference the context in the train.txt and search the top-k candidates as the gray(hard negative) samples; corresponding dataset name: BERTDualInferenceContextDataset
    ./scripts/inference.sh <dataset_name> gray <cuda_ids>
    
    # 3. set the mode as the **gray-one2many** if you want to generate the extra positive samples for each context in the train set, the needings of this mode is the same as the **gray** work mode
    ./scripts/inference.sh <dataset_name> gray-one2many <cuda_ids>

    If you want to generate the pesudo positive pairs, run the following commands:

    # make sure the dual-bert inference dataset name is BERTDualInferenceDataset
    ./scripts/inference.sh <dataset_name> unparallel <cuda_ids>
  6. deploy the rerank and recall model

    # load the model on the cuda:0(can be changed in deploy.sh script)
    ./scripts/deploy.sh <cuda_id>

    at the same time, you can test the deployed model by using:

    # test_mode: recall, rerank, pipeline
    ./scripts/test_api.sh <test_mode> <dataset>
  7. test the recall performance of the elasticsearch

    Before testing the es recall, make sure the es index has been built:

    # recall_mode: q-q/q-r
    ./scripts/build_es_index.sh <dataset_name> <recall_mode>
    # recall_mode: q-q/q-r
    ./scripts/test_es_recall.sh <dataset_name> <recall_mode> 0
  8. simcse generate the gray responses

    # train the simcse model
    ./script/train.sh <dataset_name> simcse <cuda_ids>
    # generate the faiss index, dataset name: BERTSimCSEInferenceDataset
    ./script/inference_response.sh <dataset_name> simcse <cuda_ids>
    # generate the context index
    ./script/inference_simcse_response.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_simcse_unlikelyhood_response.sh <dataset_name> simcse <cuda_ids>
    # generate the gray response
    ./script/inference_gray_simcse.sh <dataset_name> simcse <cuda_ids>
    # generate the test set for unlikelyhood-gen dataset
    ./script/inference_gray_simcse_unlikelyhood.sh <dataset_name> simcse <cuda_ids>
Owner
GMFTBY
Those who are crazy enough to think they can change the world are the ones who can.
GMFTBY
Stack overflow search API

Stack overflow search API

Vikash Karodiya 1 Nov 15, 2021
๐Ÿ’ฌ Send iMessages using Python through the Shortcuts app.

py-imessage-shortcuts Send iMessages using Python through the Shortcuts app. Requires macOS Monterey (macOS 12) or later. Compatible with Apple Silico

Kevin Schaich 10 Nov 30, 2022
Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.

sensecap_api_checker_HELIUM Connects to a local SenseCap M1 Helium Hotspot and pulls API Data.

Lorentz Factr 1 Nov 03, 2021
MashaRobot : New Generation Telegram Group Manager Bot (๐Ÿ”ธFast ๐Ÿ”ธPython๐Ÿ”ธPyrogram ๐Ÿ”ธTelethon ๐Ÿ”ธMongo db )

MashaRobot Me On Telegram โœจ MASHA โœจ This is just a demo bot.. Don't try to add to your group.. Create your own bot How To Host The easiest way to depl

Mr Dark Prince 40 Oct 09, 2022
An API that uses NLP and AI to let you predict possible diseases and symptoms based on a prompt of what you're feeling.

Disease detection API for MediSearch An API that uses NLP and AI to let you predict possible diseases and symptoms based on a prompt of what you're fe

Sebastian Ponce 1 Jan 15, 2022
SickNerd aims to slowly enumerate Google Dorks via the googlesearch API then requests found pages for metadata

CLI tool for making Google Dorking a passive recon experience. With the ability to fetch and filter dorks from GHDB.

Jake Wnuk 21 Jan 02, 2023
A simple chat api that can also work with ipb4 and chatbox+

SimpleChatApi API for chatting that can work on its own or work with Invision Community and Chatbox+. You are also welcome to create frontend for this

Anubhav K. 1 Feb 01, 2022
โ›‘ REDCap API interface in Python

REDCap API in Python Description Supports structured data extraction for REDCap projects. The API module d3b_redcap_api.redcap.REDCapStudy can be logi

D3b 1 Nov 21, 2022
A super awesome Twitter API client for Python.

birdy birdy is a super awesome Twitter API client for Python in just a little under 400 LOC. TL;DR Features Future proof dynamic API with full REST an

Inueni 259 Dec 28, 2022
A Twitter bot written in Python using Tweepy and hosted on a server.

A Twitter bot written in Python using Tweepy. It can like and/or retweet tweets that contain single or multiple keywords and hashtags.

anniedotexe 11 Dec 15, 2022
WhatsApp Api Python - This documentation aims to exemplify the use of Moorse Whatsapp API in Python

WhatsApp API Python ChatBot Este repositรณrio contรฉm uma aplicaรงรฃo que se utiliza

Moorse.io 3 Jan 08, 2022
โš”๏ธ Fastest tibia bot API

๐Ÿ“ Description tibia bot api using python โŒจ Development โš™ Running the app python bot.py โœ… ROADMAP Add confidence to floor level to have more accuracy

Lucas Santos 133 Dec 28, 2022
Discord bot ( discord.py ), uses pandas library from python for data-management.

Discord_bot A Best and the most easy-to-use Discord bot !! Some simple basic auto moderations, Chat functions. It includes a game similar to Casino, g

Jaitej 4 Aug 30, 2022
ShoukoKomiRobot - An anime themed telegram bot that can convert telegram media

ShoukoKomiRobot โ€ข ๐•Ž๐•ฃ๐•š๐•ฅ๐•ฅ๐•–๐•Ÿ ๐•€๐•Ÿ Python3 โ€ข ๐•ƒ๐•š๐•“๐•ฃ๐•’๐•ฃ๐•ช ๐•Œ๐•ค๐•–๐•• Pyrogram

25 Aug 14, 2022
Play Video & Music on Telegram Group Video Chat

Video Stream is an Advanced Telegram Bot that's allow you to play Video & Music on Telegram Group Video Chat ๐Ÿงช Get SESSION_NAME from below: Pyrogram

Sehath Perera 1 Jan 17, 2022
An API or getting Optifine VersionsList/Version/Download-URL.

Optifine-API An API for getting Optifine VersionsList/Versions/Download-URL. Table of contents Get Versions List Get Specify Versions Download Optifin

2 Dec 04, 2022
A Anything goes Discord bot written in python and uses the wrapper Discord.py

GerardTheWizard A Anything goes Discord bot written in python and uses the wrapper Discord.py What can he do? Allow users to level up through typing,

1 May 05, 2022
โšก PoC: Hide a c&c botnet in the discord client. (Proof Of Concept)

๐Ÿ‘จโ€๐Ÿ’ป Discord Self Bot ๐Ÿ‘จโ€๐Ÿ’ป A Discord Self-Bot in Python by natrix Installation Run: selfbot.bat Python: version : 3.8 Modules

0ั…Vฮนcะฝy#1337 37 Oct 21, 2022
Telegram Group Manager Bot Written In Python Using Pyrogram.

โ”€โ”€ใ€Œ๐‚๐ก๐ข๐ค๐š ๐…๐ฎ๐ฃ๐ข๐ฐ๐š๐ซ๐šใ€โ”€โ”€ Telegram Group Manager Bot Written In Python Using Pyrogram. Deploy To Heroku NOTE: I'm making this note to whoever

Wahyusaputra 3 Feb 12, 2022
ApiMoedas - This API is a extesion of API

๐Ÿช™ Api Moeda ๐Ÿช™ Este projeto รฉ uma extensรฃo da API Awesome API. Basicamente, ele mostra todas as moedas que a Awesome API tem e todas as suas conversรต

Abel 4 May 29, 2022