Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Last update: Dec 28, 2022

Overview

Knover

Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out efficient training/inference of large-scale dialogue generation models.

What's New:

December 2021: We released the PLATO-XL dialogue generation model, with up to 11 billion parameters.
October 2021: We released AG-DST, an amendable generation approach for dialogue state tracking.
February 2021: We released our implementation (Team 19) for DSTC9-Track1.
July 2020: We released PLATO-2, a large-scale generative model with latent space for open-domain dialogue systems.

Requirements and Installation

- Python >= 3.7
- paddlepaddle-gpu >= 2.0.0
  - You can install PaddlePaddle by following the instructions.
  - The PaddlePaddle build you need depends on your CUDA version (recommended: 10.1) and cuDNN version (recommended: 7.6). See the PaddlePaddle documentation on GPU support for more information.
- sentencepiece
- termcolor
- NCCL, if you want to run distributed training

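
As a quick sanity check of the requirements above, a small script along these lines can report whether the interpreter is recent enough and which packages are missing. This is only a sketch and not part of Knover: the helper names `python_ok` and `missing_modules` are hypothetical, and it assumes `paddlepaddle-gpu` is imported as `paddle`. It does not check the installed PaddlePaddle version, CUDA, cuDNN, or NCCL.

```python
import importlib.util
import sys

# Minimum Python version stated in the requirements list.
MIN_PYTHON = (3, 7)

# Import names assumed for the listed packages
# (assumption: paddlepaddle-gpu is imported as `paddle`).
REQUIRED_MODULES = ("paddle", "sentencepiece", "termcolor")


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The script only looks the packages up on the import path without importing them, so it runs quickly even when PaddlePaddle is installed.
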
Install Knover locally:

git clone https://github.com/PaddlePaddle/Knover.git
cd Knover
pip3 install -e .

Alternatively, you can set PYTHONPATH only:

export PYTHONPATH=/abs/path/to/Knover:$PYTHONPATH

Basic usage

See usage document.

Disclaimer

This project aims to facilitate further research progress in dialogue generation. Baidu is not responsible for content generated by third parties using the pre-trained system.

Contact information

For help or issues using Knover, please submit a GitHub issue.
