Knowledge Graph Question Answering Based on "Seq2Seq + Prefix Tree"

Last update: Dec 12, 2022

KgCLUE-bert4keras

Introduction

Blog post: https://kexue.fm/archives/8802
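The approach named in the title uses a prefix tree (trie) to constrain Seq2Seq decoding, so that the model can only generate sequences that actually exist in the knowledge base. The blog post describes the real method. What follows is only a generic sketch of trie-constrained greedy decoding. It assumes KB entries are already tokenized into ID sequences. `TokenTrie`, `constrained_greedy_decode`, the `END = -1` marker and the `scores_fn` stand-in for the decoder are hypothetical names and are not part of bert4keras or this repository.

```python
from typing import Callable, Dict, List, Sequence

END = -1  # hypothetical marker key meaning "a complete KB entry ends here"


class TokenTrie:
    """Prefix tree over token-ID sequences (e.g. tokenized KB entries)."""

    def __init__(self) -> None:
        self.root: Dict[int, dict] = {}

    def insert(self, token_ids: Sequence[int]) -> None:
        node = self.root
        for t in token_ids:
            node = node.setdefault(t, {})
        node[END] = {}  # record that a full sequence ends at this node

    def next_tokens(self, prefix: Sequence[int]) -> List[int]:
        """Return the token IDs allowed after `prefix` ([] if prefix is not in the trie)."""
        node = self.root
        for t in prefix:
            if t not in node:
                return []
            node = node[t]
        return sorted(node)


def constrained_greedy_decode(
    scores_fn: Callable[[List[int]], Dict[int, float]],
    trie: TokenTrie,
    max_len: int = 32,
) -> List[int]:
    """Greedy decoding that may only choose tokens permitted by the trie.

    `scores_fn(prefix)` stands in for the Seq2Seq decoder and returns a score
    per candidate token ID (including END); unscored tokens count as -inf.
    """
    output: List[int] = []
    for _ in range(max_len):
        allowed = trie.next_tokens(output)
        if not allowed:
            break
        scores = scores_fn(output)
        # Pick the highest-scoring token among those the trie allows.
        best = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        if best == END:
            break
        output.append(best)
    return output


# Toy usage: three "KB entries" and a fixed score table in place of a real model.
trie = TokenTrie()
for seq in ([1, 2, 3], [1, 2, 4], [5]):
    trie.insert(seq)
toy_scores = {1: 0.9, 2: 0.8, 3: 0.1, 4: 0.7, 5: 0.5, END: 0.0}
print(constrained_greedy_decode(lambda prefix: toy_scores, trie))  # [1, 2, 4]
```

Because every step is restricted to the children of the current trie node, the decoded sequence is always a prefix of, or exactly equal to, an inserted entry, whatever scores the decoder produces.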

Environment

Software: bert4keras>=0.10.8
Hardware: the current results were obtained on a single Titan RTX (24 GB).

Running

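On the first run, a prefix tree is built over the knowledge base and saved to disk; this takes about 30 minutes.
On later runs, the saved prefix tree is loaded automatically; this takes about 5 minutes.
The saved prefix-tree file is about 1.8 GB, and loading it into the runtime environment requires roughly 30 GB of RAM.
Each training epoch is fast; evaluation is what takes longer. A complete run of training plus testing takes about 1 to 2 hours.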
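The first-run versus later-run behaviour above is a build-once, load-afterwards cache. Here is a minimal sketch of that pattern. It assumes the trie is stored as nested Python dicts and serialized with the standard-library `pickle` module. The repository's actual file name, format and loading code are not given here, so `build_trie`, `load_or_build_trie`, `sequences_fn` and the `END` marker are hypothetical.

```python
import os
import pickle
from typing import Callable, Dict, Iterable, Sequence

END = -1  # hypothetical marker key meaning "a complete sequence ends here"


def build_trie(sequences: Iterable[Sequence[int]]) -> Dict[int, dict]:
    """Build a nested-dict prefix tree from token-ID sequences (the slow step)."""
    root: Dict[int, dict] = {}
    for seq in sequences:
        node = root
        for t in seq:
            node = node.setdefault(t, {})
        node[END] = {}
    return root


def load_or_build_trie(
    path: str, sequences_fn: Callable[[], Iterable[Sequence[int]]]
) -> Dict[int, dict]:
    """Load a cached trie from `path` if it exists; otherwise build it and save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    trie = build_trie(sequences_fn())  # only called on the first run
    with open(path, "wb") as f:
        pickle.dump(trie, f, protocol=pickle.HIGHEST_PROTOCOL)
    return trie
```

Passing a callable rather than the sequences themselves means the (possibly expensive) tokenization of the knowledge base is also skipped when the cache file already exists.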

Discussion

QQ discussion group: 808623966. To join the WeChat group, add the bot account spaces_ac_cn on WeChat.

Owner

苏剑林(Jianlin Su)

Science enthusiast
