SPACES

An end-to-end long-text summarization model (CAIL 2020 / 法研杯2020, judicial summarization track).

Blog post: https://kexue.fm/archives/8046

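Meaning

We call our model SPACES, which also happens to be one of the domain names of Scientific Spaces (科学空间, https://spaces.ac.cn). The letters stand for:

S: Sparse Softmax;
P: Pretrained Language Model;
A: Abstractive;
C: Copy Mechanism;
E: Extractive;
S: Special Words.

As the name suggests, this is a word-level "extract-then-generate" summarization model that combines pretraining with a Copy mechanism, and it incorporates some of our latest research results on text generation.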
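The name Sparse Softmax suggests a softmax that assigns exactly zero probability to most entries. One simple way to get that behaviour is to keep only the k largest logits before normalizing. The NumPy sketch below illustrates this top-k idea only; the function name `sparse_softmax`, the parameter `k`, and the choice of top-k truncation are assumptions made for illustration and are not taken from this repository. See the blog post for the definition the model actually uses.

```python
import numpy as np


def sparse_softmax(logits, k=10):
    """Softmax over the k largest logits; every other entry gets probability 0.

    Illustrative sketch only (hypothetical helper, not part of this repository).
    """
    logits = np.asarray(logits, dtype=float)
    k = min(k, logits.shape[-1])
    # Indices of the k largest logits along the last axis (unordered within the top-k).
    topk_idx = np.argpartition(logits, -k, axis=-1)[..., -k:]
    topk_val = np.take_along_axis(logits, topk_idx, axis=-1)
    # Numerically stable softmax over the retained logits only.
    topk_val = topk_val - topk_val.max(axis=-1, keepdims=True)
    weights = np.exp(topk_val)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Scatter the normalized weights back; untouched positions stay at 0.
    probs = np.zeros_like(logits)
    np.put_along_axis(probs, topk_idx, weights, axis=-1)
    return probs


probs = sparse_softmax([2.0, 1.0, 0.1, -1.0], k=2)
print(probs)
```

With `k=2`, only the two largest logits receive non-zero probability, and those two values still sum to 1.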

Running

Tested environment: tensorflow 1.14 + keras 2.3.1 + bert4keras 0.9.7

(On Windows, use bert4keras>=0.9.8.)

First edit the path settings in snippets.py, then run the code below.

Training code:

#!/bin/bash

python extract_convert.py
python extract_vectorize.py

# Run extract_model.py once for each index 0..14.
for ((i=0; i<15; i++)); do
    python extract_model.py $i
done

python seq2seq_convert.py
python seq2seq_model.py
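On systems without bash (for example a plain Windows shell), the same sequence of steps can be driven from Python. The sketch below mirrors the script above using only the standard library; the helper names `training_commands` and `run_training` and the parameter `num_runs` are hypothetical and not part of this repository. Unlike the bash script, it stops at the first step that fails.

```python
import subprocess
import sys


def training_commands(num_runs=15):
    """Return the training steps, in order, as argument lists for subprocess."""
    steps = [["extract_convert.py"], ["extract_vectorize.py"]]
    # extract_model.py is invoked once per index 0..num_runs-1, as in the bash loop.
    steps += [["extract_model.py", str(i)] for i in range(num_runs)]
    steps += [["seq2seq_convert.py"], ["seq2seq_model.py"]]
    # Use the current interpreter so the scripts run in the same environment.
    return [[sys.executable] + step for step in steps]


def run_training(num_runs=15):
    """Run every step in sequence; check=True aborts on the first failure."""
    for cmd in training_commands(num_runs):
        subprocess.run(cmd, check=True)
```

Calling `run_training()` from the repository directory (after the paths in snippets.py are configured) would execute the 19 steps one after another.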

Prediction code:

from final import *
# text: the source document to summarize, as a string (define it before this call)
summary = predict(text, topk=3)
print(summary)

Contact

QQ group: 808623966. To join the WeChat group, add the bot account spaces_ac_cn on WeChat.

Links

Blog: https://kexue.fm
Zhuiyi Technology: https://zhuiyi.ai/
Pretrained models: https://github.com/ZhuiyiTechnology/pretrained-models
WoBERT: https://github.com/ZhuiyiTechnology/WoBERT


Owner

苏剑林(Jianlin Su)
