Trained T5-base and T5-large models for generating keywords from text

Last update: Nov 24, 2022

Overview

text to keywords

KeyT5: T5-base and T5-large models trained to generate keywords from text. Supported language: Russian (ru).

Pretrained large version | Pretrained base version

habr article

Usage

Example usage (the code returns a list of keywords; duplicates are possible):

pip install transformers sentencepiece

from itertools import groupby
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = "0x7194633/keyt5-large" # or 0x7194633/keyt5-base
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

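def generate(text, **kwargs):
    # Tokenize the input text into PyTorch tensors
    inputs = tokenizer(text, return_tensors='pt')
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
    s = tokenizer.decode(hypotheses[0], skip_special_tokens=True)
    # Normalize spacing around ';', lowercase, split into keywords, and drop the last piece
    s = s.replace('; ', ';').replace(' ;', ';').lower().split(';')[:-1]
    # groupby merges only consecutive repeats, so non-adjacent duplicates remain
    s = [el for el, _ in groupby(s)]
    return s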

article = """Reuters сообщил об отмене 3,6 тыс. авиарейсов из-за «омикрона» и погоды
Наибольшее число отмен авиарейсов 2 января пришлось на американские авиакомпании 
SkyWest и Southwest, у каждой — более 400 отмененных рейсов. При этом среди 
отмененных 2 января авиарейсов — более 2,1 тыс. рейсов в США. Также свыше 6400 
рейсов были задержаны."""

print(generate(article, top_p=1.0, max_length=64))  
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов']
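
As the sample output shows, the returned list can still contain repeated keywords, because only consecutive repeats are merged. If a unique list is needed, one option is an order-preserving deduplication step applied afterwards. The sketch below is not part of the original code; the helper name unique_keywords is hypothetical.

```python
def unique_keywords(keywords):
    """Return keywords with all duplicates removed, keeping first-seen order."""
    # dicts preserve insertion order (Python 3.7+), so fromkeys works as an ordered set.
    # Empty or whitespace-only entries are skipped.
    return list(dict.fromkeys(kw.strip() for kw in keywords if kw.strip()))

sample = ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов',
          'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов']
print(unique_keywords(sample))
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов']
```

With the model loaded, this could be applied as unique_keywords(generate(article, top_p=1.0, max_length=64)).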

Training

KeyT5 models were trained on ~7000 compressed habr.com articles (see data.csv and collect.py). The models support Russian only.

To train the keyT5-base and keyT5-large models, you will need a table in CSV format, like this:

X	Y
Some text that is fed to the input	The text that should come out
Some text that is fed to the input	The text that should come out
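
A table in this shape can be read into (input, target) pairs with the standard csv module. This is a minimal sketch, not the project's own loading code: the function name read_pairs is hypothetical, and the tab delimiter is an assumption based on how the table is rendered above (the real data.csv may use commas, in which case pass delimiter=",").

```python
import csv
import io

def read_pairs(fileobj, delimiter="\t"):
    """Yield (input_text, target_text) pairs from a table with X and Y columns."""
    # DictReader takes the first row as the header, so columns are addressed by name.
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    for row in reader:
        yield row["X"], row["Y"]

# In-memory stand-in for a real file; the delimiter here is an assumption.
sample = (
    "X\tY\n"
    "Some text that is fed to the input\tThe text that should come out\n"
)
pairs = list(read_pairs(io.StringIO(sample)))
print(pairs)
```

For a file on disk, open it with open(path, newline="", encoding="utf-8") and pass the file object to read_pairs.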

See the training notebook for details.

Owner

Danil
