The idea is to build a model that takes keywords as input and generates sentences as output.

Last update: Jan 03, 2023

Overview

keytotext

Potential use cases include:

Marketing
Search engine optimization
Topic generation
Fine-tuning of topic modeling models

Model:

Keytotext is based on the amazing T5 model:

k2t: Model
k2t-base: Model
mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training notebooks can be found in the Training Notebooks folder.

Note: To add your own model to keytotext, please read the Models Documentation.

Usage:

Example usage:

Example Notebooks can be found in the Notebooks Folder

pip install keytotext

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs here: Docs

Trainer example here:

from keytotext import trainer

UI:

pip install streamlit-tags

This uses a custom streamlit component built by me: GitHub

API:

The API is packaged in a Docker container and can be started quickly. Follow the instructions below to get started:

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get the results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]
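
The URL above contains quotes, brackets, and a space, which many HTTP clients expect to be percent-encoded. The following is a minimal sketch of building that request URL from a Python list using only the standard library. The helper name `build_api_url` is hypothetical and not part of keytotext, and the sketch assumes the server decodes the `data` query parameter as a JSON list, as the example URL suggests.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_api_url(keywords, base_url="http://localhost:8000/api"):
    """Return the API URL with the keyword list JSON-encoded in `data`.

    `build_api_url` is a hypothetical helper for illustration only.
    """
    # Compact JSON matches the shape shown in the example URL.
    payload = json.dumps(keywords, separators=(",", ":"))
    # urlencode percent-encodes quotes and brackets and turns spaces into "+".
    return f"{base_url}?{urlencode({'data': payload})}"


url = build_api_url(["India", "Capital", "New Delhi"])
print(url)

# Round trip: decoding the query string recovers the original keyword list.
decoded = json.loads(parse_qs(urlsplit(url).query)["data"][0])
print(decoded)
```

With the container running, the resulting URL can be opened in a browser or fetched with any HTTP client.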

Note: The Hosted API is only available on demand

BibTex:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

References

https://github.com/Shivanandroy/simpleT5 (Shivanand Roy)
https://github.com/patil-suraj/question_generation (Suraj Patil)
https://github.com/MathewAlexander/T5_nlg (Mathew Alexander)

Articles about keytotext:

https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45 (Mathew Alexander)
https://www.youtube.com/watch?v=I0iBzP-SxFY (video by 1LittleCoder)
https://medium.com/mlearning-ai/generating-sentences-from-keywords-using-transformers-in-nlp-e89f4de5cf6b (Prakhar Mishra)

Comments

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
Hi,

I tried to install keytotext via pip install keytotext --upgrade in local machine.

but came across the following :

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
ERROR: No matching distribution found for keytotext

My pip version is the latest. However, the above works just fine in colab. Please guide me through the fix?
opened by abhijithneilabraham 6
Add finetuning model to keytotext

Is your feature request related to a problem? Please describe. It's difficult to use keytotext without fine-tuning on a new corpus, so we need a script to fine-tune it on a new corpus.
enhancement good first issue

opened by gagan3012 2
"Oh no." ?

"Error running app. If this keeps happening, please file an issue."

Ok,...sure? I know nothing about this app.

Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

Chrome browser, Linux.

opened by drscotthawley 2
Add Citations

Is your feature request related to a problem? Please describe. Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

opened by gagan3012 1
Adding new models to keytotext

Is your feature request related to a problem? Please describe. Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

enhancement good first issue

opened by gagan3012 1
Inference API for Keytotext

Is your feature request related to a problem? Please describe. It is difficult to host the UI on Streamlit without an API.

Describe the solution you'd like Inference API
enhancement good first issue

opened by gagan3012 1
Create Better UI

Is your feature request related to a problem? Please describe. The current UI is not functional. It needs to be fixed.

Describe the solution you'd like Better UI with a nicer design
enhancement

opened by gagan3012 1
Add `st.cache` to load model

Hi @gagan3012,

Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

Thought I'd send you a small PR which should fix this. You were already on the right track with st.cache, but it gets even better if you use it once more to load the model. This makes sure the model and tokenizer are only loaded once, which should make the app consume less memory (and not run into resource limits again!). Plus, I've seen that it also works a bit faster now ;)

Hope this works for you and let me know if you have any other questions! 🎈

Cheers, Johannes

opened by jrieke 1
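
The load-once idea from the st.cache pull request above can be illustrated without Streamlit. The sketch below uses `functools.lru_cache` from the standard library as a stand-in for the cache; `load_model` and the dictionary it returns are placeholders, not the keytotext or Streamlit API.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive loader actually runs


@lru_cache(maxsize=None)
def load_model(model_name):
    """Placeholder loader: stands in for loading a model and tokenizer."""
    global load_calls
    load_calls += 1
    return {"name": model_name}


first = load_model("k2t")
second = load_model("k2t")  # served from the cache, loader is not re-run

print(load_calls)       # the loader body ran once
print(first is second)  # both calls return the same object
```

Because the second call returns the cached object, the model is held in memory once rather than once per request, which is the effect the pull request describes.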
ValueError: transformers.models.auto.__spec__ is None

'from keytotext import pipeline'

Running the above line produces this error: "ValueError: transformers.models.auto.__spec__ is None"

opened by varunakk 0
Update README.md
opened by gagan3012 0
Update trainer.py
opened by gagan3012 0
Pipeline error on fresh install

Hi I'm getting this on a first run and fresh install

Global seed set to 42
Traceback (most recent call last):
  File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
    from keytotext import pipeline
  File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
    from .dataset import make_dataset
  File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
    from cv2 import randShuffle
ModuleNotFoundError: No module named 'cv2'

opened by skintflickz 0
New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'
I have imported the model and the necessary libraries, and I am getting the error below in Google Colab. I used this model a few months ago and it worked fine. This is a new issue that appeared recently with the same code.

TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

Imported libraries:

!pip install keytotext --upgrade
!sudo apt-get install git-lfs

from keytotext import trainer

Training Model:

model = trainer()
model.from_pretrained(model_name="t5-small")
model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5, use_gpu=True)
model.save_model()

Have attached error screenshot

OS: Windows

Browser Chrome
opened by aishwaryapisal9 2
Update trainer.py
Delete progress_bar_refresh_rate in trainer.py

Description

Delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the latest version (1.7.0) of the PyTorch Lightning Trainer.

Motivation and Context

Keeping this argument causes the training process to fail.

How Has This Been Tested?

Ran keytotext on a custom dataset before and after August 2nd, 2022. The new version of PyTorch Lightning's Trainer, which removed the above argument, took effect on that date, and custom training has failed since then.

Screenshots (if appropriate):

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[x] My code follows the code style of this project.

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[ ] I have read the CONTRIBUTING document.
opened by anath2110benten 0
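
The pull request above fixes the failure by deleting the argument. As a separate illustration, and not what the pull request does, the sketch below shows a generic defensive pattern: dropping keyword arguments that a callable no longer accepts by inspecting its signature. `NewTrainer` is a hypothetical stand-in class, not PyTorch Lightning's Trainer, and `supported_kwargs` is a helper invented for this example.

```python
import inspect


def supported_kwargs(callable_obj, **kwargs):
    """Keep only the keyword arguments that callable_obj's signature accepts."""
    params = inspect.signature(callable_obj).parameters
    # If the callable takes **kwargs, every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class NewTrainer:
    """Hypothetical trainer whose newer release dropped an argument."""

    def __init__(self, max_epochs=1):
        self.max_epochs = max_epochs


wanted = {"max_epochs": 5, "progress_bar_refresh_rate": 5}
accepted = supported_kwargs(NewTrainer, **wanted)
trainer = NewTrainer(**accepted)  # no TypeError: the removed argument was filtered out
print(accepted)
```

Silently dropping arguments can hide real mistakes, so pinning the dependency version or removing the argument outright, as the pull request does, is often the simpler choice.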
Why is cv2 required?

https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps throwing a "cv2 not installed" exception. It looks like the pip package doesn't list cv2 as a dependency. I tried deleting this line in the source code, and the model works well. Is this line necessary for this project? Would you consider adding opencv as a dependency of the pip package? Thanks for your attention.

opened by ChunxuYang 0
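
Until the import discussed above is resolved, a quick way to check whether `cv2` is installed before importing keytotext is to ask the import system directly. This is a minimal sketch using only the standard library; `module_available` is a hypothetical helper name. On PyPI, the `cv2` module is provided by the `opencv-python` package.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if module_available("cv2"):
    print("cv2 is installed")
else:
    print("cv2 is missing; installing opencv-python provides it")
```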
Hi, I notice that given the same input keywords, the generated text is the same across different runs, even when setting different seeds with 'pl.seed_everything(..)'.

opened by RuiFeiHe 6

Releases(v1.5.0)

v1.5.0(Jul 9, 2021)

Trainer tool finalized and completed!

v1.4.1(Jul 2, 2021)

Validation accuracy added

v1.3.9(Jul 2, 2021)

Bug fixes

v1.3.8(Jul 2, 2021)

New module for uploading to the Hugging Face Hub

v1.3.1(Jun 16, 2021)

Documentation updated along with semantic versioning

v0.3.1(Jun 15, 2021)

This version features a tested trainer which can be used in 4 lines of code:

from keytotext import KeytotextTrainer

model = KeytotextTrainer()
model.from_pretrained(model_name="t5-small")
model.train(data_df=df,batch_size=4, max_epochs=3, use_gpu=True)
model.save_model()

v0.2.9(Jun 15, 2021)

This release features the new Trainer module. More details coming soon.

v0.2.5(May 12, 2021)

Changes:
Bug fixes
Maintaining new models

v0.2.4(May 11, 2021)

Changes:
Refactoring of code
Ability to add new models too

v0.2.3(May 10, 2021)

Changes:
Bug fixes
New models added

v0.2.2(May 10, 2021)

Changes:
Now keytotext supports new models trained by other people too
A new fine-tuning script

v0.2.1(May 5, 2021)

Bug fixes

v0.2.0(May 4, 2021)

Changes:
Completed API
Completed testing
Completed all evals
UI improvements too

v0.1.6(May 2, 2021)

Changes:
Updates to Eval pipeline

v0.1.5(May 2, 2021)

Changes:
Added Trainer API
Added Eval pipeline

v0.1.4(Apr 30, 2021)

Latest release

v0.1.3(Apr 27, 2021)

Updates

0.1.1(Apr 26, 2021)

0.1.0(Apr 26, 2021)

Production release- 0.1.0

Owner

Gagan Bhatia

Software Developer | Machine Learning Enthusiast

GitHub Repository https://share.streamlit.io/gagan3012/keytotext/UI/app.py
