Topic Inference with Zeroshot models

Overview

zeroshot_topics

Table of Contents

Installation

zeroshot_topics is distributed on PyPI as a universal wheel and is available on Linux/macOS and Windows and supports Python 3.7+ and PyPy.

$ pip install zeroshot_topics

Usage

from zeroshot_topics import ZeroShotTopicFinder
zsmodel = ZeroShotTopicFinder()
text = """can you tell me anything else okay great tell me everything you know about George_Washington.
he was the first president he was well he I'm trying to well he fought in the Civil_War he was a general
in the Civil_War and chopped down his father's cherry tree when he was a little boy he that's it."""
zsmodel.find_topic(text)

License

zeroshot_topics is distributed under the terms of

You might also like...
This repo stores the codes for topic modeling on palliative care journals.

This repo stores the codes for topic modeling on palliative care journals. Data Preparation You first need to download the journal papers. bash 1_down

topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API
topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

Biterm Topic Model (BTM): modeling topics in short texts
Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS)

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS) Yoonhyung Lee, Joongbo Shin, Kyomin Jung Abstract: Although early

Source code for AAAI20 "Generating Persona Consistent Dialogues by Exploiting Natural Language Inference".

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference Source code for RCDG model in AAAI20 Generating Persona Consistent Di

LightSeq: A High-Performance Inference Library for Sequence Processing and Generation
LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

LightSeq is a high performance inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP models such as BERT, GPT2, Transformer, etc. It is therefore best useful for Machine Translation, Text Generation, Dialog, Language Modelling, and other related tasks using these models.

Spert NLP Relation Extraction API deployed with torchserve for inference

SpERT torchserve Spert_torchserve is the Relation Extraction model (SpERT)Span-based Entity and Relation Transformer API deployed with pytorch/serve.

A minimal code for fairseq vq-wav2vec model inference.

vq-wav2vec inference A minimal code for fairseq vq-wav2vec model inference. Runs without installing the fairseq toolkit and its dependencies. Usage ex

Comments
  • Error when I run the sample code

    Error when I run the sample code

    I get this when I try to run the sample code:

    Traceback (most recent call last): File "zerotopics.py", line 1, in from zeroshot_topics import ZeroShotTopicFinder File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/init.py", line 3, in from .zeroshot_tm import ZeroShotTopicFinder File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/zeroshot_tm.py", line 3, in from .utils import load_zeroshot_model File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/utils.py", line 6, in def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"): File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/functools.py", line 490, in lru_cache raise TypeError('Expected maxsize to be an integer or None') TypeError: Expected maxsize to be an integer or None

    Specifics: Python version 3.7.9

    pip freeze gives (yeh this virtualenv is getting big :):

    absl-py==1.0.0 aiohttp==3.8.1 aiosignal==1.2.0 alabaster==0.7.12 aniso8601==9.0.1 antlr4-python3-runtime==4.8 appnope @ file:///opt/concourse/worker/volumes/live/4f734db2-9ca8-4d8b-5b29-6ca15b4b4772/volume/appnope_1606859466979/work async-timeout==4.0.2 asynctest==0.13.0 attrs==20.3.0 Babel==2.9.1 backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work bertopic==0.6.0 blis @ file:///opt/concourse/worker/volumes/live/cd6a6bea-d063-4b62-4c10-fcc89b17d0ac/volume/cython-blis_1594246851083/work boto3==1.17.86 botocore==1.20.86 brotlipy==0.7.0 cachetools==4.2.1 catalogue==2.0.6 certifi==2020.12.5 cffi @ file:///opt/concourse/worker/volumes/live/2aa8abfe-8b8d-4889-78d9-837b74c3cd64/volume/cffi_1606255119410/work chardet @ file:///opt/concourse/worker/volumes/live/9efbf151-b45b-463d-6340-a5c399bf00b7/volume/chardet_1607706825988/work charset-normalizer==2.0.9 click==7.1.2 colorama==0.4.4 coloredlogs==15.0.1 commonmark==0.9.1 cryptography @ file:///opt/concourse/worker/volumes/live/41c3d62a-f1f8-46ce-414a-9adaf4ea7d96/volume/cryptography_1607636752064/work cycler==0.10.0 cymem @ file:///opt/concourse/worker/volumes/live/3e8d7428-f57d-4000-44e7-34ac8a744f13/volume/cymem_1605062299053/work Cython==0.29.23 dataclasses==0.6 datasets==1.17.0 decorator @ file:///home/ktietz/src/ci/decorator_1611930055503/work dill==0.3.4 docformatter==1.4 docutils==0.15.2 emoji==1.6.1 en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.2.0/en_core_web_lg-3.2.0-py3-none-any.whl en-core-web-md @ https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.2.0/en_core_web_md-3.2.0-py3-none-any.whl en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl en-core-web-trf @ https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.2.0/en_core_web_trf-3.2.0-py3-none-any.whl et-xmlfile==1.1.0 fairscale==0.4.4 Faker==8.16.0 fasttext @ file:///Users/scharlesworth/fastText-0.9.2 filelock==3.0.12 flake8==4.0.1 flake8-bugbear==21.11.29 Flask==2.0.2 Flask-Cors==3.0.10 Flask-RESTful==0.3.9 frozenlist==1.2.0 fsspec==2021.11.1 future==0.18.2 gitdb==4.0.9 gitdb2==4.0.2 GitPython==3.1.24 google-api-core==1.26.2 google-api-python-client==2.0.2 google-auth==1.28.0 google-auth-httplib2==0.1.0 google-auth-oauthlib==0.4.6 googleapis-common-protos==1.53.0 grpcio==1.43.0 hdbscan==0.8.27 httplib2==0.19.0 huggingface-hub==0.2.1 humanfriendly==10.0 hydra-core==1.1.1 idna @ file:///tmp/build/80754af9/idna_1593446292537/work imagesize==1.3.0 importlib-metadata @ file:///tmp/build/80754af9/importlib-metadata_1602276842396/work importlib-resources==5.4.0 iniconfig==1.1.1 iopath==0.1.9 ipykernel @ file:///opt/concourse/worker/volumes/live/73e8766c-12c3-4f76-62a6-3dea9a7da5b7/volume/ipykernel_1596206701501/work/dist/ipykernel-5.3.4-py3-none-any.whl ipython @ file:///opt/concourse/worker/volumes/live/ac685347-76d6-4904-4b88-886c6a434f22/volume/ipython_1614616430264/work ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work itsdangerous==2.0.1 jedi @ file:///opt/concourse/worker/volumes/live/5006b7b5-a924-4788-6cfe-ae05d8be8830/volume/jedi_1606932947370/work Jinja2==3.0.1 jmespath==0.10.0 joblib==1.0.1 jsonlines==3.0.0 jsonschema==3.0.2 jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1601311786391/work jupyter-core @ file:///opt/concourse/worker/volumes/live/a699b83f-e941-4170-5136-bf87e3f37756/volume/jupyter_core_1612213304212/work keybert==0.5.0 kiwisolver==1.3.1 langcodes==3.3.0 llvmlite==0.36.0 loguru==0.5.3 Markdown==3.3.4 markdown-it-py==0.5.8 MarkupSafe==2.0.1 matplotlib==3.4.0 mccabe==0.6.1 mkl-fft==1.2.0 mkl-random==1.1.1 mkl-service==2.3.0 mock==4.0.3 multidict==5.2.0 multiprocess==0.70.12.2 murmurhash @ file:///opt/concourse/worker/volumes/live/9a0582f9-9097-4dab-6d7a-fcf62b4968ae/volume/murmurhash_1607456116622/work myst-parser==0.12.10 nltk==3.6.5 numba==0.53.1 numpy==1.20.2 oauthlib==3.1.1 omegaconf==2.1.1 openai==0.6.3 openpyxl==3.0.9 packaging==20.9 pandas==1.2.1 parlai==1.5.1 parquet==1.3.1 parso==0.7.0 pathy==0.6.1 pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work Pillow==8.2.0 plac @ file:///opt/concourse/worker/volumes/live/a94b6881-2d18-4055-5a3c-f24036f05ef6/volume/plac_1594259982880/work pluggy==1.0.0 ply==3.11 portalocker==2.3.2 praw==7.1.0 prawcore==1.5.0 preshed @ file:///opt/concourse/worker/volumes/live/952fa955-acc7-4aa0-6766-86f802ea8ef1/volume/preshed_1608233410312/work prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1616415428029/work protobuf==3.15.6 ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl py==1.11.0 py-gfm==1.0.2 py-rouge==1.1 py4j==0.10.7 pyarrow==6.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybind11==2.6.1 pycodestyle==2.8.0 pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work pydantic==1.8.2 pyee==8.2.2 pyflakes==2.4.0 Pygments @ file:///tmp/build/80754af9/pygments_1615143339740/work PyJWT==2.3.0 pynndescent==0.5.2 pyodbc==4.0.32 pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1608057966937/work pyparsing==2.4.7 pyrsistent @ file:///opt/concourse/worker/volumes/live/656e0c1b-ef87-4251-4a51-1290b2351993/volume/pyrsistent_1600141745371/work PySocks @ file:///opt/concourse/worker/volumes/live/ef943889-94fc-4539-798d-461c60b77804/volume/pysocks_1605305801690/work pytest==6.2.5 pytest-datadir==1.3.1 pytest-regressions==2.2.0 python-dateutil @ file:///home/ktietz/src/ci/python-dateutil_1611928101742/work python-slugify==5.0.2 pytorch-transformers==1.2.0 pytz==2020.5 PyYAML==6.0 pyzmq==20.0.0 regex==2021.11.10 requests @ file:///tmp/build/80754af9/requests_1608241421344/work requests-mock==1.9.3 requests-oauthlib==1.3.0 requests-toolbelt==0.9.1 rich==10.16.2 rsa==4.7.2 s3transfer==0.4.2 sacremoses==0.0.44 scikit-learn==0.24.1 scipy==1.6.2 seaborn==0.11.1 sentence-transformers==1.0.4 sentencepiece==0.1.91 seqeval==0.0.5 sh==1.14.2 six @ file:///opt/concourse/worker/volumes/live/f983ba11-c9fe-4dff-7ce7-d89b95b09771/volume/six_1605205318156/work sklearn==0.0 slack-bolt==1.11.1 slack-sdk==3.13.0 slackclient==2.9.3 slackeventsapi==3.0.1 smart-open==5.2.1 smmap==5.0.0 snowballstemmer==2.2.0 spacy==3.2.0 spacy-alignments==0.8.4 spacy-legacy==3.0.8 spacy-loggers==1.0.1 spacy-sentence-bert==0.1.2 spacy-transformers==1.1.2 spark-nlp==3.0.2 Sphinx==2.2.2 sphinx-autodoc-typehints==1.10.3 sphinx-rtd-theme==1.0.0 sphinxcontrib-applehelp==1.0.2 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.0 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 srsly==2.4.2 subword-nmt==0.3.8 tensorboard==2.7.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.0 tensorboardX==2.4.1 text-unidecode==1.3 thinc==8.0.13 threadpoolctl==2.1.0 thriftpy2==0.4.14 tokenizers==0.10.2 toml==0.10.2 torch==1.10.1 torchtext==0.11.1 tornado @ file:///opt/concourse/worker/volumes/live/d531d395-893c-4ca1-6a5f-717b318eb08c/volume/tornado_1606942307627/work tqdm==4.62.3 traitlets @ file:///home/ktietz/src/ci/traitlets_1611929699868/work transformers==4.11.0 typer==0.4.0 typing-extensions==3.7.4.3 umap-learn==0.5.1 Unidecode==1.3.2 untokenize==0.1.1 update-checker==0.18.0 uritemplate==3.0.1 urllib3==1.26.7 wasabi==0.8.2 wcwidth @ file:///tmp/build/80754af9/wcwidth_1593447189090/work webexteamsbot==0.1.4.2 webexteamssdk==1.6 websocket-client==0.57.0 websocket-server==0.6.4 Werkzeug==2.0.1 xlrd==2.0.1 xxhash==2.0.2 yarl==1.7.2 zeroshot-topics==0.1.0 zipp @ file:///tmp/build/80754af9/zipp_1604001098328/work

    opened by sdcharle 1
  • Add size to lru_cache

    Add size to lru_cache

    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/__init__.py in <module>()
          1 __version__ = '0.1.0'
          2 
    ----> 3 from .zeroshot_tm import ZeroShotTopicFinder
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/zeroshot_tm.py in <module>()
          1 import attr
          2 from keybert import KeyBERT
    ----> 3 from .utils import load_zeroshot_model
          4 from nltk.corpus import wordnet as wn
          5 
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/utils.py in <module>()
          4 
          5 @lru_cache
    ----> 6 def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"):
          7     classifier = pipeline("zero-shot-classification", model=model_name)
          8     return classifier
    
    /usr/lib/python3.7/functools.py in lru_cache(maxsize, typed)
        488             maxsize = 0
        489     elif maxsize is not None:
    --> 490         raise TypeError('Expected maxsize to be an integer or None')
        491 
        492     def decorating_function(user_function):
    
    TypeError: Expected maxsize to be an integer or None
    

    I assume that you have to provide, maxsize parameter to lru_cache. Worked for me, when I provided the parameter.

    opened by gsasikiran 6
Releases(v.0.0.1)
Owner
Rita Anjana
ML engineer
Rita Anjana
TruthfulQA: Measuring How Models Imitate Human Falsehoods

TruthfulQA: Measuring How Models Imitate Human Falsehoods

69 Dec 25, 2022
Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Chenyang Huang 37 Jan 04, 2023
PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Non-Autoregressive Transformer Code release for Non-Autoregressive Neural Machine Translation by Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K.

Salesforce 261 Nov 12, 2022
Codes to pre-train Japanese T5 models

t5-japanese Codes to pre-train a T5 (Text-to-Text Transfer Transformer) model pre-trained on Japanese web texts. The model is available at https://hug

Megagon Labs 37 Dec 25, 2022
BiQE: Code and dataset for the BiQE paper

BiQE: Bidirectional Query Embedding This repository includes code for BiQE and the datasets introduced in Answering Complex Queries in Knowledge Graph

Bhushan Kotnis 1 Oct 20, 2021
🕹 An esoteric language designed so that the program looks like the transcript of a Pokémon battle

PokéBattle is an esoteric language designed so that the program looks like the transcript of a Pokémon battle. Original inspiration and specification

Eduardo Correia 9 Jan 11, 2022
Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph",

K-BERT Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph", which is implemented based on the UER framework. R

Weijie Liu 834 Jan 09, 2023
Weakly-supervised Text Classification Based on Keyword Graph

Weakly-supervised Text Classification Based on Keyword Graph How to run? Download data Our dataset follows previous works. For long texts, we follow C

Hello_World 20 Dec 29, 2022
SciBERT is a BERT model trained on scientific text.

SciBERT is a BERT model trained on scientific text.

AI2 1.2k Dec 24, 2022
fastai ulmfit - Pretraining the Language Model, Fine-Tuning and training a Classifier

fast.ai ULMFiT with SentencePiece from pretraining to deployment Motivation: Why even bother with a non-BERT / Transformer language model? Short answe

Florian Leuerer 26 May 27, 2022
Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2021).

Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER. @inproceedings{tedes

Babelscape 40 Dec 11, 2022
Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

2 Dec 29, 2022
Sapiens is a human antibody language model based on BERT.

Sapiens: Human antibody language model ____ _ / ___| __ _ _ __ (_) ___ _ __ ___ \___ \ / _` | '_ \| |/ _ \ '

Merck Sharp & Dohme Corp. a subsidiary of Merck & Co., Inc. 13 Nov 20, 2022
Train and use generative text models in a few lines of code.

blather Train and use generative text models in a few lines of code. To see blather in action check out the colab notebook! Installation Use the packa

Dan Carroll 16 Nov 07, 2022
Trained T5 and T5-large model for creating keywords from text

text to keywords Trained T5-base and T5-large model for creating keywords from text. Supported languages: ru Pretraining Large version | Pretraining B

Danil 61 Nov 24, 2022
A Flask Sentiment Analysis API, with visual implementation

The Sentiment Analysis Api was created using python flask module,it allows users to parse a text or sentence throught the (?text) arguement, then view the sentiment analysis of that sentence. It can

Ifechukwudeni Oweh 10 Jul 17, 2022
Deep learning for NLP crash course at ABBYY.

Deep NLP Course at ABBYY Deep learning for NLP crash course at ABBYY. Suggested textbook: Neural Network Methods in Natural Language Processing by Yoa

Dan Anastasyev 597 Dec 18, 2022
skweak: A software toolkit for weak supervision applied to NLP tasks

Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels wi

Norsk Regnesentral (Norwegian Computing Center) 850 Dec 28, 2022
Question answering app is used to answer for a user given question from user given text.

Question answering app is used to answer for a user given question from user given text.It is created using HuggingFace's transformer pipeline and streamlit python packages.

Siva Prakash 3 Apr 05, 2022
A versatile token stream for handwritten parsers.

Writing recursive-descent parsers by hand can be quite elegant but it's often a bit more verbose than expected, especially when it comes to handling indentation and reporting proper syntax errors. Th

Valentin Berlier 8 Nov 30, 2022