Topic Inference with Zeroshot models

Overview

zeroshot_topics

Table of Contents

Installation

zeroshot_topics is distributed on PyPI as a universal wheel and is available on Linux/macOS and Windows and supports Python 3.7+ and PyPy.

$ pip install zeroshot_topics

Usage

from zeroshot_topics import ZeroShotTopicFinder
zsmodel = ZeroShotTopicFinder()
text = """can you tell me anything else okay great tell me everything you know about George_Washington.
he was the first president he was well he I'm trying to well he fought in the Civil_War he was a general
in the Civil_War and chopped down his father's cherry tree when he was a little boy he that's it."""
zsmodel.find_topic(text)

License

zeroshot_topics is distributed under the terms of

You might also like...
This repo stores the codes for topic modeling on palliative care journals.

This repo stores the codes for topic modeling on palliative care journals. Data Preparation You first need to download the journal papers. bash 1_down

topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API
topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

Biterm Topic Model (BTM): modeling topics in short texts
Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS)

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS) Yoonhyung Lee, Joongbo Shin, Kyomin Jung Abstract: Although early

Source code for AAAI20 "Generating Persona Consistent Dialogues by Exploiting Natural Language Inference".

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference Source code for RCDG model in AAAI20 Generating Persona Consistent Di

LightSeq: A High-Performance Inference Library for Sequence Processing and Generation
LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

LightSeq is a high performance inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP models such as BERT, GPT2, Transformer, etc. It is therefore best useful for Machine Translation, Text Generation, Dialog, Language Modelling, and other related tasks using these models.

Spert NLP Relation Extraction API deployed with torchserve for inference

SpERT torchserve Spert_torchserve is the Relation Extraction model (SpERT)Span-based Entity and Relation Transformer API deployed with pytorch/serve.

A minimal code for fairseq vq-wav2vec model inference.

vq-wav2vec inference A minimal code for fairseq vq-wav2vec model inference. Runs without installing the fairseq toolkit and its dependencies. Usage ex

Comments
  • Error when I run the sample code

    Error when I run the sample code

    I get this when I try to run the sample code:

    Traceback (most recent call last): File "zerotopics.py", line 1, in from zeroshot_topics import ZeroShotTopicFinder File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/init.py", line 3, in from .zeroshot_tm import ZeroShotTopicFinder File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/zeroshot_tm.py", line 3, in from .utils import load_zeroshot_model File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/utils.py", line 6, in def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"): File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/functools.py", line 490, in lru_cache raise TypeError('Expected maxsize to be an integer or None') TypeError: Expected maxsize to be an integer or None

    Specifics: Python version 3.7.9

    pip freeze gives (yeh this virtualenv is getting big :):

    absl-py==1.0.0 aiohttp==3.8.1 aiosignal==1.2.0 alabaster==0.7.12 aniso8601==9.0.1 antlr4-python3-runtime==4.8 appnope @ file:///opt/concourse/worker/volumes/live/4f734db2-9ca8-4d8b-5b29-6ca15b4b4772/volume/appnope_1606859466979/work async-timeout==4.0.2 asynctest==0.13.0 attrs==20.3.0 Babel==2.9.1 backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work bertopic==0.6.0 blis @ file:///opt/concourse/worker/volumes/live/cd6a6bea-d063-4b62-4c10-fcc89b17d0ac/volume/cython-blis_1594246851083/work boto3==1.17.86 botocore==1.20.86 brotlipy==0.7.0 cachetools==4.2.1 catalogue==2.0.6 certifi==2020.12.5 cffi @ file:///opt/concourse/worker/volumes/live/2aa8abfe-8b8d-4889-78d9-837b74c3cd64/volume/cffi_1606255119410/work chardet @ file:///opt/concourse/worker/volumes/live/9efbf151-b45b-463d-6340-a5c399bf00b7/volume/chardet_1607706825988/work charset-normalizer==2.0.9 click==7.1.2 colorama==0.4.4 coloredlogs==15.0.1 commonmark==0.9.1 cryptography @ file:///opt/concourse/worker/volumes/live/41c3d62a-f1f8-46ce-414a-9adaf4ea7d96/volume/cryptography_1607636752064/work cycler==0.10.0 cymem @ file:///opt/concourse/worker/volumes/live/3e8d7428-f57d-4000-44e7-34ac8a744f13/volume/cymem_1605062299053/work Cython==0.29.23 dataclasses==0.6 datasets==1.17.0 decorator @ file:///home/ktietz/src/ci/decorator_1611930055503/work dill==0.3.4 docformatter==1.4 docutils==0.15.2 emoji==1.6.1 en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.2.0/en_core_web_lg-3.2.0-py3-none-any.whl en-core-web-md @ https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.2.0/en_core_web_md-3.2.0-py3-none-any.whl en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl en-core-web-trf @ https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.2.0/en_core_web_trf-3.2.0-py3-none-any.whl et-xmlfile==1.1.0 fairscale==0.4.4 Faker==8.16.0 fasttext @ file:///Users/scharlesworth/fastText-0.9.2 filelock==3.0.12 flake8==4.0.1 flake8-bugbear==21.11.29 Flask==2.0.2 Flask-Cors==3.0.10 Flask-RESTful==0.3.9 frozenlist==1.2.0 fsspec==2021.11.1 future==0.18.2 gitdb==4.0.9 gitdb2==4.0.2 GitPython==3.1.24 google-api-core==1.26.2 google-api-python-client==2.0.2 google-auth==1.28.0 google-auth-httplib2==0.1.0 google-auth-oauthlib==0.4.6 googleapis-common-protos==1.53.0 grpcio==1.43.0 hdbscan==0.8.27 httplib2==0.19.0 huggingface-hub==0.2.1 humanfriendly==10.0 hydra-core==1.1.1 idna @ file:///tmp/build/80754af9/idna_1593446292537/work imagesize==1.3.0 importlib-metadata @ file:///tmp/build/80754af9/importlib-metadata_1602276842396/work importlib-resources==5.4.0 iniconfig==1.1.1 iopath==0.1.9 ipykernel @ file:///opt/concourse/worker/volumes/live/73e8766c-12c3-4f76-62a6-3dea9a7da5b7/volume/ipykernel_1596206701501/work/dist/ipykernel-5.3.4-py3-none-any.whl ipython @ file:///opt/concourse/worker/volumes/live/ac685347-76d6-4904-4b88-886c6a434f22/volume/ipython_1614616430264/work ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work itsdangerous==2.0.1 jedi @ file:///opt/concourse/worker/volumes/live/5006b7b5-a924-4788-6cfe-ae05d8be8830/volume/jedi_1606932947370/work Jinja2==3.0.1 jmespath==0.10.0 joblib==1.0.1 jsonlines==3.0.0 jsonschema==3.0.2 jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1601311786391/work jupyter-core @ file:///opt/concourse/worker/volumes/live/a699b83f-e941-4170-5136-bf87e3f37756/volume/jupyter_core_1612213304212/work keybert==0.5.0 kiwisolver==1.3.1 langcodes==3.3.0 llvmlite==0.36.0 loguru==0.5.3 Markdown==3.3.4 markdown-it-py==0.5.8 MarkupSafe==2.0.1 matplotlib==3.4.0 mccabe==0.6.1 mkl-fft==1.2.0 mkl-random==1.1.1 mkl-service==2.3.0 mock==4.0.3 multidict==5.2.0 multiprocess==0.70.12.2 murmurhash @ file:///opt/concourse/worker/volumes/live/9a0582f9-9097-4dab-6d7a-fcf62b4968ae/volume/murmurhash_1607456116622/work myst-parser==0.12.10 nltk==3.6.5 numba==0.53.1 numpy==1.20.2 oauthlib==3.1.1 omegaconf==2.1.1 openai==0.6.3 openpyxl==3.0.9 packaging==20.9 pandas==1.2.1 parlai==1.5.1 parquet==1.3.1 parso==0.7.0 pathy==0.6.1 pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work Pillow==8.2.0 plac @ file:///opt/concourse/worker/volumes/live/a94b6881-2d18-4055-5a3c-f24036f05ef6/volume/plac_1594259982880/work pluggy==1.0.0 ply==3.11 portalocker==2.3.2 praw==7.1.0 prawcore==1.5.0 preshed @ file:///opt/concourse/worker/volumes/live/952fa955-acc7-4aa0-6766-86f802ea8ef1/volume/preshed_1608233410312/work prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1616415428029/work protobuf==3.15.6 ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl py==1.11.0 py-gfm==1.0.2 py-rouge==1.1 py4j==0.10.7 pyarrow==6.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybind11==2.6.1 pycodestyle==2.8.0 pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work pydantic==1.8.2 pyee==8.2.2 pyflakes==2.4.0 Pygments @ file:///tmp/build/80754af9/pygments_1615143339740/work PyJWT==2.3.0 pynndescent==0.5.2 pyodbc==4.0.32 pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1608057966937/work pyparsing==2.4.7 pyrsistent @ file:///opt/concourse/worker/volumes/live/656e0c1b-ef87-4251-4a51-1290b2351993/volume/pyrsistent_1600141745371/work PySocks @ file:///opt/concourse/worker/volumes/live/ef943889-94fc-4539-798d-461c60b77804/volume/pysocks_1605305801690/work pytest==6.2.5 pytest-datadir==1.3.1 pytest-regressions==2.2.0 python-dateutil @ file:///home/ktietz/src/ci/python-dateutil_1611928101742/work python-slugify==5.0.2 pytorch-transformers==1.2.0 pytz==2020.5 PyYAML==6.0 pyzmq==20.0.0 regex==2021.11.10 requests @ file:///tmp/build/80754af9/requests_1608241421344/work requests-mock==1.9.3 requests-oauthlib==1.3.0 requests-toolbelt==0.9.1 rich==10.16.2 rsa==4.7.2 s3transfer==0.4.2 sacremoses==0.0.44 scikit-learn==0.24.1 scipy==1.6.2 seaborn==0.11.1 sentence-transformers==1.0.4 sentencepiece==0.1.91 seqeval==0.0.5 sh==1.14.2 six @ file:///opt/concourse/worker/volumes/live/f983ba11-c9fe-4dff-7ce7-d89b95b09771/volume/six_1605205318156/work sklearn==0.0 slack-bolt==1.11.1 slack-sdk==3.13.0 slackclient==2.9.3 slackeventsapi==3.0.1 smart-open==5.2.1 smmap==5.0.0 snowballstemmer==2.2.0 spacy==3.2.0 spacy-alignments==0.8.4 spacy-legacy==3.0.8 spacy-loggers==1.0.1 spacy-sentence-bert==0.1.2 spacy-transformers==1.1.2 spark-nlp==3.0.2 Sphinx==2.2.2 sphinx-autodoc-typehints==1.10.3 sphinx-rtd-theme==1.0.0 sphinxcontrib-applehelp==1.0.2 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.0 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 srsly==2.4.2 subword-nmt==0.3.8 tensorboard==2.7.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.0 tensorboardX==2.4.1 text-unidecode==1.3 thinc==8.0.13 threadpoolctl==2.1.0 thriftpy2==0.4.14 tokenizers==0.10.2 toml==0.10.2 torch==1.10.1 torchtext==0.11.1 tornado @ file:///opt/concourse/worker/volumes/live/d531d395-893c-4ca1-6a5f-717b318eb08c/volume/tornado_1606942307627/work tqdm==4.62.3 traitlets @ file:///home/ktietz/src/ci/traitlets_1611929699868/work transformers==4.11.0 typer==0.4.0 typing-extensions==3.7.4.3 umap-learn==0.5.1 Unidecode==1.3.2 untokenize==0.1.1 update-checker==0.18.0 uritemplate==3.0.1 urllib3==1.26.7 wasabi==0.8.2 wcwidth @ file:///tmp/build/80754af9/wcwidth_1593447189090/work webexteamsbot==0.1.4.2 webexteamssdk==1.6 websocket-client==0.57.0 websocket-server==0.6.4 Werkzeug==2.0.1 xlrd==2.0.1 xxhash==2.0.2 yarl==1.7.2 zeroshot-topics==0.1.0 zipp @ file:///tmp/build/80754af9/zipp_1604001098328/work

    opened by sdcharle 1
  • Add size to lru_cache

    Add size to lru_cache

    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/__init__.py in <module>()
          1 __version__ = '0.1.0'
          2 
    ----> 3 from .zeroshot_tm import ZeroShotTopicFinder
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/zeroshot_tm.py in <module>()
          1 import attr
          2 from keybert import KeyBERT
    ----> 3 from .utils import load_zeroshot_model
          4 from nltk.corpus import wordnet as wn
          5 
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/utils.py in <module>()
          4 
          5 @lru_cache
    ----> 6 def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"):
          7     classifier = pipeline("zero-shot-classification", model=model_name)
          8     return classifier
    
    /usr/lib/python3.7/functools.py in lru_cache(maxsize, typed)
        488             maxsize = 0
        489     elif maxsize is not None:
    --> 490         raise TypeError('Expected maxsize to be an integer or None')
        491 
        492     def decorating_function(user_function):
    
    TypeError: Expected maxsize to be an integer or None
    

    I assume that you have to provide, maxsize parameter to lru_cache. Worked for me, when I provided the parameter.

    opened by gsasikiran 6
Releases(v.0.0.1)
Owner
Rita Anjana
ML engineer
Rita Anjana
Pretrain CPM - 大规模预训练语言模型的预训练代码

CPM-Pretrain 版本更新记录 为了促进中文自然语言处理研究的发展,本项目提供了大规模预训练语言模型的预训练代码。项目主要基于DeepSpeed、Megatron实现,可以支持数据并行、模型加速、流水并行的代码。 安装 1、首先安装pytorch等基础依赖,再安装APEX以支持fp16。 p

Tsinghua AI 37 Dec 06, 2022
Textlesslib - Library for Textless Spoken Language Processing

textlesslib Textless NLP is an active area of research that aims to extend NLP t

Meta Research 379 Dec 27, 2022
PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Chung-Ming Chien 1k Dec 30, 2022
TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Alexa 98 Dec 09, 2022
:P Some basic stuff I'm gonna use for my upcoming Agile Software Development and Devops

reverse-image-search-py bash script.sh img_name.jpg Requirements pip install requests pip install pyshorteners Dry run [ Sudhanva M 3 Dec 18, 2021

SimCTG - A Contrastive Framework for Neural Text Generation

A Contrastive Framework for Neural Text Generation Authors: Yixuan Su, Tian Lan,

Yixuan Su 345 Jan 03, 2023
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)

Language Models are Few-shot Multilingual Learners Paper This is the source code of the paper [Arxiv] [ACL Anthology]: This code has been written usin

Genta Indra Winata 45 Nov 21, 2022
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Neural G2P to portuguese language Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written for

fluz 11 Nov 16, 2022
FB ID CLONER WUTHOT CHECKPOINT, FACEBOOK ID CLONE FROM FILE

* MY SOCIAL MEDIA : Programming And Memes Want to contact Mr. Error ? CONTACT : [ema

Mr. Error 9 Jun 17, 2021
A script that automatically creates a branch name using google translation api and jira api

About google translation api와 jira api을 사용하여 자동으로 브랜치 이름을 만들어주는 스크립트 Setup 환경변수에 다음 3가지를 등록해야 한다. JIRA_USER : JIRA email (ex: hyunwook.kim 2 Dec 20, 2021

Negative sampling for solving the unlabeled entity problem in NER. ICLR-2021 paper: Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition.

Negative Sampling for NER Unlabeled entity problem is prevalent in many NER scenarios (e.g., weakly supervised NER). Our paper in ICLR-2021 proposes u

Yangming Li 128 Dec 29, 2022
Fine-tune GPT-3 with a Google Chat conversation history

Google Chat GPT-3 This repo will help you fine-tune GPT-3 with a Google Chat conversation history. The trained model will be able to converse as one o

Nate Baer 7 Dec 10, 2022
COVID-19 Related NLP Papers

COVID-19 outbreak has become a global pandemic. NLP researchers are fighting the epidemic in their own way.

xcfeng 28 Oct 30, 2022
A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022
Production First and Production Ready End-to-End Keyword Spotting Toolkit

Production First and Production Ready End-to-End Keyword Spotting Toolkit

223 Jan 02, 2023
:id: A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Dedupe Python Library dedupe is a python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on

Dedupe.io 3.6k Jan 02, 2023
Translate - a PyTorch Language Library

NOTE PyTorch Translate is now deprecated, please use fairseq instead. Translate - a PyTorch Language Library Translate is a library for machine transl

775 Dec 24, 2022
Google AI 2018 BERT pytorch implementation

BERT-pytorch Pytorch implementation of Google AI's 2018 BERT, with simple annotation BERT 2018 BERT: Pre-training of Deep Bidirectional Transformers f

Junseong Kim 5.3k Jan 07, 2023
Twitter-Sentiment-Analysis - Twitter sentiment analysis for india's top online retailers(2019 to 2022)

Twitter-Sentiment-Analysis Twitter sentiment analysis for india's top online retailers(2019 to 2022) Project Overview : Sentiment Analysis helps us to

Balaji R 1 Jan 01, 2022