Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Overview

Welcome to Healthsea

Create better access to health with spaCy.

Healthsea is a pipeline for analyzing user reviews to supplement products by extracting their effects on health.

Learn more about Healthsea in our blog post!

💉 Creating better access to health

Healthsea aims to analyze user-written reviews of supplements in relation to their effects on health. Based on this analysis, we try to provide product recommendations. For many people, supplements are an addition to maintaining health and achieving personal goals. Due to their rising popularity, consumers have increasing access to a variety of products.

However, it's likely that most of the products on the market are redundant or produced in a "quantity over quality" fashion to maximize profit. The resulting white noise of products makes it hard to find the right supplements.

Healthsea automizes the analysis and provides information in a more digestible way.


🟢 Requirements

To run this project you need:

spacy>=3.2.0
benepar>=0.2.0
torch>=1.6.0
spacy-transformers>=1.1.2

You can install them in the project folder via spacy project run install

📖 Documentation

Documentation
🧭 Usage How to use the pipeline
⚙️ Pipeline Learn more about the architecture of the pipeline
🪐 spaCy project Introduction to the spaCy project
Demos Introduction to the Healthsea demos

🧭 Usage

The pipeline processes reviews to supplements and returns health effects for every found health aspect.

You can either train the pipeline yourself with the provided datasets in the spaCy project or directly download the trained Healthsea pipeline from Huggingface via pip install https://huggingface.co/explosion/en_healthsea/resolve/main/en_healthsea-any-py3-none-any.whl

import spacy

nlp = spacy.load("en_healthsea")
doc = nlp("This is great for joint pain.")

# Clause Segmentation & Blinding
print(doc._.clauses)

>    {"split_indices": [0, 7],
>    "has_ent": true,
>    "ent_indices": [4, 6],
>    "blinder": "_CONDITION_",
>    "ent_name": "joint pain",
>    "cats": {
>        "POSITIVE": 0.9824668169021606,
>        "NEUTRAL": 0.017364952713251114,
>        "NEGATIVE": 0.00002889777533710003,
>        "ANAMNESIS": 0.0001394189748680219
>    },
>    "prediction_text": ["This", "is", "great", "for", "_CONDITION_", "!"]}

# Aggregated results
print(doc._.health_effects)

>    {"joint_pain": {
>        "effects": ["POSITIVE"],
>        "effect": "POSITIVE",
>        "label": "CONDITION",
>        "text": "joint pain"
>    }}


⚙️ Pipeline

The pipeline consists of the following components:

pipeline = [sentencizer, tok2vec, ner, benepar, segmentation, clausecat, aggregation]

It uses Named Entity Recognition to detect two types of entities Condition and Benefit.

Condition entities are defined as health aspects that are improved by decreasing them. They include diseases, symptoms and general health problems (e.g. pain in back). Benefit entities on the other hand, are desired states of health (muscle recovery, glowing skin) that improve by increasing them.

The pipeline uses a modified model that performs Clause Segmentation based on the benepar parser, Entity Blinding and Text Classification. It predicts four exclusive effects: Positive, Negative, Neutral, and Anamnesis.


🪐 spaCy project

The project folder contains a spaCy project with all the training data and workflows.

Use spacy project run inside the project folder to get an overview of all commands and assets. For more detailed documentation, visit the project folders readme.

Use spacy project run install to install dependencies needed for the pipeline.

Demo

Healthsea Demo

A demo for exploring the results of Healthsea on real data can be found at Hugging Face Spaces.

Healthsea Pipeline

A demo for exploring the Healthsea pipeline with its individual processing steps can be found at Hugging Face Spaces.

Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing
Explosion
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti

Sidharth Pal 20 Dec 14, 2022
Pipeline for training LSA models using Scikit-Learn.

Latent Semantic Analysis Pipeline for training LSA models using Scikit-Learn. Usage Instead of writing custom code for latent semantic analysis, you j

Dani El-Ayyass 23 Sep 05, 2022
DaCy: The State of the Art Danish NLP pipeline using SpaCy

DaCy: A SpaCy NLP Pipeline for Danish DaCy is a Danish preprocessing pipeline trained in SpaCy. At the time of writing it has achieved State-of-the-Ar

Kenneth Enevoldsen 71 Jan 06, 2023
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine

Microsoft 7.8k Jan 09, 2023
this repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain

1 Nov 02, 2021
A Streamlit web app that generates Rick and Morty stories using GPT2.

Rick and Morty Story Generator This project uses a pre-trained GPT2 model, which was fine-tuned on Rick and Morty transcripts, to generate new stories

₸ornike 33 Oct 13, 2022
The projects lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques

Unsupervised technique to Glossary and Definition Extraction Code Files GPT2-DefinitionModel.ipynb - GPT-2 model for definition generation. Data_Gener

Prakhar Mishra 28 May 25, 2021
OceanScript is an Esoteric language used to encode and decode text into a formulation of characters

OceanScript is an Esoteric language used to encode and decode text into a formulation of characters - where the final result looks like waves in the ocean.

Contains descriptions and code of the mini-projects developed in various programming languages

TexttoSpeechAndLanguageTranslator-project introduction A pleasant application where the client will be given buttons like play,reset and exit. The cli

Adarsh Reddy 1 Dec 22, 2021
Question answering app is used to answer for a user given question from user given text.

Question answering app is used to answer for a user given question from user given text.It is created using HuggingFace's transformer pipeline and streamlit python packages.

Siva Prakash 3 Apr 05, 2022
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy floret is an extended version of fastText that can produce word repr

Explosion 222 Dec 16, 2022
A retro text-to-speech bot for Discord

hawking A retro text-to-speech bot for Discord, designed to work with all of the stuff you might've seen in Moonbase Alpha, using the existing command

Nick Schorr 23 Dec 25, 2022
Uses Google's gTTS module to easily create robo text readin' on command.

Tool to convert text to speech, creating files for later use. TTRS uses Google's gTTS module to easily create robo text readin' on command.

0 Jun 20, 2021
A programming language with logic of Python, and syntax of all languages.

Pytov The idea was to take all well known syntaxes, and combine them into one programming language with many posabilities. Installation Install using

Yuval Rosen 14 Dec 07, 2022
Simple, hackable offline speech to text - using the VOSK-API.

Simple, hackable offline speech to text - using the VOSK-API.

Campbell Barton 844 Jan 07, 2023
The code from the whylogs workshop in DataTalks.Club on 29 March 2022

whylogs Workshop The code from the whylogs workshop in DataTalks.Club on 29 March 2022 whylogs - The open source standard for data logging (Don't forg

DataTalksClub 12 Sep 05, 2022
Natural Language Processing with transformers

we want to create a repo to illustrate usage of transformers in chinese

Datawhale 763 Dec 27, 2022
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai

Amazon Web Services - Labs 124 Jan 03, 2023
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning libra

Yiming Wang 919 Jan 03, 2023
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context

LSTC: Boosting Atomic Action Detection with Long-Short-Term Context This Repository contains the code on AVA of our ACM MM 2021 paper: LSTC: Boosting

Tencent YouTu Research 9 Oct 11, 2022