The following links briefly explain the idea of semantic search and how retrieve-and-rerank search pipelines work.

Last update: Jan 28, 2022

Related tags

Text Data & NLP information_retrieval

Overview

Main Idea

The following links give a short introduction to semantic search and to how a search system works in two stages: first retrieve candidate documents, then rerank them.
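The two-stage idea can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `embed`, `cosine`, `pair_score`, and the sample `DOCS` are hypothetical toy stand-ins (bag-of-words vectors and bigram overlap), not the trained bi-encoder and cross-encoder used by this repository, and the `retrieve` and `rerank` functions here are not the library's methods of the same name.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, top_k=3):
    """Stage 1: embed query and documents separately, keep the top_k closest."""
    q = embed(query)
    scored = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return scored[:top_k]


def pair_score(query, doc):
    """Toy stand-in for a cross-encoder: scores the (query, doc) pair jointly.

    Here the score is the fraction of query bigrams that also occur in doc,
    so word order matters, unlike in the bag-of-words first stage.
    """
    q, d = query.lower().split(), doc.lower().split()
    q_bigrams, d_bigrams = set(zip(q, q[1:])), set(zip(d, d[1:]))
    if not q_bigrams:
        return 0.0
    return len(q_bigrams & d_bigrams) / len(q_bigrams)


def rerank(query, docs, top_k=3):
    """Stage 2: reorder only the retrieved candidates with the pair scorer."""
    candidates = retrieve(query, docs, top_k)
    return sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)


# Hypothetical sample documents, invented for this sketch.
DOCS = [
    "bella es la vida",
    "la vida es bella y corta",
    "el perro come",
    "vida bella",
]

if __name__ == "__main__":
    print(retrieve("la vida es bella", DOCS, top_k=2))
    print(rerank("la vida es bella", DOCS, top_k=2))
```

The design point is that the first stage is cheap because documents can be embedded once and compared by vector similarity, while the second stage is more expensive because it looks at each query and document together, so it is applied only to the short candidate list.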

Setup

Download trained models

Two models trained for Spanish are available: a bi-encoder and a cross-encoder. Together they power the retrieval system, following the retrieve-and-rerank approach:

make setup
pip install -r requirements.txt

Basic usage

Set up the Elasticsearch index with semantic vectors. This step assumes that a set of JSON files is stored in a folder. Each JSON file can contain several optional fields, but it must contain the id and text fields.

from information_retrieval import SemanticEmbedder, CrossEncoder, Prepare, Search

data_folder = 'data/'
text_field = "texto_parrafo"
id_field = "id_parrafo"
elastic_index_name = "sentencias_2.0"

# Read the files, compute embeddings and upload them to elasticsearch
P = Prepare(data_folder, text_field, id_field, elastic_index_name)
P.prepare()
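Before indexing, it may help to check that every file in the data folder has the two required fields. The sketch below is not part of the library: `find_invalid_files` and `demo` are hypothetical helpers, the optional field `fecha` is invented for illustration, and the code assumes that each file holds a single JSON object, which the description above does not confirm.

```python
import json
import tempfile
from pathlib import Path


def find_invalid_files(data_folder, text_field, id_field):
    """Return the names of JSON files that lack the id field or the text field."""
    invalid = []
    for path in sorted(Path(data_folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        # Assumption: one JSON object per file.
        if (
            not isinstance(record, dict)
            or id_field not in record
            or text_field not in record
        ):
            invalid.append(path.name)
    return invalid


def demo():
    """Write one valid and one invalid sample file, then validate the folder."""
    with tempfile.TemporaryDirectory() as folder:
        good = {
            "id_parrafo": "p-001",
            "texto_parrafo": "la vida es bella",
            "fecha": "2022-01-28",  # hypothetical optional field
        }
        bad = {"id_parrafo": "p-002"}  # missing the text field
        Path(folder, "good.json").write_text(json.dumps(good), encoding="utf-8")
        Path(folder, "bad.json").write_text(json.dumps(bad), encoding="utf-8")
        return find_invalid_files(folder, "texto_parrafo", "id_parrafo")


if __name__ == "__main__":
    print(demo())
```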

Make queries to retrieve documents:

from information_retrieval import SearchEngine

query = "la vida es bella"
S = SearchEngine(elastic_index_name)
S.retrieve(query) # Only semantic search

S.rerank(query) # Retrieve and rerank

Owner

Sergio Arnaud Gomez
