Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Overview

Deep Text Search - AI Based Text Search & Recommendation System

Brain+Machine

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Generic badge Generic badge Generic badge Generic badge Generic badge Downloads

Brain+Machine Creators

Nilesh Verma

Features

  • Faster Search.
  • High Accurate Text Recommendation and Search Output Result.
  • Best for Implementing on python based web application or APIs.
  • Best implementation for College students and freshers for project creation.
  • Applications are Text-based News, Social media post, E-commerce Product recommendation and other text-based platforms that want to implement text recommendation and search.

Installation

This library is compatible with both windows and Linux system you can just use PIP command to install this library on your system:

pip install DeepTextSearch

How To Use?

We have provided the Demo folder under the GitHub repository, you can find the example in both .py and .ipynb file. Following are the ideal flow of the code:

1. Importing the Important Classes

There are three important classes you need to load LoadData - for data loading, TextEmbedder - for embedding the text to data, TextSearch - For searching the text.

# Importing the proper classes
from DeepTextSearch import LoadData,TextEmbedder,TextSearch

2. Loading the Texts Data

For loading the Texts data we need to use the LoadData object, from there we can import text data as python list object from the CSV/Text file.

# Load data from CSV file
data = LoadData().from_csv("../your_file_name.csv")
# Load data from Text file
data = LoadData().from_text("../your_file_name.txt")

3. Embedding and Saving The File in Local Folder

For Embedding we are using state of the art multilingual Sentence Transformer Embedding, We also store the information of the Embedding for further use on the local path [embedding-data/] folder.

You can also use the load embedding() method in a TextEmbedder() class to load saved embedding data.

# To use Serching, we must first embed data. After that, we must save all of the data on the local path.
TextEmbedder().embed(corpus_list=data)

# Loading Embedding data
corpus_embedding = TextEmbedder().load_embedding()

3. Searching

We compare Cosian Similarity for searching and recommending, and then the corpus is sorted according to the similarity score:

# You must include the query text and the quantity of comparable texts you want to search for.
TextSearch().find_similar(query_text="What are the key features of Node.js?",top_n=10)

Complete Code

# Importing the proper classes
from DeepTextSearch import LoadData,TextEmbedder,TextSearch
# Load data from CSV file
data = LoadData().from_csv("../your_file_name.csv")
# To use Serching, we must first embed data. After that, we must save all of the data on the local path
TextEmbedder().embed(corpus_list=data)
# You must include the query text and the quantity of comparable texts you want to search for
TextSearch().find_similar(query_text="What are the key features of Node.js?",top_n=10)

License

MIT License

Copyright (c) 2021 Nilesh Verma

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Please do STAR the repository, if it helped you in anyway.

More cool features will be added in future. Feel free to give suggestions, report bugs and contribute.

You might also like...
On-device speech-to-index engine powered by deep learning.

On-device speech-to-index engine powered by deep learning.

State of the Art Neural Networks for Deep Learning

pyradox This python library helps you with implementing various state of the art neural networks in a totally customizable fashion using Tensorflow 2

😇A pyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc

------ Update September 2018 ------ It's been a year since TorchMoji and DeepMoji were released. We're trying to understand how it's being used such t

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Here is deepparse. Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning. Use deepparse to Use the pr

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models
LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models. Developers can reproduce these SOTA methods and build their own methods.

We evaluate our method on different datasets (including ShapeNet, CUB-200-2011, and Pascal3D+) and achieve state-of-the-art results, outperforming all the other supervised and unsupervised methods and 3D representations, all in terms of performance, accuracy, and training time. Recommendationsystem - Movie-recommendation - matrixfactorization colloborative filtering recommendation system user
Recommendationsystem - Movie-recommendation - matrixfactorization colloborative filtering recommendation system user

recommendationsystem matrixfactorization colloborative filtering recommendation

Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.
Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.

Summary Explorer Summary Explorer is a tool to visually inspect the summaries from several state-of-the-art neural summarization models across multipl

Releases(v_03)
Owner
Data Science Enthusiast & Digital Influencer
Open-source Monocular Python HawkEye for Tennis

Tennis Tracking 🎾 Objectives Track the ball Detect court lines Detect the players To track the ball we used TrackNet - deep learning network for trac

ArtLabs 188 Jan 08, 2023
Official Implementation of "Transformers Can Do Bayesian Inference"

Official Code for the Paper "Transformers Can Do Bayesian Inference" We train Transformers to do Bayesian Prediction on novel datasets for a large var

AutoML-Freiburg-Hannover 103 Dec 25, 2022
[CVPRW 21] "BNN - BN = ? Training Binary Neural Networks without Batch Normalization", Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

BNN - BN = ? Training Binary Neural Networks without Batch Normalization Codes for this paper BNN - BN = ? Training Binary Neural Networks without Bat

VITA 40 Dec 30, 2022
A PyTorch-based library for semi-supervised learning

News If you want to join TorchSSL team, please e-mail Yidong Wang ([email protected]<

1k Jan 06, 2023
The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.

ISC21-Descriptor-Track-1st The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track. You can check our solution

lyakaap 73 Dec 24, 2022
Cross-Modal Contrastive Learning for Text-to-Image Generation

Cross-Modal Contrastive Learning for Text-to-Image Generation This repository hosts the open source JAX implementation of XMC-GAN. Setup instructions

Google Research 94 Nov 12, 2022
Pydantic models for pywttr and aiopywttr.

Pydantic models for pywttr and aiopywttr.

Almaz 2 Dec 08, 2022
Social Network Ads Prediction

Social network advertising, also social media targeting, is a group of terms that are used to describe forms of online advertising that focus on social networking services.

Khazar 2 Jan 28, 2022
Complete U-net Implementation with keras

U Net Lowered with Keras Complete U-net Implementation with keras Original Paper Link : https://arxiv.org/abs/1505.04597 Special Implementations : The

Sagnik Roy 14 Oct 10, 2022
A graphical Semi-automatic annotation tool based on labelImg and Yolov5

💕YOLOV5 semi-automatic annotation tool (Based on labelImg)

EricFang 247 Jan 05, 2023
Virtual Dance Reality Stage is a feature that offers you to share a stage with another user virtually.

Virtual Dance Reality Stage is a feature that offers you to share a stage with another user virtually. It uses the concept of Image Background Removal using DeepLab Architecture (based on Semantic Se

Devashi Choudhary 5 Aug 24, 2022
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

The Hypersim Dataset For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real i

Apple 1.3k Jan 04, 2023
ICCV2021: Code for 'Spatial Uncertainty-Aware Semi-Supervised Crowd Counting'

ICCV2021: Code for 'Spatial Uncertainty-Aware Semi-Supervised Crowd Counting'

Yanda Meng 14 May 13, 2022
BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization

BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization Authors: Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong,

Salesforce 125 Dec 31, 2022
Convert BART models to ONNX with quantization. 3X reduction in size, and upto 3X boost in inference speed

fast-Bart Reduction of BART model size by 3X, and boost in inference speed up to 3X BART implementation of the fastT5 library (https://github.com/Ki6a

Siddharth Sharma 19 Dec 09, 2022
masscan + nmap + Finger

说明 个人根据使用习惯修改masnmap而来的一个小工具。调用masscan做全端口扫描,再调用nmap做服务识别,最后调用Finger做Web指纹识别。工具使用场景适合风险探测排查、众测等。 使用方法 安装依赖 pip3 install -r requirements.txt -i https:/

Ryan 3 Mar 25, 2022
Pytorch implementation of DeepMind's differentiable neural computer paper.

DNC pytorch This is a Pytorch implementation of DeepMind's Differentiable Neural Computer (DNC) architecture introduced in their recent Nature paper:

Yuanpu Xie 91 Nov 21, 2022
Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Text Summarization WCN — Weighted Contextual N-gram method for evaluation of Text Summarization In this project, I fine tune T5 model on Extreme Summa

Aditya Shah 1 Jan 03, 2022
Flaxformer: transformer architectures in JAX/Flax

Flaxformer is a transformer library for primarily NLP and multimodal research at Google.

Google 116 Jan 05, 2023
Create UIs for prototyping your machine learning model in 3 minutes

Note: We just launched Hosted, where anyone can upload their interface for permanent hosting. Check it out! Welcome to Gradio Quickly create customiza

Gradio 11.7k Jan 07, 2023