Improving Elasticsearch for semantic search with Sentence Embeddings

Last update: Nov 25, 2022


In this post I use the pretrained model SimCSE_Vietnamese to improve Elasticsearch on the semantic search task.

You can read the full tutorial article here.
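To make the idea behind sentence embeddings concrete, here is a minimal sketch of ranking documents by cosine similarity to a query vector. The 3-dimensional vectors and document ids are invented for illustration only; in practice the vectors would come from the sentence embedding model, and this is not the repository's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, made up for this example (real embeddings are much longer).
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query_vec, doc_vecs)
print([doc_id for doc_id, _ in ranking])
```

Documents whose vectors point in nearly the same direction as the query rank first, even when they share no exact keywords with it, which is what BM25 alone cannot capture.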

Installation:

git clone https://github.com/vovanphuc/elastic_simCSE.git
cd elastic_simCSE
pip install -r requirements.txt

Index the entire dataset:

python3 index_elastic.py
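The contents of index_elastic.py are not shown here, so the following is only a sketch of what indexing sentence embeddings into Elasticsearch generally involves: a mapping with a `dense_vector` field, plus one bulk action per document. The index name `demo_titles`, the field names `title` and `title_vector`, and the tiny 4-dimensional vectors are all assumptions made for illustration; `dims` must match the output size of whichever embedding model is used.

```python
def build_index_body(dims):
    """Index body with a text field (for BM25) and a dense_vector field."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "title_vector": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_bulk_actions(index_name, titles, vectors):
    """Pair each title with its embedding in the bulk-helper action format."""
    if len(titles) != len(vectors):
        raise ValueError("titles and vectors must have the same length")
    actions = []
    for doc_id, (title, vector) in enumerate(zip(titles, vectors)):
        actions.append(
            {
                "_index": index_name,
                "_id": doc_id,
                "_source": {"title": title, "title_vector": list(vector)},
            }
        )
    return actions


# Hypothetical data: two titles with made-up 4-dimensional embeddings.
titles = ["first title", "second title"]
vectors = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]

index_body = build_index_body(dims=4)
actions = build_bulk_actions("demo_titles", titles, vectors)
print(index_body)
print(actions[0])
```

With the official Python client, a body like this is typically passed to `indices.create` and the actions to `elasticsearch.helpers.bulk`; the exact keyword arguments differ between client versions, so check the version you have installed.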

Search by keyword and compare BM25 (plain Elasticsearch) against SimCSE:

streamlit run main.py
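The two search modes being compared can be sketched as two request bodies: a standard `match` query, which Elasticsearch scores with BM25 by default, and a `script_score` query that ranks by cosine similarity between a query embedding and a stored `dense_vector` field. This is a hedged illustration and not the code in main.py; the field names `title` and `title_vector` and the toy query vector are assumptions.

```python
def build_bm25_query(text, size=10):
    """Plain full-text query; Elasticsearch scores it with BM25 by default."""
    return {"size": size, "query": {"match": {"title": {"query": text}}}}


def build_semantic_query(query_vector, vector_field="title_vector", size=10):
    """Rank all documents by cosine similarity to the query embedding.

    1.0 is added because Elasticsearch does not allow negative scores.
    """
    source = f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0"
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": source,
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }


# Hypothetical inputs: the vector would normally come from the embedding model.
bm25_body = build_bm25_query("cheap phone with good battery")
semantic_body = build_semantic_query([0.1, 0.2, 0.3, 0.4])
print(bm25_body)
print(semantic_body)
```

Either body can be sent through the client's `search` call against the same index, which makes it straightforward to show both result lists side by side. The `script_score` form scans every matching document, so it suits small datasets; older Elasticsearch 7.x releases expect a slightly different script syntax for vector fields.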

Search results:

Results with plain Elasticsearch.

Results with SimCSE_VietNamese.

Contact:

Email: [email protected]

Facebook: facebook.com/vovanphucc

Linkedin: linkedin.com/in/vovanphuc

Owner

Vo Van Phuc

