Implementation of a legal QA system based on SentenceKoBART

Last update: Dec 27, 2022

Overview

LegalQA using SentenceKoBART

How to train SentenceKoBART
Based on the neural search engine Jina
Provides Korean legal QA data (1,830 pairs)
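
The QA data is stored as JSON Lines at data/legalqa.jsonlines (the path named in the License section). A minimal standard-library sketch for loading it for inspection follows. load_jsonlines is a hypothetical helper, not part of the repository, and because the record field names are not documented here it returns plain dicts without assuming any keys.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_jsonlines(path: str) -> List[Dict[str, Any]]:
    """Read a JSON Lines file: one JSON object per non-empty line.

    Hypothetical helper; it makes no assumption about the record fields.
    """
    records: List[Dict[str, Any]] = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so malformed data is easy to locate.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err.msg})") from err
    return records
```

For example, len(load_jsonlines("data/legalqa.jsonlines")) can be compared against the stated 1,830 pairs, and sorted(records[0].keys()) shows which fields each record actually carries.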

Setup

# Install Git LFS (Debian/Ubuntu shown; for other systems see https://github.com/git-lfs/git-lfs/wiki/Installation)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git clone https://github.com/haven-jeon/LegalQA.git
cd LegalQA
git lfs pull
pip install -r requirements.txt

Index

python app.py -t index

GPU-based indexing is available as an option. To enable it, set the following in pods/encoder.yml:

on_gpu: true
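
Conceptually, indexing encodes each QA pair into a sentence embedding once, and a search encodes the query and returns the top_k most similar entries. The following framework-free sketch illustrates only that idea. It is not the project's Jina/SentenceKoBART code: the toy 2-D vectors stand in for real sentence embeddings, cosine similarity is an assumed scoring function, and cosine, top_k, and toy_index are hypothetical names.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# Hypothetical in-memory index: (doc_id, embedding) pairs computed at indexing time.
Index = List[Tuple[str, Sequence[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], index: Index, k: int = 1) -> List[Tuple[str, float]]:
    """Score every indexed vector against the query and keep the k best (brute force)."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in index)
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy 2-D "embeddings" standing in for SentenceKoBART sentence vectors.
toy_index: Index = [("q1", [1.0, 0.0]), ("q2", [0.6, 0.8]), ("q3", [0.0, 1.0])]
print(top_k([0.8, 0.6], toy_index, k=2))  # ranks q2 (about 0.96) above q1 (about 0.8)
```

Brute-force scoring is adequate for a corpus of this size. Larger corpora typically rely on an approximate nearest-neighbour index instead.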

Search

With REST API

To start the Jina server with the REST API:

python app.py -t query_restful

Then query it with a client (the example query 상속 관련 문의 means "inquiry about inheritance"):

curl --request POST -d '{"top_k": 1, "mode": "search",  "data": ["상속 관련 문의"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:1234/api/search'
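
For scripted access, the following minimal Python sketch uses only the standard library and sends the same payload as the curl command above. It assumes the server started by python app.py -t query_restful is listening on port 1234 as shown. build_search_request and search are hypothetical helper names. The response format is not documented here, so the sketch simply returns the decoded JSON.

```python
import json
import urllib.request
from typing import Any, List

DEFAULT_URL = "http://127.0.0.1:1234/api/search"  # assumed local endpoint, as in the examples


def build_search_request(queries: List[str], top_k: int = 1,
                         url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example (top_k, mode, data)."""
    payload = {"top_k": top_k, "mode": "search", "data": queries}
    # ensure_ascii=False keeps Korean text readable; the body is still UTF-8 bytes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def search(queries: List[str], top_k: int = 1, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON response as-is."""
    req = build_search_request(queries, top_k)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, search(["상속 관련 문의"], top_k=3) should return the server's JSON reply for that query.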

Alternatively, use Jinabox with the endpoint http://127.0.0.1:1234/api/search

From the terminal

python app.py -t query

Demo

http://ec2-3-36-123-253.ap-northeast-2.compute.amazonaws.com:7874/

Citation

Model training, data crawling, and the demo system were all supported by the AWS Hero program.

@misc{heewon2021,
author = {Heewon Jeon},
title = {LegalQA using SentenceKoBART},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/LegalQA}}
}

License

The QA data in data/legalqa.jsonlines was crawled from www.freelawfirm.co.kr in accordance with its robots.txt. Only academic use is permitted; commercial use is prohibited.
We are not responsible for any legal decisions made based on the resources provided here.

Owner

Heewon Jeon(gogamza)