초성 해석기 based on ko-BART

Overview

초성 해석기

개요

한국어 초성만으로 이루어진 문장을 입력하면, 완성된 문장을 예측하는 초성 해석기입니다.

초성: ㄴㄴ ㄴㄹ ㅈㅇㅎ
예측 문장: 나는 너를 좋아해

모델

모델은 SKT-AI에서 공개한 Ko-BART를 이용합니다.

데이터

문장 단위로 이루어진 아무 코퍼스나 사용가능합니다. 단, 모델의 추론 성능은 데이터의 도메인이나 데이터의 양에 크게 의존하기 때문에 원하는 모델 성능에 맞는 코퍼스를 사용해주세요. ./data 디렉토리에 더미 데이터셋을 추가해두었으니, 더미 데이터셋과 동일한 형식의 코퍼스를 준비해두시면 됩니다.

학습

python run_train.py

추론

python run_inference.py --finetuned-model-path $FINETUNED_MODEL_PATH

예시

  • 공개된 코퍼스로 학습한 모델의 추론 결과입니다.
초성: ㅂㄱㅍㄷ 	 예측 문장: 배고픈데
초성: ㅂㄱㅍㄷ 	 예측 문장: 배고프다
초성: ㅂㄱㅍㄷ 	 예측 문장: 배고프대

초성: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 예측 문장: 너무너무 사랑해요
초성: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 예측 문장: 너무너무 사랑했어
초성: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 예측 문장: 나만너무 사랑해요

초성: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 예측 문장: 나는 너를 좋아해
초성: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 예측 문장: 누나 나랑 좋아해
초성: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 예측 문장: 너는 나를 좋아해

Notes

  • 본 레포는 별도의 학습 데이터를 포함하고 있지 않습니다.
  • 본 레포의 라이센스는 Ko-BARTmodified-MIT 라이센스를 따릅니다.

Todo

  • 테스트 코드 추가
Owner
Dawoon Jung
Dawoon Jung
Implementation of TF-IDF algorithm to find documents similarity with cosine similarity

NLP learning Trying to learn NLP to use in my projects! Table of Contents About The Project Built With Getting Started Requirements Run Usage License

Faraz Farangizadeh 3 Aug 25, 2022
Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

Allegro Tech 17 Oct 18, 2022
🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutt

475 Jan 04, 2023
glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Glow-Speak glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end. Installation git clone https://g

Rhasspy 8 Dec 25, 2022
Twitter-Sentiment-Analysis - Twitter sentiment analysis for india's top online retailers(2019 to 2022)

Twitter-Sentiment-Analysis Twitter sentiment analysis for india's top online retailers(2019 to 2022) Project Overview : Sentiment Analysis helps us to

Balaji R 1 Jan 01, 2022
Meta learning algorithms to train cross-lingual NLI (multi-task) models

Meta learning algorithms to train cross-lingual NLI (multi-task) models

M.Hassan Mojab 4 Nov 20, 2022
Refactored version of FastSpeech2

Refactored version of FastSpeech2. An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

ILJI CHOI 10 May 26, 2022
[EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction

LM-Critic: Language Models for Unsupervised Grammatical Error Correction This repo provides the source code & data of our paper: LM-Critic: Language M

Michihiro Yasunaga 98 Nov 24, 2022
Translate U is capable of translating the text present in an image from one language to the other.

Translate U is capable of translating the text present in an image from one language to the other. The app uses OCR and Google translate to identify and translate across 80+ languages.

Neelanjan Manna 1 Dec 22, 2021
This repository contains examples of Task-Informed Meta-Learning

Task-Informed Meta-Learning This repository contains examples of Task-Informed Meta-Learning (paper). We consider two tasks: Crop Type Classification

10 Dec 19, 2022
LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language

LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language ⚖️ The library of Natural Language Processing for Brazilian legal lang

Felipe Maia Polo 125 Dec 20, 2022
2021搜狐校园文本匹配算法大赛baseline

sohu2021-baseline 2021搜狐校园文本匹配算法大赛baseline 简介 分享了一个搜狐文本匹配的baseline,主要是通过条件LayerNorm来增加模型的多样性,以实现同一模型处理不同类型的数据、形成不同输出的目的。 线下验证集F1约0.74,线上测试集F1约0.73。

苏剑林(Jianlin Su) 45 Sep 06, 2022
GooAQ 🥑 : Google Answers to Google Questions!

This repository contains the code/data accompanying our recent work on long-form question answering.

AI2 112 Nov 06, 2022
Tool to check whether a GCP bucket is public or not.

Tool to check publicly accessible GCP bucket. Blog https://justm0rph3u5.medium.com/gcp-inspector-auditing-publicly-exposed-gcp-bucket-ac6cad55618c Wha

DIVYANSHU SHUKLA 7 Nov 24, 2022
What are the best Systems? New Perspectives on NLP Benchmarking

What are the best Systems? New Perspectives on NLP Benchmarking In Machine Learning, a benchmark refers to an ensemble of datasets associated with one

Pierre Colombo 12 Nov 03, 2022
translate using your voice

speech-to-text-translator Usage translate using your voice description this project makes translating a word easy, all you have to do is speak and...

1 Oct 18, 2021
Code for hyperboloid embeddings for knowledge graph entities

Implementation for the papers: Self-Supervised Hyperboloid Representations from Logical Queries over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao,

30 Dec 10, 2022
Blue Brain text mining toolbox for semantic search and structured information extraction

Blue Brain Search Source Code DOI Data & Models DOI Documentation Latest Release Python Versions License Build Status Static Typing Code Style Securit

The Blue Brain Project 29 Dec 01, 2022
UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech The family of UniSpeech: WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing UniSpeech (ICML 202

Microsoft 281 Dec 15, 2022