KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark.

Last update: Dec 13, 2022

Overview

KLUE Baseline

KLUE-baseline contains the baseline code for the Korean Language Understanding Evaluation (KLUE) benchmark. See our paper for more details about KLUE and the baselines.

Dependencies

Make sure you have installed the packages listed in requirements.txt.

pip install -r requirements.txt

All experiments were tested under Python 3.7.
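If you want to confirm that your interpreter matches the tested version before installing, a minimal sketch is shown below. The helper name `is_tested_python` is hypothetical and not part of this repo; other Python versions may still work but are not covered by the statement above.

```python
import sys

# Version the baselines were tested under, per the note above.
TESTED_VERSION = (3, 7)


def is_tested_python(version_info=sys.version_info):
    """Return True if the major/minor version matches the tested one.

    Hypothetical helper for illustration only; not part of KLUE-baseline.
    """
    return tuple(version_info[:2]) == TESTED_VERSION


if __name__ == "__main__":
    if not is_tested_python():
        print(
            "Warning: running Python %d.%d; the baselines were tested under %d.%d."
            % (sys.version_info[0], sys.version_info[1], *TESTED_VERSION)
        )
```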

KLUE Benchmark Datasets

All train/dev sets of KLUE tasks are publicly available in this repo. You can access them by using git submodules. To clone the repo with datasets:

git clone --recursive https://github.com/KLUE-benchmark/KLUE-Baseline.git

Or, if you have already cloned the repo, download the datasets with:

git submodule update --init --recursive

The test sets are not publicly available. To measure your model's performance on a test set, first train the model on the train set and then submit it to our submission system. Alternatively, you can compare dev set performance with our baseline models, which is also reported in our paper.

Train

To reproduce our baselines, run run_all.sh.

NOTE: klue/roberta models accept inputs of at most 510 tokens. Details are explained here.
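As a rough illustration of respecting this limit in your own preprocessing, the sketch below truncates a list of token ids to 510 entries. The helper `truncate_tokens` and the constant `MAX_INPUT_TOKENS` are hypothetical names, not part of this repo, and the sketch assumes the limit applies to the token sequence you pass in; whether special tokens count toward the 510 is not stated here, so check the linked details before relying on it.

```python
from typing import List

# Limit quoted in the note above for klue/roberta models.
MAX_INPUT_TOKENS = 510


def truncate_tokens(token_ids: List[int], max_tokens: int = MAX_INPUT_TOKENS) -> List[int]:
    """Keep at most `max_tokens` ids from the start of the sequence.

    Hypothetical helper for illustration only; it does not add or
    account for any special tokens.
    """
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    return token_ids[:max_tokens]


if __name__ == "__main__":
    ids = list(range(600))  # stand-in for real token ids
    print(len(truncate_tokens(ids)))
```

Truncating from the end is only one option; for tasks where the relevant span may appear late in the text, a sliding window over the input may be more appropriate.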

Reference

If you use this code or KLUE, please cite:

@misc{park2021klue,
      title={KLUE: Korean Language Understanding Evaluation}, 
      author={Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jung-Woo Ha and Kyunghyun Cho},
      year={2021},
      eprint={2105.09680},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contribution

Feel free to open an issue if you have any questions or comments. To contribute, please run make style before creating a pull request.
