Fine-tuning scripts for evaluating transformer-based models on the KLEJ benchmark.

Last update: Oct 18, 2022

The KLEJ Benchmark Baselines

The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for Polish language understanding.

This repository contains example scripts to easily fine-tune models from the transformers library on the KLEJ benchmark.

Installation

Install the Python package using the following commands:

$ git clone https://github.com/allegro/klejbenchmark-baselines
$ pip install klejbenchmark-baselines/

Quick Start

To fine-tune your model on KLEJ tasks using the default settings, you can use the provided example scripts.

First, download the KLEJ benchmark datasets:

$ bash scripts/download_klej.sh

After downloading KLEJ, customize training parameters inside the scripts/run_training.sh script and train the models using:

$ bash scripts/run_training.sh

The script creates:

TensorBoard logs with training and validation metrics,
checkpoints of the best models,
a zip file with predictions for the test sets, which is a valid submission for the KLEJ benchmark.

The zip file can be submitted at klejbenchmark.com for evaluation on the test sets.
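Before uploading, it can be useful to confirm that the archive is readable and contains the files you expect. The snippet below is a minimal sketch using only the Python standard library. The archive name `klej_submission.zip` and the helper `list_submission_files` are hypothetical and not part of this repository; the actual file name and layout are whatever scripts/run_training.sh produces.

```python
import zipfile
from pathlib import Path


def list_submission_files(zip_path):
    """Return the sorted names of the files stored in a submission archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        corrupt = archive.testzip()
        if corrupt is not None:
            raise ValueError(f"Corrupt file in archive: {corrupt}")
        # Skip directory entries and keep only regular files.
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


if __name__ == "__main__":
    # Hypothetical archive name; replace it with the path of the zip file
    # written by scripts/run_training.sh.
    path = Path("klej_submission.zip")
    if path.exists():
        for name in list_submission_files(path):
            print(name)
```

This check only verifies that the archive can be opened and lists its contents; it does not validate the predictions against the format expected by the benchmark.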

Custom Training

It's also possible to train each model separately and customize the training parameters using the klejbenchmark_baselines/main.py script.

License

Apache License 2.0

Citation

If you use this code, please cite the following paper:

@inproceedings{rybak-etal-2020-klej,
    title = "{KLEJ}: Comprehensive Benchmark for Polish Language Understanding",
    author = "Rybak, Piotr and Mroczkowski, Robert and Tracz, Janusz and Gawlik, Ireneusz",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.111",
    pages = "1191--1201",
}

Authors

This code was created by the Allegro Machine Learning Research team.

You can contact us at: [email protected]

Owner

Allegro Tech
