APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Last update: Dec 06, 2022

Related tags

Overview

APEACH - Korean Hate Speech Evaluation Datasets

APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. Sentences of the dataset are created by anonymous participants using an online crowdsourcing platform DeepNatural AI.

Sample Code :

Download

You can download benchmark set APEACH. APEACH/test.csv in this repository.

Dataset Description

APEACH : A hate-speech evaluation dataset generated in 2021, using generation method followd by APEACH paper.

Guidelines

APEACH-GUIDELINE

Topics

Lengths

Paper

https://arxiv.org/pdf/2202.12459.pdf

Experiment Code

Experiment Results

Name	Beep! Dev Dataset	Apeach (Ours)
SoongsilBERT-Base	0.8261	0.8424
SoongsilBERT-Small	0.8149	0.8228
KcBERT-base	0.8088	0.8086
KcBERT-large	0.8295	0.8116
DistillKoBERT	0.7570	0.7715
KoELECTRA-V3	0.7920	0.8101
KoBERT	0.8030	0.7885

We also share BEST model of our dataset which we trained in this experiment as checkpoint, demo webite and api.

Citation

@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}

Contributors

The main contributors of the work ( * : equal contribution) :

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Related tags

Overview

APEACH - Korean Hate Speech Evaluation Datasets

Download

Dataset Description

Guidelines

Topics

Lengths

Paper

Experiment Code

Experiment Results

Citation

Contributors

License

Owner

Kevin-Yang

Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

Pipeline for training LSA models using Scikit-Learn.

[ICCV 2021] Instance-level Image Retrieval using Reranking Transformers

Segmenter - Transformer for Semantic Segmentation

DataCLUE: 国内首个以数据为中心的AI测评（含模型分析报告）

Repository of the Code to Chatbots, developed in Python

Build Text Rerankers with Deep Language Models

jiant is an NLP toolkit

Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)

🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

This project aims to conduct a text information retrieval and text mining on medical research publication regarding Covid19 - treatments and vaccinations.

Code for EMNLP20 paper: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training"

Pytorch-Named-Entity-Recognition-with-BERT

SDL: Synthetic Document Layout dataset

This repo contains simple to use, pretrained/training-less models for speaker diarization.

Findings of ACL 2021

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

A deep learning-based translation library built on Huggingface transformers

Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch