Chinese NER (Named Entity Recognition) using BERT (Softmax, CRF, Span)

Last update: Jan 03, 2023

Overview

Chinese NER using BERT

BERT for Chinese NER, with Softmax, CRF, and Span variants.

dataset list

cner: datasets/cner
CLUENER: https://github.com/CLUEbenchmark/CLUENER

model list

BERT+Softmax
BERT+CRF
BERT+Span

requirement

1.1.0 <= PyTorch < 1.5.0
CUDA 9.0
Python 3.6+
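The version bounds above can be checked before training. The following is a minimal sketch using only the standard library; the helper names `parse_version`, `torch_version_ok`, and `python_version_ok` are hypothetical and not part of this repository.

```python
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.4.0+cu92' into (1, 4, 0)."""
    core = text.split("+")[0]  # drop a local suffix like '+cu92'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad '1.4' to (1, 4, 0)
        numbers.append(0)
    return tuple(numbers)


def torch_version_ok(version):
    """True when 1.1.0 <= version < 1.5.0, the range listed above."""
    return (1, 1, 0) <= parse_version(version) < (1, 5, 0)


def python_version_ok(info=sys.version_info):
    """True for Python 3.6 or newer."""
    return tuple(info[:2]) >= (3, 6)
```

With PyTorch installed, `torch_version_ok(torch.__version__)` would apply the check to the running environment.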

input format

The input uses the BIOS tag scheme (preferred), with one character and its label per line. Sentences are separated by a blank line.

美	B-LOC
国	I-LOC
的	O
华	B-PER
莱	I-PER
士	I-PER

我	O
跟	O
他	O
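A minimal sketch of reading this format and turning the tags into entity spans, assuming the character and its label are separated by whitespace. The names `read_bios`, `tags_to_spans`, and `SAMPLE` are hypothetical and are not taken from the repository's own data processors.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]
Span = Tuple[str, int, int]  # (entity type, start index, inclusive end index)


def read_bios(lines: Iterable[str]) -> List[Sentence]:
    """Group (character, label) pairs into sentences split on blank lines."""
    sentences, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        char, label = line.rsplit(maxsplit=1)
        current.append((char, label))
    if current:
        sentences.append(current)
    return sentences


def tags_to_spans(labels: List[str]) -> List[Span]:
    """Convert BIOS labels into (type, start, end) spans."""
    spans: List[Span] = []
    start, kind = None, None
    for i, tag in enumerate(labels):
        if tag.startswith("I-") and kind == tag[2:]:
            continue  # still inside the open entity
        if kind is not None:  # any other tag closes the open entity
            spans.append((kind, start, i - 1))
            start, kind = None, None
        if tag.startswith("S-"):  # single-character entity
            spans.append((tag[2:], i, i))
        elif tag.startswith("B-"):  # start of a new entity
            start, kind = i, tag[2:]
    if kind is not None:
        spans.append((kind, start, len(labels) - 1))
    return spans


SAMPLE = ["美\tB-LOC", "国\tI-LOC", "的\tO", "华\tB-PER", "莱\tI-PER",
          "士\tI-PER", "", "我\tO", "跟\tO", "他\tO"]

for sentence in read_bios(SAMPLE):
    labels = [label for _, label in sentence]
    print("".join(ch for ch, _ in sentence), tags_to_spans(labels))
```

Span ends are inclusive here. That is a choice made for this sketch, not a convention confirmed by the source.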

run the code

1. Modify the configuration in run_ner_xxx.py or run_ner_xxx.sh.
2. Run: sh scripts/run_ner_xxx.sh

note: file structure of the model

├── prev_trained_model
│   └── bert_base
│       ├── pytorch_model.bin
│       ├── config.json
│       ├── vocab.txt
│       └── ......
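Before launching a run, it can help to confirm that the model directory contains the three named files. This is a minimal sketch; the helper `missing_model_files` and the constant `REQUIRED_FILES` are hypothetical, and the elided entries in the tree above are not checked.

```python
from pathlib import Path

# Only the files named explicitly in the tree above.
REQUIRED_FILES = ("pytorch_model.bin", "config.json", "vocab.txt")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("prev_trained_model/bert_base")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```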

CLUENER result

Overall performance of BERT on the dev set:

Model	Precision (entity)	Recall (entity)	F1 score (entity)
BERT+Softmax	0.7897	0.8031	0.7963
BERT+CRF	0.7977	0.8177	0.8076
BERT+Span	0.8132	0.8092	0.8112
BERT+Span+adv	0.8267	0.8073	0.8169
BERT-small(6 layers)+Span+kd	0.8241	0.7839	0.8051
BERT+Span+focal_loss	0.8121	0.8008	0.8064
BERT+Span+label_smoothing	0.8235	0.7946	0.8088
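The F1 column in the table above is consistent with the harmonic mean of the first two columns. The sketch below recomputes it for three rows copied from the table; the helper name `f1_score` is hypothetical.

```python
def f1_score(first_column: float, recall: float) -> float:
    """Harmonic mean of the two values; 0.0 when both are zero."""
    total = first_column + recall
    if total == 0:
        return 0.0
    return 2 * first_column * recall / total


# (first column, recall, reported F1), copied from the table above.
ROWS = {
    "BERT+Softmax": (0.7897, 0.8031, 0.7963),
    "BERT+CRF": (0.7977, 0.8177, 0.8076),
    "BERT+Span": (0.8132, 0.8092, 0.8112),
}

for name, (p, r, reported) in ROWS.items():
    print(name, round(f1_score(p, r), 4), reported)
```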

ALBERT for CLUENER

Overall performance of ALBERT on the dev set:

Model	Version	Precision (entity)	Recall (entity)	F1 score (entity)	Train time/epoch
albert	base_google	0.8014	0.6908	0.7420	0.75x
albert	large_google	0.8024	0.7520	0.7763	2.1x
albert	xlarge_google	0.8286	0.7773	0.8021	6.7x
bert	google	0.8118	0.8031	0.8074	-----
albert	base_bright	0.8068	0.7529	0.7789	0.75x
albert	large_bright	0.8152	0.7480	0.7802	2.2x
albert	xlarge_bright	0.8222	0.7692	0.7948	7.3x

Cner result

Overall performance of BERT on the dev set (test set results in parentheses):

Model	Precision (entity)	Recall (entity)	F1 score (entity)
BERT+Softmax	0.9586(0.9566)	0.9644(0.9613)	0.9615(0.9590)
BERT+CRF	0.9562(0.9539)	0.9671(0.9644)	0.9616(0.9591)
BERT+Span	0.9604(0.9620)	0.9617(0.9632)	0.9611(0.9626)
BERT+Span+focal_loss	0.9516(0.9569)	0.9644(0.9681)	0.9580(0.9625)
BERT+Span+label_smoothing	0.9566(0.9568)	0.9624(0.9656)	0.9595(0.9612)
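Entity-level scores of this kind are commonly computed by comparing predicted and gold entity spans. The sketch below assumes an exact-match criterion on (type, start, end) tuples; the repository's own scoring may differ, and the function name `entity_prf` is hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index)


def entity_prf(gold: Iterable[Span], pred: Iterable[Span]):
    """Return (precision, recall, F1) under exact span matching."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)  # spans matching in type and boundaries
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold_spans = [("LOC", 0, 1), ("PER", 3, 5)]
pred_spans = [("LOC", 0, 1), ("PER", 3, 4)]  # PER boundary is off by one
print(entity_prf(gold_spans, pred_spans))
```

A span with a wrong boundary counts as both a false positive and a false negative under this criterion, which is why entity-level scores are stricter than per-character accuracy.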


Owner

Weitang Liu
