Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.

For details of the model and experiments, please see our paper.

Setup

Requirements

conda create --name acl python=3.8
conda activate acl
pip install -r requirements.txt

Datasets

Nested NER:

Flat NER:

Data format:

{
    "tokens": ["Others", ",", "though", ",", "are", "novices", "."], 
    "entities": [{"type": "PER", "start": 0, "end": 1}, {"type": "PER", "start": 5, "end": 6}], "relations": [], "org_id": "CNN_IP_20030328.1600.07", 
    "ltokens": ["WOODRUFF", "We", "know", "that", "some", "of", "the", "American", "troops", "now", "fighting", "in", "Iraq", "are", "longtime", "veterans", "of", "warfare", ",", "probably", "not", "most", ",", "but", "some", ".", "Their", "military", "service", "goes", "back", "to", "the", "Vietnam", "era", "."], 
    "rtokens": ["So", "what", "is", "it", "like", "for", "them", "to", "face", "combat", "far", "from", "home", "?", "For", "an", "idea", ",", "here", "is", "CNN", "'s", "Candy", "Crowley", "with", "some", "war", "stories", "."]
}

The ltokens field contains the tokens of the previous sentence, and the rtokens field contains the tokens of the next sentence.
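As an illustration of the format, the sketch below parses a shortened copy of the sample record and recovers the entity text. It assumes that `end` is an exclusive token index, which is what the sample suggests (`start: 0, end: 1` covers only "Others"). The helper names `entity_spans` and `with_context` are hypothetical and are not part of this repository.

```python
import json

# Shortened copy of the sample record above (context lists truncated for brevity).
record_json = """
{
    "tokens": ["Others", ",", "though", ",", "are", "novices", "."],
    "entities": [{"type": "PER", "start": 0, "end": 1}, {"type": "PER", "start": 5, "end": 6}],
    "ltokens": ["Their", "military", "service", "goes", "back", "to", "the", "Vietnam", "era", "."],
    "rtokens": ["So", "what", "is", "it", "like", "for", "them", "to", "face", "combat", "far", "from", "home", "?"]
}
"""


def entity_spans(record):
    """Return (type, text) pairs, treating `end` as exclusive (an assumption)."""
    tokens = record["tokens"]
    return [
        (entity["type"], " ".join(tokens[entity["start"]:entity["end"]]))
        for entity in record["entities"]
    ]


def with_context(record):
    """Concatenate previous-sentence, current-sentence, and next-sentence tokens."""
    return record.get("ltokens", []) + record["tokens"] + record.get("rtokens", [])


record = json.loads(record_json)
print(entity_spans(record))       # [('PER', 'Others'), ('PER', 'novices')]
print(len(with_context(record)))  # number of tokens including both context sentences
```

Whether the model actually consumes the context by simple concatenation is not stated here; `with_context` only shows how the three token lists relate to each other.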

Due to licensing restrictions, we cannot directly release our preprocessed ACE04, ACE05, KBP17, NNE and OntoNotes datasets. We release only the preprocessed GENIA, FewNERD, MSRA and CoNLL03 datasets. Download them from here.

If you need the other datasets, please contact me by email ([email protected]). Note that you must state your identity and provide proof that you have obtained the corresponding license.

Example

Train

python piqn.py train --config configs/nested.conf

Note: Edit this line in config_reader.py to match the actual number of GPUs on your machine.

Evaluation

You can download our checkpoints for ACE04 and ACE05, or train your own model and then evaluate it. Because space on Google Cloud Drive is limited, we share the other models on Baidu Cloud Drive; please download them at this link (code: js9z).

python identifier.py eval --config configs/batch_eval.conf

If you use the checkpoints we provide (ACE05 and ACE04), you should get the following results:

  • ACE05:
2022-03-30 12:56:52,447 [MainThread  ] [INFO ]  --- NER ---
2022-03-30 12:56:52,447 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   PER        88.07        92.92        90.43         1724
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   LOC        63.93        73.58        68.42           53
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   WEA        86.27        88.00        87.13           50
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   GPE        87.22        87.65        87.44          405
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   ORG        85.74        81.64        83.64          523
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   VEH        83.87        77.23        80.41          101
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   FAC        75.54        77.21        76.36          136
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                 micro        86.38        88.57        87.46         2992
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                 macro        81.52        82.61        81.98         2992
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  --- NER on Localization ---
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                Entity        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                 micro        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                 macro        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  --- NER on Classification ---
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   PER        97.09        92.92        94.96         1724
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   LOC        76.47        73.58        75.00           53
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   WEA        95.65        88.00        91.67           50
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   GPE        92.93        87.65        90.22          405
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   ORG        93.85        81.64        87.32          523
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   VEH       100.00        77.23        87.15          101
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   FAC        89.74        77.21        83.00          136
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                 micro        95.36        88.57        91.84         2992
2022-03-30 12:56:52,517 [MainThread  ] [INFO ]                 macro        92.25        82.61        87.05         2992
  • ACE04:
2021-11-15 22:06:50,896 [MainThread  ] [INFO ]  --- NER ---
2021-11-15 22:06:50,896 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   VEH        88.89        94.12        91.43           17
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   WEA        74.07        62.50        67.80           32
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   GPE        89.11        87.62        88.36          719
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   ORG        85.06        84.60        84.83          552
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   FAC        83.15        66.07        73.63          112
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   PER        91.09        92.12        91.60         1498
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                   LOC        72.90        74.29        73.58          105
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                 micro        88.48        87.81        88.14         3035
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                 macro        83.47        80.19        81.61         3035
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  --- NER on Localization ---
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                Entity        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                 micro        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                 macro        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  --- NER on Classification ---
2021-11-15 22:06:50,955 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   VEH        94.12        94.12        94.12           17
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   WEA        95.24        62.50        75.47           32
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   GPE        95.60        87.62        91.44          719
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   ORG        93.59        84.60        88.87          552
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   FAC        93.67        66.07        77.49          112
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   PER        97.11        92.12        94.55         1498
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   LOC        84.78        74.29        79.19          105
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                 micro        95.59        87.81        91.53         3035
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                 macro        93.44        80.19        85.87         3035
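
For readers checking the tables, the sketch below recomputes two values from the ACE05 "NER" block using the standard definitions: F1 as the harmonic mean of precision and recall, and macro F1 as the unweighted mean of the per-type F1 scores. This is only a consistency check on the rounded numbers in the log. The evaluation code in this repository may compute its metrics differently (for example, from raw counts), and the function name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-type F1 values copied from the ACE05 "NER" block above.
ace05_f1 = {
    "PER": 90.43, "LOC": 68.42, "WEA": 87.13, "GPE": 87.44,
    "ORG": 83.64, "VEH": 80.41, "FAC": 76.36,
}

# PER row: precision 88.07, recall 92.92.
per_f1 = f1_score(88.07, 92.92)

# Macro F1: unweighted mean over entity types.
macro_f1 = sum(ace05_f1.values()) / len(ace05_f1)

print(round(per_f1, 2))    # 90.43
print(round(macro_f1, 2))  # 81.98
```

Both values agree with the corresponding rows of the log.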

Citation

If you have any questions related to the code or the paper, feel free to email [email protected].

@inproceedings{shen-etal-2022-piqn,
    title = "Parallel Instance Query Network for Named Entity Recognition",
    author = "Shen, Yongliang  and
      Wang, Xiaobin  and
      Tan, Zeqi  and
      Xu, Guangwei  and
      Xie, Pengjun  and
      Huang, Fei and
      Lu, Weiming and
      Zhuang, Yueting",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2203.10545",
}