Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.

For details of the model and experiments, please see our paper.

Setup

Requirements

conda create --name acl python=3.8
conda activate acl
pip install -r requirements.txt

Datasets

Nested NER:

Flat NER:

Data format:

{
    "tokens": ["Others", ",", "though", ",", "are", "novices", "."], 
    "entities": [{"type": "PER", "start": 0, "end": 1}, {"type": "PER", "start": 5, "end": 6}], "relations": [], "org_id": "CNN_IP_20030328.1600.07", 
    "ltokens": ["WOODRUFF", "We", "know", "that", "some", "of", "the", "American", "troops", "now", "fighting", "in", "Iraq", "are", "longtime", "veterans", "of", "warfare", ",", "probably", "not", "most", ",", "but", "some", ".", "Their", "military", "service", "goes", "back", "to", "the", "Vietnam", "era", "."], 
    "rtokens": ["So", "what", "is", "it", "like", "for", "them", "to", "face", "combat", "far", "from", "home", "?", "For", "an", "idea", ",", "here", "is", "CNN", "'s", "Candy", "Crowley", "with", "some", "war", "stories", "."]
}

The ltokens field contains the tokens of the previous sentence, and the rtokens field contains the tokens of the next sentence.
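As a rough illustration of the format, the sketch below parses a shortened copy of the record above and recovers the entity mentions. It assumes that `end` is an exclusive token index, which is inferred from the sample ("Others" spans 0 to 1, "novices" spans 5 to 6) rather than stated in the repository. The helper names `entity_mentions` and `with_context` are hypothetical and not part of this codebase.

```python
import json

# Shortened copy of the sample record above: ltokens and rtokens are
# truncated to one sentence each, and unused fields are omitted.
sample = json.loads("""
{
    "tokens": ["Others", ",", "though", ",", "are", "novices", "."],
    "entities": [{"type": "PER", "start": 0, "end": 1},
                 {"type": "PER", "start": 5, "end": 6}],
    "ltokens": ["Their", "military", "service", "goes", "back", "to",
                "the", "Vietnam", "era", "."],
    "rtokens": ["So", "what", "is", "it", "like", "for", "them", "to",
                "face", "combat", "far", "from", "home", "?"]
}
""")


def entity_mentions(record):
    """Return (type, text) pairs, assuming `end` is an exclusive index."""
    return [
        (ent["type"], " ".join(record["tokens"][ent["start"]:ent["end"]]))
        for ent in record["entities"]
    ]


def with_context(record):
    """Join previous-sentence, current and next-sentence tokens.

    Returns the combined token list and the offset at which the
    current sentence starts inside it.
    """
    left = record.get("ltokens", [])
    right = record.get("rtokens", [])
    return left + record["tokens"] + right, len(left)


mentions = entity_mentions(sample)
context_tokens, offset = with_context(sample)
print(mentions)
print(offset, context_tokens[offset])
```

Entity offsets index into `tokens` only, so when the context sentences are prepended the spans need to be shifted by the returned offset.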

Due to licensing restrictions, we cannot directly release our preprocessed ACE04, ACE05, KBP17, NNE and OntoNotes datasets. We release only the preprocessed GENIA, FewNERD, MSRA and CoNLL03 datasets. Download them from here.

If you need the other datasets, please contact me by email ([email protected]). Please state your identity and provide proof that you have obtained the corresponding license.

Example

Train

python piqn.py train --config configs/nested.conf

Note: Edit this line in config_reader.py to match the number of GPUs you actually have.

Evaluation

You can download our checkpoints for ACE04 and ACE05, or train your own model and then evaluate it. Because of the limited space on Google Cloud Drive, we share the other models on Baidu Cloud Drive; please download them at this link (code: js9z).

python piqn.py eval --config configs/batch_eval.conf

If you use the checkpoints (ACE05 and ACE04) we provided, you will get the following results:

  • ACE05:
2022-03-30 12:56:52,447 [MainThread  ] [INFO ]  --- NER ---
2022-03-30 12:56:52,447 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   PER        88.07        92.92        90.43         1724
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   LOC        63.93        73.58        68.42           53
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   WEA        86.27        88.00        87.13           50
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   GPE        87.22        87.65        87.44          405
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   ORG        85.74        81.64        83.64          523
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   VEH        83.87        77.23        80.41          101
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   FAC        75.54        77.21        76.36          136
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                 micro        86.38        88.57        87.46         2992
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                 macro        81.52        82.61        81.98         2992
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  --- NER on Localization ---
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                Entity        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                 micro        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                 macro        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  --- NER on Classification ---
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   PER        97.09        92.92        94.96         1724
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   LOC        76.47        73.58        75.00           53
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   WEA        95.65        88.00        91.67           50
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   GPE        92.93        87.65        90.22          405
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   ORG        93.85        81.64        87.32          523
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   VEH       100.00        77.23        87.15          101
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   FAC        89.74        77.21        83.00          136
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                 micro        95.36        88.57        91.84         2992
2022-03-30 12:56:52,517 [MainThread  ] [INFO ]                 macro        92.25        82.61        87.05         2992
  • ACE04:
2021-11-15 22:06:50,896 [MainThread  ] [INFO ]  --- NER ---
2021-11-15 22:06:50,896 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   VEH        88.89        94.12        91.43           17
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   WEA        74.07        62.50        67.80           32
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   GPE        89.11        87.62        88.36          719
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   ORG        85.06        84.60        84.83          552
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   FAC        83.15        66.07        73.63          112
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   PER        91.09        92.12        91.60         1498
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                   LOC        72.90        74.29        73.58          105
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                 micro        88.48        87.81        88.14         3035
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                 macro        83.47        80.19        81.61         3035
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  --- NER on Localization ---
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                Entity        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                 micro        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                 macro        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  --- NER on Classification ---
2021-11-15 22:06:50,955 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   VEH        94.12        94.12        94.12           17
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   WEA        95.24        62.50        75.47           32
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   GPE        95.60        87.62        91.44          719
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   ORG        93.59        84.60        88.87          552
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   FAC        93.67        66.07        77.49          112
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   PER        97.11        92.12        94.55         1498
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   LOC        84.78        74.29        79.19          105
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                 micro        95.59        87.81        91.53         3035
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                 macro        93.44        80.19        85.87         3035
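The summary rows in these logs can be sanity-checked by hand. The sketch below uses the rounded ACE05 values from the first "--- NER ---" table and assumes the usual definitions: F1 is the harmonic mean of precision and recall, and the macro score is the unweighted mean of the per-type scores. The logged numbers are consistent with those definitions, but the evaluation code itself works on unrounded values, so agreement is only expected to two decimals.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-type F1 scores copied from the ACE05 "--- NER ---" table above.
ace05_f1 = {
    "PER": 90.43, "LOC": 68.42, "WEA": 87.13, "GPE": 87.44,
    "ORG": 83.64, "VEH": 80.41, "FAC": 76.36,
}

# Micro F1 recomputed from the logged micro precision and recall.
micro_f1 = f1(86.38, 88.57)

# Macro F1 as the unweighted mean over entity types.
macro_f1 = sum(ace05_f1.values()) / len(ace05_f1)

print(f"micro F1 = {micro_f1:.2f}")
print(f"macro F1 = {macro_f1:.2f}")
```

The gap between the two reflects class imbalance: PER accounts for 1724 of the 2992 gold entities, so it dominates the micro score, while low-support types such as LOC pull the macro score down.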

Citation

If you have any questions related to the code or the paper, feel free to email [email protected].

@inproceedings{shen-etal-2022-piqn,
    title = "Parallel Instance Query Network for Named Entity Recognition",
    author = "Shen, Yongliang  and
      Wang, Xiaobin  and
      Tan, Zeqi  and
      Xu, Guangwei  and
      Xie, Pengjun  and
      Huang, Fei  and
      Lu, Weiming  and
      Zhuang, Yueting",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2203.10545",
}