Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.

Related tags

Text Data & NLPpiqn
Overview

README

Code for Two-stage Identifier: "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022. For details of the model and experiments, please see our paper.

Setup

Requirements

conda create --name acl python=3.8
conda activate acl
pip install -r requirements.txt

Datasets

Nested NER:

Flat NER:

Data format:

{
    "tokens": ["Others", ",", "though", ",", "are", "novices", "."], 
    "entities": [{"type": "PER", "start": 0, "end": 1}, {"type": "PER", "start": 5, "end": 6}], "relations": [], "org_id": "CNN_IP_20030328.1600.07", 
    "ltokens": ["WOODRUFF", "We", "know", "that", "some", "of", "the", "American", "troops", "now", "fighting", "in", "Iraq", "are", "longtime", "veterans", "of", "warfare", ",", "probably", "not", "most", ",", "but", "some", ".", "Their", "military", "service", "goes", "back", "to", "the", "Vietnam", "era", "."], 
    "rtokens": ["So", "what", "is", "it", "like", "for", "them", "to", "face", "combat", "far", "from", "home", "?", "For", "an", "idea", ",", "here", "is", "CNN", "'s", "Candy", "Crowley", "with", "some", "war", "stories", "."]
}

The ltokens contains the tokens from the previous sentence. And The rtokens contains the tokens from the next sentence.

Due to the license, we cannot directly release our preprocessed datasets of ACE04, ACE05, KBP17, NNE and OntoNotes. We only release the preprocessed GENIA, FewNERD, MSRA and CoNLL03 datasets. Download them from here.

If you need other datasets, please contact me ([email protected]) by email. Note that you need to state your identity and prove that you have obtained the license.

Example

Train

python piqn.py train --config configs/nested.conf

Note: You should edit this line in config_reader.py according to the actual number of GPUs.

Evaluation

You can download our checkpoints on ACE04 and ACE05, or train your own model and then evaluate the model. Because of the limited space of Google Cloud Drive, we share the other models in Baidu Cloud Drive, please download at this link (code: js9z).

python identifier.py eval --config configs/batch_eval.conf

If you use the checkpoints (ACE05 and ACE04) we provided, you will get the following results:

  • ACE05:
2022-03-30 12:56:52,447 [MainThread  ] [INFO ]  --- NER ---
2022-03-30 12:56:52,447 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   PER        88.07        92.92        90.43         1724
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   LOC        63.93        73.58        68.42           53
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   WEA        86.27        88.00        87.13           50
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   GPE        87.22        87.65        87.44          405
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   ORG        85.74        81.64        83.64          523
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   VEH        83.87        77.23        80.41          101
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                   FAC        75.54        77.21        76.36          136
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                 micro        86.38        88.57        87.46         2992
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]                 macro        81.52        82.61        81.98         2992
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  --- NER on Localization ---
2022-03-30 12:56:52,475 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                Entity        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                 micro        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]                 macro        90.58        92.91        91.73         2991
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  --- NER on Classification ---
2022-03-30 12:56:52,496 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   PER        97.09        92.92        94.96         1724
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   LOC        76.47        73.58        75.00           53
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   WEA        95.65        88.00        91.67           50
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   GPE        92.93        87.65        90.22          405
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   ORG        93.85        81.64        87.32          523
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   VEH       100.00        77.23        87.15          101
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                   FAC        89.74        77.21        83.00          136
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]  
2022-03-30 12:56:52,516 [MainThread  ] [INFO ]                 micro        95.36        88.57        91.84         2992
2022-03-30 12:56:52,517 [MainThread  ] [INFO ]                 macro        92.25        82.61        87.05         2992
  • ACE04
2021-11-15 22:06:50,896 [MainThread  ] [INFO ]  --- NER ---
2021-11-15 22:06:50,896 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   VEH        88.89        94.12        91.43           17
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   WEA        74.07        62.50        67.80           32
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   GPE        89.11        87.62        88.36          719
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   ORG        85.06        84.60        84.83          552
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   FAC        83.15        66.07        73.63          112
2021-11-15 22:06:50,932 [MainThread  ] [INFO ]                   PER        91.09        92.12        91.60         1498
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                   LOC        72.90        74.29        73.58          105
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                 micro        88.48        87.81        88.14         3035
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]                 macro        83.47        80.19        81.61         3035
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  --- NER on Localization ---
2021-11-15 22:06:50,933 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                Entity        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                 micro        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]                 macro        92.56        91.89        92.23         3034
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,954 [MainThread  ] [INFO ]  --- NER on Classification ---
2021-11-15 22:06:50,955 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                  type    precision       recall     f1-score      support
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   VEH        94.12        94.12        94.12           17
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   WEA        95.24        62.50        75.47           32
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   GPE        95.60        87.62        91.44          719
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   ORG        93.59        84.60        88.87          552
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   FAC        93.67        66.07        77.49          112
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   PER        97.11        92.12        94.55         1498
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                   LOC        84.78        74.29        79.19          105
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]  
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                 micro        95.59        87.81        91.53         3035
2021-11-15 22:06:50,976 [MainThread  ] [INFO ]                 macro        93.44        80.19        85.87         3035

Citation

If you have any questions related to the code or the paper, feel free to email [email protected].

@inproceedings{shen-etal-2022-piqn,
    title = "Parallel Instance Query Network for Named Entity Recognition",
    author = "Shen, Yongliang  and
      Wang, Xiaobin  and
      Tan, Zeqi  and
      Xu, Guangwei  and
      Xie, Pengjun  and
      Huang, Fei and
      Lu, Weiming and
      Zhuang, Yueting",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2203.10545",
}
Owner
Yongliang Shen
Knowledge is power.
Yongliang Shen
Command Line Text-To-Speech using Google TTS

cli-tts Thanks to gTTS by @pndurette! This is an interactive command line text-to-speech tool using Google TTS. Just type text and the voice will be p

ReekyStive 3 Nov 11, 2022
Python SDK for working with Voicegain Speech-to-Text

Voicegain Speech-to-Text Python SDK Python SDK for the Voicegain Speech-to-Text API. This API allows for large vocabulary speech-to-text transcription

Voicegain 3 Dec 14, 2022
A minimal Conformer ASR implementation adapted from ESPnet.

Conformer ASR A minimal Conformer ASR implementation adapted from ESPnet. Introduction I want to use the pre-trained English ASR model provided by ESP

Niu Zhe 3 Jan 24, 2022
This repository contains (not all) code from my project on Named Entity Recognition in philosophical text

NERphilosophy 👋 Welcome to the github repository of my BsC thesis. This repository contains (not all) code from my project on Named Entity Recognitio

Ruben 1 Jan 27, 2022
A versatile token stream for handwritten parsers.

Writing recursive-descent parsers by hand can be quite elegant but it's often a bit more verbose than expected, especially when it comes to handling indentation and reporting proper syntax errors. Th

Valentin Berlier 8 Nov 30, 2022
Auto-researching tool generating word documents.

About ResearchTE automates researching by generating document with answers to given questions. Supports getting results from: Google DuckDuckGo (with

1 Feb 14, 2022
✔👉A Centralized WebApp to Ensure Road Safety by checking on with the activities of the driver and activating label generator using NLP.

AI-For-Road-Safety Challenge hosted by Omdena Hyderabad Chapter Original Repo Link : https://github.com/OmdenaAI/omdena-india-roadsafety Final Present

Prathima Kadari 7 Nov 29, 2022
Snowball compiler and stemming algorithms

Snowball is a small string processing language for creating stemming algorithms for use in Information Retrieval, plus a collection of stemming algori

Snowball Stemming language and algorithms 613 Jan 07, 2023
This simple Python program calculates a love score based on your and your crush's full names in English

This simple Python program calculates a love score based on your and your crush's full names in English. There is no logic or reason in the calculation behind the love score. The calculation could ha

p.katekomol 1 Jan 24, 2022
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
Unsupervised Language Model Pre-training for French

FlauBERT and FLUE FlauBERT is a French BERT trained on a very large and heterogeneous French corpus. Models of different sizes are trained using the n

GETALP 212 Dec 10, 2022
Kerberoast with ACL abuse capabilities

targetedKerberoast targetedKerberoast is a Python script that can, like many others (e.g. GetUserSPNs.py), print "kerberoast" hashes for user accounts

Shutdown 213 Dec 22, 2022
中文問句產生器;使用台達電閱讀理解資料集(DRCD)

Transformer QG on DRCD The inputs of the model refers to we integrate C and A into a new C' in the following form. C' = [c1, c2, ..., [HL], a1, ..., a

Philip 1 Oct 22, 2021
Stuff related to Ben Eater's 8bit breadboard computer

8bit breadboard computer simulator This is an assembler + simulator/emulator of Ben Eater's 8bit breadboard computer. For a version with its RAM upgra

Marijn van Vliet 29 Dec 29, 2022
Beyond Accuracy: Behavioral Testing of NLP models with CheckList

CheckList This repository contains code for testing NLP Models as described in the following paper: Beyond Accuracy: Behavioral Testing of NLP models

Marco Tulio Correia Ribeiro 1.8k Dec 28, 2022
Unofficial Python library for using the Polish Wordnet (plWordNet / Słowosieć)

Polish Wordnet Python library Simple, easy-to-use and reasonably fast library for using the Słowosieć (also known as PlWordNet) - a lexico-semantic da

Max Adamski 12 Dec 23, 2022
ACL'22: Structured Pruning Learns Compact and Accurate Models

☕ CoFiPruning: Structured Pruning Learns Compact and Accurate Models This repository contains the code and pruned models for our ACL'22 paper Structur

Princeton Natural Language Processing 130 Jan 04, 2023
An Explainable Leaderboard for NLP

ExplainaBoard: An Explainable Leaderboard for NLP Introduction | Website | Download | Backend | Paper | Video | Bib Introduction ExplainaBoard is an i

NeuLab 319 Dec 20, 2022
DAGAN - Dual Attention GANs for Semantic Image Synthesis

Contents Semantic Image Synthesis with DAGAN Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Evalu

Hao Tang 104 Oct 08, 2022
IEEEXtreme15.0 Questions And Answers

IEEEXtreme15.0 Questions And Answers IEEEXtreme is a global challenge in which teams of IEEE Student members – advised and proctored by an IEEE member

Dilan Perera 15 Oct 24, 2022