Negative sampling for solving the unlabeled entity problem in NER. ICLR-2021 paper: Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition.

Last update: Dec 29, 2022

Negative Sampling for NER

The unlabeled entity problem, where some entities in the training data are left unannotated and are therefore treated as non-entities, is prevalent in many NER scenarios (e.g., weakly supervised NER). Our ICLR-2021 paper proposes using negative sampling to address this issue. This repo contains the implementation of our approach.
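The general idea of span-level negative sampling can be sketched as follows. This is an illustration under stated assumptions, not this repo's implementation. It assumes a span-based model that classifies every (start, end) span. Labeled entities are positives. Instead of treating every remaining span as a negative, which would penalize missed entities, only a small uniformly drawn subset of unlabeled spans is used as negatives. The function names and the fixed `num_negatives` count are hypothetical; the sampling size used in the paper may be chosen differently.

```python
import random
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end inclusive

def enumerate_spans(n_tokens: int) -> List[Span]:
    """All candidate spans (i, j) with 0 <= i <= j < n_tokens."""
    return [(i, j) for i in range(n_tokens) for j in range(i, n_tokens)]

def sample_negatives(n_tokens: int, labeled: Set[Span], num_negatives: int,
                     rng: random.Random) -> List[Span]:
    """Uniformly draw up to num_negatives spans that carry no entity label.

    In this sketch, a span-based model would be trained on the labeled spans
    (gold labels) plus these sampled spans ("non-entity" label) only; all other
    unlabeled spans contribute no loss, so a missed entity is penalized only
    if it happens to be sampled.
    """
    candidates = [s for s in enumerate_spans(n_tokens) if s not in labeled]
    return rng.sample(candidates, min(num_negatives, len(candidates)))

# Second sentence of the dataset example: 7 tokens, two ORG entities.
entities = [(0, 0, 'ORG'), (4, 4, 'ORG')]
labeled = {(s, e) for s, e, _ in entities}
negatives = sample_negatives(7, labeled, num_negatives=3, rng=random.Random(0))
print(negatives)
```

Sampling uniformly over all non-entity spans keeps the sketch simple; a real training loop would redraw the negatives every epoch so different unlabeled spans are used over time.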

Note that this is not an officially supported Tencent product.

Preparation

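Preparation takes two steps. First, convert the NER data into the format below and place it in a new folder named "dataset". The folder contains {train, dev, test}.json. Each JSON file is a list of dicts, where "sentence" holds the token list and "labeled entities" holds (start, end, label) triples with inclusive token indices, both stored as strings. For example: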

[
 {
  "sentence": "['Somerset', '83', 'and', '174', '(', 'P.', 'Simmons', '4-38', ')', ',', 'Leicestershire', '296', '.']",
  "labeled entities": "[(0, 0, 'ORG'), (5, 6, 'PER'), (10, 10, 'ORG')]"
 },
 {
  "sentence": "['Leicestershire', '22', 'points', ',', 'Somerset', '4', '.']",
  "labeled entities": "[(0, 0, 'ORG'), (4, 4, 'ORG')]"
 }
]
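If your data is BIO-tagged (an assumption; the README does not say which source format you start from), a minimal conversion sketch into the format above might look like this. Spans use an inclusive end index, as in (5, 6, 'PER') for "P. Simmons", and both fields are stored as stringified Python literals. The helper names are hypothetical.

```python
import ast
import json
from typing import Dict, List, Optional, Tuple

Entity = Tuple[int, int, str]  # (start, end, label); end index is inclusive

def bio_to_entities(tags: List[str]) -> List[Entity]:
    """Convert BIO tags such as 'B-PER', 'I-PER', 'O' into inclusive spans."""
    entities: List[Entity] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the current entity continues
        if label is not None:
            entities.append((start, i - 1, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):  # a stray I- tag opens a new entity
            start, label = i, tag[2:]
    return entities

def to_record(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Build one dict in the dataset format: both values are stringified literals."""
    return {"sentence": str(tokens), "labeled entities": str(bio_to_entities(tags))}

def parse_record(record: Dict[str, str]) -> Tuple[List[str], List[Entity]]:
    """Parse a record back with ast.literal_eval and check span bounds."""
    tokens = ast.literal_eval(record["sentence"])
    entities = ast.literal_eval(record["labeled entities"])
    for start, end, _ in entities:
        if not 0 <= start <= end < len(tokens):
            raise ValueError(f"span ({start}, {end}) out of range")
    return tokens, entities

def write_split(records: List[Dict[str, str]], path: str) -> None:
    """Write one split, e.g. dataset/train.json, as a JSON list of dicts."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=1)
```

For example, `to_record(['Leicestershire', '22', 'points', ',', 'Somerset', '4', '.'], ['B-ORG', 'O', 'O', 'O', 'B-ORG', 'O', 'O'])` reproduces the second entry above. `ast.literal_eval` is used instead of `eval` so reading the files cannot execute arbitrary code.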

Second, prepare the pretrained LM (i.e., BERT) and the evaluation script. Create a directory named "resource" and arrange them as follows:

resource
- bert-base-cased
  - model.pt
  - vocab.txt
- conlleval.pl

Note that the files extracted from BERT.tar.gz must be renamed to match the layout above.
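A quick pre-flight check can catch a misnamed file before training starts. The sketch below only encodes the paths listed in this README (dataset/{train, dev, test}.json and the resource tree); the constant and function names are hypothetical, and main.py may require more than this.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the layouts described in this README.
REQUIRED_RESOURCES = [
    "bert-base-cased/model.pt",
    "bert-base-cased/vocab.txt",
    "conlleval.pl",
]
REQUIRED_DATA = ["train.json", "dev.json", "test.json"]

def missing_files(root: str, required: List[str]) -> List[str]:
    """Return the entries of `required` that are not regular files under `root`."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).is_file()]
```

For example, `missing_files("resource", REQUIRED_RESOURCES) + missing_files("dataset", REQUIRED_DATA)` should return an empty list once both steps are done.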

Training and Test

CUDA_VISIBLE_DEVICES=0 python main.py -dd dataset -cd save -rd resource
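To launch the same run from a Python script (e.g. when sweeping over GPUs or datasets), a sketch using `subprocess` might look like this. It only mirrors the flags in the command above (-dd, -cd, -rd). Judging by their values they take the dataset, save, and resource directories, but that is an inference; their exact meaning is defined in main.py. The function names are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

def build_train_command(gpu: str = "0", data_dir: str = "dataset",
                        save_dir: str = "save", resource_dir: str = "resource"
                        ) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) equivalent to the shell command above."""
    # sys.executable runs main.py with the current interpreter instead of
    # whatever "python" resolves to on PATH.
    argv = [sys.executable, "main.py", "-dd", data_dir, "-cd", save_dir, "-rd", resource_dir]
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)  # restrict visible GPUs
    return argv, env

def launch_training(**kwargs: str) -> None:
    """Run main.py and raise CalledProcessError if it exits with an error."""
    argv, env = build_train_command(**kwargs)
    subprocess.run(argv, env=env, check=True)
```

Calling `launch_training(gpu="1")` from the repo root would then correspond to the command above with `CUDA_VISIBLE_DEVICES=1`.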

Citation

@inproceedings{li2021empirical,
    title={Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition},
    author={Yangming Li and Lemao Liu and Shuming Shi},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=5jRVa89sZk}
}

Owner

Yangming Li