CoNLL-English NER Task

Motivation

Course project
Review the PyTorch framework and the sequence-labeling task
Practice using the Hugging Face Transformers library

Dataset Introduction

The data directory contains a training set, a validation set, and a test set.

-DOCSTART- -X- O O
-sentence- -pos- -chunk- -entity-
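
The column layout above can be read with a few lines of Python. This is a minimal sketch, not the code in dataTool.py: it assumes four whitespace-separated columns per token, a blank line between sentences, and that only the first (token) and last (entity) columns are needed. The name `read_conll` and the sample rows are illustrative.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, entity tag) pairs


def read_conll(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences as lists of (token, entity tag) pairs."""
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        # A blank line or a -DOCSTART- marker closes the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        # First column is the token, last column is the entity tag.
        sentence.append((columns[0], columns[-1]))
    if sentence:
        yield sentence


# Illustrative rows in the assumed four-column layout.
sample = """-DOCSTART- -X- O O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

sentences = list(read_conll(sample))
```

Each yielded sentence can then be split into a token list and a tag list before tokenization.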

Project Structure

-data  # source data
-emb # BERT model files

-util
    -dataTool.py  # data interface
    -model.py
    -trainer.py  # train and evaluate

config.py  # parameters in the project
run.py
requirement.txt

EDA.ipynb # exploratory data analysis,
          # used to settle the hyperparameters for the experiments

Coding Pattern

To keep experiments simple and convenient,
the model is decoupled into two units: an encoder and a tagger.

model ==> encoder + tagger

The encoder extracts contextual and linguistic features,
which the tagger receives and turns into BIO tags.
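
The split might look roughly like the PyTorch sketch below. It is not the code in model.py: `ToyEncoder`, `SoftmaxTagger`, and `NerModel` are hypothetical names, an embedding layer stands in for BERT so the example runs without downloading weights, and the 9 output tags assume a CoNLL-2003 style BIO scheme (O plus B-/I- for four entity types).

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Stand-in for a BERT encoder: maps token ids to feature vectors."""

    def __init__(self, vocab_size: int, hidden_size: int) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(input_ids)  # (batch, seq_len, hidden_size)


class SoftmaxTagger(nn.Module):
    """Per-token linear classifier; a CRF layer could take its place."""

    def __init__(self, hidden_size: int, num_tags: int) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_size, num_tags)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.proj(features)  # logits: (batch, seq_len, num_tags)


class NerModel(nn.Module):
    """model ==> encoder + tagger"""

    def __init__(self, encoder: nn.Module, tagger: nn.Module) -> None:
        super().__init__()
        self.encoder = encoder
        self.tagger = tagger

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.tagger(self.encoder(input_ids))


model = NerModel(ToyEncoder(vocab_size=100, hidden_size=16),
                 SoftmaxTagger(hidden_size=16, num_tags=9))
input_ids = torch.randint(0, 100, (2, 5))  # 2 sentences, 5 tokens each
logits = model(input_ids)
predicted = logits.argmax(dim=-1)  # one tag index per token
```

Because the two units only share the feature tensor, swapping the tagger (softmax vs. CRF) or the encoder (BERT vs. BERT-BiLSTM) does not touch the other half.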

Usage

chmod 755 deploy
./deploy

./gpu n  # monitor the GPU (refresh every n seconds)
./run  # start

Baseline Performance (1 epoch, macro-averaged)

Model	Precision	Recall	F1
Bert-CRF	0.71	0.68	0.69
Bert-softmax	-	-	-
Bert-BiLSTM-CRF	-	-	-
Bert-BiLSTM-softmax	-	-	-
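
For reference, macro averaging computes each metric per class and then takes the unweighted mean, so rare tags count as much as frequent ones. The sketch below assumes token-level scoring over tag labels and takes macro F1 as the mean of per-class F1; the evaluation in trainer.py may differ (for example, entity-level scoring). `macro_prf` is a hypothetical helper name.

```python
from typing import List, Tuple


def macro_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return macro-averaged (precision, recall, F1) over all seen labels."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0, 0.0, 0.0
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # Undefined ratios (no predictions or no gold items) count as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


gold = ["O", "B-PER", "I-PER", "O", "B-LOC"]
pred = ["O", "B-PER", "O", "O", "B-LOC"]
precision, recall, f1 = macro_prf(gold, pred)
```

In this toy example the single missed I-PER token pulls all three macro scores down sharply, which is why a few poorly learned rare classes can dominate the macro numbers.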

Optimization

Cost-sensitive learning, or dropping the rare classes
Dropout to improve generalization
Different backbone architectures
DDP (DistributedDataParallel) training --> more GPU memory for a larger batch_size
More epochs --> schedule the learning rate dynamically during training
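
Two of these ideas can be sketched in plain Python. Both functions are hypothetical helpers, not part of this project: the first derives inverse-frequency class weights (one common heuristic for cost-sensitive learning), the second returns a learning-rate multiplier with linear warmup followed by linear decay. The weights could be passed as a tensor to the `weight` argument of `torch.nn.CrossEntropyLoss`, and the multiplier to `torch.optim.lr_scheduler.LambdaLR`.

```python
from collections import Counter
from typing import Dict, List


def inverse_frequency_weights(tags: List[str]) -> Dict[str, float]:
    """Weight each class by total / (num_classes * class_count)."""
    counts = Counter(tags)
    total = len(tags)
    num_classes = len(counts)
    return {tag: total / (num_classes * count) for tag, count in counts.items()}


def linear_warmup_decay(step: int, total_steps: int, warmup_steps: int) -> float:
    """Multiplier that rises linearly to 1.0, then falls linearly to 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Toy label distribution: the frequent "O" tag gets a small weight.
weights = inverse_frequency_weights(["O"] * 8 + ["B-PER"] * 2)

# Multipliers at a few points of a 100-step run with 10 warmup steps.
schedule = [linear_warmup_decay(s, total_steps=100, warmup_steps=10)
            for s in (0, 5, 10, 55, 100)]
```

Whether weighting helps here depends on how skewed the tag distribution is, which the EDA notebook would be the natural place to check.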

Owner

Kevin
