NLP Algorithms

Description

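This algorithm repository covers five mainstream NLP tasks: text classification, sequence labeling, relation extraction, text matching, and text similarity matching. It includes 22 related model implementations.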
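As an illustration of one of these tasks, sequence labeling models (such as the CRF and LSTM-CRF models in Sequence_Labeling) output one tag per token, and those tags are then decoded into entity spans. The following is a minimal sketch, not code from this repository. It assumes the BIO tagging scheme (the repository does not state which scheme it uses), and the function name `bio_to_spans` is hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, str, int, int]  # (entity text, entity type, start, end exclusive)


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into entity spans (hypothetical helper).

    An I-X tag that does not continue an open X entity is treated as the
    start of a new X entity, which is a common lenient convention.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    ent_type: Optional[str] = None

    # Append a sentinel "O" so the last open entity is always flushed
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and start is not None and label == ent_type
        if not continues and start is not None:
            # Close the currently open entity before handling this tag
            spans.append(("".join(tokens[start:i]), ent_type, start, i))
            start, ent_type = None, None
        if prefix in ("B", "I") and not continues:
            # Open a new entity (B-X, or an I-X with no matching open entity)
            start, ent_type = i, label
    return spans


tokens = list("张三在北京工作")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
print(bio_to_spans(tokens, tags))
```

Character-level tokens are used in the example because the repository works with Chinese text. The span-based output makes it straightforward to compare predicted and gold entities during evaluation.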

Project Structure

File Structure

all_models
├── Base_line
│   ├── __init__.py
│   ├── base_data_process.py
│   ├── base_evaluation.py
│   └── single_tokenizer.py
│
├── Texts_Classification
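│   ├── 机器学习_文本分类     (text classification with classical machine learning)
│   ├── fasttext_文本分类     (fastText text classification)
│   ├── textcnn_文本分类      (TextCNN text classification)
│   ├── lstm_文本分类         (LSTM text classification)
│   ├── han_文本分类          (HAN text classification)
│   ├── bert_文本分类         (BERT text classification)
│   └── 数据准备              (data preparation)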
│
├── Sequence_Labeling
│   ├── crf_suite
│   ├── lstm_crf
│   ├── bert_lstm_crf
│   ├── bert_mrc
│   └── 数据准备              (data preparation)
│
├── Relation_Extraction
│   ├── CasRel
│   ├── multihead_joint_extraction
│   ├── R-bert_relation_recognition
│   ├── attention_lstm_relation_recognition
│   ├── attention_lstm_relation_recognition_for_single_sentence
│   ├── tagging_scheme_joint_extraction
│   ├── entity_extraction_bert_lstm_crf
│   └── 数据准备              (data preparation)
│
├── Text_Matching
│   ├── DSSM
│   ├── ARC-II
│   ├── ESIM
│   ├── bert
│   └── 数据准备              (data preparation)
│
├── Text_Similarity_Matching
│   ├── tfidf
│   ├── BM25
│   ├── pysparnn
│   └── commodity_title.txt
│
├── 记录                      (notes)
├── .gitignore
└── README.md
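The Text_Similarity_Matching folder lists tfidf, BM25 and pysparnn approaches over commodity_title.txt. Below is a minimal pure-Python sketch of Okapi BM25 ranking, not the repository's implementation. The character-level tokenization, the k1=1.5 and b=0.75 defaults, the Lucene-style IDF variant, the class name `BM25`, and the sample titles are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


class BM25:
    """Minimal Okapi BM25 ranker over pre-tokenized documents (illustrative sketch)."""

    def __init__(self, docs: List[List[str]], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(d) for d in docs]          # term frequencies per document
        self.lens = [len(d) for d in docs]             # document lengths
        self.avgdl = sum(self.lens) / len(docs)        # average document length
        df = Counter(t for d in docs for t in set(d))  # document frequency per term
        n = len(docs)
        # Lucene-style IDF, which stays non-negative even for very common terms
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: List[str], i: int) -> float:
        """BM25 score of document i for the given query tokens."""
        tf, dl = self.tfs[i], self.lens[i]
        s = 0.0
        for t in query:
            if t not in tf:
                continue  # terms absent from the document contribute nothing
            f = tf[t]
            # Length normalization: longer-than-average documents are penalized
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return s

    def top_k(self, query: List[str], k: int = 3) -> List[int]:
        """Indices of the k highest-scoring documents, best first."""
        scores = [self.score(query, i) for i in range(len(self.tfs))]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical commodity titles, tokenized character by character
titles = ["苹果手机壳", "华为手机充电器", "苹果笔记本电脑"]
bm25 = BM25([list(t) for t in titles])
print([titles[i] for i in bm25.top_k(list("苹果手机"), k=2)])
```

Character-level tokens avoid a word-segmentation dependency for Chinese titles; a real pipeline would more likely plug in a word tokenizer (the repository's Base_line/single_tokenizer.py may serve that role, though this is not confirmed here).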

Basic NLP tasks

Owner

zuxinqi
