NLP subtasks I taught myself during my master's degree, shared as a learning reference.

Last update: May 31, 2022

Overview

NLP_Chinese_down_stream_task


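Task 1: Short text classification

(1). Dataset: THUCNews Chinese text dataset (10 classes)

(2). Model: BERT + FC/LSTM, implemented in PyTorch

(3). Usage:

The pretrained model is Chinese BERT-WWM (download: https://github.com/ymcui/Chinese-BERT-wwm). After downloading and extracting it, place it in the [bert_pretrain] folder, then run "main.py".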
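
Before running "main.py", it can help to confirm that the extracted model actually landed in the [bert_pretrain] folder. The following is a minimal sketch; the helper name and the three file names are assumptions (they are not taken from this repository), so adjust them to whatever the downloaded archive contains.

```python
from pathlib import Path

# Assumed file names, not confirmed by the repository; edit to match the archive.
ASSUMED_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")


def missing_pretrained_files(folder="bert_pretrain", required=ASSUMED_FILES):
    """Return the names in `required` that are not present in `folder`."""
    root = Path(folder)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_pretrained_files()
    if missing:
        print("Missing from bert_pretrain:", ", ".join(missing))
    else:
        print("Pretrained model files found; main.py can be run.")
```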

(4). Training results:

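Task 2: Named entity recognition (NER)

(1). Dataset: china-people-daily-ner-corpus (People's Daily NER corpus)

(2). Model: BiLSTM + CRF, Tensorflow_cpu >= 2.1

The model uses 100-dimensional word vectors pretrained on Chinese Wikipedia. Running main.py is all that is needed.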
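
To illustrate what the CRF layer adds on top of the BiLSTM, here is a minimal pure-Python sketch of Viterbi decoding, the standard way to pick the best tag sequence from per-token scores plus tag-to-tag transition scores. It is not the repository's code: the function name, the three-tag set, and all scores below are made up for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for one sentence.

    emissions[t][j]   : score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    """
    num_tags = len(transitions)
    # scores[j] = best score of any path that ends in tag j at the current position
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            step_back.append(best_prev)
            step_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = step_scores
        backpointers.append(step_back)
    # Trace back from the best final tag to recover the full path.
    last = max(range(num_tags), key=lambda j: scores[j])
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path, scores[last]


# Illustrative tag set and scores (hypothetical values).
TAGS = ["O", "B-PER", "I-PER"]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O: jumping straight to I-PER is heavily penalised
    [0.0, 0.0, 1.0],    # from B-PER
    [0.0, 0.0, 1.0],    # from I-PER
]
EMISSIONS = [[1.0, 0.8, 0.0], [0.0, 0.0, 1.0]]

path, score = viterbi_decode(EMISSIONS, TRANSITIONS)
print([TAGS[i] for i in path])
```

Taking the highest-scoring tag at each token independently would give `O, I-PER` here, which is not a valid BIO sequence; the transition penalty steers the decoder to `B-PER, I-PER` instead.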

(3). Training results:

(4). F1-score results:
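
For reference, entity-level F1 on BIO-tagged output is commonly computed by comparing whole entity spans rather than individual tags. The sketch below assumes BIO tags and strict span matching; it is an illustration with hypothetical function names, not necessarily the metric code this repository uses.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence; end is exclusive.

    A stray I- tag with no matching open entity is ignored (strict reading).
    """
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                entities.add((start, i, etype))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return entities


def entity_f1(gold_tags, pred_tags):
    """Return (precision, recall, f1) over exactly matching entity spans."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
print(entity_f1(gold, pred))  # one of two predicted entities matches the gold spans
```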

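Task 3: Text matching (semantic textual similarity)

(1). Dataset: fake-news-pair-classification-challenge (a Kaggle competition on classifying pairs of fake news headlines; the labels cover three relations: 'unrelated', 'agreed', 'disagreed')

(2). Model: Siamese LSTM + any text similarity matching method, Tensorflow_cpu >= 2.1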
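
In a Siamese setup, both headlines pass through the same LSTM encoder and the two resulting vectors are then compared. As a rough sketch of the "any similarity method" part, here are two common choices in plain Python. The function names and example vectors are illustrative assumptions, and the repository may combine the encodings differently, for example by feeding them to a classifier for the three labels.

```python
import math


def manhattan_similarity(u, v):
    """exp(-L1 distance): 1.0 for identical vectors, approaching 0 as they diverge."""
    return math.exp(-sum(abs(a - b) for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Stand-ins for the two LSTM sentence encodings (hypothetical values).
title_a = [0.2, 0.7, -0.1]
title_b = [0.1, 0.6, 0.0]
print(manhattan_similarity(title_a, title_b))
print(cosine_similarity(title_a, title_b))
```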

(3). Usage:

Just run "main.py".
