Chinese Grammatical Error Diagnosis

Overview

nlp-CGED

Chinese Grammatical Error Diagnosis
中文语法纠错研究 基于序列标注的方法

所需环境

Python==3.6
tensorflow==1.14.0
keras==2.3.1
bert4keras==0.10.6
笔者使用了开源的bert4keras,一个keras版的transformer模型库。bert4keras的更多介绍参见这里

项目目录

├── bert4keras
├── data    存放数据
├── pretrained_model    存放预训练模型
├── models    存放CRF等算法
├── CGED_train.py    训练代码
├── CGED_predict.py    评估和测试代码

数据集

数据集采用的CGED官方提供 转换为序列标注的形式,具体可以看data中的数据

使用说明

1.下载预训练语言模型
  可采用BERT-Base, Chinese等模型
  更多的预训练语言模型可参见bert4keras给出的权重。
2.构建数据集(数据集已处理好)
  train.json和test.json
3.训练模型

python CGED_train.py

4.评估和测试

python CGED_predict.py

结果

数据集 f1 precision recall
test 0.46373 0.48993 0.44019

有任何问题欢迎私聊

A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.

Emily's Symbol Dictionary Design This dictionary was created with the following goals in mind: Have a consistent method to type (pretty much) every sy

Emily 68 Jan 07, 2023
PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

data2vec-pytorch PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI (F

Aryan Shekarlaban 105 Jan 04, 2023
A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.

Multilingual Latent Dirichlet Allocation (LDA) Pipeline This project is for text clustering using the Latent Dirichlet Allocation (LDA) algorithm. It

Artifici Online Services inc. 74 Oct 07, 2022
A text augmentation tool for named entity recognition.

neraug This python library helps you with augmenting text data for named entity recognition. Augmentation Example Reference from An Analysis of Simple

Hiroki Nakayama 48 Oct 11, 2022
Code for Text Prior Guided Scene Text Image Super-Resolution

Code for Text Prior Guided Scene Text Image Super-Resolution

82 Dec 26, 2022
A fast hierarchical dimensionality reduction algorithm.

h-NNE: Hierarchical Nearest Neighbor Embedding A fast hierarchical dimensionality reduction algorithm. h-NNE is a general purpose dimensionality reduc

Marios Koulakis 35 Dec 12, 2022
Implementation of "Adversarial purification with Score-based generative models", ICML 2021

Adversarial Purification with Score-based Generative Models by Jongmin Yoon, Sung Ju Hwang, Juho Lee This repository includes the official PyTorch imp

15 Dec 15, 2022
auto_code_complete is a auto word-completetion program which allows you to customize it on your need

auto_code_complete v1.3 purpose and usage auto_code_complete is a auto word-completetion program which allows you to customize it on your needs. the m

RUO 2 Feb 22, 2022
Conversational text Analysis using various NLP techniques

Conversational text Analysis using various NLP techniques

Rita Anjana 159 Jan 06, 2023
NeMo: a toolkit for conversational AI

NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with

NVIDIA Corporation 5.3k Jan 04, 2023
Knowledge Management for Humans using Machine Learning & Tags

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your t

Ravn Tech, Inc. 166 Jan 07, 2023
The first online catalogue for Arabic NLP datasets.

Masader The first online catalogue for Arabic NLP datasets. This catalogue contains 200 datasets with more than 25 metadata annotations for each datas

ARBML 94 Dec 26, 2022
simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 that lets you quickly train your T5 models.

Quickly train T5 models in just 3 lines of code + ONNX support simpleT5 is built on top of PyTorch-lightning ⚡️ and Transformers 🤗 that lets you quic

Shivanand Roy 220 Dec 30, 2022
CoSENT 比Sentence-BERT更有效的句向量方案

CoSENT 比Sentence-BERT更有效的句向量方案

苏剑林(Jianlin Su) 201 Dec 12, 2022
Deep Learning Topics with Computer Vision & NLP

Deep learning Udacity Course Deep Learning Topics with Computer Vision & NLP for the AWS Machine Learning Engineer Nanodegree Program Tasks are mostly

Simona Mircheva 1 Jan 20, 2022
A telegram bot to translate 100+ Languages

🔥 GOOGLE TRANSLATER 🔥 The owner would not be responsible for any kind of bans due to the bot. • ⚡ INSTALLING ⚡ • • 🔰 Deploy To Railway 🔰 • • ✅ OFF

Aɴᴋɪᴛ Kᴜᴍᴀʀ 5 Dec 20, 2021
PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".

LXMERT: Learning Cross-Modality Encoder Representations from Transformers Our servers break again :(. I have updated the links so that they should wor

Hao Tan 838 Dec 19, 2022
Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

Jeong Ukjae 13 Dec 13, 2022
Global Rhythm Style Transfer Without Text Transcriptions

Global Prosody Style Transfer Without Text Transcriptions This repository provides a PyTorch implementation of AutoPST, which enables unsupervised glo

Kaizhi Qian 193 Dec 30, 2022
State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Trapper (Transformers wRAPPER) Trapper is an NLP library that aims to make it easier to train transformer based models on downstream tasks. It wraps h

Open Business Software Solutions 42 Sep 21, 2022