nlpcommon

nlpcommon, Python Text Tool. Developed in Python 3.

Guide

Feature
Install
Usage
Dataset
Contact
Cite
Reference

Feature

nlpcommon is an open-source Python toolkit for text classification. Its goal is to implement text analysis algorithms that are ready for use in production environments.

nlpcommon offers clear algorithms, high performance, and customizable corpora.

Functions:

Classifier

Cluster

MiniBatchKmeans

While providing rich functionality, nlpcommon keeps its internal modules loosely coupled, loads models lazily, publishes its dictionaries openly, and aims to be easy to use.
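Lazy loading here means that a model or dictionary is read only when it is first used, not at import time. The sketch below illustrates the general pattern with the standard library; `LazyResource` and `load_demo_stopwords` are hypothetical names invented for this example and are not part of nlpcommon, whose internals may differ.

```python
from functools import cached_property


class LazyResource:
    """Hypothetical wrapper: calls the loader only on first access."""

    def __init__(self, loader):
        self._loader = loader
        self.load_count = 0  # counts real loads, for illustration only

    @cached_property
    def value(self):
        # Runs once; later accesses reuse the cached result.
        self.load_count += 1
        return self._loader()


def load_demo_stopwords():
    # Stand-in for reading a dictionary file from disk.
    return {"the", "of", "and"}


stopwords = LazyResource(load_demo_stopwords)
print(stopwords.load_count)        # nothing has been loaded yet
print("the" in stopwords.value)    # first access triggers the load
print(stopwords.load_count)        # the loader ran exactly once
```

With this pattern, importing a package stays cheap, and the cost of loading a large resource is paid only by code that actually needs it.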

Install

Requirements and Installation

pip3 install nlpcommon

or install from source:

git clone https://github.com/shibing624/nlpcommon.git
cd nlpcommon
python3 setup.py install

Usage

data

Stopwords

examples/base_demo.py:

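import sys

# Make the repository root importable when running from the examples/ directory.
sys.path.append('..')
from nlpcommon import stopwords

if __name__ == '__main__':
    # Print the number of stopwords, followed by the stopword collection itself.
    print(len(stopwords), stopwords)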

output:

2438 {'．', '大家', '孰知', '至于', './', '知道', '二话没说', '一何', '从宽', 'especially' ... }
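A typical use of such a stopword collection is to filter tokens before classification or clustering. The following is a minimal sketch: `remove_stopwords` is a hypothetical helper, not an nlpcommon function, and `demo_stopwords` is a small stand-in set so the example runs without the package installed.

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stopword collection."""
    return [tok for tok in tokens if tok not in stopwords]


# Stand-in set built from a few entries shown in the output above.
# With the package installed, use instead: from nlpcommon import stopwords
demo_stopwords = {"大家", "知道", "至于", "especially"}

tokens = ["大家", "知道", "文本", "分类", "especially", "useful"]
filtered = remove_stopwords(tokens, demo_stopwords)
print(filtered)  # ['文本', '分类', 'useful']
```

Because the lookup uses `in`, passing a set keeps each membership check constant-time on average, which matters for long documents.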

Contact

Issue (suggestions):
Email me: xuming: [email protected]
WeChat: add my WeChat ID xuming624 to join the Python-NLP discussion group, with the note: name-company-NLP

Cite

If you use nlpcommon in your research, please cite it in the following format:

@software{nlpcommon,
  author = {Xu Ming},
  title = {nlpcommon: A Tool for Text NLP},
  year = {2021},
  url = {https://github.com/shibing624/nlpcommon},
}

License

nlpcommon is licensed under The Apache License 2.0 and is free for commercial use. Please include a link to nlpcommon and the license in your product documentation.

Contribute

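The project code is still rough. If you improve it, you are welcome to contribute your changes back to this project. Before submitting, please note the following two points:

Add corresponding unit tests under tests
Run all unit tests with python setup.py test and make sure every test passes

After that, you can submit a PR.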

Reference

pytextclassifier

Owner

xuming
