Natural Language Processing with transformers

Last update: Dec 27, 2022

Related tags

Text Data & NLP nlp transformer bert

Overview

An Introduction to Natural Language Processing (NLP) with transformers

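Natural Language Processing with transformers. This project is intended for readers who:

Are new to NLP or new to transformers
Have some Python and PyTorch programming experience
Are interested in cutting-edge transformer models
Are familiar with simple deep learning models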

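The vision of this project is:

By combining vivid explanations of the underlying principles with several hands-on projects, we hope to help beginners quickly get started with NLP in the deep learning era.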

The main references for this project are:

The Huggingface/Transformers code repository
Several excellent explanations and talks on the Transformer

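Project members:

erenup (多多笔记), Peking University, project lead
张帆, Datawhale, Tianjin University, Chapter 4
张贤, Harbin Institute of Technology, Chapter 2
李泺秋, Zhejiang University, Chapter 3
蔡杰, Peking University, Chapter 4
hlzhang, McGill University, Chapter 4
台运鹏, Chapter 2
张红旭, Chapter 2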

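This project summarizes and draws on many excellent documents and talks, and each chapter credits its sources. If anything here infringes your rights, please contact the project members promptly. Starring the project on GitHub before you start will get you twice the result with half the effort 😄. Thank you.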

Project contents

Chapter 1 - Preface

Chapter 2 - Principles behind the Transformer

Chapter 3 - Writing a Transformer model: BERT

Chapter 4 - Solving NLP tasks with Transformers

Owner

Datawhale

for the learner, growing together with learners

GitHub Repository https://datawhalechina.github.io/learn-nlp-with-transformers

Experiments in converting wikidata to ftm

FollowTheMoney / Wikidata mappings This repo will contain tools for converting Wikidata entities into FtM schema. Prefixes: https://www.mediawiki.org/

2 Nov 12, 2021

Random Directed Acyclic Graph Generator

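DAG_Generator Random Directed Acyclic Graph Generator version 1.0 Introduction: Workflows are usually defined by a DAG (directed acyclic graph), in which each computational task $T_i$ is represented by a vertex (node, task, vertex). Meanwhile, each data or control dependency between tasks is represented by a weighted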

17 Dec 27, 2022

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Google Text-To-Speech Batch Prompt File Maker Are you in need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pr

1 Aug 19, 2021

WikiPron - a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary

WikiPron WikiPron is a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary, as well as a database of pronuncia

213 Jan 01, 2023

Leon is an open-source personal assistant who can live on your server.

Leon Your open-source personal assistant. Website :: Documentation :: Roadmap :: Contributing :: Story 👋 Introduction Leon is an open-source personal

11.7k Dec 30, 2022

Rootski - Full codebase for rootski.io (without the data)

📣 Welcome to the Rootski codebase! This is the codebase for the application run

20 Nov 18, 2022

Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Neural G2P for the Portuguese language Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written for

11 Nov 16, 2022

A benchmark for evaluation and comparison of various NLP tasks in Persian language.

Persian NLP Benchmark The repository aims to track existing natural language processing models and evaluate their performance on well-known datasets.

68 Dec 19, 2022

State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Trapper (Transformers wRAPPER) Trapper is an NLP library that aims to make it easier to train transformer based models on downstream tasks. It wraps h

42 Sep 21, 2022

ETM - R package for Topic Modelling in Embedding Spaces

ETM - R package for Topic Modelling in Embedding Spaces This repository contains an R package called topicmodels.etm which is an implementation of ETM

37 Nov 06, 2022

Contact Extraction with Question Answering.

contactsQA Extraction of contact entities from address blocks and imprints with Extractive Question Answering. Goal Input: Dr. Max Mustermann Hauptstr

2 Apr 20, 2022

Ecommerce product title recognition package

revizor This package solves the task of splitting a product title string into components, like type, brand, model and article (or SKU or product code or you

16 Mar 03, 2022

ASCEND Chinese-English code-switching dataset

ASCEND (A Spontaneous Chinese-English Dataset) introduces a high-quality resource of spontaneous multi-turn conversational dialogue Chinese-English code-switching corpus collected in Hong Kong.

11 Dec 09, 2022

Various capabilities for static malware analysis.

Malchive The malchive serves as a compendium for a variety of capabilities mainly pertaining to malware analysis, such as scripts supporting day to da

64 Nov 22, 2022

Journey is an NLP-powered developer assistant

Journey Journey is an NLP-powered developer assistant. Built on the powerful Natural Language Processing library Mindmeld, this project aims to assist

21 Dec 11, 2022

Code for ACL 2020 paper "Rigid Formats Controlled Text Generation"

SongNet SongNet: SongCi + Song (Lyrics) + Sonnet + etc. @inproceedings{li-etal-2020-rigid, title = "Rigid Formats Controlled Text Generation",

212 Dec 17, 2022

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL 2021.

XL-Sum This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Lang

189 Jan 02, 2023

Predict the spans of toxic posts that were responsible for the toxic label of the posts

toxic-spans-detection An attempt at the SemEval 2021 Task 5: Toxic Spans Detection. The Toxic Spans Detection task of SemEval2021 required participant

3 Jul 24, 2022

Some embedding layer implementations using the ivy library

ivy-manual-embeddings Some embedding layer implementations using the ivy library. Just for fun. It is based on the NYCTaxiFare dataset from Kaggle (cut down to

2 Feb 10, 2022

Mastering Transformers, published by Packt

Mastering Transformers This is the code repository for Mastering Transformers, published by Packt. Build state-of-the-art models from scratch with adv

195 Jan 01, 2023