本项目是作者们根据个人面试和经验总结出的自然语言处理(NLP)面试准备的学习笔记与资料,该资料目前包含 自然语言处理各领域的 面试题积累。

Overview

【关于 NLP】那些你不知道的事

作者:杨夕、芙蕖、李玲、陈海顺、twilight、LeoLRH、JimmyDU、艾春辉、张永泰、金金金

介绍

本项目是作者们根据个人面试和经验总结出的自然语言处理(NLP)面试准备的学习笔记与资料,该资料目前包含 自然语言处理各领域的 面试题积累。

目录架构

一、【关于 基础算法篇】那些你不知道的事

二、【关于 机器学习算法篇】那些你不知道的事

三、【关于 深度学习算法篇】那些你不知道的事

四、【关于 NLP 学习算法】那些你不知道的事

4.1 【关于 信息抽取】那些你不知道的事

4.1.1 【关于 命名实体识别】那些你不知道的事
4.1.2 【关于 关系抽取】那些你不知道的事
4.1.3 【关于 事件抽取】那些你不知道的事

4.2 【关于 NLP 预训练算法】那些你不知道的事

4.3 【关于 文本分类】那些你不知道的事

4.4 【关于 文本匹配】那些你不知道的事

4.5 【关于 问答系统】那些你不知道的事

4.5.1 【关于 FAQ 检索式问答系统】 那些你不知道的事
4.5.2 【关于 问答系统工具篇】 那些你不知道的事

4.6 【关于 对话系统】那些你不知道的事

4.7 【关于 知识图谱】那些你不知道的事

4.7.1 【关于 知识图谱】 那些你不知道的事
4.7.2 【关于 KBQA】那些你不知道的事
4.7.3 【关于 Neo4j】那些你不知道的事

4.8 【关于 文本摘要】 那些你不知道的事

4.9 【关于 知识表示学习】那些你不知道的事

五、【关于 NLP 技巧】那些你不知道的事

5.1 【关于 少样本问题】那些你不知道的事

5.2 【关于 脏数据】那些你不知道的事

5.3 【关于 炼丹炉】那些你不知道的事

六、【关于 Python 】那些你不知道的事

七、【关于 Tensorflow 】那些你不知道的事

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Hiroki Nakayama 1.5k Dec 05, 2022
Segmenter - Transformer for Semantic Segmentation

Segmenter - Transformer for Semantic Segmentation

592 Dec 27, 2022
Python package for performing Entity and Text Matching using Deep Learning.

DeepMatcher DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and util

461 Dec 28, 2022
Multilingual Emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL2022).

XLM-EMO: Multilingual Emotion Prediction in Social Media Text Abstract Detecting emotion in text allows social and computational scientists to study h

MilaNLP 35 Sep 17, 2022
DeepPavlov Tutorials

DeepPavlov tutorials DeepPavlov: Sentence Classification with Word Embeddings DeepPavlov: Transfer Learning with BERT. Classification, Tagging, QA, Ze

Neural Networks and Deep Learning lab, MIPT 28 Sep 13, 2022
code for modular summarization work published in ACL2021 by Krishna et al

This repository contains the code for running modular summarization pipelines as described in the publication Krishna K, Khosla K, Bigham J, Lipton ZC

Kundan Krishna 6 Jun 04, 2021
ASCEND Chinese-English code-switching dataset

ASCEND (A Spontaneous Chinese-English Dataset) introduces a high-quality resource of spontaneous multi-turn conversational dialogue Chinese-English code-switching corpus collected in Hong Kong.

CAiRE 11 Dec 09, 2022
PyTorch source code of NAACL 2019 paper "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models"

This repository contains source code for NAACL 2019 paper "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models" (P

Alexandra Chronopoulou 89 Aug 12, 2022
A fast and easy implementation of Transformer with PyTorch.

FasySeq FasySeq is a shorthand as a Fast and easy sequential modeling toolkit. It aims to provide a seq2seq model to researchers and developers, which

宁羽 7 Jul 18, 2022
A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

flair 12.3k Dec 31, 2022
Switch spaces for knowledge graph embeddings

SwisE Switch spaces for knowledge graph embeddings. Requirements: python3 pytorch numpy tqdm Reproduce the results To reproduce the reported results,

Shuai Zhang 4 Dec 01, 2021
The Sudachi synonym dictionary in Solar format.

solr-sudachi-synonyms The Sudachi synonym dictionary in Solar format. Summary Run a script that checks for updates to the Sudachi dictionary every hou

Karibash 3 Aug 19, 2022
source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.

WhiteningBERT Source code and data for paper WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach. Preparation git clone https://github.com

49 Dec 17, 2022
DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time

DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time. While it efficiently searches the answers out of 60 billion phrases in Wikipedia, it is also v

Jinhyuk Lee 543 Jan 08, 2023
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning English | 中文 ❗ Now we provide inferencing code and pre-training models

164 Jan 02, 2023
A simple implementation of N-gram language model.

About A simple implementation of N-gram language model. Requirements numpy Data preparation Corpus Training data for the N-gram model, a text file lik

4 Nov 24, 2021
customer care chatbot made with Rasa Open Source.

Customer Care Bot Customer care bot for ecomm company which can solve faq and chitchat with users, can contact directly to team. 🛠 Features Basic E-c

Dishant Gandhi 23 Oct 27, 2022
Shirt Bot is a discord bot which uses GPT-3 to generate text

SHIRT BOT · Shirt Bot is a discord bot which uses GPT-3 to generate text. Made by Cyclcrclicly#3420 (474183744685604865) on Discord. Support Server EX

31 Oct 31, 2022
小布助手对话短文本语义匹配的一个baseline

oppo-text-match 小布助手对话短文本语义匹配的一个baseline 模型 参考:https://kexue.fm/archives/8213 base版本线下大概0.952,线上0.866(单模型,没做K-flod融合)。 训练 测试环境:tensorflow 1.15 + keras

苏剑林(Jianlin Su) 132 Dec 14, 2022
A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.

Emily's Symbol Dictionary Design This dictionary was created with the following goals in mind: Have a consistent method to type (pretty much) every sy

Emily 68 Jan 07, 2023