A paper list of pre-trained language models (PLMs).

PLM papers

Contributed by Xiaolei Wang

Large-scale pre-trained language models (PLMs) such as BERT and GPT have achieved great success and become a milestone in NLP.

In this repo, we collect representative PLM papers from recent years, selected by citation count and by publication in recent top conferences (e.g., ACL, EMNLP, ICLR, ICML, NeurIPS).

We will keep the repo updated and welcome pull requests and issues! Thanks for your stars and forks!

Survey

  1. "Pre-trained models for natural language processing: A survey". Science China Technological Sciences(2020) [PDF]
  2. "Which *BERT? A Survey Organizing Contextualized Encoders". EMNLP(2020) [PDF]
  3. "A Primer in BERTology: What We Know About How BERT Works". TACL(2020) [PDF]
  4. "From static to dynamic word representations: a survey". International Journal of Machine Learning and Cybernetics(2020) [PDF]
  5. "Overview of the Transformer-based Models for NLP Tasks". FedCSIS(2020) [PDF]
  6. "A Survey on Contextual Embeddings". arXiv(2020) [PDF]
  7. "The NLP Cookbook: Modern Recipes for Transformer Based Deep Learning Architectures". IEEE Access(2021) [PDF]
  8. "Pre-Trained Models: Past, Present and Future". arXiv(2021) [PDF]
  9. "Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing". arXiv(2021) [PDF]
  10. "AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing". arXiv(2021) [PDF]
  11. "On the Opportunities and Risks of Foundation Models". arXiv(2021) [PDF]
  12. "Paradigm Shift in Natural Language Processing". arXiv(2021) [PDF]
  13. "Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey". arXiv(2021) [PDF]

Benchmark

  1. XNLI: "XNLI: Evaluating Cross-lingual Sentence Representations". EMNLP(2018) [PDF] [Dataset]
  2. GLUE: "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". ICLR(2019) [Homepage]
  3. SuperGLUE: "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems". NeurIPS(2019) [Homepage]
  4. CLUE: "CLUE: A Chinese Language Understanding Evaluation Benchmark". COLING(2020) [Homepage]
  5. XTREME: "XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization". ICML(2020) [Homepage]
  6. XGLUE: "XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation". EMNLP(2020) [Homepage]
  7. DialoGLUE: "DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue". arXiv(2020) [Homepage]

PLM Design

General

  1. GPT: "Improving Language Understanding by Generative Pre-Training". OpenAI(2018) [Project]
  2. GPT-2: "Language Models are Unsupervised Multitask Learners". OpenAI(2019) [Project]
  3. BERT: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". NAACL(2019) [PDF] [Code]
  4. XLNet: "XLNet: Generalized Autoregressive Pretraining for Language Understanding". NeurIPS(2019) [PDF] [Code]
  5. SBERT: "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks". EMNLP(2019) [PDF] [Code]
  6. UniLM: "Unified Language Model Pre-training for Natural Language Understanding and Generation". NeurIPS(2019) [PDF] [Code]
  7. MASS: "MASS: Masked Sequence to Sequence Pre-training for Language Generation". ICML(2019) [PDF] [Code]
  8. Chinese-BERT-wwm: "Pre-Training with Whole Word Masking for Chinese BERT". arXiv(2019) [PDF] [Code]
  9. "Cloze-driven Pretraining of Self-attention Networks". EMNLP(2019) [PDF]
  10. "BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model". Workshop on Methods for Optimizing and Evaluating Neural Language Generation(2019) [PDF] [Code]
  11. GPT-3: "Language Models are Few-Shot Learners". NeurIPS(2020) [PDF] [Code]
  12. T5: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". JMLR(2020) [PDF] [Code]
  13. BART: "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". ACL(2020) [PDF] [Code]
  14. Poly-encoders: "Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring". ICLR(2020) [PDF]
  15. SpanBERT: "SpanBERT: Improving Pre-training by Representing and Predicting Spans". TACL(2020) [PDF] [Code]
  16. ERNIE 2.0: "ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding". AAAI(2020) [PDF] [Code]
  17. SemBERT: "Semantics-Aware BERT for Language Understanding". AAAI(2020) [PDF] [Code]
  18. "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks". TACL(2020) [PDF] [Code]
  19. ProphetNet: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training". EMNLP(2020) [PDF]
  20. UniLMv2: "UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training". ICML(2020) [PDF] [Code]
  21. MacBERT: "Revisiting Pre-Trained Models for Chinese Natural Language Processing". EMNLP(2020) [PDF] [Code]
  22. MPNet: "MPNet: Masked and Permuted Pre-training for Language Understanding". arXiv(2020) [PDF] [Code]
  23. DeBERTa: "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". ICLR(2021) [PDF] [Code]
  24. PALM: "PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation". EMNLP(2020) [PDF]
  25. Optimus: "Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space". EMNLP(2020) [PDF] [Code]
  26. "Self-training Improves Pre-training for Natural Language Understanding". NAACL(2021) [PDF] [Code]
  27. CAPT: "Rethinking Denoised Auto-Encoding in Language Pre-Training". EMNLP(2021) [PDF]
  28. "Frustratingly Simple Pretraining Alternatives to Masked Language Modeling". EMNLP(2021) [PDF] [Code]
  29. "Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models". ACL(2021) [PDF] [Code]
  30. ERNIE-Doc: "ERNIE-Doc: A Retrospective Long-Document Modeling Transformer". ACL(2021) [PDF] [Code]
  31. "Pre-training Universal Language Representation". ACL(2021) [PDF] [Code]

Knowledge

  1. ERNIE(Baidu): "ERNIE: Enhanced Representation through Knowledge Integration". arXiv(2019) [PDF] [Code]
  2. KnowBert: "Knowledge Enhanced Contextual Word Representations". EMNLP(2019) [PDF]
  3. ERNIE(Tsinghua): "ERNIE: Enhanced Language Representation with Informative Entities". ACL(2019) [PDF] [Code]
  4. COMET: "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction". ACL(2019) [PDF] [Code]
  5. K-BERT: "K-BERT: Enabling Language Representation with Knowledge Graph". AAAI(2020) [PDF] [Code]
  6. WKLM: "Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model". ICLR(2020) [PDF]
  7. LUKE: "LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention". EMNLP(2020) [PDF] [Code]
  8. K-Adapter: "K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters". ICLR(2021) [PDF]
  9. KEPLER: "KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation". TACL(2021) [PDF] [Code]
  10. RuleBERT: "RuleBERT: Teaching Soft Rules to Pre-Trained Language Models". EMNLP(2021) [PDF] [Code]
  11. BeliefBank: "BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief". EMNLP(2021) [PDF] [Code]
  12. Phrase-BERT: "Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration". EMNLP(2021) [PDF] [Code]
  13. "Syntax-Enhanced Pre-trained Model". ACL(2021) [PDF] [Code]
  14. StructFormer: "StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling". ACL(2021) [PDF]
  15. ERICA: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning". ACL(2021) [PDF] [Code]
  16. "Structural Guidance for Transformer Language Models". ACL(2021) [PDF] [Code]
  17. HORNET: "HORNET: Enriching Pre-trained Language Representations with Heterogeneous Knowledge Sources". CIKM(2021) [PDF]
  18. "Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining". IJCAI(2021) [PDF]

Multilingual

  1. XLM: "Cross-lingual Language Model Pretraining". arXiv(2019) [PDF] [Code]
  2. "Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond". TACL(2019) [PDF] [Code]
  3. UDify: "75 Languages, 1 Model: Parsing Universal Dependencies Universally". EMNLP(2019) [PDF] [Code]
  4. Unicoder: "Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks". EMNLP(2019) [PDF]
  5. XLM-R: "Unsupervised Cross-lingual Representation Learning at Scale". ACL(2020) [PDF]
  6. "Multilingual Alignment of Contextual Word Representations". ICLR(2020) [PDF]
  7. mBART: "Multilingual Denoising Pre-training for Neural Machine Translation". TACL(2020) [PDF] [Code]
  8. mT5: "mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer". NAACL(2021) [PDF] [Code]
  9. InfoXLM: "InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training". NAACL(2021) [PDF] [Code]
  10. "Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training". EMNLP(2021) [PDF] [Code]
  11. ERNIE-M: "ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora". EMNLP(2021) [PDF] [Code]
  12. "A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders". EMNLP(2021) [PDF]
  13. "Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation". EMNLP(2021) [PDF]
  14. "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models". ACL(2021) [PDF] [Code]
  15. "Multilingual Pre-training with Universal Dependency Learning". NeurIPS(2021) [PDF]

Multi-Modal

  1. ViLBERT: "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks". NeurIPS(2019) [PDF]
  2. LXMERT: "LXMERT: Learning Cross-Modality Encoder Representations from Transformers". EMNLP(2019) [PDF] [Code]
  3. VideoBERT: "VideoBERT: A Joint Model for Video and Language Representation Learning". ICCV(2019) [PDF]
  4. VisualBERT: "VisualBERT: A Simple and Performant Baseline for Vision and Language". arXiv(2019) [PDF]
  5. B2T2: "Fusion of Detected Objects in Text for Visual Question Answering". EMNLP(2019) [PDF] [Code]
  6. VL-BERT: "VL-BERT: Pre-training of Generic Visual-Linguistic Representations". ICLR(2020) [PDF] [Code]
  7. Unicoder-VL: "Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training". AAAI(2020) [PDF]
  8. VLP: "Unified Vision-Language Pre-Training for Image Captioning and VQA". AAAI(2020) [PDF] [Code]
  9. UNITER: "UNITER: UNiversal Image-TExt Representation Learning". ECCV(2020) [PDF] [Code]
  10. Oscar: "Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks". ECCV(2020) [PDF] [Code]
  11. "12-in-1: Multi-Task Vision and Language Representation Learning". CVPR(2020) [PDF] [Code]
  12. ActBERT: "ActBERT: Learning Global-Local Video-Text Representations". CVPR(2020) [PDF]
  13. VLN: "Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks". CVPR(2020) [PDF]
  14. VILLA: "Large-Scale Adversarial Training for Vision-and-Language Representation Learning". arXiv(2020) [PDF] [Code]
  15. ImageBERT: "ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data". arXiv(2020) [PDF]
  16. ALIGN: "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision". ICML(2021) [PDF]
  17. ClipBERT: "Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling". CVPR(2021) [PDF] [Code]
  18. DALL·E: "Zero-Shot Text-to-Image Generation". arXiv(2021) [PDF] [Code]
  19. CLIP: "Learning Transferable Visual Models From Natural Language Supervision". arXiv(2021) [PDF] [Code]
  20. IPT: "Pre-Trained Image Processing Transformer". CVPR(2021) [PDF] [Code]
  21. CvT: "CvT: Introducing Convolutions to Vision Transformers". ICCV(2021) [PDF] [Code]
  22. TERA: "TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech". TASLP(2021) [PDF] [Code]
  23. CaiT: "Going deeper with Image Transformers". ICCV(2021) [PDF] [Code]
  24. ViViT: "ViViT: A Video Vision Transformer". ICCV(2021) [PDF] [Code]
  25. VirTex: "VirTex: Learning Visual Representations From Textual Annotations". CVPR(2021) [PDF] [Code]
  26. M6: "M6: Multi-Modality-to-Multi-Modality Multitask Mega-transformer for Unified Pretraining". KDD(2021) [PDF]
  27. "Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training". NeurIPS(2021) [PDF]
  28. GilBERT: "GilBERT: Generative Vision-Language Pre-Training for Modality-Incomplete Visual-Linguistic Tasks". SIGIR(2021) [PDF]

Information Retrieval

  1. ORQA: "Latent Retrieval for Weakly Supervised Open Domain Question Answering". ACL(2019) [PDF]
  2. REALM: "REALM: Retrieval-Augmented Language Model Pre-Training". arXiv(2020) [PDF]
  3. RAG: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". NeurIPS(2020) [PDF] [Code]
  4. DPR: "Dense Passage Retrieval for Open-Domain Question Answering". EMNLP(2020) [PDF] [Code]
  5. "Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering". EACL(2021) [PDF] [Code]

Code

  1. CodeT5: "CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation". EMNLP(2021) [PDF] [Code]
  2. Codex: "Evaluating Large Language Models Trained on Code". arXiv(2021) [PDF] [Code]

Others

  1. ReasonBERT: "ReasonBERT: Pre-trained to Reason with Distant Supervision". EMNLP(2021) [PDF] [Code]
  2. "Sentence Bottleneck Autoencoders from Transformer Language Models". EMNLP(2021) [PDF] [Code]
  3. "Numeracy enhances the Literacy of Language Models". EMNLP(2021) [PDF] [Code]
  4. EnsLM: "EnsLM: Ensemble Language Model for Data Diversity by Semantic Clustering". ACL(2021) [PDF] [Code]
  5. "Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf Language Models". ACL(2021) [PDF] [Code]
  6. BERTAC: "BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks". ACL(2021) [PDF] [Code]
  7. "Natural Language Understanding with Privacy-Preserving BERT". CIKM(2021) [PDF]
  8. BANG: "BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining". ICML(2021) [PDF] [Code]

PLM Analysis

Knowledge

  1. "What Does BERT Look at? An Analysis of BERT’s Attention". BlackboxNLP(2019) [PDF] [Code]
  2. "BERT Rediscovers the Classical NLP Pipeline". ACL(2019) [PDF]
  3. "How Multilingual is Multilingual BERT?". ACL(2019) [PDF]
  4. "A Structural Probe for Finding Syntax in Word Representations". NAACL(2019) [PDF] [Code]
  5. "Language Models as Knowledge Bases?". EMNLP(2019) [PDF] [Code]
  6. "What Does BERT Learn about the Structure of Language?". ACL(2019) [PDF] [Code]
  7. "Linguistic Knowledge and Transferability of Contextual Representations". NAACL(2019) [PDF]
  8. "Assessing BERT's Syntactic Abilities". arXiv(2019) [PDF] [Code]
  9. "Probing Neural Network Comprehension of Natural Language Arguments". ACL(2019) [PDF]
  10. "How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings". EMNLP(2019) [PDF]
  11. "Visualizing and Measuring the Geometry of BERT". NeurIPS(2019) [PDF]
  12. "Designing and Interpreting Probes with Control Tasks". EMNLP(2019) [PDF]
  13. "Open Sesame: Getting inside BERT’s Linguistic Knowledge". BlackboxNLP(2019) [PDF] [Code]
  14. "What do you learn from context? Probing for sentence structure in contextualized word representations". ICLR(2019) [PDF] [Code]
  15. "Commonsense Knowledge Mining from Pretrained Models". EMNLP(2019) [PDF]
  16. "Do NLP Models Know Numbers? Probing Numeracy in Embeddings". EMNLP(2019) [PDF]
  17. "On the Cross-lingual Transferability of Monolingual Representations". ACL(2020) [PDF]
  18. "Cross-Lingual Ability of Multilingual BERT: An Empirical Study". ICLR(2020) [PDF] [Code]
  19. "What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models". TACL(2020) [PDF] [Code]
  20. "How Much Knowledge Can You Pack Into the Parameters of a Language Model?". EMNLP(2020) [PDF] [Code]
  21. "How Can We Know What Language Models Know?". TACL(2020) [PDF] [Code]
  22. "oLMpics-On What Language Model Pre-training Captures". TACL(2020) [PDF] [Code]
  23. "Information-Theoretic Probing with Minimum Description Length". EMNLP(2020) [PDF] [Code]
  24. "Inducing Relational Knowledge from BERT". AAAI(2020) [PDF]
  25. AutoPrompt: "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts". EMNLP(2020) [PDF] [Code]
  26. "Emergent linguistic structure in artificial neural networks trained by self-supervision". PNAS(2020) [PDF]
  27. "Evaluating Commonsense in Pre-Trained Language Models". AAAI(2020) [PDF] [Code]
  28. "Editing Factual Knowledge in Language Models". EMNLP(2021) [PDF] [Code]
  29. "How much pretraining data do language models need to learn syntax?". EMNLP(2021) [PDF]
  30. "Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?". EMNLP(2021) [PDF] [Code]
  31. "Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords". EMNLP(2021) [PDF] [Code]
  32. "Frequency Effects on Syntactic Rule Learning in Transformers". EMNLP(2021) [PDF] [Code]
  33. "Exploring the Role of BERT Token Representations to Explain Sentence Probing Results". EMNLP(2021) [PDF] [Code]
  34. "How is BERT surprised? Layerwise detection of linguistic anomalies". ACL(2021) [PDF] [Code]
  35. "Implicit Representations of Meaning in Neural Language Models". ACL(2021) [PDF] [Code]
  36. "Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases". ACL(2021) [PDF] [Code]

Robustness

  1. "Universal Adversarial Triggers for Attacking and Analyzing NLP". EMNLP(2019) [PDF] [Code]
  2. "Pretrained Transformers Improve Out-of-Distribution Robustness". ACL(2020) [PDF] [Code]
  3. BERT-ATTACK: "BERT-ATTACK: Adversarial Attack Against BERT Using BERT". EMNLP(2020) [PDF] [Code]
  4. "Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment". AAAI(2020) [PDF] [Code]
  5. "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". EMNLP(2021) [PDF] [Code]
  6. "Sorting through the noise: Testing robustness of information processing in pre-trained language models". EMNLP(2021) [PDF] [Code]

Sparsity

  1. "Are Sixteen Heads Really Better than One?". NeurIPS(2019) [PDF] [Code]
  2. "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned". ACL(2019) [PDF] [Code]
  3. "Revealing the Dark Secrets of BERT". EMNLP(2019) [PDF]
  4. "The Lottery Ticket Hypothesis for Pre-trained BERT Networks". NeurIPS(2020) [PDF] [Code]
  5. "When BERT Plays the Lottery, All Tickets Are Winning". EMNLP(2020) [PDF] [Code]

Others

  1. "Scaling Laws for Neural Language Models". arXiv(2020) [PDF]
  2. "Extracting Training Data from Large Language Models". arXiv(2020) [PDF] [Code]
  3. "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜". FAccT(2021) [PDF]
  4. "Extracting Training Data from Large Language Models". USENIX(2021) [PDF] [Code]
  5. "Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little". EMNLP(2021) [PDF] [Code]
  6. "Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent". EMNLP(2021) [PDF] [Code]
  7. "Discretized Integrated Gradients for Explaining Language Models". EMNLP(2021) [PDF] [Code]
  8. "Do Long-Range Language Models Actually Use Long-Range Context?". EMNLP(2021) [PDF]
  9. "Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right". EMNLP(2021) [PDF] [Code]
  10. "Incorporating Residual and Normalization Layers into Analysis of Masked Language Models". EMNLP(2021) [PDF] [Code]
  11. "Sequence Length is a Domain: Length-based Overfitting in Transformer Models". EMNLP(2021) [PDF]
  12. "Are Pretrained Convolutions Better than Pretrained Transformers?". ACL(2021) [PDF]
  13. "Positional Artefacts Propagate Through Masked Language Model Embeddings". ACL(2021) [PDF]
  14. "When Do You Need Billions of Words of Pretraining Data?". ACL(2021) [PDF] [Code]
  15. "BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?". ACL(2021) [PDF] [Code]
  16. "Examining the Inductive Bias of Neural Language Models with Artificial Languages". ACL(2021) [PDF] [Code]
  17. "Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning". NeurIPS(2021) [PDF]

Efficient PLM

Training

  1. RoBERTa: "RoBERTa: A Robustly Optimized BERT Pretraining Approach". arXiv(2019) [PDF] [Code]
  2. "Efficient Training of BERT by Progressively Stacking". ICML(2019) [PDF] [Code]
  3. Megatron-LM: "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". arXiv(2019) [PDF] [Code]
  4. ELECTRA: "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators". ICLR(2020) [PDF] [Code]
  5. "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes". ICLR(2020) [PDF] [Code]
  6. GShard: "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding". arXiv(2020) [PDF]
  7. Admin: "Understanding the Difficulty of Training Transformers". EMNLP(2020) [PDF] [Code]
  8. ZeRO: "ZeRO: Memory optimizations Toward Training Trillion Parameter Models". SC(2020) [PDF] [Code]
  9. Switch Transformers: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". arXiv(2021) [PDF] [Code]
  10. "How to Train BERT with an Academic Budget". EMNLP(2021) [PDF]
  11. "Optimizing Deeper Transformers on Small Datasets". ACL(2021) [PDF] [Code]
  12. "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets". ACL(2021) [PDF] [Code]

Inference

  1. "BERT Loses Patience: Fast and Robust Inference with Early Exit". NeurIPS(2020) [PDF] [Code]
  2. GAML-BERT: "GAML-BERT: Improving BERT Early Exiting by Gradient Aligned Mutual Learning". EMNLP(2021) [PDF]
  3. "Efficient Nearest Neighbor Language Models". EMNLP(2021) [PDF] [Code]
  4. GhostBERT: "GhostBERT: Generate More Features with Cheap Operations for BERT". ACL(2021) [PDF] [Code]
  5. LeeBERT: "LeeBERT: Learned Early Exit for BERT with cross-level optimization". ACL(2021) [PDF]
  6. "Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search". ACL(2021) [PDF] [Code]
  7. "Distilling Knowledge from BERT into Simple Fully Connected Neural Networks for Efficient Vertical Retrieval". CIKM(2021) [PDF]

Compression

  1. DistilBERT: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". arXiv(2019) [PDF] [Code]
  2. PKD: "Patient Knowledge Distillation for BERT Model Compression". EMNLP(2019) [PDF] [Code]
  3. "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks". arXiv(2019) [PDF]
  4. Q8BERT: "Q8BERT: Quantized 8Bit BERT". 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019 [PDF]
  5. ALBERT: "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". ICLR(2020) [PDF] [Code]
  6. TinyBERT: "TinyBERT: Distilling BERT for Natural Language Understanding". EMNLP(2020) [PDF] [Code]
  7. Layerdrop: "Reducing Transformer Depth on Demand with Structured Dropout". ICLR(2020) [PDF] [Code]
  8. Q-BERT: "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT". AAAI(2020) [PDF]
  9. MobileBERT: "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices". ACL(2020) [PDF] [Code]
  10. "Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning". RepL4NLP(2020) [PDF] [Code]
  11. MiniLM: "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers". arXiv(2020) [PDF] [Code]
  12. FastBERT: "FastBERT: a Self-distilling BERT with Adaptive Inference Time". ACL(2020) [PDF] [Code]
  13. DeeBERT: "DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference". ACL(2020) [PDF] [Code]
  14. "Compressing Large-Scale Transformer-Based Models: A Case Study on BERT". TACL(2021) [PDF]
  15. "Winning the Lottery with Continuous Sparsification". NeurIPS(2020) [PDF] [Code]
  16. SqueezeBERT: "SqueezeBERT: What can computer vision teach NLP about efficient neural networks?". SustaiNLP(2020) [PDF]
  17. Audio ALBERT: "Audio Albert: A Lite Bert for Self-Supervised Learning of Audio Representation". SLT(2021) [PDF] [Code]
  18. T2R: "Finetuning Pretrained Transformers into RNNs". EMNLP(2021) [PDF] [Code]
  19. "Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression". EMNLP(2021) [PDF] [Code]
  20. Meta-KD: "Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains". ACL(2021) [PDF] [Code]
  21. "Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization". ACL(2021) [PDF] [Code]
  22. BinaryBERT: "BinaryBERT: Pushing the Limit of BERT Quantization". ACL(2021) [PDF] [Code]
  23. AutoTinyBERT: "AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models". ACL(2021) [PDF] [Code]
  24. "Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation". ACL(2021) [PDF] [Code]
  25. "Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators". ACL(2021) [PDF] [Code]
  26. NAS-BERT: "NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search". KDD(2021) [PDF]

PLM Adaptation

Two-Stage

  1. "Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks". arXiv(2018) [PDF] [Code]
  2. "How to Fine-Tune BERT for Text Classification?". CCL(2019) [PDF]
  3. "Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks". ACL(2020) [PDF] [Code]
  4. "Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?". ACL(2020) [PDF]
  5. "What to Pre-Train on? Efficient Intermediate Task Selection". EMNLP(2021) [PDF] [Code]
  6. "On the Influence of Masking Policies in Intermediate Pre-training". EMNLP(2021) [PDF]
  7. TADPOLE: "TADPOLE: Task ADapted Pre-Training via AnOmaLy DEtection". EMNLP(2021) [PDF]

Multi-Task

  1. MT-DNN: "Multi-Task Deep Neural Networks for Natural Language Understanding". ACL(2019) [PDF] [Code]
  2. "BAM! Born-Again Multi-Task Networks for Natural Language Understanding". ACL(2019) [PDF] [Code]
  3. "Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding". arXiv(2019) [PDF] [Code]
  4. GradTS: "GradTS: A Gradient-Based Automatic Auxiliary Task Selection Method Based on Transformer Networks". EMNLP(2021) [PDF]
  5. "What's in Your Head? Emergent Behaviour in Multi-Task Transformer Models". EMNLP(2021) [PDF]
  6. MTAdam: "MTAdam: Automatic Balancing of Multiple Training Loss Terms". EMNLP(2021) [PDF]
  7. Muppet: "Muppet: Massive Multi-task Representations with Pre-Finetuning". EMNLP(2021) [PDF]
  8. "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders". EMNLP(2021) [PDF] [Code]
  9. BERTGen: "BERTGen: Multi-task Generation through BERT". ACL(2021) [PDF] [Code]
  10. "Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks". ACL(2021) [PDF] [Code]

Adapter

  1. "BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning". ICML(2019) [PDF] [Code]
  2. Adapter: "Parameter-Efficient Transfer Learning for NLP". ICML(2019) [PDF] [Code]
  3. AdapterDrop: "AdapterDrop: On the Efficiency of Adapters in Transformers". EMNLP(2021) [PDF]
  4. "On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation". ACL(2021) [PDF]
  5. "Learning to Generate Task-Specific Adapters from Task Description". ACL(2021) [PDF] [Code]

Prompt

  1. PET: "Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference". EACL(2021) [PDF] [Code]
  2. "It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners". NAACL(2021) [PDF] [Code]
  3. "Prefix-Tuning: Optimizing Continuous Prompts for Generation". arXiv(2021) [PDF]
  4. LM-BFF: "Making Pre-trained Language Models Better Few-shot Learners". ACL(2021) [PDF] [Code]
  5. "What Makes Good In-Context Examples for GPT-3?". arXiv(2021) [PDF] [Code]
  6. "The Power of Scale for Parameter-Efficient Prompt Tuning". EMNLP(2021) [PDF] [Code]
  7. "Finetuned Language Models Are Zero-Shot Learners". arXiv(2021) [PDF]
  8. "Calibrate Before Use: Improving Few-shot Performance of Language Models". ICML(2021) [PDF] [Code]
  9. TransPrompt: "TransPrompt: Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification". EMNLP(2021) [PDF] [Code]
  10. SFLM: "Revisiting Self-training for Few-shot Learning of Language Model". EMNLP(2021) [PDF] [Code]
  11. ADAPET: "Improving and Simplifying Pattern Exploiting Training". EMNLP(2021) [PDF] [Code]

Others

  1. "To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks". RepL4NLP(2019) [PDF]
  2. "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models". NAACL(2019) [PDF] [Code]
  3. "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping". arXiv(2020) [PDF]
  4. SMART: "SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization". EMNLP(2020) [PDF] [Code]
  5. "Revisiting Few-sample BERT Fine-tuning". ICLR(2021) [PDF]
  6. Mirror-BERT: "Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders". EMNLP(2021) [PDF] [Code]
  7. "Pre-train or Annotate? Domain Adaptation with a Constrained Budget". EMNLP(2021) [PDF] [Code]
  8. AVocaDo: "AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain". EMNLP(2021) [PDF]
  9. CHILD-TUNING: "Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning". EMNLP(2021) [PDF] [Code]
  10. "Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation". ACL(2021) [PDF] [Code]
  11. LexFit: "LexFit: Lexical Fine-Tuning of Pretrained Language Models". ACL(2021) [PDF] [Code]
  12. "Selecting Informative Contexts Improves Language Model Fine-tuning". ACL(2021) [PDF] [Code]
  13. "An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models". ACL(2021) [PDF] [Code]
  14. "How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?". NeurIPS(2021) [PDF] [Code]

Salesforce 564 Jan 08, 2023
Google AI 2018 BERT pytorch implementation

BERT-pytorch Pytorch implementation of Google AI's 2018 BERT, with simple annotation BERT 2018 BERT: Pre-training of Deep Bidirectional Transformers f

Junseong Kim 5.3k Jan 07, 2023
Chinese version of GPT2 training code, using BERT tokenizer.

GPT2-Chinese Description Chinese version of GPT2 training code, using BERT tokenizer or BPE tokenizer. It is based on the extremely awesome repository

Zeyao Du 5.6k Jan 04, 2023
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (S

InstaDeep Ltd 72 Dec 09, 2022
Ask for weather information like a human

weather-nlp About Ask for weather information like a human. Goals Understand typical questions like: Hourly temperatures in Potsdam on 2020-09-15. Rai

5 Oct 29, 2022
Machine learning models from Singapore's NLP research community

SG-NLP Machine learning models from Singapore's natural language processing (NLP) research community. sgnlp is a Python package that allows you to eas

AI Singapore | AI Makerspace 21 Dec 17, 2022
DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time

DensePhrases provides answers to your natural language questions from the entire Wikipedia in real-time. While it efficiently searches the answers out of 60 billion phrases in Wikipedia, it is also v

Jinhyuk Lee 543 Jan 08, 2023
2021 AI CUP Competition on Traditional Chinese Scene Text Recognition - Intermediate Contest

繁體中文場景文字辨識 程式碼說明 組別:這就是我 成員:蔣明憲 唐碩謙 黃玥菱 林冠霆 蕭靖騰 目錄 環境套件 安裝方式 資料夾布局 前處理-製作偵測訓練註解檔 前處理-製作分類訓練樣本 part.py : 從 json 裁切出分類訓練樣本 Class.py : 將切出來的樣本按照文字分類到各資料夾

HuanyueTW 3 Jan 14, 2022
In this Notebook I've build some machine-learning and deep-learning to classify corona virus tweets, in both multi class classification and binary classification.

Hello, This Notebook Contains Example of Corona Virus Tweets Multi Class Classification. - Classes is: Extremely Positive, Positive, Extremely Negativ

Khaled Tofailieh 3 Dec 06, 2022
Code for Findings at EMNLP 2021 paper: "Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning"

Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning This repo is for Findings at EMNLP 2021 paper: Learn Cont

INK Lab @ USC 6 Sep 02, 2022
Leon is an open-source personal assistant who can live on your server.

Leon Your open-source personal assistant. Website :: Documentation :: Roadmap :: Contributing :: Story 👋 Introduction Leon is an open-source personal

Leon AI 11.7k Dec 30, 2022