Norm-based Analysis of Transformer

Implementations for two papers that propose analyzing Transformers using vector norms:

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)
Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

This paper proposes analyzing attention, a core component of the Transformer, using vector norms rather than attention weights alone.
Previous analyses of Transformers have focused on how attention mixes token representations and have typically inspected only the attention weights.
However, attention weights are not the only factor that determines the output of attention: the input vectors themselves and the transformations applied to them also matter.
The paper therefore proposes a norm-based analysis of attention that takes these factors into account.
→ Check this paper's code: Code for emnlp2020.
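
As a rough illustration of the idea described above, the following minimal sketch compares plain attention weights with the norms of the weighted, transformed vectors for a single query. It is not the repository's implementation: the function names, the toy inputs, and the single matrix `W` (standing in for the combined value and output transformation) are assumptions made for this example, and it uses only the Python standard library.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * c for w, c in zip(row, x)) for row in W]

def l2_norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights_and_norms(query, keys, inputs, W):
    """Return (weights, norms) for one query.

    weights[j] is the attention weight alpha_j.
    norms[j] is ||alpha_j * f(x_j)||, where f(x) = W x is a hypothetical
    stand-in for the transformation applied to each input vector.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    norms = [a * l2_norm(matvec(W, x)) for a, x in zip(weights, inputs)]
    return weights, norms

# Toy data (assumed): token 0 gets most of the attention weight,
# but its transformed vector is very small.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 0.0]]
inputs = [[0.1, 0.0], [3.0, 4.0]]
W = [[0.0, 1.0], [1.0, 0.0]]  # swaps the two coordinates

weights, norms = attention_weights_and_norms(query, keys, inputs, W)
print("attention weights:", weights)
print("weighted norms:   ", norms)
```

In this toy setting, token 0 receives the larger attention weight, yet token 1 contributes the larger weighted vector, which is the kind of discrepancy that looking at weights alone would miss.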

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

This paper proposes analyzing the whole attention block (i.e., attention, residual connection, and layer normalization) using vector norms.
Previous analyses of Transformers have focused on the mixing performed by attention.
However, the Transformer contains components other than attention, and these can play roles other than mixing.
The paper therefore expands the scope of the analysis from attention alone to the entire attention block.
→ Check this paper's code: Code for emnlp2021.
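
One way to picture an analysis of the whole attention block is to split the layer-normalized output into one piece per summed input (each attention contribution plus the residual) and then compare the norms of those pieces. The sketch below does this under stated assumptions: it relies on the fact that the mean is linear, so layer normalization of a sum can be written as a sum of per-term pieces that share the standard deviation of the total. The function names and toy vectors are hypothetical, and the code is not taken from the repository.

```python
import math

def mean(v):
    return sum(v) / len(v)

def layer_norm(y, gamma, beta, eps=1e-12):
    """Standard layer normalization of a single vector y."""
    m = mean(y)
    s = math.sqrt(mean([(c - m) ** 2 for c in y]) + eps)
    return [(c - m) / s * g + b for c, g, b in zip(y, gamma, beta)]

def decompose_layer_norm(terms, gamma, eps=1e-12):
    """Split LN(sum(terms)) into one piece per term.

    Each piece is (term - mean(term)) / std(total) * gamma, so the pieces
    sum to the layer-norm output minus the bias beta.
    """
    dim = len(gamma)
    total = [sum(t[k] for t in terms) for k in range(dim)]
    m = mean(total)
    s = math.sqrt(mean([(c - m) ** 2 for c in total]) + eps)
    return [[(c - mean(t)) / s * g for c, g in zip(t, gamma)] for t in terms]

def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))

# Toy data (assumed): two attention contributions and, last, the residual input.
terms = [[0.2, -0.1, 0.4], [1.0, 0.5, -0.5], [2.0, -1.0, 0.0]]
gamma = [1.0, 0.5, 2.0]
beta = [0.1, 0.0, -0.1]

pieces = decompose_layer_norm(terms, gamma)
piece_norms = [l2_norm(p) for p in pieces]

# Sanity check: the pieces plus beta reproduce the ordinary layer-norm output.
total = [sum(t[k] for t in terms) for k in range(len(gamma))]
direct = layer_norm(total, gamma, beta)
rebuilt = [sum(p[k] for p in pieces) + beta[k] for k in range(len(gamma))]

print("per-term norms:", piece_norms)
```

Comparing the norm of the residual piece with the norms of the attention pieces gives a rough sense of how much of the block's output comes from mixing and how much from components other than attention.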

Citation

If you use our code for academic work, please cite:

@inproceedings{kobayashi-etal-2020-attention,  
   title = {Attention is Not Only a Weight: Analyzing Transformers with Vector Norms},  
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},  
   booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},  
   year = "2020",  
   url = "https://www.aclweb.org/anthology/2020.emnlp-main.574",  
   pages = "7057--7075",  
}
@inproceedings{kobayashi-etal-2021-incorporating,
   title = {Incorporating Residual and Normalization Layers into Analysis of Masked Language Models},
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
   booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
   year = "2021",
   url = "https://arxiv.org/abs/2109.07152",
   pages = "to appear",
}

Owner

Goro Kobayashi
