Code for EMNLP2020 long paper: BERT-Attack: Adversarial Attack Against BERT Using BERT

Last update: Jan 04, 2023

Overview

BERT-ATTACK

Code for our EMNLP2020 long paper:

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

Dependencies

Python 3.7
PyTorch 1.4.0
transformers 2.9.0
TextFooler

Usage

To train a classification model, use the run_glue.py script from Hugging Face transformers==2.9.0.

To generate adversarial samples based on the masked-LM, run

python bertattack.py --data_path data_defense/imdb_1k.tsv --mlm_path bert-base-uncased --tgt_path models/imdbclassifier --use_sim_mat 1 --output_dir data_defense/imdb_logs.tsv --num_label 2 --use_bpe 1 --k 48 --start 0 --end 1000 --threshold_pred_score 0

--data_path: The input dataset. We take the IMDB dataset as an example; datasets can be obtained from TextFooler.
--mlm_path: The masked-LM. We use the BERT-base-uncased model.
--tgt_path: The target model. We follow the official fine-tuning process in transformers to fine-tune BERT as the target model.
--k: The threshold k is the number of possible candidates (48 in the example above).
--output_dir: The output file.
--start, --end: The start and end positions within the dataset. In case the dataset is large, we provide a script for multi-thread processing.
--threshold_pred_score: A score for cutting off predictions that may not be suitable (details in Section 5.1 of the paper).
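
To make the roles of --k and --threshold_pred_score more concrete, here is a minimal sketch of top-k candidate selection with a score cutoff. It is an illustration only, not the code in bertattack.py: the function name select_candidates, the toy scores, and the rule "keep a candidate if its score is at least the threshold" are all assumptions.

```python
import numpy as np

def select_candidates(scores, k=48, threshold=0.0):
    """Return indices of the k highest scores, best first,
    keeping only those whose score is >= threshold (assumed rule)."""
    scores = np.asarray(scores, dtype=float)
    k = min(k, scores.size)
    # Sort descending by score and take the first k indices.
    top = np.argsort(-scores, kind="stable")[:k]
    return [int(i) for i in top if scores[i] >= threshold]

# Toy "vocabulary" of 6 entries with made-up prediction scores.
toy_scores = [0.1, 2.5, -0.3, 1.2, 0.0, 3.1]
print(select_candidates(toy_scores, k=3, threshold=0.5))  # [5, 1, 3]
```

A larger k widens the search at the cost of more queries to the target model, while a higher cutoff discards low-scoring predictions earlier.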

Note

The datasets are re-formatted to the GLUE style.

Some configs are fixed; you can change them manually.

If you need to use the similar-words filter, you need to download and process the cosine similarity matrix following TextFooler. We only use the filter in sentiment classification tasks like IMDB and YELP.
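
For readers unfamiliar with the matrix mentioned above, the following is a minimal sketch of how a cosine similarity matrix can be computed from word vectors. It is not TextFooler's processing script: the function name cosine_similarity_matrix and the toy 2-dimensional vectors are assumptions for illustration.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    emb = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = emb / norms       # each row now has unit length
    return unit @ unit.T     # entry (i, j) is cos(row i, row j)

# Toy word vectors (hypothetical); real embeddings have hundreds of dimensions.
toy_vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sim = cosine_similarity_matrix(toy_vectors)
print(np.round(sim, 3))
```

Entry (i, j) of the result measures how close word i is to word j, so a filter can reject substitutes whose similarity to the original word is low.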

If you need to evaluate the USE results, you need to create the corresponding TensorFlow environment for USE.

For faster generation, you could turn off the BPE substitution.

As illustrated in the paper, we set thresholds to balance between the attack success rate and USE similarity score.

The multi-thread process uses the batchrun.py script.

You can run

cat cmd.txt | python batchrun.py --gpus 0,1,2,3

to simultaneously generate adversarial samples of the given dataset for faster generation. We use the IMDB dataset as an example.
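
The format of cmd.txt is not described here. The sketch below assumes it holds one bertattack.py command per line, each covering a different --start/--end range, and prints such lines for the IMDB example. The helper build_commands, the shard size of 250, the per-shard output file names, and the treatment of --end as exclusive are all assumptions; check batchrun.py before relying on them.

```python
def build_commands(total, shard_size, base_cmd, out_pattern):
    """Split [0, total) into consecutive shards and build one command per shard."""
    commands = []
    for start in range(0, total, shard_size):
        end = min(start + shard_size, total)
        out_file = out_pattern.format(start=start, end=end)
        commands.append(
            f"{base_cmd} --output_dir {out_file} --start {start} --end {end}"
        )
    return commands

# Flags copied from the single-run example; only the output file and range vary.
BASE = (
    "python bertattack.py --data_path data_defense/imdb_1k.tsv "
    "--mlm_path bert-base-uncased --tgt_path models/imdbclassifier "
    "--use_sim_mat 1 --num_label 2 --use_bpe 1 --k 48 --threshold_pred_score 0"
)

# Hypothetical output naming: one log file per shard.
cmds = build_commands(1000, 250, BASE, "data_defense/imdb_logs_{start}_{end}.tsv")
print("\n".join(cmds))
```

Redirecting the printed lines into cmd.txt would give four commands that could then be spread across the four GPUs in the example above.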


Owner

Linyang Li
