AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
arXiv link: upcoming
To be published in Findings of NAACL 2022
Authors: Chin-Lun Fu*, Zih-Ching Chen*, Yun-Ru Lee, Hung-yi Lee
Overview
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed. AdapterBias adds a token-dependent shift to the hidden output of transformer layers to adapt to downstream tasks with only a vector and a linear layer.
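For intuition, below is a minimal PyTorch-style sketch of the idea (illustrative only, not the code in src/exp.py): a linear layer produces one scalar weight per token, and the shift added to the transformer layer's output is that weight times a shared vector.

import torch
import torch.nn as nn

class AdapterBiasSketch(nn.Module):
    """Sketch of a token-dependent representation shift: shift_i = alpha_i * v."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(hidden_dim))  # shared shift vector (zero-init: no shift at start)
        self.alpha = nn.Linear(hidden_dim, 1)           # maps each token representation to a scalar weight

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim)
        weights = self.alpha(hidden_states)             # (batch, seq_len, 1), one weight per token
        return hidden_states + weights * self.v         # broadcast to a token-dependent shift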
Dataset
We use the GLUE Benchmark as our dataset. You can download all the datasets from the official website.
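If you just want to inspect a task before downloading the full benchmark, the Hugging Face datasets library also hosts GLUE. This is only a convenience for exploring the data format; the training script expects the files downloaded from the website under --GLUE_path.

from datasets import load_dataset

# Load a single GLUE task (CoLA here) for a quick look at the examples.
cola = load_dataset("glue", "cola")
print(cola["train"][0])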
Training
cd src
python exp.py \
--adapter True \
--GLUE_path <ur_GLUE_path> \
--output_path <output_path> \
--model <model name> \
--task <the task you want to run> \
--epoch 100 \
--lr 0.0001 \
--max_len 512 \
--batch_size 32
- -s or --seed specifies the random seed.
- -g or --GLUE_path specifies the path of your GLUE dataset.
- -o or --output_path specifies the path of the saved model and the saved prediction file.
- -m or --model specifies the pre-trained language model (PLM) used in training.
  - Some examples: bert-base, bert-large, roberta-base, roberta-large
- -t or --task specifies the downstream task.
  - Some examples: cola, mnli, qnli, qqp, mrpc, rte, sst, sts
- -a or --adapter specifies whether to add our AdapterBias to the PLM.
- --share_alpha specifies whether to share the same alpha in AdapterBias across all transformer layers.
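As a concrete example, the command below fills in the placeholders above to fine-tune bert-base on RTE (./GLUE and ./output are placeholder paths; adjust them to your setup):

python exp.py \
--adapter True \
--GLUE_path ./GLUE \
--output_path ./output \
--model bert-base \
--task rte \
--epoch 100 \
--lr 0.0001 \
--max_len 512 \
--batch_size 32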
Inference
After training, the prediction file is automatically saved in <output_path>/result/, and the trained model is saved in <output_path>/model/.
After running all nine tasks of the GLUE benchmark, you can submit the prediction files to the GLUE website.


