Huggingface Transformers + Adapters = ❤️

Overview

adapter-transformers

A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models

adapter-transformers is an extension of HuggingFace's Transformers library, integrating adapters into state-of-the-art language models by incorporating AdapterHub, a central repository for pre-trained adapter modules.

💡 Important: This library can be used as a drop-in replacement for HuggingFace Transformers and regularly synchronizes new upstream changes. Thus, most files in this repository are direct copies from the HuggingFace Transformers source, modified only with changes required for the adapter implementations.

Installation

adapter-transformers currently supports Python 3.6+ and PyTorch 1.3.1+. After installing PyTorch, you can install adapter-transformers from PyPI ...

pip install -U adapter-transformers

... or from source by cloning the repository:

git clone https://github.com/adapter-hub/adapter-transformers.git
cd adapter-transformers
pip install .

Getting Started

HuggingFace's great documentation on getting started with Transformers can be found here. adapter-transformers is fully compatible with Transformers.

To get started with adapters, refer to these locations:

  • Colab notebook tutorials, a series of notebooks providing an introduction to all the main concepts of (adapter-)transformers and AdapterHub
  • https://docs.adapterhub.ml, our documentation on training and using adapters with adapter-transformers
  • https://adapterhub.ml to explore available pre-trained adapter modules and share your own adapters
  • Examples folder of this repository containing HuggingFace's example training scripts, many adapted for training adapters

Citation

If you use this library for your work, please consider citing our paper AdapterHub: A Framework for Adapting Transformers:

@inproceedings{pfeiffer2020AdapterHub,
    title={AdapterHub: A Framework for Adapting Transformers},
    author={Pfeiffer, Jonas and
            R{\"u}ckl{\'e}, Andreas and
            Poth, Clifton and
            Kamath, Aishwarya and
            Vuli{\'c}, Ivan and
            Ruder, Sebastian and
            Cho, Kyunghyun and
            Gurevych, Iryna},
    booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
    pages={46--54},
    year={2020}
}
Comments
  • "Parallel" option for training? Parallel adapter outputs required (without interacting with each other).

    Hello,

    Thanks for this nice framework 👍 . I might be asking something that isn't yet possible but wanted to at least try asking!

    I am trying to feed the outputs of two BERT-based models to a subsequent NN. This requires loading two BERT models, but the memory consumption becomes too high if I do that. To remedy this, I was wondering if I could do something like "Parallel" at training time. (FYI, I am not trying to dynamically drop the first few layers; I am simply trying to create two BERT forward paths with lower memory consumption.)

    I understand that active adapters can be switched by set_active_adapters(). (Actually, could you confirm if my understanding is correct?) But, this doesn't seem to fit my purpose as, in my case, I need both adapters to output independent representation based on respective adapters.

    Is there any way to keep the adapters from interacting with each other on the forward path without loading the original BERT parameters twice?

    • To make this question even more complex, I also need one adapter's parameters to be non-differentiable while still using them in the forward pass. Any ideas perhaps? :)
    question 
    opened by leejayyoon 18
  • ImportError: cannot import name 'AutoModelWithHeads' from 'transformers'

    Hi I am trying with this example colab: https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Training.ipynb#scrollTo=Lbwb3NRf8mBF

    getting this error:

    Traceback (most recent call last):
      File "test.py", line 11, in <module>
        from transformers import AutoTokenizer, EvalPrediction, GlueDataset, GlueDataTrainingArguments, AutoModelWithHeads, AdapterType
    ImportError: cannot import name 'AutoModelWithHeads' from 'transformers' (/idiap/user/rkarimi/libs/anaconda3/envs/adapter/lib/python3.7/site-packages/transformers/__init__.py)
    

    versions

    (adapter) [email protected]:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep transformers
    adapter-transformers      1.0.1                     <pip>
    transformers              3.5.1                     <pip>
    (adapter) [email protected]:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep pytorch
    pytorch-lightning         1.0.4                     <pip>
    adapter hub from github is installed
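
One thing that may be worth ruling out here (an assumption, not something confirmed in this thread): adapter-transformers is a drop-in replacement that ships its own `transformers` package, so an upstream `transformers` distribution installed in the same environment can shadow it. A minimal sketch for checking which of the two distributions are installed, using only the standard library (Python 3.8+); `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    found = installed_versions(["adapter-transformers", "transformers"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    if all(found.values()):
        print("Both distributions are present; they provide the same import name.")
```

If both show up, uninstalling both and reinstalling only adapter-transformers is one thing to try.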
    
    bug 
    opened by rabeehkarimimahabadi 17
  • training the language adapters in the MAD-X paper

    Hi, I need to train language adapters as done in the MAD-X paper. I have downloaded the Wikipedia data, but the data is very large and so far I have not managed to train them. Could you share the script you used to train the language adapters? Thank you very much in advance.

    question 
    opened by dorost1234 13
  • Add t5 adapter

    Followed the pattern of Bart to add adapters to T5. One change is that whereas Bart has separate classes for encoder and decoder, T5 does not. So I am using the is_decoder for changes between encoder and decoder classes, such as adding cross_attention adapters and adding invertible adapters.

    I'm working on some testing.

    opened by AmirAktify 12
  • Training an Adapter using own classification head and pytorch training loop

    Details

    Hello! I want to add adapters to my pre-trained BERT text-classification model, but I did not find a good explanation in the documentation of how to do that. My model class is the following:

    class BertClassifier(nn.Module):
        """Bert Model for Classification Tasks."""
        def __init__(self, freeze_bert=True):
            """
             @param    bert: a BertModel object
             @param    classifier: a torch.nn.Module classifier
             @param    freeze_bert (bool): Set `False` to fine-tune the BERT model
            """
            super(BertClassifier, self).__init__()
    
            # Instantiate BERT model
            # Specify hidden size of BERT, hidden size of our classifier, and number of labels
            self.bert = BertAdapterModel.from_pretrained('PRETRAINED_MODEL')  # placeholder for the checkpoint name or path
            self.D_in = 1024 
            self.H = 512
            self.D_out = 2
            
    
            # Add a new adapter
            self.bert.add_adapter("thermo_cl",set_active=True)
            self.bert.train_adapter(["thermo_cl"])
    
     
            # Instantiate the classifier head with some one-layer feed-forward classifier
            self.classifier = nn.Sequential(
                nn.Linear(self.D_in, 512),
                nn.Tanh(),
                nn.Linear(512, self.D_out),
                nn.Tanh()
            )
     
             # Freeze the BERT model
            if freeze_bert:
                for param in self.bert.parameters():
                    param.requires_grad = True
    
    
        def forward(self, input_ids, attention_mask):
            ''' Feed input to BERT and the classifier to compute logits.
             @param    input_ids (torch.Tensor): an input tensor with shape (batch_size,
                           max_length)
             @param    attention_mask (torch.Tensor): a tensor that hold attention mask
                           information with shape (batch_size, max_length)
             @return   logits (torch.Tensor): an output tensor with shape (batch_size,
                           num_labels) '''
             # Feed input to BERT
            outputs = self.bert(input_ids=input_ids,
                                 attention_mask=attention_mask)
             
             # Extract the last hidden state of the token `[CLS]` for classification task
            last_hidden_state_cls = outputs[0][:, 0, :]
     
             # Feed input to classifier to compute logits
            logits = self.classifier(last_hidden_state_cls)
     
            return logits
    

    The training loop is the following:

    def initialize_model(epochs):
        """ Initialize the Bert Classifier, the optimizer and the learning rate scheduler."""
        # Instantiate Bert Classifier
        bert_classifier = BertClassifier(freeze_bert=False) #false=freezed
    
        # Tell PyTorch to run the model on GPU
        bert_classifier = bert_classifier.to(device)
    
        # Create the optimizer
        optimizer = AdamW(bert_classifier.parameters(),
                          lr=lr,    # Default learning rate
                          eps=1e-8    # Default epsilon value
                          )
    
        # Total number of training steps
        total_steps = len(train_dataloader) * epochs
    
        # Set up the learning rate scheduler
        scheduler = get_linear_schedule_with_warmup(optimizer,
                                                    num_warmup_steps=0, # Default value
                                                    num_training_steps=total_steps)
    
        return bert_classifier, optimizer, scheduler
    
    def train(model, train_dataloader, val_dataloader, valid_loss_min_input, checkpoint_path, best_model_path, start_epochs, epochs, evaluation=True):
    
        """Train the BertClassifier model."""
        # Start training loop
        logging.info("--Start training...\n")
    
        # Initialize tracker for minimum validation loss
        valid_loss_min = valid_loss_min_input 
    
    
        for epoch_i in range(start_epochs, epochs):
    
                              ..............................
    
         if evaluation == True:
                # After the completion of each training epoch, measure the model's performance
                # on our validation set.
                val_loss, val_accuracy = evaluate(model, val_dataloader)
    
                # Print performance over the entire training data
                time_elapsed = time.time() - t0_epoch
                
                logging.info(f"{epoch_i + 1:^7} | {'-':^7} | {avg_train_loss:^12.6f} | {val_loss:^10.6f} | {val_accuracy:^10.6f} | {time_elapsed:^9.2f}")
    
                logging.info("-"*70)
            logging.info("\n")
    
             # create checkpoint variable and add important data
            checkpoint = {
                'epoch': epoch_i + 1,
                'valid_loss_min': val_loss,
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
            }
            
            # save checkpoint
            save_ckp(checkpoint, False, checkpoint_path, best_model_path)
            
            ## TODO: save the model if validation loss has decreased
            if val_loss <= valid_loss_min:
                print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(valid_loss_min,val_loss))
                # save checkpoint as best model
                save_ckp(checkpoint, True, checkpoint_path, best_model_path)
                valid_loss_min = val_loss
    
    
        model.save_adapter("./final_adapter", "thermo_cl")
        logging.info("-----------------Training complete--------------------------")
    
    
    bert_classifier, optimizer, scheduler = initialize_model(epochs=n_epochs)
    train(model = bert_classifier....)
    

    As you can see, I have my own custom classification head, so I don't want to use the .add_classification_head() method. Is it correct to train and activate the adapter in this way? I would like to know whether I'm using the adapter properly, and also how to save the checkpoint and my model weights, because at the end of training (where I am supposed to save the adapter) I receive this error:

    AttributeError: 'BertClassifier' object has no attribute 'save_adapter'
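
The error above is consistent with how attribute lookup works on a wrapper object: `save_adapter` lives on the wrapped model stored in `self.bert`, not on the wrapper itself. A minimal sketch of that situation in plain Python, where `StubAdapterModel` and `Wrapper` are hypothetical stand-ins (not library classes) for `BertAdapterModel` and `BertClassifier`:

```python
class StubAdapterModel:
    """Hypothetical stand-in for a model that exposes save_adapter()."""

    def __init__(self):
        self.saved = []

    def save_adapter(self, path, name):
        # Record the call instead of writing files.
        self.saved.append((path, name))


class Wrapper:
    """Hypothetical stand-in for a custom module that wraps the model."""

    def __init__(self):
        self.bert = StubAdapterModel()


model = Wrapper()

try:
    model.save_adapter("./final_adapter", "thermo_cl")
except AttributeError as err:
    print(f"wrapper call fails: {err}")

# Calling through the wrapped attribute reaches the method.
model.bert.save_adapter("./final_adapter", "thermo_cl")
print(model.bert.saved)
```

Under that reading, the save call in the training function would go through the wrapped model, i.e. `model.bert.save_adapter(...)`. The custom classifier head is not part of the adapter and would presumably need to be saved separately, for example through its `state_dict()`.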
    

    Thanks for the help!

    question Stale 
    opened by Ch-rode 11
  • Merge with original transformers library

    🚀 Feature request

    Merge this into the original transformers library.

    Motivation

    This library is awesome so thanks a lot but it would be much more convenient to have this merged into the original transformers library. The Huggingface team seems to be focused on adding lightweight options for their models and adapters are huge time-and-memory-savers for multitask use cases and would be a great addition to the transformers library.

    Your contribution

    You've done the integration here already so it should be straightforward but happy to help. I've posted an issue on huggingface's end as well.

    discussion Stale 
    opened by salimmj 11
  • Unintuitive slowdown in data loading and model updating on using adapters

    Environment info

    • transformers version: 1.0.1
    • Platform: Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-glibc2.10
    • Python version: 3.8.5
    • PyTorch version (GPU?): 1.7.0 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: Yes

    Who can help: @LysandreJik @patrickvonplaten

    Model I am using: Bert

    Language I am using the model on:English

    Adapter setup I am using (if any): HoulsbyConfig

    The problem arises when using: my own modified scripts. I want to use adapters for a project of mine, which will require fine-tuning BERT multiple times. To understand how much speedup adapters would give, I profiled the various steps in the BERT training loop, both with and without adapters.

    The task I am working on is: Stanford Natural Language Inference (SNLI)

    To reproduce

    Steps to reproduce the behavior: The following function is executed for a period of 4 hours on identical GPUs (via an LSF batch system), once with UseAdapter set to True and once with it set to False. The path contains a preloaded and tokenized version of the SNLI training set (as well as the test and dev sets, discarded here via underscores).

    def load_and_train(path, UseAdapter):
        x_train,y_train,a_train,t_train,_,_,_,_,_,_,_,_=load(open(path,"rb"))
        train_inst=torch.tensor(x_train)
        train_att=torch.tensor(a_train)
        train_types=torch.tensor(t_train)
        train_targ=torch.tensor(y_train)
        train_data = TensorDataset(train_inst, train_att, train_types,train_targ)
        train_sampler = RandomSampler(train_data)
        train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=32)
        model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
        if UseAdapter:
            model.add_adapter("SNLI",AdapterType.text_task,HoulsbyConfig().__dict__)
            model.train_adapter(["SNLI"])
            model.set_active_adapters(["SNLI"])
        model.cuda()
        optimizer=AdamW(model.parameters(),lr=1e-4)
        scheduler=get_linear_schedule_with_warmup(optimizer,0,len(train_dataloader)*EPOCHS)
        iter=0
        time_load=0
        time_cler=0
        time_forw=0
        time_back=0
        time_updt=0
        for e in range(15):
            model.train()
            for batch in train_dataloader:
                last=time()
                x=batch[0].cuda()
                a=batch[1].cuda()
                t=batch[2].cuda()
                y=batch[3].cuda()
                time_load+=time()-last
                last=time()
                model.zero_grad()
                time_cler+=time()-last
                last=time()
                outputs = model(x, token_type_ids=t, attention_mask=a, labels=y)
                time_forw+=time()-last
                last=time()
                loss=outputs[0]
                loss.backward()
                time_back+=time()-last
                last=time()
                optimizer.step()
                scheduler.step()
                time_updt+=time()-last
                iter+=1
                print(time_load,time_cler,time_forw,time_back,time_updt)
    

    Expected behavior

    1. With Adapters the trainer is able to run through more batches than without by the time the job gets timed out
    2. Per Batch time_load is identical for both cases
    3. Per Batch time_cler is slightly lower with adapters due to the presence of fewer gradients
    4. Per Batch time_forw is slightly higher with adapters due to extra layers that are introduced
    5. Per Batch time_back is significantly lower with adapters since it needs to save fewer gradients
    6. Per Batch time_updt is lower with adapters due to having fewer parameters to update

    Observed Behaviour

    Overall times(seconds):

    | Adapter | Load Time | Clear Time | Forward Prop | Backward Prop | Update | Total | No. of Batches |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | No | 9.141064644 | 349.405822 | 873.8870151 | 11770.82554 | 1159.772 | 14163.03 | 69022 |
    | Yes | 2721.683394 | 394.4980106 | 1652.686945 | 3192.402303 | 6304.335 | 14265.61 | 95981 |

    Per Batch Times(seconds):

    | Adapter | Load Time | Clear Time | Forward Prop | Backward Prop | Update |
    | --- | --- | --- | --- | --- | --- |
    | No | 0.000132437 | 0.005062238 | 0.012660992 | 0.1705373 | 0.016803 |
    | Yes | 0.028356481 | 0.004110168 | 0.017218897 | 0.033260774 | 0.065683 |

    As is evident from above, points 2 and 6 above are not satisfied in this output. Note that similar observations were made in 2 reruns of the experiment. It is unclear to me if there is an explanation I am missing or if this is an implementation issue.
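
For reference, the per-batch figures follow from dividing each total by the number of batches processed in that run. A minimal sketch of that arithmetic, with the numbers copied from the overall-times table above (the dictionary layout and the `per_batch` helper are illustrative choices, not part of the original script):

```python
# Totals in seconds and batch counts, copied from the overall-times table.
totals = {
    "No": {"load": 9.141064644, "clear": 349.405822, "forward": 873.8870151,
           "backward": 11770.82554, "update": 1159.772, "batches": 69022},
    "Yes": {"load": 2721.683394, "clear": 394.4980106, "forward": 1652.686945,
            "backward": 3192.402303, "update": 6304.335, "batches": 95981},
}


def per_batch(row):
    """Divide every timed phase by the number of batches processed."""
    n = row["batches"]
    return {phase: seconds / n for phase, seconds in row.items() if phase != "batches"}


per_batch_times = {adapter: per_batch(row) for adapter, row in totals.items()}

for adapter, phases in per_batch_times.items():
    line = ", ".join(f"{phase}={value:.6f}" for phase, value in phases.items())
    print(f"adapter={adapter}: {line}")
```

One general caveat when reading such numbers: CUDA kernels are launched asynchronously, so wall-clock timers placed around GPU calls without `torch.cuda.synchronize()` can attribute time to a later phase than the one that incurred it. That is a property of CUDA timing in general, not a diagnosis confirmed in this thread.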

    bug 
    opened by cs1160701 9
  • Loading custom adapters and 'output_attentions' for AdapterFusion

    Question

    Information

    Model I am using (Bert, XLNet ...): XLM-RoBERTa-base

    Language I am using the model on (English, Chinese ...): Korean

    Adapter setup I am using (if any):

    The problem arises when using:

    • [X] the official example scripts: (give details below)
    • [ ] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [X] my own task or dataset: (give details below)
    • Datasets: KorNLI and KorSTS (Machine translated Korean MNLI & STS-B dataset)
    • Its format and size are the same as the original datasets (MNLI & STS-B)

    Background

    What I'm doing is that:

    1. train Task-Adapters for KorNLI and KorSTS on the XLM-RoBERTa-base model (to train on Korean datasets) using the official code, 'run_glue_alt.py'
    2. fuse both adapters with a fusion layer using 'run_fusion_glue.py'

    Questions

    Sorry that I'm not familiar with the adapter-transformers codebase. Here are some questions about the AdapterFusion framework.

    1. Is it possible to load my own pre-trained adapters using the 'model.load_adapter' function in the current framework? (I'm using the latest version of adapter-transformers.)
    2. The performance on the target task (KorSTS) composed with KorSTS and KorNLI single task adapters is markedly lower than the single task adapter trained on the KorSTS dataset. Even with various hyperparameter (batch size, epoch, learning rate, fusion config, ...) search, the performance doesn't seem to be improved. Is there any way to check whether the fusion layer is trained properly?
    3. Connected with the questions above, is it possible to investigate the attention distribution of the trained fusion layer? I've checked there is an option 'output_attentions' defined in the BertModel class, but I could not find a way to output attention weights of the fusion layers, not the self-attention layers of the original pre-trained model.

    Environment info

    • transformers version:
    • Platform:
    • Python version: 3.6.3
    • PyTorch version (GPU?): 1.4
    • Tensorflow version (GPU?):
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: No, I'm using a single GPU
    bug question 
    opened by bigkunzi 9
  • TypeError: unhashable type: 'Stack' error raised when using Parallel adapter heads

    Environment info

    • adapter-transformers version:
    • Platform: Linux
    • Python version: 3.6.8
    • PyTorch version (GPU?): GPU / 1.7
    • Tensorflow version (GPU?): NA
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: Using nn.DataParallel

    Information

    Model I am using (Bert, XLNet ...): BERT pretrained model with 3 custom adapters + heads are used.

    Language I am using the model on (English, Chinese ...): EN

    Adapter setup I am using (if any): 3 Adapters (with default configuration) and 3 Classification Head.

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [ ] my own modified scripts: (give details below)

    The tasks I am working on is: Multi-task finetuning using AdapterHub

    Error below :

     (from logs) active head : [<bound method AdapterCompositionBlock.last of Stack[combined, resource_type, action]>]
    
    Traceback (most recent call last):
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py", line 1092, in forward
        head_inputs, head_name=head, attention_mask=attention_mask, return_dict=return_dict, **kwargs
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/transformers/adapters/heads.py", line 509, in forward_head
        if head not in self.heads:
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 304, in __contains__
        return key in self._modules
    TypeError: unhashable type: 'Stack'
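
The last frame shows a membership test (`key in self._modules`) on a dictionary, which requires the key to be hashable. A minimal sketch of how that `TypeError` arises in plain Python. `Block` is a hypothetical stand-in for a composition block, and the assumption that the real class defines `__eq__` without `__hash__` is not verified here:

```python
class Block:
    """Hypothetical stand-in: defining __eq__ alone sets __hash__ to None."""

    def __init__(self, *children):
        self.children = children

    def __eq__(self, other):
        return isinstance(other, Block) and self.children == other.children


heads = {"name_a": object(), "name_b": object()}
key = Block("name_a", "name_b")

try:
    found = key in heads  # the dict lookup hashes the key first
except TypeError as err:
    found = None
    print(f"lookup failed: {err}")

# Looking up each head name (a plain string) works as expected.
present = [name for name in key.children if name in heads]
print(present)
```

In other words, the lookup fails before any head is found because a whole block object, rather than a head name, reaches the `in` check.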
    

    Modified code below

    
    from transformers import AutoModelWithHeads
    import transformers.adapters.composition as ac

    model = AutoModelWithHeads.from_pretrained('bert-base-uncased')
    
    # 3 adapters and classification heads are added.
    model.add_adapter('name_a')
    model.add_classification_head('name_a',  {'num_labels' : 100})
    
    model.add_adapter('name_b')
    model.add_classification_head('name_b')
    
    model.add_adapter('name_c')
    model.add_classification_head('name_c',  {'num_labels' : 5})
    
    
    # Use `Parallel` to enable multiple active heads.
    adapter_names  = ['name_a', 'name_b', 'name_c']
    model.active_heads =  ac.Parallel(adapter_names)
    
    for name in adapter_names:
        model.train_adapter(name)
        
    # Invoke forward pass. This will trigger the error. 
    model(inputs)
    
    

    Expected behavior

    Model forward pass should work.

    bug 
    opened by hchoi-moveworks 8
  • Hinglish Sentiment Adapter

    🌟 New Adapter setup

    Model and Data Description

    Hinglish: a romanized form of Hindi that is immensely popular in India, where Hindi is spoken by millions of people but is quite often typed in Roman script

    Dataset: SemEval 2020 Task 9 Sentiment Analysis: 3 classes, +ve, -ve and neutral

    Open source status

    • [x] Code Implementation for the Adapter: https://colab.research.google.com/drive/19lofRd9n142xJCtUteZb5L_r7spGcGLL?usp=sharing
    • [x] Past Work: Accepted Paper, Code and Model Weights
    • [x] Who are the authors: @NirantK and @meghanabhange

    What I need help with

    • [x] Because there were no examples other than Glue Datasets, I ended up implementing a new HinglishDataset class and other skeleton code -- I'd appreciate a review if I got something wrong

    Next Steps

    If all is well in the code above, I'd like to continue along and contribute an adapter for Hinglish under the Sentiment task.

    enhancement 
    opened by NirantK 8
  • Train adapters without Hugging Face Trainer scripts

    Hi, I was looking into example scripts for Adapter-Hub and almost all *_no_trainer.py scripts were not using adapters at all. Are you guys planning to add those scripts soon? I can also help in porting trainer scripts to no_trainer scripts if someone can guide me about what all changes will be required for that. Thank you!

    cc: @calpt

    question Stale 
    opened by bhavitvyamalik 7
  • T5: Missing tied weights crash `accelerate`

    First opened at https://github.com/huggingface/accelerate/issues/958. When HuggingFace accelerate is used via device_map='auto', a weight tied to the missing lm_head triggers a crash inside the device map planning code. It would be nice if there were a clear way to retain the head and the tied weight during loading.

    Environment info

    • adapter-transformers version: 3.1.0
    • Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
    • Python version: 3.9.16+
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.13.1+cu117 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes, device_map='auto'
    • Using distributed or parallel set-up in script?: no

    Information

    Model I am using (Bert, XLNet ...): google/flan-t5-base

    Language I am using the model on (English, Chinese ...): n/a

    Adapter setup I am using (if any): AutoAdapterModel.from_pretrained

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [x] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [x] my own task or dataset: (give details below)

    To reproduce

    Steps to reproduce the behavior:

    import transformers
    model = transformers.AutoAdapterModel.from_pretrained('google/flan-t5-base', device_map='auto')
    

    Result:

    ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
    │ /home/user/scratch/test-2023-01-07.py:2 in <module>                                              │
    │                                                                                                  │
    │   1 import transformers                                                                          │
    │ ❱ 2 model = transformers.AutoAdapterModel.from_pretrained('google/flan-t5-base', device_map=     │
    │   3                                                                                              │
    │                                                                                                  │
    │ /home/user/.local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:446 in    │
    │ from_pretrained                                                                                  │
    │                                                                                                  │
    │   443 │   │   │   return model_class.from_pretrained(pretrained_model_name_or_path, *model_arg   │
    │   444 │   │   elif type(config) in cls._model_mapping.keys():                                    │
    │   445 │   │   │   model_class = _get_model_class(config, cls._model_mapping)                     │
    │ ❱ 446 │   │   │   return model_class.from_pretrained(pretrained_model_name_or_path, *model_arg   │
    │   447 │   │   raise ValueError(                                                                  │
    │   448 │   │   │   f"Unrecognized configuration class {config.__class__} for this kind of AutoM   │
    │   449 │   │   │   f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapp   │
    │                                                                                                  │
    │ /home/user/.local/lib/python3.9/site-packages/transformers/modeling_utils.py:2121 in             │
    │ from_pretrained                                                                                  │
    │                                                                                                  │
    │   2118 │   │   │   no_split_modules = model._no_split_modules                                    │
    │   2119 │   │   │   # Make sure tied weights are tied before creating the device map.             │
    │   2120 │   │   │   model.tie_weights()                                                           │
    │ ❱ 2121 │   │   │   device_map = infer_auto_device_map(                                           │
    │   2122 │   │   │   │   model, no_split_module_classes=no_split_modules, dtype=torch_dtype, max_  │
    │   2123 │   │   │   )                                                                             │
    │   2124                                                                                           │
    │                                                                                                  │
    │ /shared/src/accelerate/src/accelerate/utils/modeling.py:545 in infer_auto_device_map             │
    │                                                                                                  │
    │   542 │   │   elif tied_param is not None:                                                       │
    │   543 │   │   │   # Determine the sized occupied by this module + the module containing the ti   │
    │   544 │   │   │   tied_module_size = module_size                                                 │
    │ ❱ 545 │   │   │   tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if n in    │
    │   546 │   │   │   tied_module_name, tied_module = modules_to_treat[tied_module_index]            │
    │   547 │   │   │   tied_module_size += module_sizes[tied_module_name] - module_sizes[tied_param   │
    │   548 │   │   │   if current_max_size is not None and current_memory_used + tied_module_size >   │
    ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
    IndexError: list index out of range
    

    Expected behavior

    No crash. Ability to tie weights with seq2seq lm_head.

    bug 
    opened by xloem 0
  • Fusing task-specific and task-agnostic adapters

    Environment info

    • adapter-transformers version: 3.1.0
    • Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
    • Python version: 3.8.11
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.12.1 (False)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: no

    Details

    Hi, I am trying to combine task-specific and task-agnostic adapters. Assume I have three tasks: Task-A, Task-B, and Task-C. I will add task-specific adapters and a task-agnostic adapter as follows:

    import transformers.adapters.composition as ac
    
    model.add_adapter("TASK-A")
    model.add_adapter("TASK-B")
    model.add_adapter("TASK-C")
    
    model.add_adapter("TASK-Agnostic")
    

    Now I want to fuse the task-specific adapter and the task-agnostic adapter dynamically, i.e., depending on the current task.

    Should I fuse the adapters as follows?

    model.add_adapter_fusion(["TASK-A", "TASK-Agnostic"])
    model.add_adapter_fusion(["TASK-B", "TASK-Agnostic"])
    model.add_adapter_fusion(["TASK-C", "TASK-Agnostic"])
    

    Inside the forward_pass of the Trainer, I will set the active adapters as follows:

    task_name = get_task_name()
    model.active_adapters = ac.Fuse(task_name, "TASK-Agnostic")
    

    Is this the right way to implement this?

    Thanks

    question 
    opened by murthyrudra 0
  • Stacking two parallel composition blocks

    Hi,

    Can I stack two Parallel composition blocks like this? ac.Stack(ac.Parallel('a', 'b'), ac.Parallel('c', 'd'))

    I found that the inputs are replicated only once, but they should be replicated twice. Could you help me fix it?
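    To make the difference concrete, here is a toy sketch in plain Python. It is not the library's implementation: the helpers `replicate` and `parallel` and the `replicate_input` flag are hypothetical, and strings stand in for tensors. It only contrasts the behavior described above (one replication, so two output streams) with the expected one (two replications, so four output streams).

```python
from typing import List


def replicate(batch: List[str], n: int) -> List[str]:
    """Repeat the whole batch n times along the (toy) batch dimension."""
    return [x for _ in range(n) for x in batch]


def parallel(batch: List[str], adapters: List[str], replicate_input: bool) -> List[str]:
    """Toy Parallel block: optionally replicate the input, then hand each
    adapter one equally sized slice of the batch."""
    if replicate_input:
        batch = replicate(batch, len(adapters))
    if len(batch) % len(adapters) != 0:
        raise ValueError("batch size must be divisible by the number of adapters")
    chunk = len(batch) // len(adapters)
    out = []
    for i, name in enumerate(adapters):
        for x in batch[i * chunk:(i + 1) * chunk]:
            out.append(f"{name}({x})")
    return out


x = ["x"]
first = parallel(x, ["a", "b"], replicate_input=True)

# Second block reuses the already replicated batch (replicated once overall).
once = parallel(first, ["c", "d"], replicate_input=False)

# Second block replicates again (replicated twice overall).
twice = parallel(first, ["c", "d"], replicate_input=True)

print(once)   # ['c(a(x))', 'd(b(x))']
print(twice)  # ['c(a(x))', 'c(b(x))', 'd(a(x))', 'd(b(x))']
```

    In this toy model, replicating only once pairs 'a' with 'c' and 'b' with 'd', while replicating twice yields every combination of the two blocks.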

    Thanks!

    question 
    opened by HZQ950419 0
  • Add adapter to AutoModelForSequenceClassification model

    Environment info

    • adapter-transformers version: newest
    • Platform: Azure ML
    • Python version: 3.8
    • PyTorch version (GPU?):

    Details

    I am trying to use an AutoModelForSequenceClassification model (with BART). The documentation is not very clear, so I just loaded the model directly and added a LoRA adapter to it. When I run the trainer, I get the following error:

    RestException: INVALID_PARAMETER_VALUE: Response: {'Error': {'Code': 'ValidationError', 'Severity': None, 'Message': 'No more than 255 characters per params Value. Request contains 1 of greater length.', 'MessageFormat': None, 'MessageParameters': None, 'ReferenceCode': None, 'DetailsUri': None, 'Target': None, 'Details': [], 'InnerError': None, 'DebugInfo': None, 'AdditionalInfo': None}, 'Correlation': {'operation': '04d45ce3752c5e51c54e71f3950411ca', 'request': '6d216d8faea19d26'}, 'Environment': 'westus', 'Location': 'westus', 'Time': '2023-01-04T17:45:03.5650777+00:00', 'ComponentName': 'mlflow', 'error_code': 'INVALID_PARAMETER_VALUE'}

    Any ideas on how to solve it?

    question 
    opened by andyzengmath 0
  • Support for openai Whisper

    🌟 New adapter setup

    Add adapter integration for Whisper.

    Open source status

    • [x] the model implementation is available: official code hf
    • [x] the model weights are available: hf
    • [x] who are the authors: @jongwook @ArthurZucker @sgugger
    enhancement 
    opened by karynaur 0
  • Add adapter configuration strings & restructure adapter method docs

    Configuration strings

    This PR adds the possibility to use flexible adapter configuration strings which allow specifying custom config attributes. Examples:

    • Set config attributes: model.add_adapter("name", config="parallel[reduction_factor=2]")
    • Config union: model.add_adapter("name", config="prefix_tuning|parallel")
    • More examples: https://github.com/calpt/adapter-transformers/blob/8df62b9de2a8ab51115b191aca35b2fb53c96539/tests_adapters/test_adapter_config.py#L95-L102

    Documentation: https://github.com/calpt/adapter-transformers/blob/8df62b9de2a8ab51115b191aca35b2fb53c96539/adapter_docs/overview.md

    Configuration strings allow passing complex configurations, e.g. via the command line.
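    As an illustration of the string format only, the following minimal sketch splits such a string into config names and attribute overrides. It is not the parser added by this PR: `parse_config_string` and `_parse_value` are hypothetical helpers, and the sketch assumes that `|` separates configs, that `[...]` holds comma-separated `key=value` pairs, and that values are plain numbers, booleans, or strings (no nested lists).

```python
import re
from typing import Any, Dict, List, Tuple

# One config part: an identifier, optionally followed by [key=value,...].
_PART = re.compile(r"^(?P<name>[\w\-]+)(?:\[(?P<attrs>.*)\])?$")


def _parse_value(raw: str) -> Any:
    """Convert a raw attribute value to int, float, or bool where possible."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw in ("True", "False"):
        return raw == "True"
    return raw


def parse_config_string(config: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return one (name, overrides) pair per '|'-separated config part."""
    parsed = []
    for part in config.split("|"):
        match = _PART.match(part.strip())
        if match is None:
            raise ValueError(f"Invalid config part: {part!r}")
        overrides: Dict[str, Any] = {}
        if match.group("attrs"):
            for pair in match.group("attrs").split(","):
                key, _, value = pair.partition("=")
                overrides[key.strip()] = _parse_value(value.strip())
        parsed.append((match.group("name"), overrides))
    return parsed


single = parse_config_string("parallel[reduction_factor=2]")
union = parse_config_string("prefix_tuning|parallel")
print(single)  # [('parallel', {'reduction_factor': 2})]
print(union)   # [('prefix_tuning', {}), ('parallel', {})]
```

    A string with more than one part corresponds to a config union. On the command line, the string should be quoted because `|` and `[` are special characters in most shells.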

    Documentation restructuring

    The adapter method documentation is now split into three pages:

    • Overview and Configuration: introduction, table, configuration
    • Adapter Methods
    • Method Combinations
    opened by calpt 0
Releases(adapters3.1.0)
  • adapters3.1.0(Sep 15, 2022)

    Based on transformers v4.21.3

    New

    New adapter methods

    New model integrations

    • Add Deberta and DebertaV2 integration (@hSterz via #340)
    • Add Vision Transformer integration (@calpt via #363)

    Misc

    • Add adapter_summary() method (@calpt via #371): More info
    • Return AdapterFusion attentions using output_adapter_fusion_attentions argument (@calpt via #417): Documentation

    Changed

    • Upgrade of underlying transformers version (@calpt via #344, #368, #404)

    Fixed

    • Infer label names for training for flex head models (@calpt via #367)
    • Ensure root dir exists when saving all adapters/heads/fusions (@calpt via #375)
    • Avoid attempting to set prediction head if non-existent (@calpt via #377)
    • Fix T5EncoderModel adapter integration (@calpt via #376)
    • Fix loading adapters together with full model (@calpt via #378)
    • Multi-gpu support for prefix-tuning (@alexanderhanboli via #359)
    • Fix issues with embedding training (@calpt via #386)
    • Fix initialization of added embeddings (@calpt via #402)
    • Fix model serialization using torch.save() & torch.load() (@calpt via #406)
  • adapters3.0.1(May 18, 2022)

    Based on transformers v4.17.0

    New

    • Support float reduction factors in bottleneck adapter configs (@calpt via #339)

    Fixed

    • [AdapterTrainer] add missing preprocess_logits_for_metrics argument (@stefan-it via #317)
    • Fix save_all_adapters such that with_head is not ignored (@hSterz via #325)
    • Fix inferring batch size for prefix tuning (@calpt via #335)
    • Fix bug when using compacters with AdapterSetup context (@calpt via #328)
    • [Trainer] Fix issue with AdapterFusion and load_best_model_at_end (@calpt via #341)
    • Fix generation with GPT-2, T5 and Prefix Tuning (@calpt via #343)
  • adapters3.0.0(Mar 23, 2022)

    Based on transformers v4.17.0

    New

    Efficient Fine-Tuning Methods

    • Add Prefix Tuning (@calpt via #292)
    • Add Parallel adapters & Mix-and-Match adapter (@calpt via #292)
    • Add Compacter (@hSterz via #297)

    Misc

    • Introduce XAdapterModel classes as central & recommended model classes (@calpt via #289)
    • Introduce ConfigUnion class for flexible combination of adapter configs (@calpt via #292)
    • Add AdapterSetup context manager to replace adapter_names parameter (@calpt via #257)
    • Add ForwardContext to wrap model forward pass with adapters (@calpt via #267, #295)
    • Search all remote sources when passing source=None (new default) to load_adapter() (@calpt via #309)

    Changed

    • Deprecate XModelWithHeads in favor of XAdapterModel (@calpt via #289)
    • Refactored adapter integration into model classes and model configs (@calpt via #263, #304)
    • Rename activation functions to match Transformers' names (@hSterz via #298)
    • Upgrade of underlying transformers version (@calpt via #311)

    Fixed

    • Fix seq2seq generation with flexible heads classes (@calpt via #275, @hSterz via #285)
    • Parallel composition for XLM-Roberta (@calpt via #305)
  • adapters2.3.0(Feb 9, 2022)

    Based on transformers v4.12.5

    New

    • Allow adding, loading & training of model embeddings (@hSterz via #245). See https://docs.adapterhub.ml/embeddings.html.

    Changed

    • Unify built-in & custom head implementation (@hSterz via #252)
    • Upgrade of underlying transformers version (@calpt via #255)

    Fixed

    • Fix documentation and consistency issues for AdapterFusion methods (@calpt via #259)
    • Fix serialization/deserialization issues with custom adapter config classes (@calpt via #253)
  • adapters2.2.0(Oct 14, 2021)

    Based on transformers v4.11.3

    New

    Model support

    • T5 adapter implementation (@AmirAktify & @hSterz via #182)
    • EncoderDecoderModel adapter implementation (@calpt via #222)

    Prediction heads

    • AutoModelWithHeads prediction heads for language modeling (@calpt via #210)
    • AutoModelWithHeads prediction head & training example for dependency parsing (@calpt via #208)

    Training

    • Add a new AdapterTrainer for training adapters (@hSterz via #218, #241)
    • Enable training of Parallel block (@hSterz via #226)

    Misc

    • Add get_adapter_info() method (@calpt via #220)
    • Add set_active argument to add & load adapter/fusion/head methods (@calpt via #214)
    • Minor improvements for adapter card creation for HF Hub upload (@calpt via #225)

    Changed

    • Upgrade of underlying transformers version (@calpt via #232, #234, #239)
    • Allow multiple AdapterFusion configs per model; remove set_adapter_fusion_config() (@calpt via #216)

    Fixed

    • Incorrect referencing between adapter layer and layer norm for DataParallel (@calpt via #228)
  • adapters2.1.0(Jul 8, 2021)

    Based on transformers v4.8.2

    New

    Integration into HuggingFace's Model Hub

    • Add support for loading adapters from HuggingFace Model Hub (@calpt via #162)
    • Add method to push adapters to HuggingFace Model Hub (@calpt via #197)
    • Learn more

    BatchSplit adapter composition

    • BatchSplit composition block for adapters and heads (@hSterz via #177)
    • Learn more

    Various new features

    • Add automatic conversion of static heads when loaded via XModelWithHeads (@calpt via #181) Learn more
    • Add list_adapters() method to search for adapters (@calpt via #193) Learn more
    • Add delete_adapter(), delete_adapter_fusion() and delete_head() methods (@calpt via #189)
    • MAD-X 2.0 WikiAnn NER notebook (@hSterz via #187)
    • Upgrade of underlying transformers version (@hSterz via #183, @calpt via #194 & #200)

    Changed

    • Deprecate add_fusion() and train_fusion() in favor of add_adapter_fusion() and train_adapter_fusion() (@calpt via #190)

    Fixed

    • Suppress no-adapter warning when adapter_names is given (@calpt via #186)
    • leave_out in load_adapter() when loading language adapters from Hub (@hSterz via #177)
  • adapters2.0.1(May 28, 2021)

    Based on transformers v4.5.1

    New

    • Allow different reduction factors for different adapter layers (@hSterz via #161)
    • Allow dynamic dropping of adapter layers in load_adapter() (@calpt via #172)
    • Add method get_adapter() to retrieve weights of an adapter (@hSterz via #169)

    Changed

    • Re-add adapter_names argument to model forward() methods (@calpt via #176)

    Fixed

    • Fix resolving of adapter from Hub when multiple options available (@Aaronsom via #164)
    • Fix & improve adapter saving/loading using Trainer class (@calpt via #178)
  • adapters2.0.0(Apr 29, 2021)

    Based on transformers v4.5.1

    All major new features & changes are described at https://docs.adapterhub.ml/v2_transition.

    • all changes merged via #105

    Additional changes & Fixes

    • Support loading adapters with load_best_model_at_end in Trainer (@calpt via #122)
    • Add setter for active_adapters property (@calpt via #132)
    • New notebooks for NER, text generation & AdapterDrop (@hSterz via #135)
    • Enable trainer to load adapters from checkpoints (@hSterz via #138)
    • Update & clean up example scripts (@hSterz via #154 & @calpt via #141, #155)
    • Add unfreeze_adapters param to train_fusion() (@calpt via #156)
    • Ensure eval/train mode is correct for AdapterFusion (@calpt via #157)
  • adapters1.1.1(Jan 14, 2021)

    Based on transformers v3.5.1

    New

    • Modular & custom prediction heads for flex head models (@hSterz via #88)

    Fixed

    • Fixes for DistilBERT layer norm and AdapterFusion (@calpt via #102)
    • Fix for reloading full models with AdapterFusion (@calpt via #110)
    • Fix attention and logits output for flex head models (@calpt via #103 & #111)
    • Fix loss output of flex model with QA head (@hSterz via #88)
  • adapters1.1.0(Nov 30, 2020)

    Based on transformers v3.5.1

    New

    • New model with adapter support: DistilBERT (@calpt via #67)
    • Save label->id mapping of the task together with the adapter prediction head (@hSterz via #75)
    • Automatically set matching label->id mapping together with active prediction head (@hSterz via #81)
    • Upgraded underlying transformers version (@calpt via #55, #72 and #85)
    • Colab notebook tutorials showcasing all AdapterHub concepts (@calpt via #89)

    Fixed

    • Support for models with flexible heads in pipelines (@calpt via #80)
    • Adapt input to models with flexible heads to static prediction heads input (@calpt via #90)
  • adapters1.0.1(Oct 6, 2020)

    Based on transformers v2.11.0

    New

    • Adds squad-style QA prediction head to flex-head models

    Bug fixes

    • Fixes loading and saving of adapter config in model.save_pretrained()
    • Fixes parsing of adapter names in fusion setup
  • adapters1.0(Sep 9, 2020)
