Library to enable Bayesian active learning in your research or labeling work.

Overview

Bayesian Active Learning (BaaL)

CircleCI Documentation Status Slack license

BaaL is an active learning library developed at ElementAI. This repository contains techniques and reusable components to make active learning accessible for all.

Read the documentation at https://baal.readthedocs.io.

Our paper can be read on arXiv. It includes tips and tricks to make active learning usable in production.

In this blog post, we present our library.

For a quick introduction to BaaL and Bayesian active learning, please see these links:

Installation and requirements

BaaL requires Python>=3.6.

To install BaaL using pip: pip install baal

To install BaaL from source: poetry install

To use BaaL with HuggingFace Trainers : pip install baal[nlp]

Papers using BaaL

What is active learning?

Active learning is a special case of machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points (to understand the concept in more depth, refer to our tutorial).

BaaL Framework

At the moment BaaL supports the following methods to perform active learning.

  • Monte-Carlo Dropout (Gal et al. 2015)
  • MCDropConnect (Mobiny et al. 2019)
  • Deep ensembles
  • Semi-supervised learning

If you want to propose new methods, please submit an issue.

The Monte-Carlo Dropout method is a known approximation for Bayesian neural networks. In this method, the Dropout layer is used both in training and test time. By running the model multiple times whilst randomly dropping weights, we calculate the uncertainty of the prediction using one of the uncertainty measurements in heuristics.py.

The framework consists of four main parts, as demonstrated in the flowchart below:

  • ActiveLearningDataset
  • Heuristics
  • ModelWrapper
  • ActiveLearningLoop

To get started, wrap your dataset in our ActiveLearningDataset class. This will ensure that the dataset is split into training and pool sets. The pool set represents the portion of the training set which is yet to be labelled.

We provide a lightweight object ModelWrapper similar to keras.Model to make it easier to train and test the model. If your model is not ready for active learning, we provide Modules to prepare them.

For example, the MCDropoutModule wrapper changes the existing dropout layer to be used in both training and inference time and the ModelWrapper makes the specifies the number of iterations to run at training and inference.

In conclusion, your script should be similar to this:

dataset = ActiveLearningDataset(your_dataset)
dataset.label_randomly(INITIAL_POOL)  # label some data
model = MCDropoutModule(your_model)
model = ModelWrapper(model, your_criterion)
active_loop = ActiveLearningLoop(dataset,
                                 get_probabilities=model.predict_on_dataset,
                                 heuristic=heuristics.BALD(shuffle_prop=0.1),
                                 query_size=NDATA_TO_LABEL)
for al_step in range(N_ALSTEP):
    model.train_on_dataset(dataset, optimizer, BATCH_SIZE, use_cuda=use_cuda)
    if not active_loop.step():
        # We're done!
        break

For a complete experiment, we provide experiments/ to understand how to write an active training process. Generally, we use the ActiveLearningLoop provided at src/baal/active/active_loop.py. This class provides functionality to get the predictions on the unlabeled pool after each (few) epoch(s) and sort the next set of data items to be labeled based on the calculated uncertainty of the pool.

Re-run our Experiments

nvidia-docker build [--target base_baal] -t baal .
nvidia-docker run --rm baal python3 experiments/vgg_mcdropout_cifar10.py 

Use BaaL for YOUR Experiments

Simply clone the repo, and create your own experiment script similar to the example at experiments/vgg_experiment.py. Make sure to use the four main parts of BaaL framework. Happy running experiments

Dev install

Simply build the Dockerfile as below:

git clone [email protected]:ElementAI/baal.git
nvidia-docker build [--target base_baal] -t baal-dev .

Now you have all the requirements to start contributing to BaaL. YEAH!

Contributing!

To contribute, see CONTRIBUTING.md.

Who We Are!

"There is passion, yet peace; serenity, yet emotion; chaos, yet order."

At ElementAI, the BaaL team tests and implements the most recent papers on uncertainty estimation and active learning. The BaaL team is here to serve you!

How to cite

If you used BaaL in one of your project, we would greatly appreciate if you cite this library using this Bibtex:

@misc{atighehchian2019baal,
  title={BaaL, a bayesian active learning library},
  author={Atighehchian, Parmida and Branchaud-Charron, Frederic and Freyberg, Jan and Pardinas, Rafael and Schell, Lorne},
  year={2019},
  howpublished={\url{https://github.com/ElementAI/baal/}},
}

Licence

To get information on licence of this API please read LICENCE

Comments
  • num_samples should be a positive integer value, but got num_samples=0

    num_samples should be a positive integer value, but got num_samples=0

    First, I would like to know if this project works with segmentation problems, namely Unet - this one in particular.

    Also, I tried to run a simple pipeline but I am getting this error when creating the dataset. However, if I run that piece of code it works:

    from torch.utils.data.dataloader import default_collate
    for data, target in DataLoader(train_dataset, batch_size, True, num_workers=4,
                                               collate_fn=None):
        print(data)
    

    My Dataset constructor:

    class Dataset(BaseDataset): """CamVid Dataset. Read images, apply augmentation and preprocessing transformations.

    Args:
        images_dir (str): path to images folder
        masks_dir (str): path to segmentation masks folder
        class_values (list): values of classes to extract from segmentation mask
        augmentation (albumentations.Compose): data transfromation pipeline 
            (e.g. flip, scale, etc.)
        preprocessing (albumentations.Compose): data preprocessing 
            (e.g. noralization, shape manipulation, etc.)
    
    """
    
        def __init__(
                self, 
                ids,
                images_dir, 
                masks_dir, 
                classes=None, 
                augmentation=None, 
                preprocessing=None,
                show_original=False
        ):
            self.ids = ids#os.listdir(images_dir)
            self.images_fps = [os.path.join(images_dir, image_id) for image_id in self.ids]
            self.masks_fps = [os.path.join(masks_dir, image_id) for image_id in self.ids]
            
            # convert str names to class values on masks
            self.class_values = [i+1 for i,c in enumerate(classes)]
            
            self.augmentation = augmentation
            self.preprocessing = preprocessing
            self.show_original=show_original
        
        def __getitem__(self, i):
            try:
                # read data
                #print(self.images_fps[i])
                filename=self.images_fps[i]
                if not filename.endswith(".png"):
                    # Virtual openslide patch in form filename_col_x_row_y
                    slide_name=Path(filename).stem.split("_col_")[0]
                    roi_x,roi_y=filename.split("col_")[1].split("_row_")
                    roi_xy_large=int(int(roi_x)*patch_size),int(int(roi_y)*patch_size)
                
                    slide= openslide.open_slide(slide_paths[slide_name])
                    image=slide.read_region(roi_xy_large,0,(patch_size,patch_size))
                    #print((roi_xy_large,0,(patch_size,patch_size)))
                    #plt.imshow(img)
                    #plt.show()
    
                    image=image.convert('RGB')
                    if self.show_original:
                        plt.imshow(image)
                        plt.show()
                    image = np.asarray(image)
                    mask=None
                    
                else:
                    image = cv2.imread(filename)
                    
    
                    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                    if self.show_original:
                        plt.imshow(image)
                        plt.show()
    
                    #print(self.masks_fps[i])
                    mask = cv2.imread(self.masks_fps[i], 0)
    
                
    
                # No mask, ex: false positives
                if mask is None:
                    mask= PIL.Image.new('RGB', (256, 256), (0, 0, 0))
                    mask=np.array(mask.convert("L"))
                    
    #             if image is None:
    #                 image= PIL.Image.new('RGB', (256, 256), (0, 0, 0))
    #                 image=np.array(image.convert("L"))
    
    
                #print('self.images_fps[i]',self.images_fps[i])
                #print('self.masks_fps[i]',self.masks_fps[i])
    
                #print('self.class_values:', self.class_values)
                #print('mask.max:', np.max(mask))
                #print('mask.min:', np.min(mask))
                #print('mask.unique:', np.unique(mask))
    
                # extract certain classes from mask (e.g. cars)
                masks = [(mask == v) for v in self.class_values]
                mask = np.stack(masks, axis=-1).astype('float')
    
                # apply augmentations
                if self.augmentation:
                    sample = self.augmentation(image=image, mask=mask)
                    image, mask = sample['image'], sample['mask']
    #                 set_trace() if DEBUG else None
                    
    #                 mask[np.all(image == [0, 0, 0], axis=-1)]=0
    
                # apply preprocessing
                if self.preprocessing:
                    sample = self.preprocessing(image=image, mask=mask)
                    image, mask = sample['image'], sample['mask']
                    
                    
    
                return image, mask
                
            except Exception as error:
                print("error",error,self.images_fps[i],self.masks_fps[i])
                raise
    
        def __len__(self):
            return len(self.ids)
    
    Regards
    
    bug 
    opened by luistelmocosta 17
  • Improve look of documentation website

    Improve look of documentation website

    We have several possibilities:

    In all cases, we should redo the tree structure to be more "Profesh"

    opened by Dref360 12
  • Information Required regarding Patch_module.

    Information Required regarding Patch_module.

    Hello I would like to get some information related to patch module, I am using baal library with hugging face for multilabel classification. I am using BALD as a heuristic and wrapping model in patch_module. Before, asking the main question I would like to specify that I am not using the library specified inside the blog post regarding NLP classification with hugging face. I have created my own custom function and the only thing I am using from the library is patch_module and heuristic for selecting samples. The problem is, I am running the active learning loop 26 times, and for iteration 7 and 12: The results seems bit confusing as shown:

    ---------------------------- iteration -7 --------------------------
    {
        "'fixed_rate'-F1": 0,
        "'floating_rate'-F1": 1.37,
        "'other'-F1": 62.27,
        "'rates'-F1": 63.56
    }
    
    ----------------------------- Iteration -6 ---------------------------
    
    {
        "'fixed_rate'-F1": 78.26,
        "'floating_rate'-F1": 78.55,
        "'other'-F1": 79.27,
        "'rates'-F1": 74.03
    }
    
    -------------------------   Iteration -5 -------------------------------
    {
        "'fixed_rate'-F1": 63.41,
        "'floating_rate'-F1": 77.32,
        "'other'-F1": 78.65,
        "'rates'-F1": 73.76
    }
    

    as shown above in the 6th iteration we getting fixed_rate f1: 78.26 and in 7th iteration we went suddenly to 0:

    We also tried for this specific iteration where f1 is 0 to train the model without Active learning, and it works totally fine so it suggest that there is no problem related to dataset. But when we adding the Active Learning procedure this becoming 0. I want to mention that this problem is with iteration 7 and 12. So I am really confused why this works fine for other iteration and only for these two iteration the f1 going down to 0. Is it because of patch_module, or the way I am using the patch_module.

       initial train -> 200
       every time we add more 200 samples into the train.
    
       trainer = Trainer(
              model=patch_module(model),
              args=training_args,
              train_dataset=train_dataset if training_args.do_train else None,
              eval_dataset=eval_dataset if training_args.do_eval else None,
              compute_metrics=custom_compute_metrics, #[ADD] passo al Trainer la nuova funzione che calcola le metriche
              tokenizer=tokenizer,
              data_collator=data_collator,
          )
    

    One more question so when add the special dropout in the model, during the testing time does it uses the special dropout for doing the predictions, testing I mean when the model is fully trained using active learning procedure, and if it does then we cannot trust the prediction because it will change everytime, and if it doesn't please can u let me know if it automatically disabled by doing model.eval() or we need to add some other things. Sorry for too much of question. Please let me know if something is not clear.

    enhancement 
    opened by Anurich 10
  • Ensemble Based Active Learning

    Ensemble Based Active Learning

    Would this project be interested in ensemble based techniques for active learning? As seen in [1] ensembles can be an effective technique for deep active learning achieving competitive/superior performance to competing techniques (ie MC-Dropout). More generally they have been widely used in non-deep based approaches to active learning as well (see [2] chapter 3 for a review).

    All of the heuristics this library has already implemented could immediately be used for ensemble predictions.

    I see that SWAG is on the road map and this could be a more general feature of different ensemble style approaches (bootstrap, FGE, SWAG, diversity encouraging, etc...)

    If this is something you all would like to move forward with. I would be happy to implement it. I've been working on active learning and already have my own implementations. Since finding this library I am in the midst of porting my code over to it, since I like it so much.

    My only concern is that when resource constrained it is difficult to hold all of the ensemble members in memory and can be expensive to move them in and out. In my own experiments I've been generating the members of the ensemble one at a time and predicting on all data points from the pool set and validation set deleting the member and start training the new member. This might be a bit awkward given the current implementation of the ModelWrapper and ActiveLearningLoop.

    Model.train()
    active_loop.set()
    

    Essentially for the above workflow to still work it would require the below line to be already executed and cached by the model and the active_loop would use the cache.

     probs = self.get_probabilities(pool, **self.kwargs)
    

    Thanks for your time.

    [1] http://openaccess.thecvf.com/content_cvpr_2018/papers/Beluch_The_Power_of_CVPR_2018_paper.pdf

    [2] https://www.morganclaypool.com/doi/abs/10.2200/S00429ED1V01Y201207AIM018

    opened by NobleKennamer 9
  • Add metric logging

    Add metric logging

    Summary:

    Add some utils to keep track of metrics over time.

    This is WIP, but what do you think of this API? It would be quite simple for the users to get a metrics for each dataset size.

    We can also add utilities for that such as MetricMixin.get_active_metric(metric_name="precision") : Dict[int, float]

    Features:

    Fixes #220

    Checklist:

    • [ ] Your code is documented (To validate this, add your module to tests/documentation_test.py).
    • [ ] Your code is tested with unit tests.
    • [ ] You moved your Issue to the PR state.
    opened by Dref360 7
  • Importing `baal` fails when the `transformers` optional dep is missing

    Importing `baal` fails when the `transformers` optional dep is missing

    This is problematic when building the conda package since it does not include the optional dep transformers.

    I see two solutions:

    • make transformers a required dep
    • add a conditional check to https://github.com/ElementAI/baal/blob/26bde3a9ef68454cb2795de3784cde4032fc0391/src/baal/transformers_trainer_wrapper.py so transformers is not imported if missing that way baal can still be used without transformers.
    bug 
    opened by hadim 7
  • A question about ActiveLearningDataset.

    A question about ActiveLearningDataset.

    Hi,I want to construct my own dataset like the following:

    import torch
    from torch.utils.data import Dataset
    from baal.active.dataset import ActiveLearningDataset
    
    a = []
    for i in range(20):
        a.append(torch.rand(10))
    
    data = Data(a)
    pool = ActiveLearningDataset(data)
    

    and I want to label several data in the dataset:

    pool.label(index=4, value=torch.tensor([0]))
    

    But I get the error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
     in 
    ----> 1 pool.label(index=4, value=torch.tensor([0]))
    
     in label(self, index, value)
        159             if self.can_label and val is not None:
        160                 print(self._dataset.label)
    --> 161                 self._dataset.label(index, val)
        162                 self._labelled[index] = 1
        163             elif self.can_label and val is None:
    
    TypeError: 'NoneType' object is not callable
    

    I read the original code, and find the self._dataset.label is None.Is the above code right or is this a bug?

    bug documentation 
    opened by sakuraiiiii 6
  • How to manually add annotation by hand after each epoch?

    How to manually add annotation by hand after each epoch?

    Assuming baal returns a list of datums to be labeled by humans after each epoch, how do I connect it to Amazon Mechanical Turk to get them actually labeled for the next epoch's training? Can the training process be stalled while humans label these examples?

    enhancement 
    opened by whaowhao 5
  • Add seeded dropout

    Add seeded dropout

    Summary:

    Add a way to set the seed for Dropout so that the execution is always the same.

    Features:

    • Add SeededDropout

    Checklist:

    • [x] Your code is documented (To validate this, add your module to tests/documentation_test.py).
    • [x] Your code is tested with unit tests.
    • [x] You moved your Issue to the PR state.
    opened by Dref360 5
  • Support Random target_transform in ALDataset

    Support Random target_transform in ALDataset

    Is your feature request related to a problem? Please describe. In Segmentation, we need to augment both the target and the input. At test time, only the transform is updated.

    enhancement good first issue 
    opened by Dref360 5
  • Use `torchmetrics` in `baal.metrics`

    Use `torchmetrics` in `baal.metrics`

    We should swap most of our metrics for torchmetrics this would make our code more robust.

    Only ECE is special I think and we should keep our own implementation.

    enhancement good first issue 
    opened by Dref360 4
  • Baal in Production Notebook | Classification | NLP | Hugging Face

    Baal in Production Notebook | Classification | NLP | Hugging Face

    Summary:

    This is a demo/tutorial to use active learning with hugging face models in a production setting. Kindly find more about this at in the discussion at https://github.com/baal-org/baal/discussions/242

    Features:

    NA

    Checklist:

    • [ ] Your code is documented (To validate this, add your module to tests/documentation_test.py).
    • [ ] Your code is tested with unit tests.
    • [ ] You moved your Issue to the PR state.

    Given that this is a notebook and I am not setting up any new modules there are no test cases. There is some pending type hinting pending which I will complete.

    Opening a PR for your feedback, just to check if you want me to add/remove somethings

    Additional Info

    Challenges with current GPU

    Seems like the pytorch version which baal uses does not support my current GPU. Although I have tested this on Colab and it works fine.

    NVIDIA GeForce RTX 3050 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
    If you want to use the NVIDIA GeForce RTX 3050 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
    
      warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    

    More info about this on the pytorch forum in case someone runs into a similar issue

    import torch
    
    torch.__version__
    
    1.12.1+cu102
    
    torch.cuda.get_arch_list()
    
    ['sm_37', 'sm_50', 'sm_60', 'sm_70']
    

    Although I have tested this on Colab and it works fine.

    Challenges with Black Formatting

    You might want to update your black version to black==22.3.0.

    The make format command produces an error which is identical to the one mentioned at stack overflow here.

    I have encountered this before and an upgrade does fix it

    opened by nitish1295 2
  • Stopping criterion API

    Stopping criterion API

    Discussion Related Paper

    I see this as a new object we include with the following strategies:

    • If the average uncertainty is below a threshold
    • If the labeling budget has been exhausted
    • If the model has converged.
    enhancement 
    opened by Dref360 0
  • Redo CLINC-150 experiment in a Notebook

    Redo CLINC-150 experiment in a Notebook

    We have results on CLINC-150 where we show that Entropy is better than Random. We learned a lot since our first tutorial on HuggingFace. We should redo the expeirment.

    opened by Dref360 0
  • how to integrate object detection models

    how to integrate object detection models

    Hi, thanks for providing such great work! I am working a object detection task and I have already trained a object detection model with detectron2 framework. is it possible to integrate detectron2 model into this active learning method ? or more generally, can i integrate object detection model like FasterRCNN yolo?

    opened by Huan80805 2
Releases(v1.7.0)
  • v1.7.0(Oct 28, 2022)

    Large release where we added support for torchmetrics, moved to mkdocs and made the app more robust!

    We also updated our visual identity, huge thanks to @Donasolo for the new logos!

    What's Changed

    • Move to Github Actions by @Dref360 in https://github.com/baal-org/baal/pull/212
    • Added None to act as max for np.clip by @GeorgePearse in https://github.com/baal-org/baal/pull/213
    • Assign version to pytorch-lightning by @Dref360 in https://github.com/baal-org/baal/pull/226
    • Update README and CONTRIBUTING by @Dref360 in https://github.com/baal-org/baal/pull/222
    • Add metric logging by @Dref360 in https://github.com/baal-org/baal/pull/223
    • Throw away value for custom dataloaders by @bresilla in https://github.com/baal-org/baal/pull/228
    • #232 Allow for Python <4 by @Dref360 in https://github.com/baal-org/baal/pull/233
    • #156 Support torchmetrics by @Dref360 in https://github.com/baal-org/baal/pull/230
    • Add warning when data augmentation is applied on the pool by @Dref360 in https://github.com/baal-org/baal/pull/229
    • Add structure for mkdocs instead of Sphinx by @Dref360 in https://github.com/baal-org/baal/pull/225

    New Contributors

    • @GeorgePearse made their first contribution in https://github.com/baal-org/baal/pull/213
    • @bresilla made their first contribution in https://github.com/baal-org/baal/pull/228

    Full Changelog: https://github.com/baal-org/baal/compare/v1.6.0...v1.7.0

    Source code(tar.gz)
    Source code(zip)
  • v1.6.0(May 3, 2022)

    What's Change

    • Update faq.md by @Dref360 in https://github.com/ElementAI/baal/pull/203
    • #205 Add stochastic heuristics from Kirsch et al. by @Dref360 in https://github.com/ElementAI/baal/pull/206

    Full Changelog: https://github.com/ElementAI/baal/compare/v1.5.2...v1.6.0

    Source code(tar.gz)
    Source code(zip)
  • v1.5.2(Apr 12, 2022)

    What's Changed

    • Better indexing support arrow dataset by @parmidaatg in https://github.com/ElementAI/baal/pull/183
    • Raises an error instead of a warning when label has no label by @Dref360 in https://github.com/ElementAI/baal/pull/187
    • #192 Use configure instead of configure_once to remove warnings by @Dref360 in https://github.com/ElementAI/baal/pull/193
    • #190 Fix MRO for Lightning examples and deprecate said example by @Dref360 in https://github.com/ElementAI/baal/pull/191
    • Can easily unpatch modules and use as a context manager #198 #194 :
    mc_dropout_model = MCDropoutModule(your_model)
    # this is stochastic
    predictions = [mc_dropout_model(input) for _ in range(ITERATIONS)]
    
    model = mc_dropout_model.unpatch()
    # this is deterministic
    output = model(input)
    
    
    # Context manager
    with MCDropoutModule(your_model) as model:
        # this is stochastic
        predictions = [model(input) for _ in range(ITERATIONS)]
    # this is deterministic
    output = model(input)
    

    Full Changelog: https://github.com/ElementAI/baal/compare/1.5.1...v1.5.2

    Source code(tar.gz)
    Source code(zip)
  • 1.5.1(Dec 17, 2021)

    What's Changed

    • Solve bug where a dataset wouldnt be able to label by @Dref360 in https://github.com/ElementAI/baal/pull/178

    Full Changelog: https://github.com/ElementAI/baal/compare/1.5.0...1.5.1

    Source code(tar.gz)
    Source code(zip)
  • 1.5.0(Dec 13, 2021)

    What's Changed

    • Split API documentation in multiple files by @Dref360 in https://github.com/ElementAI/baal/pull/158
    • Lightning Flash example by @parmidaatg in https://github.com/ElementAI/baal/pull/154
    • Add bandit to CircleCI by @Dref360 in https://github.com/ElementAI/baal/pull/164
    • Add Label function to HFDataset @Dref360 in https://github.com/ElementAI/baal/pull/165
    • #161 update query size by @Dref360 in https://github.com/ElementAI/baal/pull/166
    • Same fix in dropconnect and consistent dropout as in dropout by @Dref360 in https://github.com/ElementAI/baal/pull/172
    • Add last_active_step iteration to iterate over the last N active steps. by @Dref360 in https://github.com/ElementAI/baal/pull/174

    Deprecated

    1. We now deprecate our PL integration in favor of Lightning Flash. More information to come.
    2. We renamed n_data_to_label to query_size to match academic papers notation.

    Full Changelog: https://github.com/ElementAI/baal/compare/1.4.0...1.5.0

    Source code(tar.gz)
    Source code(zip)
  • 1.4.0(Oct 12, 2021)

    What's Changed

    • Support arrowdataset by @parmidaatg in https://github.com/ElementAI/baal/pull/142
    • Give ability for users to get uncertainty values. by @Dref360 in https://github.com/ElementAI/baal/pull/144
    • #146 Fix issue where at most a single submodule was affected by Dropout by @Dref360 in https://github.com/ElementAI/baal/pull/147
    • #131 Use poetry instead of setup.py by @Dref360 in https://github.com/ElementAI/baal/pull/148
    • #145 Example using MLP on MNIST by @Dref360 in https://github.com/ElementAI/baal/pull/150
    • mlp regression experiment by @parmidaatg in https://github.com/ElementAI/baal/pull/152
    • #130 Add mypy and step to test imports by @Dref360 in https://github.com/ElementAI/baal/pull/155

    Full Changelog: https://github.com/ElementAI/baal/compare/v1.3.1...1.4.0

    Source code(tar.gz)
    Source code(zip)
  • v1.3.1(Aug 3, 2021)

    Changelog:

    • Update pytorch-lightning to > 1.3.0 API
    • Make torchvision and huggingface optional dependencies.
    • New tutorial on Fairness and how to use Label Studio.
    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Mar 16, 2021)

    BaaL 1.3.0 is a release focused on UX.

    Features

    • Initial support for HF Trainer along with tutorials to use HuggingFace.
    • Initial support for Semi-supervised learning, we are eager to see what the community will do with such a powerful tool!
    • Fixes in ECE computation

    Documentation

    The biggest change in this release is the new website along with tons of content.

    1. Tutorial on Deep Ensembles #94
    2. Tutorial on NLP Classification #87
    3. Tutorial on visualisation.
    4. Added a BaaL cheatsheet to translate equations to code easily.
    5. Added a list of "Core papers" to get new users started in Bayesian deep learning.
    Source code(tar.gz)
    Source code(zip)
  • v1.2.1(Nov 3, 2020)

    Changelogs

    Features

    • Initial support for ensembles. Example to come.
    • Initial support for Pytorch Lightning. Example here.

    Bugfixes

    • Fix BALD for binary classification
    • Fix Random heuristic for generators
    • Fix to_cuda for strings.
    • Fix a bug where MCDropconnect would not work with DataParallel

    Misc

    • Warning when no layers are affected by patch_layers in MCDropout, MCDropconnect.
    Source code(tar.gz)
    Source code(zip)
  • v1.2.0(May 4, 2020)

    Changelist for v1.2.0

    • Add DirichletCalibration (Kull et al. 2019), see our blog post.
    • Add ECE Metrics for computing model's calibration.
    • Add support for Multi-Input/Output for ModelWrapper
    • Fix BatchBALD to be consistent with the official implementation
    • Add ConsistentDropout, where the masks used in MC-Dropout are the same for each input.

    Important notes

    • BaaL is now part of Pytorch Ecosystem!
    Source code(tar.gz)
    Source code(zip)
  • v1.1.0(Nov 11, 2019)

    BaaL v1.1 release notes

    Changelog

    • Support for MC-Dropconnect (Mobiny, 2019)
    • ActiveLearningDataset now has better support for attributes specifics to the pool (see below).
    • More flexible support multi-inputs/outputs in ModelWrapper.
      • Can support list of inputs or outputs.
    • QoL features on ActiveLearningDataset
      • Can use a RandomState and add load_state_dict.
    • Add replicate_in_memory flag to ModelWrapper.
      • If False, the MC iterations are done in a for-loop instead of making a batch in memory.
      • (This means predict_on_batch would not take up more memory than e.g. test_on_batch)
    • Add patience and min_epoch_for_es to ModelWrapper.train_and_test_on_datasets.
      • Allows early stopping.
    • New tutorial on how to use BaaL with scikit-learn.
    • Can now combine heuristics for multi-outputs models (see baal.active.heuristics.Combine).
    • Fix documentation

    New ActiveLearningDataset

    To better support new tasks, ActiveLearningDataset can now support any attributes to be overrided when the pool is created.

    Example:

    from PIL import Image
    from torch.utils.data import Dataset
    from torchvision.transforms import Compose, ToTensor, RandomHorizontalFlip
    from baal.active.dataset import ActiveLearningDataset
    
    
    class MyDataset(Dataset):
        def __init__(self):
            self.my_tansforms = Compose([RandomHorizontalFlip(), ToTensor()])
            
        def __len__(self):
            return 10
            
        def __getitem__(self, idx):
            x = Image.open('an_image.png')
            return self.my_tansforms(x)
            
    al_dataset = ActiveLearningDataset(MyDataset(),
                                       pool_specifics={
                                       'my_tansforms': ToTensor()
                                       })
                                       
    # Now `pool.my_tansforms = ToTensor()`
    pool = al_dataset.pool
    
    Source code(tar.gz)
    Source code(zip)
Code for ICCV2021 paper PARE: Part Attention Regressor for 3D Human Body Estimation

PARE: Part Attention Regressor for 3D Human Body Estimation [ICCV 2021] PARE: Part Attention Regressor for 3D Human Body Estimation, Muhammed Kocabas,

Muhammed Kocabas 277 Jan 03, 2023
Create time-series datacubes for supervised machine learning with ICEYE SAR images.

ICEcube is a Python library intended to help organize SAR images and annotations for supervised machine learning applications. The library generates m

ICEYE Ltd 65 Jan 03, 2023
Microscopy Image Cytometry Toolkit

Cytokit Cytokit is a collection of tools for quantifying and analyzing properties of individual cells in large fluorescent microscopy datasets with a

Hammer Lab 106 Jan 06, 2023
Official implementation of Deep Burst Super-Resolution

Deep-Burst-SR Official implementation of Deep Burst Super-Resolution Publication: Deep Burst Super-Resolution. Goutam Bhat, Martin Danelljan, Luc Van

Goutam Bhat 113 Dec 19, 2022
ScriptProfilerPy - Module to visualize where your python script is slow

ScriptProfiler helps you track where your code is slow It provides: Code lines t

Lucas BLP 3 Jun 02, 2022
Flax is a neural network ecosystem for JAX that is designed for flexibility.

Flax: A neural network library and ecosystem for JAX designed for flexibility Overview | Quick install | What does Flax look like? | Documentation See

Google 3.9k Jan 02, 2023
Hashformers is a framework for hashtag segmentation with transformers.

Hashtag segmentation is the task of automatically inserting the missing spaces between the words in a hashtag. Hashformers applies Transformer models

Ruan Chaves 41 Nov 09, 2022
FluidNet re-written with ATen tensor lib

fluidnet_cxx: Accelerating Fluid Simulation with Convolutional Neural Networks. A PyTorch/ATen Implementation. This repository is based on the paper,

JoliBrain 50 Jun 07, 2022
Awesome Remote Sensing Toolkit based on PaddlePaddle.

基于飞桨框架开发的高性能遥感图像处理开发套件,端到端地完成从训练到部署的全流程遥感深度学习应用。 最新动态 PaddleRS 即将发布alpha版本!欢迎大家试用 简介 PaddleRS是遥感科研院所、相关高校共同基于飞桨开发的遥感处理平台,支持遥感图像分类,目标检测,图像分割,以及变化检测等常用遥

146 Dec 11, 2022
Özlem Taşkın 0 Feb 23, 2022
Benchmark VAE - Library for Variational Autoencoder benchmarking

Documentation pythae This library implements some of the most common (Variational) Autoencoder models. In particular it provides the possibility to pe

1.1k Jan 02, 2023
Prometheus Exporter for data scraped from datenplattform.darmstadt.de

darmstadt-opendata-exporter Scrapes data from https://datenplattform.darmstadt.de and presents it in the Prometheus Exposition format. Pull requests w

Martin Weinelt 2 Apr 12, 2022
Code for Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Piggyback: https://arxiv.org/abs/1801.06519 Pretrained masks and backbones are available here: https://uofi.box.com/s/c5kixsvtrghu9yj51yb1oe853ltdfz4q

Arun Mallya 165 Nov 22, 2022
UMPNet: Universal Manipulation Policy Network for Articulated Objects

UMPNet: Universal Manipulation Policy Network for Articulated Objects Zhenjia Xu, Zhanpeng He, Shuran Song Columbia University Robotics and Automation

Columbia Artificial Intelligence and Robotics Lab 33 Dec 03, 2022
Code repo for "FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation" (ICCV 2021)

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation (ICCV 2021) This repository contains the implementation of th

Yuhang Zang 21 Dec 17, 2022
This reposityory contains the PyTorch implementation of our paper "Generative Dynamic Patch Attack".

Generative Dynamic Patch Attack This reposityory contains the PyTorch implementation of our paper "Generative Dynamic Patch Attack". Requirements PyTo

Xiang Li 8 Nov 17, 2022
ML-Ensemble – high performance ensemble learning

A Python library for high performance ensemble learning ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framew

Sebastian Flennerhag 764 Dec 31, 2022
Implementation of "GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings" in PyTorch

PyGAS: Auto-Scaling GNNs in PyG PyGAS is the practical realization of our G NN A uto S cale (GAS) framework, which scales arbitrary message-passing GN

Matthias Fey 139 Dec 25, 2022
Official repository for "On Improving Adversarial Transferability of Vision Transformers" (2021)

Improving-Adversarial-Transferability-of-Vision-Transformers Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Khan, Fatih Porikli arxiv link A

Muzammal Naseer 47 Dec 02, 2022
Over9000 optimizer

Optimizers and tests Every result is avg of 20 runs. Dataset LR Schedule Imagenette size 128, 5 epoch Imagewoof size 128, 5 epoch Adam - baseline OneC

Mikhail Grankin 405 Nov 27, 2022