TorchMetrics is a collection of 25+ PyTorch metrics implementations and an easy-to-use API to create custom metrics.

Overview

Machine learning metrics for distributed, scalable PyTorch applications.




Installation

Simple installation from PyPI

pip install torchmetrics
Other installations

Install using conda

conda install torchmetrics

Pip from source

# with git
pip install git+https://github.com/PytorchLightning/[email protected]

Pip from archive

pip install https://github.com/PyTorchLightning/metrics/archive/master.zip

What is Torchmetrics

TorchMetrics is a collection of 25+ PyTorch metrics implementations and an easy-to-use API to create custom metrics. It offers:

  • A standardized interface to increase reproducibility
  • Reduced boilerplate
  • Automatic accumulation over batches
  • Metrics optimized for distributed-training
  • Automatic synchronization between multiple devices

You can use TorchMetrics with any PyTorch model or with PyTorch Lightning to enjoy additional features such as:

  • Module metrics are automatically placed on the correct device.
  • Native support for logging metrics in Lightning to reduce even more boilerplate.

Using TorchMetrics

Module metrics

The module-based metrics contain internal metric states (similar to the parameters of a PyTorch module) that automate accumulation and synchronization across devices:

  • Automatic accumulation over multiple batches
  • Automatic synchronization between multiple devices
  • Metric arithmetic

This can be run on CPU, single GPU or multi-GPUs!

For the single GPU/CPU case:

import torch
# import our library
import torchmetrics 

# initialize metric (v0.10+ classification metrics take an explicit `task`)
metric = torchmetrics.Accuracy(task="multiclass", num_classes=5)

n_batches = 10
for i in range(n_batches):
    # simulate a classification problem
    preds = torch.randn(10, 5).softmax(dim=-1)
    target = torch.randint(5, (10,))

    # metric on current batch
    acc = metric(preds, target)
    print(f"Accuracy on batch {i}: {acc}")    

# metric on all batches using custom accumulation
acc = metric.compute()
print(f"Accuracy on all data: {acc}")
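
To see why the accumulated value returned by compute() can differ from a plain average of the per-batch values, here is a small plain-Python sketch. It does not use TorchMetrics; the `correct`/`total` counters only mirror the kind of state a module metric keeps, and the batch data and the helper name `batch_counts` are made up for illustration.

```python
# Plain-Python illustration of state accumulation; not the TorchMetrics implementation.
batches = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),  # 4 samples, 3 correct
    ([0, 1], [1, 0]),              # 2 samples, 0 correct
]


def batch_counts(preds, target):
    """Return (number of correct predictions, number of samples) for one batch."""
    correct = sum(int(p == t) for p, t in zip(preds, target))
    return correct, len(target)


correct_total, sample_total = 0, 0
per_batch_accuracy = []
for preds, target in batches:
    correct, total = batch_counts(preds, target)
    correct_total += correct
    sample_total += total
    per_batch_accuracy.append(correct / total)

# Accuracy from accumulated counts: 3 correct out of 6 samples.
accumulated = correct_total / sample_total
# Average of the per-batch accuracies: (0.75 + 0.0) / 2.
mean_of_batches = sum(per_batch_accuracy) / len(per_batch_accuracy)
print(accumulated, mean_of_batches)
```

With batches of unequal size the two numbers disagree, which is why keeping counts as state and dividing once at the end is the safer default.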

Module metric usage remains the same when using multiple GPUs or multiple nodes.

Example using DDP

import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

import torchmetrics

# `rank` and `world_size` are assumed to be provided by the process launcher
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '12355'

# create default process group
dist.init_process_group("gloo", rank=rank, world_size=world_size)

# initialize metric (v0.10+ classification metrics take an explicit `task`)
metric = torchmetrics.Accuracy(task="multiclass", num_classes=5)

# define a model and append your metric to it
# this allows metric states to be placed on correct accelerators when
# .to(device) is called on the model
model = nn.Linear(10, 10)
model.metric = metric
model = model.to(rank)

# initialize DDP
model = DDP(model, device_ids=[rank])

n_epochs = 5
# this shows iteration over multiple training epochs
for n in range(n_epochs):

    # this will be replaced by a DataLoader with a DistributedSampler
    n_batches = 10
    for i in range(n_batches):
        # simulate a classification problem
        preds = torch.randn(10, 5).softmax(dim=-1)
        target = torch.randint(5, (10,))

        # metric on current batch
        acc = metric(preds, target)
        if rank == 0:  # print only for rank 0
            print(f"Accuracy on batch {i}: {acc}")

    # metric on all batches and all accelerators using custom accumulation
    # accuracy is same across both accelerators
    acc = metric.compute()
    print(f"Accuracy on all data: {acc}, accelerator rank: {rank}")

    # Resetting internal state so that the metric is ready for new data
    metric.reset()
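
The comment "accuracy is same across both accelerators" follows from the states being reduced across processes before the final value is computed. The sketch below simulates a sum-style reduction in plain Python, without torch.distributed; the per-rank numbers and the helper name `reduce_sum` are illustrative assumptions, not library code.

```python
# Simulated "sum" reduction of metric states across processes (no torch.distributed involved).
rank_states = [
    {"correct": 7, "total": 10},  # state accumulated on rank 0
    {"correct": 4, "total": 10},  # state accumulated on rank 1
]


def reduce_sum(states):
    """Sum every state entry across ranks, as a sum-style reduction would."""
    keys = states[0].keys()
    return {key: sum(state[key] for state in states) for key in keys}


# Every rank ends up with the same reduced state, hence the same final value.
synced = reduce_sum(rank_states)
global_accuracy = synced["correct"] / synced["total"]
print(synced, global_accuracy)
```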

Implementing your own Module metric

Implementing your own metric is as easy as subclassing a torch.nn.Module. Simply subclass torchmetrics.Metric and implement the __init__, update and compute methods:

import torch
from torchmetrics import Metric


class MyAccuracy(Metric):
    def __init__(self, dist_sync_on_step=False):
        super().__init__(dist_sync_on_step=dist_sync_on_step)

        # call `self.add_state` for every internal state that is needed for the metric's computations
        # dist_reduce_fx indicates the function that should be used to reduce
        # state from multiple processes
        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor):
        # update metric states
        # convert class scores of shape (N, C) to predicted labels of shape (N,)
        if preds.ndim == target.ndim + 1:
            preds = preds.argmax(dim=-1)
        assert preds.shape == target.shape

        self.correct += torch.sum(preds == target)
        self.total += target.numel()

    def compute(self):
        # compute final result
        return self.correct.float() / self.total

Functional metrics

Similar to torch.nn, most metrics have both a module-based and a functional version. The functional versions are simple Python functions that take torch.Tensor inputs and return the corresponding metric as a torch.Tensor.

import torch
# import our library
import torchmetrics

# simulate a classification problem
preds = torch.randn(10, 5).softmax(dim=-1)
target = torch.randint(5, (10,))

# v0.10+ classification metrics take an explicit `task`
acc = torchmetrics.functional.accuracy(preds, target, task="multiclass", num_classes=5)

Implemented metrics

And many more!

Contribute!

The Lightning + TorchMetrics team is hard at work adding even more metrics. But we're looking for incredible contributors like you to submit new metrics and improve existing ones!

Join our Slack to get help becoming a contributor!

Community

For help or questions, join our huge community on Slack!

Citations

We’re excited to continue the strong legacy of open-source software and have been inspired over the years by Caffe, Theano, Keras, PyTorch, torchbearer, ignite, sklearn and fast.ai. When/if a paper is written about this, we’ll be happy to cite these frameworks and the corresponding authors.

License

Please observe the Apache 2.0 license that is listed in this repository. In addition, the Lightning framework is Patent Pending.

Comments
  • Add mean average precision metric for object detection

    Mean Average Precision (mAP) for object detection

    New metric for object detection.

    What does this PR do?

    This PR introduces the commonly used mean average precision metric for object detection. As there are multiple different implementations, and even different calculations, the new metric wraps the pycocotools evaluation, which is used as a standard for several academic and open-source projects for evaluation.

    This metric is actively discussed in an issue; this PR resolves #53.

    TODO

    • [x] check if pycocoeval can handle tensors to avoid .cpu() calls (it cannot)
    • [x] standardize MAPMetricResults to have all evaluation results in there
    • [x] refactor some code parts (e.g. join get_coco_target and get_coco_preds methods)
    • [x] add unittests and documentation in torchmetrics format

    Note

    This is my first contribution to the PyTorchLightning project. Please review the code carefully and give me hints on how to improve and match your guidelines.

    enhancement New metric 0:] Ready-To-Go 
    opened by tkupek 61
  • Add Mean Average Precision (mAP) metric

    The main metric for object detection tasks is the Mean Average Precision, implemented in PyTorch, and computed on GPU.

    It would be nice to add it to the collection of the metrics.

    The example implementation using numpy:

    https://github.com/ternaus/iglovikov_helper_functions/blob/master/iglovikov_helper_functions/metrics/map.py

    enhancement New metric 
    opened by ternaus 44
  • Metric API re-design

    🚀 Feature

    Re-design the internal API for Base Metric class to return batch states from the update function.

    The following proposal is a BC !

    Limitations:

    Currently, the Metric API is confusing for several reasons:

    • The update function actually performs a compute if compute_on_step=True
    • The update function actually performs 2 updates, which is an expensive operation for some metrics, FID for example
    • Users don't have a clear API to perform computation
    • The Metric internals are tailored to Lightning. IMO, TM should define its own API to better facilitate metrics computation and Lightning should adapt to it.

    Proposal:

    • The update function doesn't return anything
    • There are 2 functions for reduction, compute and compute_step
    • The update function is responsible to return a dictionary containing the batch states

    Here is the internal re-design of Metric

    class Metric()
    
        def _wrap_update()
            batch_states = update(...)
            self.add_to_rank_accumulated_states(batch_states) # uses reduction function
            self.batch_states = batch_states
    
    class MyMetric(Metric):
        def update(self) -> Dict[str, Tensor]:
            ...
            return {"state_1": state_1, ...}
    
    metric = MyMetric()
    
    metric(...) # compute batch states, add them to accumulated states using reduction functions
    compute() # accumulated_states compute on all ranks
    compute_on_step() # batch_states compute on all ranks
    compute(sync_dist=False) # accumulated_states compute per rank
    compute_on_step(sync_dist=False) #  batch_states compute per rank 
    
    class Accuracy
    
        def __init__(self):
            self.add_state("correct", torch.tensor(0.), sync_dist_fn=torch.sum)
            self.add_state("total", torch.tensor(0.), sync_dist_fn=torch.sum)
    
        def update(self, preds, targets):
            return {"total": preds.shape[0], "correct": (preds == targets).sum()}
    
        def compute(self):
            return self.correct / self.total
    
    
    metric = Accuracy()
    
    None = metric.update([0, 1], [0, 0])
    
    0.5 = metric([0, 1], [0, 0], sync_dist=True) # compute batch states and cache it, add batch states to accumulated states
    1 = metric([0, 0], [0, 0], sync_dist=True)
    
    0.75 = metric.compute()
    1. = metric.compute(accumulated=False)
    
    # accumulated=True means computing accuracy on 3 batches
    # accumulated=False means computing accuracy on latest batch
    acc()
    acc()
    acc()
    

    Additional context

    sync = store cache + all_gather + reduction unsync = restore cache

    enhancement Important API / design 
    opened by tchaton 29
  • Fix metrics in macro average

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [ ] Did you make sure to update the docs?
    • [ ] Did you write any new necessary tests?

    What does this PR do?

    Fixes #295 and Fixes #300.

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    Did you have fun?

    Make sure you had fun coding 🙃

    bug / fix 0:] Ready-To-Go 
    opened by vatch123 29
  • audio metrics: SNR, SI_SDR, SI_SNR

    Before submitting

    • [ ] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [x] Did you make sure to update the docs?
    • [x] Did you write any new necessary tests?

    What does this PR do?

    Fixes #291

    enhancement Important New metric API / design waiting on author 
    opened by quancs 29
  • IOU with segm masks and MAP for instance segment.

    What does this PR do?

    Fixes #821

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [X] Did you read the contributor guideline, Pull Request section?
    • [X] Did you make sure to update the docs?
    • [x] Did you write any new necessary tests?

    enhancement 0:] Ready-To-Go 
    opened by gianscarpe 27
  • add Extended Edit Distance (EED) metric

    Hello all, the only thing I've not done yet is update Changelog.md. I'm assuming it should be updated after review.

    What does this PR do?

    Fixes #635

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [x] Did you make sure to update the docs?
    • [x] Did you write any new necessary tests?

    New metric 0:] Ready-To-Go 
    opened by mathemusician 27
  • Metrics support mask

    🚀 Feature

    It would be useful for current metrics such as Accuracy/Recall to support a mask.

    Motivation

    For example, when I deal with a Sequence Labeling Task and pad some sequence to max-length, I do not want to calculate metrics at the padding locations.
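
As a concrete illustration of the request, here is a minimal plain-Python sketch of accuracy that skips padded positions. It is not part of TorchMetrics; the function name `masked_accuracy` and the sample sequences are hypothetical.

```python
def masked_accuracy(preds, target, mask):
    """Accuracy over positions where mask == 1; all other positions are ignored."""
    kept = [(p, t) for p, t, m in zip(preds, target, mask) if m == 1]
    if not kept:
        raise ValueError("mask excludes every position")
    return sum(int(p == t) for p, t in kept) / len(kept)


# A padded sequence: the last two positions are padding.
preds = [2, 0, 1, 3, 3]
target = [2, 0, 2, 0, 0]
mask = [1, 1, 1, 0, 0]

masked = masked_accuracy(preds, target, mask)         # 2 of 3 real positions match
unmasked = masked_accuracy(preds, target, [1] * 5)    # 2 of 5 positions match
print(masked, unmasked)
```

Masked positions are dropped from both the numerator and the denominator, so padding neither helps nor hurts the score.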

    Pitch

    I guess a simple manipulation would work for accuracy (here is the original one).

    from typing import Any, Optional
    
    import torch
    from pytorch_lightning.metrics.functional.classification import (
        accuracy,
    )
    from pytorch_lightning.metrics.metric import TensorMetric
    
    
    class MaskedAccuracy(TensorMetric):
        """
        Computes the accuracy classification score
        Example:
            >>> pred = torch.tensor([0, 1, 2, 3])
            >>> target = torch.tensor([0, 1, 2, 2])
            >>> mask = torch.tensor([1, 1, 1, 0])
            >>> metric = MaskedAccuracy(num_classes=4)
            >>> metric(pred, target, mask)
            tensor(1.)
        """
    
        def __init__(
            self,
            num_classes: Optional[int] = None,
            reduction: str = 'elementwise_mean',
            reduce_group: Any = None,
            reduce_op: Any = None,
        ):
            """
            Args:
                num_classes: number of classes
                reduction: a method for reducing accuracies over labels (default: takes the mean)
                    Available reduction methods:
                    - elementwise_mean: takes the mean
                    - none: pass array
                    - sum: add elements
                reduce_group: the process group to reduce metric results from DDP
                reduce_op: the operation to perform for ddp reduction
            """
            super().__init__(name='accuracy',
                             reduce_group=reduce_group,
                             reduce_op=reduce_op)
            self.num_classes = num_classes
            self.reduction = reduction
    
        def forward(self, pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
            """
            Actual metric computation
            Args:
                pred: predicted labels
                target: ground truth labels
                mask: only calculate metrics where mask==1
            Return:
                A Tensor with the classification score.
            """
            mask_fill = (1-mask).bool()
            # use out-of-place masked_fill so the caller's tensors are not modified
            pred = pred.masked_fill(mask=mask_fill, value=-1)
            target = target.masked_fill(mask=mask_fill, value=-1)
    
            return accuracy(pred=pred, target=target,
                            num_classes=self.num_classes, reduction=self.reduction)
    
    

    Alternatives

    Additional context

    enhancement help wanted wontfix 
    opened by YuxianMeng 27
  • 3D extension for SSIM

    What does this PR do?

    Fixes #812 Changes StructuralSimilarityIndexMeasure to be able to handle 3D images. Deciding whether the 2D or 3D version is used is determined automatically based on kernel_size. Basic sanity checks are carried out. Produces sensible results on my dataset.

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [ ] Did you make sure to update the docs?
    • [x] Did you write any new necessary tests?

    enhancement 0:] Ready-To-Go 
    opened by weningerleon 26
  • RuntimeError when using MAP-metric

    🐛 Bug

    Hi! I am training a detection model and use MAP-metric during validation. I got the following error at the validation_step: RuntimeError: expected scalar type Float but found Bool.

    To Reproduce

    Pick a Faster R-CNN model; I used fasterrcnn_resnet50_fpn_v2() from torchvision. Implement validation_step, where self.metric.update(...) is called with the model results and targets, and validation_epoch_end, where self.metric.compute() is called on the previously gathered results.

    Code sample

    import pytorch_lightning as pl
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from torchmetrics.detection.mean_ap import MeanAveragePrecision
    
    
    class FasterRCNNModel(pl.LightningModule):
        def __init__(self, num_classes):
            super().__init__()
    
            model = torchvision.models.detection.faster_rcnn.fasterrcnn_resnet50_fpn_v2()
            in_features = model.roi_heads.box_predictor.cls_score.in_features
            model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
            self.model = model
            self.metric = MeanAveragePrecision(box_format='xyxy', iou_type='bbox')
    
        def validation_step(self, batch, batch_idx):
            images, targets = batch
            preds = self.model(images)        
            self.metric.update(preds, targets)
    
        def validation_epoch_end(self, outs):
            mAP = self.metric.compute()
            self.log("val/mAP", mAP)
            self.metric.reset()
    

    targets (List[Dict]), containing:

    • boxes (torch.float32)
    • labels (torch.int64)

    preds (List[Dict]), containing:

    • boxes (torch.float32)
    • scores (torch.float32)
    • labels (torch.int64)

    Error message

      File "/homes/vsoboleva/scripts/pascal_voc/train.py", line 65, in validation_epoch_end
        mAP = self.metric.compute()
      File "/homes/vsoboleva/miniconda3/lib/python3.9/site-packages/torchmetrics/metric.py", line 523, in wrapped_func
        value = compute(*args, **kwargs)
      File "/homes/vsoboleva/miniconda3/lib/python3.9/site-packages/torchmetrics/detection/mean_ap.py", line 908, in compute
        precisions, recalls = self._calculate(classes)
      File "/homes/vsoboleva/miniconda3/lib/python3.9/site-packages/torchmetrics/detection/mean_ap.py", line 758, in _calculate
        recall, precision, scores = MeanAveragePrecision.__calculate_recall_precision_scores(
      File "/homes/vsoboleva/miniconda3/lib/python3.9/site-packages/torchmetrics/detection/mean_ap.py", line 831, in __calculate_recall_precision_scores
        det_scores = torch.cat([e["dtScores"][:max_det] for e in img_eval_cls_bbox])
    RuntimeError: expected scalar type Float but found Bool
    

    Expected behavior

    self.metric.compute(...) should compute the values correctly and not fail with RuntimeError: expected scalar type Float but found Bool.

    Environment

    • TorchMetrics 0.9.2 build with pip
    • Python 3.9.12, torch 1.12.0, torchvision 0.13.0
    • OS (e.g., Linux): Ubuntu 20.04.3

    Additional context

    bug / fix help wanted 
    opened by V-Soboleva 19
  • Add Mean Absolute Percentage Error

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [x] Did you make sure to update the docs?
    • [x] Did you write any new necessary tests?

    What does this PR do?

    Fixes #235.

    enhancement New metric 0:] Ready-To-Go 
    opened by pranjaldatta 19
  • Cannot use average="none" in version 0.11?

    🐛 Bug

    Using version 0.11 and trying to use Dice with average="none", I get the following error:

    ValueError: The `reduce` none is not valid.
    

    Additional context

    There is a check for the allowed average values in the Dice class. This permits "none".

    However, comparing v0.11 and v0.10.3, it looks like a second check was added further down in the class during this PR.

    if average not in ["micro", "macro", "samples"]:
        raise ValueError(f"The `reduce` {average} is not valid.")
    

    Since that doesn't check for average of "none", and since average isn't redefined above it, I think that secondary check added in #1252 causes the bug.
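
The interaction of the two checks can be reproduced in isolation. The sketch below is a hypothetical reduction of the logic described in this report, not the actual Dice source: the function name `validate_average` and the contents of `ALLOWED_AVERAGE` are assumptions, and only the second check is quoted from the issue.

```python
# Hypothetical stand-in for the first check's allow-list (an assumption, not library code).
ALLOWED_AVERAGE = ("micro", "macro", "weighted", "samples", "none", None)


def validate_average(average):
    # First check: accepts "none".
    if average not in ALLOWED_AVERAGE:
        raise ValueError(f"The `average` has to be one of {ALLOWED_AVERAGE}, got {average}.")
    # Second check (quoted in the issue): has no entry for "none", so it rejects it.
    if average not in ["micro", "macro", "samples"]:
        raise ValueError(f"The `reduce` {average} is not valid.")


try:
    validate_average("none")
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

A value that passes the first check can still be rejected by the narrower second one, which matches the error text reported above.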

    bug / fix help wanted 
    opened by carbocation 1
  • CI: cache transformers in Azure

    What does this PR do?

    Recently, transformers has become very unreliable/unreachable and so fails many unrelated tests... so let's try to change them :rabbit:

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [ ] Did you make sure to update the docs?
    • [x] Did you write any new necessary tests?

    test / CI 
    opened by Borda 1
  • Adding psnrb

    What does this PR do?

    Fixes #799

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [ ] Did you make sure to update the docs?
    • [ ] Did you write any new necessary tests?

    opened by soma2000-lang 1
  • Return also classes for MAP metric

    What does this PR do?

    Fixes https://github.com/Lightning-AI/metrics/issues/1417 Returns also classes for MAP metric.

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [ ] Did you make sure to update the docs?
    • [ ] Did you write any new necessary tests?

    enhancement 0:] Ready-To-Go 
    opened by SkafteNicki 1
  • Return class_names as part of `MeanAveragePrecision.compute()`

    🚀 Feature

    Return the class names as part of MeanAveragePrecision.compute().

    Motivation

    Currently only tensors are returned e.g. map_per_class and one cannot know which class the map corresponds to. It can happen that the preds and targets passed to update were missing a class, and then this class gets ignored when running compute() since metrics are computed only for classes that are seen during update, see here

    Pitch

    Return the value of self._get_classes as part of the return dict in the compute call

    Alternatives

    The current workaround is to use self._get_classes() to get this information. An alternative would be to make this function part of the public API.
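
To make the ambiguity concrete, the plain-Python sketch below pairs per-class values with the class ids they belong to. The numbers are invented, and it assumes (as the request implies) that the per-class values are ordered like the list of observed classes.

```python
# Illustrative values only; in practice these would come from the metric.
observed_classes = [0, 2, 5]        # classes seen during update (classes 1, 3 and 4 never appeared)
map_per_class = [0.61, 0.48, 0.75]  # one value per observed class, assumed to be in the same order

# Without the class ids, index 1 could be mistaken for class 1 instead of class 2.
map_by_class = dict(zip(observed_classes, map_per_class))
print(map_by_class)
```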

    enhancement 
    opened by manuelli 4
  • Update text docs

    What does this PR do?

    Fixes Text docs as a part of #1365

    Before submitting

    • [x] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
    • [x] Did you read the contributor guideline, Pull Request section?
    • [ ] Did you make sure to update the docs?
    • [ ] Did you write any new necessary tests?

    Some Notes

    • Squad.py,
      • There was a code example, which I wasn't sure if we should keep or not. I left it in for now.
      • I left the method-specific raise within the method docstring
    • sacre_bleu.py, Wasn't sure where to put an additional reference that didn't have an in-line citation, left it on the bottom under "additional citations" for now
    • Ter.py, references a specific file within a github repo that was already cited as the repo already. Wasn't sure if this was important, left them as two citations just in case

    No shape info, so no commit needed for that

    documentation 
    opened by reaganjlee 1
Releases(v0.11.0)
  • v0.11.0(Nov 30, 2022)

    We are happy to announce that Torchmetrics v0.11 is now publicly available. In Torchmetrics v0.11 we have primarily focused on the cleanup of the large classification refactor from v0.10 and adding new metrics. With v0.11 we are crossing 90+ metrics in Torchmetrics, nearing the milestone of having 100+ metrics.

    New domains

    In Torchmetrics we are not only looking to expand with new metrics in already established metric domains such as classification or regression, but also new domains. We are therefore happy to report that v0.11 includes two new domains: Multimodal and nominal.

    Multimodal

    If there is one topic within machine learning that is hot right now then it is generative models and in particular text-to-image generative models. Just recently stable diffusion v2 was released, able to create even more photorealistic images from a single text prompt than ever.

    In Torchmetrics v0.11 we are adding a new domain called multimodal to support the evaluation of such models. For now, we are starting out with a single metric, the CLIPScore from this paper that can be used to evaluate how well an image and a text match. CLIPScore currently achieves the highest correlation with human judgment, and thus a high CLIPScore for an image-text pair means that it is highly plausible that an image caption and an image are related to each other.

    Nominal

    If you have ever taken a course in statistics or an introduction to machine learning, you have hopefully heard that data attributes can be of different types: nominal, ordinal, interval, and ratio. This essentially refers to how data can be compared. For example, nominal data cannot be ordered and cannot be measured. An example would be data that describes the color of your car: blue, red, or green. It does not make sense to compare the different values. Ordinal data can be compared but does not have a relative meaning. An example would be the safety rating of a car: 1, 2, 3. We can say that 3 is better than 1, but the actual numerical value does not mean anything.

    In v0.11 of TorchMetrics, we are adding support for classic metrics on nominal data. In fact, 4 new metrics have already been added to this domain:

    • CramersV
    • PearsonsContingencyCoefficient
    • TschuprowsT
    • TheilsU

    All metrics are measures of association between two nominal variables, giving a value between 0 and 1, with 1 meaning that there is a perfect association between the variables.
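
As an illustration of what such an association measure computes, here is a plain-Python sketch of the textbook (uncorrected) Cramér's V, V = sqrt(chi² / (n · min(r − 1, c − 1))). It is not the TorchMetrics implementation, which may differ, for example by applying a bias correction; the function name `cramers_v` and the sample data are made up.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Uncorrected Cramér's V for two equally long sequences of nominal values."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    k = min(len(count_x), len(count_y)) - 1
    if k == 0:
        raise ValueError("each variable needs at least two distinct values")
    chi2 = 0.0
    for a in count_x:
        for b in count_y:
            expected = count_x[a] * count_y[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


colors = ["red", "red", "blue", "blue", "green", "green"]
perfect = cramers_v(colors, colors)                                  # a variable against itself
independent = cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"])  # no association
print(perfect, independent)
```

The two calls show the endpoints of the range: a variable compared with itself, and two variables whose joint counts equal the expected counts under independence.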

    Small improvements

    In addition to metrics within the two new domains, v0.11 of Torchmetrics contains other smaller changes and fixes:

    • TotalVariation metric has been added to the image package, which measures the complexity of an image with respect to its spatial variation.

    • MulticlassExactMatch metric has been added to the classification package, which for example can be used to measure sentence level accuracy where all tokens need to match for a sentence to be counted as correct.

    • KendallRankCorrCoef has been added to the regression package for measuring the overall correlation between two variables.

    • LogCoshError has been added to the regression package for measuring the residual error between two variables. It is similar to the mean squared error close to 0 but similar to the mean absolute error away from 0.
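
The two regimes of the log-cosh function can be checked numerically. The sketch below uses only the standard library and evaluates log(cosh(x)) for a single residual x rather than calling the TorchMetrics metric; the helper name `log_cosh` is illustrative.

```python
import math


def log_cosh(x):
    """log(cosh(x)) for a single residual x."""
    return math.log(math.cosh(x))


small, large = 0.01, 10.0
# Near 0 the function behaves like x**2 / 2 (quadratic, like a squared error).
print(log_cosh(small), small ** 2 / 2)
# Far from 0 it behaves like |x| - log(2) (linear, like an absolute error).
print(log_cosh(large), abs(large) - math.log(2))
```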


    Finally, Torchmetrics now only supports v1.8 and higher of Pytorch. It was necessary to raise the minimum from v1.3 because we were running into compatibility issues with older versions of Pytorch. We strive to support as many versions of Pytorch as possible, but for the best experience, we always recommend keeping Pytorch and Torchmetrics up to date.


    [0.11.0] - 2022-11-30

    Added

    • Added MulticlassExactMatch to classification metrics (#1343)
    • Added TotalVariation to image package (#978)
    • Added CLIPScore to new multimodal package (#1314)
    • Added regression metrics:
      • KendallRankCorrCoef (#1271)
      • LogCoshError (#1316)
    • Added new nominal metrics:
      • CramersV (#1298)
      • PearsonsContingencyCoefficient (#1334)
      • TschuprowsT (#1334)
      • TheilsU (#1337)
    • Added option to pass distributed_available_fn to metrics to allow checks for custom communication backend for making dist_sync_fn actually useful (#1301)
    • Added normalize argument to Inception, FID, KID metrics (#1246)

    Changed

    • Changed minimum Pytorch version to be 1.8 (#1263)
    • Changed interface for all functional and modular classification metrics after refactor (#1252)

    Removed

    • Removed deprecated BinnedAveragePrecision, BinnedPrecisionRecallCurve, RecallAtFixedPrecision (#1251)
    • Removed deprecated LabelRankingAveragePrecision, LabelRankingLoss and CoverageError (#1251)
    • Removed deprecated KLDivergence and AUC (#1251)

    Fixed

    • Fixed precision bug in pairwise_euclidean_distance (#1352)

    Contributors

    @borda, @justusschock, @ragavvenkatesan, @shenoynikhil, @SkafteNicki, @stancld

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.11.0-py3-none-any.whl(500.34 KB)
    torchmetrics-0.11.0.tar.gz(296.44 KB)
  • v0.10.3(Nov 16, 2022)

    [0.10.3] - 2022-11-16

    Fixed

    • Fixed bug in MetricTracker.best_metric when return_step=False (#1306)
    • Fixed bug to prevent users from going into an infinite loop if trying to iterate over a single metric (#1320)

    Contributors

    @SkafteNicki

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.10.3-py3-none-any.whl(517.32 KB)
    torchmetrics-0.10.3.tar.gz(332.54 KB)
  • v0.10.2(Oct 31, 2022)

    [0.10.2] - 2022-10-31

    Changed

    • Changed in-place operation to out-of-place operation in pairwise_cosine_similarity (#1288)

    Fixed

    • Fixed high memory usage for certain classification metrics when average='micro' (#1286)
    • Fixed precision problems when structural_similarity_index_measure was used with autocast (#1291)
    • Fixed slow performance for confusion matrix-based metrics (#1302)
    • Fixed restrictive dtype checking in spearman_corrcoef when used with autocast (#1303)

    Contributors

    @SkafteNicki

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.10.2-py3-none-any.whl(517.26 KB)
    torchmetrics-0.10.2.tar.gz(332.40 KB)
  • v0.10.1(Oct 21, 2022)

  • v0.10.0(Oct 4, 2022)

    TorchMetrics v0.10 is now out, significantly changing the whole classification package. This blog post will go over the reasons why the classification package needs to be refactored, what it means for our end users, and finally, what benefits it gives. A guide on how to upgrade your code to the recent changes can be found near the bottom.

    Why the classification metrics need to change

    We have known for a long time that there were some underlying problems with how we initially structured the classification package. Essentially, classification tasks can be divided into either binary, multiclass, or multilabel, and determining what task a user is trying to run a given metric on is hard just based on the input. The reason a package such as sklearn can do this is that it only supports input in very specific formats (no multi-dimensional arrays and no support for both integer and probability/logit formats).

    This meant that some metrics, especially for binary tasks, could have been calculating something different than expected if the user provided a shape other than the expected one. This goes against a core value of TorchMetrics: our users should be able to trust that the metric they are evaluating gives the expected result.

    Additionally, classification metrics were missing consistency. For some metrics, num_classes=2 meant binary, and for others num_classes=1 meant binary. You can read more about the underlying reasons for this refactor in this and this issue.

    The solution

    The solution we went with was to split every classification metric into three separate metrics with the prefix binary_*, multiclass_* and multilabel_*. This solves a number of the above problems out of the box because it becomes easier for us to match our users' expectations for any given input shape. It additionally has some other benefits, both for us as developers and for end users:

    • Maintainability: by splitting the code into three distinctive functions, we are (hopefully) lowering the code complexity, making the codebase easier to maintain in the long term.
    • Speed: by completely removing the auto-detection of task at runtime, we can significantly increase computational speed (more on this later).
    • Task-specific arguments: by splitting into three functions, we also make it clearer which input arguments affect the computed result. Take Accuracy as an example: num_classes, top_k, and average are arguments that have an influence if you are doing multiclass classification but do nothing for binary classification, and vice versa with the thresholds argument. The task-specific versions only contain the arguments that influence the given task.

    There are many smaller quality-of-life improvements hidden throughout the refactor; here are our top 3:

    Standardized arguments

    The input arguments for the classification package are now much more standardized. Here are a few examples:

    • Each metric now only supports arguments that influence the final result. This means that num_classes is removed from all binary_* metrics, is now required for all multiclass_* metrics, and is renamed to num_labels for all multilabel_* metrics.
    • The ignore_index argument is now supported by ALL classification metrics and supports any value, not only values in the [0,num_classes] range (similar to torch loss functions).
    • We added a new validate_args argument to all classification metrics to allow users to skip validation of inputs, making the computations faster. By default, we still do input validation because it is the safest option for the user. Still, if you are confident that the input to the metric is correct, you can now disable this checking for a potential speed-up (more on this later).
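To make the behaviour of ignore_index and validate_args concrete, here is a minimal sketch in plain Python. It is not the TorchMetrics implementation: the function name multiclass_accuracy_sketch and its exact checks are assumptions made for illustration, and only the argument names num_classes, ignore_index, and validate_args come from the text above.

```python
def multiclass_accuracy_sketch(preds, target, num_classes, ignore_index=None, validate_args=True):
    """Micro-averaged accuracy over integer labels (illustrative only)."""
    if validate_args:
        # Optional input checks; skipping them trades safety for speed.
        if len(preds) != len(target):
            raise ValueError("preds and target must have the same length")
        for t in target:
            if t != ignore_index and not 0 <= t < num_classes:
                raise ValueError(f"unexpected target label {t}")
    # Drop every position whose target equals ignore_index.
    # Any value is allowed here, including values outside [0, num_classes), such as -1.
    kept = [(p, t) for p, t in zip(preds, target) if t != ignore_index]
    if not kept:
        return 0.0
    correct = sum(1 for p, t in kept if p == t)
    return correct / len(kept)


preds = [0, 2, 1, 1, 0]
target = [0, 2, -1, 1, 1]  # -1 marks a position that should not be scored
acc = multiclass_accuracy_sketch(preds, target, num_classes=3, ignore_index=-1)
print(acc)  # 3 of the 4 remaining positions match
```

With validate_args=False the loop over the targets is skipped entirely, which is where the potential speed-up comes from in this sketch.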

    Constant memory implementations

    Some of the most useful metrics for evaluating classification problems are metrics such as ROC, AUROC, AveragePrecision, etc., because they not only evaluate your model for a single threshold but a whole range of thresholds, essentially giving you the ability to see the trade-off between Type I and Type II errors. However, a big problem with the standard formulation of these metrics (which we have been using) is that they require access to all data for their calculation. Our implementation has been extremely memory-intensive for these kinds of metrics.

    In v0.10 of TorchMetrics, all these metrics now have an argument called thresholds. By default, it is None and the metric will still save all targets and predictions in memory as you are used to. However, if this argument is instead set to a tensor, for example torch.linspace(0,1,100), the metric will use a constant-memory approximation by evaluating the metric under the provided thresholds.

    Setting thresholds=None has an approximate memory footprint of O(num_samples), whereas using thresholds=torch.linspace(0,1,100) has an approximate memory footprint of O(num_thresholds). In this particular case, users will save memory when the metric is computed on more than 100 samples. Since evaluation in modern machine learning is often done on thousands to millions of data points, this feature can save a substantial amount of memory.
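The constant-memory idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library code: the class name BinnedPrecisionRecallSketch is hypothetical, and the convention of returning precision 1.0 when nothing is predicted positive is a choice made for this sketch. The point is that the state holds one counter per threshold, so its size does not grow with the number of samples seen.

```python
class BinnedPrecisionRecallSketch:
    """Precision/recall at fixed thresholds with O(num_thresholds) state."""

    def __init__(self, thresholds):
        self.thresholds = list(thresholds)
        n = len(self.thresholds)
        # One counter per threshold; nothing per sample is stored.
        self.tp = [0] * n
        self.fp = [0] * n
        self.fn = [0] * n

    def update(self, probs, target):
        for p, t in zip(probs, target):
            for i, thr in enumerate(self.thresholds):
                predicted_positive = p >= thr
                if predicted_positive and t == 1:
                    self.tp[i] += 1
                elif predicted_positive and t == 0:
                    self.fp[i] += 1
                elif not predicted_positive and t == 1:
                    self.fn[i] += 1

    def compute(self):
        precision = [
            tp / (tp + fp) if (tp + fp) > 0 else 1.0  # convention chosen for this sketch
            for tp, fp in zip(self.tp, self.fp)
        ]
        recall = [
            tp / (tp + fn) if (tp + fn) > 0 else 0.0
            for tp, fn in zip(self.tp, self.fn)
        ]
        return precision, recall


metric = BinnedPrecisionRecallSketch(thresholds=[0.0, 0.25, 0.5, 0.75, 1.0])
metric.update([0.1, 0.8], [0, 1])  # first batch
metric.update([0.6, 0.4], [1, 0])  # second batch; the state size is unchanged
precision, recall = metric.compute()
```

The result is an approximation of the exact curve because scores are only compared against the provided thresholds rather than against every distinct prediction value.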

    This also means that the Binned* metrics that currently exist in TorchMetrics are being deprecated as their functionality is now captured by this argument.

    All metrics are faster (ish)

    By splitting each metric into 3 separate metrics, we reduce the number of calculations needed. We, therefore, expected out-of-the-box that our new implementations would be faster. The table below shows the timings of different metrics with the old and new implementations (with and without input validation). Numbers in parentheses denote speed-up over old implementations.

    The following observations can be made:

    • Some metrics are a bit faster (1.3x), and others are much faster (4.6x) after the refactor!
    • Disabling input validation can speed things up. For example, multiclass_confusion_matrix goes from a speedup of 3.36x to 4.81x when input validation is disabled. This is a clear advantage for users who are familiar with the metrics and do not need validation of their input at every update.
    • If we compare binary with multiclass, the biggest speedup can be seen for multiclass problems.
    • Every metric is faster except for the precision-recall curve, even when using the new approximate binning method. This is a bit strange, as the non-approximation should be equally fast (it's the same code). We are actively looking into this.

    [0.10.0] - 2022-10-04

    Added

    • Added a new NLP metric InfoLM (#915)
    • Added Perplexity metric (#922)
    • Added ConcordanceCorrCoef metric to regression package (#1201)
    • Added argument normalize to LPIPS metric (#1216)
    • Added support for multiprocessing of batches in PESQ metric (#1227)
    • Added support for multioutput in PearsonCorrCoef and SpearmanCorrCoef (#1200)

    Changed

    • Classification refactor (#1054, #1143, #1145, #1151, #1159, #1163, #1167, #1175, #1189, #1197, #1215, #1195)
    • Changed update in FID metric to be done in an online fashion to save memory (#1199)
    • Improved performance of retrieval metrics (#1242)
    • Changed SSIM and MSSSIM update to be online to reduce memory usage (#1231)

    Fixed

    • Fixed a bug in ssim when return_full_image=True where the score was still reduced (#1204)
    • Fixed MPS support for:
      • MAE metric (#1210)
      • Jaccard index (#1205)
    • Fixed bug in ClasswiseWrapper such that compute gave wrong result (#1225)
    • Fixed synchronization of empty list states (#1219)

    Contributors

    @Borda, @bryant1410, @geoffrey-g-delhomme, @justusschock, @lucadiliello, @nicolas-dufour, @Queuecumber, @SkafteNicki, @stancld

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.10.0-py3-none-any.whl(516.75 KB)
    torchmetrics-0.10.0.tar.gz(331.44 KB)
  • v0.9.3(Jul 23, 2022)

    [0.9.3] - 2022-08-22

    Added

    • Added global option sync_on_compute to disable automatic synchronization when compute is called (#1107)

    Fixed

    • Fixed missing reset in ClasswiseWrapper (#1129)
    • Fixed JaccardIndex multi-label compute (#1125)
    • Fixed SSIM to propagate device if gaussian_kernel is False, and added a test (#1149)

    Contributors

    @KeVoyer1, @krshrimali, @SkafteNicki

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.9.3-py3-none-any.whl(409.75 KB)
    torchmetrics-0.9.3.tar.gz(238.81 KB)
  • v0.9.2(Jun 29, 2022)

    [0.9.2] - 2022-06-29

    Fixed

    • Fixed mAP calculation for areas with 0 predictions (#1080)
    • Fixed bug where avg precision state and auroc state were not merged when using MetricCollections (#1086)
    • Skip box conversion if no boxes are present in MeanAveragePrecision (#1097)
    • Fixed inconsistency in docs and code when setting average="none" in AveragePrecision metric (#1116)

    Contributors

    @23pointsNorth, @kouyk, @SkafteNicki

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.9.2-py3-none-any.whl(409.90 KB)
    torchmetrics-0.9.2.tar.gz(238.62 KB)
  • v0.9.1(Jun 8, 2022)

    [0.9.1] - 2022-06-08

    Added

    • Added specific RuntimeError when metric object is on the wrong device (#1056)
    • Added an option to specify own n-gram weights for BLEUScore and SacreBLEUScore instead of using uniform weights only. (#1075)

    Fixed

    • Fixed aggregation metrics when input only contains zero (#1070)
    • Fixed TypeError when providing superclass arguments as kwargs (#1069)
    • Fixed bug related to state reference in metric collection when using compute groups (#1076)

    Contributors

    @jlcsilva, @SkafteNicki, @stancld

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.9.1-py3-none-any.whl(409.84 KB)
    torchmetrics-0.9.1.tar.gz(238.27 KB)
  • v0.9.0(May 31, 2022)

    Highlights

    TorchMetrics v0.9 is now out, and it brings significant changes to how the forward method works. This blog post goes over these improvements and how they affect both users of TorchMetrics and users that implement custom metrics. TorchMetrics v0.9 also includes several new metrics and bug fixes.

    Blog: TorchMetrics v0.9 — Faster forward

    The Story of the Forward Method

    Since the beginning of TorchMetrics, forward has served the dual purpose of calculating the metric on the current batch and accumulating in a global state. Internally, this was achieved by calling update twice, once for each purpose, which meant repeating the same computation. However, for many metrics, calling update twice is unnecessary to obtain both the local batch statistics and the global accumulation, because the global statistics are simple reductions of the local batch states.

    In v0.9, we have finally implemented a logic that can take advantage of this and will only call update once before making a simple reduction. As you can see in the figure below, this can lead to a single call of forward being 2x faster in v0.9 compared to v0.8 of the same metric.

    With the improvements to forward, many metrics have become significantly faster (up to 2x). It should be noted that this change mainly benefits metrics (for example, ConfusionMatrix) where calling update is expensive.

    We went through all existing metrics in TorchMetrics and enabled this feature for all appropriate metrics, which was almost 95% of all metrics. We want to stress that if you are using metrics from TorchMetrics, nothing has changed to the API, and no code changes are necessary.
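The single-update logic described in this section can be sketched roughly as below. This is a plain-Python illustration, not the TorchMetrics source: the class MeanSketch and its update_calls counter are hypothetical, and the reduction is a simple sum because that is what a mean metric needs.

```python
class MeanSketch:
    """Running mean whose forward() calls update() only once per batch."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.update_calls = 0  # only here to show how often update runs

    def update(self, values):
        self.update_calls += 1
        self.total += sum(values)
        self.count += len(values)

    def compute(self):
        return self.total / self.count

    def forward(self, values):
        # 1. Stash the global state and start from an empty one.
        global_total, global_count = self.total, self.count
        self.total, self.count = 0.0, 0
        # 2. A single update call yields the batch-only state.
        self.update(values)
        batch_value = self.compute()
        # 3. Merge batch state into the global state with a cheap reduction (sum).
        self.total += global_total
        self.count += global_count
        return batch_value


metric = MeanSketch()
batch_1 = metric.forward([1.0, 3.0])  # value on the first batch only
batch_2 = metric.forward([5.0])       # value on the second batch only
overall = metric.compute()            # value accumulated over both batches
```

Metrics whose global state is not a simple reduction of batch states cannot use this shortcut, which is why the feature is enabled per metric rather than everywhere.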

    [0.9.0] - 2022-05-31

    Added

    • Added RetrievalPrecisionRecallCurve and RetrievalRecallAtFixedPrecision to retrieval package (#951)
    • Added class property full_state_update that determines whether forward should call update once or twice (#984,#1033)
    • Added support for nested metric collections (#1003)
    • Added Dice to classification package (#1021)
    • Added support to segmentation type segm as IOU for mean average precision (#822)

    Changed

    • Renamed reduction argument to average in Jaccard score and added additional options (#874)

    Removed

    • Removed deprecated compute_on_step argument (#962, #967, #979 ,#990, #991, #993, #1005, #1004, #1007)

    Fixed

    • Fixed non-empty state dict for a few metrics (#1012)
    • Fixed bug when comparing states while finding compute groups (#1022)
    • Fixed torch.double support in stat score metrics (#1023)
    • Fixed FID calculation for non-equal size real and fake input (#1028)
    • Fixed case where KLDivergence could output Nan (#1030)
    • Fixed deterministic for PyTorch<1.8 (#1035)
    • Fixed default value for mdmc_average in Accuracy (#1036)
    • Fixed missing copy of property when using compute groups in MetricCollection (#1052)

    Contributors

    @Borda, @burglarhobbit, @charlielito, @gianscarpe, @MrShevan, @phaseolud, @razmikmelikbekyan, @SkafteNicki, @tanmoyio, @vumichien

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.9.0-py3-none-any.whl(408.36 KB)
    torchmetrics-0.9.0.tar.gz(236.25 KB)
  • v0.8.2(May 6, 2022)

    [0.8.2] - 2022-05-06

    Fixed

    • Fixed multi-device aggregation in PearsonCorrCoef (#998)
    • Fixed MAP metric when using a custom list of thresholds (#995)
    • Fixed compatibility between compute groups in MetricCollection and prefix/postfix arg (#1007)
    • Fixed compatibility with future Pytorch 1.12 in safe_matmul (#1011, #1014)

    Contributors

    @ben-davidson-6, @Borda, @SkafteNicki, @tanmoyio

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.8.2-py3-none-any.whl(400.23 KB)
    torchmetrics-0.8.2.tar.gz(230.46 KB)
  • v0.8.1(Apr 27, 2022)

    [0.8.1] - 2022-04-27

    Changed

    • Reimplemented the signal_distortion_ratio metric, which removed the absolute requirement of fast-bss-eval (#964)

    Fixed

    • Fixed "Sort currently does not support bool dtype on CUDA" error in MAP for empty preds (#983)
    • Fixed BinnedPrecisionRecallCurve when thresholds argument is not provided (#968)
    • Fixed CalibrationError to work on logit input (#985)

    Contributors

    @DuYicong515, @krshrimali, @quancs, @SkafteNicki

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.8.1-py3-none-any.whl(399.35 KB)
    torchmetrics-0.8.1.tar.gz(230.09 KB)
  • v0.8.0(Apr 15, 2022)

    We are excited to announce that TorchMetrics v0.8 is now available. The release includes several new metrics in the classification and image domains and some performance improvements for those working with metrics collections.

    Metric collections just got faster

    Common wisdom dictates that you should never evaluate the performance of your models using only a single metric but instead a collection of metrics. For example, it is common to simultaneously evaluate the accuracy, precision, recall, and f1 score in classification. In TorchMetrics, we have for a long time provided the MetricCollection object for chaining such metrics together for an easy interface to calculate them all at once. However, in many cases, such a collection of metrics shares some of the underlying computations that have been repeated for every metric in the collection. In Torchmetrics v0.8 we have introduced the concept of compute_groups to MetricCollection that will, as default, be auto-detected and group metrics that share some of the same computations.

    Thus, if you are using MetricCollections in your code, upgrading to TorchMetrics v0.8 should automatically make your code run faster without any code changes.
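A rough sketch of the compute-group idea, in plain Python, is shown below. All class names are hypothetical and the grouping rule (metrics whose states are identical after the first update are assumed to share their computation) is an assumption of this sketch rather than a description of the library internals.

```python
class StatScoresSketch:
    """Hypothetical base class: binary metrics that share tp/fp/fn counts."""

    def __init__(self):
        self.state = {"tp": 0, "fp": 0, "fn": 0}

    def update(self, preds, target):
        for p, t in zip(preds, target):
            if p == 1 and t == 1:
                self.state["tp"] += 1
            elif p == 1 and t == 0:
                self.state["fp"] += 1
            elif p == 0 and t == 1:
                self.state["fn"] += 1


class PrecisionSketch(StatScoresSketch):
    def compute(self):
        return self.state["tp"] / (self.state["tp"] + self.state["fp"])


class RecallSketch(StatScoresSketch):
    def compute(self):
        return self.state["tp"] / (self.state["tp"] + self.state["fn"])


class CollectionSketch:
    def __init__(self, metrics):
        self.metrics = metrics
        self.groups = None
        self.update_calls = 0

    def update(self, preds, target):
        if self.groups is None:
            # First batch: update every metric, then detect the groups.
            for metric in self.metrics.values():
                metric.update(preds, target)
                self.update_calls += 1
            self.groups = []
            for name, metric in self.metrics.items():
                for group in self.groups:
                    leader = self.metrics[group[0]]
                    if leader.state == metric.state:
                        group.append(name)
                        metric.state = leader.state  # share the state by reference
                        break
                else:
                    self.groups.append([name])
        else:
            # Later batches: only the first metric of each group is updated.
            for group in self.groups:
                self.metrics[group[0]].update(preds, target)
                self.update_calls += 1

    def compute(self):
        return {name: metric.compute() for name, metric in self.metrics.items()}


collection = CollectionSketch({"precision": PrecisionSketch(), "recall": RecallSketch()})
collection.update([1, 0, 1, 1], [1, 1, 0, 1])
collection.update([0, 1], [1, 1])
result = collection.compute()
```

In this sketch the second batch triggers one update call instead of two, and both metrics still see the accumulated counts because they point at the same state dictionary.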

    Many exciting new metrics

    TorchMetrics v0.8 includes several new metrics within the classification and image domain, both for the functional and modular API. We refer to the documentation for the full description of all metrics if you want to learn more about them.

    • SpectralAngleMapper or SAM was added to the image package. This metric can calculate the spectral similarity between given reference spectra and estimated spectra.
    • CoverageError was added to the classification package. This metric can be used when you are working with multi-label data. The metric works similarly to the sklearn counterpart and computes how far you need to go through ranked scores such that all true labels are covered.
    • LabelRankingAveragePrecision and LabelRankingLoss were added to the classification package. Both metrics are used in multi-label ranking problems, where the goal is to give a better rank to the labels associated with each sample. Each metric gives a measure of how well your model is doing this.
    • ErrorRelativeGlobalDimensionlessSynthesis or ERGAS was added to the image package. This metric can be used to calculate the accuracy of Pan sharpened images considering the normalized average error of each band of the resulting image.
    • UniversalImageQualityIndex was added to the image package. This metric can assess the difference between two images, which considers three different factors when computed: loss of correlation, luminance distortion, and contrast distortion.
    • ClasswiseWrapper was added to the wrapper package. This wrapper can be used in combinations with metrics that return multiple values (such as classification metrics with the average=None argument). The wrapper will unwrap the result into a dict with a label for each value.
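As an illustration of the unwrapping that the last item describes, the following plain-Python sketch turns a list of per-class values into a labelled dict. The function name classwise_sketch and the "name_label" key format are assumptions made for this example, not the wrapper's documented behaviour.

```python
def classwise_sketch(name, values, labels=None):
    """Turn per-class metric values into a dict with one entry per class."""
    if labels is None:
        # Fall back to the class index when no labels are given.
        labels = [str(i) for i in range(len(values))]
    if len(labels) != len(values):
        raise ValueError("need exactly one label per value")
    return {f"{name}_{label}": value for label, value in zip(labels, values)}


per_class_accuracy = [0.9, 0.5, 0.75]  # e.g. the output of a metric with average=None
logged = classwise_sketch("accuracy", per_class_accuracy, labels=["cat", "dog", "fish"])
unlabelled = classwise_sketch("accuracy", per_class_accuracy)
```

A flat dict of scalars like this is easier to pass to a logger than a single multi-valued result.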

    [0.8.0] - 2022-04-14

    Added

    • Added WeightedMeanAbsolutePercentageError to regression package (#948)
    • Added new classification metrics:
      • CoverageError (#787)
      • LabelRankingAveragePrecision and LabelRankingLoss (#787)
    • Added new image metric:
      • SpectralAngleMapper (#885)
      • ErrorRelativeGlobalDimensionlessSynthesis (#894)
      • UniversalImageQualityIndex (#824)
      • SpectralDistortionIndex (#873)
    • Added support for MetricCollection in MetricTracker (#718)
    • Added support for 3D image and uniform kernel in StructuralSimilarityIndexMeasure (#818)
    • Added smart update of MetricCollection (#709)
    • Added ClasswiseWrapper for better logging of classification metrics with multiple output values (#832)
    • Added **kwargs argument for passing additional arguments to base class (#833)
    • Added negative ignore_index for the Accuracy metric (#362)
    • Added adaptive_k for the RetrievalPrecision metric (#910)
    • Added reset_real_features argument to image quality assessment metrics (#722)
    • Added new keyword argument compute_on_cpu to all metrics (#867)

    Changed

    • Made num_classes in jaccard_index a required argument (#853, #914)
    • Added normalizer, tokenizer to ROUGE metric (#838)
    • Improved shape checking of permutation_invariant_training (#864)
    • Allowed reduction None (#891)
    • MetricTracker.best_metric will now give a warning when computing on metrics that do not have a best (#913)

    Deprecated

    • Deprecated argument compute_on_step (#792)
    • Deprecated passing in dist_sync_on_step, process_group, dist_sync_fn direct argument (#833)

    Removed

    • Removed support for versions of Lightning lower than v1.5 (#788)
    • Removed deprecated functions, and warnings in Text (#773)
      • WER and functional.wer
    • Removed deprecated functions and warnings in Image (#796)
      • SSIM and functional.ssim
      • PSNR and functional.psnr
    • Removed deprecated functions, and warnings in classification and regression (#806)
      • FBeta and functional.fbeta
      • F1 and functional.f1
      • Hinge and functional.hinge
      • IoU and functional.iou
      • MatthewsCorrcoef
      • PearsonCorrcoef
      • SpearmanCorrcoef
    • Removed deprecated functions, and warnings in detection and pairwise (#804)
      • MAP and functional.pairwise.manhatten
    • Removed deprecated functions, and warnings in Audio (#805)
      • PESQ and functional.audio.pesq
      • PIT and functional.audio.pit
      • SDR and functional.audio.sdr and functional.audio.si_sdr
      • SNR and functional.audio.snr and functional.audio.si_snr
      • STOI and functional.audio.stoi

    Fixed

    • Fixed device mismatch for MAP metric in specific cases (#950)
    • Improved testing speed (#820)
    • Fixed compatibility of ClasswiseWrapper with the prefix argument of MetricCollection (#843)
    • Fixed BestScore on GPU (#912)
    • Fixed Lsum computation for ROUGEScore (#944)

    Contributors

    @ankitaS11, @ashutoshml, @Borda, @hookSSi, @justusschock, @lucadiliello, @quancs, @rusty1s, @SkafteNicki, @stancld, @vumichien, @weningerleon, @yassersouri

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.8.0-py3-none-any.whl(399.06 KB)
    torchmetrics-0.8.0.tar.gz(229.41 KB)
  • v0.7.3(Mar 23, 2022)

    [0.7.3] - 2022-03-22

    Fixed

    • Fixed unsafe log operation in TweedieDevianceScore for power=1 (#847)
    • Fixed bug in MAP metric related to either no ground truth or no predictions (#884)
    • Fixed ConfusionMatrix, AUROC and AveragePrecision on GPU when running in deterministic mode (#900)
    • Fixed NaN or Inf results returned by signal_distortion_ratio (#899)
    • Fixed memory leak when using update method with tensor where requires_grad=True (#902)

    Contributors

    @mtailanian, @quancs, @SkafteNicki

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.7.3-py3-none-any.whl(388.88 KB)
    torchmetrics-0.7.3.tar.gz(219.86 KB)
  • v0.7.2(Feb 10, 2022)

  • v0.7.1(Feb 3, 2022)

    [0.7.1] - 2022-02-03

    Changed

    • Used torch.bucketize in calibration error when torch>1.8 for faster computations (#769)
    • Improve mAP performance (#742)

    Fixed

    • Fixed check for available modules (#772)
    • Fixed Matthews correlation coefficient when the denominator is 0 (#781)

    Contributors

    @Borda, @ramonemiliani93, @SkafteNicki, @twsl

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.7.1-py3-none-any.whl(387.84 KB)
    torchmetrics-0.7.1.tar.gz(217.59 KB)
  • v0.7.0(Jan 17, 2022)

    We are excited to announce that TorchMetrics v0.7 is now publicly available. This release is pretty significant. It includes several new metrics (mainly for NLP), naming and import changes, general improvements to the API, and some other great features. TorchMetrics thus now has 60+ metrics, and the package is more user-friendly than ever.

    NLP metrics - Text package

    Text package is a part of TorchMetrics as of v0.5. With the growing capability of language generation models, there is also a real need to have reliable evaluation metrics. With several added metrics and unified API, TorchMetrics makes the usage of various metrics even easier! TorchMetrics v0.7 newly includes a couple of machine translation metrics such as chrF, chrF++, Translation Edit Rate, or Extended Edit Distance. Furthermore, it also supports other metrics - Match Error Rate, Word Information Lost, Word Information Preserved, and SQuAD evaluation metrics. Last but not least, we also made possible the evaluation of the ROUGE score using multiple references.

    Argument unification

    Importantly, all text metrics assume preds, target input order with these explicit keyword arguments. If different naming was used before v0.7, it is deprecated and will be completely removed in v0.8.

    Import and naming changes

    TorchMetrics v0.7 brings both extensive and minor changes to how metrics should be imported. The import changes directly impact v0.7, meaning that you will most likely need to change the import statement for some specific metrics. All naming changes follow our standard deprecation process, meaning that in v0.7, any metric that is renamed will still work but emit a warning asking you to use the new metric name. From v0.8, the old metric names will no longer be available.
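A deprecation process of this kind is commonly implemented with a thin alias class. The sketch below is plain Python and assumes the notice is emitted as a DeprecationWarning; the classes are stand-ins that reuse the F1 and F1Score names from the rename list, not the library's own code, and the message text is made up for illustration.

```python
import warnings


class F1Score:
    """Stand-in for the metric under its new name."""

    def __init__(self, num_classes=None):
        self.num_classes = num_classes


class F1(F1Score):
    """Old name, kept for one release so existing code keeps working."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "`F1` was renamed to `F1Score` in v0.7 and will be removed in v0.8.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Record warnings so the example can inspect them instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    metric = F1(num_classes=3)  # still works, but notifies the caller
```

Because the alias subclasses the new class, isinstance checks against F1Score keep passing during the transition release.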

    [0.7.0] - 2022-01-17

    Added

    • Added NLP metrics:
      • MatchErrorRate (#619)
      • WordInfoLost and WordInfoPreserved (#630)
      • SQuAD (#623)
      • CHRFScore (#641)
      • TranslationEditRate (#646)
      • ExtendedEditDistance (#668)
    • Added MultiScaleSSIM into image metrics (#679)
    • Added Signal to Distortion Ratio (SDR) to audio package (#565)
    • Added MinMaxMetric to wrappers (#556)
    • Added ignore_index to retrieval metrics (#676)
    • Added support for multi references in ROUGEScore (#680)
    • Added a default VSCode devcontainer configuration (#621)

    Changed

    • Scalar metrics will now consistently have additional dimensions squeezed (#622)
    • Metrics having third party dependencies removed from global import (#463)
    • Untokenized for BLEUScore input stay consistent with all the other text metrics (#640)
    • Arguments reordered for TER, BLEUScore, SacreBLEUScore, CHRFScore now the expected input order is predictions first and target second (#696)
    • Changed dtype of metric state from torch.float to torch.long in ConfusionMatrix to accommodate larger values (#715)
    • Unify preds, target input argument's naming across all text metrics (#723, #727)
      • bert, bleu, chrf, sacre_bleu, wip, wil, cer, ter, wer, mer, rouge, squad

    Deprecated

    • Renamed IoU -> Jaccard Index (#662)
    • Renamed text WER metric: (#714)
      • functional.wer -> functional.word_error_rate
      • WER -> WordErrorRate
    • Renamed correlation coefficient classes: (#710)
      • MatthewsCorrcoef -> MatthewsCorrCoef
      • PearsonCorrcoef -> PearsonCorrCoef
      • SpearmanCorrcoef -> SpearmanCorrCoef
    • Renamed audio STOI metric: (#753, #758)
      • audio.STOI to audio.ShortTimeObjectiveIntelligibility
      • functional.audio.stoi to functional.audio.short_time_objective_intelligibility
    • Renamed audio PESQ metrics: (#751)
      • functional.audio.pesq -> functional.audio.perceptual_evaluation_speech_quality
      • audio.PESQ -> audio.PerceptualEvaluationSpeechQuality
    • Renamed audio SDR metrics: (#711)
      • functional.sdr -> functional.signal_distortion_ratio
      • functional.si_sdr -> functional.scale_invariant_signal_distortion_ratio
      • SDR -> SignalDistortionRatio
      • SI_SDR -> ScaleInvariantSignalDistortionRatio
    • Renamed audio SNR metrics: (#712)
      • functional.snr -> functional.signal_noise_ratio
      • functional.si_snr -> functional.scale_invariant_signal_noise_ratio
      • SNR -> SignalNoiseRatio
      • SI_SNR -> ScaleInvariantSignalNoiseRatio
    • Renamed F-score metrics: (#731, #740)
      • functional.f1 -> functional.f1_score
      • F1 -> F1Score
      • functional.fbeta -> functional.fbeta_score
      • FBeta -> FBetaScore
    • Renamed Hinge metric: (#734)
      • functional.hinge -> functional.hinge_loss
      • Hinge -> HingeLoss
    • Renamed image PSNR metrics (#732)
      • functional.psnr -> functional.peak_signal_noise_ratio
      • PSNR -> PeakSignalNoiseRatio
    • Renamed audio PIT metric: (#737)
      • functional.pit -> functional.permutation_invariant_training
      • PIT -> PermutationInvariantTraining
    • Renamed image SSIM metric: (#747)
      • functional.ssim -> functional.structural_similarity_index_measure
      • SSIM -> StructuralSimilarityIndexMeasure
    • Renamed detection MAP to MeanAveragePrecision metric (#754)
    • Renamed Fidelity & LPIPS image metric: (#752)
      • image.FID -> image.FrechetInceptionDistance
      • image.KID -> image.KernelInceptionDistance
      • image.LPIPS -> image.LearnedPerceptualImagePatchSimilarity

    Removed

    • Removed embedding_similarity metric (#638)
    • Removed argument concatenate_texts from wer metric (#638)
    • Removed arguments newline_sep and decimal_places from rouge metric (#638)

    Fixed

    • Fixed MetricCollection kwargs filtering when no kwargs are present in update signature (#707)

    Contributors

    @ashutoshml, @Borda, @cuent, @Fariborzzz, @getgaurav2, @janhenriklambrechts, @justusschock, @karthikrangasai, @lucadiliello, @mahinlma, @mathemusician, @mona0809, @mrleu, @puhuk, @quancs, @SkafteNicki, @stancld, @twsl

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.7.0-py3-none-any.whl(387.33 KB)
    torchmetrics-0.7.0.tar.gz(216.82 KB)
  • v0.6.2(Dec 15, 2021)

  • v0.6.1(Dec 6, 2021)

    [0.6.1] - 2021-12-06

    Changed

    • Migrate MAP metrics from pycocotools to PyTorch (#632)
    • Use torch.topk instead of torch.argsort in retrieval precision for speedup (#627)

    Fixed

    • Fix empty predictions in MAP metric (#594, #610, #624)
    • Fix edge case of AUROC with average=weighted on GPU (#606)
    • Fixed forward in compositional metrics (#645)

    Contributors

    @Callidior, @SkafteNicki, @tkupek, @twsl, @zuoxingdong

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.6.1-py3-none-any.whl(324.66 KB)
    torchmetrics-0.6.1.tar.gz(175.61 KB)
  • v0.6.0(Oct 28, 2021)

    [0.6.0] - 2021-10-28

    We are excited to announce that TorchMetrics v0.6 is now publicly available. TorchMetrics v0.6 does not focus on specific domains but adds a ton of new metrics to several domains, thus increasing the number of metrics in the repository to over 60! Not only has v0.6 added metrics within already covered domains, but it also adds support for two new ones: pairwise metrics and detection.

    https://devblog.pytorchlightning.ai/torchmetrics-v0-6-more-metrics-than-ever-e98c3983621e

    Pairwise Metrics

    TorchMetrics v0.6 offers a new set of metrics in its functional backend for calculating pairwise distances. Given a tensor X with shape [N,d] (N observations, each in d dimensions), a pairwise metric calculates an [N,N] matrix containing the metric for every pair of rows of X.
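To illustrate the [N,d] to [N,N] shape relationship, here is a minimal plain-Python sketch using the Euclidean distance. The function name pairwise_euclidean_sketch is hypothetical and nested lists stand in for tensors; it is not the functional API itself.

```python
import math


def pairwise_euclidean_sketch(x):
    """Return the N x N matrix of Euclidean distances between the rows of x."""
    n = len(x)
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(x[i], x[j]))) for j in range(n)]
        for i in range(n)
    ]


X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]  # N=3 observations in d=2 dimensions
D = pairwise_euclidean_sketch(X)
for row in D:
    print(row)
```

Entry D[i][j] compares row i with row j, so the matrix is symmetric with zeros on the diagonal for a distance such as this one.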

    Detection

    TorchMetrics v0.6 now includes a detection package that provides the MAP metric. The implementation essentially wraps pycocotools, ensuring that we get the correct value, but with the benefit of now being able to scale to multiple devices (as any other metric in TorchMetrics).

    New additions

    • In the audio package, we have two new metrics: Perceptual Evaluation of Speech Quality (PESQ) and Short Term Objective Intelligibility (STOI). Both metrics can be used to assess speech quality.

    • In the retrieval package, we also have two new metrics: R-precision and Hit-rate. R-precision is the precision at the R-th position of the ranking for a query, where R is the number of documents relevant to that query (at this position precision and recall coincide). The hit rate indicates whether at least one relevant document appears among the top results returned for a query.

    • The text package also receives an update in the form of two new metrics: SacreBLEU score and character error rate. The SacreBLEU score provides a more systematic way of comparing BLEU scores across tasks. The character error rate is similar to the word error rate but instead calculates whether a given algorithm has correctly predicted a sentence based on a character-by-character comparison.

    • The regression package got a single new metric in the form of the Tweedie deviance score metric. Deviance scores are generally a better measure of fit than measures such as squared error when trying to model data coming from highly skewed distributions.

    • Finally, we have added five new metrics for simple aggregation: SumMetric, MeanMetric, MinMetric, MaxMetric, CatMetric. All five metrics take in a single input (either native python floats or torch.Tensor) and keep track of the sum, average, min, etc. These new aggregation metrics are especially useful in combination with self.log from lightning if you want to log something other than the average of the metric you are tracking.
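The two retrieval metrics mentioned in the list above can be sketched for a single query in plain Python. This assumes the common textbook definitions (R-precision as precision within the top R results, hit rate as "at least one relevant result in the top k"); the function names are hypothetical and this is not the library's code.

```python
def rank_by_score(scores, relevant):
    """Return the relevance flags ordered from highest to lowest score."""
    pairs = sorted(zip(scores, relevant), key=lambda p: p[0], reverse=True)
    return [rel for _, rel in pairs]


def r_precision(scores, relevant):
    """Precision within the top R results, R = number of relevant documents."""
    r = sum(relevant)
    if r == 0:
        return 0.0  # assumption: a query without relevant documents scores 0
    return sum(rank_by_score(scores, relevant)[:r]) / r


def hit_rate(scores, relevant, k):
    """1.0 if any relevant document is ranked in the top k, else 0.0."""
    return 1.0 if any(rank_by_score(scores, relevant)[:k]) else 0.0


scores = [0.9, 0.8, 0.3, 0.2]
relevant = [True, False, True, False]

rp = r_precision(scores, relevant)     # R = 2, one of the top 2 is relevant
hr = hit_rate(scores, relevant, k=1)   # the top-ranked document is relevant
print(rp, hr)
```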

    Detail changes

    Added

    • Added audio metrics:
      • Perceptual Evaluation of Speech Quality (PESQ) (#353)
      • Short Term Objective Intelligibility (STOI) (#353)
    • Added Information retrieval metrics:
      • RetrievalRPrecision (#577)
      • RetrievalHitRate (#576)
    • Added NLP metrics:
      • SacreBLEUScore (#546)
      • CharErrorRate (#575)
    • Added other metrics:
      • Tweedie Deviance Score (#499)
      • Learned Perceptual Image Patch Similarity (LPIPS) (#431)
    • Added MAP (mean average precision) metric to new detection package (#467)
    • Added support for float targets in nDCG metric (#437)
    • Added average argument to AveragePrecision metric for reducing multi-label and multi-class problems (#477)
    • Added MultioutputWrapper (#510)
    • Added metric sweeping:
      • higher_is_better as constant attribute (#544)
      • higher_is_better to rest of codebase (#584)
    • Added simple aggregation metrics: SumMetric, MeanMetric, CatMetric, MinMetric, MaxMetric (#506)
    • Added pairwise submodule with metrics (#553)
      • pairwise_cosine_similarity
      • pairwise_euclidean_distance
      • pairwise_linear_similarity
      • pairwise_manhatten_distance

    Changed

    • AveragePrecision now outputs the macro average by default for multilabel and multiclass problems (#477)
    • half, double, float will no longer change the dtype of the metric states. Use metric.set_dtype instead (#493)
    • Renamed AverageMeter to MeanMetric (#506)
    • Changed is_differentiable from property to a constant attribute (#551)
    • ROC and AUROC will no longer throw an error when either the positive or negative class is missing. Instead, they return a score of 0 and give a warning

    Deprecated

    • Deprecated torchmetrics.functional.self_supervised.embedding_similarity in favour of new pairwise submodule

    Removed

    • Removed dtype property (#493)

    Fixed

    • Fixed bug in F1 with average='macro' and ignore_index!=None (#495)
    • Fixed bug in pit by using the returned first result to initialize device and type (#533)
    • Fixed SSIM metric using too much memory (#539)
    • Fixed bug where device property was not properly updated when the metric was a child of a module (#542)

    Contributors

    @an1lam, @Borda, @karthikrangasai, @lucadiliello, @mahinlma, @Obus, @quancs, @SkafteNicki, @stancld, @tkupek

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.6.0-py3-none-any.whl(321.63 KB)
    torchmetrics-0.6.0.tar.gz(171.65 KB)
  • v0.5.1(Sep 1, 2021)

    [0.5.1] - 2021-08-30

    Added

    • Added device and dtype properties (#462)
    • Added TextTester class for robustly testing text metrics (#450)

    Changed

    • Added support for float targets in nDCG metric (#437)

    Removed

    • Removed rouge-score as dependency for text package (#443)
    • Removed jiwer as dependency for text package (#446)
    • Removed bert-score as dependency for text package (#473)

    Fixed

    • Fixed ranking of samples in SpearmanCorrCoef metric (#448)
    • Fixed bug where compositional metrics were unable to sync because of type mismatch (#454)
    • Fixed metric hashing (#478)
    • Fixed BootStrapper metrics not working on GPU (#462)
    • Fixed the semantic ordering of kernel height and width in SSIM metric (#474)

    Contributors

    @justusschock, @karthikrangasai, @kingyiusuen, @Obus, @SkafteNicki, @stancld

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.5.1-py3-none-any.whl(276.33 KB)
    torchmetrics-0.5.1.tar.gz(151.02 KB)
  • v0.5.0(Aug 10, 2021)

    [0.5.0] - 2021-08-09

    This release includes general improvements to the library and new metrics within the NLP domain.

    https://devblog.pytorchlightning.ai/torchmetrics-v0-5-nlp-metrics-f4232467b0c5

    Natural language processing is arguably one of the most exciting areas of machine learning, with models such as BERT, RoBERTa, and GPT-3 really pushing what automated text translation, recognition, and generation systems are capable of.

    With the introduction of these models, many metrics have been proposed that measure how well these models perform. TorchMetrics v0.5 includes 4 such metrics: BERT score, BLEU, ROUGE and WER.

    Detail changes

    Added

    • Added Text-related (NLP) metrics:
      • Word Error Rate (WER) (#383)
      • ROUGE (#399)
      • BERT score (#424)
      • BLEU score (#360)
    • Added MetricTracker wrapper metric for keeping track of the same metric over multiple epochs (#238)
    • Added other metrics:
      • Symmetric Mean Absolute Percentage error (SMAPE) (#375)
      • Calibration error (#394)
      • Permutation Invariant Training (PIT) (#384)
    • Added support in nDCG metric for target with values larger than 1 (#349)
    • Added support for negative targets in nDCG metric (#378)
    • Added None as reduction option in CosineSimilarity metric (#400)
    • Allowed passing labels in (n_samples, n_classes) to AveragePrecision (#386)

    Changed

    • Moved psnr and ssim from functional.regression.* to functional.image.* (#382)
    • Moved image_gradient from functional.image_gradients to functional.image.gradients (#381)
    • Moved R2Score from regression.r2score to regression.r2 (#371)
    • Pearson metric now only stores 6 statistics instead of all predictions and targets (#380)
    • Use torch.argmax instead of torch.topk when k=1 for better performance (#419)
    • Moved check for number of samples in R2 score to support single sample updating (#426)

    Deprecated

    • Rename r2score >> r2_score and kldivergence >> kl_divergence in functional (#371)
    • Moved bleu_score from functional.nlp to functional.text.bleu (#360)

    Removed

    • Removed restriction that threshold has to be in (0,1) range to support logit input (#351, #401)
    • Removed restriction that preds could not be bigger than num_classes to support logit input (#357)
    • Removed module regression.psnr and regression.ssim (#382)
    • Removed (#379):
      • function functional.mean_relative_error
      • num_thresholds argument in BinnedPrecisionRecallCurve

    Fixed

    • Fixed bug where classification metrics with average='macro' would lead to wrong result if a class was missing (#303)
    • Fixed weighted, multi-class AUROC computation to allow for 0 observations of some class, as contribution to final AUROC is 0 (#376)
    • Fixed that _forward_cache and _computed attributes are also moved to the correct device if metric is moved (#413)
    • Fixed calculation in IoU metric when using ignore_index argument (#328)

    Contributors

    @BeyondTheProof, @Borda, @CSautier, @discort, @edwardclem, @gagan3012, @hugoperrin, @karthikrangasai, @paul-grundmann, @quancs, @rajs96, @SkafteNicki, @vatch123

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.5.0-py3-none-any.whl(265.63 KB)
    torchmetrics-0.5.0.tar.gz(139.14 KB)
  • v0.4.1(Jul 5, 2021)

  • v0.4.0(Jun 29, 2021)

    Overview

    https://devblog.pytorchlightning.ai/torchmetrics-v0-4-introducing-multimedia-metrics-e6380a3ad354

    Audio

    The first highlight of v0.4.0 is a set of 3 new metrics for evaluating audio data: scale-invariant signal-to-distortion ratio (SI_SDR), scale-invariant signal-to-noise ratio (SI_SNR), and signal-to-noise ratio (SNR). All these metrics take a predicted audio tensor and a target tensor, both with the shape [...,time], and calculate the metric over the time axis.
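As a rough illustration of "calculate the metric over the time axis", the sketch below computes a plain signal-to-noise ratio in decibels with PyTorch, reducing only the last dimension. It assumes the standard definition 10 * log10(||target||^2 / ||target - preds||^2); the helper name `snr_db` is hypothetical and the library's implementation may differ in details such as zero-mean handling.

```python
import torch


def snr_db(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Signal-to-noise ratio in dB, reduced over the last (time) axis.

    Hypothetical helper; inputs have shape [..., time].
    """
    eps = torch.finfo(preds.dtype).eps  # guards against division by zero
    noise = target - preds
    signal_power = target.pow(2).sum(dim=-1)
    noise_power = noise.pow(2).sum(dim=-1)
    return 10 * torch.log10((signal_power + eps) / (noise_power + eps))


# Two signals of 100 time steps; the prediction is off by a constant 0.1
target = torch.ones(2, 100)
preds = target + 0.1
values = snr_db(preds, target)
print(values.shape)  # leading dimensions are kept, time is reduced
print(values)        # noise power is 1/100 of the signal power
```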

    Image

    Version v0.4.0 also includes a completely new image package. Since its initial 0.2.0 release, TorchMetrics has had both PSNR and SSIM in its regression module, metrics that can be used to evaluate image quality. With the image module, we are adding three new metrics for evaluating the quality of generative models (such as GANs): Inception score (IS), Fréchet inception distance (FID) and kernel inception distance (KID).

    More Functionality

    In addition to the new audio and image package, we also want to highlight a couple of features:

    • Addition of MeanAbsolutePercentageError (MAPE) metric to the regression package. Useful in regression settings where you want to focus on the relative instead of absolute error.
    • Addition of KLDivergence metric to the classification package. Useful for measuring the distance between probability distributions like the ones outputted in variational auto-encoders.
    • Addition of CosineSimilarity metric to the regression package. Useful for calculating the angle between two embedding vectors in domains such as metric learning.
    • As requested by multiple users, Accuracy, Precision, Recall, FBeta, F1, StatScore, Hamming, ConfusionMatrix now directly support that predictions can be unnormalized, e.g. logits from your model. No need to call .softmax(dim=-1) anymore!
    • All modular metrics now have both sync and sync_context methods that give the user full control over when metric states are synced. Note that we still automatically do this whenever calling the compute method.
    • The is_differentiable property has been adopted by many more of our metrics!
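The point about unnormalized predictions in the list above rests on a simple fact: softmax is monotonic, so the predicted class (the argmax) is the same for logits and for probabilities. A minimal PyTorch sketch with hand-picked values, independent of the library:

```python
import torch

# Raw model outputs (logits) for 3 samples and 3 classes
logits = torch.tensor([
    [2.0, -1.0, 0.5],
    [0.1, 0.2, 3.0],
    [-0.5, -0.2, -3.0],
])
target = torch.tensor([0, 2, 0])

# Normalizing first does not change which class scores highest
probs = logits.softmax(dim=-1)
pred_from_logits = logits.argmax(dim=-1)
pred_from_probs = probs.argmax(dim=-1)

# Top-1 accuracy is therefore identical either way
acc_logits = (pred_from_logits == target).float().mean()
acc_probs = (pred_from_probs == target).float().mean()
print(pred_from_logits, acc_logits)
```

This only covers argmax-based decisions; metrics that apply a threshold to the scores still depend on the scale of the input.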

    Thanks

    Big thanks to all community members for their contributions and feedback. A special thanks to @quancs for leading the development of the new audio package.

    [0.4.0] - 2021-06-24

    Added

    • Added Cosine Similarity metric (#305)
    • Added Specificity metric (#210)
    • Added add_metrics method to MetricCollection for adding additional metrics after initialization (#221)
    • Added pre-gather reduction in the case of dist_reduce_fx="cat" to reduce communication cost (#217)
    • Added better error message for AUROC when num_classes is not provided for multiclass input (#244)
    • Added support for unnormalized scores (e.g. logits) in Accuracy, Precision, Recall, FBeta, F1, StatScore, Hamming, ConfusionMatrix metrics (#200)
    • Added MeanAbsolutePercentageError(MAPE) metric. (#248)
    • Added squared argument to MeanSquaredError for computing RMSE (#249)
    • Added FID metric (#213)
    • Added is_differentiable property to ConfusionMatrix, F1, FBeta, Hamming, Hinge, IOU, MatthewsCorrcoef, Precision, Recall, PrecisionRecallCurve, ROC, StatScores (#253)
    • Added audio metrics: SNR, SI_SDR, SI_SNR (#292)
    • Added Inception Score metric to image module (#299)
    • Added KID metric to image module (#301)
    • Added sync and sync_context methods for manually controlling when metric states are synced (#302)
    • Added KLDivergence metric (#247)

    Changed

    • Forward cache is reset when reset method is called (#260)
    • Improved per-class metric handling for imbalanced datasets for precision, recall, precision_recall, fbeta, f1, accuracy, and specificity (#204)
    • Decorated MetricCollection forward with torch.jit.unused (#307)
    • Renamed thresholds argument to binned metrics for manually controlling the thresholds (#322)

    Deprecated

    • Deprecated torchmetrics.functional.mean_relative_error (#248)
    • Deprecated num_thresholds argument in BinnedPrecisionRecallCurve (#322)

    Removed

    • Removed argument is_multiclass (#319)

    Fixed

    • AUC now also supports inputs with more than one dimension when all but one dimension are of size 1 (#242)
    • Fixed dtype of modular metrics after reset has been called (#243)
    • Fixed calculation in matthews_corrcoef to correctly match formula (#321)

    Contributors

    @AnselmC, @arvindmuralie77, @bhadreshpsavani, @Borda, @GiannisVagionakis, @hassiahk, @IgorHoholko, @johannespitz, @justusschock, @maximsch2, @pranjaldatta, @quancs, @simran2905, @SkafteNicki, @tchaton

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.4.0-py3-none-any.whl(227.11 KB)
    torchmetrics-0.4.0.tar.gz(113.90 KB)
  • v0.3.2(May 10, 2021)

    [0.3.2] - 2021-05-10

    Added

    • Added is_differentiable property:
      • To AUC, AUROC, CohenKappa and AveragePrecision (#178)
      • To PearsonCorrCoef, SpearmanCorrcoef, R2Score and ExplainedVariance (#225)

    Changed

    • MetricCollection should return metrics with prefix on items(), keys() (#209)
    • Calling compute before update will now give a warning (#164)

    Removed

    • Removed numpy as dependency (#212)

    Fixed

    • Fixed auc calculation and add tests (#197)
    • Fixed loading persisted metric states using load_state_dict() (#202)
    • Fixed PSNR not working with DDP (#214)
    • Fixed metric calculation with unequal batch sizes (#220)
    • Fixed metric concatenation for list states for zero-dim input (#229)
    • Fixed numerical instability in AUROC metric for large input (#230)

    Contributors

    @bhadreshpsavani, @hlin09, @maximsch2, @SkafteNicki, @tchaton

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.3.2-py3-none-any.whl(267.77 KB)
    torchmetrics-0.3.2.tar.gz(96.48 KB)
  • v0.3.1(Apr 21, 2021)

  • v0.3.0(Apr 20, 2021)

    Information Retrieval

    Information retrieval (IR) metrics are used to evaluate how well a system is retrieving information from a database or from a collection of documents. This is the case with search engines, where a query provided by the user is compared with many possible results, some of which are relevant and some are not.

    When you query a search engine, you hope that results that could be useful are ranked higher on the results page. However, each query is usually compared with a different set of documents. For this reason, we had to implement a mechanism to allow users to easily compute the IR metrics in cases where each query is compared with a different number of possible candidates.

    To support this, IR metrics feature an additional argument called indexes that specifies which query each prediction refers to. In the end, all query-document pairs are grouped by query index and then the final result is computed as the average of the metric over each group.
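The group-then-average scheme described above can be sketched in plain Python. The example uses mean reciprocal rank as the per-query metric; the function name and the flat-list input layout are assumptions made for illustration, not the library's API.

```python
from collections import defaultdict


def mean_reciprocal_rank(indexes, preds, target):
    """Group (score, relevance) pairs by query index, then average per-query RR."""
    groups = defaultdict(list)
    for idx, score, rel in zip(indexes, preds, target):
        groups[idx].append((score, rel))

    per_query = []
    for pairs in groups.values():
        # Rank the candidates of this query from highest to lowest score
        ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
        rr = 0.0  # assumption: a query with no relevant document contributes 0
        for rank, (_, rel) in enumerate(ranked, start=1):
            if rel:
                rr = 1.0 / rank
                break
        per_query.append(rr)
    return sum(per_query) / len(per_query)


# Query 0 has three candidates, query 1 has two
indexes = [0, 0, 0, 1, 1]
preds = [0.2, 0.7, 0.5, 0.9, 0.4]
target = [True, False, False, False, True]

mrr = mean_reciprocal_rank(indexes, preds, target)
print(mrr)  # average of 1/3 (query 0) and 1/2 (query 1)
```

Because every query is reduced to a single number before averaging, queries with different numbers of candidates contribute equally to the final value.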

    In total 6 new metrics have been added for doing information retrieval:

    • RetrievalMAP (Mean Average Precision)
    • RetrievalMRR (Mean Reciprocal Rank)
    • RetrievalPrecision (Precision for IR)
    • RetrievalRecall (Recall for IR)
    • RetrievalNormalizedDCG (Normalized Discounted Cumulative Gain)
    • RetrievalFallOut (Fall Out rate for IR)

    Special thanks go to @lucadiliello, for implementing all IR.

    Expanding and improving the collection

    In addition to expanding our collection to the field of information retrieval, this release also includes new metrics for the classification domain:

    • BootStrapper metric that can wrap around any other metric in our collection for easy computation of confidence intervals
    • CohenKappa is a statistic that is used to measure inter-rater reliability for qualitative (categorical) items
    • MatthewsCorrcoef or phi coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications
    • Hinge loss is used for "maximum-margin" classification, most notably for support vector machines.
    • PearsonCorrcoef is a metric for measuring the linear correlation between two sets of data
    • SpearmanCorrcoef is a metric for measuring the rank correlation between two sets of data. It assesses how well the relationship between two variables can be described using a monotonic function.
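To give a feel for what bootstrapping a metric means, the sketch below resamples per-sample correctness with replacement and reads a confidence interval off the resampled accuracies. It is written in plain PyTorch and is only an illustration of the statistical idea; it is not the BootStrapper wrapper itself, whose sampling strategy and outputs may differ.

```python
import torch

torch.manual_seed(0)

# Simulated binary predictions and targets for 200 samples
preds = torch.randint(0, 2, (200,))
target = torch.randint(0, 2, (200,))
correct = (preds == target).float()

# Draw 100 bootstrap resamples (with replacement) of the sample indices
n_boot = 100
idx = torch.randint(0, correct.numel(), (n_boot, correct.numel()))
boot_acc = correct[idx].mean(dim=1)  # one accuracy per resample

# Summarize the spread of the resampled accuracies
mean, std = boot_acc.mean(), boot_acc.std()
lower, upper = torch.quantile(boot_acc, torch.tensor([0.025, 0.975]))
print(f"accuracy: {mean:.3f} +/- {std:.3f}, 95% interval: [{lower:.3f}, {upper:.3f}]")
```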

    Binned metrics

    The current implementation of the AveragePrecision and PrecisionRecallCurve has the drawback that it saves all predictions and targets in memory to correctly calculate the metric value. These metrics now receive a binned version that calculates the value at fixed thresholds. This is less precise than original implementations but also much more memory efficient.
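The memory trade-off described above can be sketched as follows: instead of storing every prediction, keep three counters per fixed threshold and update them batch by batch. The class name `BinnedCounts` and its constructor argument are hypothetical; this is an illustration of the idea in plain PyTorch, not the library's binned metrics.

```python
import torch


class BinnedCounts:
    """Accumulate TP/FP/FN per fixed threshold (hypothetical helper)."""

    def __init__(self, num_thresholds: int):
        self.thresholds = torch.linspace(0, 1, num_thresholds)
        self.tp = torch.zeros(num_thresholds)
        self.fp = torch.zeros(num_thresholds)
        self.fn = torch.zeros(num_thresholds)

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # [B, 1] >= [1, T] -> [B, T]: is each score above each threshold?
        above = preds[:, None] >= self.thresholds[None, :]
        pos = target[:, None].bool()
        self.tp += (above & pos).sum(dim=0)
        self.fp += (above & ~pos).sum(dim=0)
        self.fn += (~above & pos).sum(dim=0)

    def compute(self):
        # clamp avoids 0/0 when nothing is predicted (or nothing is positive)
        precision = self.tp / (self.tp + self.fp).clamp(min=1)
        recall = self.tp / (self.tp + self.fn).clamp(min=1)
        return precision, recall


binned = BinnedCounts(num_thresholds=3)  # thresholds 0.0, 0.5, 1.0
binned.update(torch.tensor([0.9, 0.2]), torch.tensor([1, 0]))
binned.update(torch.tensor([0.6, 0.4]), torch.tensor([0, 1]))
precision, recall = binned.compute()
print(precision, recall)
```

The state size depends only on the number of thresholds, not on the number of samples seen, which is where the memory saving comes from; the price is that the curve is only known at those thresholds.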

    Special thanks go to @SkafteNicki, for letting all this happen.

    https://devblog.pytorchlightning.ai/torchmetrics-v0-3-0-information-retrieval-metrics-and-more-c55265e9b94f

    [0.3.0] - 2021-04-20

    Added

    • Added BootStrapper to easily calculate confidence intervals for metrics (#101)
    • Added Binned metrics (#128)
    • Added metrics for Information Retrieval:
      • Added RetrievalMAP (PL^5032)
      • Added RetrievalMRR (#119)
      • Added RetrievalPrecision (#139)
      • Added RetrievalRecall (#146)
      • Added RetrievalNormalizedDCG (#160)
      • Added RetrievalFallOut (#161)
    • Added other metrics:
      • Added CohenKappa (#69)
      • Added MatthewsCorrcoef (#98)
      • Added PearsonCorrcoef (#157)
      • Added SpearmanCorrcoef (#158)
      • Added Hinge (#120)
    • Added average='micro' as an option in AUROC for multilabel problems (#110)
    • Added multilabel support to ROC metric (#114)
    • Added testing for half precision (#77, #135)
    • Added AverageMeter for ad-hoc averages of values (#138)
    • Added prefix argument to MetricCollection (#70)
    • Added __getitem__ as metric arithmetic operation (#142)
    • Added property is_differentiable to metrics and test for differentiability (#154)
    • Added support for average, ignore_index and mdmc_average in Accuracy metric (#166)
    • Added postfix arg to MetricCollection (#188)

    Changed

    • Changed ExplainedVariance from storing all preds/targets to tracking 5 statistics (#68)
    • Changed behavior of confusionmatrix for multilabel data to better match multilabel_confusion_matrix from sklearn (#134)
    • Updated FBeta arguments (#111)
    • Changed reset method to use detach.clone() instead of deepcopy when resetting to default (#163)
    • Metrics passed as dict to MetricCollection will now always be in deterministic order (#173)
    • Allowed MetricCollection pass metrics as arguments (#176)

    Deprecated

    • Rename argument is_multiclass -> multiclass (#162)

    Removed

    • Prune remaining deprecated (#92)

    Fixed

    • Fixed _stable_1d_sort to work when n >= N (PL^6177)
    • Fixed _computed attribute not being correctly reset (#147)
    • Fixes to BLEU score (#165)
    • Fixed backwards compatibility for logging with older version of pytorch-lightning (#182)

    Contributors

    @alanhdu, @arvindmuralie77, @bhadreshpsavani, @Borda, @ethanwharris, @lucadiliello, @maximsch2, @SkafteNicki, @thomasgaudelet, @victorjoos

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.3.0-py3-none-any.whl(264.41 KB)
    torchmetrics-0.3.0.tar.gz(94.90 KB)
  • v0.2.0(Mar 12, 2021)

    What is Torchmetrics

    TorchMetrics is a collection of 25+ PyTorch metrics implementations and an easy-to-use API to create custom metrics. It offers:

    • A standardized interface to increase reproducibility
    • Reduces Boilerplate
    • Distributed-training compatible
    • Automatic accumulation over batches
    • Automatic synchronization between multiple devices

    You can use TorchMetrics in any PyTorch model, or within PyTorch Lightning to enjoy additional features:

    • Module metrics are automatically placed on the correct device.
    • Native support for logging metrics in Lightning to reduce even more boilerplate.

    Using functional metrics

    Similar to torch.nn, most metrics have both a module-based and a functional version. The functional version implements the basic operations required for computing each metric. These are simple Python functions that take torch.Tensor inputs and return the corresponding metric as a torch.Tensor.

    import torch
    # import our library
    import torchmetrics
    
    # simulate a classification problem
    preds = torch.randn(10, 5).softmax(dim=-1)
    target = torch.randint(5, (10,))
    
    acc = torchmetrics.functional.accuracy(preds, target)
    

    Using Module metrics

    Nearly all functional metrics have a corresponding module-based metric that calls its functional counterpart underneath. The module-based metrics are characterized by having one or more internal metric states (similar to the parameters of a PyTorch module) that allow them to offer additional functionality:

    • Accumulation of multiple batches
    • Automatic synchronization between multiple devices
    • Metric arithmetic
    import torch
    # import our library
    import torchmetrics
    
    # initialize metric
    metric = torchmetrics.Accuracy()
    
    n_batches = 10
    for i in range(n_batches):
        # simulate a classification problem
        preds = torch.randn(10, 5).softmax(dim=-1)
        target = torch.randint(5, (10,))
        # metric on current batch
        acc = metric(preds, target)
        print(f"Accuracy on batch {i}: {acc}")
    
    # metric on all batches using custom accumulation
    acc = metric.compute()
    print(f"Accuracy on all data: {acc}")
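The difference between the per-batch value and the accumulated value in the loop above comes from the internal metric states. A minimal sketch of that idea in plain PyTorch follows; `RunningAccuracy` is a hypothetical class written for illustration and is not how the library's Accuracy is implemented.

```python
import torch


class RunningAccuracy:
    """Keep running counts so compute() reflects all samples seen so far."""

    def __init__(self):
        self.correct = torch.tensor(0)
        self.total = torch.tensor(0)

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # preds has shape [batch, classes]; the predicted class is the argmax
        self.correct += (preds.argmax(dim=-1) == target).sum()
        self.total += target.numel()

    def compute(self) -> torch.Tensor:
        return self.correct.float() / self.total


running = RunningAccuracy()

# Batch 1: four samples, all predicted correctly (one-hot scores)
target_1 = torch.tensor([0, 1, 2, 0])
running.update(torch.eye(3)[target_1], target_1)

# Batch 2: a single sample, predicted incorrectly
target_2 = torch.tensor([1])
running.update(torch.eye(3)[torch.tensor([2])], target_2)

acc_all = running.compute()
print(acc_all)  # 4 correct out of 5 samples
```

Averaging the two batch accuracies would give (1.0 + 0.0) / 2 = 0.5, whereas accumulating counts gives 4/5, which is why states are accumulated rather than batch results averaged when batch sizes differ.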
    

    Built-in metrics

    • Accuracy
    • AveragePrecision
    • AUC
    • AUROC
    • F1
    • Hamming Distance
    • ROC
    • ExplainedVariance
    • MeanSquaredError
    • R2Score
    • bleu_score
    • embedding_similarity

    And many more!

    Contributors

    @Borda, @SkafteNicki, @williamFalcon, @teddykoker, @justusschock, @tadejsv, @edenlightning, @ydcjeff, @ddrevicky, @ananyahjha93, @awaelchli, @rohitgr7, @akihironitta, @manipopopo, @Diuven, @arnaudgelas, @s-rog, @c00k1ez, @tgaddair, @elias-ramzi, @cuent, @jpcarzolio, @bryant1410, @shivdhar, @Sordie, @krzysztofwos, @abhik-99, @bernardomig, @peblair, @InCogNiTo124, @j-dsouza, @pranjaldatta, @ananthsub, @deng-cy, @abhinavg97, @tridao, @prampey, @abrahambotros, @ozen, @ShomyLiu, @yuntai, @pwwang

    If we forgot someone due to not matching commit email with GitHub account, let us know :]

    Source code(tar.gz)
    Source code(zip)
    torchmetrics-0.2.0-py3-none-any.whl(172.74 KB)
    torchmetrics-0.2.0.tar.gz(68.85 KB)