Quantify the difference between two arbitrary curves in space

Overview

similaritymeasures


Quantify the difference between two arbitrary curves

Curves in this case are:

  • discretized by individual data points
  • ordered from a beginning to an ending
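Because the points are ordered, each curve has a well-defined arc length measured from its first point. This is the quantity the Curve Length method [9, 10] treats as the independent variable. The minimal pure-Python sketch below computes that cumulative arc length for two curves with different numbers of points. The helper name cumulative_arc_length is hypothetical and is not part of similaritymeasures.

```python
import math


def cumulative_arc_length(curve):
    """Return the cumulative arc length at each point of an ordered curve.

    `curve` is a sequence of points (each a sequence of floats, any
    dimension). The first entry is 0.0 and the last entry is the total
    length of the curve.
    """
    lengths = [0.0]
    for p, q in zip(curve[:-1], curve[1:]):
        # Euclidean length of the segment from p to q
        lengths.append(lengths[-1] + math.dist(p, q))
    return lengths


# Two ordered curves with different numbers of points
exp_curve = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
num_curve = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]

print(cumulative_arc_length(exp_curve))  # [0.0, 5.0, 10.0]
print(cumulative_arc_length(num_curve))  # [0.0, 1.0, 2.0, 3.0]
```

Dividing each list by its last entry would give a normalized arc length running from 0 to 1, which makes curves of different total length comparable.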

Consider the following two curves. We want to quantify how different the Numerical curve is from the Experimental curve. Notice that the two curves share no common Stress or Strain values. Additionally, one curve has more data points than the other.

Image of two different curves

In the ideal case, the Numerical curve would match the Experimental curve exactly, meaning the two curves would lie directly on top of each other. Our measures of similarity would return a distance of zero for two curves that lie on top of each other.

Methods covered

This library includes the following methods to quantify the difference (or similarity) between two curves:

  • Partial Curve Mappingˣ (PCM) method: Matches the area of a subset between the two curves [1]
  • Area methodˣ: An algorithm for calculating the area between two curves in 2D space [2]
  • Discrete Frechet distanceʸ: The shortest distance between two curves, where you are allowed to vary the speed at which you travel along each curve independently (the walking dog problem) [3, 4, 5, 6, 7, 8]
  • Curve Lengthˣ method: Assumes that the only true independent variable of the curves is the arc-length distance along the curve from the origin [9, 10]
  • Dynamic Time Warpingʸ (DTW): A non-metric distance between two time-series curves that has proven useful for a variety of applications [11, 12, 13, 14, 15, 16]

ˣ denotes methods created specifically for material parameter identification

ʸ denotes that the method implemented in this library supports N-D data!
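To illustrate the dynamic-programming idea behind DTW, here is a minimal pure-Python sketch. It is not the library's similaritymeasures.dtw implementation: the function name dtw_distance is hypothetical, and it assumes a Euclidean local cost between N-D points.

```python
import math


def dtw_distance(exp_data, num_data):
    """Classic dynamic time warping distance between two ordered curves.

    Each curve is a sequence of N-D points; the curves may have different
    numbers of points. The local cost is the Euclidean distance, and the
    result is the minimum cumulative cost over all monotone alignments.
    """
    n, m = len(exp_data), len(num_data)
    inf = float("inf")
    # acc[i][j] = cheapest alignment of the first i exp_data points
    # with the first j num_data points
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(exp_data[i - 1], num_data[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # repeat a num_data point
                                   acc[i][j - 1],      # repeat an exp_data point
                                   acc[i - 1][j - 1])  # advance both curves
    return acc[n][m]


curve = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
shifted = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
print(dtw_distance(curve, curve))    # 0.0, identical curves
print(dtw_distance(curve, shifted))  # 3.0, each point is 1 unit away
```

The table has (n + 1) by (m + 1) entries, so this sketch costs O(n·m) time and memory; it is meant to show the recurrence, not to replace the library function.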

Installation

Install with pip

[sudo] pip install similaritymeasures

or clone and install from source.

git clone https://github.com/cjekel/similarity_measures
[sudo] pip install ./similarity_measures

Example usage

This shows you how to compute the various similarity measures:

import numpy as np
import similaritymeasures
import matplotlib.pyplot as plt

# Generate random experimental data
x = np.random.random(100)
y = np.random.random(100)
exp_data = np.zeros((100, 2))
exp_data[:, 0] = x
exp_data[:, 1] = y

# Generate random numerical data
x = np.random.random(100)
y = np.random.random(100)
num_data = np.zeros((100, 2))
num_data[:, 0] = x
num_data[:, 1] = y

# quantify the difference between the two curves using PCM
pcm = similaritymeasures.pcm(exp_data, num_data)

# quantify the difference between the two curves using
# Discrete Frechet distance
df = similaritymeasures.frechet_dist(exp_data, num_data)

# quantify the difference between the two curves using
# area between two curves
area = similaritymeasures.area_between_two_curves(exp_data, num_data)

# quantify the difference between the two curves using
# Curve Length based similarity measure
cl = similaritymeasures.curve_length_measure(exp_data, num_data)

# quantify the difference between the two curves using
# Dynamic Time Warping distance
dtw, d = similaritymeasures.dtw(exp_data, num_data)

# print the results
print(pcm, df, area, cl, dtw)

# plot the data
plt.figure()
plt.plot(exp_data[:, 0], exp_data[:, 1])
plt.plot(num_data[:, 0], num_data[:, 1])
plt.show()

If you are interested in setting up an optimization problem using these measures, check out this Jupyter Notebook which replicates Section 3.2 from [2].

Changelog

Version 0.3.0: Frechet distance now supports N-D data! See CHANGELOG.md for full details.

Documentation

Each function includes a descriptive docstring, which you can view online here.

References

[1] Katharina Witowski and Nielen Stander. Parameter Identification of Hysteretic Models Using Partial Curve Mapping. 12th AIAA Aviation Technology, Integration, and Operations (ATIO) Conference and 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, sep 2012. doi: 10.2514/6.2012-5580.

[2] Jekel, C. F., Venter, G., Venter, M. P., Stander, N., & Haftka, R. T. (2018). Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. https://doi.org/10.1007/s12289-018-1421-8

[3] Maurice Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884-1940), 22(1):1–72, 1906.

[4] Thomas Eiter and Heikki Mannila. Computing discrete Frechet distance. Technical report, 1994.

[5] Anne Driemel, Sariel Har-Peled, and Carola Wenk. Approximating the Frechet Distance for Realistic Curves in Near Linear Time. Discrete & Computational Geometry, 48(1): 94–127, 2012. ISSN 1432-0444. doi: 10.1007/s00454-012-9402-z. URL http://dx.doi.org/10.1007/s00454-012-9402-z.

[6] K Bringmann. Why Walking the Dog Takes Time: Frechet Distance Has No Strongly Subquadratic Algorithms Unless SETH Fails, 2014.

[7] Sean L Seyler, Avishek Kumar, M F Thorpe, and Oliver Beckstein. Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLOS Computational Biology, 11(10):1–37, 2015. doi: 10.1371/journal.pcbi.1004568. URL https://doi.org/10.1371/journal.pcbi.1004568.

[8] Helmut Alt and Michael Godau. Computing the Frechet Distance Between Two Polygonal Curves. International Journal of Computational Geometry & Applications, 05 (01n02):75–91, 1995. doi: 10.1142/S0218195995000064.

[9] A Andrade-Campos, R De-Carvalho, and R A F Valente. Novel criteria for determination of material model parameters. International Journal of Mechanical Sciences, 54 (1):294–305, 2012. ISSN 0020-7403. doi: https://doi.org/10.1016/j.ijmecsci.2011.11.010. URL http://www.sciencedirect.com/science/article/pii/S0020740311002451.

[10] J Cao and J Lin. A study on formulation of objective functions for determining material models. International Journal of Mechanical Sciences, 50(2):193–204, 2008. ISSN 0020-7403. doi: https://doi.org/10.1016/j.ijmecsci.2007.07.003. URL http://www.sciencedirect.com/science/article/pii/S0020740307001178.

[11] Donald J Berndt and James Clifford. Using Dynamic Time Warping to Find Patterns in Time Series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, AAAIWS’94, pages 359–370. AAAI Press, 1994. URL http://dl.acm.org/citation.cfm?id=3000850.3000887.

[12] François Petitjean, Alain Ketterlin, and Pierre Gançarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44 (3):678–693, 2011. ISSN 0031-3203. doi: https://doi.org/10.1016/j.patcog.2010.09.013. URL http://www.sciencedirect.com/science/article/pii/S003132031000453X.

[13] Toni Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software, 31(7), aug 2009. URL http://dx.doi.org/10.18637/jss.v031.i07.

[14] Stan Salvador and Philip Chan. Toward Accurate Dynamic Time Warping in Linear Time and Space. Intell. Data Anal., 11(5):561–580, oct 2007. ISSN 1088-467X. URL http://dl.acm.org/citation.cfm?id=1367985.1367993.

[15] Paolo Tormene, Toni Giorgino, Silvana Quaglini, and Mario Stefanelli. Matching incomplete time series with dynamic time warping: an algorithm and an application to post-stroke rehabilitation. Artificial Intelligence in Medicine, 45(1):11–34, 2009. ISSN 0933-3657. doi: https://doi.org/10.1016/j.artmed.2008.11.007. URL http://www.sciencedirect.com/science/article/pii/S0933365708001772.

[16] Senin, P., 2008. Dynamic time warping algorithm review. Information and Computer Science Department University of Hawaii at Manoa Honolulu, USA, 855, pp.1-23. http://seninp.github.io/assets/pubs/senin_dtw_litreview_2008.pdf

Contributions welcome!

This is by no means a complete list of all possible similarity measures. For instance, the SciPy Hausdorff distance is an alternative similarity measure that is useful if you don't know the beginning and ending of each curve. There are many more possible functions out there. Feel free to send PRs for other functions in the literature!
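To illustrate why the Hausdorff distance suits curves without a known beginning and ending, here is a minimal brute-force sketch in pure Python. The function name hausdorff_distance is hypothetical; with SciPy installed, the same symmetric value can be obtained as max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0]) using scipy.spatial.distance.directed_hausdorff.

```python
import math


def hausdorff_distance(curve_a, curve_b):
    """Symmetric Hausdorff distance between two point sets.

    Point order is ignored, so the result does not depend on which end of
    each curve is treated as the beginning.
    """
    def directed(u, v):
        # largest distance from a point of u to its nearest point of v
        return max(min(math.dist(p, q) for q in v) for p in u)

    return max(directed(curve_a, curve_b), directed(curve_b, curve_a))


curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
reversed_curve = curve[::-1]
print(hausdorff_distance(curve, reversed_curve))  # 0.0
```

Reversing the curve leaves the Hausdorff distance at 0.0. An order-aware measure such as the discrete Frechet distance cannot be zero here, because any coupling must pair the first points (0, 0) and (2, 0), which are 2 apart.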

Requirements for adding a new method to this library:

  • all methods should be able to quantify the difference between two curves
  • method must support the case where each curve may have a different number of data points
  • follow the style of existing functions
  • reference to method details, or descriptive docstring of the method
  • include test(s) for your new method
  • minimum Python dependencies (try to stick to SciPy/numpy functions if possible)
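As an illustration of the checklist above, here is a minimal sketch of what a contributed measure and its test might look like. The measure mean_nearest_distance is hypothetical (it is not part of similaritymeasures) and was chosen only because it is short: it accepts two curves with different numbers of points, documents itself in a docstring, needs nothing beyond the standard library, and comes with a test.

```python
import math


def mean_nearest_distance(exp_data, num_data):
    """Mean distance from each exp_data point to its nearest num_data point.

    Hypothetical example measure used only to illustrate the contribution
    checklist. Both curves are sequences of N-D points and may have
    different numbers of points.
    """
    return sum(min(math.dist(p, q) for q in num_data)
               for p in exp_data) / len(exp_data)


def test_mean_nearest_distance():
    exp_data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    num_data = [(0.0, 1.0), (2.0, 1.0)]  # fewer points than exp_data
    # identical curves are zero distance apart
    assert mean_nearest_distance(exp_data, exp_data) == 0.0
    # nearest distances are 1, sqrt(2) and 1
    expected = (2.0 + math.sqrt(2.0)) / 3.0
    assert math.isclose(mean_nearest_distance(exp_data, num_data), expected)


test_mean_nearest_distance()
```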

Please cite

If you've found this information or library helpful, please cite the following paper. You should also cite the papers of any methods that you have used.

Jekel, C. F., Venter, G., Venter, M. P., Stander, N., & Haftka, R. T. (2018). Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. https://doi.org/10.1007/s12289-018-1421-8

@article{Jekel2019,
author = {Jekel, Charles F and Venter, Gerhard and Venter, Martin P and Stander, Nielen and Haftka, Raphael T},
doi = {10.1007/s12289-018-1421-8},
issn = {1960-6214},
journal = {International Journal of Material Forming},
month = {may},
title = {{Similarity measures for identifying material parameters from hysteresis loops using inverse analysis}},
url = {https://doi.org/10.1007/s12289-018-1421-8},
year = {2019}
}
Comments
  • frechet_dist input size is bounded by maximum recursion depth

    Consider the following:

    from similaritymeasures import frechet_dist

    max_len = 1000
    a = [[1, 2, 3] for i in range(max_len)]
    b = [[1, 6, 3] for i in range(max_len)]
    frechet_dist(a, b)


    Running this code on a machine with 32 GB of RAM raises a stack overflow error. I would suggest switching the recursion-based computation to an iterative one, for example using queues.

    Is anyone currently working on optimizing the memory usage of frechet_dist?

    Thank you for your work, Arbel Amir

    opened by ArbelAmir 8
  • discrete Frechet distance between lists or 1D arrays

    My question might sound a little dumb, but is it possible to use similaritymeasures.frechet_dist() with lists or 1D arrays? I tried to calculate the similarity between one list and several other lists (which also include the first list), but the most similar result was not the first list, which I expected since they are exactly the same. It does work with real coordinates (lat, lon trajectories): there the most similar result is the first given argument. I'm trying to use factors other than distance to calculate the similarity between two things, and those parameters are just numerical values, so I'm wondering how I can use frechet_dist with 1D lists.

    opened by miladad8 6
  • Regarding code update in "is_simple_quad" function on Aug 18, 2019

    Dear Authors,

    Thanks for your contribution in the form of the "similaritymeasures" library for quantifying the difference between curves. I have been using it to find the area between curves. However, since your update to the code that checks whether a quadrilateral is simple (the "is_simple_quad" function, Aug 18, 2019), the area between the curves is no longer correct (if I use the previous code, the returned area is correct). Specifically, the "if" condition that checks the number of cross products with the same sign should be sum(crossTF) > 2 instead of sum(crossTF) == 2.

    This can be checked with the following code, which finds the area between two simple curves. Running it prints "area1 : 0.0", while the previous code gives the correct area (4 in this case).

    import numpy as np
    import matplotlib.pyplot as plt
    import similaritymeasures

    xaxis = [0, 1, 2, 3, 4]
    curve1 = [0, 0, 0, 0, 0]
    curve2 = [1, 1, 1, 1, 1]
    exp_data = np.zeros((len(xaxis), 2))
    num_data = np.zeros((len(xaxis), 2))

    exp_data[:, 0] = xaxis
    exp_data[:, 1] = curve1
    num_data[:, 0] = xaxis
    num_data[:, 1] = curve2

    plt.figure()
    plt.scatter(xaxis, curve1)
    plt.scatter(xaxis, curve2)
    plt.show()

    # expected area between y = 0 and y = 1 over 0 <= x <= 4 is 4
    area1 = similaritymeasures.area_between_two_curves(exp_data, num_data)
    print("area1 : " + str(area1))
    opened by aanchalMongia 5
  • Problem during pip install: "UnicodeDecodeError: 'gbk' codec can't decode byte"

    Running pip install similaritymeasures gives:

    (base) D:\repositories\joinTracks>pip install similaritymeasures
    Collecting similaritymeasures
      Using cached similaritymeasures-0.4.3.tar.gz (397 kB)
        ERROR: Command errored out with exit status 1:
         command: 'C:\Users\s00557672\Anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\S00557~1\\AppData\\Local\\Temp\\pip-install-d5tndazp\\similaritymeasures\\setup.py'"'"';
    __file__='"'"'C:\\Users\\S00557~1\\AppData\\Local\\Temp\\pip-install-d5tndazp\\similaritymeasures\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'
    "');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\pip-egg-info'
             cwd: C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\
        Complete output (5 lines):
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\setup.py", line 12, in <module>
            long_description=open('README.md').read(),
        UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 5204: illegal multibyte sequence
        ----------------------------------------
    ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
    
    

    How can I fix it?

    opened by sergorl 4
  • Added MAE and MSE

    Added extra functions to find the Mean Absolute Distance (MAE) and the Mean Squared Distance (MSE) between the two curves. It works with all the distance measures in scipy.spatial.distance.cdist.

    opened by HarshRaoD 2
  • Similarity between two curves which have different number of data points

    Hello, I would like to know how to compute the similarity of two curves that have different numbers of data points, like the ones you refer to on the main page. Which methods support this? And what do the results mean? Looking forward to your reply.

    opened by xiaobrnbrn 2
  • pcm may be wrong

    A user has pointed out that I potentially have an incorrect pcm implementation because I divide the distances by a max value.

    It is possible that it is a mistake on my part, where I was trying to combine code for the curve_length and pcm methods. It is also possible that I thought xmax and ymax would always be one, so it wouldn't matter. The curve_length method needs the max values because there is no other normalization.

    The line in question: https://github.com/cjekel/similarity_measures/blob/master/similaritymeasures/similaritymeasures.py#L352

    I'm pretty sure that line is correct for the curve_length_measure method.

    bug help wanted 
    opened by cjekel 0
  • Improv perf

    Good afternoon, I hope this message finds you well, and I compliment and thank you for the code.

    I worked with frechet and dtw, and I found that the computational performance of the frechet function was subpar because it didn't use the cdist function from scipy (which I found far faster than minkowski_distance).

    I took the liberty of changing the code and proposing a pull request (I also modified the test code so that it does not import the already installed package).

    In my day-to-day job I also use Cython to improve the performance of Python programs, and I was thinking that it could benefit some of the loops in the code.

    Best of wishes for whatever!

    Nuc

    opened by nucccc 3
  • Similarity between two curves using PyTorch

    Hi guys,

    I want to implement in my trainer a measure of similarity between my predicted trajectory and the GT trajectory. Here is an example:

    [image]

    The GT is the red line, my observation is the yellow line (almost hidden by the other participants) and the green line is my prediction. The other agents are not used at this moment.

    To train my DL-based motion prediction algorithm, I am using the ADE, FDE and NLL losses with respect to the GT. However, I think that if my prediction does not exactly match the GT but follows the same centerline (driving at a different velocity, for example), it should be considered better. For example:

    [image]

    This prediction does not match the GT (until the red diamond at the bottom), but at least the shapes of both curves are more or less the same.

    How could I do that?

    opened by Cram3r95 20
  • Incorrect measurement of area between intersecting 2D curves

    It appears that either my understanding of the area between the curves or its calculation in the library is incorrect (the referenced paper is paywalled). In the following example, I have two plots, where the grey line is the original data and there are two different blue splines. Visually, you can clearly see that the area between the two lines in the left plot is several times larger than the area in the right plot, but the calculation with similaritymeasures.area_between_two_curves shows only a 2x difference. [image] [image]

    Here is the GitHub gist where I present the calculation (the GDrive zip with airfoils is public, so the whole thing can be run in Colab or elsewhere if you modify the path in the 2nd cell): https://gist.github.com/rafalszulejko/2c9ff645b448d60d857975a8f7965045#file-wing-optimization2-ipynb

    opened by rafalszulejko 11
  • Faster DTW

    Hello,

    Thanks for a really nice repo with an easy-to-use API for quickly generating metrics on curve similarity. I just wanted to let you know that there is a much faster DTW implementation than the one used in this repo; if it covers your needs, you might consider replacing the current implementation with it:

    Link to faster DTW implementation

    Carry on the great work! :)

    opened by vancromy 2
  • Add other interpolation methods to the area between curves method

    Right now the area between curves method uses bisection of largest gap to add artificial data points. This method was used to minimize the number of artificial quads/points. However, this can have some negative effects in some cases, specifically when the sampling rate is artificial and does not match (e.g. one curve is just a straight line with few points).

    A potential alternative is to use the arc length projection of one curve's points onto the other. This would preserve the sampling rate and may produce more uniform quads. This is similar to what is done in the PCM method.

    When another interpolation method is added, give users the choice of which interpolation method to use. Changing the interpolation method is anticipated to change the results.

    opened by cjekel 0
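The recursion-depth problem reported in the first issue above can be avoided by filling the discrete Frechet coupling table [4] bottom-up instead of recursively. The sketch below is a minimal pure-Python version of that idea: frechet_dist_iterative is a hypothetical name, it is not the library's frechet_dist, and it uses a plain loop rather than the queue-based approach suggested in the issue. It also shows one way to handle the 1D case from the second issue: treat each value as a one-dimensional point, following the (n_points, n_dims) layout used in the example usage.

```python
import math


def frechet_dist_iterative(exp_data, num_data):
    """Discrete Frechet distance between two non-empty ordered curves.

    The coupling table is filled row by row instead of recursively, so the
    Python call stack does not grow with the number of points. Only the
    previous row is kept, so extra memory is O(len(num_data)).
    """
    n, m = len(exp_data), len(num_data)
    prev = [0.0] * m
    for i in range(n):
        curr = [0.0] * m
        for j in range(m):
            d = math.dist(exp_data[i], num_data[j])
            if i == 0 and j == 0:
                curr[j] = d
            elif i == 0:
                # first row: can only advance along num_data
                curr[j] = max(curr[j - 1], d)
            elif j == 0:
                # first column: can only advance along exp_data
                curr[j] = max(prev[j], d)
            else:
                # best of the three predecessor couplings, then this pair
                curr[j] = max(min(prev[j], prev[j - 1], curr[j - 1]), d)
        prev = curr
    return prev[-1]


# The input from the recursion-depth issue: 1000 points per curve
max_len = 1000
a = [[1, 2, 3] for _ in range(max_len)]
b = [[1, 6, 3] for _ in range(max_len)]
print(frechet_dist_iterative(a, b))  # 4.0

# 1D values wrapped as one-dimensional points
values = [1.0, 3.0, 2.0]
other = [1.0, 2.5, 2.0, 2.0]
exp_1d = [[v] for v in values]
num_1d = [[v] for v in other]
print(frechet_dist_iterative(exp_1d, exp_1d))  # 0.0
print(frechet_dist_iterative(exp_1d, num_1d))  # 0.5
```

The running time is still O(n·m), so 1000-point curves take on the order of a second in pure Python, but no recursion limit is involved.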
Releases(0.6.0)
  • 0.6.0(Oct 8, 2022)

    • similaritymeasures.pcm now produces different values! This was done to better follow the original algorithm. To get the same results from previous versions, set norm_seg_length=True. What this option does is scale each segment length by the maximum values of the curve (borrowed from the curve_length_measure). This scaling should not be needed with the PCM method because both curves are always scaled initially.
    • Fix docstring documentation for returns in similaritymeasures.dtw and similaritymeasures.curve_length_measure
  • v0.5.0(Aug 6, 2022)

Owner
Charles Jekel