A machine learning toolkit dedicated to time-series data

Overview

tslearn

The machine learning toolkit for time series analysis in Python



  • Installation: Installing the dependencies and tslearn
  • Getting started: A quick introduction on how to use tslearn
  • Available features: An extensive overview of tslearn's functionalities
  • Documentation: A link to our API reference and a gallery of examples
  • Contributing: A guide for heroes willing to contribute
  • Citation: A citation for tslearn for scholarly articles

Installation

There are several ways to install tslearn:

  • PyPI: python -m pip install tslearn
  • Conda: conda install -c conda-forge tslearn
  • Git: python -m pip install https://github.com/tslearn-team/tslearn/archive/master.zip

In order for the installation to be successful, the required dependencies must be installed. For a more detailed guide on how to install tslearn, please see the Documentation.
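
Once installed, a quick sanity check (not part of the official instructions, just a minimal sketch) is to import the package and print its version:

>>> import tslearn
>>> print(tslearn.__version__)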

Getting started

1. Getting the data in the right format

tslearn expects a time series dataset to be formatted as a 3D numpy array. The three dimensions correspond to the number of time series, the number of measurements per time series and the number of dimensions, respectively (n_ts, max_sz, d). Utility functions such as to_time_series_dataset, shown below, bring data into this format.

It should further be noted that tslearn supports variable-length time series.

>>> from tslearn.utils import to_time_series_dataset
>>> my_first_time_series = [1, 3, 4, 2]
>>> my_second_time_series = [1, 2, 4, 2]
>>> my_third_time_series = [1, 2, 4, 2, 2]
>>> X = to_time_series_dataset([my_first_time_series,
                                my_second_time_series,
                                my_third_time_series])
>>> y = [0, 1, 1]

2. Data preprocessing and transformations

tslearn also offers several utilities to preprocess the data. To help different algorithms converge, you can scale your time series; to speed up training, you can resample them or apply a piecewise transformation.

>>> from tslearn.preprocessing import TimeSeriesScalerMinMax
>>> X_scaled = TimeSeriesScalerMinMax().fit_transform(X)
>>> print(X_scaled)
[[[0.] [0.667] [1.] [0.333] [nan]]
 [[0.] [0.333] [1.] [0.333] [nan]]
 [[0.] [0.333] [1.] [0.333] [0.333]]]
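
If resampling or a piecewise transformation is preferred instead, a minimal sketch (reusing the toy dataset X from above) could look like this; TimeSeriesResampler and PiecewiseAggregateApproximation are the relevant tslearn classes:

>>> from tslearn.preprocessing import TimeSeriesResampler
>>> from tslearn.piecewise import PiecewiseAggregateApproximation
>>> X_resampled = TimeSeriesResampler(sz=4).fit_transform(X)  # resample to a fixed length
>>> X_paa = PiecewiseAggregateApproximation(n_segments=2).fit_transform(X_resampled)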

3. Training a model

After getting the data in the right format, a model can be trained. Depending on the use case, tslearn supports different tasks: classification, clustering and regression. For an extensive overview of possibilities, check out our gallery of examples.

>>> from tslearn.neighbors import KNeighborsTimeSeriesClassifier
>>> knn = KNeighborsTimeSeriesClassifier(n_neighbors=1)
>>> knn.fit(X_scaled, y)
>>> print(knn.predict(X_scaled))
[0 1 1]
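
Clustering and regression estimators are trained in the same way; for instance, a DTW-based k-means on the toy dataset (a minimal sketch, results on such a small dataset are only illustrative):

>>> from tslearn.clustering import TimeSeriesKMeans
>>> km = TimeSeriesKMeans(n_clusters=2, metric="dtw", random_state=0)
>>> labels = km.fit_predict(X_scaled)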

As can be seen, the models in tslearn follow the same API as those of the well-known scikit-learn. Moreover, they are fully compatible with it, allowing you to use scikit-learn utilities such as hyper-parameter tuning and pipelines.
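
For example, a tslearn estimator can be placed in a scikit-learn Pipeline and tuned with GridSearchCV; the sketch below assumes a labelled training set X_train, y_train in the 3D format described earlier:

>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.pipeline import Pipeline
>>> from tslearn.neighbors import KNeighborsTimeSeriesClassifier
>>> from tslearn.preprocessing import TimeSeriesScalerMinMax
>>> pipeline = Pipeline([("scaler", TimeSeriesScalerMinMax()),
                         ("knn", KNeighborsTimeSeriesClassifier())])
>>> search = GridSearchCV(pipeline, param_grid={"knn__n_neighbors": [1, 3]}, cv=3)
>>> search.fit(X_train, y_train)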

4. More analyses

tslearn also supports further types of analysis, such as computing barycenters of a group of time series or computing distances between time series using a variety of distance metrics.
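
For example, a DTW distance between two series and a DTW barycenter of the whole dataset could be computed as follows (a minimal sketch reusing the toy dataset defined above):

>>> from tslearn.metrics import dtw
>>> from tslearn.barycenters import dtw_barycenter_averaging
>>> d = dtw(my_first_time_series, my_third_time_series)  # DTW distance between two series
>>> centroid = dtw_barycenter_averaging(X)                # DBA barycenter of the dataset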

Available features

  • data: UCR Datasets, Generators, Conversion (1, 2)
  • processing: Scaling, Piecewise
  • clustering: TimeSeriesKMeans, KShape, KernelKMeans
  • classification: KNN Classifier, TimeSeriesSVC, ShapeletModel, Early Classification
  • regression: KNN Regressor, TimeSeriesSVR, MLP
  • metrics: Dynamic Time Warping, Global Alignment Kernel, Barycenters, Matrix Profile

Documentation

The documentation is hosted on Read the Docs. It includes an API reference, a gallery of examples and a user guide.

Contributing

If you would like to contribute to tslearn, please have a look at our contribution guidelines. A list of interesting TODOs can be found here. If you want other ML methods for time series to be added to this TODO list, do not hesitate to open an issue!

Referencing tslearn

If you use tslearn in a scientific publication, we would appreciate citations:

@article{JMLR:v21:20-091,
  author  = {Romain Tavenard and Johann Faouzi and Gilles Vandewiele and 
             Felix Divo and Guillaume Androz and Chester Holtz and 
             Marie Payne and Roman Yurchak and Marc Ru{\ss}wurm and 
             Kushal Kolar and Eli Woods},
  title   = {Tslearn, A Machine Learning Toolkit for Time Series Data},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {118},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/20-091.html}
}

Acknowledgments

The authors would like to thank Mathieu Blondel for providing code for Kernel k-means and Soft-DTW.

Comments
  • [MRG] Flow to test for sklearn compatibility

    [MRG] Flow to test for sklearn compatibility

    Hello,

    This PR adds a flow that automatically tests whether all tslearn estimators comply with the required scikit-learn checks, allowing them to be used in scikit-learn utilities such as GridSearchCV, Pipeline, ... The code to do this is currently located in tslearn/testing_utils.py, but should be moved to tslearn/testing when available.

    I also included an example demonstrating how GlobalGAKMeans can now be used with an sklearn pipeline, in tslearn/docs/examples/plot_gakkmeans_sklearn.

    All feedback is more than welcome!

    Kind regards, Gilles

    opened by GillesVandewiele 162
  • [WIP] Save models to hdf5 and other formats

    [WIP] Save models to hdf5 and other formats

    Hi,

    I thought it would be useful to save the KShape model without pickling. I implemented a simple to_hdf5() method for saving a KShape model to an hdf5 file and from_hdf5() for reloading it so that predictions can be done with the model.

    Changes to the KShape class:

    • the class attribute "model_attrs" is a list of attributes that are sufficient to describe the model.
    • to_dict() method packages the model attributes and params to a dict.
    • to_hdf5() and from_hdf5() can be used to save/load the model to/from hdf5 files.
    • put instance attributes in constructor

    An hdftools module is added to handle saving a dict of numpy arrays to an hdf file.

    Usage:

    ks.to_hdf5('/path/to/file.h5')
    model = KShape.from_hdf5('/path/to/file.h5')
    
    opened by kushalkolar 37
  • [MRG] Adding SAX+MINDIST to KNN

    [MRG] Adding SAX+MINDIST to KNN

    This PR contains the following changes:

    • 'sax' is now a valid metric for KNN:
    knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric='sax')
    
    • Added BaseEstimator to classes in preprocessing module so that they can be used within a Pipeline (errors were raised when using TimeSeriesScalerMeanVariance)

    • Fixed a bug in kneighbors method which would always return [0] as nearest neighbor for every sample.

    knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric='dtw')
    knn.fit(X_train, y_train)
    _, ind = knn.kneighbors(X_test)
    # ind would be filled with 0's
    
    • Slightly changed the code of kneighbors so that its result is consistent with sklearn. There was a small difference in breaking ties (tslearn would pick the largest index while sklearn would pick the smallest index). Now the following code is equivalent:
    knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric='dtw')
    knn.fit(X_train, y_train)
    _, ind = knn.kneighbors(X_test)
    
    knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric='precomputed')
    all_X = numpy.vstack((X_train, X_test))
    distances = pairwise_distances(all_X, metric=dtw)
    X_train = distances[:len(X_train), :len(X_train)]
    X_test = distances[len(X_train):, :len(X_train)]
    knn.fit(X_train, y_train)
    _, ind = knn.kneighbors(X_test)
    
    # both ind vectors are now equal (while that was not the case before this PR)
    

    Some remarks:

    • I am inexperienced with numba; adding an njit decorator to cdist_sax did not work immediately, so I could perhaps use some help with that.
    opened by GillesVandewiele 37
  • [MRG] Shapelet Support Tensorflow 2

    [MRG] Shapelet Support Tensorflow 2

    Made a few changes to support Tensorflow 2 and remove Keras as a separate dependency. I'm just testing out tslearn and am not sure if these changes are wanted. No offense will be taken if these don't get included. :)

    Have a great day, I'm excited to see what tslearn has to offer.

    opened by page1 34
  • Replace implicit imports with explicit imports

    Replace implicit imports with explicit imports

    Fixes #134

    As the title says, the implicit imports are replaced with explicit imports in test_estimators.py. It was a bit hard to find some of them in scikit-learn. Let's see if it improves code coverage.

    opened by johannfaouzi 27
  • [MRG] Accept variable-length time series for some pairs metrics/estimators

    [MRG] Accept variable-length time series for some pairs metrics/estimators

    This is an attempt to make it possible to use estimators with metrics like DTW on variable-length time series.

    The first attempt here is to make DTW/soft-DTW usable for kNN estimators on variable-length time series.

    The test I ran is:

    from tslearn.neighbors import KNeighborsTimeSeriesClassifier
    from tslearn.utils import to_time_series_dataset
    
    
    X = to_time_series_dataset([[1, 2, 3, 4], [1, 2, 3], [2, 5, 6, 7, 8, 9]])
    y = [0, 0, 1]
    
    clf = KNeighborsTimeSeriesClassifier(metric="dtw",
                                         n_neighbors=1,
                                         metric_params={"global_constraint": "sakoe_chiba"})
    clf.fit(X, y)
    print("---", clf._ts_fit)
    print(clf.predict(X))
    

    First, we have to think about whether the hack I introduced is a good way to reach our goal and second, once we have chosen a way to proceed, we will have to:

    • do the same for other estimators (all those that accept dtw, soft-dtw, gak as metrics, ideally)
    • find a way to hack sklearn k-fold variants, since there are some checks for all-finite entries in the datasets there which fail for variable-length time series, if I remember correctly

    @GillesVandewiele since you recently worked on making the estimators sklearn-compatible, could you review this PR?

    opened by rtavenar 27
  • Make binary wheels for all platforms

    Make binary wheels for all platforms

    Making binary wheels and uploading them to PyPI would allow pip installing tslearn without needing a compiler or Cython.

    Usually this requires quite a bit of work, see e.g. https://github.com/MacPython/scikit-learn-wheels/. However there is a shortcut with https://github.com/regro/conda-press that might allow generating wheels from conda-forge builds. I have not used it yet personally, but it could be worth a try.

    opened by rth 23
  • kNN using SAX+MINDIST

    kNN using SAX+MINDIST

    When using this class, what are the available values for the "metric" parameter? Only "dtw"? Any recommendation if I wanted to use the Euclidean or, for example, the SAX distance when applying this classifier to a dataset with a SAX representation?

    new feature 
    opened by ManuelMonteiro24 22
  • [WIP] Fix sklearn import deprecation warnings

    [WIP] Fix sklearn import deprecation warnings

    This PR fixes the deprecation warnings that are raised when importing certain (now private) modules from sklearn.

    Private API

    Many things will move to a private API in the new sklearn version. Their module names will change to have a leading underscore, e.g. sklearn.neighbors.base becomes sklearn.neighbors._base. Unfortunately, these new module names will cause a crash in environments with older sklearn versions.

    The proposed fix is the following for all deprecation warnings:

    try:
        from sklearn.neighbors._base import KNeighborsMixin
    except ImportError:
        from sklearn.neighbors.base import KNeighborsMixin
    
    opened by GillesVandewiele 20
  • Add initial guess as centroid

    Add initial guess as centroid

    Following issue #58, here is a proposal to improve clustering (only for the KShape method for now) by letting the user provide an initial guess for the centroids. This guess is a numpy array of ints containing the indices of the samples to be used as centroids instead of a random vector.
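
    A rough sketch of how the proposed interface might be used (the indices-based init argument is the proposal described above, not the current tslearn API; the index values are made up):

    import numpy
    from tslearn.clustering import KShape

    # Hypothetical usage per the proposal: the indices select which samples of X
    # serve as the initial centroids instead of a random initialization.
    initial_idx = numpy.array([0, 7, 42])
    ks = KShape(n_clusters=3, init=initial_idx)
    ks.fit(X)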

    opened by gandroz 19
  • Scalable matrix profile

    Scalable matrix profile

    Is your feature request related to a problem? Please describe. tslearn has a matrix profile module that relies on a naive implementation. Based on a discussion with @seanlaw in #126, we could consider having STUMPY as an optional dependency for this matrix profile module in order to benefit from its scalable implementations.

    Describe the solution you'd like. That would require improving the existing MatrixProfile class by allowing the user to pick an implementation (using parameters passed at __init__ time), and the _transform(...) method should call the corresponding function.

    One additional thing to check is how stumpy deals with:

    • [x] variable-length time series
    • [ ] multidimensional time series

    I will probably not have time to work on it. If anyone is interested in giving a hand on this, feel free to say so.
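
    A rough sketch of the backend dispatch described above (the implementation parameter and the class name are hypothetical; stumpy.stump is STUMPY's matrix-profile routine and returns the profile in the first column of its output):

    import stumpy

    class MatrixProfileSketch:
        # Hypothetical transformer that picks a matrix-profile backend at __init__ time.
        def __init__(self, subsequence_length, implementation="stump"):
            self.subsequence_length = subsequence_length
            self.implementation = implementation

        def _transform(self, ts):
            if self.implementation == "stump":
                # Route to STUMPY's scalable implementation; column 0 of the
                # result holds the matrix profile values.
                result = stumpy.stump(ts, m=self.subsequence_length)
                return result[:, 0].astype(float)
            raise NotImplementedError("the naive backend is omitted from this sketch")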

    new feature good first issue 
    opened by rtavenar 18
  • Memory issue for larger data

    Memory issue for larger data

    My dataset contains ~500,000 rows and the clustering algorithm is having trouble because that much memory cannot be allocated (even when I reduce it to 100,000 rows). Is there a way I can change the datatype to float32 or float16 so as to reduce the memory required?

    new feature 
    opened by AtharvanDogra 0
  • [WIP] Add PyTorch backend for soft-DTW

    [WIP] Add PyTorch backend for soft-DTW

    This PR aims to make the files soft_dtw_fast.py and softdtw_variants.py compatible with the PyTorch backend.

    We will take inspiration from the following GitHub repository: https://github.com/Sleepwalking/pytorch-softdtw/blob/master/soft_dtw.py

    An introduction to Dynamic Time Warping can be found at: https://rtavenar.github.io/blog/dtw.html

    An introduction about the differentiability of DTW and the case of soft-DTW can be found at: https://rtavenar.github.io/blog/softdtw.html

    opened by YannCabanes 13
  • Continuous integration failing test on Linux for test check_pipeline_consistency of class LearningShapelets

    Continuous integration failing test on Linux for test check_pipeline_consistency of class LearningShapelets

    This bug was first noticed in the continuous integration tests of PR #411 (which is now merged), but it seems unrelated to that PR. The continuous integration tests fail on Linux but pass on Windows and MacOS. I use Linux and Python 3.8, and the tests pass on my local computer. The failing test concerns the class tslearn.shapelets.shapelets.LearningShapelets and is reached through: test_all_estimators (tslearn/tests/test_estimators.py) --> check_estimator (tslearn/tests/test_estimators.py) --> check_pipeline_consistency (tslearn/tests/sklearn_patches.py).

    bug 
    opened by YannCabanes 1
  • Application of shapelet discovery and shapelet transform on datasets without label

    Application of shapelet discovery and shapelet transform on datasets without label

    Hello, I have a dataset like the one below, where Q0 is the feature value and TS is the timestamp, and I would like to apply shapelet discovery and shapelet transform to this csv file. I have one huge time series which I have sliced into a number of parts (data snippets), and every snippet is similar to the sample below. What I would like to do is shapelet discovery first and then shapelet transform, in order to detect anomalies in the time series data.

    Q0                     TS
    0.012364804744720459,  2018-03-02 00:44:51.303082
    0.012344598770141602,  2018-03-02 00:44:51.375207
    0.012604951858520508,  2018-03-02 00:44:51.475198
    0.012307226657867432,  2018-03-02 00:44:51.575189
    0.012397348880767822,  2018-03-02 00:44:51.675180
    0.013141036033630371,  2018-03-02 00:44:51.775171
    0.012811839580535889,  2018-03-02 00:44:51.875162
    0.012950420379638672,  2018-03-02 00:44:51.975153
    0.013257980346679688,  2018-03-02 00:44:52.075144

    new feature 
    opened by adityabhandwalkar 0
  • Implement TimeSeriesBisectingKMeans

    Implement TimeSeriesBisectingKMeans

    Is your feature request related to a problem? Please describe. Classical hierarchical clustering approaches requiring a distance matrix are far too resource-intensive for large numbers of samples. scikit-learn has therefore introduced BisectingKMeans; however, it does not let me use the DTW distance metric.

    Describe the solution you'd like. It would be really great to more or less copy the scikit-learn implementation, following the TimeSeriesKMeans API. PS: if you are really nice, you could also add an easier way to access the hierarchy, as scikit-learn does.

    new feature 
    opened by adagrad 2