A Python 3 library for time series data mining tasks using matrix profile algorithms

Last update: Dec 29, 2022

Overview

MatrixProfile

MatrixProfile is a Python 3 library, brought to you by the Matrix Profile Foundation, for mining time series data. The Matrix Profile is a novel data structure with corresponding algorithms (stomp, regimes, motifs, etc.) developed by the Keogh and Mueen research groups at UC-Riverside and the University of New Mexico. The goal of this library is to make these algorithms accessible to both novices and experts through standardization of core concepts, a simple API, and sensible default parameter values.

In addition to this Python library, the Matrix Profile Foundation provides implementations in other languages. These implementations share a consistent API, so you can switch between them without a steep learning curve.

tsmp - an R implementation
go-matrixprofile - a Golang implementation
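
To make the Matrix Profile idea from the overview concrete, here is a minimal brute-force sketch in plain Python. It is not this library's implementation (the library uses much faster algorithms such as stomp); it only illustrates the common definition: for every subsequence of length `window`, record the z-normalized Euclidean distance to its nearest non-overlapping neighbor and that neighbor's position. The function names and the exclusion zone of `window // 2` are assumptions made for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def naive_matrix_profile(ts, window):
    """Brute-force O(n^2 * window) matrix profile (illustrative only).

    Returns (profile, profile_index): profile[i] is the distance from
    subsequence i to its nearest neighbor, profile_index[i] is where
    that neighbor starts.
    """
    n_sub = len(ts) - window + 1
    if window < 2 or n_sub < 2:
        raise ValueError("need window >= 2 and at least two subsequences")

    subs = [znorm(ts[i:i + window]) for i in range(n_sub)]
    exclusion = window // 2  # assumed zone that skips trivial self-matches

    profile = [math.inf] * n_sub
    profile_index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= exclusion:
                continue  # too close to i: would be a trivial match
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i] = d
                profile_index[i] = j
    return profile, profile_index


if __name__ == "__main__":
    # The pattern 0,1,2,1,0 appears at positions 0 and 10.
    ts = [0, 1, 2, 1, 0, 5, 3, 8, 2, 7, 0, 1, 2, 1, 0, 9, 4, 6, 1, 3]
    profile, profile_index = naive_matrix_profile(ts, window=5)
    for i, (d, j) in enumerate(zip(profile, profile_index)):
        print(i, round(d, 3), j)
```

Low values in the profile mark subsequences that repeat elsewhere in the series, and high values mark subsequences unlike anything else. This quadratic loop is only practical for short series.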

Python Support

Currently, we support the following versions of Python:

Python 2 is no longer supported. There are earlier versions of this library that support Python 2.

Installation

The easiest way to install this library is using pip or conda. If you would like to install it from source, please review the installation documentation for your platform.

Installation with pip

pip install matrixprofile

Installation with conda

conda config --add channels conda-forge
conda install matrixprofile
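
After installing, one way to confirm that the package is visible to your interpreter is to ask for its installed version. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of this library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("matrixprofile")
    if version is None:
        print("matrixprofile is not installed in this environment")
    else:
        print("matrixprofile version:", version)
```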

Getting Started

This article provides introductory material on the Matrix Profile: Introduction to Matrix Profiles

This article provides details about core concepts introduced in this library: How To Painlessly Analyze Your Time Series

Our documentation provides a quick start guide, examples, and API documentation. It is the source of truth for getting up and running.

Algorithms

For details about the algorithms implemented, including performance characteristics, please refer to the documentation.
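
As a rough illustration of what motif and discord discovery mean, the sketch below works on an already computed profile, represented here as a plain list of nearest-neighbor distances plus a list of neighbor positions. It is a simplified stand-in written for this README, not the library's own routine: the function names, the hard-coded example values, and the way the exclusion zone is applied are all assumptions. The top motif is taken as the position with the smallest profile value, and discords are the positions with the largest values, skipping neighbors inside the exclusion zone.

```python
import math


def top_motif(profile, profile_index):
    """Return (position, nearest_neighbor) of the smallest profile value."""
    if not profile:
        raise ValueError("profile is empty")
    i = min(range(len(profile)), key=lambda idx: profile[idx])
    return i, profile_index[i]


def top_discords(profile, k=1, exclusion_zone=0):
    """Return up to k positions with the largest profile values.

    After each pick, positions within exclusion_zone of it are masked
    so the same anomaly is not reported more than once.
    """
    remaining = list(profile)
    found = []
    if not remaining:
        return found
    for _ in range(k):
        best = max(range(len(remaining)), key=lambda idx: remaining[idx])
        if remaining[best] == -math.inf:
            break  # everything left is masked
        found.append(best)
        lo = max(0, best - exclusion_zone)
        hi = min(len(remaining), best + exclusion_zone + 1)
        for i in range(lo, hi):
            remaining[i] = -math.inf
    return found


if __name__ == "__main__":
    # Hypothetical profile values, made up for this example.
    profile = [0.5, 0.4, 3.0, 2.9, 0.2, 0.3, 2.0, 0.2]
    profile_index = [4, 5, 6, 7, 7, 1, 2, 4]
    print("motif pair:", top_motif(profile, profile_index))
    print("discords:", top_discords(profile, k=2, exclusion_zone=1))
```

Without the exclusion zone, the second discord would be position 3, which overlaps the first one and describes the same anomaly.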

Getting Help

We provide a dedicated Discord channel where practitioners can discuss applications and ask questions about the Matrix Profile Foundation libraries. If you would rather not join Discord, please open a GitHub issue.

Contributing

Please review the contributing guidelines located in our documentation.

Code of Conduct

Please review our Code of Conduct documentation.

Citations

All proper acknowledgements for works of others may be found in our citation documentation.

Citing

Please cite this work using the Journal of Open Source Software article.

Van Benschoten et al., (2020). MPA: a novel cross-language API for time series analysis. Journal of Open Source Software, 5(49), 2179, https://doi.org/10.21105/joss.02179

@article{VanBenschoten2020,
    doi = {10.21105/joss.02179},
    url = {https://doi.org/10.21105/joss.02179},
    year = {2020},
    publisher = {The Open Journal},
    volume = {5},
    number = {49},
    pages = {2179},
    author = {Andrew Van Benschoten and Austin Ouyang and Francisco Bischoff and Tyler Marrs},
    title = {MPA: a novel cross-language API for time series analysis},
    journal = {Journal of Open Source Software}
}
