tsflex is a toolkit for flexible time series processing & feature extraction that is efficient and makes few assumptions about sequence data.
Installation
| package manager | command |
|---|---|
| pip | pip install tsflex | 
| conda | conda install -c conda-forge tsflex | 
Usage
tsflex is built to be intuitive, so we encourage you to copy-paste this code and toy with some parameters!
Feature extraction
import pandas as pd
import numpy as np
import scipy.stats as ss
from tsflex.features import MultipleFeatureDescriptors, FeatureCollection
from tsflex.utils.data import load_empatica_data
# 1. Load sequence-indexed data (in this case a time-index)
df_tmp, df_acc, df_ibi = load_empatica_data(['tmp', 'acc', 'ibi'])
# 2. Construct your feature extraction configuration
fc = FeatureCollection(
    MultipleFeatureDescriptors(
        functions=[np.min, np.mean, np.std, ss.skew, ss.kurtosis],
        series_names=["TMP", "ACC_x", "ACC_y", "IBI"],
        windows=["15min", "30min"],
        strides="15min",
    )
)
# 3. Extract features
fc.calculate(data=[df_tmp, df_acc, df_ibi], approve_sparsity=True)
Note that the feature extraction is performed on multivariate data with varying sample rates.
| signal | columns | sample rate | 
|---|---|---|
| df_tmp | ["TMP"] | 4Hz | 
| df_acc | ["ACC_x", "ACC_y", "ACC_z"] | 32Hz |
| df_ibi | ["IBI"] | irregularly sampled | 
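To make the windowing concrete, here is a minimal sketch in plain pandas (not tsflex itself) of what a 15-minute window with a 15-minute stride means for signals with different sample rates. The synthetic series and the `window_mean` helper are hypothetical and exist only for illustration. tsflex's own strided-rolling implementation is more general, for example it also supports overlapping windows.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp("2021-01-01 00:00:00")
rng = np.random.default_rng(0)

# Regularly sampled 4 Hz signal covering one hour (synthetic stand-in for "TMP")
tmp_index = pd.date_range(start, periods=4 * 3600, freq="250ms")
tmp = pd.Series(rng.normal(30.0, 0.5, len(tmp_index)), index=tmp_index, name="TMP")

# Irregularly sampled signal over the same hour (synthetic stand-in for "IBI")
offsets = np.sort(rng.uniform(0, 3600, size=500))
ibi_index = start + pd.to_timedelta(offsets, unit="s")
ibi = pd.Series(rng.normal(0.8, 0.1, len(ibi_index)), index=ibi_index, name="IBI")


def window_mean(series: pd.Series, window: str = "15min") -> pd.Series:
    """Mean per non-overlapping time window (stride equal to the window size)."""
    return series.resample(window).mean()


# Time-based windows give both signals the same output index,
# regardless of how densely each one was sampled
features = pd.concat([window_mean(tmp), window_mean(ibi)], axis=1)
print(features)
```

Each row of `features` is one 15-minute window, and both columns share the same time index even though the inputs were sampled differently.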
Processing
Why tsflex? ✨
- Flexible:
  - handles multivariate/multimodal time series
  - versatile function support => integrates with many packages for:
    - processing (e.g., scipy.signal, statsmodels.tsa)
    - feature extraction (e.g., numpy, scipy.stats, seglearn¹, tsfresh¹, tsfel¹)
  - feature extraction handles multiple strides & window sizes
- Efficient:
  - view-based operations for processing & feature extraction => extremely low memory peak & fast execution time
- Intuitive:
  - maintains the sequence-index of the data
  - feature extraction constructs interpretable output column names
  - intuitive API
- Few assumptions about the sequence data:
  - no assumptions about sampling rate
  - able to deal with multivariate asynchronous data, i.e. data with small time-offsets between the modalities
- Advanced functionalities:
  - apply FeatureCollection.reduce after feature selection for faster inference
  - use function execution time logging to discover processing and feature extraction bottlenecks
  - embedded SeriesPipeline & FeatureCollection serialization
  - time series chunking

¹ These integrations are shown in integration-example notebooks.
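The "view-based operations" point can be illustrated with NumPy alone. The sketch below is not tsflex's internal code. It assumes a regularly sampled array and windows expressed in sample counts, and only shows how strided windows can be built as views that share memory with the original data instead of copying it. All variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

signal = np.arange(20, dtype=np.float64)
window, stride = 8, 4  # both expressed in number of samples

# All length-8 windows as a read-only view, then keep every 4th window
windows = sliding_window_view(signal, window)[::stride]

# The windows reuse the memory of `signal`; no window data was copied
print(np.shares_memory(windows, signal))

# A feature function is then applied along the window axis
means = windows.mean(axis=1)
print(windows.shape, means)
```

Because each window is only a view, memory use stays close to the size of the input even when windows overlap heavily.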
Future work 🔨
- scikit-learn integration for both processing and feature extraction
  note: this is actively being developed on the sklearn integration branch.
- Support time series segmentation (exposing the under-the-hood strided-rolling functionality) - see this issue
- Support for multi-indexed dataframes
=> Also see the enhancement issues
Contributing 👪
We are thrilled to see your contributions to further enhance tsflex.
See this guide for more instructions on how to contribute.
Referencing our package
If you use tsflex in a scientific publication, we would highly appreciate it if you cite us as:
@article{vanderdonckt2021tsflex,
    author = {Van Der Donckt, Jonas and Van Der Donckt, Jeroen and Deprost, Emiel and Van Hoecke, Sofie},
    title = {tsflex: flexible time series processing \& feature extraction},
    journal = {SoftwareX},
    year = {2021},
    url = {https://github.com/predict-idlab/tsflex},
    publisher={Elsevier}
}
Link to the preprint paper: https://arxiv.org/abs/2111.12429
 


