A library to generate synthetic time series data by easy-to-use factors and generator

Overview

timeseries-generator

This repository consists of a python packages that generates synthetic time series dataset in a generic way (under /timeseries_generator) and demo notebooks on how to generate synthetic timeseries data (under /examples). The goal here is to have non-sensitive data available to demo solutions and test the effectiveness of those solutions and/or algorithms. In order to test your algorithm, you want to have time series available containing different kinds of trends. The python package should help create different kinds of time series while still being maintainable.

timeseries_generator package

For this package, it is assumed that a time series is composed of a base value multiplied by many factors.

ts = base_value * factor1 * factor2 * ... * factorN + Noiser

Diagram

These factors can be anything, random noise, linear trends, to seasonality. The factors can affect different features. For example, some features in your time series may have a seasonal component, while others do not.

Different factors are represented in different classes, which inherit from the BaseFactor class. Factor classes are input for the Generator class, which creates a dataframe containing the features, base value, all the different factors working on the base value and and the final factor and value.

Core concept

  • Generator: a python class to generate the time series. A generator contains a list of factors and noiser. By overlaying the factors and noiser, generator can produce a customized time series
  • Factor: a python class to generate the trend, seasonality, holiday factors, etc. Factors take effect by multiplying on the base value of the generator.
  • Noised: a python class to generate time series noise data. Noiser take effect by summing on top of "factorized" time series. This formula describes the concepts we talk above

Built-in Factors

  • LinearTrend: give a linear trend based on the input slope and intercept
  • CountryYearlyTrend: give a yearly-based market cap factor based on the GDP per - capita.
  • EUEcoTrendComponents: give a monthly changed factor based on EU industry product public data
  • HolidayTrendComponents: simulate the holiday sale peak. It adapts the holiday days - differently in different country
  • BlackFridaySaleComponents: simulate the BlackFriday sale event
  • WeekendTrendComponents: more sales at weekends than on weekdays
  • FeatureRandFactorComponents: set up different sale amount for different stores and different product
  • ProductSeasonTrendComponents: simulate season-sensitive product sales. In this example code, we have 3 different types of product:
    • winter jacket: inverse-proportional to the temperature, more sales in winter
    • basketball top: proportional to the temperature, more sales in summer
    • Yoga Mat: temperature insensitive

Installation

pip install timeseries-generator

Usage

from timeseries_generator import LinearTrend, Generator, WhiteNoise, RandomFeatureFactor
import pandas as pd

# setting up a linear tren
lt = LinearTrend(coef=2.0, offset=1., col_name="my_linear_trend")
g = Generator(factors={lt}, features=None, date_range=pd.date_range(start="01-01-2020", end="01-20-2020"))
g.generate()
g.plot()

# update by adding some white noise to the generator
wn = WhiteNoise(stdev_factor=0.05)
g.update_factor(wn)
g.generate()
g.plot()

Example Notebooks

We currently have 2 example notebooks available:

  1. generate_stationary_process: Good for introducing the basics of the timeseries_generator. Shows how to apply simple linear trends and how to introduce features and labels, as well as random noise.
  2. use_external_factors: Goes more into detail and shows how to use the external_factors submodule. Shows how to create seasonal trends.

Web based prototyping UI

We also use Streamlit to build a web-based UI to demonstrate how to use this package to generate synthesis time series data in an interactive web UI.

streamlit run examples/streamlit/app.py

Web UI

License

This package is released under the Apache License, Version 2.0

You might also like...
A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

Visualize classified time series data with interactive Sankey plots in Google Earth Engine
Visualize classified time series data with interactive Sankey plots in Google Earth Engine

sankee Visualize changes in classified time series data with interactive Sankey plots in Google Earth Engine Contents Description Installation Using P

PyPOTS - A Python Toolbox for Data Mining on Partially-Observed Time Series

A python toolbox/library for data mining on partially-observed time series, supporting tasks of forecasting/imputation/classification/clustering on incomplete multivariate time series with missing values.

A collection of Scikit-Learn compatible time series transformers and tools.
A collection of Scikit-Learn compatible time series transformers and tools.

tsfeast A collection of Scikit-Learn compatible time series transformers and tools. Installation Create a virtual environment and install: From PyPi p

Automatic extraction of relevant features from time series:
Automatic extraction of relevant features from time series:

tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis

A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

Probabilistic time series modeling in Python
Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

Comments
  • Time series data augmentation

    Time series data augmentation

    There is a code example that gives to increase the amount of series data by adding slightly modified copies of already existing time series data or newly created synthetic series data from existing data?

    opened by YAYAYru 0
  • KeyError: 'country'

    KeyError: 'country'

    From the following code,

    from timeseries_generator import HolidayFactor, LinearTrend, Generator
    
    lt = LinearTrend(coef=2.0, offset=1., col_name="my_linear_trend")
    
    g: Generator = Generator(factors={lt}, features=None, date_range=pd.date_range(start="01-01-2020", end="01-01-2021"))
    
    holiday_factor = HolidayFactor(
        country_feature_name="country",
    )
    g.add_factor(holiday_factor)
    g.generate()
    

    I get the error. I am not sure this is expected behavior.

    File /usr/local/Caskroom/miniconda/base/envs/tf/lib/python3.9/site-packages/pandas/core/frame.py:10083, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
    ...
    -> 1849     raise KeyError(key)
       1851 # Check for duplicates
       1852 if values.ndim > 1:
    
    KeyError: 'country'
    
    opened by twobitunicorn 0
  • [Feature request] Customizable feature combinations

    [Feature request] Customizable feature combinations

    Hi team, Thanks for the useful library! I wonder if you'd be open to this idea:

    I would like to be able to:

    • Set up categorizing features (let's say, for illustration, CATEGORY=[footwear, t-shirts, socks], SIZE=[S, M, L, US-Mens-8, US-Womens-6) and define Factors on them
    • Generate time-series with more restricted feature combinations than the outer product (again for illustration, "t-shirt sizes for t-shirts, shoe sizes for footwear")

    Today, it seems like Generator.generate() hard-codes the assumption that time-series should be generated for the product of all provided feature values.

    It'd be helpful if, instead, we could have the option of customizing this join to limit down generated combinations?

    Some options I can think of:

    1. Leave the library as-is: Users generate full outer product and limit down what they want in post-processing
      • This seems possible already, but very RAM-intensive if your desired combinations are sparse?
    2. Accept an optional dataframe of factor combinations as parameter to the generate() method
      • Gives full flexibility over which combinations are kept / ignored, without assuming any particular rigid hierarchies between features
      • ...But might need to do a bit of validation to protect against user errors? May not be super easy to use without some documented examples / functions to generate the dataframe
    3. Some more complex API for feature configuration that accommodates specifying valid/invalid feature combinations
      • Might be nicer for usability, but difficult to make general: E.g. a straightforward hierarchy could be represented as a nested dict, but in practice many applications have multiple intersecting views of product category information e.g. brand, type, target segment, etc.
    opened by athewsey 1
  • Generate hourly data

    Generate hourly data

    First of all, thank you for making this repository public! I enjoy its ease of use and the built-in factors.

    Problem description

    I'm currently trying to generate revenue data for a bar/restaurant on an hourly basis. As far as I can see, the timeseries-generator only supports generating one data point per day, not per hour.

    I tried to generate hourly data like g = Generator(factors={lt}, features=None, date_range=pd.date_range(start='15/9/2021', end='30/9/2021', freq='h')) which didn't work.

    Potential solution

    Add the possibility to generate hourly data too. If this is a promising idea in your opinion, I'm willing to contribute to the implementation.

    Thank you in advance!

    opened by nileger 1
Releases(v0.1.0)
  • v0.1.0(Jul 20, 2021)

    • first release of time series generators, including:
      • base factor
      • linear trend factor
      • sinusoidal factor
      • white noise factor
      • random factor
      • holiday factor
      • weekday factor
      • country GDP factor
      • EU industry index factor
    • Examples
      • notebooks which includes some simple examples
      • streamlit dashboard
    Source code(tar.gz)
    Source code(zip)
Owner
Nike Inc.
Nike Inc.
Python Machine Learning Jupyter Notebooks (ML website)

Python Machine Learning Jupyter Notebooks (ML website) Dr. Tirthajyoti Sarkar, Fremont, California (Please feel free to connect on LinkedIn here) Also

Tirthajyoti Sarkar 2.6k Jan 03, 2023
AtsPy: Automated Time Series Models in Python (by @firmai)

Automated Time Series Models in Python (AtsPy) SSRN Report Easily develop state of the art time series models to forecast univariate data series. Simp

Derek Snow 465 Jan 02, 2023
🎛 Distributed machine learning made simple.

🎛 lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight

Machine Learning Tooling 44 Nov 27, 2022
GAM timeseries modeling with auto-changepoint detection. Inspired by Facebook Prophet and implemented in PyMC3

pm-prophet Pymc3-based universal time series prediction and decomposition library (inspired by Facebook Prophet). However, while Faceook prophet is a

Luca Giacomel 314 Dec 25, 2022
Tools for mathematical optimization region

Tools for mathematical optimization region

林景 15 Nov 30, 2022
A single Python file with some tools for visualizing machine learning in the terminal.

Machine Learning Visualization Tools A single Python file with some tools for visualizing machine learning in the terminal. This demo is composed of t

Bram Wasti 35 Dec 29, 2022
100 Days of Machine and Deep Learning Code

💯 Days of Machine Learning and Deep Learning Code MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Cluste

Tanishq Gautam 66 Nov 02, 2022
Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc). Structured a custom ensemble model and a neural network. Found a outperformed

Chris Yuan 1 Feb 06, 2022
NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

SUN Group @ UMN 28 Aug 03, 2022
Client - 🔥 A tool for visualizing and tracking your machine learning experiments

Weights and Biases Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to produ

Weights & Biases 5.2k Jan 03, 2023
onelearn: Online learning in Python

onelearn: Online learning in Python Documentation | Reproduce experiments | onelearn stands for ONE-shot LEARNning. It is a small python package for o

15 Nov 06, 2022
Kaggle Competition using 15 numerical predictors to predict a continuous outcome.

Kaggle-Comp.-Data-Mining Kaggle Competition using 15 numerical predictors to predict a continuous outcome as part of a final project for a stats data

moisey alaev 1 Dec 28, 2021
Send rockets to Mars with artificial intelligence(Genetic algorithm) in python.

Send Rockets To Mars With AI Send rockets to Mars with artificial intelligence(Genetic algorithm) in python. Tools Python 3 EasyDraw How to Play Insta

Mohammad Dori 3 Jul 15, 2022
The code from the Machine Learning Bookcamp book and a free course based on the book

The code from the Machine Learning Bookcamp book and a free course based on the book

Alexey Grigorev 5.5k Jan 09, 2023
An easier way to build neural search on the cloud

Jina is geared towards building search systems for any kind of data, including text, images, audio, video and many more. With the modular design & multi-layer abstraction, you can leverage the effici

Jina AI 17k Jan 01, 2023
In this Repo a simple Sklearn Model will be trained and pushed to MLFlow

SKlearn_to_MLFLow In this Repo a simple Sklearn Model will be trained and pushed to MLFlow Install This Repo is based on poetry python3 -m venv .venv

1 Dec 13, 2021
#30DaysOfStreamlit is a 30-day social challenge for you to build and deploy Streamlit apps.

30 Days Of Streamlit 🎈 This is the official repo of #30DaysOfStreamlit — a 30-day social challenge for you to learn, build and deploy Streamlit apps.

Streamlit 53 Jan 02, 2023
Simple linear model implementations from scratch.

Hand Crafted Models Simple linear model implementations from scratch. Table of contents Overview Project Structure Getting started Citing this project

Jonathan Sadighian 2 Sep 13, 2021
PyHarmonize: Adding harmony lines to recorded melodies in Python

PyHarmonize: Adding harmony lines to recorded melodies in Python About To use this module, the user provides a wav file containing a melody, the key i

Julian Kappler 2 May 20, 2022
Stats, linear algebra and einops for xarray

xarray-einstats Stats, linear algebra and einops for xarray ⚠️ Caution: This project is still in a very early development stage Installation To instal

ArviZ 30 Dec 28, 2022