Greedy Gaussian Segmentation

Related tags

Deep LearningGGS
Overview

GGS

Greedy Gaussian Segmentation (GGS) is a Python solver for efficiently segmenting multivariate time series data. For implementation details, please see our paper at http://stanford.edu/~boyd/papers/ggs.html.


The GGS Solver takes an n-by-T data matrix and breaks the T timestamps on an n-dimensional vector into segments over which the data is well explained as independent samples from a multivariate Gaussian distribution. It does so by formulating a covariance-regularized maximum likelihood problem and solving it using a greedy heuristic, with full details described in the paper.

Download & Setup

  1. Download the source code in the terminal by running:
git clone [email protected]:davidhallac/GGS.git
  1. Confirm that the code was properly downloaded by running:
cd GGS
python helloworld.py
  1. To write your own Python function that uses ggs, simply make sure that ggs.py is in the same directory as your new file, and then add the following code to the beginning of your script:
from ggs import *

Supported Functions

The GGS package has three main functions:

bps, objectives = GGS(data, Kmax, lamb)

Finds K breakpoints in the data for a given regularization parameter lambda

Inputs

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

Kmax - the number of breakpoints to find

lamb - regularization parameter for the regularized covariance

Returns

bps - List of lists, where element i of the larger list is the set of breakpoints found at K = i in the GGS algorithm

objectives - List of the objective values at each intermediate step (for K = 0 to Kmax)


meancovs = GGSMeanCov(data, breakpoints, lamb)

Finds the means and regularized covariances of each segment, given a set of breakpoints.

Inputs

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

breakpoints - a list of breakpoint locations

lamb - regularization parameter for the regularized covariance

Returns

meancovs - a list of (mean, covariance) tuples for each segment in the data


cvResults = GGSCrossVal(data, Kmax=25, lambList = [0.1, 1, 10])

Runs 10-fold cross validation, and returns the train and test set likelihood for every (K, lambda) pair up to Kmax

Inputs

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

Kmax - the maximum number of breakpoints to run GGS on

lambList - a list of regularization parameters to test

Returns

cvResults - list of (lamb, ([TrainLL],[TestLL])) tuples for each regularization parameter in lambList. Here, TrainLL and TestLL are the average per-sample log-likelihood across the 10 folds of cross-validation for all K's from 0 to Kmax


Additional optional parameters (for all three functions above):

features = [] - select a certain subset of columns in the data to operate on

verbose = False - Print intermediate steps when running the algorithm

Example Usage

Running financeExample.py will yield the following plot, showing the objective (Equation 4 in the paper) vs. the number of breakpoints:

Objective vs. # of breakpoints

Once we have solved for the locations of the breakpoints, we can use the FindMeanCovs() function to find the means and covariances of each segment. In the example in helloworld.py, plotting the means, variances, and covariances of the three signals yields:

Means and covariances over time

To run cross-validation, which can be useful in determining optimal values of K and lambda, we can use the following code to load the data, run the cross-validation, and then plot the test and train likelihood:

from ggs import *
import numpy as np
import matplotlib.pyplot as plt

filename = "Returns.txt"
data = np.genfromtxt(filename,delimiter=' ')
feats = [0,3,7]

#Run cross-validaton up to Kmax = 30, at lambda = 1e-4
maxBreaks = 30
lls = GGSCrossVal(data, Kmax=maxBreaks, lambList = [1e-4], features = feats, verbose = False)

trainLikelihood = lls[0][1][0]
testLikelihood = lls[0][1][1]
plt.plot(range(maxBreaks+1), testLikelihood)
plt.plot(range(maxBreaks+1), trainLikelihood)
plt.legend(['Test LL','Train LL'], loc='best')
plt.show()

The resulting plot looks like:

Test and train likelihood

References

Greedy Gaussian Segmentation of Time Series Data -- D. Hallac, P. Nystrup, and S. Boyd

Authors

David Hallac, Peter Nystrup, and Stephen Boyd.

Owner
Stanford University Convex Optimization Group
Stanford University Convex Optimization Group
Label Studio is a multi-type data labeling and annotation tool with standardized output format

Website • Docs • Twitter • Join Slack Community What is Label Studio? Label Studio is an open source data labeling tool. It lets you label data types

Heartex 11.7k Jan 09, 2023
MetaTTE: a Meta-Learning Based Travel Time Estimation Model for Multi-city Scenarios

MetaTTE: a Meta-Learning Based Travel Time Estimation Model for Multi-city Scenarios This is the official TensorFlow implementation of MetaTTE in the

morningstarwang 4 Dec 14, 2022
TensorFlow CNN for fast style transfer

Fast Style Transfer in TensorFlow Add styles from famous paintings to any photo in a fraction of a second! It takes 100ms on a 2015 Titan X to style t

1 Dec 14, 2021
BBScan py3 - BBScan py3 With Python

BBScan_py3 This repository is forked from lijiejie/BBScan 1.5. I migrated the fo

baiyunfei 12 Dec 30, 2022
DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

DeepLM DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021) Run Please install th

Jingwei Huang 130 Dec 02, 2022
AQP is a modular pipeline built to enable the comparison and testing of different quality metric configurations.

Audio Quality Platform - AQP An Open Modular Python Platform for Objective Speech and Audio Quality Metrics AQP is a highly modular pipeline designed

Jack Geraghty 24 Oct 01, 2022
[CVPR 2022 Oral] Rethinking Minimal Sufficient Representation in Contrastive Learning

Rethinking Minimal Sufficient Representation in Contrastive Learning PyTorch implementation of Rethinking Minimal Sufficient Representation in Contras

36 Nov 23, 2022
Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021)

Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021) This is the implementation of PSD (ICCV 2021),

12 Dec 12, 2022
AgML is a comprehensive library for agricultural machine learning

AgML is a comprehensive library for agricultural machine learning. Currently, AgML provides access to a wealth of public agricultural datasets for common agricultural deep learning tasks.

Plant AI and Biophysics Lab 1 Jul 07, 2022
Towards Representation Learning for Atmospheric Dynamics (AtmoDist)

Towards Representation Learning for Atmospheric Dynamics (AtmoDist) The prediction of future climate scenarios under anthropogenic forcing is critical

Sebastian Hoffmann 4 Dec 15, 2022
TransPrompt - Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification

TransPrompt This code is implement for our EMNLP 2021's paper 《TransPrompt:Towards an Automatic Transferable Prompting Framework for Few-shot Text Cla

WangJianing 23 Dec 21, 2022
LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation by Junjue Wang, Zhuo Zheng, Ailong Ma, Xiaoyan Lu, and Yanfei Zh

Payphone 8 Nov 21, 2022
Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

Noah Getz 3 Jun 22, 2022
cl;asification problem using classification models in supervised learning

wine-quality-predition---classification cl;asification problem using classification models in supervised learning Wine Quality Prediction Analysis - C

Vineeth Reddy Gangula 1 Jan 18, 2022
Approaches to modeling terrain and maps in python

topography 🌎 Contains different approaches to modeling terrain and topographic-style maps in python Features Inverse Distance Weighting (IDW) A given

John Gutierrez 1 Aug 10, 2022
Occlusion robust 3D face reconstruction model in CFR-GAN (WACV 2022)

Occlusion Robust 3D face Reconstruction Yeong-Joon Ju, Gun-Hee Lee, Jung-Ho Hong, and Seong-Whan Lee Code for Occlusion Robust 3D Face Reconstruction

Yeongjoon 31 Dec 19, 2022
Head2Toe: Utilizing Intermediate Representations for Better OOD Generalization

Head2Toe: Utilizing Intermediate Representations for Better OOD Generalization Code for reproducing our results in the Head2Toe paper. Paper: arxiv.or

Google Research 62 Dec 12, 2022
This implementation contains the application of GPlearn's symbolic transformer on a commodity futures sector of the financial market.

GPlearn_finiance_stock_futures_extension This implementation contains the application of GPlearn's symbolic transformer on a commodity futures sector

Chengwei <a href=[email protected]"> 189 Dec 25, 2022
Benchmark library for high-dimensional HPO of black-box models based on Weighted Lasso regression

LassoBench LassoBench is a library for high-dimensional hyperparameter optimization benchmarks based on Weighted Lasso regression. Note: LassoBench is

Kenan Šehić 5 Mar 15, 2022
A collection of papers about Transformer in the field of medical image analysis.

A collection of papers about Transformer in the field of medical image analysis.

Junyu Chen 377 Jan 05, 2023