pyhsmm - library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.

Related tags

Deep Learningpyhsmm
Overview

Build Status

Bayesian inference in HSMMs and HMMs

This is a Python library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.

There are also some extensions:

Installing from PyPI

Give this a shot:

pip install pyhsmm

You may need to install a compiler with -std=c++11 support, like gcc-4.7 or higher.

To install manually from the git repo, you'll need cython. Then try this:

python setup.py install

It might also help to look at the travis file to see how to set up a working install from scratch.

Running

See the examples directory.

For the Python interpreter to be able to import pyhsmm, you'll need it on your Python path. Since the current working directory is usually included in the Python path, you can probably run the examples from the same directory in which you run the git clone with commands like python pyhsmm/examples/hsmm.py. You might also want to add pyhsmm to your global Python path (e.g. by copying it to your site-packages directory).

A Simple Demonstration

Here's how to draw from the HDP-HSMM posterior over HSMMs given a sequence of observations. (The same example, along with the code to generate the synthetic data loaded in this example, can be found in examples/basic.py.)

Let's say we have some 2D data in a data.txt file:

$ head -5 data.txt
-3.711962552600095444e-02 1.456401745267922598e-01
7.553818775915704942e-02 2.457422192223903679e-01
-2.465977987699214502e+00 5.537627981813508793e-01
-7.031638516485749779e-01 1.536468304146855757e-01
-9.224669847039665971e-01 3.680035337673161489e-01

In Python, we can plot the data in a 2D plot, collapsing out the time dimension:

import numpy as np
from matplotlib import pyplot as plt

data = np.loadtxt('data.txt')
plt.plot(data[:,0],data[:,1],'kx')

2D data

We can also make a plot of time versus the first principal component:

from pyhsmm.util.plot import pca_project_data
plt.plot(pca_project_data(data,1))

Data first principal component vs time

To learn an HSMM, we'll use pyhsmm to create a WeakLimitHDPHSMM instance using some reasonable hyperparameters. We'll ask this model to infer the number of states as well, so we'll give it an Nmax parameter:

import pyhsmm
import pyhsmm.basic.distributions as distributions

obs_dim = 2
Nmax = 25

obs_hypparams = {'mu_0':np.zeros(obs_dim),
                'sigma_0':np.eye(obs_dim),
                'kappa_0':0.3,
                'nu_0':obs_dim+5}
dur_hypparams = {'alpha_0':2*30,
                 'beta_0':2}

obs_distns = [distributions.Gaussian(**obs_hypparams) for state in range(Nmax)]
dur_distns = [distributions.PoissonDuration(**dur_hypparams) for state in range(Nmax)]

posteriormodel = pyhsmm.models.WeakLimitHDPHSMM(
        alpha=6.,gamma=6., # better to sample over these; see concentration-resampling.py
        init_state_concentration=6., # pretty inconsequential
        obs_distns=obs_distns,
        dur_distns=dur_distns)

(The first two arguments set the "new-table" proportionality constant for the meta-Chinese Restaurant Process and the other CRPs, respectively, in the HDP prior on transition matrices. For this example, they really don't matter at all, but on real data it's much better to infer these parameters, as in examples/concentration_resampling.py.)

Then, we add the data we want to condition on:

posteriormodel.add_data(data,trunc=60)

The trunc parameter is an optional argument that can speed up inference: it sets a truncation limit on the maximum duration for any state. If you don't pass in the trunc argument, no truncation is used and all possible state duration lengths are considered. (pyhsmm has fancier ways to speed up message passing over durations, but they aren't documented.)

If we had multiple observation sequences to learn from, we could add them to the model just by calling add_data() for each observation sequence.

Now we run a resampling loop. For each iteration of the loop, all the latent variables of the model will be resampled by Gibbs sampling steps, including the transition matrix, the observation means and covariances, the duration parameters, and the hidden state sequence. We'll also copy some samples so that we can plot them.

models = []
for idx in progprint_xrange(150):
    posteriormodel.resample_model()
    if (idx+1) % 10 == 0:
        models.append(copy.deepcopy(posteriormodel))

Now we can plot our saved samples:

fig = plt.figure()
for idx, model in enumerate(models):
    plt.clf()
    model.plot()
    plt.gcf().suptitle('HDP-HSMM sampled after %d iterations' % (10*(idx+1)))
    plt.savefig('iter_%.3d.png' % (10*(idx+1)))

Sampled models

I generated these data from an HSMM that looked like this:

Randomly-generated model and data

So the posterior samples look pretty good!

A convenient shortcut to build a list of sampled models is to write

model_samples = [model.resample_and_copy() for itr in progprint_xrange(150)]

That will build a list of model objects (each of which can be inspected, plotted, pickled, etc, independently) in a way that won't duplicate data that isn't changed (like the observations or hyperparameter arrays) so that memory usage is minimized. It also minimizes file size if you save samples like

import cPickle
with open('sampled_models.pickle','w') as outfile:
    cPickle.dump(model_samples,outfile,protocol=-1)

Extending the Code

To add your own observation or duration distributions, implement the interfaces defined in basic/abstractions.py. To get a flavor of the style, see pybasicbayes.

References

@article{johnson2013hdphsmm,
    title={Bayesian Nonparametric Hidden Semi-Markov Models},
    author={Johnson, Matthew J. and Willsky, Alan S.},
    journal={Journal of Machine Learning Research},
    pages={673--701},
    volume={14},
    month={February},
    year={2013},
}

Authors

Matt Johnson, Alex Wiltschko, Yarden Katz, Chia-ying (Jackie) Lee, Scott Linderman, Kevin Squire, Nick Foti.

Owner
Matthew Johnson
research scientist @ Google Brain
Matthew Johnson
Code for EMNLP2020 long paper: BERT-Attack: Adversarial Attack Against BERT Using BERT

BERT-ATTACK Code for our EMNLP2020 long paper: BERT-ATTACK: Adversarial Attack Against BERT Using BERT Dependencies Python 3.7 PyTorch 1.4.0 transform

Linyang Li 142 Jan 04, 2023
A Pytorch implementation of CVPR 2021 paper "RSG: A Simple but Effective Module for Learning Imbalanced Datasets"

RSG: A Simple but Effective Module for Learning Imbalanced Datasets (CVPR 2021) A Pytorch implementation of our CVPR 2021 paper "RSG: A Simple but Eff

120 Dec 12, 2022
Jittor 64*64 implementation of StyleGAN

StyleGanJittor (Tsinghua university computer graphics course) Overview Jittor 64

Song Shengyu 3 Jan 20, 2022
(CVPR2021) DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation

DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation CVPR2021(oral) [arxiv] Requirements python3.7 pytorch==

W-zx-Y 85 Dec 07, 2022
Tutorial: Introduction to Graph Machine Learning, with Jupyter notebooks

GraphMLTutorialNLDL22 Tutorial NLDL22: Introduction to Graph Machine Learning, with Jupyter notebooks This tutorial takes place during the conference

UiT Machine Learning Group 3 Jan 10, 2022
Si Adek Keras is software VR dangerous object detection.

Si Adek Python Keras Sistem Informasi Deteksi Benda Berbahaya Keras Python. Version 1.0 Developed by Ananda Rauf Maududi. Developed date: 24 November

Ananda Rauf 1 Dec 21, 2021
code from "Tensor decomposition of higher-order correlations by nonlinear Hebbian plasticity"

Code associated with the paper "Tensor decomposition of higher-order correlations by nonlinear Hebbian learning," Ocker & Buice, Neurips 2021. "plot_f

Gabriel Koch Ocker 4 Oct 16, 2022
Code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction

Official PyTorch code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction. Guanglei Yang, Hao Tang, Mingli Ding, Nicu Sebe,

stanley 152 Dec 16, 2022
中文语音识别系列,读者可以借助它快速训练属于自己的中文语音识别模型,或直接使用预训练模型测试效果。

MASR中文语音识别(pytorch版) 开箱即用 自行训练 使用与训练分离(增量训练) 识别率高 说明:因为每个人电脑机器不同,而且有些安装包安装起来比较麻烦,强烈建议直接用我编译好的docker环境跑 目前docker基础环境为ubuntu-cuda10.1-cudnn7-pytorch1.6.

发送小信号 180 Dec 17, 2022
Codebase for "Revisiting spatio-temporal layouts for compositional action recognition" (Oral at BMVC 2021).

Revisiting spatio-temporal layouts for compositional action recognition Codebase for "Revisiting spatio-temporal layouts for compositional action reco

Gorjan 20 Dec 15, 2022
An image classification app boilerplate to serve your deep learning models asap!

Image 🖼 Classification App Boilerplate Have you been puzzled by tons of videos, blogs and other resources on the internet and don't know where and ho

Smaranjit Ghose 27 Oct 06, 2022
Educational 2D SLAM implementation based on ICP and Pose Graph

slam-playground Educational 2D SLAM implementation based on ICP and Pose Graph How to use: Use keyboard arrow keys to navigate robot. Press 'r' to vie

Kirill 19 Dec 17, 2022
Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021)

Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021) Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma. We address the pr

Kranti Kumar Parida 33 Jun 27, 2022
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model About This repository contains the code to replicate the syn

Haruka Kiyohara 12 Dec 07, 2022
A lightweight face-recognition toolbox and pipeline based on tensorflow-lite

FaceIDLight 📘 Description A lightweight face-recognition toolbox and pipeline based on tensorflow-lite with MTCNN-Face-Detection and ArcFace-Face-Rec

Martin Knoche 16 Dec 07, 2022
Top #1 Submission code for the first https://alphamev.ai MEV competition with best AUC (0.9893) and MSE (0.0982).

alphamev-winning-submission Top #1 Submission code for the first alphamev MEV competition with best AUC (0.9893) and MSE (0.0982). The code won't run

70 Oct 29, 2022
Multi Camera Calibration

Multi Camera Calibration 'modules/camera_calibration/app/camera_calibration.cpp' is for calculating extrinsic parameter of each individual cameras. 'm

7 Dec 01, 2022
Learning Open-World Object Proposals without Learning to Classify

Learning Open-World Object Proposals without Learning to Classify Pytorch implementation for "Learning Open-World Object Proposals without Learning to

Dahun Kim 149 Dec 22, 2022
Fuzzing JavaScript Engines with Aspect-preserving Mutation

DIE Repository for "Fuzzing JavaScript Engines with Aspect-preserving Mutation" (in S&P'20). You can check the paper for technical details. Environmen

gts3.org (<a href=[email protected])"> 190 Dec 11, 2022
🤗 Push your spaCy pipelines to the Hugging Face Hub

spacy-huggingface-hub: Push your spaCy pipelines to the Hugging Face Hub This package provides a CLI command for uploading any trained spaCy pipeline

Explosion 30 Oct 09, 2022