A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

Last update: Dec 17, 2022

Overview

pytorch-lifestream is a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events such as game history events, clickstream data, purchase history, or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

Contrastive Learning for Event Sequences (CoLES)
Contrastive Predictive Coding (CPC)
Replaced Token Detection (RTD) from ELECTRA
Next Sequence Prediction (NSP) from BERT
Sequences Order Prediction (SOP) from ALBERT
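
As a rough illustration of the idea behind CoLES, the sketch below draws several random contiguous sub-sequences from one entity's event history, so that sub-sequences of the same history can be treated as positive pairs during contrastive training. This is a simplified sketch in plain Python, not the library's sampler: the function name sample_slices and all of its parameters are hypothetical.

```python
import random


def sample_slices(seq_len, n_slices, min_len, max_len, rng):
    """Return n_slices (start, end) pairs of random contiguous sub-sequences.

    Hypothetical helper for illustration; not part of pytorch-lifestream.
    """
    upper = min(max_len, seq_len)   # a slice cannot be longer than the sequence
    lower = min(min_len, upper)     # keep the range valid for short sequences
    slices = []
    for _ in range(n_slices):
        length = rng.randint(lower, upper)
        start = rng.randint(0, seq_len - length)
        slices.append((start, start + length))
    return slices


# Toy event history of one entity: 20 event ids in time order.
events = list(range(20))
rng = random.Random(0)

# Three views of the same history; any two of them form a positive pair.
views = [events[s:e] for s, e in sample_slices(len(events), 3, 5, 10, rng)]
for view in views:
    print(len(view), view)
```

Sub-sequences drawn from different entities would serve as negative pairs in this scheme.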

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of the contrastive losses are supported:

Contrastive loss (paper)
Triplet loss (paper)
Binomial deviance loss (paper)
Histogram loss (paper)
Margin loss (paper)
VICReg loss (paper)
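
To make the first item of the list concrete, here is a minimal sketch of one common formulation of the pairwise contrastive loss: a pair from the same entity is penalised by its squared distance, and a pair from different entities is penalised only when it is closer than a margin. It is written in plain Python for readability and is not the library's implementation; the names contrastive_loss and euclidean and the default margin of 1.0 are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Pairwise contrastive loss for a single pair (illustrative only).

    same=True  -> pull the pair together: d^2
    same=False -> push the pair apart until distance >= margin:
                  max(0, margin - d)^2
    """
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


anchor = [0.0, 0.0]
pos = contrastive_loss(anchor, [0.3, 0.4], same=True)         # distance 0.5
neg_close = contrastive_loss(anchor, [0.3, 0.4], same=False)  # inside the margin
neg_far = contrastive_loss(anchor, [3.0, 4.0], same=False)    # outside the margin
print(pos, neg_close, neg_far)
```

A negative pair that is already farther apart than the margin contributes nothing, which is why neg_far is zero here.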

Install from PyPI

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync  --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

Self-supervised training and embeddings for downstream task notebook
Self-supervised embeddings in CatBoost notebook
Self-supervised training and fine-tuning notebook
PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

Experiments using pytorch-lifestream on several public event datasets are available in a separate repo.

Comments

torch.stack in def collate_feature_dict

ptls/data_load/utils.py

Hello!

If the dataloader has a feature called target and the dataset length is not a multiple of the batch size, an error is raised on the last batch: "Sizes of tensors must match except in dimension 0". It is caused by the use of torch.stack when processing a feature whose name starts with 'target'.

opened by Ivanich-spb 11
Not supported multiGPU option from pytorchlightning.Trainer

Setting Trainer(gpus=[0,1]) while using PtlsDataModule as the data module raises this error:

AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'

opened by mazitovs 1
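
The error message in the multi-GPU report above can be reproduced without any GPU. The sketch below assumes that the multi-GPU launch pickles the data module to send it to worker processes, and that the failure comes from a function defined inside __init__ being stored on the instance. Both class names are hypothetical stand-ins, not the library's code.

```python
import pickle


class LocalLoaderModule:
    """Stores a function defined inside __init__ on the instance."""

    def __init__(self):
        def train_dataloader():
            return "batches"

        # The function's qualified name contains '<locals>', so pickle
        # cannot look it up again by name.
        self.train_dataloader = train_dataloader


class MethodLoaderModule:
    """Defines the same loader as a regular method instead."""

    def train_dataloader(self):
        return "batches"


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Depending on the Python version either exception type is raised.
        return False


local_ok = can_pickle(LocalLoaderModule())
method_ok = can_pickle(MethodLoaderModule())
print("local function picklable:", local_ok)
print("regular method picklable:", method_ok)
```

Under these assumptions, moving the loader factories from closures to regular methods is one possible direction for a fix.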
Correct seq_len for feature dict
rec = {
    'mcc': [0, 1, 2, 3],
    'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}

How do we get the correct seq_len? The true length is 4, but the candidate lengths are 4 and 6. 'target_distribution' is the wrong field to take the length from: it is an array, not a sequence.
opened by ivkireev86 1
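
One possible way to resolve the ambiguity in the seq_len report above is to ignore fields whose names start with 'target' and require the remaining list-like fields to agree on a length. The sketch below only illustrates that rule; the function infer_seq_len and its non_seq_prefixes parameter are hypothetical and are not taken from the library.

```python
def infer_seq_len(rec, non_seq_prefixes=("target",)):
    """Infer the sequence length of a feature dict (illustrative only).

    Fields whose names start with one of non_seq_prefixes are treated as
    non-sequence arrays and skipped.
    """
    lengths = {
        name: len(values)
        for name, values in rec.items()
        if isinstance(values, (list, tuple))
        and not name.startswith(non_seq_prefixes)
    }
    if not lengths:
        raise ValueError("no sequence features found")
    distinct = set(lengths.values())
    if len(distinct) != 1:
        raise ValueError(f"sequence features disagree on length: {lengths}")
    return distinct.pop()


rec = {
    "mcc": [0, 1, 2, 3],
    "target_distribution": [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}
seq_len = infer_seq_len(rec)
print(seq_len)
```

A name-based rule is fragile if a sequence feature happens to start with the same prefix, so an explicit list of sequence fields may be a safer design.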
Save categories encodings along with model weights in demos

The fitted preprocessor and the train-test split must be saved together with the trained model. Otherwise the category encodings may shift and the saved pretrained model becomes useless.

opened by ivkireev86 1
Documentation index
A prototype of the documentation main page. Three sections:

description of the library's models

guide on how to use the library

how to write your own components

There is a short description and links to detailed pages (which we will write later).

The module description proposes a structure for the library. The plan is to create these modules soon and move the corresponding classes from the library into them. Old modules that become empty will be removed. From then on we will follow the scheme described in this document.

For the review, please check the proposed library structure, the module names, and the descriptive text of the document itself.
opened by ivkireev86 1
KL cyclostationarity test tools

The test provides a histogram of self-sample similarity vs. random-sample similarity. It shows compatibility with CoLES.

Think about tests for other frameworks.

opened by ivkireev86 0
Repair pyspark tests
def test_dt_to_timestamp():
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(data=[
        {'dt': '1970-01-01 00:00:00'},
        {'dt': '2012-01-01 12:01:16'},
        {'dt': '2021-12-30 00:00:00'}
    ])
    df = df.withColumn('ts', dt_to_timestamp('dt'))
    ts = [rec.ts for rec in df.select('ts').collect()]
    assert ts == [0, 1325419276, 1640822400]

E   assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400]
E     At index 0 diff: -10800 != 0
E     Use -v to get more diff

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError

def test_datetime_to_timestamp():
    t = DatetimeToTimestamp(col_name_original='dt')
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(data=[
        {'dt': '1970-01-01 00:00:00', 'rn': 1},
        {'dt': '2012-01-01 12:01:16', 'rn': 2},
        {'dt': '2021-12-30 00:00:00', 'rn': 3}
    ])
    df = t.fit_transform(df)
    et = [rec.event_time for rec in df.select('event_time').collect()]
    assert et[0] == 0

E   assert -10800 == 0

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError
opened by ikretus 0
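
The offset of -10800 seconds in the failing tests above is exactly three hours, which is consistent with the datetime strings being interpreted in a UTC+3 timezone rather than in UTC. The sketch below reproduces both sets of numbers with the Python standard library only. It does not use Spark, and the helper to_epoch is hypothetical.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def to_epoch(dt_string, tz):
    """Interpret a naive datetime string in timezone tz; return Unix seconds."""
    naive = datetime.strptime(dt_string, FMT)
    return int(naive.replace(tzinfo=tz).timestamp())


utc = timezone.utc
utc_plus_3 = timezone(timedelta(hours=3))

samples = ["1970-01-01 00:00:00", "2012-01-01 12:01:16", "2021-12-30 00:00:00"]

as_utc = [to_epoch(s, utc) for s in samples]
as_utc_plus_3 = [to_epoch(s, utc_plus_3) for s in samples]

print(as_utc)         # values the tests expect
print(as_utc_plus_3)  # values shifted by three hours
```

If this is indeed the cause, fixing the Spark session timezone to UTC in the test setup (for example through the spark.sql.session.timeZone setting) would make the tests independent of the machine they run on.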
docs. Development guide (for demo notebooks)
add current patterns

when model training starts, print the message "model training starts, please wait. See tensorboard to track progress"; use it with enable_progress=False

documentation user feedback
opened by ivkireev86 0

Releases(v0.5.1)

v0.5.1(Dec 28, 2022)
What's Changed

fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90

add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87

MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88

fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

New Contributors

@ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90

@mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1
v0.5.0(Nov 9, 2022)
What's Changed

Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72

Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73

fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74

ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75

tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76

Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78

spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80

Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81

tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83

Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

New Contributors

@blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0
v0.4.0(Jul 27, 2022)
What's Changed

Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29

regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36

lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34

Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40

Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42

feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43

Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37

Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45

fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52

Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58

Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

New Contributors

@ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

v0.3.0(Jun 12, 2022)
More Pythonic Core API: constructor arguments instead of config objects

What's Changed

cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9

All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15

Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13

all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20

Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8

Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18

loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22

Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23

readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26

work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7

trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0

Owner

Dmitri Babaev
