TCube: Domain-Agnostic Neural Time-series Narration
This repository contains the code for the paper: "TCube: Domain-Agnostic Neural Time-series Narration" (to appear in IEEE ICDM 2021).
The PLMs used in this effort (T5, BART, and GPT-2) are implemented using the HuggingFace library (https://huggingface.co/) and fine-tuned on the WebNLG v3 (https://gitlab.com/shimorina/webnlg-dataset/-/tree/master/release_v3.0) and DART (https://arxiv.org/abs/2007.02871) datasets.
Clones of both datasets are available under /Finetune PLMs/Datasets in this repository.
The PLMs fine-tuned on WebNLG/DART could not be uploaded due to the 1 GB file-size limit of Git LFS. However, ready-made scripts in this repository (detailed below) allow these models to be fine-tuned conveniently.
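For quick orientation, the following is a minimal sketch of how such a PLM is loaded and queried through the HuggingFace library; the model name, input record, and generation settings are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch (assumptions: model name, input record, and generation settings).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# A WebNLG/DART-style linearized record (hypothetical example).
source = "Alan_Bean | occupation | Test_pilot"
input_ids = tokenizer.encode(source, return_tensors="pt")
output_ids = model.generate(input_ids, max_length=64, num_beams=4, early_stopping=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```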
The entire repository is based on Python 3.6, and the results are visualized through IPython notebooks.
Dependencies
Interactive Environments
- notebook
- ipywidgets==7.5.1
Deep Learning Frameworks
- torch==1.7.1 (the build suited to your CUDA version)
- pytorch-lightning 0.9.0
- transformers==3.1.0
NLP Toolkits
- sentencepiece==0.1.91
- nltk
Scientific Computing, Data Manipulation, and Visualizations
- numpy
- scipy
- sklearn
- matplotlib
- pandas
- pwlf
Evaluation
- rouge-score
- textstat
- lexical_diversity
- language-tool-python
Misc
- xlrd
- tqdm
- cython
Please make sure that the aforementioned Python packages, at their specified versions, are installed on your system, preferably in a separate virtual environment.
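A quick way to confirm that the pinned versions are the ones active in your environment (a convenience check, not part of the repository):

```python
# Print the installed versions of the key pinned dependencies for comparison
# against the list above.
import pkg_resources

for pkg in ["torch", "pytorch-lightning", "transformers", "sentencepiece"]:
    print(pkg, pkg_resources.get_distribution(pkg).version)
```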
Data-Preprocessing Scripts
Under /Finetune PLMs in this repository there are two scripts for pre-processing the WebNLG and DART datasets:
preprocess_webnlg.py
preprocess_dart.py
These scripts draw from the original datasets in /Finetune PLMs/Datasets/WebNLGv3 and /Finetune PLMs/Datasets/DART and prepare CSV files in /Finetune PLMs/Datasets, splitting the original datasets into train, dev, and test sets in the format required by our PLMs.
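As a rough illustration of the source/target pairs this pre-processing produces (the actual CSV schema is defined by the scripts; the column names, linearization, and record below are assumptions):

```python
# Illustrative sketch only: linearize an RDF-style record into a source string
# and pair it with its reference text as one CSV row. Column names are hypothetical.
import csv

record = {
    "triples": [("Alan_Bean", "occupation", "Test_pilot")],
    "text": "Alan Bean worked as a test pilot.",
}

source = " && ".join(" | ".join(triple) for triple in record["triples"])

with open("example_train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["source", "target"])   # hypothetical header
    writer.writerow([source, record["text"]])
```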
Fine-tuning Scripts
Under /Finetune PLMs in this repository there are three scripts for fine-tuning T5, BART, and GPT-2:
finetuneT5.py
finetuneBART.py
finetuneGPT2.py
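For orientation, the sketch below shows, in heavily simplified plain PyTorch, the kind of seq2seq fine-tuning these scripts perform; the actual scripts use pytorch-lightning, and the model name, toy data, and hyperparameters here are assumptions.

```python
# Simplified sketch of seq2seq fine-tuning on (linearized data, text) pairs.
# Not the repository's actual training code: the real scripts use pytorch-lightning
# and the pre-processed WebNLG/DART CSVs; values below are toy placeholders.
from torch.optim import AdamW
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.train()

pairs = [("Alan_Bean | occupation | Test_pilot",
          "Alan Bean worked as a test pilot.")]
optimizer = AdamW(model.parameters(), lr=3e-4)

for epoch in range(2):                                   # toy number of epochs
    for source, target in pairs:
        inputs = tokenizer(source, return_tensors="pt")
        labels = tokenizer(target, return_tensors="pt")["input_ids"]
        outputs = model(input_ids=inputs["input_ids"],
                        attention_mask=inputs["attention_mask"],
                        labels=labels)
        loss = outputs[0]                                # loss is first in the output tuple
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```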
Visualization and Evaluation Notebooks
The root directory contains 10 notebooks. For descriptions of the time-series datasets used:
Datatsets.ipynb
For comparisons of segmentation and regime-change detection algorithms:
Error Determination.ipynb
Regime Detection.ipynb
Segmentation.ipynb
Trend Detection Plot.ipynb
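The segmentation notebooks build on pwlf (listed under the dependencies above); the sketch below shows piecewise-linear segmentation on synthetic data, with the series and segment count chosen purely for illustration.

```python
# Piecewise-linear segmentation with pwlf on a synthetic two-regime series.
# The synthetic data and the choice of two segments are illustrative assumptions.
import numpy as np
import pwlf

x = np.arange(100, dtype=float)
y = np.concatenate([np.linspace(0, 10, 50),          # rising regime
                    np.linspace(10, 2, 50)])          # falling regime
y += np.random.normal(scale=0.3, size=y.size)        # observation noise

model = pwlf.PiecewiseLinFit(x, y)
breakpoints = model.fit(2)        # fit two linear segments
slopes = model.calc_slopes()      # per-segment trend directions

print("breakpoints:", breakpoints)
print("slopes:", slopes)
```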
For the evaluation of the TCube framework on the respective time-series datasets:
T3-COVID.ipynb
T3-DOTS.ipynb
T3-Pollution.ipynb
T3-Population.ipynb
T3-Temperature.ipynb
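These notebooks report automatic metrics provided by the evaluation dependencies listed above; a minimal sketch of computing two of them on illustrative strings:

```python
# Minimal sketch of two of the automatic evaluation metrics.
# The reference and generated narratives below are illustrative placeholders.
from rouge_score import rouge_scorer
import textstat

reference = "Cases rose sharply through March and then plateaued in April."
generated = "Cases increased sharply in March before leveling off in April."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(scorer.score(reference, generated))        # ROUGE-1 / ROUGE-L precision, recall, F1
print(textstat.flesch_reading_ease(generated))   # readability of the narration
```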
Citation and Contact
If any part of this code repository or the TCube framework is used in your work, please cite our paper. Thanks!
Contact: Mandar Sharma ([email protected]), First Author.