Simple integer-valued time series bit packing

Overview

Smahat Time Series Encoding

Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit is needed, for a series of integer percentages 7 bits are needed, etc.

Smahat is useful when:

  • Time series is integer-valued. (It doesn't work with floats :))
  • The range of the data is known in advance (if not streaming, this is not necessary).
  • The data range is relatively small.
  • The data does not have properties that would make other compression algorithms useful, or these other algorithms have an unacceptable cost for the use case.

Smahat can also be used as a baseline to calculate the true compression ratio of a compression algorithm on data of a certain nature.

Installation

To install the latest release:

$ pip install smahat

You can also build a local package and install it:

$ make build
$ pip install dist/*.whl

Usage

Import smahat module.

>>> import smahat

Data to encode.

>>> values = [12, 0, 17, 15, 78, 10]

Encoding

You can use encode_next to encode one value by one:

>>> encoder = smahat.Encoder(min_value=0, max_value=100, strategy='saturate')
>>> for v in values:
...     encoder.encode_next(v)
>>> content = encoder.get_encoded()
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Or you can use Encoder.encode_all to encode all values (range min and max will be inferred from values if not provided):

>>> content = smahat.Encoder.encode_all(values, min_value=0, max_value=100, strategy='saturate')
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Decoding

To decode use Decoder.decode_all.

>>> smahat.Decoder.decode_all(content)
[12, 0, 17, 15, 78, 10]

Encoding result

The result of the encoding of a sequence of values using Smahat is a SmahatContent dictionary containing the encoded data, plus three fields : shift is used to bring the data range to start from zero (values are shifted and encoded in pure binary), n_bits_per_value indicates the number of bits used to encode each value, n_padding_bits (between 0 and 7) indicates the number of unused padding bits within the last byte.

class SmahatContent(TypedDict):
    encoded: bytes
    shift: int
    n_bits_per_value: int
    n_padding_bits: int

If you want to use this library for message exchanges, you can serialize the result of the encoding as you like (JSON, protobuf, etc.)

Contribute

$ git clone https://github.com/ghilesmeddour/smahat-time-series-encoding.git
$ cd smahat-time-series-compression
make format
make dead-code-check
make test
make type-check
make coverage
make build

TODOs

  • Add unit tests.
  • Improve doc.
You might also like...
Simple yet flexible natural sorting in Python.

natsort Simple yet flexible natural sorting in Python. Source Code: https://github.com/SethMMorton/natsort Downloads: https://pypi.org/project/natsort

A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!
A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that all behav

Simple python module to get the information regarding battery in python.
Simple python module to get the information regarding battery in python.

Battery Stats A python3 module created for easily reading the current parameters of Battery in realtime. It reads battery stats from /sys/class/power_

A simple Python app that generates semi-random chord progressions.

chords-generator A simple Python app that generates semi-random chord progressions.

A simple and easy to use Spam Bot made in Python!

This is a simple spam bot made in python. You can use to to spam anyone with anything on any platform.

Runes - Simple Cookies You Can Extend (similar to Macaroons)

Runes - Simple Cookies You Can Extend (similar to Macaroons) is a paper called "Macaroons: Cookies with Context

Simple collection of GTPS Flood in Python.

GTPS Flood Simple collection of GTPS Flood in Python. NOTE Give me credit if you use this source, don't trade/sell this tool, And USE AT YOUR OWN RISK

Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method.

Astvuln Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method. Some search methods ar

A set of Python scripts to surpass human limits in accomplishing simple tasks.

Human benchmark fooler Summary A set of Python scripts with Selenium designed to surpass human limits in accomplishing simple tasks available on https

Releases(v0.0.1)
Owner
Ghiles Meddour
Data Analyst at Munic
Ghiles Meddour
Color getter (including method to get random color or complementary color) made out of Python

python-color-getter Color getter (including method to get random color or complementary color) made out of Python Setup pip3 install git+https://githu

Jung Gyu Yoon 2 Sep 17, 2022
produces PCA on genotypes from fasta files (popPhyl's ID format)

popPhyl_PCA Performs PCA of genotypes. Works in two steps. 1. Input file A single fasta file containing different loci, in different populations/speci

camille roux 2 Oct 08, 2021
This script allows you to retrieve all functions / variables names of a Python code, and the variables values.

Memory Extractor This script allows you to retrieve all functions / variables names of a Python code, and the variables values. How to use it ? The si

Venax 2 Dec 26, 2021
Know your customer pipeline in apache air flow

KYC_pipline Know your customer pipeline in apache air flow For a successful pipeline run take these steps: Run you Airflow server Admin - connection

saeed 4 Aug 01, 2022
NFT-Generator is the best way to generate thousands of NFTs quick and easily with Python.

NFT-Generator is the best way to generate thousands of NFTs quick and easily with Python. Just add your files, set your configuration and run the scri

78 Dec 27, 2022
一款不需要买代理来减少扫网站目录被封概率的扫描器,适用于中小规格字典。

PoorScanner使用说明书 -工具在不同环境下可能不怎么稳定,如果有什么问题恳请大家反馈。说明书有什么错误的地方也大家欢迎指正。 更新记录 2021.8.23 修复了云函数主程序 gitee上传文件接口写错了的BUG(之前把自己的上传地址写死进去了,没从配置文件里读) 更新了说明书 PoorS

14 Aug 02, 2022
a demo show how to dump lldb info to ida.

用一个demo来聊聊动态trace 这个仓库能做什么? 帮助理解动态trace的思想。仓库内的demo,可操作,可实践。 动态trace核心思想: 动态记录一个函数内每一条指令的执行中产生的信息,并导入IDA,用来弥补IDA等静态分析工具的不足。 反编译看一下 先clone仓库,把hellolldb

25 Nov 28, 2022
A Random Password Generator made from Python

Things you need Python Step 1 Download the python file from Releases Step 2 Go to the directory where the python file is and run it Step 3 Type the le

Kavindu Nimsara 3 May 30, 2022
Python library to decorate and beautify strings

outputformat Python library to decorate and beautify your standard output 💖 Ins

Felipe Delestro Matos 259 Dec 13, 2022
Python based utilities for interacting with digital multimeters that are built on the FS9721-LP3 chipset.

Python based utilities for interacting with digital multimeters that are built on the FS9721-LP3 chipset.

Fergus 1 Feb 02, 2022
Utility to add/remove licenses to/from source files

Utility to add/remove licenses to/from source files. Supports processing any combination of globs, files, and directories (recurse). Pruning options allow skipping non-licensing files.

Eduardo Ponce Mojica 2 Jan 29, 2022
Find version automatically based on git tags and commit messages.

GIT-CONVENTIONAL-VERSION Find version automatically based on git tags and commit messages. The tool is very specific in its function, so it is very fl

0 Nov 07, 2021
Python tool to check a web applications compliance with OWASP HTTP response headers best practices

Check Your Head A quick and easy way to check a web applications response headers!

Zak 6 Nov 09, 2021
Finger is a function symbol recognition engine for binary programs

Finger is a function symbol recognition engine for binary programs

332 Jan 01, 2023
Various importers for cointracker

cointracker_importers Various importers for cointracker To convert nexo .csv format to cointracker .csv format: Download nexo csv file. run python Nex

Stefanos Anastasiou 9 Oct 24, 2022
Rabbito is a mini tool to find serialized objects in input values

Rabbito-ObjectFinder Rabbito is a mini tool to find serialized objects in input values What does Rabbito do Rabbito has the main object finding Serial

7 Dec 13, 2021
Conveniently measures the time of your loops, contexts and functions.

Conveniently measures the time of your loops, contexts and functions.

Maciej J Mikulski 79 Nov 15, 2022
Greenery - tools for parsing and manipulating regular expressions

Greenery - tools for parsing and manipulating regular expressions

qntm 242 Dec 15, 2022
Convert any-bit number to decimal number and vise versa.

2deci Convert any-bit number to decimal number and vise versa. --bit n to set bit to n --exp xxx to set expression to xxx --r to run reversely (from d

3 Sep 15, 2021
Fcpy: A Python package for high performance, fast convergence and high precision numerical fractional calculus computing.

Fcpy: A Python package for high performance, fast convergence and high precision numerical fractional calculus computing.

SciFracX 1 Mar 23, 2022