Tuple-sum-filter - Library to play with filtering numeric sequences by sums of their pairs, triplets, etc. With a bonus CLI demo

Last update: Feb 16, 2022

Related tags

Overview

Tuple Sum Filter

A library to play with filtering numeric sequences by sums of their pairs, triplets, etc.

Comes with a bonus CLI to demo the functionality.

Requires (and CI tests on) python 3.8 to 3.10. If you need to use python 3.7 then try replacing math.prod(some_iterable) with functools.reduce(lambda x, y: x * y, some_iterable)

Approach

We're thinking of this mostly as a library with the CLI as only for demo purposes. Ways you can see this in the code:

logging should really handled by the consumer,
- our get_logger should be something that is passed into the lib
the CLI is pretty light on automated tests
we use pretty loose production dependency pinning
- rather than pip freeze > requirements.txt of a deployed app
- we want to keep things loose so that consumers can keep installing us alongside other things
- we should probably set up tox/nox test runs against v.latest of our dependencies

Running the demo

in a fresh virtualenv (python>=3.8)

# install project and deps
pip install git+https://github.com/lbillingham/tuple-sum-filter.git

# create a suitable input file
echo "1721\n979\n366\n299\n675\n1456\n" > example.txt

# run the demo
filter_demo --input_file=example.txt --sum_target=2020 --dimension=2

you should see output like

checking for pairs of numbers that sum to 2020 in example.txt
Results pair (1721, 299) match: sum to 2020 and multiply to 514579

Consuming the library

The main filtering functions are pairs_that_sum_to and triplets_that_sum_to. They both have signatures (numbers: Sequence[int|float], sum_target: int|float) -> things_that_passed_the_filter list[tuple].

There is also a file-reading helper numbers_in_file exported at the top level.

Developing

Run the following to install the project (and dev dependencies) into your active virtualenv:

make dev_install

day-to-day development tasks can be orchestrated via make

dependency management
test/lint/typecheck running
coverage reporting
run make without any arguments to see a list

There is a CI suite which runs lint and test on several python versions. We don't run typechecking as a gate in CI because we think that turns a sometimes-useful tool into a Goodhart target.

Performance

We have not been optimizing for performance and it kind of shows.

When we run the benchmarking suite we see ~0.4 seconds fairly consistently for the triplet/3D problem.

We have at least 3 ideas of how to speed things up: several of them include dropping floating-point support.

$ make benchmark

tests/performance_check.py ..                                                                                                                                [100%]


------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_input1_pairs          5.4665 (1.0)        6.2297 (1.0)        5.6687 (1.0)      0.1018 (1.0)        5.6575 (1.0)      0.1289 (1.0)          47;3  176.4077 (1.0)         172           1
test_input1_triplets     384.6154 (70.36)    386.5000 (62.04)    385.4776 (68.00)    0.8287 (8.14)     385.4333 (68.13)    1.5047 (11.67)         2;0    2.5942 (0.01)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

🍪 ✂️ cookiecut from lbillingham's python-cli-template

You might also like...

Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids

codon_optimize_cds_with_many_taxids_singlefasta Linux GUI app to codon optimize many single-fasta files with coding sequences, using many taxonomy ids

1 Jan 23, 2022

Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a local folder

Ingestinator Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a

2 Nov 18, 2022

Python Common things by Problem Fighter Library, (Exception, Debug Log, etc.)

In the name of God, the Most Gracious, the Most Merciful. PF-PY-Common Documentation Install and update using pip: pip install -U xxxx Please find the

3 Jan 15, 2022

Devil - Very Semple Auto Filter V1 Bot

Devil Very Semple Auto Filter V1 Bot

2 Jun 27, 2022

Cairo-bloom - A naive bloom filter implementation in Cairo

🥀 cairo-bloom A naive bloom filter implementation in Cairo. A Bloom filter is a

37 Oct 1, 2022

Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.

Nanopore-Workflow Snakemake workflow to process and filter long read data from Oxford Nanopore Technologies. It is designed to compare whole human gen

5 May 13, 2022

Runnable Python demo of ArtLine

artline-demo How to run? pip3 install -r requirements.txt python3 app.py How to use? Run the Flask app Open localhost:5000 in browser Select an image(

134 Jul 29, 2022

Tiny demo site for exploring SameSite=Lax

samesite-lax-demo Background on my blog: Exploring the SameSite cookie attribute for preventing CSRF This repo holds some tools for exploring the impl

6 Nov 10, 2021

An extended version of the hotkeys demo code using action classes

An extended version of the hotkeys application using action classes. In adafruit's Hotkeys code, a macro is using a series of integers, assumed to be

5 May 1, 2022

Comments

Perf: :zap: merge if you want to go faster but don't need float support

This moves away from the shared itertools implimentations for finding pairs, triplets of the input numbers that sum to a given target.

Instead, we

1st expose the underlying $~O^{dimensions}$ nested loops
trade some extra memory and some $O^{1}$ lookups to give us $~O^{dimensions-1}$
get a >= 170x speedup in our benchmarks

However, we:

loose the ability to properly work with floating point input
- the fast lookup uses hasing and hashing floats gets weird due to floating point equality
can't trivially extend to higher-dimension problems: 4-element-tuples etc.

I've moved the float input tests out the their own file and away from the CI test path

Performance benchmarks

with these changes:

$ make benchmark
tests/performance_check.py ..                                                                                                                             [100%]

-------------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------------
Name (time in us)               Min                   Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_input1_pairs           22.1660 (1.0)        166.3790 (1.0)         23.6089 (1.0)        5.0170 (1.0)         23.0000 (1.0)        0.5123 (1.0)        87;521  42,356.8183 (1.0)        7677           1
test_input1_triplets     1,994.8000 (89.99)    3,561.1120 (21.40)    2,152.1272 (91.16)    204.1428 (40.69)    2,033.1040 (88.40)    299.1878 (584.07)       41;4     464.6565 (0.01)        341           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

itertools, but non-float supporting version

$ make benchmark
tests/performance_check.py ..                                                                                                      [100%]

------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_input1_pairs          2.8727 (1.0)        4.2386 (1.0)        3.1265 (1.0)      0.1638 (1.0)        3.1067 (1.0)      0.1888 (1.0)          78;9  319.8414 (1.0)         326           1
test_input1_triplets     211.6325 (73.67)    213.3950 (50.35)    212.4042 (67.94)    0.6555 (4.00)     212.2717 (68.33)    0.8081 (4.28)          2;0    4.7080 (0.01)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
=========================================================== 2 passed in 3.59s ============================================================

Note the change in units this branch is in microseconds, the itertools version is in milliseconds

opened by lbillingham 0

Releases(v0.0.1)

v0.0.1(Feb 17, 2022)

Initial release. Lib allows filtering by sum over pairs and triplets of numbers loaded from a local file. Plus a bonus CLI app that can be used for demoing the lib.

Solution is itertools-y and rather slow (probably $O^{n}$ where pairs->n=2 and triplets->n=3).

This is the version shown to MM
Source code(tar.gz)
Source code(zip)

Owner

Laurence Billingham

+ sustainable software + data science

GitHub Repository

A water drinking notification every hour to keep you healthy while coding :)

Water_Notification A water drinking notification every hour to keep you healthy while coding. 💧 💧 Stay Hydrated Stay Healthy 💧 💧 Authors @CrazyCat

1 Dec 22, 2021

A pomodoro app written in Python

Pomodoro It's a pomodoro app written in Python. You can minimize it while you're working if you want to, it'll pop up on your screen when the timer is

1 Dec 20, 2021

Parser for the GeoSuite[tm] PRV export format

Parser for the GeoSuite[tm] PRV export format This library provides functionality to parse geotechnical investigation data in .prv files generated by

1 Dec 17, 2021

A Blender addon to enable reloading linked libraries from UI.

library_reload_linked_libraries A Blender addon to enable reloading linked libraries from UI.

3 Nov 27, 2022

A python package template that can be adapted for RAP projects

Warning - this repository is a snapshot of a repository internal to NHS Digital. This means that links to videos and some URLs may not work. Repositor

3 Nov 08, 2022

Run-Your-Own Firefox Sync Server

Run-Your-Own Firefox Sync Server This is an all-in-one package for running a self-hosted Firefox Sync server. It bundles the "tokenserver" project for

1.7k Dec 30, 2022

Make discord server By Coding!

Discord Server Maker Make discord server by Coding! FAQ How can i get role permissons? Open discord with chrome developer tool, go to network and clic

1 Jul 17, 2022

Context-free grammar to Sublime-syntax file

Generate a sublime-syntax file from a non-left-recursive, follow-determined, context-free grammar

8 Nov 17, 2022

Whole-day timezone comparison

Timezone Converter Compare a full day of your local timezone with foreign ones $ timezone-converter tijuana --zone $ timezone-converter tijuana new_yo

12 Nov 24, 2022

Generate Azure Blob Storage account authentication headers for Munki

Azure Blob Storage Authentication for Munki The Azure Blob Storage Middleware allows munki clients to connect securely, and directly to a munki repo h

10 Apr 12, 2022

Covid-ChatBot - A Rapid Response Virtual Agent for Covid-19 Queries

COVID-19 CHatBot A Rapid Response Virtual Agent for Covid-19 Queries Contents What is ChatBot Types of ChatBots About the Project Dataset Prerequisite

2 Jan 04, 2022

UdemyPy is a bot that hourly looks for Udemy free courses and post them in my Telegram Channel: Free Courses.

UdemyPy UdemyPy is a bot that hourly looks for Udemy free courses and post them in my Telegram Channel: Free Courses. How does it work? For publishing

88 Dec 25, 2022

Contains the code of my learning of Python OOP.

OOP Python This repository contains the code of my learning of Python OOP. All the code: is following PEP 8 ✅ has proper concept illustrations and com

2 Jan 15, 2022

GMHI: Gut Microbiome Health Index

GMHI: Gut Microbiome Health Index Description Gut Microbiome Health Index (GMHI)

2 Jun 30, 2022

Get a list of content on your Netflix My List that is expiring in the next month or two.

Netflix My List Expiring Movies Annoyed at Netflix for taking away your movies? Now you don't have to be! Installation instructions Install selenium C

24 Aug 06, 2022

This is a simple python script for checking A/L Examination results of srilankan students

AL-Result-Checker This is a simple python script for checking A/L Examination results of srilankan students INSTALLATION [Termux] [Linux] : apt-get up

8 Oct 24, 2022

In this project, we'll be creating a virtual personal assistant for ourselves using our favorite programming language

In this project, we'll be creating a virtual personal assistant for ourselves using our favorite programming language, Python. We can perform several offline as well as online operations using the bo

188 Jan 03, 2023

Reproduce digital electronics in Python

Pylectronics Reproduce digital electronics in Python Report Bug · Request Feature Table of Contents About The Project Getting Started Prerequisites In

45 Dec 20, 2021

Python template for Advent of Code event

Advent of Code Python Starter A tamplate for Advent of Code write in Python. Usage The project use poetry for project manager. Clone this repository a

6 Dec 31, 2022

A TODO-list tool written in Python

PyTD A TODO-list tool written in Python. Its goal is to provide a stable posibility to get a good view over all your TODOs motivate you to actually fi

1 Feb 12, 2022