Tuple-sum-filter - Library to play with filtering numeric sequences by sums of their pairs, triplets, etc. With a bonus CLI demo

Overview

Tuple Sum Filter

A library to play with filtering numeric sequences by sums of their pairs, triplets, etc.

Comes with a bonus CLI to demo the functionality.

Requires (and CI tests on) python 3.8 to 3.10. If you need to use python 3.7 then try replacing math.prod(some_iterable) with functools.reduce(lambda x, y: x * y, some_iterable)

Approach

We're thinking of this mostly as a library with the CLI as only for demo purposes. Ways you can see this in the code:

  • logging should really handled by the consumer,
    • our get_logger should be something that is passed into the lib
  • the CLI is pretty light on automated tests
  • we use pretty loose production dependency pinning
    • rather than pip freeze > requirements.txt of a deployed app
    • we want to keep things loose so that consumers can keep installing us alongside other things
    • we should probably set up tox/nox test runs against v.latest of our dependencies

Running the demo

in a fresh virtualenv (python>=3.8)

# install project and deps
pip install git+https://github.com/lbillingham/tuple-sum-filter.git

# create a suitable input file
echo "1721\n979\n366\n299\n675\n1456\n" > example.txt

# run the demo
filter_demo --input_file=example.txt --sum_target=2020 --dimension=2

you should see output like

checking for pairs of numbers that sum to 2020 in example.txt
Results pair (1721, 299) match: sum to 2020 and multiply to 514579

Consuming the library

The main filtering functions are pairs_that_sum_to and triplets_that_sum_to. They both have signatures (numbers: Sequence[int|float], sum_target: int|float) -> things_that_passed_the_filter list[tuple].

There is also a file-reading helper numbers_in_file exported at the top level.

Developing

Run the following to install the project (and dev dependencies) into your active virtualenv:

make dev_install

day-to-day development tasks can be orchestrated via make

  • dependency management
  • test/lint/typecheck running
  • coverage reporting
  • run make without any arguments to see a list

There is a CI suite which runs lint and test on several python versions. We don't run typechecking as a gate in CI because we think that turns a sometimes-useful tool into a Goodhart target.

Performance

We have not been optimizing for performance and it kind of shows.

When we run the benchmarking suite we see ~0.4 seconds fairly consistently for the triplet/3D problem.

We have at least 3 ideas of how to speed things up: several of them include dropping floating-point support.

$ make benchmark

tests/performance_check.py ..                                                                                                                                [100%]


------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_input1_pairs          5.4665 (1.0)        6.2297 (1.0)        5.6687 (1.0)      0.1018 (1.0)        5.6575 (1.0)      0.1289 (1.0)          47;3  176.4077 (1.0)         172           1
test_input1_triplets     384.6154 (70.36)    386.5000 (62.04)    385.4776 (68.00)    0.8287 (8.14)     385.4333 (68.13)    1.5047 (11.67)         2;0    2.5942 (0.01)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--

🍪 ✂️ cookiecut from lbillingham's python-cli-template

You might also like...
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids

codon_optimize_cds_with_many_taxids_singlefasta Linux GUI app to codon optimize many single-fasta files with coding sequences, using many taxonomy ids

Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a local folder

Ingestinator Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a

Python Common things by Problem Fighter Library, (Exception, Debug Log, etc.)

In the name of God, the Most Gracious, the Most Merciful. PF-PY-Common Documentation Install and update using pip: pip install -U xxxx Please find the

Devil - Very Semple Auto Filter V1 Bot
Devil - Very Semple Auto Filter V1 Bot

Devil Very Semple Auto Filter V1 Bot

Cairo-bloom - A naive bloom filter implementation in Cairo

🥀 cairo-bloom A naive bloom filter implementation in Cairo. A Bloom filter is a

Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.
Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.

Nanopore-Workflow Snakemake workflow to process and filter long read data from Oxford Nanopore Technologies. It is designed to compare whole human gen

Runnable Python demo of ArtLine

artline-demo How to run? pip3 install -r requirements.txt python3 app.py How to use? Run the Flask app Open localhost:5000 in browser Select an image(

Tiny demo site for exploring SameSite=Lax

samesite-lax-demo Background on my blog: Exploring the SameSite cookie attribute for preventing CSRF This repo holds some tools for exploring the impl

An extended version of the hotkeys demo code using action classes

An extended version of the hotkeys application using action classes. In adafruit's Hotkeys code, a macro is using a series of integers, assumed to be

Comments
  • Perf: :zap:  merge if you want to go faster but don't need float support

    Perf: :zap: merge if you want to go faster but don't need float support

    This moves away from the shared itertools implimentations for finding pairs, triplets of the input numbers that sum to a given target.

    Instead, we

    • 1st expose the underlying $~O^{dimensions}$ nested loops
    • trade some extra memory and some $O^{1}$ lookups to give us $~O^{dimensions-1}$
    • get a >= 170x speedup in our benchmarks

    However, we:

    • loose the ability to properly work with floating point input
      • the fast lookup uses hasing and hashing floats gets weird due to floating point equality
    • can't trivially extend to higher-dimension problems: 4-element-tuples etc.

    I've moved the float input tests out the their own file and away from the CI test path

    Performance benchmarks

    with these changes:

    $ make benchmark
    tests/performance_check.py ..                                                                                                                             [100%]
    
    -------------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------------
    Name (time in us)               Min                   Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_input1_pairs           22.1660 (1.0)        166.3790 (1.0)         23.6089 (1.0)        5.0170 (1.0)         23.0000 (1.0)        0.5123 (1.0)        87;521  42,356.8183 (1.0)        7677           1
    test_input1_triplets     1,994.8000 (89.99)    3,561.1120 (21.40)    2,152.1272 (91.16)    204.1428 (40.69)    2,033.1040 (88.40)    299.1878 (584.07)       41;4     464.6565 (0.01)        341           1
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    

    itertools, but non-float supporting version

    $ make benchmark
    tests/performance_check.py ..                                                                                                      [100%]
    
    ------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
    Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_input1_pairs          2.8727 (1.0)        4.2386 (1.0)        3.1265 (1.0)      0.1638 (1.0)        3.1067 (1.0)      0.1888 (1.0)          78;9  319.8414 (1.0)         326           1
    test_input1_triplets     211.6325 (73.67)    213.3950 (50.35)    212.4042 (67.94)    0.6555 (4.00)     212.2717 (68.33)    0.8081 (4.28)          2;0    4.7080 (0.01)          5           1
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    =========================================================== 2 passed in 3.59s ============================================================
    

    Note the change in units this branch is in microseconds, the itertools version is in milliseconds

    opened by lbillingham 0
Releases(v0.0.1)
  • v0.0.1(Feb 17, 2022)

    Initial release. Lib allows filtering by sum over pairs and triplets of numbers loaded from a local file. Plus a bonus CLI app that can be used for demoing the lib.

    Solution is itertools-y and rather slow (probably $O^{n}$ where pairs->n=2 and triplets->n=3).

    This is the version shown to MM

    Source code(tar.gz)
    Source code(zip)
Owner
Laurence Billingham
+ sustainable software + data science
Laurence Billingham
WriteAIr is a website which allows users to stream their writing.

WriteAIr is a website which allows users to stream their writing. It uses HSV masking to detect a pen which the user writes with. Plus, users can select a wide range of options through hand gestures!

Atharva Patil 1 Nov 01, 2021
A simple desktop application to scan and export Genshin Impact Artifacts.

「天目」 -- Amenoma 简体中文 | English 「天目流的诀窍就是滴水穿石的耐心和全力以赴的意志」 扫描背包中的圣遗物,并导出至 json 格式。之后可导入圣遗物分析工具( 莫娜占卜铺 、 MingyuLab 、 Genshin Optimizer 进行计算与规划等。 已支持 原神2.

夏至 475 Dec 30, 2022
Windows Task Manager with special features, written in Python.

Killer That damn Chrome ⬇ Download here · 👋 Join our discord Tired of trying to kill processes with the default Windows Task Manager? Selecting one b

Nathan Araújo 49 Jan 03, 2023
Import Apex legends mprt files exported from Legion

Apex-mprt-importer-for-Blender Import Apex legends mprt files exported from Legion. REQUIRES CAST IMPORTER Usage: Use a VPK extracter to extract the m

15 Dec 18, 2022
switching computer? changing your setup? You need to automate the download of your current setup? This is the right tool for you :incoming_envelope:

🔮 setup_shift(SS.py) switching computer? changing your setup? You need to automate the download of your current setup? This is the right tool for you

Mohamed Elfaleh 15 Aug 26, 2022
Airflow Operator for running Soda SQL scans

Airflow Operator for running Soda SQL scans

Todd de Quincey 7 Oct 18, 2022
Projeto job insights - Projeto avaliativo da Trybe do Bloco 32: Introdução à Python

Termos e acordos Ao iniciar este projeto, você concorda com as diretrizes do Código de Ética e Conduta e do Manual da Pessoa Estudante da Trybe. Boas

Lucas Muffato 1 Dec 09, 2021
Бэкапалка таблиц mysql 8 через брокер сообщений nats

nats-mysql-tables-backup Бэкап таблиц mysql 8 через брокер сообщений nats (проверено и работает в ubuntu 20.04, при наличии python 3.8) ПРИМЕРЫ: Ниже

Constantine 1 Dec 13, 2021
Clear merged pull requests ref (branch) on GitHub

GitHub PR Cleansing This tool is used to clear merged pull requests ref (branch) on GitHub. GitHub has no feature to auto delete branches on pull requ

Andi N. Dirgantara 12 Apr 19, 2022
A simple BrainF**k compiler written in Python

bf-comp A simple BrainF**k compiler written in Python. What else were you looking for?

1 Jan 09, 2022
Lookup for interesting stuff in SMB shares

SMBSR - what is that? Well, SMBSR is a python script which given a CIDR/IP/IP_file/HOSTNAME(s) enumerates all the SMB services listening (445) among t

Vincenzo 112 Dec 15, 2022
A simple python script where the user inputs the current ingredients they have in their kitchen into ingredients.txt

A simple python script where the user inputs the current ingredients they have in their kitchen into ingredients.txt and then runs the main.py script, and it will output what recipes can be created b

Jordan Leich 3 Nov 02, 2022
:snake: Complete C99 parser in pure Python

pycparser v2.20 Contents 1 Introduction 1.1 What is pycparser? 1.2 What is it good for? 1.3 Which version of C does pycparser support? 1.4 What gramma

Eli Bendersky 2.8k Dec 29, 2022
AlexaUsingPython - Alexa will pay attention to your order, as: Hello Alexa, play music, Hello Alexa

AlexaUsingPython - Alexa will pay attention to your order, as: Hello Alexa, play music, Hello Alexa, what's the time? Alexa will pay attention to your order, get it, and afterward do some activity as

Abubakar Sattar 10 Aug 18, 2022
Now you'll never be late for your Webinars or Meetings on the GoToWebinar Platform

GoToWebinar Launcher : Now you'll never be late for your Webinars or Meetings on the GoToWebinar Platform About Are you popular for always being late

Jay Thorat 6 Jun 07, 2022
An easy FASTA object handler, reader, writer and translator for small to medium size projects without dependencies.

miniFASTA An easy FASTA object handler, reader, writer and translator for small to medium size projects without dependencies. Installation Using pip /

Jules Kreuer 3 Jun 30, 2022
The tool helps to find hidden parameters that can be vulnerable or can reveal interesting functionality that other hunters miss.

The tool helps to find hidden parameters that can be vulnerable or can reveal interesting functionality that other hunters miss. Greater accuracy is achieved thanks to the line-by-line comparison of

197 Nov 14, 2022
This tool allows you to do goole dorking much easier

This tool allows you to do goole dorking much easier

Steven 8 Mar 06, 2022
Online HackerRank problem solving challenges

LinkedListHackerRank Online HackerRank problem solving challenges This challenge is part of a tutorial track by MyCodeSchool You are given the pointer

Sefineh Tesfa 1 Nov 21, 2021
Hopefully it'll become a very annoying desktop pet

AnnoyingPet Basic Tutorial: https://seebass22.github.io/python-desktop-pet-tutorial/ Handling Mouse Input: https://pythonhosted.org/pynput/mouse.html

1 Jun 08, 2022