Bayesian A/B testing

Overview

Tests Codecov PyPI

Bayesian A/B testing

bayesian_testing is a small package for a quick evaluation of A/B (or A/B/C/...) tests using Bayesian approach.

The package currently supports these data inputs:

  • binary data ([0, 1, 0, ...]) - convenient for conversion-like A/B testing
  • normal data with unknown variance - convenient for normal data A/B testing
  • delta-lognormal data (lognormal data with zeros) - convenient for revenue-like A/B testing

The core evaluation metric of the approach is Probability of Being Best (i.e. "being larger" from data point of view) which is calculated using simulations from posterior distributions (considering given data).

Installation

bayesian_testing can be installed using pip:

pip install bayesian_testing

Alternatively, you can clone the repository and use poetry manually:

cd bayesian_testing
pip install poetry
poetry install
poetry shell

Basic Usage

The primary features are BinaryDataTest, NormalDataTest and DeltaLognormalDataTest classes.

In all cases, there are two methods to insert data:

  • add_variant_data - adding raw data for a variant as a list of numbers (or numpy 1-D array)
  • add_variant_data_agg - adding aggregated variant data (this can be practical for large data, as the aggregation can be done on a database level)

Both methods for adding data are allowing specification of prior distribution using default parameters (see details in respective docstrings). Default prior setup should be sufficient for most of the cases (e.g. in cases with unknown priors or large amounts of data).

To get the results of the test, simply call method evaluate, or probabs_of_being_best for returning just the probabilities.

Probabilities of being best are approximated using simulations, hence evaluate can return slightly different values for different runs. To stabilize it, you can set sim_count parameter of evaluate to higher value (default value is 20K), or even use seed parameter to fix it completely.

BinaryDataTest

Class for Bayesian A/B test for binary-like data (e.g. conversions, successes, etc.).

import numpy as np
from bayesian_testing.experiments import BinaryDataTest

# generating some random data
rng = np.random.default_rng(52)
# random 1x1500 array of 0/1 data with 5.2% probability for 1:
data_a = rng.binomial(n=1, p=0.052, size=1500)
# random 1x1200 array of 0/1 data with 6.7% probability for 1:
data_b = rng.binomial(n=1, p=0.067, size=1200)

# initialize a test
test = BinaryDataTest()

# add variant using raw data (arrays of zeros and ones):
test.add_variant_data("A", data_a)
test.add_variant_data("B", data_b)
# priors can be specified like this (default for this test is a=b=1/2):
# test.add_variant_data("B", data_b, a_prior=1, b_prior=20)

# add variant using aggregated data (same as raw data with 950 zeros and 50 ones):
test.add_variant_data_agg("C", totals=1000, positives=50)

# evaluate test
test.evaluate()
[{'variant': 'A',
  'totals': 1500,
  'positives': 80,
  'conv_rate': 0.05333,
  'prob_being_best': 0.06625},
 {'variant': 'B',
  'totals': 1200,
  'positives': 80,
  'conv_rate': 0.06667,
  'prob_being_best': 0.89005},
 {'variant': 'C',
  'totals': 1000,
  'positives': 50,
  'conv_rate': 0.05,
  'prob_being_best': 0.0437}]

NormalDataTest

Class for Bayesian A/B test for normal data.

import numpy as np
from bayesian_testing.experiments import NormalDataTest

# generating some random data
rng = np.random.default_rng(21)
data_a = rng.normal(7.2, 2, 1000)
data_b = rng.normal(7.1, 2, 800)
data_c = rng.normal(7.0, 4, 500)

# initialize a test
test = NormalDataTest()

# add variant using raw data:
test.add_variant_data("A", data_a)
test.add_variant_data("B", data_b)
# test.add_variant_data("C", data_c)

# add variant using aggregated data:
test.add_variant_data_agg("C", len(data_c), sum(data_c), sum(np.square(data_c)))

# evaluate test
test.evaluate(sim_count=20000, seed=52)
[{'variant': 'A',
  'totals': 1000,
  'sum_values': 7294.67901,
  'avg_values': 7.29468,
  'prob_being_best': 0.1707},
 {'variant': 'B',
  'totals': 800,
  'sum_values': 5685.86168,
  'avg_values': 7.10733,
  'prob_being_best': 0.00125},
 {'variant': 'C',
  'totals': 500,
  'sum_values': 3736.91581,
  'avg_values': 7.47383,
  'prob_being_best': 0.82805}]

DeltaLognormalDataTest

Class for Bayesian A/B test for delta-lognormal data (log-normal with zeros). Delta-lognormal data is typical case of revenue per session data where many sessions have 0 revenue but non-zero values are positive numbers with possible log-normal distribution. To handle this data, the calculation is combining binary Bayes model for zero vs non-zero "conversions" and log-normal model for non-zero values.

0 for x in data_b), sum_values=sum(data_b), sum_logs=sum([np.log(x) for x in data_b if x > 0]), sum_logs_2=sum([np.square(np.log(x)) for x in data_b if x > 0]) ) test.evaluate(seed=21)">
import numpy as np
from bayesian_testing.experiments import DeltaLognormalDataTest

test = DeltaLognormalDataTest()

data_a = [7.1, 0.3, 5.9, 0, 1.3, 0.3, 0, 0, 0, 0, 0, 1.5, 2.2, 0, 4.9, 0, 0, 0, 0, 0]
data_b = [4.0, 0, 3.3, 19.3, 18.5, 0, 0, 0, 12.9, 0, 0, 0, 0, 0, 0, 0, 0, 3.7, 0, 0]

# adding variant using raw data
test.add_variant_data("A", data_a)

# alternatively, variant can be also added using aggregated data:
test.add_variant_data_agg(
    name="B",
    totals=len(data_b),
    positives=sum(x > 0 for x in data_b),
    sum_values=sum(data_b),
    sum_logs=sum([np.log(x) for x in data_b if x > 0]),
    sum_logs_2=sum([np.square(np.log(x)) for x in data_b if x > 0])
)

test.evaluate(seed=21)
[{'variant': 'A',
  'totals': 20,
  'positives': 8,
  'sum_values': 23.5,
  'avg_values': 1.175,
  'avg_positive_values': 2.9375,
  'prob_being_best': 0.18915},
 {'variant': 'B',
  'totals': 20,
  'positives': 6,
  'sum_values': 61.7,
  'avg_values': 3.085,
  'avg_positive_values': 10.28333,
  'prob_being_best': 0.81085}]

Development

To set up development environment use Poetry and pre-commit:

pip install poetry
poetry install
poetry run pre-commit install

Roadmap

Test classes to be added:

  • PoissonDataTest
  • ExponentialDataTest

Metrics to be added:

  • Expected Loss
  • Potential Value Remaining

References

You might also like...
Language-agnostic HTTP API Testing Tool
Language-agnostic HTTP API Testing Tool

Dredd — HTTP API Testing Framework Dredd is a language-agnostic command-line tool for validating API description document against backend implementati

Web testing library for Robot Framework

SeleniumLibrary Contents Introduction Keyword Documentation Installation Browser drivers Usage Extending SeleniumLibrary Community Versions History In

✅ Python web automation and testing. 🚀 Fast, easy, reliable. 💠
✅ Python web automation and testing. 🚀 Fast, easy, reliable. 💠

Build fast, reliable, end-to-end tests. SeleniumBase is a Python framework for web automation, end-to-end testing, and more. Tests are run with "pytes

A command-line tool and Python library and Pytest plugin for automated testing of RESTful APIs, with a simple, concise and flexible YAML-based syntax

1.0 Release See here for details about breaking changes with the upcoming 1.0 release: https://github.com/taverntesting/tavern/issues/495 Easier API t

One-stop solution for HTTP(S) testing.
One-stop solution for HTTP(S) testing.

HttpRunner HttpRunner is a simple & elegant, yet powerful HTTP(S) testing framework. Enjoy! ✨ 🚀 ✨ Design Philosophy Convention over configuration ROI

Declarative HTTP Testing for Python and anything else

Gabbi Release Notes Gabbi is a tool for running HTTP tests where requests and responses are represented in a declarative YAML-based form. The simplest

A modern API testing tool for web applications built with Open API and GraphQL specifications.
A modern API testing tool for web applications built with Open API and GraphQL specifications.

Schemathesis Schemathesis is a modern API testing tool for web applications built with Open API and GraphQL specifications. It reads the application s

A framework-agnostic library for testing ASGI web applications

async-asgi-testclient Async ASGI TestClient is a library for testing web applications that implements ASGI specification (version 2 and 3). The motiva

A Modular Penetration Testing Framework
A Modular Penetration Testing Framework

fsociety A Modular Penetration Testing Framework Install pip install fsociety Update pip install --upgrade fsociety Usage usage: fsociety [-h] [-i] [-

Comments
  • Results are different from online tool

    Results are different from online tool

    Hi,

    I tested your library and cross-checked against this online calculator: Here is the result from your library:

    [{'variant': 'True True True False False False False',
      'totals': 1172,
      'positives': 461,
      'positive_rate': 0.39334,
      'prob_being_best': 0.7422,
      'expected_loss': 0.0582635},
     {'variant': 'False True True False False False False',
      'totals': 222,
      'positives': 27,
      'positive_rate': 0.12162,
      'prob_being_best': 0.0,
      'expected_loss': 0.3280173},
     {'variant': 'False False True False False False False',
      'totals': 1363,
      'positives': 63,
      'positive_rate': 0.04622,
      'prob_being_best': 0.0,
      'expected_loss': 0.4051768},
     {'variant': 'False False False False False False False',
      'totals': 1052,
      'positives': 0,
      'positive_rate': 0.0,
      'prob_being_best': 0.0,
      'expected_loss': 0.4512031},
     {'variant': 'True False True False False False False',
      'totals': 1,
      'positives': 0,
      'positive_rate': 0.0,
      'prob_being_best': 0.2578,
      'expected_loss': 0.1997566}]
    

    So the best variant has 74% probability to be the winner. On the online calculator it is 63.48% instead (last variant is 36.52% instead of 25.78%).

    I used the BinaryDataTest() without any priors.

    I did not dig deeper on what might be right here, but wanted to drop this as feedback.

    opened by ThomasMeissnerDS 6
  • Minimum sample size

    Minimum sample size

    First, this package is great! I wanted to know if the probability estimates rely on a minimum sample size or how one might go about determining minimum sample size for a Binary test, for example.

    opened by abrunner94 5
  • Bump jupyter-server from 1.13.5 to 1.15.4

    Bump jupyter-server from 1.13.5 to 1.15.4

    Bumps jupyter-server from 1.13.5 to 1.15.4.

    Release notes

    Sourced from jupyter-server's releases.

    v1.15.3

    1.15.3

    (Full Changelog)

    Bugs fixed

    Maintenance and upkeep improvements

    Contributors to this release

    (GitHub contributors page for this release)

    @​blink1073 | @​codecov-commenter | @​minrk

    v1.15.2

    1.15.2

    (Full Changelog)

    Bugs fixed

    Maintenance and upkeep improvements

    Contributors to this release

    (GitHub contributors page for this release)

    @​blink1073 | @​minrk | @​Zsailer

    v1.15.1

    1.15.1

    (Full Changelog)

    ... (truncated)

    Changelog

    Sourced from jupyter-server's changelog.

    Changelog

    All notable changes to this project will be documented in this file.

    1.16.0

    (Full Changelog)

    New features added

    Enhancements made

    Bugs fixed

    Maintenance and upkeep improvements

    Other merged PRs

    Contributors to this release

    (GitHub contributors page for this release)

    @​andreyvelich | @​blink1073 | @​codecov-commenter | @​divyansshhh | @​dleen | @​fcollonval | @​jhamet93 | @​meeseeksdev | @​minrk | @​rccern | @​welcome | @​Zsailer

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 2
Releases(v0.5.3)
Owner
Matus Baniar
Data data data
Matus Baniar
Mixer -- Is a fixtures replacement. Supported Django, Flask, SqlAlchemy and custom python objects.

The Mixer is a helper to generate instances of Django or SQLAlchemy models. It's useful for testing and fixture replacement. Fast and convenient test-

Kirill Klenov 871 Dec 25, 2022
A framework-agnostic library for testing ASGI web applications

async-asgi-testclient Async ASGI TestClient is a library for testing web applications that implements ASGI specification (version 2 and 3). The motiva

122 Nov 22, 2022
DUCKSPLOIT - Windows Hacking FrameWork using Reverse Shell

Ducksploit Install Ducksploit Hacker setup raspberry pico Download https://githu

2 Jan 31, 2022
UUM Merit Form Filler is a web automation which helps automate entering a matric number to the UUM system in order for participants to obtain a merit

About UUM Merit Form Filler UUM Merit Form Filler is a web automation which helps automate entering a matric number to the UUM system in order for par

Ilham Rachmat 3 May 31, 2022
Pytest-typechecker - Pytest plugin to test how type checkers respond to code

pytest-typechecker this is a plugin for pytest that allows you to create tests t

vivax 2 Aug 20, 2022
A pure Python script to easily get a reverse shell

easy-shell A pure Python script to easily get a reverse shell. How it works? After sending a request, it generates a payload with different commands a

Cristian Souza 48 Dec 12, 2022
Descriptor Vector Exchange

Descriptor Vector Exchange This repo provides code for learning dense landmarks without supervision. Our approach is described in the ICCV 2019 paper

James Thewlis 74 Nov 29, 2022
Active Directory Penetration Testing methods with simulations

AD penetration Testing Project By Ruben Enkaoua - GL4Di4T0R Based on the TCM PEH course (Heath Adams) Index 1 - Setting Up the Lab Intallation of a Wi

GL4DI4T0R 3 Aug 12, 2021
Um scraper feito em python que gera arquivos de excel baseados nas tier lists do site LoLalytics.

LoLalytics-scraper Um scraper feito em python que gera arquivos de excel baseados nas tier lists do site LoLalytics. Começando por um único script com

Kevin Souza 1 Feb 19, 2022
Penetration testing

Penetration testing

3 Jan 11, 2022
A mocking library for requests

httmock A mocking library for requests for Python 2.7 and 3.4+. Installation pip install httmock Or, if you are a Gentoo user: emerge dev-python/httm

Patryk Zawadzki 452 Dec 28, 2022
Webscreener is a tool for mass web domains pentesting.

Webscreener is a tool for mass web domains pentesting. It is used to take snapshots for domains that is generated by a tool like knockpy or Sublist3r. It cuts out most of the pentesting time by scree

Seekurity 3 Jun 07, 2021
It helps to use fixtures in pytest.mark.parametrize

pytest-lazy-fixture Use your fixtures in @pytest.mark.parametrize. Installation pip install pytest-lazy-fixture Usage import pytest @pytest.fixture(p

Marsel Zaripov 299 Dec 24, 2022
A Proof of concept of a modern python CLI with click, pydantic, rich and anyio

httpcli This project is a proof of concept of a modern python networking cli which can be simple and easy to maintain using some of the best packages

Kevin Tewouda 17 Nov 15, 2022
frwk_51pwn is an open-sourced remote vulnerability testing and proof-of-concept development framework

frwk_51pwn Legal Disclaimer Usage of frwk_51pwn for attacking targets without prior mutual consent is illegal. frwk_51pwn is for security testing purp

51pwn 4 Apr 24, 2022
Auto Click by pyautogui and excel operations.

Auto Click by pyautogui and excel operations.

Janney 2 Dec 21, 2021
Auto-hms-action - Automation of NU Health Management System

🦾 Automation of NU Health Management System 🤖 長崎大学 健康管理システムの自動化 🏯 Usage / 使い方

k5-mot 3 Mar 04, 2022
masscan + nmap 快速端口存活检测和服务识别

masnmap masscan + nmap 快速端口存活检测和服务识别。 思路很简单,将masscan在端口探测的高速和nmap服务探测的准确性结合起来,达到一种相对比较理想的效果。 先使用masscan以较高速率对ip存活端口进行探测,再以多进程的方式,使用nmap对开放的端口进行服务探测。 安

starnightcyber 75 Dec 19, 2022
Pytest modified env

Pytest plugin to fail a test if it leaves modified os.environ afterwards.

wemake.services 7 Sep 11, 2022
Browser reload with uvicorn

uvicorn-browser This project is inspired by autoreload. Installation pip install uvicorn-browser Usage Run uvicorn-browser --help to see all options.

Marcelo Trylesinski 64 Dec 17, 2022