This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

Overview

Parquet benchmarks

This repository contains a set of benchmarks of different implementations of Parquet (storage format) <-> Arrow (in-memory format).

The results on Azure's Standard D4s v3 (4 vcpus, 16 GiB memory) are available here.

Read uncompressed

read uncompressed i64

read uncompressed bool

read uncompressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

read uncompressed dict utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Read compressed (snappy)

read compressed i64

read compressed bool

read compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Write uncompressed

write uncompressed i64

write uncompressed bool

write uncompressed utf8

Write compressed (snappy)

write compressed i64

write compressed bool

write compressed utf8

(Note: neither pyarrow nor arrow validate utf8, which can result in undefined behavior.)

Run benchmarks

To reproduce, use

python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install pyarrow

# create files
venv/bin/python write_parquet.py

# run benchmarks
venv/bin/python run.py

# print results to stdout as csv
venv/bin/python summarize.py

Details

The benchmark reads a single column from a file pre-loaded into memory, decompresses and deserializes the column to an arrow array.

The benchmark includes different configurations:

  • dictionary-encoded vs plain encoding
  • single page vs multiple pages
  • compressed vs uncompressed
  • different types:
    • i64
    • bool
    • utf8
A pytest plugin to skip `@pytest.mark.slow` tests by default.

pytest-skip-slow A pytest plugin to skip @pytest.mark.slow tests by default. Include the slow tests with --slow. Installation $ pip install pytest-ski

Brian Okken 19 Jan 04, 2023
pytest_pyramid provides basic fixtures for testing pyramid applications with pytest test suite

pytest_pyramid pytest_pyramid provides basic fixtures for testing pyramid applications with pytest test suite. By default, pytest_pyramid will create

Grzegorz Śliwiński 12 Dec 04, 2022
Avocado is a set of tools and libraries to help with automated testing.

Welcome to Avocado Avocado is a set of tools and libraries to help with automated testing. One can call it a test framework with benefits. Native test

Ana Guerrero Lopez 1 Nov 19, 2021
Automação de Processos (obtenção de informações com o Selenium), atualização de Planilha e Envio de E-mail.

Automação de Processo: Código para acompanhar o valor de algumas ações na B3. O código entra no Google Drive, puxa os valores das ações (pré estabelec

Hemili Beatriz 1 Jan 08, 2022
Sixpack is a language-agnostic a/b-testing framework

Sixpack Sixpack is a framework to enable A/B testing across multiple programming languages. It does this by exposing a simple API for client libraries

1.7k Dec 24, 2022
A small automated test structure using python to test *.cpp codes

Get Started Insert C++ Codes Add Test Code Run Test Samples Check Coverages Insert C++ Codes you can easily add c++ files in /inputs directory there i

Alireza Zahiri 2 Aug 03, 2022
Test django schema and data migrations, including migrations' order and best practices.

django-test-migrations Features Allows to test django schema and data migrations Allows to test both forward and rollback migrations Allows to test th

wemake.services 382 Dec 27, 2022
Screenplay pattern base for Python automated UI test suites.

ScreenPy TITLE CARD: "ScreenPy" TITLE DISAPPEARS.

Perry Goy 39 Nov 15, 2022
One-stop solution for HTTP(S) testing.

HttpRunner HttpRunner is a simple & elegant, yet powerful HTTP(S) testing framework. Enjoy! ✨ 🚀 ✨ Design Philosophy Convention over configuration ROI

HttpRunner 3.5k Jan 04, 2023
a wrapper around pytest for executing tests to look for test flakiness and runtime regression

bubblewrap a wrapper around pytest for assessing flakiness and runtime regressions a cs implementations practice project How to Run: First, install de

Anna Nagy 1 Aug 05, 2021
A web scraping using Selenium Webdriver

Savee - Images Downloader Project using Selenium Webdriver to download images from someone's profile on https:www.savee.it website. Usage The project

Caio Eduardo Lobo 1 Dec 17, 2021
A library to make concurrent selenium tests that automatically download and setup webdrivers

AutoParaSelenium A library to make parallel selenium tests that automatically download and setup webdrivers Usage Installation pip install autoparasel

Ronak Badhe 8 Mar 13, 2022
Airspeed Velocity: A simple Python benchmarking tool with web-based reporting

airspeed velocity airspeed velocity (asv) is a tool for benchmarking Python packages over their lifetime. It is primarily designed to benchmark a sing

745 Dec 28, 2022
Flexible test automation for Python

Nox - Flexible test automation for Python nox is a command-line tool that automates testing in multiple Python environments, similar to tox. Unlike to

Stargirl Flowers 941 Jan 03, 2023
Aioresponses is a helper for mock/fake web requests in python aiohttp package.

aioresponses Aioresponses is a helper to mock/fake web requests in python aiohttp package. For requests module there are a lot of packages that help u

402 Jan 06, 2023
Testing Calculations in Python, using OOP (Object-Oriented Programming)

Testing Calculations in Python, using OOP (Object-Oriented Programming) Create environment with venv python3 -m venv venv Activate environment . venv

William Koller 1 Nov 11, 2021
A simple script to login into twitter using Selenium in python.

Quick Talk A simple script to login into twitter using Selenium in python. I was looking for a way to login into twitter using Selenium in python. Sin

Lzy-slh 4 Nov 20, 2022
Baseball Discord bot that can post up-to-date scores, lineups, and home runs.

Sunny Day Discord Bot Baseball Discord bot that can post up-to-date scores, lineups, and home runs. Uses webscraping techniques to scrape baseball dat

Benjamin Hammack 1 Jun 20, 2022
Command line driven CI frontend and development task automation tool.

tox automation project Command line driven CI frontend and development task automation tool At its core tox provides a convenient way to run arbitrary

tox development team 3.1k Jan 04, 2023
create custom test databases that are populated with fake data

About Generate fake but valid data filled databases for test purposes using most popular patterns(AFAIK). Current support is sqlite, mysql, postgresql

Emir Ozer 2.2k Jan 04, 2023