Parquet benchmarks

This repository contains a set of benchmarks comparing implementations that convert between Parquet (storage format) and Arrow (in-memory format).

The results on Azure's Standard D4s v3 (4 vcpus, 16 GiB memory) are available here.

Read uncompressed

(Note: neither pyarrow nor arrow validates UTF-8, which can result in undefined behavior.)

Read compressed (snappy)

(Note: neither pyarrow nor arrow validates UTF-8, which can result in undefined behavior.)

Write uncompressed

Write compressed (snappy)

(Note: neither pyarrow nor arrow validates UTF-8, which can result in undefined behavior.)

Run benchmarks

To reproduce the results, run:

python3 -m venv venv
venv/bin/pip install -U pip
venv/bin/pip install pyarrow

# create files
venv/bin/python write_parquet.py

# run benchmarks
venv/bin/python run.py

# print results to stdout as csv
venv/bin/python summarize.py

Details

The benchmark reads a single column from a file that has been pre-loaded into memory, then decompresses and deserializes that column into an Arrow array.

The benchmark includes different configurations:

- dictionary-encoded vs plain encoding
- single page vs multiple pages
- compressed vs uncompressed
- different types:
  - i64
  - bool
  - utf8
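The options above can be pictured as a grid. The sketch below enumerates it and maps each combination to keyword arguments that `pyarrow.parquet.write_table` accepts (`use_dictionary`, `compression`, `data_page_size`). It assumes the options are combined as a full cross product, which the text does not state, and the page-size values are arbitrary stand-ins for "single page" and "multiple pages". The function name `write_options` is hypothetical.

```python
from itertools import product

ENCODINGS = ("dictionary", "plain")
PAGES = ("single", "multiple")
COMPRESSIONS = ("snappy", "uncompressed")
TYPES = ("i64", "bool", "utf8")


def write_options(encoding: str, pages: str, compression: str) -> dict:
    """Map one configuration to keyword arguments for pyarrow.parquet.write_table.

    Assumption: a very large data_page_size approximates a single page and a
    very small one forces multiple pages; the exact values are illustrative.
    """
    return {
        "use_dictionary": encoding == "dictionary",
        "compression": "SNAPPY" if compression == "snappy" else "NONE",
        "data_page_size": 2**30 if pages == "single" else 2**10,
    }


# Full cross product of the listed options (an assumption, see above).
configurations = list(product(ENCODINGS, PAGES, COMPRESSIONS, TYPES))
print(len(configurations))
for encoding, pages, compression, dtype in configurations[:3]:
    print(dtype, write_options(encoding, pages, compression))
```

The resulting dictionary could be passed as `pq.write_table(table, path, **options)` when generating the input files.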
