Python PostgreSQL adapter to stream results of multi-statement queries without a server-side cursor

Overview

streampq

Stream results of multi-statement PostgreSQL queries from Python without server-side cursors. It has the following benefits over some other Python PostgreSQL libraries:

  • Streams results from complex multi-statement queries even though PostgreSQL doesn't allow server-side cursors for such queries. This makes it suitable for result sets that are too large to fit in memory.

  • CTRL+C (SIGINT) by default behaves as expected even during slow queries - a KeyboardInterrupt is raised and quickly bubbles up through streampq code. Unless client code prevents it, the program will exit.

  • Every effort is made to cancel queries on KeyboardInterrupt, SystemExit, or errors - the server doesn't continue needlessly using resources.

Particularly useful when temporary tables are needed to store intermediate results in multi-statement SQL scripts.

Installation

pip install streampq

The libpq binary library is also required. This is typically either already installed, or installed by:

  • macOS + brew: brew install libpq
  • Linux (Debian): apt install libpq5
  • Linux (Red Hat): yum install postgresql-libs

The only runtime dependencies are libpq and Python itself.

Usage

from streampq import streampq_connect

# libpq connection parameters
# https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
#
# Any can be omitted and environment variables will be used instead
# https://www.postgresql.org/docs/current/libpq-envars.html
connection_params = (
    ('host', 'localhost'),
    ('port', '5432'),
    ('dbname', 'postgres'),
    ('user', 'postgres'),
    ('password', 'password'),
)

# SQL statement(s) - if more than one, separate by ;
sql = '''
    SELECT * FROM my_table;
    SELECT * FROM my_other_table;
'''

# Connection and querying is via a context manager
with streampq_connect(connection_params) as query:
    for (columns, rows) in query(sql):
        print(columns)  # Tuple of column names
        for row in rows:
            print(row)  # Tuple of row values

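Since each statement yields its own (columns, rows) pair, the results can be processed lazily, one row at a time. The following is a minimal sketch under stated assumptions: rows_as_dicts is a hypothetical helper, not part of streampq, and the stand-in data only imitates the shape of what query(sql) returns so that the example runs without a database.

```python
def rows_as_dicts(results):
    """Lazily yield (statement_index, row_dict) from an iterable of
    (columns, rows) pairs, such as the one returned by query(sql)."""
    for index, (columns, rows) in enumerate(results):
        for row in rows:
            # Pair each column name with the value in the same position
            yield index, dict(zip(columns, row))


# Stand-in data shaped like the output of query(sql).
# With a real database, pass query(sql) instead.
fake_results = (
    (('id', 'name'), iter(((1, 'a'), (2, 'b')))),
    (('total',), iter(((3,),))),
)

for index, row in rows_as_dicts(fake_results):
    print(index, row)
```

Because the helper is a generator, it never holds more than one row in memory at a time, which keeps the streaming behaviour intact.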
PostgreSQL types to Python type decoding

There are 164 built-in PostgreSQL data types (including array types), and streampq converts them to Python types. In summary:

| PostgreSQL types | Python type |
| --- | --- |
| null | None |
| text (e.g. varchar), xml, network addresses, and money | str |
| byte (e.g. bytea) | bytes |
| integer (e.g. int4) | int |
| inexact real number (e.g. float4) | float |
| exact real number (e.g. numeric) | Decimal |
| date | date |
| timestamp | datetime (without timezone) |
| timestamptz | datetime (with offset timezone) |
| json and jsonb | output of json.loads |
| interval | streampq.Interval |
| range (e.g. daterange) | streampq.Range |
| multirange (e.g. datemultirange) | tuples of streampq.Range |
| arrays and vectors | tuple (of any of the above types, or of nested tuples) |

To customise these, override the default value of the get_decoders parameter of the streampq_connect function in streampq.py.

In general, built-in types are preferred over custom types, and immutable types are preferred over mutable.

streampq.Interval

The Python built-in timedelta type is not used for PostgreSQL interval since timedelta does not offer a way to store PostgreSQL intervals of years or months, other than converting to days which would be a loss of information.

Instead, a namedtuple is defined, streampq.Interval, with members:

| Member | Type |
| --- | --- |
| years | int |
| months | int |
| days | int |
| hours | int |
| minutes | int |
| seconds | Decimal |
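To illustrate why a dedicated type is needed, here is a small sketch. It assumes a local stand-in namedtuple with the same members as listed above, rather than importing streampq.Interval itself, so that it runs without the library installed.

```python
from collections import namedtuple
from datetime import timedelta
from decimal import Decimal

# Local stand-in with the same members as documented for streampq.Interval
Interval = namedtuple(
    'Interval', ['years', 'months', 'days', 'hours', 'minutes', 'seconds'])

one_month = Interval(
    years=0, months=1, days=0, hours=0, minutes=0, seconds=Decimal('0'))
thirty_days = Interval(
    years=0, months=0, days=30, hours=0, minutes=0, seconds=Decimal('0'))

# The two intervals stay distinct; converting months to days would merge them
print(one_month == thirty_days)  # False

# timedelta accepts weeks, days, hours and smaller units, but not months
try:
    timedelta(months=1)
except TypeError as error:
    print('timedelta cannot represent months:', error)
```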

streampq.Range

There is no Python built-in type for a PostgreSQL range. So for these, a namedtuple is defined, streampq.Range, with members:

| Member | Type |
| --- | --- |
| lower | int, date, datetime (without timezone), or datetime (with offset timezone) |
| upper | int, date, datetime (without timezone), or datetime (with offset timezone) |
| bounds | str - one of (), (], [), or [] |
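The bounds string records whether each end of the range is inclusive or exclusive. As a rough sketch of how client code might interpret it, the example below uses a local stand-in namedtuple with the documented members and a hypothetical range_contains helper, which is not part of streampq. Treating a None bound as unbounded is an assumption made here for illustration.

```python
from collections import namedtuple
from datetime import date

# Local stand-in with the same members as documented for streampq.Range
Range = namedtuple('Range', ['lower', 'upper', 'bounds'])


def range_contains(r, value):
    """Hypothetical helper: is value inside the range r?

    Assumption: a bound of None means that side is unbounded.
    """
    lower_inclusive = r.bounds[0] == '['
    upper_inclusive = r.bounds[1] == ']'
    lower_ok = r.lower is None or (
        value >= r.lower if lower_inclusive else value > r.lower)
    upper_ok = r.upper is None or (
        value <= r.upper if upper_inclusive else value < r.upper)
    return lower_ok and upper_ok


january = Range(lower=date(2020, 1, 1), upper=date(2020, 2, 1), bounds='[)')
print(range_contains(january, date(2020, 1, 1)))  # True: lower is inclusive
print(range_contains(january, date(2020, 2, 1)))  # False: upper is exclusive
```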

Bind parameters - literals

Dynamic SQL literals can be bound using the literals parameter of the query function. It must be an iterable of key-value pairs.

sql = '''
    SELECT * FROM my_table WHERE my_col = {my_col_value};
'''

with streampq_connect(connection_params) as query:
    for (columns, rows) in query(sql, literals=(
        ('my_col_value', 'my-value'),
    )):
        for row in rows:
            pass

Bind parameters - identifiers

Dynamic SQL identifiers, e.g. column names, can be bound using the identifiers parameter of the query function. It must be an iterable of key-value pairs.

sql = '''
    SELECT * FROM my_table WHERE {column_name} = 'my-value';
'''

with streampq_connect(connection_params) as query:
    for (columns, rows) in query(sql, identifiers=(
        ('column_name', 'my_col'),
    )):
        for row in rows:
            pass

Identifiers and literals use different escaping rules - hence the need for 2 different parameters.
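The difference can be sketched as follows. This is a simplified illustration of standard SQL quoting, not streampq's actual implementation: the two function names are hypothetical, only str values are handled, and the literal version assumes the server setting standard_conforming_strings is on so that backslashes need no special treatment. Real code should rely on the literals and identifiers parameters rather than on hand-written escaping.

```python
def escape_literal(value):
    # Literals are wrapped in single quotes; embedded single quotes are doubled
    return "'" + value.replace("'", "''") + "'"


def escape_identifier(value):
    # Identifiers are wrapped in double quotes; embedded double quotes are doubled
    return '"' + value.replace('"', '""') + '"'


print(escape_literal("O'Brien"))      # 'O''Brien'
print(escape_identifier('my "col"'))  # "my ""col"""
```

Using the wrong rule changes the meaning of the SQL: a value in double quotes is read as a column or table name, while a name in single quotes is read as a string constant.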

Single-statement SQL queries

While this library is specialised for multi-statement queries, it works fine when there is only one. In this case the iterable returned from the query function yields only a single (columns, rows) pair.

Exceptions

Exceptions derive from streampq.StreamPQError. If there is any more information available on the error, it's added as a string in its args property. This is included in the string representation of the exception by default.

Exception hierarchy

  • StreamPQError

    Base class for all explicitly raised exceptions

    • ConnectionError

      An error occurred while attempting to connect to the database.

    • QueryError

      An error occurred while attempting to run a query. Typically this is due to a syntax error or a missing column.

    • CancelError

      An error occurred while attempting to cancel a query.

    • CommunicationError

      An error occurred communicating with the database after successful connection.
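Because every exception derives from StreamPQError, client code can handle a specific failure first and fall back to the base class for the rest. The sketch below uses local stand-in classes that mirror part of the documented hierarchy so that it runs without a database; real code would import the classes from streampq. The run_safely and bad_query functions are hypothetical.

```python
# Stand-ins mirroring part of the documented hierarchy.
# Real code would import these from streampq instead.
class StreamPQError(Exception):
    pass


class QueryError(StreamPQError):
    pass


class CommunicationError(StreamPQError):
    pass


def run_safely(action):
    """Hypothetical wrapper: handle the specific case, then the base class."""
    try:
        return action()
    except QueryError as error:
        # Typically a problem with the SQL itself
        return f'query failed: {error}'
    except StreamPQError as error:
        # Any other explicitly raised streampq error
        return f'other streampq failure: {error}'


def bad_query():
    # Simulates a failure; the message text is made up for illustration
    raise QueryError('column "missing" does not exist')


print(run_safely(bad_query))
```

Note that Python has a built-in exception that is also called ConnectionError, so importing the streampq class by its bare name would shadow the built-in in that module. Referring to it as streampq.ConnectionError avoids the ambiguity.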

Owner
Department for International Trade