Python virtual filesystem for SQLite to read from and write to S3

Last update: Jan 04, 2023

Related tags

Overview

sqlite-s3vfs

Python virtual filesystem for SQLite to read from and write to S3.

No locking is performed, so client code must ensure that writes do not overlap with other writes or reads. If multiple writes happen at the same time, the database will probably become corrupt and data be lost.

Inspired by phiresky's sql.js-httpvfs, dacort's Stack Overflow answer, and sqlite-s3-query.

How does it work?

sqlite-s3vfs stores the SQLite database in fixed-sized blocks, and each is stored as a separate object in S3. SQLite stores its data in fixed-size pages, and always writes exactly a page at a time. This virtual filesystem translates page reads and writes to block reads and writes. In the case of SQLite pages being the same size as blocks, which is the case by default, each page write results in exactly one block write.

Separate objects are required since S3 does not support the partial replace of an object; to change even 1 byte, it must be re-uploaded in full.

Installation

sqlite-s3vfs depends on APSW, which is not officially available on PyPI, but can be installed directly from GitHub.

pip install sqlite-s3vfs
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--sqlite --global-option=build --global-option=--enable-all-extensions

Usage

sqlite-s3vfs is an APSW virtual filesystem that requires boto3 for its communication with S3.

import apsw
import boto3
import sqlite_s3vfs

# A boto3 bucket resource
bucket = boto3.Session().resource('s3').Bucket('my-bucket')

# An S3VFS for that bucket
s3vfs = sqlite_s3vfs.S3VFS(bucket=bucket)

# sqlite-s3vfs stores many objects under this prefix
# Note that it's not typical to start a key prefix with '/'
key_prefix = 'my/path/cool.sqlite'

# Connect, insert data, and query
with apsw.Connection(key_prefix, vfs=s3vfs.name) as db:
    cursor = db.cursor()
    cursor.execute(f'''
        CREATE TABLE foo(x,y);
        INSERT INTO foo VALUES(1,2);
    ''')
    cursor.execute('SELECT * FROM foo;')
    print(cursor.fetchall())

See the APSW documentation for more examples.

Serializing (getting a regular SQLite file out of the VFS)

The bytes corresponding to a regular SQLite file can be extracted with the serialize_iter function, which returns an iterable,

for chunk in s3vfs.serialize_iter(key_prefix=key_prefix):
    print(chunk)

or with serialize_fileobj, which returns a non-seekable file-like object. This can be passed to Boto3's upload_fileobj method to upload a regular SQLite file to S3.

target_obj = boto3.Session().resource('s3').Bucket('my-target-bucket').Object('target/cool.sqlite')
target_obj.upload_fileobj(s3vfs.serialize_fileobj(key_prefix=key_prefix))

Deserializing (getting a regular SQLite file into the VFS)

# Any iterable that yields bytes can be used. In this example, bytes come from
# a regular SQLite file already in S3
source_obj = boto3.Session().resource('s3').Bucket('my-source-bucket').Object('source/cool.sqlite')
bytes_iter = source_obj.get()['Body'].iter_chunks()

s3vfs.deserialize_iter(key_prefix='my/path/cool.sqlite', bytes_iter=bytes_iter)

Tests

The tests require the dev dependencies and APSW to installed, and MinIO started

pip install -r requirements-dev.txt
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--all --global-option=build --global-option=--enable-all-extensions
./start-minio.sh

can be run with pytest

pytest

and finally Minio stopped

./stop-minio.sh

Python virtual filesystem for SQLite to read from and write to S3

Related tags

Overview

sqlite-s3vfs

How does it work?

Installation

Usage

Serializing (getting a regular SQLite file out of the VFS)

Deserializing (getting a regular SQLite file into the VFS)

Tests

Owner

Department for International Trade

Transforme rapidamente seu arquivo CSV (de qualquer tamanho) para SQL de forma rápida.

useful files for the Freenove Big Hexapod

Various technical documentation, in electronically parseable format

A simple Python code that takes input from a csv file and makes it into a vcf file.

This is a file deletion program that asks you for an extension of a file (.mp3, .pdf, .docx, etc.) to delete all of the files in a dir that have that extension.

Two scripts help you to convert csv file to md file by template

Simple addon to create folder structures in blender.

BREP : Binary Search in plaintext and gzip files

Search for files under the specified directory. Extract the file name and file path and import them as data.

Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files

A python wrapper for libmagic

Copy only text-like files from the folder

fast change directory with python and ruby

Nmap XML output to CSV and HTTP/HTTPS URLS.

Pti-file-format - Reverse engineering the Polyend Tracker instrument file format

Ini adalah program python untuk mengubah background foto dalam 1 folder, tidak perlu satu satu

ZipFly is a zip archive generator based on zipfile.py

Python module that parse power builder file (PBD) and analyze code

A simple bulk file renamer, written in python.

A simple tool to find and replace all the matches of a regular expression in file(s).