A Python DB-API and SQLAlchemy dialect to Google Spreasheets

Overview

Build Status codecov

Note: shillelagh is a drop-in replacement for gsheets-db-api, with many additional features. You should use it instead. If you're using SQLAlchemy all you need to do:

$ pip uninstall gsheetsdb
$ pip install shillelagh

If you're using the DB API:

# from gsheetsdb import connect
from shillelagh.backends.apsw.db import connect

A Python DB API 2.0 for Google Spreadsheets

This module allows you to query Google Spreadsheets using SQL.

Using this spreadsheet as an example:

A B
1 country cnt
2 BR 1
3 BR 3
4 IN 5

Here's a simple query using the Python API:

from gsheetsdb import connect

conn = connect()
result = conn.execute("""
    SELECT
        country
      , SUM(cnt)
    FROM
        "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
    GROUP BY
        country
""", headers=1)
for row in result:
    print(row)

This will print:

Row(country='BR', sum_cnt=4.0)
Row(country='IN', sum_cnt=5.0)

How it works

Transpiling

Google spreadsheets can actually be queried with a very limited SQL API. This module will transpile the SQL query into a simpler query that the API understands. Eg, the query above would be translated to:

SELECT A, SUM(B) GROUP BY A

Processors

In addition to transpiling, this module also provides pre- and post-processors. The pre-processors add more columns to the query, and the post-processors build the actual result from those extra columns. Eg, COUNT(*) is not supported, so the following query:

SELECT COUNT(*) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"

Gets translated to:

SELECT COUNT(A), COUNT(B)

And then the maximum count is returned. This assumes that at least one column has no NULLs.

SQLite

When a query can't be expressed, the module will issue a SELECT *, load the data into an in-memory SQLite table, and execute the query in SQLite. This is obviously inneficient, since all data has to be downloaded, but ensures that all queries succeed.

Installation

$ pip install gsheetsdb
$ pip install gsheetsdb[cli]         # if you want to use the CLI
$ pip install gsheetsdb[sqlalchemy]  # if you want to use it with SQLAlchemy

CLI

The module will install an executable called gsheetsdb:

$ gsheetsdb --headers=1
> SELECT * FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
country      cnt
---------  -----
BR             1
BR             3
IN             5
> SELECT country, SUM(cnt) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1
pscv8ZXPtg8/" GROUP BY country
country      sum cnt
---------  ---------
BR                 4
IN                 5
>

SQLAlchemy support

This module provides a SQLAlchemy dialect. You don't need to specify a URL, since the spreadsheet is extracted from the FROM clause:

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('gsheets://')
inspector = inspect(engine)

table = Table(
    'https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0',
    MetaData(bind=engine),
    autoload=True)
query = select([func.count(table.columns.country)], from_obj=table)
print(query.scalar())  # prints 3.0

Alternatively, you can initialize the engine with a "catalog". The catalog is a Google spreadsheet where each row points to another Google spreadsheet, with URL, number of headers and schema as the columns. You can see an example here:

A B C
1 https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0 1 default
2 https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1077884006 2 default

This will make the two spreadsheets above available as "tables" in the default schema.

Authentication

You can access spreadsheets that are shared only within an organization. In order to do this, first create a service account. Make sure you select "Enable G Suite Domain-wide Delegation". Download the key as a JSON file.

Next, you need to manage API client access at https://admin.google.com/${DOMAIN}/AdminHome?chromeless=1#OGX:ManageOauthClients. Add the "Unique ID" from the previous step as the "Client Name", and add https://spreadsheets.google.com/feeds as the scope.

Now, when creating the connection from the DB API or from SQLAlchemy you can point to the JSON file and the user you want to impersonate:

>>> auth = {'service_account_file': '/path/to/certificate.json', 'subject': '[email protected]'}
>>> conn = connect(auth)
Owner
Beto Dealmeida
Writing open source software since 2003.
Beto Dealmeida
Monty, Mongo tinified. MongoDB implemented in Python !

Monty, Mongo tinified. MongoDB implemented in Python ! Inspired by TinyDB and it's extension TinyMongo. MontyDB is: A tiny version of MongoDB, against

David Lai 522 Jan 01, 2023
sync/async MongoDB ODM, yes.

μMongo: sync/async ODM μMongo is a Python MongoDB ODM. It inception comes from two needs: the lack of async ODM and the difficulty to do document (un)

Scille 428 Dec 29, 2022
MinIO Client SDK for Python

MinIO Python SDK for Amazon S3 Compatible Cloud Storage MinIO Python SDK is Simple Storage Service (aka S3) client to perform bucket and object operat

High Performance, Kubernetes Native Object Storage 582 Dec 28, 2022
Application which allows you to make PostgreSQL databases with Python

Automate PostgreSQL Databases with Python Application which allows you to make PostgreSQL databases with Python I used the psycopg2 library which is u

Marc-Alistair Coffi 0 Dec 31, 2021
aioodbc - is a library for accessing a ODBC databases from the asyncio

aioodbc aioodbc is a Python 3.5+ module that makes it possible to access ODBC databases with asyncio. It relies on the awesome pyodbc library and pres

aio-libs 253 Dec 31, 2022
A collection of awesome sqlite tools, scripts, books, etc

Awesome Series @ Planet Open Data World (Countries, Cities, Codes, ...) • Football (Clubs, Players, Stadiums, ...) • SQLite (Tools, Books, Schemas, ..

Planet Open Data 205 Dec 16, 2022
Async ODM (Object Document Mapper) for MongoDB based on python type hints

ODMantic Documentation: https://art049.github.io/odmantic/ Asynchronous ODM(Object Document Mapper) for MongoDB based on standard python type hints. I

Arthur Pastel 732 Dec 31, 2022
MariaDB connector using python and flask

MariaDB connector using python and flask This should work with flask and to be deployed on docker. Setting up stuff 1. Docker build and run docker bui

Bayangmbe Mounmo 1 Jan 11, 2022
asyncio compatible driver for elasticsearch

asyncio client library for elasticsearch aioes is a asyncio compatible library for working with Elasticsearch The project is abandoned aioes is not su

97 Sep 05, 2022
The Database Toolkit for Python

SQLAlchemy The Python SQL Toolkit and Object Relational Mapper Introduction SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that giv

SQLAlchemy 6.5k Jan 01, 2023
Prometheus instrumentation library for Python applications

Prometheus Python Client The official Python 2 and 3 client for Prometheus. Three Step Demo One: Install the client: pip install prometheus-client Tw

Prometheus 3.2k Jan 07, 2023
A Relational Database Management System for a miniature version of Twitter written in MySQL with CLI in python.

Mini-Twitter-Database This was done as a database design course project at Amirkabir university of technology. This is a relational database managemen

Ali 12 Nov 23, 2022
Create a database, insert data and easily select it with Sqlite

sqliteBasics create a database, insert data and easily select it with Sqlite Watch on YouTube a step by step tutorial explaining this code: https://yo

Mariya 27 Dec 27, 2022
Pandas Google BigQuery

pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda

Python for Data 345 Dec 28, 2022
Lazydata: Scalable data dependencies for Python projects

lazydata: scalable data dependencies lazydata is a minimalist library for including data dependencies into Python projects. Problem: Keeping all data

629 Nov 21, 2022
Redis Python Client - The Python interface to the Redis key-value store.

redis-py The Python interface to the Redis key-value store. Installation | Contributing | Getting Started | Connecting To Redis Installation redis-py

Redis 11k Jan 08, 2023
Pure Python MySQL Client

PyMySQL Table of Contents Requirements Installation Documentation Example Resources License This package contains a pure-Python MySQL client library,

PyMySQL 7.2k Jan 09, 2023
Official Python low-level client for Elasticsearch

Python Elasticsearch Client Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Py

elastic 3.8k Jan 01, 2023
Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line.

Py2neo v3 Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line. The core library ha

64 Oct 14, 2022
Dinamopy is a python helper library for dynamodb

Dinamopy is a python helper library for dynamodb. You can define your access patterns in a json file and can use dynamic method names to make operations.

Rasim Andıran 2 Jul 18, 2022