A Python DB-API and SQLAlchemy dialect to Google Spreasheets

Overview

Build Status codecov

Note: shillelagh is a drop-in replacement for gsheets-db-api, with many additional features. You should use it instead. If you're using SQLAlchemy all you need to do:

$ pip uninstall gsheetsdb
$ pip install shillelagh

If you're using the DB API:

# from gsheetsdb import connect
from shillelagh.backends.apsw.db import connect

A Python DB API 2.0 for Google Spreadsheets

This module allows you to query Google Spreadsheets using SQL.

Using this spreadsheet as an example:

A B
1 country cnt
2 BR 1
3 BR 3
4 IN 5

Here's a simple query using the Python API:

from gsheetsdb import connect

conn = connect()
result = conn.execute("""
    SELECT
        country
      , SUM(cnt)
    FROM
        "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
    GROUP BY
        country
""", headers=1)
for row in result:
    print(row)

This will print:

Row(country='BR', sum_cnt=4.0)
Row(country='IN', sum_cnt=5.0)

How it works

Transpiling

Google spreadsheets can actually be queried with a very limited SQL API. This module will transpile the SQL query into a simpler query that the API understands. Eg, the query above would be translated to:

SELECT A, SUM(B) GROUP BY A

Processors

In addition to transpiling, this module also provides pre- and post-processors. The pre-processors add more columns to the query, and the post-processors build the actual result from those extra columns. Eg, COUNT(*) is not supported, so the following query:

SELECT COUNT(*) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"

Gets translated to:

SELECT COUNT(A), COUNT(B)

And then the maximum count is returned. This assumes that at least one column has no NULLs.

SQLite

When a query can't be expressed, the module will issue a SELECT *, load the data into an in-memory SQLite table, and execute the query in SQLite. This is obviously inneficient, since all data has to be downloaded, but ensures that all queries succeed.

Installation

$ pip install gsheetsdb
$ pip install gsheetsdb[cli]         # if you want to use the CLI
$ pip install gsheetsdb[sqlalchemy]  # if you want to use it with SQLAlchemy

CLI

The module will install an executable called gsheetsdb:

$ gsheetsdb --headers=1
> SELECT * FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/"
country      cnt
---------  -----
BR             1
BR             3
IN             5
> SELECT country, SUM(cnt) FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1
pscv8ZXPtg8/" GROUP BY country
country      sum cnt
---------  ---------
BR                 4
IN                 5
>

SQLAlchemy support

This module provides a SQLAlchemy dialect. You don't need to specify a URL, since the spreadsheet is extracted from the FROM clause:

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('gsheets://')
inspector = inspect(engine)

table = Table(
    'https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0',
    MetaData(bind=engine),
    autoload=True)
query = select([func.count(table.columns.country)], from_obj=table)
print(query.scalar())  # prints 3.0

Alternatively, you can initialize the engine with a "catalog". The catalog is a Google spreadsheet where each row points to another Google spreadsheet, with URL, number of headers and schema as the columns. You can see an example here:

A B C
1 https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0 1 default
2 https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1077884006 2 default

This will make the two spreadsheets above available as "tables" in the default schema.

Authentication

You can access spreadsheets that are shared only within an organization. In order to do this, first create a service account. Make sure you select "Enable G Suite Domain-wide Delegation". Download the key as a JSON file.

Next, you need to manage API client access at https://admin.google.com/${DOMAIN}/AdminHome?chromeless=1#OGX:ManageOauthClients. Add the "Unique ID" from the previous step as the "Client Name", and add https://spreadsheets.google.com/feeds as the scope.

Now, when creating the connection from the DB API or from SQLAlchemy you can point to the JSON file and the user you want to impersonate:

>>> auth = {'service_account_file': '/path/to/certificate.json', 'subject': '[email protected]'}
>>> conn = connect(auth)
Owner
Beto Dealmeida
Writing open source software since 2003.
Beto Dealmeida
Redis Python Client

redis-py The Python interface to the Redis key-value store. Python 2 Compatibility Note redis-py 3.5.x will be the last version of redis-py that suppo

Andy McCurdy 11k Dec 29, 2022
A Python-based RPC-like toolkit for interfacing with QuestDB.

pykit A Python-based RPC-like toolkit for interfacing with QuestDB. Requirements Python 3.9 Java Azul

QuestDB 11 Aug 03, 2022
Simple Python demo app that connects to an Oracle DB.

Cloud Foundry Sample Python Application Connecting to Oracle Simple Python demo app that connects to an Oracle DB. The app is based on the example pro

Daniel Buchko 1 Jan 10, 2022
Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line.

Py2neo Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line. The library supports b

Nigel Small 1.2k Jan 02, 2023
A framework based on tornado for easier development, scaling up and maintenance

turbo 中文文档 Turbo is a framework for fast building web site and RESTFul api, based on tornado. Easily scale up and maintain Rapid development for RESTF

133 Dec 06, 2022
asyncio (PEP 3156) Redis support

aioredis asyncio (PEP 3156) Redis client library. Features hiredis parser Yes Pure-python parser Yes Low-level & High-level APIs Yes Connections Pool

aio-libs 2.2k Jan 04, 2023
Dlsite-doujin-renamer - Dlsite doujin renamer tool with python

dlsite-doujin-renamer Features 支持深度查找带有 RJ 号的文件夹 支持手动选择文件夹或拖拽文件夹到软件窗口 支持在 config

111 Jan 02, 2023
Prometheus instrumentation library for Python applications

Prometheus Python Client The official Python 2 and 3 client for Prometheus. Three Step Demo One: Install the client: pip install prometheus-client Tw

Prometheus 3.2k Jan 07, 2023
Simplest SQL mapper in Python, probably

SQL MAPPER Basically what it does is: it executes some SQL thru a database connector you fed it, maps it to some model and gives to u. Also it can cre

2 Nov 07, 2022
AWS SDK for Python

Boto3 - The AWS SDK for Python Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to wri

the boto project 7.8k Jan 04, 2023
Lazydata: Scalable data dependencies for Python projects

lazydata: scalable data dependencies lazydata is a minimalist library for including data dependencies into Python projects. Problem: Keeping all data

629 Nov 21, 2022
#crypto #cipher #encode #decode #hash

🌹 CYPHER TOOLS 🌹 Written by TMRSWRR Version 1.0.0 All in one tools for CRYPTOLOGY. Instagram: Capture the Root 🖼️ Screenshots 🖼️ 📹 How to use 📹

50 Dec 23, 2022
Query multiple mongoDB database collections easily

leakscoop Perform queries across multiple MongoDB databases and collections, where the field names and the field content structure in each database ma

bagel 5 Jun 24, 2021
pandas-gbq is a package providing an interface to the Google BigQuery API from pandas

pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda

Google APIs 348 Jan 03, 2023
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

dbd: database prototyping tool dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL d

Zdenek Svoboda 47 Dec 07, 2022
SQL for Humans™

Records: SQL for Humans™ Records is a very simple, but powerful, library for making raw SQL queries to most relational databases. Just write SQL. No b

Kenneth Reitz 6.9k Jan 07, 2023
A fast PostgreSQL Database Client Library for Python/asyncio.

asyncpg -- A fast PostgreSQL Database Client Library for Python/asyncio asyncpg is a database interface library designed specifically for PostgreSQL a

magicstack 5.8k Dec 31, 2022
Py2neo is a client library and toolkit for working with Neo4j from within Python

Py2neo Py2neo is a client library and toolkit for working with Neo4j from within Python applications. The library supports both Bolt and HTTP and prov

py2neo.org 1.2k Jan 02, 2023
A collection of awesome sqlite tools, scripts, books, etc

Awesome Series @ Planet Open Data World (Countries, Cities, Codes, ...) • Football (Clubs, Players, Stadiums, ...) • SQLite (Tools, Books, Schemas, ..

Planet Open Data 205 Dec 16, 2022
A simple password manager I typed with python using MongoDB .

Python with MongoDB A simple python code example using MongoDB. How do i run this code • First of all you need to have a python on your computer. If y

31 Dec 06, 2022