SpyQL - SQL with Python in the middle

Overview

SpyQL

SQL with Python in the middle

https://pypi.python.org/pypi/spyql https://travis-ci.com/dcmoura/spyql https://spyql.readthedocs.io/en/latest/?version=latest codecov code style: black license: MIT

Concept

SpyQL is a query language that combines:

  • the simplicity and structure of SQL
  • with the power and readability of Python
SELECT
    date.fromtimestamp(purchase_ts) AS purchase_date,
    price * quantity AS total
FROM csv
WHERE department.upper() == 'IT'
TO json

SQL provides the structure of the query, while Python is used to define expressions, bringing along a vast ecosystem of packages.

SpyQL command-line tool

With the SpyQL command-line tool you can make SQL-like SELECTs powered by Python on top of text data (e.g. CSV and JSON). Data can come from files but also from data streams, such as as Kafka, or from databases such as PostgreSQL. Basically, data can come from any command that outputs text :-). More, data can be generated by a Python iterator! Take a look at the examples section to see how to query parquet, process API calls, transverse directories of zipped JSONs, among many other things.

SpyQL also allows you to easily convert between text data formats:

  • FROM: CSV, JSON, TEXT and Python iterators (YES, you can use a list comprehension as the data source)

  • TO: CSV, JSON, SQL (INSERT statements), pretty terminal printing, and terminal plotting.

The JSON format is JSON lines, where each line has a valid JSON object or array. Piping with jq allows SpyQL to handle any JSON input (more on the examples section).

You can leverage command line tools to process other file types like Parquet and XML (more on the examples section).

Installation

To install SpyQL, run this command in your terminal:

pip install spyql

Hello world

To test your installation run in the terminal:

spyql "SELECT 'Hello world' as Message TO pretty"

Output:

Message
-----------
Hello world

Try replacing the output format by json and csv, and try adding more columns. e.g. run in the terminal:

spyql "SELECT 'Hello world' as message, 1+2 as three TO json"

Output:

{"message": "Hello world", "three": 3}

Principles

Right now, the focus is on building a command-line tool that follows these core principles:

  • Simple: simple to use with a straightforward implementation
  • Familiar: you should feel at home if you are acquainted with SQL and Python
  • Light: small memory footprint that allows you to process large data that fit into your machine
  • Useful: it should make your life easier, filling a gap in the eco-system

Syntax

[ IMPORT python_module [ AS identifier ] [, ...] ]
SELECT [ DISTINCT | PARTIALS ] 
    [ * | python_expression [ AS output_column_name ] [, ...] ]
    [ FROM csv | spy | text | python_expression | json [ EXPLODE path ] ]
    [ WHERE python_expression ]
    [ GROUP BY output_column_number | python_expression  [, ...] ]
    [ ORDER BY output_column_number | python_expression
        [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
    [ LIMIT row_count ]
    [ OFFSET num_rows_to_skip ]
    [ TO csv | json | spy | sql | pretty | plot ]

Notable differences to SQL

In SpyQL:

  • there is guarantee that the order of the output rows is the same as in the input (if no reordering is done)
  • the AS keyword must precede a column alias definition (it is not optional as in SQL)
  • you can always access the nth input column by using the default column names colN (e.g. col1 for the first column)
  • currently only a small subset of SQL is supported, namely SELECT statements without: sub-queries, joins, set operations, etc (check the Syntax section)
  • sub-queries are achieved by piping (see the [Command line examples](#command line examples) section)
  • aggregation functions have the suffix _agg to avoid conflicts with python's built-in functions:
Operation PostgreSQL SpyQL
Sum all values of a column SELECT sum(col_name) SELECT sum_agg(col_name)
Sum an array SELECT sum(a) FROM (SELECT unnest(array[1,2,3]) AS a) AS t SELECT sum([1,2,3])
  • expressions are pure Python:
SQL SpySQL
x = y x == y
x BETWEEN a AND b a <= x <= b
CAST(x AS INTEGER) int(x)
CASE WHEN x > 0 THEN 1 ELSE -1 END 1 if x > 0 else -1
upper('hello') 'hello'.upper()

Notable differences to Python

Additional syntax

We added additional syntax for making querying easier:

Python SpySQL shortcut Purpose
json['hello']['planet earth'] json->hello->'planet earth' Easy access of elements in dicts (e.g. JSONs)

NULL datatype

Python's None generates exceptions when making operations on missing data, breaking query execution (e.g. None + 1 throws a TypeError). To overcome this, we created a NULL type that has the same behavior as in SQL (e.g. NULL + 1 returns NULL), allowing for queries to continue processing data.

Operation Native Python throws SpySQL returns SpySQL warning
NULL + 1 NameError NULL
a_dict['inexistent_key'] KeyError NULL yes
int('') ValueError NULL yes
int('abc') ValueError NULL yes

The above dictionary key access only returns NULL if the dict is an instance of NullSafeDict. SpyQL adds NullSafeDict, which extends python's native dict. JSONs are automatically loaded as NullSafeDict. Unless you are creating dictionaries on the fly you do not need to worry about this.

Importing python modules and user-defined functions

By default, spyql do some commonly used imports:

  • everything from the math module
  • datetime, date and timezone from the datetime module
  • the re module

SpyQL queries support a single import statement at the beginning of the query where several modules can be imported (e.g. IMPORT numpy AS np, sys SELECT ...). Note that the python syntax from module import identifier is not supported in queries.

In addition, you can create a python file that is loaded before executing queries. Here you can define imports, functions, variables, etc using regular python code. Everything defined in this file is available to all your spyql queries. The file should be located at XDG_CONFIG_HOME/spyql/init.py. If the environment variable XDG_CONFIG_HOME is not defined, it defaults to HOME/.config (e.g. /Users/janedoe/.config/spyql/init.py).

Example queries

You can run the following example queries in the terminal: spyql "the_query" < a_data_file

Example data files are not provided on most cases.

Query a CSV (and print a pretty table)

SELECT a_col_name, 'positive' if col2 >= 0 else 'negative' AS sign
FROM csv
TO pretty

Convert CSV to a flat JSON

SELECT * FROM csv TO json

Convert from CSV to a hierarchical JSON

SELECT {'client': {'id': col1, 'name': col2}, 'price': 120.40}
FROM csv TO json

or

SELECT {'id': col1, 'name': col2} AS client, 120.40 AS price
FROM csv TO json

JSON to CSV, filtering out NULLs

SELECT json->client->id AS id, json->client->name AS name, json->price AS price
FROM json
WHERE json->client->name is not NULL
TO csv

Explode JSON to CSV

SELECT json->invoice_num AS id, json->items->name AS name, json-items->price AS price
FROM json
EXPLODE json->items
TO csv

Sample input:

{"invoice_num" : 1028, "items": [{"name": "tomatoes", "price": 1.5}, {"name": "bananas", "price": 2.0}]}
{"invoice_num" : 1029, "items": [{"name": "peaches", "price": 3.12}]}

Output:

id, name, price
1028, tomatoes, 1.5
1028, bananas, 2.0
1029, peaches, 3.12

Python iterator/list/comprehension to JSON

SELECT 10 * cos(col1 * ((pi * 4) / 90)
FROM range(80)
TO json

or

SELECT col1
FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
TO json

Importing python modules

Here we import hashlib to calculate a md5 hash for each input line. Before running this example you need to install the hashlib package (pip install hashlib).

IMPORT hashlib as hl
SELECT hl.md5(col1.encode('utf-8')).hexdigest()
FROM text

Getting the top 5 records

SELECT int(score) AS score, player_name
FROM csv
ORDER BY 1 DESC NULLS LAST, score_date
LIMIT 5

Aggregations

Totals by player, alphabetically ordered.

SELECT json->player_name, sum_agg(json->score) AS total_score
FROM json
GROUP BY 1
ORDER BY 1

Partial aggregations

Calculating the cumulative sum of a variable using the PARTIALS modifier. Also demoing the lag aggregator.

SELECT PARTIALS 
    json->new_entries, 
    sum_agg(json->new_entries) AS cum_new_entries,
    lag(json->new_entries) AS prev_entries
FROM json
TO json

Sample input:

{"new_entries" : 10}
{"new_entries" : 5}
{"new_entries" : 25}
{"new_entries" : null}
{}
{"new_entries" : 100}

Output:

{"new_entries" : 10,   "cum_new_entries" : 10,  "prev_entries": null}
{"new_entries" : 5,    "cum_new_entries" : 15,  "prev_entries": 10}
{"new_entries" : 25,   "cum_new_entries" : 40,  "prev_entries": 5}
{"new_entries" : null, "cum_new_entries" : 40,  "prev_entries": 25}
{"new_entries" : null, "cum_new_entries" : 40,  "prev_entries": null}
{"new_entries" : 100,  "cum_new_entries" : 140, "prev_entries": null}

If PARTIALSwas omitted the result would be equivalent to the last output row.

Distinct rows

SELECT DISTINCT *
FROM csv

Command line examples

To run the following examples, type Ctrl-x Ctrl-e on you terminal. This will open your default editor (emacs/vim). Paste the code of one of the examples, save and exit.

Queries on Parquet with directories

Here, find transverses a directory and executes parquet-tools for each parquet file, dumping each file to json format. jq -c makes sure that the output has 1 json per line before handing over to spyql. This is far from being an efficient way to query parquet files, but it might be a handy option if you need to do a quick inspection.

a_field, json->a_num_field * 2 + 1 FROM json "">
find /the/directory -name "*.parquet" -exec parquet-tools cat --json {} \; |
jq -c |
spyql "
	SELECT json->a_field, json->a_num_field * 2 + 1
	FROM json
"

Querying multiple json.gz files

a_field, json->a_num_field * 2 + 1 FROM json "">
gzcat *.json.gz |
jq -c |
spyql "
	SELECT json->a_field, json->a_num_field * 2 + 1
	FROM json
"

Querying YAML / XML / TOML files

yq converts yaml, xml and toml files to json, allowing to easily query any of these with spyql.

a_field FROM json"">
cat file.yaml | yq -c | spyql "SELECT json->a_field FROM json"
a_field FROM json"">
cat file.xml | xq -c | spyql "SELECT json->a_field FROM json"
a_field FROM json"">
cat file.toml | tomlq -c | spyql "SELECT json->a_field FROM json"

Kafka to PostegreSQL pipeline

Read data from a kafka topic and write to postgres table name customer.

customer->id AS id, json->customer->name AS name FROM json TO sql " | psql -U an_user_name -h a.host.com a_database_name">
kafkacat -b the.broker.com -t the.topic |
spyql -Otable=customer -Ochunk_size=1 --unbuffered "
	SELECT
		json->customer->id AS id,
		json->customer->name AS name
	FROM json
	TO sql
" |
psql -U an_user_name -h a.host.com a_database_name

Monitoring statistics in Kafka

Read data from a kafka topic, continuously calculating statistics.

kafkacat -b the.broker.com -t the.topic |
spyql --unbuffered "
	SELECT PARTIALS
        count_agg(*) AS running_count,
		sum_agg(value) AS running_sum,
		min_agg(value) AS min_so_far, 
        value AS current_value
	FROM json
	TO csv
" 

Sub-queries (piping)

A special file format (spy) is used to efficiently pipe data between queries.

first_name, json->middle_name, json->last_name]) AS full_name FROM json TO spy" | spyql "SELECT full_name, full_name.upper() FROM spy"">
cat a_file.json |
spyql "
	SELECT ' '.join([json->first_name, json->middle_name, json->last_name]) AS full_name
	FROM json
	TO spy" |
spyql "SELECT full_name, full_name.upper() FROM spy"

Queries over APIs

data->email AS email, 'Dear {}, thank you for being a great customer!'.format(json->data->first_name) AS msg FROM json EXPLODE json->data TO json "">
curl https://reqres.in/api/users?page=2 |
spyql "
	SELECT
		json->data->email AS email,
		'Dear {}, thank you for being a great customer!'.format(json->data->first_name) AS msg
	FROM json
	EXPLODE json->data
	TO json
"

Plotting to the terminal

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO plot
"

Plotting with gnuplot

To the terminal:

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO csv
" |
sed 1d |
feedgnuplot --terminal 'dumb 80,30' --exit --lines

To GUI:

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO csv
" |
sed 1d |
feedgnuplot --lines --points --exit

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Comments
  • Interactive SpyQL

    Interactive SpyQL

    This is v0 for interactive spyql. It implements the following new things:

    1. Q class and q function in interactive.py: This is the class/function that the user can add in their script
    2. InteractiveProcessor in processor.py: row yielding loaded by the Processor
    3. InteractiveWriter in processor.py: writer that adds the output in a list

    Other Important changes:

    1. init_vars, go and _go now have user_query_vars as a valid input
    2. Processor can now be explicitly defined as interactive

    Misc:

    1. ifs changed to elifs where needed

    Issue: #64

    To test this python tests/interactive_test.py.

    opened by yashbonde 12
  • Refactor `parse_structure` and allow comment in query

    Refactor `parse_structure` and allow comment in query

    The purposes of this PR is

    • Refactor parse_structure function (cf. https://github.com/dcmoura/spyql/issues/85#issue-1459257843)
    • Support for line comments (cf. https://github.com/dcmoura/spyql/issues/84)
    enhancement 
    opened by Hayashi-Yudai 9
  • Key-value aggregations and lookups

    Key-value aggregations and lookups

    This PR closes #59.

    First, it introduces dict_agg for key-value aggregations. Example:

    $ cat codes.csv
    MCC,MCC (int),MNC,MNC (int),ISO,Country,Country Code,Network
    289,649,88,2191,ge,Abkhazia,7,A-Mobile
    289,649,68,1679,ge,Abkhazia,7,A-Mobile
    289,649,67,1663,ge,Abkhazia,7,Aquafon
    412,1042,01,31,af,Afghanistan,93,AWCC
    412,1042,50,1295,af,Afghanistan,93,Etisalat
    412,1042,30,783,af,Afghanistan,93,Etisalat
    452,1106,04,79,vn,Vietnam,84,Viettel
    452,1106,02,47,vn,Vietnam,84,VinaPhone
    543,1347,299,665,wf,Wallis and Futuna,,Failed Calls
    543,1347,01,31,wf,Wallis and Futuna,,Manuia
    421,1057,999,2457,ye,Yemen,967,Fix Line
    421,1057,04,79,ye,Yemen,967,HITS/Y Unitel
    421,1057,01,31,ye,Yemen,967,Sabaphone
    421,1057,03,63,ye,Yemen,967,Yemen Mob. CDMA
    645,1605,01,31,zm,Zambia,260,Airtel
    645,1605,299,665,zm,Zambia,260,Failed Calls
    
    $ spyql "
        SELECT dict_agg(MCC, Country) AS json 
        FROM csv TO json
      " < codes.csv > code_country.json
    
    $ jq . code_country.json                                        
    {
      "289": "Abkhazia",
      "412": "Afghanistan",
      "452": "Vietnam",
      "543": "Wallis and Futuna",
      "421": "Yemen",
      "645": "Zambia"
    }
    

    Second, it introduces the ~~kv function~~ -J option to allow loading JSON objects for key-value lookups, emulating left equi joins. Example:

    $ cat towers_mini.csv 
    radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal
    UMTS,262,2,776,165186,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
    GSM,262,3,1075,5651,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
    UMTS,262,2,801,165123,0,13.295516967773,52.494735717773,1000,1,1,1286864565,1286864565,0
    UMTS,262,2,801,165121,0,13.301697,52.499886,1000,4,1,1286884439,1297735807,0
    GSM,624,2,6,13883,0,9.704361,4.063911,1588,8,1,1352031870,1374212876,0
    GSM,624,2,6,34381,0,9.709009,4.066368,2484,13,1,1352031870,1380389330,0
    GSM,452,1,10068,5293,0,105.8031463623,20.959854125977,1000,2,1,1459692342,1488347104,0
    GSM,645,3,130,3282,0,28.070755004883,-14.902267456055,1000,1,1,1459743588,1459743588,0
    GSM,645,3,1,32221,0,28.252716,-15.439224,1000,2,1,1369353852,1369353852,0
    GSM,645,3,1,33932,0,28.255325,-15.436752,1000,5,1,1369353888,1369353888,0
    GSM,645,3,1,31662,0,28.258072,-15.434555,1000,5,1,1369353960,1369353960,0
    
    $ spyql -Jcountries=code_country.json "SELECT mcc, countries[mcc] AS country, count_agg(1) AS n_recs FROM csv GROUP BY 1, 2 ORDER BY 1 TO pretty" < towers_mini.csv
      mcc  country      n_recs
    -----  ---------  --------
      262                    4
      452  Vietnam           1
      624                    2
      645  Zambia            4
    
    opened by dcmoura 7
  • Code improvements

    Code improvements

    Some cases where the code could be more pythonic:

    • some for cycles where range(len(l)) might be replaced by enumerate
    • some functions where yeld might be used to return a generator

    Changes should be accessed for performance.

    enhancement good first issue 
    opened by dcmoura 6
  • code improvement for issue #60

    code improvement for issue #60

    Improve the code.

    TODOs

    • [x] Fix code to more pythonic style
      • Fix import style
      • Remove some unused imports
    • [x] Fix error in lint by flake8

    Issue: https://github.com/dcmoura/spyql/issues/60

    opened by Hayashi-Yudai 5
  • LIKE clause

    LIKE clause

    Adds a LIKE function to the WHERE clause for easier matching. Closes #17

    Some notes

    • Right now it can match not only with strings but also with everything else, as the values are stringified before matching. Let me know if this is not intended and if it should be for strings only.
    • It only supports the % wildcard operator.
    • Let me know if the commit messages should follow a specific format

    Future work

    Either in the scope of this PR or in a future one, it is still missing other basic matching functions, like _ for single character matching or [] to specify a range.

    enhancement 
    opened by ricardocchaves 4
  • Fixes git-actions tests on py3.6 and adds py3.11

    Fixes git-actions tests on py3.6 and adds py3.11

    Python 3.6 env stopped launching. Seems to be related to an upgrade on ubuntu version and support for py3.6 was not included in the new version.

    Solution based on https://github.com/actions/setup-python/issues/543#issuecomment-1335614916

    opened by dcmoura 3
  • [FR] COUNT(DISTINCT col1) ... GROUP BY

    [FR] COUNT(DISTINCT col1) ... GROUP BY

    Thank you for the great tool!

    I'd like to request feature which works like COUNT(DISTINCT col1) ... GROUP BY of SQL. I tried count_agg(distinct col1), but got syntax error.

    opened by Minyus 3
  • Full-featured documentation

    Full-featured documentation

    This PR proposes a brand new documentation. This PR closes #34 .

    Please ignore the commits... the history of this PR is quite disturbing 😅 Everything will be squashed into a single commit when merging to master.

    Please focus on 1) the structure and then 2) the content of:

    • https://github.com/dcmoura/spyql/blob/docs/README.rst
    • https://spyql.readthedocs.io/en/docs/

    I am proposing deprecating the -> operator. It will still be supported but is not mentioned in the documentation. From now on, the . (dot) operator should be the standard way of accessing fields in dicts, followed by standard Python syntax (a_dict["key"]). Major drawback is that we will need to do a new video demoing the CLI and update some posts (e.g. stack overflow).

    There is still some work to do on the lib side. Namely, we should add a section under Examples/Recipes. However, there is still some dev work to do, namely on accessing Pandas dataframes and series, as well as numpy arrays. Eventually we can also visit polars and other libs for tabular data. When we do that, we can extend the documentation.

    In this first review let's try to focus on the big fish, and not so much on the details. My goal is to publish the documentation soon, and then we can iterate on it over time. Thanks!

    documentation 
    opened by dcmoura 3
  • Column names with TEXT format

    Column names with TEXT format

    When using TEXT as an import format, I couldn't find a way of configuring the column name. The column name seems to be evaluated as long as we don't specify * in the SELECT statement which we have to do for TEXT files. Am I missing something? Here's an example:

    cat /tmp/test.txt | spyql -Otable=domains 'Select * as colname from text To SQL'
    INSERT INTO "domains"("col1") VALUES ('aaaaaaaaaa'),('bbbbbbbbb'),('cccccccccc'),('dddddddddd'),('eeeeeeeeee');
    
    cat /tmp/test.txt | spyql -Otable=domains 'Select 1 as colname from text To SQL' 
    INSERT INTO "domains"("colname") VALUES (1),(1),(1),(1),(1);
    

    Expected result would be:

    cat /tmp/test.txt | spyql -Otable=domains 'Select * as colname from text To SQL'
    INSERT INTO "domains"("colname") VALUES ('aaaaaaaaaa'),('bbbbbbbbb'),('cccccccccc'),('dddddddddd'),('eeeeeeeeee');
    
    question 
    opened by henshin 3
  • Cannot dump JSON object with Null values to string

    Cannot dump JSON object with Null values to string

    Use Case

    I'm using the CLI to import JSON Line log files into a Postgres database. One of the columns I'm trying to import is jsonb, and needs to be inserted as a JSON string.

    Approach

    spyql 'IMPORT json SELECT json.dumps(.a) FROM {"a": {"foo": 1, "bar": 2}} TO sql'
    INSERT INTO "table_name"("dumps_a") VALUES ('{"foo": 1, "bar": 2}');
    

    This works fine until it hits a null value:

    spyql 'IMPORT json SELECT json.dumps(.a) FROM {"a": {"foo": 1, "bar": null}} TO sql'
    ERROR	could not evaluate SELECT expression #1: json.dumps(row.a)
    	at data row #1: [{'a': {'foo': 1, 'bar': NULL}}]
    TypeError: Object of type NullType is not JSON serializable
    

    I also tried passing the default kwarg to json.dumps and hit a different error:

    spyql 'IMPORT json SELECT json.dumps(.a, default=str) FROM {"a": {"foo": 1, "bar": null}} TO sql'
    ERROR	could not evaluate SELECT expression #1: json.dumps(row.a, default=str)
    	at data row #1: [{'a': {'foo': 1, 'bar': NULL}}]
    ValueError: Circular reference detected
    

    Very open to alternative approaches; thank you so much for making this great tool available. :)

    opened by mgburns 2
  • [FR] join 2 CSV files

    [FR] join 2 CSV files

    Hi @dcmoura , I see examples to join a JSON file in the document, would joining 2 CSV files be supported? https://spyql.readthedocs.io/en/latest/recipes.html?highlight=join#equi-joins

    opened by Minyus 6
  • general text file support

    general text file support

    I'm using spyql to handle general text files (single-column, not comma-separated), but got a problem.

    A single-column text file like the following is misinterpreted as multiple columns by spyql.

    test.csv:

    user_id
    a11e11
    b22e22
    
    $ spyql "SELECT * FROM csv('test.csv') TO pretty"
    us      r_id
    ----  ------
    a11       11
    b22       22
    

    If there are no common characters among rows, the following error is returned.

    ERROR   Could not detect CSV dialect from input
    _csv.Error: Could not determine delimiter
    

    I found that explicitly setting a delimiter like csv('test.csv', delimiter=',') works as a workaround in the current version, but I would like to suggest the following options.

    [Option 1] Modify default behavior of csv()

    1. As a delimiter character, search only "," (especially not alphanumeric characters)
    2. If no delimiter is detected (and each row is not too long), read it as a single-column text file rather than returning an error

    [Option 2] Add an alternative to csv() which can read a single-column text file (something like txt())

    opened by Minyus 6
  • Cannot run CI/CD pipe on Python 3.11

    Cannot run CI/CD pipe on Python 3.11

    Tried to add Python 3.11 but broke the build:

    Installing build dependencies: started
    [66](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:67)
      Installing build dependencies: finished with status 'done'
    [67](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:68)
      Getting requirements to build wheel: started
    [68](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:69)
      Getting requirements to build wheel: finished with status 'done'
    [69](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:70)
      Preparing metadata (pyproject.toml): started
    [70](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:71)
      Preparing metadata (pyproject.toml): finished with status 'error'
    [71](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:72)
      error: subprocess-exited-with-error
    [72](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:73)
      
    [73](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:74)
      × Preparing metadata (pyproject.toml) did not run successfully.
    [74](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:75)
      │ exit code: 1
    [75](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:76)
      ╰─> [11 lines of output]
    [76](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:77)
          💥 maturin failed
    [77](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:78)
            Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
    [78](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:79)
            Caused by: `cargo metadata` exited with an error: error: failed to run `rustc` to learn about target-specific information
    [79](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:80)
          
    [80](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:81)
          Caused by:
    [81](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:82)
            process didn't exit successfully: `rustc - --crate-name ___ --print=file-names -Z mutable-noalias=yes -C target-feature=+sse2 --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit status: 1)
    [82](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:83)
            --- stderr
    [83](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:84)
            error: the option `Z` is only accepted on the nightly compiler
    [84](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:85)
          Error running maturin: Command '['maturin', 'pep517', 'write-dist-info', '--metadata-directory', '/tmp/pip-modern-metadata-2ryt4i58', '--interpreter', '/opt/hostedtoolcache/Python/3.11.0/x64/bin/python', '--manylinux=off', '--strip=on']' returned non-zero exit status 1.
    [85](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:86)
          Checking for Rust toolchain....
    [86](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:87)
          Running `maturin pep517 write-dist-info --metadata-directory /tmp/pip-modern-metadata-2ryt4i58 --interpreter /opt/hostedtoolcache/Python/3.11.0/x64/bin/python --manylinux=off --strip=on`
    [87](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:88)
          [end of output]
    [88](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:89)
      
    [89](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:90)
      note: This error originates from a subprocess, and is likely not a problem with pip.
    [90](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:91)
    error: metadata-generation-failed
    [91](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:92)
    
    [92](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:93)
    × Encountered error while generating package metadata.
    [93](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:94)
    ╰─> See above for output.
    [94](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:95)
    
    [95](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:96)
    note: This is an issue with the package mentioned above, not pip.
    [96](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:97)
    hint: See above for details.
    [97](https://github.com/dcmoura/spyql/actions/runs/3609720398/jobs/6083085988#step:4:98)
    Error: Process completed with exit code 1.
    
    bug good first issue 
    opened by dcmoura 0
  • Support reading and writing of JSON objects (not JSON lines)

    Support reading and writing of JSON objects (not JSON lines)

    Currently SPyQL only allows to read and write JSON lines. Writing JSON arrays can be done using the dict_agg, aggregating everything into an array and writing an JSON with a single line.

    The idea is to add an argument lines=True to json and orjson writers and processors. The processor should be able to handle single object files as well as arrays of objects or arrays of scalars. When lines is False the processor should load the full input into memory and then parse it. While this is not ideal, it is the most straightforward implementation. In addition, arrays of JSON shouldn't be used for large data, in that cases JSON lines should be used instead.

    The writer should write an array of objects when lines is False.

    core 
    opened by dcmoura 1
Releases(v0.9.0)
Owner
Daniel Moura
Daniel Moura
AWS SDK for Python

Boto3 - The AWS SDK for Python Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to wri

the boto project 7.8k Jan 04, 2023
Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.

Simple DDL Parser Build with ply (lex & yacc in python). A lot of samples in 'tests/. Is it Stable? Yes, library already has about 5000+ usage per day

Iuliia Volkova 95 Jan 05, 2023
aiosql - Simple SQL in Python

aiosql - Simple SQL in Python SQL is code. Write it, version control it, comment it, and run it using files. Writing your SQL code in Python programs

Will Vaughn 1.1k Jan 08, 2023
A Redis client library for Twisted Python

txRedis Asynchronous Redis client for Twisted Python. Install Install via pip. Usage examples can be found in the examples/ directory of this reposito

Dorian Raymer 127 Oct 23, 2022
Application which allows you to make PostgreSQL databases with Python

Automate PostgreSQL Databases with Python Application which allows you to make PostgreSQL databases with Python I used the psycopg2 library which is u

Marc-Alistair Coffi 0 Dec 31, 2021
Anomaly detection on SQL data warehouses and databases

With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases. Getting Started Install via Docker docker run -p 300

Cuebook 171 Dec 18, 2022
Logica is a logic programming language that compiles to StandardSQL and runs on Google BigQuery.

Logica: language of Big Data Logica is an open source declarative logic programming language for data manipulation. Logica is a successor to Yedalog,

Evgeny Skvortsov 1.5k Dec 30, 2022
Asynchronous Python client for InfluxDB

aioinflux Asynchronous Python client for InfluxDB. Built on top of aiohttp and asyncio. Aioinflux is an alternative to the official InfluxDB Python cl

Gustavo Bezerra 159 Dec 27, 2022
Pandas Google BigQuery

pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda

Python for Data 345 Dec 28, 2022
Baserow is an open source no-code database tool and Airtable alternative

Baserow is an open source no-code database tool and Airtable alternative

1.3k Jan 01, 2023
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana

Amazon Web Services - Labs 3.3k Dec 31, 2022
A fast PostgreSQL Database Client Library for Python/asyncio.

asyncpg -- A fast PostgreSQL Database Client Library for Python/asyncio asyncpg is a database interface library designed specifically for PostgreSQL a

magicstack 5.8k Dec 31, 2022
Query multiple mongoDB database collections easily

leakscoop Perform queries across multiple MongoDB databases and collections, where the field names and the field content structure in each database ma

bagel 5 Jun 24, 2021
Pure-python PostgreSQL driver

pg-purepy pg-purepy is a pure-Python PostgreSQL wrapper based on the anyio library. A lot of this library was inspired by the pg8000 library. Credits

Lura Skye 11 May 23, 2022
SQL for Humansâ„¢

Records: SQL for Humansâ„¢ Records is a very simple, but powerful, library for making raw SQL queries to most relational databases. Just write SQL. No b

Kenneth Reitz 6.9k Jan 07, 2023
Lazydata: Scalable data dependencies for Python projects

lazydata: scalable data dependencies lazydata is a minimalist library for including data dependencies into Python projects. Problem: Keeping all data

629 Nov 21, 2022
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Redash is designed to enable anyone, regardless of the level of technical sophistication, to harness the power of data big and small. SQL users levera

Redash 22.4k Dec 30, 2022
SpyQL - SQL with Python in the middle

SpyQL SQL with Python in the middle Concept SpyQL is a query language that combines: the simplicity and structure of SQL with the power and readabilit

Daniel Moura 853 Dec 30, 2022
MySQL database connector for Python (with Python 3 support)

mysqlclient This project is a fork of MySQLdb1. This project adds Python 3 support and fixed many bugs. PyPI: https://pypi.org/project/mysqlclient/ Gi

PyMySQL 2.2k Dec 25, 2022
Little wrapper around asyncpg for specific experience.

Little wrapper around asyncpg for specific experience.

Nikita Sivakov 3 Nov 15, 2021