dask-sql is a distributed SQL query engine in Python using Dask

Overview


SQL + Python

dask-sql is a distributed SQL query engine in Python. It allows you to query and transform your data using a mix of common SQL operations and Python code, and to scale up the computation easily when you need to.

  • Combine the power of Python and SQL: load your data with Python, transform it with SQL, enhance it with Python and query it with SQL - or the other way round. With dask-sql you can mix the well-known Python dataframe API of pandas and Dask with common SQL operations, to process your data in exactly the way that is easiest for you.
  • Infinite Scaling: using the power of the great Dask ecosystem, your computations can scale as you need - from your laptop to your super cluster - without changing a single line of SQL code. From k8s to cloud deployments, from batch systems to YARN - if Dask supports it, so will dask-sql.
  • Your data - your queries: Use Python user-defined functions (UDFs) in SQL without any performance drawback and extend your SQL queries with the large number of Python libraries, e.g. machine learning, different complicated input formats, complex statistics.
  • Easy to install and maintain: dask-sql is just a pip/conda install away (or a docker run if you prefer). No need for complicated cluster setups - dask-sql will run out of the box on your machine and can be easily connected to your computing cluster.
  • Use SQL from wherever you like: dask-sql integrates with your Jupyter notebook or your normal Python module, and can be used as a standalone SQL server from any BI tool. It even integrates natively with Apache Hue.

Read more in the documentation.


Example

For this example, we load some data from disk and query it with a SQL command from our Python code. Any pandas or Dask dataframe can be used as input, and dask-sql understands a large number of formats (csv, parquet, json, ...) and locations (s3, hdfs, gcs, ...).

import dask.dataframe as dd
from dask_sql import Context

# Create a context to hold the registered tables
c = Context()

# Load the data and register it in the context
# This will give the table a name, that we can use in queries
df = dd.read_csv("...")
c.create_table("my_data", df)

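# Now execute a SQL query. With return_futures=False the result is computed
# right away and returned as a pandas dataframe (by default, c.sql returns a lazy dask dataframe).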
result = c.sql("""
    SELECT
        my_data.name,
        SUM(my_data.x)
    FROM
        my_data
    GROUP BY
        my_data.name
""", return_futures=False)

# Show the result
print(result)

Quickstart

Have a look at the documentation or start the example notebook on Binder.

dask-sql is currently under development and does not yet understand all SQL commands (but it does understand a large fraction of them). We are actively looking for feedback, improvements and contributors!

If you would like to utilize GPUs for your SQL queries, have a look at the BlazingSQL project.

Installation

dask-sql can be installed via conda (preferred) or pip - or in a development environment.

With conda

Create a new conda environment or use an existing one:

conda create -n dask-sql
conda activate dask-sql

Install the package from the conda-forge channel:

conda install dask-sql -c conda-forge

With pip

dask-sql needs Java for parsing the SQL queries. Make sure you have a working Java installation with version >= 8.

To test if you have Java properly installed and set up, run

$ java -version
openjdk version "1.8.0_152-release"
OpenJDK Runtime Environment (build 1.8.0_152-release-1056-b12)
OpenJDK 64-Bit Server VM (build 25.152-b12, mixed mode)
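If you want to check the Java version from a script instead (for example in CI), a small sketch could parse the `java -version` banner. The helpers `parse_java_major` and `installed_java_major` below are hypothetical and not part of dask-sql; they handle both the legacy `1.8.0_...` numbering shown above and the newer `11.0.10` style.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None.

    Handles the legacy "1.8.0_..." numbering (Java 8) as well as the newer
    "11.0.10" / "17" style.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: "1.8" means Java 8
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` executable on PATH, or None."""
    if shutil.which("java") is None:
        return None
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` prints its banner to stderr rather than stdout
    return parse_java_major(proc.stderr + proc.stdout)
```

Per the requirement above, `installed_java_major()` should return 8 or higher before you continue.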

After installing Java, you can install the package with

pip install dask-sql

For development

If you want to have the newest (unreleased) dask-sql version or if you plan to do development on dask-sql, you can also install the package from sources.

git clone https://github.com/nils-braun/dask-sql.git
cd dask-sql

Create a new conda environment and install the development environment:

conda create -n dask-sql --file conda.txt -c conda-forge

Using pip instead of conda for the environment setup is not recommended. If you do need to, make sure to have Java (JDK >= 8) and Maven installed and correctly set up before continuing. Have a look at conda.txt for the rest of the development environment.

After that, you can install the package in development mode

pip install -e ".[dev]"

To compile the Java classes (at the beginning or after changes), run

python setup.py java

This repository uses pre-commit hooks. To install them, call

pre-commit install

Testing

You can run the tests (after installation) with

pytest tests

SQL Server

dask-sql comes with a small test implementation of a SQL server. Instead of building a full ODBC driver, it reuses the Presto wire protocol. It is, so far, only the start of the development and still lacks important features such as authentication.

You can test the Presto-compatible SQL server by running (after installation)

dask-sql-server

or by using the created docker image

docker run --rm -it -p 8080:8080 nbraun/dask-sql

in one terminal. This will spin up a server on port 8080 (by default) that looks like a normal Presto database to any Presto client.

You can test this, for example, with the default Presto client:

presto --server localhost:8080

Now you can run simple SQL queries (no data is loaded by default):

=> SELECT 1 + 1;
 EXPR$0
--------
    2
(1 row)
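Any HTTP client can talk to the server as well. The following is a minimal sketch of the Presto REST protocol in plain Python, assuming the server follows the standard flow (POST the query text to `/v1/statement` with an `X-Presto-User` header, then follow `nextUri` links until none is returned). The helper names and the default user name `"dask"` are made up for this example.

```python
import json
import urllib.request


def build_request(sql, host="localhost", port=8080, user="dask"):
    """Build the initial POST /v1/statement request of the Presto protocol."""
    return urllib.request.Request(
        f"http://{host}:{port}/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},  # hypothetical user name
        method="POST",
    )


def run_query(sql, host="localhost", port=8080, user="dask"):
    """Send a query, then follow nextUri links until the result is complete."""
    request = build_request(sql, host, port, user)
    columns, rows = None, []
    while request is not None:
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        if "error" in payload:
            raise RuntimeError(payload["error"])
        # Column metadata and data rows may arrive across several responses
        columns = payload.get("columns", columns)
        rows.extend(payload.get("data", []))
        next_uri = payload.get("nextUri")
        request = urllib.request.Request(next_uri) if next_uri else None
    return columns, rows
```

With the server from above running, `run_query("SELECT 1 + 1")` should return the column metadata together with the result rows.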

You can find more information in the documentation.

CLI

You can also use the dask-sql CLI to try out SQL commands quickly:

dask-sql --load-test-data --startup

(dask-sql) > SELECT * FROM timeseries LIMIT 10;

How does it work?

At the core, dask-sql does two things:

  • translate the SQL query using Apache Calcite into relational algebra, which is specified as a tree of Java objects - similar to many other SQL engines (Hive, Flink, ...)
  • convert this description of the query from Java objects into Dask API calls (and execute them), returning a Dask dataframe.

For the first step, Apache Calcite needs to know about the columns and types of the Dask dataframes, so some Java classes that store this information for Dask dataframes are defined in planner. After the translation to relational algebra is done (using RelationalAlgebraGenerator.getRelationalAlgebra), the Python methods defined in dask_sql.physical turn it into a physical Dask execution plan by converting each piece of the relational algebra one-by-one.
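To make the second step more concrete, here is a toy sketch of the "convert each piece one-by-one" idea. The node classes (`TableScan`, `Filter`, `Project`) and `to_dataframe` are hypothetical stand-ins written for this illustration, not dask-sql's actual planner or `dask_sql.physical` classes. The sketch uses pandas, but the same indexing calls also work on Dask dataframes.

```python
from dataclasses import dataclass
from typing import List, Union

import pandas as pd


@dataclass
class TableScan:
    table: str


@dataclass
class Filter:
    child: "Plan"
    column: str
    value: object  # keep rows where column == value


@dataclass
class Project:
    child: "Plan"
    columns: List[str]


Plan = Union[TableScan, Filter, Project]


def to_dataframe(plan, tables):
    """Convert a plan tree into dataframe calls, one node at a time (children first)."""
    if isinstance(plan, TableScan):
        return tables[plan.table]
    if isinstance(plan, Filter):
        df = to_dataframe(plan.child, tables)
        return df[df[plan.column] == plan.value]
    if isinstance(plan, Project):
        df = to_dataframe(plan.child, tables)
        return df[plan.columns]
    raise NotImplementedError(f"no conversion for {type(plan).__name__}")


tables = {"my_data": pd.DataFrame({"name": ["a", "b", "c"], "x": [1, 2, 1]})}
# Roughly corresponds to: SELECT name FROM my_data WHERE x = 1
plan = Project(Filter(TableScan("my_data"), "x", 1), ["name"])
out = to_dataframe(plan, tables)
print(out)
```

Each node only needs to know how to turn its already-converted input into one more dataframe call, which is what makes a piece-by-piece conversion practical.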

Comments
  • TypeError: sequence item 0: expected str instance, NoneType found on running python setup.py java on source

    $ git clone https://github.com/nils-braun/dask-sql.git
    
    $ cd dask-sql
    
    $ pytest tests
    ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
    pytest: error: unrecognized arguments: --cov --cov-config=.coveragerc tests
      inifile: /mnt/d/Programs/dask/dask-sql/pytest.ini
      rootdir: /mnt/d/Programs/dask/dask-sql
    
    
    $ python setup.py java
    running java
    Traceback (most recent call last):
      File "setup.py", line 93, in <module>
        command_options={"build_sphinx": {"source_dir": ("setup.py", "docs"),}},
      File "/home/saulo/anaconda3/lib/python3.7/site-packages/setuptools/__init__.py", line 165, in setup
        return distutils.core.setup(**attrs)
      File "/home/saulo/anaconda3/lib/python3.7/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/saulo/anaconda3/lib/python3.7/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/saulo/anaconda3/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "setup.py", line 30, in run
        self.announce(f"Running command: {' '.join(command)}", level=distutils.log.INFO)
    TypeError: sequence item 0: expected str instance, NoneType found
    
    $ python dask-sql-test.py
    Traceback (most recent call last):
      File "dask-sql-test.py", line 1, in <module>
        from dask_sql import Context
      File "/mnt/d/Programs/dask/dask-sql/dask_sql/__init__.py", line 1, in <module>
        from .context import Context
      File "/mnt/d/Programs/dask/dask-sql/dask_sql/context.py", line 9, in <module>
        from dask_sql.java import (
      File "/mnt/d/Programs/dask/dask-sql/dask_sql/java.py", line 88, in <module>
        DaskTable = com.dask.sql.schema.DaskTable
    AttributeError: Java package 'com' has no attribute 'dask'
    
    $ python -V
    Python 3.7.6
    
    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 20.04.1 LTS
    Release:        20.04
    Codename:       focal
    
    $ java -version
    openjdk version "14.0.2" 2020-07-14
    OpenJDK Runtime Environment (build 14.0.2+12-Ubuntu-120.04)
    OpenJDK 64-Bit Server VM (build 14.0.2+12-Ubuntu-120.04, mixed mode, sharing)
    
    opened by sauloal 14
  • Add a packaged version of dask-sql

    Currently, dask-sql can only be installed from source. We should find out whether uploading the packaged jar (contained in a wheel) together with the Python code makes sense, and if and how we can create a conda package (probably via conda-forge).

    opened by nils-braun 13
  • [BUG] CVEs in conda release

    What happened:

    Running Grype on DaskSQL.jar from the latest conda release (dask-sql=2022.1) returned 6 fixable CVEs

    grype graphistry/graphistry-nvidia:v2.39.7-11.4 \
        --only-fixed \
        -o template \
        -t grype.friendly.tmpl
    

    with template grype.friendly.tmpl

    "Package","Version Installed","Vulnerability ID","Severity","Location",
    {{- range .Matches}}
    "{{.Artifact.Name}}","{{.Artifact.Version}}","{{.Vulnerability.ID}}","{{.Vulnerability.Severity}}","{{.Artifact.Locations}}"
    {{- end}}
    

    =>

    ...
    jackson-databind","2.10.0","GHSA-57j2-w4cx-62h2","High","[Location<RealPath="/opt/conda/envs/rapids/lib/python3.8/site-packages/dask_sql/jar/DaskSQL.jar" Layer="sha256:5c80fa32eb12dd95d387ae9121c3a8ba9713207626bbc7b849613b4bb0eb3586">]"
    "httpclient","4.5.9","GHSA-7r82-7xv7-xcpj","Medium","[Location<RealPath="/opt/conda/envs/rapids/lib/python3.8/site-packages/dask_sql/jar/DaskSQL.jar" Layer="sha256:5c80fa32eb12dd95d387ae9121c3a8ba9713207626bbc7b849613b4bb0eb3586">]"
    "json-smart","2.3","GHSA-fg2v-w576-w4v3","High","[Location<RealPath="/opt/conda/envs/rapids/lib/python3.8/site-packages/dask_sql/jar/DaskSQL.jar" Layer="sha256:5c80fa32eb12dd95d387ae9121c3a8ba9713207626bbc7b849613b4bb0eb3586">]"
    "commons-io","2.4","GHSA-gwrp-pvrq-jmwv","Medium","[Location<RealPath="/opt/conda/envs/rapids/lib/python3.8/site-packages/dask_sql/jar/DaskSQL.jar" Layer="sha256:5c80fa32eb12dd95d387ae9121c3a8ba9713207626bbc7b849613b4bb0eb3586">]"
    "snakeyaml","1.24","GHSA-rvwf-54qp-4r6v","High","[Location<RealPath="/opt/conda/envs/rapids/lib/python3.8/site-packages/dask_sql/jar/DaskSQL.jar" Layer="sha256:5c80fa32eb12dd95d387ae9121c3a8ba9713207626bbc7b849613b4bb0eb3586">]"
    "json-smart","2.3","GHSA-v528-7hrm-frqp","Critical","[Location<RealPath="/opt/conda/envs/rapids/lib/python3.8/site-packages/dask_sql/jar/DaskSQL.jar" Layer="sha256:5c80fa32eb12dd95d387ae9121c3a8ba9713207626bbc7b849613b4bb0eb3586">]
    

    What you expected to happen:

    The latest stable release should ideally have no fixable CVEs

    Minimal Complete Verifiable Example:

    See above

    Anything else we need to know?:

    Environment:

    • dask-sql version: 2022.01
    • Python version: Any
    • Operating System: Any (Ubuntu container)
    • Install method (conda, pip, source): Conda
    bug 
    opened by lmeyerov 10
  • Complex join fails with memory error

    From @timhdesilva

    So I have a large dataset (50GB) that needs to be merged with a small dataset that is a Pandas dataframe. Prior to the merge, I need to perform a groupby operation on the large dataset. Using Dask, I have been able to perform the groupby operation on the large dataset (which is a Dask dataframe). When I then merge the two datasets using X.merge(Y), I have no issues. The problem is that I need to perform a merge that is not exact (i.e. one column between two others), which is why I'm turning to dask-sql. When I try to do the merge with dask-sql though, I get a memory error (the number of observations should only be ~10x that of the exact merge, so memory shouldn't be a problem).

    Any ideas here? I'm thinking somehow the issue might be that I am performing a groupby operation on the Dask dataframe prior to the dask-sql merge. Is this allowed - i.e. can one do a groupby and not execute it prior to using the dask-sql create_table() command and then performing a dask-sql merge with c.sql?

    opened by nils-braun 10
  • Upgrade to DataFusion 14.0.0

    Changes in this PR:

    • Use DataFusion 14.0.0
    • Added copy of filter_push_down rule from DataFusion 13.0.0 because there are changes in the DataFusion 14.0.0 version that cause regressions for us. We should revert to using DataFusion's version at some point. I filed https://github.com/dask-contrib/dask-sql/issues/908 for this.
    opened by andygrove 9
  • [ENH] substr() not supported in dask-sql

    Is your feature request related to a problem? Please describe. I'm working on porting a large set of queries from another engine to dask-sql. I see that I can update the queries to use "substring" instead, but it would be nice if users didn't have to.

    Describe the solution you'd like: Can we have substr() supported in dask-sql in the same way that substring() is?

    Describe alternatives you've considered: substring() works in dask-sql, but substr() does not. However, we do not want to alter the SQL files by changing substr() to substring().

    Additional context: Here's an example query I'd like to be able to run:

      import cudf
      from dask_sql import Context

      dc = Context()

      df = cudf.DataFrame({'s_c': ['ATX', 'LAX', 'SFO'], 's_d': ['38714', '37206', '38714'],
                           'd_d': ['1900-01-01', '1900-01-04', '2199-12-28']})
      dc.create_table('my_table', df)

      query = """
                select substr(s_c,1,30)
                from
                 (select s_c
                  from my_table
                  where s_d = d_d
                  group by s_c)
              """

      print(dc.sql(query).compute())
    
    enhancement SQL grammar java 
    opened by DaceT 9
  • Error: Unable to instantiate java compiler

    Hi! @nils-braun,

    As you already know, I mistakenly opened this issue on the Dask-Docker repo and you were kindly alerted by @jrbourbeau.

    I will copy/paste my original post here as well as your initial answer (Thank you for your quick reply)

    Here is my original post:

    ####################################################################

    What happened:

    After installing Java and dask-sql using pip, whenever I try to run a SQL query from my python code I get the following error:

    ...
    File "/home/vquery/.local/lib/python3.8/site-packages/dask_sql/context.py", line 378, in sql
        rel, select_names, _ = self._get_ral(sql)
      File "/home/vquery/.local/lib/python3.8/site-packages/dask_sql/context.py", line 515, in _get_ral
        nonOptimizedRelNode = generator.getRelationalAlgebra(validatedSqlNode)
    java.lang.java.lang.IllegalStateException: java.lang.IllegalStateException: Unable to instantiate java compiler
    ...
    ...
    File "JaninoRelMetadataProvider.java", line 426, in org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.compile
      File "CompilerFactoryFactory.java", line 61, in org.codehaus.commons.compiler.CompilerFactoryFactory.getDefaultCompilerFactory
    java.lang.java.lang.NullPointerException: java.lang.NullPointerException
    

    What you expected to happen:

    I should get a dataframe as a result.

    Minimal Complete Verifiable Example:

    
    # The cluster/client setup is done first, in another module not the one executing the SQL query
    # Also tried other cluster/scheduler types with the same error
    from dask.distributed import Client, LocalCluster
    cluster = LocalCluster(
        n_workers=4,
        threads_per_worker=1,
        processes=False,
        dashboard_address=':8787',
        asynchronous=False,
        memory_limit='1GB'
        )
    client = Client(cluster)
    
    # The SQL code is executed in its own module
    import dask.dataframe as dd
    from dask_sql import Context
    
    c = Context()
    df = dd.read_parquet('/vQuery/files/results/US_Accidents_June20.parquet') 
    c.register_dask_table(df, 'df')
    df = c.sql("""select ID, Source from df""") # This line fails with the error reported
    
    

    Anything else we need to know?:

    As mentioned in the code snippet above, due to the way my application is designed, the Dask client/cluster setup is done before dask-sql context is created.

    Environment:

    • Dask version:
      • dask: 2020.12.0
      • dask-sql: 0.3.1
    • Python version:
      • Python 3.8.5
    • Operating System:
      • Ubuntu 20.04.1 LTS
    • Install method (conda, pip, source):
      • pip
    • Application Framework
      • Jupyter Notebook/Ipywidgets & Voila Server

    Install steps

    $ sudo apt install default-jre
    
    $ sudo apt install default-jdk
    
    $ java -version
    openjdk version "11.0.10" 2021-01-19
    OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
    OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
    
    $ javac -version
    javac 11.0.10
    
    $ echo $JAVA_HOME
    /usr/lib/jvm/java-11-openjdk-amd64
    
    $ pip install dask-sql
    
    $ pip list | grep dask-sql
    dask-sql               0.3.1
    
    opened by LaurentEsingle 9
  • Add max version constraint for `fugue`

    It looks like the recent release of Fugue 0.7.0 has bumped its qpd dependency to a version that only has python support up to 3.8. I'm not sure if this is the cause for the recent Fugue-related failures, but it does mean that at least for now, we should constrain to fugue<0.7.0, where 3.9+ support is guaranteed.

    In the long run, we should probably see what the blockers are to allowing 3.9+ support on qpd again, cc @goodwanghan in case you have any additional context to provide here.

    opened by charlesbluca 8
  • Add STDDEV, STDDEV_SAMP, and STDDEV_POP

    Closes #608

    Blocked by: https://github.com/rapidsai/cudf/issues/11515

    Note: currently, performing multiple aggregations at once seems to result in incorrect values. Ex: SELECT STDDEV(a) AS s1, STDDEV_POP(a) AS s2 FROM df returns the same result for both s1 and s2 but running two separate queries (one for each aggregation) returns the correct results (#655)

    datafusion 
    opened by ChrisJar 8
  • [BUG] Segfaults on "select count(*) from test" with tables on top of cuDF DataFrames

    test.py:

    if __name__ == "__main__":
        from dask.distributed import Client
        from dask_cuda import LocalCUDACluster
    
        cluster = LocalCUDACluster(protocol="tcp")
        client = Client(cluster)
        print(client)
    
        from dask_sql import Context
        import cudf
    
        c = Context()
    
        test_df = cudf.DataFrame({'id': [0, 1, 2]})
        c.create_table("test", test_df)
    
        # segfault
        print(c.sql("select count(*) from test").compute())
    

    EDIT: Leaving the below UCX snippet and trace for historical purposes, but the issue seems entirely unrelated to UCX.

    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    from dask_sql import Context
    import pandas as pd
    
    cluster = LocalCUDACluster(protocol="ucx")
    client = Client(cluster)
    
    c = Context()
    
    test_df = pd.DataFrame({'id': [0, 1, 2]})
    c.create_table("test", test_df)
    
    # segfault
    c.sql("select count(*) from test")
    

    trace:

    /home/rgelhausen/conda/envs/dsql-3-07/lib/python3.9/site-packages/distributed-2022.2.1+8.g39c5e885-py3.9.egg/distributed/comm/ucx.py:83: UserWarning: A CUDA context for device 0 already exists on process ID 1251168. This is often the result of a CUDA-enabled library calling a CUDA runtime function before Dask-CUDA can spawn worker processes. Please make sure any such function calls don't happen at import time or in the global scope of a program.
      warnings.warn(
    distributed.preloading - INFO - Import preload module: dask_cuda.initialize
    ...
    [rl-dgx2-r13-u7-rapids-dgx201:1232380:0:1232380] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
    ==== backtrace (tid:1232380) ====
     0  /home/rgelhausen/conda/envs/dsql-3-07/lib/python3.9/site-packages/ucp/_libs/../../../../libucs.so.0(ucs_handle_error+0x155) [0x7f921c5883f5]
     1  /home/rgelhausen/conda/envs/dsql-3-07/lib/python3.9/site-packages/ucp/_libs/../../../../libucs.so.0(+0x2d791) [0x7f921c588791]
     2  /home/rgelhausen/conda/envs/dsql-3-07/lib/python3.9/site-packages/ucp/_libs/../../../../libucs.so.0(+0x2d962) [0x7f921c588962]
     3  /lib/x86_64-linux-gnu/libc.so.6(+0x430c0) [0x7f976d27b0c0]
     4  [0x7f93a78e6b58]
    =================================
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f93a78e6b58, pid=1232380, tid=1232380
    #
    # JRE version: OpenJDK Runtime Environment (11.0.1+13) (build 11.0.1+13-LTS)
    # Java VM: OpenJDK 64-Bit Server VM (11.0.1+13-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # J 1791 c2 java.util.Arrays.hashCode([Ljava/lang/Object;)I java.base@11.0.1 (56 bytes) @ 0x00007f93a78e6b58 [0x00007f93a78e6b20+0x0000000000000038]
    #
    # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/nfs/rgelhausen/notebooks/core.1232380)
    #
    # An error report file with more information is saved as:
    # /home/nfs/rgelhausen/notebooks/hs_err_pid1232380.log
    Compiled method (c2)   17616 1791       4       java.util.Arrays::hashCode (56 bytes)
     total in heap  [0x00007f93a78e6990,0x00007f93a78e6d80] = 1008
     relocation     [0x00007f93a78e6b08,0x00007f93a78e6b20] = 24
     main code      [0x00007f93a78e6b20,0x00007f93a78e6c60] = 320
     stub code      [0x00007f93a78e6c60,0x00007f93a78e6c78] = 24
     metadata       [0x00007f93a78e6c78,0x00007f93a78e6c80] = 8
     scopes data    [0x00007f93a78e6c80,0x00007f93a78e6ce8] = 104
     scopes pcs     [0x00007f93a78e6ce8,0x00007f93a78e6d48] = 96
     dependencies   [0x00007f93a78e6d48,0x00007f93a78e6d50] = 8
     handler table  [0x00007f93a78e6d50,0x00007f93a78e6d68] = 24
     nul chk table  [0x00007f93a78e6d68,0x00007f93a78e6d80] = 24
    Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
    #
    # If you would like to submit a bug report, please visit:
    
    bug needs triage 
    opened by randerzander 8
  • Update docs theme, use sphinx-tabs for CPU/GPU examples

    This PR bumps the dask-sphinx-theme to be more in line with Dask / Distributed's docs, and adds the sphinx-tabs extension so that code-blocks can be tabbed to show their GPU equivalent (when possible)

    opened by charlesbluca 8
  • Bump pypa/cibuildwheel from 2.11.3 to 2.11.4

    Bumps pypa/cibuildwheel from 2.11.3 to 2.11.4.

    Release notes

    Sourced from pypa/cibuildwheel's releases.

    v2.11.4

    • 🐛 Fix a bug that caused missing wheels on Windows when a test was skipped using CIBW_TEST_SKIP (#1377)
    • 🛠 Updates CPython 3.11 to 3.11.1 (#1371)
    • 🛠 Updates PyPy 3.7 to 7.3.10, except on macOS which remains on 7.3.9 due to a bug. (#1371)
    • 📚 Added a reference to abi3audit to the docs (#1347)
    Changelog

    Sourced from pypa/cibuildwheel's changelog.

    v2.11.4

    24 Dec 2022

    • 🐛 Fix a bug that caused missing wheels on Windows when a test was skipped using CIBW_TEST_SKIP (#1377)
    • 🛠 Updates CPython 3.11 to 3.11.1 (#1371)
    • 🛠 Updates PyPy to 7.3.10, except on macOS which remains on 7.3.9 due to a bug on that platform. (#1371)
    • 📚 Added a reference to abi3audit to the docs (#1347)
    Commits
    • 27fc88e Bump version: v2.11.4
    • a7e9ece Merge pull request #1371 from pypa/update-dependencies-pr
    • b9a3ed8 Update cibuildwheel/resources/build-platforms.toml
    • 3dcc2ff fix: not skipping the tests stops the copy (Windows ARM) (#1377)
    • 1c9ec76 Merge pull request #1378 from pypa/henryiii-patch-3
    • 22b433d Merge pull request #1379 from pypa/pre-commit-ci-update-config
    • 98fdf8c [pre-commit.ci] pre-commit autoupdate
    • cefc5a5 Update dependencies
    • e53253d ci: move to ubuntu 20
    • e9ecc65 [pre-commit.ci] pre-commit autoupdate (#1374)
    • Additional commits viewable in compare view

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 1
  • [BUG] Fix `test_random` on Dask cluster

    Right now, test_random fails on our Dask cluster integration test with a TypeError: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given.

    Like #977, I think this may have to do with the latest NumPy release.

    See also the definition of __randomstate_ctor in the NumPy source code.

    bug needs triage 
    opened by sarahyurick 0
  • [BUG] Schema: <schema> not found in DaskSQLContext

    This worked in dask 2022.8, but after the switch to DataFusion, I get this error when running queries. We believe this is because DataFusion doesn't support schemas. Is it possible to add this support again?

    bug needs triage 
    opened by hungcs 1
  • Bump async-trait from 0.1.59 to 0.1.60 in /dask_planner

    Bumps async-trait from 0.1.59 to 0.1.60.

    Release notes

    Sourced from async-trait's releases.

    0.1.60

    • Documentation improvements
    Commits

    dependencies rust 
    opened by dependabot[bot] 1
  • [ENH] Add ~isna() support for predicate pushdown

    Is your feature request related to a problem? Please describe. A common filter applied in many SQL queries is filtering out nulls for certain tables, which usually gets pushed down to the TableScan step. We implement IS NOT NULL as df.isna() chained with a NOT operation. It would be good to support identifying these patterns in the HLG (high-level graph) for predicate pushdown.

    Describe the solution you'd like

    Describe alternatives you've considered

    Additional context

    enhancement needs triage 
    opened by ayushdg 0
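The pattern described in the last enhancement request can be illustrated with plain pandas: `IS NOT NULL` expressed as `isna()` followed by a negation selects exactly the same rows as `notna()`, which is the equivalence a pushdown rule would need to recognize. This is only a sketch of that equivalence on made-up data; the actual graph rewrite in dask-sql is not shown.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0]})

# IS NOT NULL written as isna() followed by a NOT, as described in the issue
mask_chained = ~df["a"].isna()
# The equivalent single-step predicate a pushdown rule could map it to
mask_direct = df["a"].notna()

filtered = df[mask_chained]
print(mask_chained.equals(mask_direct))
print(filtered)
```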
Releases (2022.12.0)
  • 2022.12.0 (Dec 2, 2022)

    What's Changed

    • Unpin dask/distributed for development by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/892
    • Add replace operator by @ChrisJar in https://github.com/dask-contrib/dask-sql/pull/897
    • Replace variadic with exact where appropriate by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/885
    • Bump pyo3 from 0.17.2 to 0.17.3 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/900
    • Sort + limit topk optimization (initial) by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/893
    • [bug][docs] my_ds -> my_df by @nickvazz in https://github.com/dask-contrib/dask-sql/pull/905
    • Bump env_logger from 0.9.1 to 0.9.3 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/906
    • Bump mimalloc from 0.1.30 to 0.1.31 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/910
    • Replace dask_ml.wrappers.Incremental with custom Incremental class by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/855
    • Update flake8 link to use github by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/915
    • Use conda-incubator/[email protected] & enable automatic GH Action updates by @jakirkham in https://github.com/dask-contrib/dask-sql/pull/917
    • Bump uuid from 1.2.1 to 1.2.2 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/916
    • Upgrade to DataFusion 14.0.0 by @andygrove in https://github.com/dask-contrib/dask-sql/pull/903
    • Bump actions/checkout from 2 to 3 by @dependabot in https://github.com/dask-contrib/dask-sql/pull/920
    • Support to_timestamp by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/838
    • Bump actions/setup-python from 2 to 4 by @dependabot in https://github.com/dask-contrib/dask-sql/pull/921
    • Bump Docker workflow actions by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/930
    • Bump mimalloc from 0.1.31 to 0.1.32 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/923
    • Bump tokio from 1.21.2 to 1.22.0 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/927
    • Bump peter-evans/create-pull-request from 3 to 4 by @dependabot in https://github.com/dask-contrib/dask-sql/pull/929
    • Temporarily fix gpuci by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/942
    • Remove all Dask-ML uses by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/886
    • Dependabot updates by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/944
    • Bump async-trait from 0.1.58 to 0.1.59 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/946
    • Add TIMESTAMPDIFF support by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/876
    • Implement basic COALESCE functionality by @ChrisJar in https://github.com/dask-contrib/dask-sql/pull/823
    • Add support for filter pushdown rule by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/924
    • Resolve test_date_functions() by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/813
    • Set dask/distributed pinning for release by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/947
    • Set dask/distributed max version in Dockerfile by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/952
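The "Sort + limit topk optimization" entry above refers to a general idea. `ORDER BY x LIMIT k` only needs the k smallest rows, so a full distributed sort can be skipped. The sketch below assumes that this is the kind of rewrite the PR performs, as its name suggests. It uses only the standard library and compares the naive plan with a per-partition top-k followed by a final top-k. The helpers `sort_then_limit` and `topk` are hypothetical names, not dask-sql functions. Dask DataFrames expose `nsmallest` and `nlargest`, which follow the same per-partition-then-combine approach.

```python
import heapq
from typing import Dict, Iterable, List

Row = Dict[str, int]


def sort_then_limit(rows: Iterable[Row], key: str, k: int) -> List[Row]:
    # Naive plan: sort every row, then keep the first k.
    return sorted(rows, key=lambda r: r[key])[:k]


def topk(partitions: List[List[Row]], key: str, k: int) -> List[Row]:
    # Top-k plan: each partition contributes at most k candidates,
    # then a final selection picks the global k smallest among them.
    # heapq.nsmallest is equivalent to sorted(...)[:k] (stable on ties).
    candidates: List[Row] = []
    for part in partitions:
        candidates.extend(heapq.nsmallest(k, part, key=lambda r: r[key]))
    return heapq.nsmallest(k, candidates, key=lambda r: r[key])


parts = [[{"x": 5}, {"x": 1}], [{"x": 3}, {"x": 0}], [{"x": 4}]]
flat = [row for part in parts for row in part]
print(topk(parts, "x", 2) == sort_then_limit(flat, "x", 2))
```

The benefit is that no partition ever ships more than k rows to the final step, whereas a full sort has to shuffle the entire dataset.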

    New Contributors

    • @nickvazz made their first contribution in https://github.com/dask-contrib/dask-sql/pull/905
    • @jakirkham made their first contribution in https://github.com/dask-contrib/dask-sql/pull/917

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.10.1...2022.12.0
  • 2022.10.1(Oct 25, 2022)

    What's Changed

    • Unpin dask/distributed for development by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/848
    • Switch docs/CI away from conda-installed Rust by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/817
    • Add /opt/cargo/bin to gpuCI PATH by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/856
    • Enable crate sorting with rustfmt by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/819
    • Update datafusion dependency during upstream testing by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/814
    • Bump mimalloc from 0.1.29 to 0.1.30 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/862
    • Update gpuCI RAPIDS_VER to 22.12 by @github-actions in https://github.com/dask-contrib/dask-sql/pull/863
    • Refactor which_upstream logic in upstream scheduled workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/864
    • Add testing for OSX by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/859
    • Wrap which_upstream logic in expression syntax by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/866
    • Check for np.timedelta64 in as_timelike by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/860
    • Update test-upstream.yml typo by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/869
    • Use latest DataFusion rev by @andygrove in https://github.com/dask-contrib/dask-sql/pull/865
    • Only use upstream Dask in scheduled cluster testing if which_upstream == 'Dask' by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/872
    • Bump async-trait from 0.1.57 to 0.1.58 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/870
    • Add pypi release workflow by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/858
    • Ignore index for union all test by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/875
    • Bump versioneer-vendored files to 0.27 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/881
    • Bump uvicorn minimum version to 0.13.4 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/873
    • Install twine in cibuildwheel environment by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/874
    • Replace dask_ml.wrappers.ParallelPostFit with custom ParallelPostFit class by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/832
    • Add py to testing environments to resolve pytest 7.2.0 issues by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/890
    • Use latest DataFusion rev by @andygrove in https://github.com/dask-contrib/dask-sql/pull/889
    • Pin dask/distributed for release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/891

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.10.0...2022.10.1

    Source code(tar.gz)
    Source code(zip)
  • 2022.10.1rc1(Oct 24, 2022)

    What's Changed

    • Ignore index for union all test by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/875
    • Bump versioneer-vendored files to 0.27 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/881
    • Bump uvicorn minimum version to 0.13.4 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/873
    • Install twine in cibuildwheel environment by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/874

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.10.1rc0...2022.10.1rc1
  • 2022.10.1rc0(Oct 19, 2022)

    What's Changed

    • Unpin dask/distributed for development by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/848
    • Switch docs/CI away from conda-installed Rust by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/817
    • Add /opt/cargo/bin to gpuCI PATH by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/856
    • Enable crate sorting with rustfmt by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/819
    • Update datafusion dependency during upstream testing by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/814
    • Bump mimalloc from 0.1.29 to 0.1.30 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/862
    • Update gpuCI RAPIDS_VER to 22.12 by @github-actions in https://github.com/dask-contrib/dask-sql/pull/863
    • Refactor which_upstream logic in upstream scheduled workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/864
    • Add testing for OSX by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/859
    • Wrap which_upstream logic in expression syntax by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/866
    • Check for np.timedelta64 in as_timelike by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/860
    • Update test-upstream.yml typo by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/869
    • Use latest DataFusion rev by @andygrove in https://github.com/dask-contrib/dask-sql/pull/865
    • Only use upstream Dask in scheduled cluster testing if which_upstream == 'Dask' by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/872
    • Bump async-trait from 0.1.57 to 0.1.58 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/870
    • Add pypi release workflow by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/858

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.10.0...2022.10.1rc0
  • 2022.10.0(Oct 10, 2022)

    What's Changed

    • Update README to link to DataFusion rather than Calcite by @andygrove in https://github.com/dask-contrib/dask-sql/pull/790
    • Unpin dask/distributed for development by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/794
    • Remove datafusion syncing workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/793
    • Resolve syntax errors in upstream testing workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/797
    • README update- remove 'experimental' from GPU support section by @randerzander in https://github.com/dask-contrib/dask-sql/pull/798
    • Fix new clippy warnings by @andygrove in https://github.com/dask-contrib/dask-sql/pull/801
    • Check split_out to decide on sorted groupby in aggregate.py by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/802
    • Resolve Docker build failures, update core dependency constraints by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/804
    • Fix docker build errors by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/805
    • Fix if condition for gpuCI updating workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/808
    • pip install awscli in cloud images by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/809
    • Resolve bare requirement failures in upstream workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/800
    • Refactor getValue<T> code to reduce duplication by @andygrove in https://github.com/dask-contrib/dask-sql/pull/803
    • Improve SqlTypeName to support more types and also improve error handling by @andygrove in https://github.com/dask-contrib/dask-sql/pull/824
    • Add dependabot config to update Rust deps by @andygrove in https://github.com/dask-contrib/dask-sql/pull/820
    • Bump uuid from 0.8.2 to 1.1.2 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/828
    • Bump rand from 0.7.3 to 0.8.5 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/827
    • Remove rust-toolchain.toml by @andygrove in https://github.com/dask-contrib/dask-sql/pull/826
    • Add quoting around partition keys for Hive table inputs by @randerzander in https://github.com/dask-contrib/dask-sql/pull/834
    • Configure dependabot to ignore arrow and datafusion by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/840
    • Bump pyo3 from 0.17.1 to 0.17.2 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/836
    • Add support for CREATE EXPERIMENT, expand support for WITH kwargs by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/796
    • Bump uuid from 1.1.2 to 1.2.1 in /dask_planner by @dependabot in https://github.com/dask-contrib/dask-sql/pull/845
    • Add Andy and Charles to the rust codeowners group by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/846
    • Update DataFusion and change order of optimization rules by @andygrove in https://github.com/dask-contrib/dask-sql/pull/825
    • Update doc pages after DataFusion merge by @randerzander in https://github.com/dask-contrib/dask-sql/pull/842
    • Resolve test_literals() by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/812
    • Faster limit computation on persisted dataframes by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/837
    • Pin dask/distributed for release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/847

    New Contributors

    • @randerzander made their first contribution in https://github.com/dask-contrib/dask-sql/pull/798
    • @dependabot made their first contribution in https://github.com/dask-contrib/dask-sql/pull/828

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.9.0...2022.10.0
  • 2022.9.0(Sep 21, 2022)

    What's Changed

    • Unpin dask/distributed post-release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/694
    • Don't check order for filtered groupby test by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/702
    • Relax test_groupby_split_every key check by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/710
    • Update gpuCI environment file, updating workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/731
    • Bump gpuCI test environment to use python 3.9 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/736
    • Refactor LIMIT computation to always use head when possible by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/696
    • Set pytest to fail on xpassing tests by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/756
    • Fix upstream failures in test_groupby_split_out by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/763
    • Add step argument to get_window_bounds for pandas>=1.5 by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/774
    • Remove PyPI release workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/777
    • Switch to Arrow DataFusion SQL parser by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/788
    • Pin dask/distributed for release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/789
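Several entries in this release and the ones below deal with computing LIMIT/OFFSET on partitioned data, for example "Refactor LIMIT computation to always use head when possible". The general technique is to skip partitions that lie entirely outside the requested row window. When there is no offset and the first partition already holds enough rows, the technique reduces to a cheap head(). The following is a minimal stdlib sketch of that technique, with plain lists standing in for partitions. `limit_offset` is a hypothetical helper, not dask-sql's implementation.

```python
from itertools import accumulate
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def limit_offset(
    partitions: Sequence[Sequence[T]], limit: int, offset: int = 0
) -> List[List[T]]:
    """Return the rows [offset, offset + limit) as a list of partitions."""
    if offset == 0 and partitions and len(partitions[0]) >= limit:
        # Fast path, like head(): the first partition alone suffices,
        # so no other partition lengths need to be computed.
        return [list(partitions[0][:limit])]

    lengths = [len(p) for p in partitions]
    ends = list(accumulate(lengths))  # global row index after each partition
    starts = [end - n for end, n in zip(ends, lengths)]
    lo, hi = offset, offset + limit  # requested global row window

    out: List[List[T]] = []
    for part, start, end in zip(partitions, starts, ends):
        if end <= lo or start >= hi:
            continue  # partition lies entirely outside the window
        out.append(list(part[max(lo - start, 0) : min(hi, end) - start]))
    return out


parts = [[1, 2, 3], [4, 5], [6, 7, 8]]
print(limit_offset(parts, limit=3, offset=2))
```

On a real Dask DataFrame, partition lengths are not free, since computing them means materializing the partitions. That cost is why the head() fast path is worth special-casing.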

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.8.0...2022.9.0
  • 2022.9.0.rc0(Sep 20, 2022)

    What's Changed

    • Datafusion aggregate by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/471
    • Bump DataFusion version by @andygrove in https://github.com/dask-contrib/dask-sql/pull/494
    • Basic DataFusion Select Functionality by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/489
    • Allow for Cast parsing and logicalplan by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/498
    • Minor code cleanup in row_type() by @andygrove in https://github.com/dask-contrib/dask-sql/pull/504
    • Bump rust version by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/508
    • Improve code for getting column name from expression by @andygrove in https://github.com/dask-contrib/dask-sql/pull/509
    • Update exceptions that are thrown by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/507
    • Add support for Expr::Sort in expr_to_field by @andygrove in https://github.com/dask-contrib/dask-sql/pull/515
    • Reduce crate dependencies by @andygrove in https://github.com/dask-contrib/dask-sql/pull/516
    • Datafusion dsql explain by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/511
    • Port sort logic to the datafusion planner by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/505
    • Add helper method to convert LogicalPlan to Python type by @andygrove in https://github.com/dask-contrib/dask-sql/pull/522
    • Support CASE WHEN and BETWEEN by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/502
    • Upgrade to DataFusion 8.0.0 by @andygrove in https://github.com/dask-contrib/dask-sql/pull/533
    • Enable passing tests by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/539
    • Datafusion crossjoin by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/521
    • Implement TryFrom for plans by @andygrove in https://github.com/dask-contrib/dask-sql/pull/543
    • Support for LIMIT clause with DataFusion by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/529
    • Support Joins using DataFusion planner/parser by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/512
    • Datafusion is not by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/557
    • [REVIEW] Add support for UNION by @galipremsagar in https://github.com/dask-contrib/dask-sql/pull/542
    • [REVIEW] Fix issue with duplicates in column renaming by @galipremsagar in https://github.com/dask-contrib/dask-sql/pull/559
    • [REVIEW] Enable LIMIT tests by @galipremsagar in https://github.com/dask-contrib/dask-sql/pull/560
    • Add CODEOWNERS file by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/562
    • Upgrade DataFusion version & support non-equijoin join conditions by @andygrove in https://github.com/dask-contrib/dask-sql/pull/566
    • [DF] Add @ayushdg and @galipremsagar to rust codeowners by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/572
    • Enable DataFusion CBO and introduce DaskSqlOptimizer by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/558
    • Only use the specific DataFusion crates that we need by @andygrove in https://github.com/dask-contrib/dask-sql/pull/568
    • Fix some clippy warnings by @andygrove in https://github.com/dask-contrib/dask-sql/pull/574
    • Datafusion invalid projection by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/571
    • Datafusion upstream merge by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/576
    • Datafusion filter by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/581
    • Table_scan column projection by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/578
    • Expose groupby agg configs to drop_duplicates (distinct) agg by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/575
    • Datafusion year & support for DaskSqlDialect by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/585
    • Optimization rule to optimize out nulls for inner joins by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/588
    • Push down null filters into TableScan by @andygrove in https://github.com/dask-contrib/dask-sql/pull/595
    • Datafusion IndexError - Return fields from the lhs and rhs of a join by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/599
    • Datafusion uncomment working filter tests by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/601
    • Search all schemas when attempting to locate index by field name by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/602
    • Fix join condition eval when joining on 3 or more columns by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/603
    • Add inList support by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/604
    • Enable Datafusion user defined functions UDFs by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/605
    • Datafusion empty relation by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/611
    • Datafusion NOT LIKE Clause support by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/615
    • Uncomment passing pytests by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/616
    • Fix bug when filtering on specific scalars. by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/609
    • Datafusion NULL & NOT NULL literals by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/618
    • Fix the results from a subquery alias operation with optimizations enabled by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/613
    • Initial version of contributing guide by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/600
    • Add helper function for converting expression lists to Python by @andygrove in https://github.com/dask-contrib/dask-sql/pull/631
    • Plugins support multiple types by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/636
    • Consolidate limit/offset logic in partition func by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/598
    • Datafusion version bump by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/628
    • Expand getOperands support to cover all currently available Expr type… by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/642
    • Introduce Inverse Rex Operation by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/643
    • Remove code segment that was causing double the amount of columns to … by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/644
    • Include Columns in Empty DataFrame by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/645
    • Bump setuptools-rust from 1.1.1 -> 1.4.1 by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/646
    • Merge main into datafusion-sql-planner by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/654
    • Port window logic to datafusion by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/545
    • COT function by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/657
    • Math functions by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/660
    • Use PyErrs for all Python-facing methods in dask_planner by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/662
    • Invalid crossjoin in plan by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/653
    • [DF] Add support for CREATE TABLE | VIEW AS statements by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/656
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/669
    • Datafusion expand scalarvalue catchall by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/638
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/670
    • [DF] Add support for DROP TABLE statements by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/658
    • Remove un-necessary sqlparser dependency and duplicate Dialect defini… by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/671
    • [DF] Resolve UDF test failures by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/672
    • Uncomment skipped rex pytests by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/661
    • Merge "Bump arrow version to 6.0.0 (#674)" by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/677
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/676
    • [DF] Fix most of the clippy warnings by @andygrove in https://github.com/dask-contrib/dask-sql/pull/679
    • [DF] use datafusion 9956f80f197550051db7debae15d5c706afc22a3 by @andygrove in https://github.com/dask-contrib/dask-sql/pull/667
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/685
    • Configure clippy to error on warnings by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/692
    • Unpin dask/distributed post-release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/694
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/691
    • [DF] Add optimizer rules to translate subqueries to joins by @andygrove in https://github.com/dask-contrib/dask-sql/pull/680
    • [DF] Upgrade DataFusion to rev c0b4ba by @andygrove in https://github.com/dask-contrib/dask-sql/pull/689
    • Add STDDEV, STDDEV_SAMP, and STDDEV_POP by @ChrisJar in https://github.com/dask-contrib/dask-sql/pull/629
    • Rust parsing support for CREATE MODEL statements by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/693
    • Support for DROP MODEL parsing in Rust by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/695
    • Support for parsing [or replace] with create [or replace] model by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/700
    • Parsing logic for SHOW SCHEMAS by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/697
    • Support for parsing SHOW TABLES FROM grammar by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/699
    • Don't check order for filtered groupby test by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/702
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/708
    • Enable passing pytests by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/709
    • Relax test_groupby_split_every key check by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/710
    • Introduce 'schema' to the DaskTable instance and modify context.fqn t… by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/713
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/711
    • Use compiler function in nightly recipe, pin to Rust 1.62.1 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/687
    • Add test queries to gpuCI checks by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/650
    • Support for DISTRIBUTE BY by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/715
    • Datafusion create table with by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/714
    • [DF] Bump DataFusion to rev 076b42 by @andygrove in https://github.com/dask-contrib/dask-sql/pull/720
    • [DF] Add support for CREATE [OR REPLACE] TABLE [IF NOT EXISTS] WITH by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/718
    • Stop overwriting aggregations on same column by @ChrisJar in https://github.com/dask-contrib/dask-sql/pull/675
    • [DF] Add TypeCoercion optimizer rule by @andygrove in https://github.com/dask-contrib/dask-sql/pull/723
    • Support for SHOW COLUMNS syntax by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/721
    • Implement PREDICT parsing and python wiring by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/722
    • Support all boolean operations by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/719
    • Resolve issue that crept in during code merge and caused build issues by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/724
    • [DF] Add handling for overloaded UDFs by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/682
    • [DF] Minor quality of life updates to test_queries.py by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/730
    • [DF] Fix PyExpr.index bug where it returns Ok(0) instead of an Err if no match is found by @andygrove in https://github.com/dask-contrib/dask-sql/pull/732
    • [DF] Add Cargo.lock and bump DataFusion rev by @andygrove in https://github.com/dask-contrib/dask-sql/pull/734
    • Update gpuCI environment file, updating workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/731
    • Bump gpuCI test environment to use python 3.9 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/736
    • [DF] Implement ANALYZE TABLE by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/733
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/735
    • [DF] Switch out gpuCI Java dependencies for Rust by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/737
    • {CREATE | USE | DROP} Schema support by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/727
    • Test function test_aggregate_function by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/738
    • Uncomment more test_model pytests by @ChrisJar in https://github.com/dask-contrib/dask-sql/pull/728
    • Unskip passing postgres test by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/739
    • [DF] Publish nightlies under dev_datafusion label by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/729
    • [DF] DataFusion upgrade by @andygrove in https://github.com/dask-contrib/dask-sql/pull/742
    • [DF] Resolve test_aggregations and test_group_by_all by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/743
    • Refactor LIMIT computation to always use head when possible by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/696
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/745
    • Upgrade to latest DataFusion by @andygrove in https://github.com/dask-contrib/dask-sql/pull/744
    • Uncomment passing pytests by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/750
    • [DF] Update DataFusion to pick up SQL support for LIKE, ILIKE, SIMILAR TO with escape char by @andygrove in https://github.com/dask-contrib/dask-sql/pull/751
    • Set pytest to fail on xpassing tests by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/756
    • Upgrade to Datafusion 12.0.0 RC1 by @andygrove in https://github.com/dask-contrib/dask-sql/pull/755
    • [DF] Optimize away COUNT DISTINCT aggregate operations - eliminate_agg_distinct by @andygrove in https://github.com/dask-contrib/dask-sql/pull/748
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/757
    • [DF] Upgrade pyo and change some signatures to use &str instead of String by @andygrove in https://github.com/dask-contrib/dask-sql/pull/762
    • Fix upstream failures in test_groupby_split_out by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/763
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/764
    • [DF] Switch back to architectured builds by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/765
    • [DF] Remove python constraint from nightly recipe by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/766
    • [DF] Generalize CREATE | PREDICT MODEL to accept non-native SELECT statements by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/747
    • [DF] Use Datafusion 12.0.0 by @andygrove in https://github.com/dask-contrib/dask-sql/pull/767
    • [DF] Use correct schema in TableProvider by @andygrove in https://github.com/dask-contrib/dask-sql/pull/769
    • Update docs by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/768
    • [DF] Add support for switching schema in DaskSqlContext by @andygrove in https://github.com/dask-contrib/dask-sql/pull/770
    • Add step argument to get_window_bounds for pandas>=1.5 by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/774
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/775
    • c.ipython_magic fix for Jupyter Lab by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/772
    • [DF] Remove PyPI release workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/776
    • Remove PyPI release workflow by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/777
    • sync: main to datafusion-sql-planner by @github-actions in https://github.com/dask-contrib/dask-sql/pull/778
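Two entries in this release, "Optimization rule to optimize out nulls for inner joins" and "Push down null filters into TableScan", rest on standard SQL NULL semantics. `NULL = x` is never TRUE, so a row whose join key is NULL can never match in an inner join. Dropping such rows before the join, ideally as early as the scan, therefore cannot change the result. The stdlib sketch below checks that equivalence on toy data. `inner_join` and `drop_null_keys` are hypothetical helpers, not dask-sql code.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[int]]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    # SQL equality semantics: a NULL key compares as "unknown", never TRUE,
    # so rows with a NULL key never produce a match.
    return [
        (lrow, rrow)
        for lrow in left
        for rrow in right
        if lrow[key] is not None and rrow[key] is not None and lrow[key] == rrow[key]
    ]


def drop_null_keys(rows: List[Row], key: str) -> List[Row]:
    # The rewrite: an IS NOT NULL filter on the join key, applied before the join.
    return [row for row in rows if row[key] is not None]


left = [{"k": 1, "a": 10}, {"k": None, "a": 11}, {"k": 2, "a": 12}]
right = [{"k": 1, "b": 20}, {"k": None, "b": 21}]
print(inner_join(left, right, "k") == inner_join(
    drop_null_keys(left, "k"), drop_null_keys(right, "k"), "k"
))
```

The rewrite is only valid for inner joins. A LEFT, RIGHT, or FULL outer join must still emit the unmatched NULL-key rows, so filtering them early would change its result.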

    New Contributors

    • @andygrove made their first contribution in https://github.com/dask-contrib/dask-sql/pull/494
    • @galipremsagar made their first contribution in https://github.com/dask-contrib/dask-sql/pull/542
    • @ChrisJar made their first contribution in https://github.com/dask-contrib/dask-sql/pull/629

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.8.0...2022.9.0.rc0
  • 2022.8.0(Aug 16, 2022)

    What's Changed

    • Unpin dask/distributed for development by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/564
    • Update docs theme by @scharlottej13 in https://github.com/dask-contrib/dask-sql/pull/567
    • Make sure scheduler has Dask nightlies in upstream cluster testing by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/573
    • Update gpuCI RAPIDS_VER to 22.08 by @github-actions in https://github.com/dask-contrib/dask-sql/pull/565
    • Modify test environment pinnings to cover minimum versions by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/555
    • Don't move jar to local mvn repo by @ksonj in https://github.com/dask-contrib/dask-sql/pull/579
    • Add max version constraint for fugue by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/639
    • Add environment file & documentation for GPU tests by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/633
    • Validate UDF metadata by @brandon-b-miller in https://github.com/dask-contrib/dask-sql/pull/641
    • Set Dask-sql as the default Fugue Dask engine when installed by @goodwanghan in https://github.com/dask-contrib/dask-sql/pull/640
    • Generalize analyze/sample tests to resolve CI failures by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/668
    • Update CodeCov upload step in CI by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/666
    • Bump arrow version to 6.0.0 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/674
    • Update gpuCI RAPIDS_VER to 22.10 by @github-actions in https://github.com/dask-contrib/dask-sql/pull/665
    • Constrain dask pinnings for release by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/690

    New Contributors

    • @scharlottej13 made their first contribution in https://github.com/dask-contrib/dask-sql/pull/567
    • @ksonj made their first contribution in https://github.com/dask-contrib/dask-sql/pull/579

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.6.0...2022.8.0

  • 2022.6.0 (Jun 3, 2022)

    What's Changed

    • Unpin Dask/distributed versions by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/452
    • Add jsonschema to ci testing by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/454
    • Switch tests from pd.testing.assert_frame_equal to dd.assert_eq by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/365
    • Set max pin on antlr4-python-runtime by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/456
    • Move / minimize number of cudf / dask-cudf imports by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/480
    • Use map_partitions to compute LIMIT / OFFSET by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/517
    • Use dev images for independent cluster testing by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/518
    • Add documentation for FugueSQL integrations by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/523
    • Timestampdiff support by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/495
    • Relax jsonschema testing dependency by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/546
    • Update upstream testing workflows by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/536
    • Fix pyarrow / cloudpickle failures in cluster testing by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/553
    • Use bash -l as default entrypoint for all upstream testing jobs by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/552
    • Constrain dask/distributed for release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/563
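    The "Use map_partitions to compute LIMIT / OFFSET" entry (#517) rests on one observation: once every partition's length is known, each partition can be sliced on its own, without gathering rows elsewhere. Below is a minimal pure-Python sketch of that idea. It assumes partitions are plain lists and that their lengths are cheap to obtain. The function name apply_offset_limit is hypothetical, and this is not dask-sql's actual code.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def apply_offset_limit(
    partitions: Sequence[Sequence[T]], offset: int = 0, limit: Optional[int] = None
) -> List[List[T]]:
    """Hypothetical helper: slice each partition so that the concatenated
    result equals all_rows[offset:offset + limit].

    A partition only needs its global start position, which is the sum of
    the lengths of all partitions before it.
    """
    end = None if limit is None else offset + limit
    result: List[List[T]] = []
    start = 0  # global index of the first row of the current partition
    for part in partitions:
        n = len(part)
        # Translate the global window [offset, end) into local indices,
        # clamped to [0, n] so out-of-range partitions yield an empty slice.
        lo = min(max(offset - start, 0), n)
        hi = n if end is None else min(max(end - start, 0), n)
        result.append(list(part[lo:hi]))
        start += n
    return result


parts = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
print(apply_offset_limit(parts, offset=2, limit=4))  # [[2], [3, 4], [5]]
```

    Each partition's slice depends only on its own start offset. In Dask, the per-partition step could therefore run as an independent task (for example through map_partitions) after the partition lengths have been computed.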

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.4.1...2022.6.0

  • 2022.4.1 (Apr 8, 2022)

    What's Changed

    • Add Java source code to source distribution by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/451
    • Bump httpclient dependency by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/453

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.4.0...2022.4.1

  • 2022.4.0 (Apr 7, 2022)

    What's Changed

    • Switch github-script action to v3 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/379
    • Unpin dask/distributed following release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/381
    • Fix typo by @wence- in https://github.com/dask-contrib/dask-sql/pull/382
    • Remove defaults channel from conda envs by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/384
    • Don't persist dataframes before applying offset / limit by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/387
    • Update gpuCI RAPIDS_VER to 22.04 by @github-actions in https://github.com/dask-contrib/dask-sql/pull/374
    • Feature/jdbc by @PeterLappo in https://github.com/dask-contrib/dask-sql/pull/351
    • Bump gpuCI PYTHON_VER to 3.9 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/388
    • Stop using defaults channel in dev environments by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/393
    • Use versioneer to compute __version__ by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/396
    • [REVIEW] Modified show.ftl to conditionally expect FROM in parsing logic by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/371
    • Fix TIMESTAMP / DATE scalars, add support for DATE column casting by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/343
    • Enable ability for user to pass in a list of CBO rules that should be… by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/389
    • Drop support for python 3.7, add testing for python 3.10 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/383
    • Bump pre-release package versions to be greater than stable releases by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/405
    • Update pytest to generate a client fixture by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/398
    • Use build_ext/install_lib subclasses to build external java by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/406
    • Fix use of row UDFs at intermediate query stages by @brandon-b-miller in https://github.com/dask-contrib/dask-sql/pull/409
    • [Review] Refactor ConfigContainer to use dask config by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/392
    • Provide meta to result of complex _apply_offset by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/420
    • Fix logic for unary join operands like IS NOT NULL by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/428
    • Update docs theme, use sphinx-tabs for CPU/GPU examples by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/394
    • Resolve independent cluster test failures by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/437
    • Only use session-wide client fixture for independent cluster testing by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/439
    • Drop common column from result of cross join, remove from corresponding meta by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/408
    • Add basic predicate-pushdown optimization by @rjzamora in https://github.com/dask-contrib/dask-sql/pull/433
    • Add workflow to keep datafusion-sql-planner branch up to date by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/440
    • Update gpuCI RAPIDS_VER to 22.06 by @github-actions in https://github.com/dask-contrib/dask-sql/pull/434
    • Bump black style checks to 22.3.0 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/443
    • Check for ucx-py nightlies when updating gpuCI by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/441
    • Add handling for newer prompt_toolkit versions in cmd tests by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/447
    • Resolve gpuCI workflow failures by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/446
    • Update versions of Java dependencies by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/445
    • Update jackson databind version by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/449
    • Disable SQL server functionality by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/448
    • Update dask pinnings for release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/450

    New Contributors

    • @wence- made their first contribution in https://github.com/dask-contrib/dask-sql/pull/382
    • @PeterLappo made their first contribution in https://github.com/dask-contrib/dask-sql/pull/351
    • @rjzamora made their first contribution in https://github.com/dask-contrib/dask-sql/pull/433

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2022.1.0...2022.4.0

  • 2022.1.0 (Jan 24, 2022)

    What's Changed

    • Disable CodeCov upload in tests on forks by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/349
    • Cost based optimization by @nils-braun in https://github.com/dask-contrib/dask-sql/pull/226
    • Add latest dask-ml to upstream testing by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/354
    • Bump gpuCI CUDA_VER to 11.5 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/348
    • Update Calcite to 1.29.0 and log4j to 2.17.0 to address CVE-2021-44228 by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/347
    • Removed unneeded log4j instance that was causing version conflicts and generating slf4j warning messages by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/358
    • Added getContext() method to DaskPlanner to ensure that CalciteConfigC… by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/362
    • Add os environment option to enable remote jvm debugging by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/363
    • Fix issue reporting in scheduled upstream testing by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/355
    • Remove Join Condition Push CBO Rule since it was causing infinite cos… by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/359
    • Parse ROWS as tuples in SQL kwargs by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/338
    • Add support for gpu kwarg in Context.sql and explain by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/368
    • Remove max version restriction for Dask/Distributed by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/369
    • Use upstream Dask for complex sorting operations by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/336
    • xfail failing model tests by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/373
    • Add substr tests by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/372
    • Fix pandas BaseIndexer import by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/377
    • Bump dask-ml dependency by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/378
    • [REVIEW] Fix unary conditional join operations by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/366
    • Pin dask/distributed versions for release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/380

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2021.12.0...2022.1.0

  • 2021.12.0 (Dec 13, 2021)

    What's Changed

    • Update nightly recipe / setup for 2021.11.0 release by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/308
    • Add test build using latest Dask/Distributed by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/306
    • General GHA workflow clean up by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/313
    • Add testing for Python 3.9 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/314
    • Use Boa for nightly builds by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/318
    • Add handling for cuDF-backed tables in dask-sql-server by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/312
    • Row UDF scalar arguments by @brandon-b-miller in https://github.com/dask-contrib/dask-sql/pull/311
    • Update register_func() in context.py by @DaceT in https://github.com/dask-contrib/dask-sql/pull/282
    • Bump dask-ml dependency to 2021.11.16 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/322
    • Add groupby split_out config options to dask-sql by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/286
    • Remove null-splitting from _perform_aggregation by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/273
    • Revert "Remove null-splitting from _perform_aggregation" by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/325
    • Resolve failures in nightly package builds by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/328
    • Add workflow to automate gpuCI updates by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/327
    • Update gpuCI RAPIDS_VER to 22.02 by @github-actions in https://github.com/dask-contrib/dask-sql/pull/329
    • Installing Dask-SQL w/ RAPIDS by @DaceT in https://github.com/dask-contrib/dask-sql/pull/324
    • Remove null-splitting from _perform_aggregation by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/326
    • Generalize table check in _get_tables_from_stack by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/333
    • Add support for GPU table creation in dask / location plugins by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/251
    • Circumvent deep copy of context in PredictModelPlugin by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/334
    • Unrestrict conda-build version used for nightly builds by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/335
    • Update conditions for apply_sort fast codepath by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/337
    • [REVIEW] Add support and tests for cuML and XGBoost by @VibhuJawa in https://github.com/dask-contrib/dask-sql/pull/330
    • Ignore case for queries in the parser configuration by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/316
    • Ignore .swp files by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/341
    • Added ALTER SCHEMA and ALTER TABLE by @rajagurunath in https://github.com/dask-contrib/dask-sql/pull/285
    • Bump dask dependency to >=2021.11.1,<=2021.11.2 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/345

    New Contributors

    • @DaceT made their first contribution in https://github.com/dask-contrib/dask-sql/pull/282
    • @github-actions made their first contribution in https://github.com/dask-contrib/dask-sql/pull/329

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/2021.11.0...2021.12.0

  • 2021.11.0 (Nov 10, 2021)

    What's Changed

    • Use unique names for null/non-null groupby columns by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/289
    • Use string separator in nightly version string by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/295
    • [Review] Update readme and docstrings to indicate GPU support by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/292
    • Add DISTRIBUTE BY to dask-sql grammar by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/228
    • Use Dask's sort_values for first column sorting in apply_sort by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/255
    • xfail broken dask-ml tests by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/304
    • Bump dask pinning to 2021.10.0 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/303
    • Prevent JVM Segfault by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/294
    • Make meta consistent with results of cross join by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/300

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/0.4.0...2021.11.0

  • 0.4.0 (Nov 2, 2021)

    What's Changed

    • More efficient window implementation by @nils-braun in https://github.com/dask-contrib/dask-sql/pull/217
    • Support creating tables from cudf dataframes by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/220
    • Re-enable the hive tests by @nils-braun in https://github.com/dask-contrib/dask-sql/pull/221
    • Reading tables with a dask-cudf DataFrame by @sarahyurick in https://github.com/dask-contrib/dask-sql/pull/224
    • Introduces parallel tests to speed up the processing by @nils-braun in https://github.com/dask-contrib/dask-sql/pull/230
    • Explicitly install sasl in CI by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/244
    • Add gpuCI support by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/240
    • Add issue templates by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/247
    • Fix test_deprecation_warning in gpuCI by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/248
    • [Review] Add fast path for multi-column sorting by @quasiben in https://github.com/dask-contrib/dask-sql/pull/229
    • Add conda dev environments for Python 3.7/3.8, JDK 8/11 by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/238
    • Add support for CONCAT by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/253
    • [REVIEW] Fast path when possible for non numeric aggregation by @VibhuJawa in https://github.com/dask-contrib/dask-sql/pull/236
    • Restrict docker/deploy jobs to upstream repo, cancel concurrent test runs by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/254
    • Do not persist data to memory by default when creating tables by @jdye64 in https://github.com/dask-contrib/dask-sql/pull/245
    • Add flake8 pre-commit hook by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/235
    • Automatically label bugs / feature requests for triage by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/261
    • Support pandas style row udfs by @brandon-b-miller in https://github.com/dask-contrib/dask-sql/pull/246
    • Publish nightly builds to dask conda channel by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/263
    • Revert conda build tweaks by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/266
    • Try anaconda upload again for conda package upload by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/267
    • Feature/improve cli by @rajagurunath in https://github.com/dask-contrib/dask-sql/pull/231
    • Simplify DataContainer.assign operation by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/271
    • Added bug fix for window func by @rajagurunath in https://github.com/dask-contrib/dask-sql/pull/277
    • Pass return_type through to meta in apply by @brandon-b-miller in https://github.com/dask-contrib/dask-sql/pull/275
    • [Review] Add gpu tests for string functions by @ayushdg in https://github.com/dask-contrib/dask-sql/pull/256
    • Simplify single-partition sorting logic by @charlesbluca in https://github.com/dask-contrib/dask-sql/pull/262
    • Require UDF return type and update docs by @brandon-b-miller in https://github.com/dask-contrib/dask-sql/pull/283

    New Contributors

    • @ayushdg made their first contribution in https://github.com/dask-contrib/dask-sql/pull/220
    • @charlesbluca made their first contribution in https://github.com/dask-contrib/dask-sql/pull/244
    • @quasiben made their first contribution in https://github.com/dask-contrib/dask-sql/pull/229
    • @VibhuJawa made their first contribution in https://github.com/dask-contrib/dask-sql/pull/236
    • @jdye64 made their first contribution in https://github.com/dask-contrib/dask-sql/pull/245
    • @brandon-b-miller made their first contribution in https://github.com/dask-contrib/dask-sql/pull/246

    Full Changelog: https://github.com/dask-contrib/dask-sql/compare/0.3.9...0.4.0

  • 0.3.9 (Aug 18, 2021)

    Bugfixes

    • Do not depend on packages not specified in setup.py (#214)
    • Use the mambaforge installer to speed up the build process (#216)
    • Update all links from nils-braun to dask-contrib (fixes #212)
    • Make JOINs also work for non-pandas dask dataframes (e.g. dask-cudf) (#211)
  • 0.3.8 (Aug 17, 2021)

  • 0.3.7 (Aug 10, 2021)

    Features

    • Allow for multiple schemas (#205)
    • AutoML capabilities (#199)
    • Implement the REGR_COUNT SQL operator (#193)
    • ML model improvements: added SHOW MODELS, EXPORT MODEL and DESCRIBE MODEL (#185, #191)
    • Implement the search and sargs operator (#184)
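    The regr count operator from #193 corresponds to the SQL aggregate REGR_COUNT(y, x). It counts only the rows in which both arguments are non-NULL, whereas COUNT(*) counts every row. Below is a minimal pure-Python sketch of that semantics. It assumes rows arrive as (y, x) tuples, with None standing in for NULL. The function name regr_count is hypothetical, and this is not dask-sql's implementation.

```python
from typing import Iterable, Optional, Tuple


def regr_count(rows: Iterable[Tuple[Optional[float], Optional[float]]]) -> int:
    """Hypothetical helper: count the (y, x) rows where neither value is NULL (None)."""
    return sum(1 for y, x in rows if y is not None and x is not None)


rows = [(1.0, 2.0), (None, 3.0), (4.0, None), (5.0, 6.0)]
print(regr_count(rows))  # 2: the second and third rows each contain a NULL
print(len(rows))         # 4: what COUNT(*) would return for the same rows
```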

    Bugfixes

    • Fixes for pandas 1.3.0 (#202)
    • Fix test fixture order (#194)
    • Fix a failing build, as ciso8601 is currently not pip-installable (#192)
  • 0.3.6 (May 16, 2021)

  • 0.3.5 (May 15, 2021)

    Bugfixes

    • Speed up aggregations when there are no aggregates (#174)
    • Register the lower and upper-case version of a function (#177)
    • Revert a bug in the casting logic so that values are cast only when really needed (#176)
  • 0.3.4 (May 13, 2021)

    Small feature addons

    • Added correct casting and mod operation (#172)
    • Implement OVER for arbitrary windows (#164)
    • Allow starting a SQL server from a Jupyter notebook (#162)

    Bugfixes and Improvements

    • Sort optimizations (#167, #173)
    • Fix scikit learn version in docker file
    • Add test with independent dask cluster (#165)
    • Speed up builds with mamba (#171)
    • Remove version constraints for pandas and dask as the errors were fixed upstream (#170)
    • Fixed the replacement of functions/aggregations and added a test (#169)
    • Added missing version in pom
  • 0.3.3 (Apr 30, 2021)

    Small feature addons

    • Allow function reregistration (#161)
    • Upgrade fugue dependency (#160)
    • Implement a wrapper for the prompt_toolkit session (#159)
  • 0.3.2 (Apr 13, 2021)

    Small feature addons

    • First working (but slow) implementation of OVER (#157)
    • Add a visualize function (#153)
    • IPython/Jupyter Magic (#146)
    • Hive/Databricks from SQL (#145)

    Bugfixes and Improvements

    • Improve documentation
    • Better cross joins (#150)
    • Fix a bug which occurs when only filters are present in groupbys (#154)
    • Make testing a bit easier to type
    • Fix a warning on regexes
    • Split out the jupyter notebook integration (#152)
    • Add pre commit hook (#149)
    • Limit the dask version until the dask-ml problem is fixed (#147)
    • Turn off docker image building of PRs
    • Fix integration with dbfs using the newest fsspec version (#140)
    • Show a reasonable traceback on exceptions (#142)
    • Docker image improvements (#137)
    • Support for Float (pandas extension type) and filter with NaNs (#136)
  • 0.3.1 (Feb 7, 2021)

    Small feature addons

    • Aggregate improvements and SQL compatibility (#134)
    • New call operations (#122)
    • Added notebook with a 'Tour de dask-sql' (#119)

    Bugfixes and Improvements

    • Docs improvements (#132)
    • Fix the fugue dependency (#133)
    • Pandas dependency fix (#129)
    • Added missing iris.csv data set
    • Pip installation docs improvement (#128)
    • Correctly sort NULLs (#126)
    • Importlib import (#125)
    • Do not touch already installed dask and pandas versions, as this may lead to incompatibilities (#123)
    • Average decimal type (#121)
    • Fixing a bug in column container copies (#120)
  • 0.3.0 (Jan 21, 2021)

    Features

    • Allow SQLAlchemy and Hive cursor input (#90)
    • Allow registering the same function with multiple parameter combinations (#93)
    • Additional datetime functions (#91)
    • Server and CMD CLI script (#94)
    • Split the SQL documentation in subpages and add a lot more documentation (#107)
    • DROP TABLE and IF NOT EXISTS/REPLACE (#98)
    • SQL Machine Learning Syntax (#108)
    • ANALYZE TABLE (#105)
    • Random sample operators (#115)
    • Read from Intake Catalogs (#113)
    • Adding fugue integration and tests (#116) and fsql (#118)

    Bugfixes

    • Keep casing also with unquoted identifiers (#88)
    • Scalar where clauses (#89)
    • Check for the correct java path on Windows (#86)
    • Remove # pragma once where it is not needed anymore (#92)
    • Refactor the hive input handling (#95)
    • Limit pandas version (#100)
    • Correctly handle the case where the Java version is undefined (#101)
    • Add datetime[ns, UTC] as understood type (#103)
    • Make sure to treat integers as integers (#109)
    • On ORDER BY queries, show the column names of the SELECT query (#110)
    • Always refer to a function with the name given by the user (#111)
    • Do not fail on empty SQL commands (#114)
    • Fix the random sample test (#117)
  • 0.2.2 (Nov 28, 2020)

  • 0.2.1 (Nov 19, 2020)

    Bugfixes and Improvements

    • Increase speed and parallelism of the limit algorithm and implement descending sorting (#75)
    • Improved the ability to create (materialized) views of queries (#77)
    • Added missing __version__ variable (#79)
    • Improved Docker image (#78)
    • Allow arbitrary return types in SQL server (#76)
    • Bugfix: Added tzlocal dependencies
  • 0.2.0 (Nov 5, 2020)

    Additional Features

    • Unify dask-sql API with BlazingSQL (#63). This also brings an experimental Hive input binding.
    • Added binder repository and example notebooks (forked from @raybellwaves) (#72)
    • Better/correct Presto server (#69), now working with many BI tools and able to serve multiple clients in parallel
    • Enable input from published datasets (#68)
    • Use pytest for all the tests instead of unittest (#67)
    • SHOW SCHEMA now supports FROM and LIKE, and the information_schema is added (#62)
    • Some remaining simple operations (#54)

    Bugfixes

    • Allow None in LIKE calls and add tests for regression (#71)
    • Bugfix: correct isinf check, which also works in distributed mode
    • Use the default conformance level, which, e.g., allows reusing aliases in the query (#66)
    • Set JAVA_HOME in conda environments and warn the user if it is not set correctly (#65)

    Additional Fixes and Documentation

    • Some ignore file fixes
    • Fixes to typos, documentation and formatting
    • Docker images with latest tag (#73)
  • 0.1.2 (Oct 14, 2020)

  • 0.1.1 (Oct 13, 2020)

Owner
Nils Braun
Data Engineer, tsfresh Developer, Python Enthusiast
Making it easy to query APIs via SQL

Shillelagh Shillelagh (ʃɪˈleɪlɪ) is an implementation of the Python DB API 2.0 based on SQLite (using the APSW library): from shillelagh.backends.apsw

Beto Dealmeida 207 Dec 30, 2022
Databank is an easy-to-use Python library for making raw SQL queries in a multi-threaded environment.

Databank Databank is an easy-to-use Python library for making raw SQL queries in a multi-threaded environment. No ORM, no frills. Thread-safe. Only ra

snapADDY GmbH 4 Apr 04, 2022
ClickHouse Python Driver with native interface support

ClickHouse Python Driver ClickHouse Python Driver with native (TCP) interface support. Asynchronous wrapper is available here: https://github.com/myma

Marilyn System 957 Dec 30, 2022
Official Python low-level client for Elasticsearch

Python Elasticsearch Client Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Py

elastic 3.8k Jan 01, 2023
db.py is an easier way to interact with your databases

db.py What is it Databases Supported Features Quickstart - Installation - Demo How To Contributing TODO What is it? db.py is an easier way to interact

yhat 1.2k Jan 03, 2023
Google Sheets Python API v4

pygsheets - Google Spreadsheets Python API v4 A simple, intuitive library for google sheets which gets your work done. Features: Open, create, delete

Nithin Murali 1.4k Dec 31, 2022
Pure Python MySQL Client

PyMySQL Table of Contents Requirements Installation Documentation Example Resources License This package contains a pure-Python MySQL client library,

PyMySQL 7.2k Jan 09, 2023
python-bigquery - Google BigQuery API client library (Apache-2.0)

Python Client for Google BigQuery Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google

Google APIs 550 Jan 01, 2023
PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

PyPika - Python Query Builder Abstract What is PyPika? PyPika is a Python API for building SQL queries. The motivation behind PyPika is to provide a s

KAYAK 1.9k Jan 04, 2023
A pythonic interface to Amazon's DynamoDB

PynamoDB A Pythonic interface for Amazon's DynamoDB. DynamoDB is a great NoSQL service provided by Amazon, but the API is verbose. PynamoDB presents y

2.1k Dec 30, 2022
A simple password manager I typed with Python using MongoDB.

Python with MongoDB A simple python code example using MongoDB. How do i run this code • First of all you need to have a python on your computer. If y

31 Dec 06, 2022
An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets

datasets_sql A 🤗 Datasets extension package that provides support for executing arbitrary SQL queries on HF datasets. It uses DuckDB as a SQL engine

Mario Šaško 19 Dec 15, 2022
A Python Object-Document-Mapper for working with MongoDB

MongoEngine Info: MongoEngine is an ORM-like layer on top of PyMongo. Repository: https://github.com/MongoEngine/mongoengine Author: Harry Marr (http:

MongoEngine 3.9k Jan 08, 2023
Generate database table diagram from SQL data definition.

sql2diagram Generate database table diagram from SQL data definition. e.g. "CREATE TABLE ..." See Example below How does it works? Analyze the SQL to

django-cas-ng 1 Feb 08, 2022
dask-sql is a distributed SQL query engine in python using Dask

dask-sql is a distributed SQL query engine in Python. It allows you to query and transform your data using a mixture of common SQL operations and Python code and also scale up the calculation easily

Nils Braun 271 Dec 30, 2022
A Relational Database Management System for a miniature version of Twitter written in MySQL with CLI in python.

Mini-Twitter-Database This was done as a database design course project at Amirkabir university of technology. This is a relational database managemen

Ali 12 Nov 23, 2022
Records is a very simple, but powerful, library for making raw SQL queries to most relational databases.

Records: SQL for Humans™ Records is a very simple, but powerful, library for making raw SQL queries to most relational databases. Just write SQL. No b

Kenneth Reitz 6.9k Jan 03, 2023
Create a database, insert data and easily select it with Sqlite

sqliteBasics create a database, insert data and easily select it with Sqlite Watch on YouTube a step by step tutorial explaining this code: https://yo

Mariya 27 Dec 27, 2022
Python cluster client for the official redis cluster. Redis 3.0+.

redis-py-cluster This client provides a client for redis cluster that was added in redis 3.0. This project is a port of redis-rb-cluster by antirez, w

Grokzen 1.1k Jan 05, 2023
Find graph motifs using intuitive notation

dotmotif Find graph motifs using intuitive notation DotMotif is a library that identifies subgraphs or motifs in a large graph. It looks like t

APL BRAIN 45 Jan 02, 2023