A pandas-like deferred expression system, with first-class SQL support

Overview

Ibis: Python data analysis framework for Hadoop and SQL engines

Ibis is a toolbox that bridges the gap between local Python environments, remote storage, and execution systems such as Hadoop components (HDFS, Impala, Hive, Spark) and SQL databases. Its goal is to simplify analytical workflows and make you more productive.

Install Ibis from PyPI with:

pip install ibis-framework

or from conda-forge with:

conda install ibis-framework -c conda-forge

Ibis currently provides tools for interacting with the following systems:

Learn more about using the library at http://ibis-project.org.

Comments
  • RLS: 1.3

    we should do a release soon, as the last one was in mid-summer.

    anyone interested in being release manager? This is also an opportunity to update the docs on how to release.

    opened by jreback 48
  • Move Impala backend to backends/ directory

    Did a first iteration of the move from impala to backends/impala. Some imports aren't resolving, but it seems like that would have been the case even before the move; to be discussed. I still need to update the documentation as well.

    refactor backends - impala 
    opened by matthewmturner 41
  • refactor: use nix to manage development dependencies

    This is a PR to change the way local development is done in Ibis, while preserving as much backwards compatibility with conda and non-conda/non-nix workflows as possible.

    The main change here is the addition of a development environment that is self-contained and derived from a single source of truth (pyproject.toml), using nix.

    See docs/web/contribute.md for more details.

    Note that both conda and setuptools-based development workflows are still supported as first class citizens.

    DONE:

    1. ~A move to poetry for dependency management.~ done in #2937. It is still possible and supported to use setup.py and setuptools if you like; this functionality is checked in CI.
    2. ~More extensive CI testing, with minimal additional CI time used. It looks like roughly 3 minutes of additional CI time is used on average, with more of the library being tested on Windows in particular~ done in #2937
    3. ~Automatic generation of a conda recipe for every release, uploaded to GitHub releases (this is the remaining 1% that isn't automated but this can be automated by pushing a feedstock pull request directly from CI)~ I plan to do this in conda-forge directly, using their integrated bot automation
    4. ~Automated Dependabot updates for github-actions and poetry dependencies~ this is done using Renovate in #3053
    5. ~Generating a setup.py file from the project's pyproject.toml file~ ~This is now just checked that it doesn't need to be regenerated, since we want to ensure this is tested in CI where relevant.~ done in #3054, necessary for Renovate dependency update PRs
    6. ~Automatic generation of a conda environment file, uploaded to GitHub releases, useful for developers using conda~ We have automatically generated lockfiles for linux, macos, and windows, for all three versions of Python currently supported (3.7, 3.8, 3.9). These live under conda-lock and can be used to set up a development environment very quickly since there's no solve step required. Done #3077, #3080, #3083, #3087

    Follow ups:

    1. workflow_dispatch (click-to-run) GitHub Actions workflow that cuts a release to PyPI and GitHub Releases.
    2. Automatic license updates once a year or on-demand. This is optional, but it's another thing we can automate at some point.
    3. ~Automatic nix dependency updates via PR submission every six hours or on-demand. This has to be done in a follow up.~ Done in https://github.com/ibis-project/ibis/pull/3182

    Happy to address any concerns here, let's automate everything!

    ci dependencies developer-tools released 
    opened by cpcloud 39
  • WIP: Add Support for ODBC Connection

    Closes #985

    Some caveats:

    • I’m not sure the ODBC connection belongs in the impala module, since any database with a suitable ODBC driver is supported. I should probably create an ODBCConnection base class and a derived subclass for each client.
    • I chose turbodbc over pyodbc for two reasons:
      • I’m unable to create a weakref from a pyodbc connection object
      • turbodbc supports fetching results as NumPy objects, which I believe integrates nicely with pandas.
    • There is no ping method on the turbodbc cursor
    • I had to modify _column_batches_to_dataframe to make room for turbodbc
    • It would be great if someone could point me to a test connection.

    cc @mariusvniekerk

    • [x] nthreads option

    • [ ] unit tests

    feature 
    opened by napjon 38
  • [MapD] Added Geospatial functions

    This PR solves #1665 and #1707.

    It adds geospatial functions to the main expression structure and defines them inside the MapD backend.

    References:

    • https://github.com/Quansight/mapd/issues/21
    • https://www.omnisci.com/docs/latest/5_geospatial_functions.html

    Depends on #1666 (that PR was used as the base for the current one).

    Geospatial functions

    • Geometry/Geography Constructors
      • [x] ST_GeomFromText(WKT) - using literals
      • [x] ST_GeogFromText(WKT) - using literals
    • Geometry Editors
      • ~ST_Transform (Returns a new geometry with its coordinates transformed to a different spatial reference system.)~
      • ~ST_SetSRID (Sets the SRID on a geometry to a particular integer value.)~
    • Geometry Accessors
      • [x] ST_X (Return the X coordinate of the point, or NULL if not available. Input must be a point.)
      • [x] ST_Y (Return the Y coordinate of the point, or NULL if not available. Input must be a point.)
      • [x] ST_XMin (Returns X minima of a bounding box 2d or 3d or a geometry.)
      • [x] ST_XMax (Returns X maxima of a bounding box 2d or 3d or a geometry.)
      • [x] ST_YMin (Returns Y minima of a bounding box 2d or 3d or a geometry.)
      • [x] ST_YMax (Returns Y maxima of a bounding box 2d or 3d or a geometry.)
      • [x] ST_StartPoint (Returns the first point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING.)
      • [x] ST_EndPoint (Returns the last point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING.)
      • [x] ST_PointN (Return the Nth point in a single linestring in the geometry. Negative values are counted backwards from the end of the LineString, so that -1 is the last point. Returns NULL if there is no linestring in the geometry.)
      • [x] ST_NPoints (Return the number of points in a geometry. Works for all geometries.)
      • [x] ST_NRings (If the geometry is a polygon or multi-polygon returns the number of rings. It counts the outer rings as well.)
      • [x] ST_SRID (Returns the spatial reference identifier for the ST_Geometry)
    • Spatial Relationships and Measurements
      • [x] ST_Distance
      • [x] ST_Contains
      • [x] ST_Area
      • [x] ST_Perimeter
      • [x] ST_Length
      • [x] ST_MaxDistance
    • Extra
      • ~CastToGeography~ TODO: will be added in a new PR.
    expressions 
    opened by xmnlab 35
  • Implementation for insert method for SQLAlchemy backends

    Closes #2613

    The implementation is similar to what is currently implemented for the Impala backend. Please have a look and merge it if you think it helps your requirements. I have also added comments in the code about what will and won't work, so that enhancements can be made later on. Thanks.

    feature backends - sqlite 
    opened by harsharaj96 33
  • FEAT: Remove execution_type param and add gpu_device and ipc to execute method for OmniSciDB backend

    In this PR:

    • Removed the execution_type parameter and added gpu_device: int and ipc: bool for the OmniSciDB backend (Resolves #1919)
    • Fixed a categorical data type issue when using cudf.DataFrame output (Resolves #1920)
    • Added a mock test for execution_type using pytest-mock
    opened by xmnlab 33
  • FEAT: PostGIS support

    Fixes #1786

    This is in an extremely early state, but I wanted to open this for comment now as I'm not likely to get the chance to work much on it for a few days.

    feature backends - postgres 
    opened by ian-r-rose 32
  • Docstrings checking

    According to pydocstyle, there are currently 3445 docstring issues in the project.

    I have started to fix them in #1996.

    I will open one PR per backend to fix its docstrings.

    opened by xmnlab 29
  • feat(api): add `read` top level function for interactive analysis with CSV and Parquet files

    This PR adds a read top level API function for reading the following kinds of things:

    1. Local CSV, TSV, TXT and Parquet files. CSV/TSV/TXT files can be gzip compressed
    2. Remote versions of any of those
    3. Globs of 1.; mixing compressed and uncompressed files is allowed.
    feature ux backends - duckdb 
    opened by cpcloud 26
  • RLS: 2.0

    Next release, 2.0.

    Issues that need to be closed before the release:

    • #2379 Entrypoints
    • #2448 Backends as conda subpackages
    • #2356 Move omnisci backend to another repo

    Will set the milestone on these issues and any others after #2525 is discussed.

    xref: https://github.com/ibis-project/ibis/issues/2321#issuecomment-726724955

    opened by datapythonista 26
  • fix(pandas): use the correct aggregation context for window functions

    This PR fixes an issue with the pandas backend where the Transform aggregation context was being used instead of the Moving context. Transform should only be used when there's a group_by present, and no order_by.

    A backend test and associated data are also added to catch regressions.
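To illustrate the distinction in plain pandas terms (the data is made up, and this mirrors the semantics of the two aggregation contexts rather than ibis's internal code): a transform broadcasts one value per group, while a moving (rolling) context produces a per-row value from an ordered window.

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b", "b"], "x": [1.0, 2.0, 3.0, 4.0]})

# Transform context: a group_by with no order_by broadcasts one value per group.
transformed = df.groupby("g")["x"].transform("mean")

# Moving context: an ordered window yields a per-row value from a trailing window.
moving = df["x"].rolling(2, min_periods=1).mean()

print(transformed.tolist())  # the group mean repeated within each group
print(moving.tolist())       # a rolling mean that changes row by row
```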

    Closes #4676.

    bug tests backends - pandas backends 
    opened by cpcloud 1
  • refactor: remove the `JSONB` type

    This type isn't well supported. Currently, its only purpose is to allow users to consume tables with a postgres-native JSONB type, but the expression types provide no novel APIs beyond binary operations (which is an implementation detail of JSONB). This commit removes the JSONB type and uses the existing JSON type for table consumption.

    BREAKING CHANGE: The JSONB type is replaced by the JSON type.

    refactor backends - postgres breaking change type system 
    opened by cpcloud 1
  • chore: enable `flake8-bugbear` and `flake8-2020` and fix lints

    This PR enables the flake8-bugbear plugin and fixes the associated lints, most of which involved assert False usage and calling functions in default arguments. flake8-2020 is also enabled; there were no existing violations of it.

    developer-tools 
    opened by cpcloud 1
  • bug: fix execution return type and casting in many backends

    This PR closes a long-standing issue seen in #2064.

    While fixing that issue, there were a number of other casting issues related to pandas (and thus dask) and pyspark that are also fixed here.

    Happily, a bunch of tests that were previously marked as notyet or notimpl started passing once these issues were fixed.

    There are still a couple failures that I need to address, but otherwise this PR is ready for review.

    Closes #2064.

    backends 
    opened by cpcloud 1
Releases (3.2.0)
  • 3.2.0(Sep 15, 2022)

    3.2.0 (2022-09-15)

    Features

    • add api to get backend entry points (0152f5e)
    • api: add and_ and or_ helpers (94bd4df)
    • api: add argmax and argmin column methods (b52216a)
    • api: add distinct to Intersection and Difference operations (cd9a34c)
    • api: add ibis.memtable API for constructing in-memory table expressions (0cc6948)
    • api: add ibis.sql to easily get a formatted SQL string (d971cc3)
    • api: add Table.unpack() and StructValue.lift() APIs for projecting struct fields (ced5f53)
    • api: allow transmute-style select method (d5fc364)
    • api: implement all bitwise operators (7fc5073)
    • api: promote psql to a show_sql public API (877a05d)
    • clickhouse: add dataframe external table support for memtables (bc86aa7)
    • clickhouse: add enum, ipaddr, json, lowcardinality to type parser (8f0287f)
    • clickhouse: enable support for working window functions (310a5a8)
    • clickhouse: implement argmin and argmax (ee7c878)
    • clickhouse: implement bitwise operations (348cd08)
    • clickhouse: implement struct scalars (1f3efe9)
    • dask: implement StringReplace execution (1389f4b)
    • dask: implement ungrouped argmin and argmax (854aea7)
    • deps: support duckdb 0.5.0 (47165b2)
    • duckdb: handle query parameters in ibis.connect (fbde95d)
    • duckdb: implement argmin and argmax (abf03f1)
    • duckdb: implement bitwise xor (ca3abed)
    • duckdb: register tables from pandas/pyarrow objects (36e48cc)
    • duckdb: support unsigned integer types (2e67918)
    • impala: implement bitwise operations (c5302ab)
    • implement dropna for SQL backends (8a747fb)
    • log: make BaseSQLBackend._log print by default (12de5bb)
    • mysql: register BLOB types (1e4fb92)
    • pandas: implement argmin and argmax (bf9b948)
    • pandas: implement NotContains on grouped data (976dce7)
    • pandas: implement StringReplace execution (578795f)
    • pandas: implement Contains with a group by (c534848)
    • postgres: implement bitwise xor (9b1ebf5)
    • pyspark: add option to treat nan as null in aggregations (bf47250)
    • pyspark: implement ibis.connect for pyspark (a191744)
    • pyspark: implement Intersection and Difference (9845a3c)
    • pyspark: implement bitwise operators (33cadb1)
    • sqlalchemy: implement bitwise operator translation (bd9f64c)
    • sqlalchemy: make ibis.connect with sqlalchemy backends (b6cefb9)
    • sqlalchemy: properly implement Intersection and Difference (2bc0b69)
    • sql: implement StringReplace translation (29daa32)
    • sqlite: implement bitwise xor and bitwise not (58c42f9)
    • support table.sort_by(ibis.random()) (693005d)
    • type-system: infer pandas' string dtype (5f0eb5d)
    • ux: add duckdb as the default backend (8ccb81d)
    • ux: use rich to format Table.info() output (67234c3)
    • ux: use sqlglot for pretty printing SQL (a3c81c5)
    • variadic union, intersect, & difference functions (05aca5a)

    Bug Fixes

    • api: make sure column names that are already inferred are not overwritten (6f1cb16)
    • api: support deferred objects in existing API functions (241ce6a)
    • backend: ensure that chained limits respect prior limits (02a04f5)
    • backends: ensure select after filter works (e58ca73)
    • backends: only recommend installing ibis-foo when foo is a known backend (ac6974a)
    • base-sql: fix String-generating backend string concat implementation (3cf78c1)
    • clickhouse: add IPv4/IPv6 literal inference (0a2f315)
    • clickhouse: cast repeat times argument to UInt64 (b643544)
    • clickhouse: fix listing tables from databases with no tables (08900c3)
    • compilers: make sure memtable rows have names in the SQL string compilers (18e7f95)
    • compiler: use repr for SQL string VALUES data (75af658)
    • dask: ensure predicates are computed before projections (5cd70e1)
    • dask: implement timestamp-date binary comparisons (48d5058)
    • dask: set dask upper bound due to large scale test breakage (796c645), closes #9221
    • decimal: add decimal type inference (3fe3fd8)
    • deps: update dependency duckdb-engine to >=0.1.8,<0.4.0 (113dc8f)
    • deps: update dependency duckdb-engine to >=0.1.8,<0.5.0 (ef97c9d)
    • deps: update dependency parsy to v2 (9a06131)
    • deps: update dependency shapely to >=1.6,<1.8.4 (0c787d2)
    • deps: update dependency shapely to >=1.6,<1.8.5 (d08c737)
    • deps: update dependency sqlglot to v5 (f210bb8)
    • deps: update dependency sqlglot to v6 (5ca4533)
    • duckdb: add missing types (59bad07)
    • duckdb: ensure that in-memory connections remain in their creating thread (39bc537)
    • duckdb: use fetch_arrow_table() to be able to handle big timestamps (85a76eb)
    • fix bug in pandas & dask difference implementation (88a78fa)
    • fix dask where implementation (49f8845)
    • impala: add date column dtype to impala to ibis type dict (c59e94e), closes #4449
    • pandas where supports scalar for left (48f6c1e)
    • pandas: fix anti-joins (10a659d)
    • pandas: implement timestamp-date binary comparisons (4fc666d)
    • pandas: properly handle empty groups when aggregating with GroupConcat (6545f4d)
    • pyspark: fix broken StringReplace implementation (22cb297)
    • pyspark: make sure ibis.connect works with pyspark (a7ab107)
    • pyspark: translate predicates before projections (b3d1c80)
    • sqlalchemy: fix float64 type mapping (8782773)
    • sqlalchemy: handle reductions with multiple arguments (5b2039b)
    • sqlalchemy: implement SQLQueryResult translation (786a50f)
    • sql: fix sql compilation after making InMemoryTable a subclass of PhysicalTable (aac9524)
    • squash several bugs in sort_by asc/desc handling (222b2ba)
    • support chained set operations in SQL backends (227aed3)
    • support filters on InMemoryTable exprs (abfaf1f)
    • typo: in BaseSQLBackend.compile docstring (0561b13)

    Deprecations

    • right kwarg in union/intersect/difference (719a5a1)
    • duckdb: deprecate path argument in favor of database (fcacc20)
    • sqlite: deprecate path argument in favor of database (0f85919)

    Performance

    • pandas: remove reexecution of alias children (64efa53)
    • pyspark: ensure that pyspark DDL doesn't use VALUES (422c98d)
    • sqlalchemy: register DataFrames cheaply where possible (ee9f1be)

    Documentation

    • add to_sql (e2821a5)
    • add back constraints for transitive doc dependencies and fix docs (350fd43)
    • add coc reporting information (c2355ba)
    • add community guidelines documentation (fd0893f)
    • add HeavyAI to the readme (4c5ca80)
    • add how-to bfill and ffill (ff84027)
    • add how-to for ibis+duckdb register (73a726e)
    • add how-to section to docs (33c4b93)
    • duckdb: add installation note for duckdb >= 0.5.0 (608b1fb)
    • fix memtable docstrings (72bc0f5)
    • fix flake8 line length issues (fb7af75)
    • fix markdown (4ab6b95)
    • fix relative links in tutorial (2bd075f), closes #4064 #4201
    • make attribution style uniform across the blog (05561e0)
    • move the blog out to the top level sidebar for visibility (417ba64)
    • remove underspecified UDF doc page (0eb0ac0)
  • 3.1.0(Jul 26, 2022)

    3.1.0 (2022-07-26)

    Features

    • add __getattr__ support to StructValue (75bded1)
    • allow selection subclasses to define new node args (2a7dc41)
    • api: accept Schema objects in public ibis.schema (0daac6c)
    • api: add .tables accessor to BaseBackend (7ad27f0)
    • api: add e function to public API (3a07e70)
    • api: add ops.StructColumn operation (020bfdb)
    • api: add cume_dist operation (6b6b185)
    • api: add toplevel ibis.connect() (e13946b)
    • api: handle literal timestamps with timezone embedded in string (1ae976b)
    • api: ibis.connect() default to duckdb for parquet/csv extensions (ff2f088)
    • api: make struct metadata more convenient to access (3fd9bd8)
    • api: support tab completion for backends (eb75fc5)
    • api: underscore convenience api (81716da)
    • api: unnest (98ecb09)
    • backends: allow column expressions from non-foreign tables on the right side of isin/notin (e1374a4)
    • base-sql: implement trig and math functions (addb2c1)
    • clickhouse: add ability to pass arbitrary kwargs to Clickhouse do_connect (583f599)
    • clickhouse: implement ops.StructColumn operation (0063007)
    • clickhouse: implement array collect (8b2577d)
    • clickhouse: implement ArrayColumn (1301f18)
    • clickhouse: implement bit aggs (f94a5d2)
    • clickhouse: implement clip (12dfe50)
    • clickhouse: implement covariance and correlation (a37c155)
    • clickhouse: implement degrees (7946c0f)
    • clickhouse: implement proper type serialization (80f4ab9)
    • clickhouse: implement radians (c7b7f08)
    • clickhouse: implement strftime (222f2b5)
    • clickhouse: implement struct field access (fff69f3)
    • clickhouse: implement trig and math functions (c56440a)
    • clickhouse: support subsecond timestamp literals (e8698a6)
    • compiler: restore intersect_class and difference_class overrides in base SQL backend (2c46a15)
    • dask: implement trig functions (e4086bb)
    • dask: implement zeroifnull (38487db)
    • datafusion: implement negate (69dd64d)
    • datafusion: implement trig functions (16803e1)
    • duckdb: add register method to duckdb backend to load parquet and csv files (4ccc6fc)
    • duckdb: enable find_in_set test (377023d)
    • duckdb: enable group_concat test (4b9ad6c)
    • duckdb: implement ops.StructColumn operation (211bfab)
    • duckdb: implement approx_count_distinct (03c89ad)
    • duckdb: implement approx_median (894ce90)
    • duckdb: implement arbitrary first and last aggregation (8a500bc)
    • duckdb: implement NthValue (1bf2842)
    • duckdb: implement strftime (aebc252)
    • duckdb: return the ir.Table instance from DuckDB's register API (0d05d41)
    • mysql: implement FindInSet (e55bbbf)
    • mysql: implement StringToTimestamp (169250f)
    • pandas: implement bitwise aggregations (37ff328)
    • pandas: implement degrees (25b4f69)
    • pandas: implement radians (6816b75)
    • pandas: implement trig functions (1fd52d2)
    • pandas: implement zeroifnull (48e8ed1)
    • postgres/duckdb: implement covariance and correlation (464d3ef)
    • postgres: implement ArrayColumn (7b0a506)
    • pyspark: implement approx_count_distinct (1fe1d75)
    • pyspark: implement approx_median (07571a9)
    • pyspark: implement covariance and correlation (ae818fb)
    • pyspark: implement degrees (f478c7c)
    • pyspark: implement nth_value (abb559d)
    • pyspark: implement nullifzero (640234b)
    • pyspark: implement radians (18843c0)
    • pyspark: implement trig functions (fd7621a)
    • pyspark: implement Where (32b9abb)
    • pyspark: implement xor (550b35b)
    • pyspark: implement zeroifnull (db13241)
    • pyspark: topk support (9344591)
    • sqlalchemy: add degrees and radians (8b7415f)
    • sqlalchemy: add xor translation rule (2921664)
    • sqlalchemy: allow non-primitive arrays (4e02918)
    • sqlalchemy: implement approx_count_distinct as count distinct (4e8bcab)
    • sqlalchemy: implement clip (8c02639)
    • sqlalchemy: implement trig functions (34c1514)
    • sqlalchemy: implement Where (7424704)
    • sqlalchemy: implement zeroifnull (4735e9a)
    • sqlite: implement BitAnd, BitOr and BitXor (e478479)
    • sqlite: implement cotangent (01e7ce7)
    • sqlite: implement degrees and radians (2cf9c5e)

    Bug Fixes

    • api: bring back null datatype parsing (fc131a1)
    • api: compute the type from both branches of Where expressions (b8f4120)
    • api: ensure that Deferred objects work in aggregations (bbb376c)
    • api: ensure that nulls can be cast to any type to allow caller promotion (fab4393)
    • api: make ExistSubquery and NotExistsSubquery pure boolean operations (dd70024)
    • backends: make execution transactional where possible (d1ea269)
    • clickhouse: cast empty result dataframe (27ae68a)
    • clickhouse: handle empty IN and NOT IN expressions (2c892eb)
    • clickhouse: return null instead of empty string for group_concat when values are filtered out (b826b40)
    • compiler: fix bool bool comparisons (1ac9a9e)
    • dask/pandas: allow limit to be None (9f91d6b)
    • dask: aggregation with multi-key groupby fails on dask backend (4f8bc70)
    • datafusion: handle predicates in aggregates (4725571)
    • deps: update dependency datafusion to >=0.4,<0.7 (f5b244e)
    • deps: update dependency duckdb to >=0.3.2,<0.5.0 (57ee818)
    • deps: update dependency duckdb-engine to >=0.1.8,<0.3.0 (3e379a0)
    • deps: update dependency geoalchemy2 to >=0.6.3,<0.13 (c04a533)
    • deps: update dependency geopandas to >=0.6,<0.12 (b899c37)
    • deps: update dependency Shapely to >=1.6,<1.8.3 (87a49ad)
    • deps: update dependency toolz to >=0.11,<0.13 (258a641)
    • don't mask udf module in init.py (3e567ba)
    • duckdb: ensure that paths with non-extension . chars are parsed correctly (9448fd3)
    • duckdb: fix struct datatype parsing (5124763)
    • duckdb: force string_agg separator to be a constant (21cdf2f)
    • duckdb: handle multiple dotted extensions; quote names; consolidate implementations (1494246)
    • duckdb: remove timezone function invocation (33d38fc)
    • geospatial: ensure that later versions of numpy are compatible with geospatial code (33f0afb)
    • impala: a delimited table explicitly declare stored as textfile (04086a4), closes #4260
    • impala: remove broken nth_value implementation (dbc9cc2)
    • ir: don't attempt fusion when projections aren't exactly equivalent (3482ba2)
    • mysql: cast mysql timestamp literals to ensure correct return type (8116e04)
    • mysql: implement integer to timestamp using from_unixtime (1b43004)
    • pandas/dask: look at pre_execute for has_operation reporting (cb44efc)
    • pandas: execute negate on bool as not (330ab4f)
    • pandas: fix struct inference from dict in the pandas backend (5886a9a)
    • pandas: force backend options registration on trace.enable() calls (8818fe6)
    • pandas: handle empty boolean column casting in Series conversion (f697e3e)
    • pandas: handle struct columns with NA elements (9a7c510)
    • pandas: handle the case of selection from a join when remapping overlapping column names (031c4c6)
    • pandas: perform correct equality comparison (d62e7b9)
    • postgres/duckdb: cast after milliseconds computation instead of after extraction (bdd1d65)
    • pyspark: handle predicates in Aggregation (842c307)
    • pyspark: prevent spark from trying to convert timezone of naive timestamps (dfb4127)
    • pyspark: remove xpassing test for #2453 (c051e28)
    • pyspark: specialize implementation of has_operation (5082346)
    • pyspark: use empty check for collect_list in GroupConcat rule (df66acb)
    • repr: allow DestructValue selections to be formatted by fmt (4b45d87)
    • repr: when formatting DestructValue selections, use struct field names as column names (d01fe42)
    • sqlalchemy: fix parsing and construction of nested array types (e20bcc0)
    • sqlalchemy: remove unused second argument when creating temporary views (8766b40)
    • sqlite: register conversion to isoformat for pandas.Timestamp (fe95dca)
    • sqlite: test case with whitespace at the end of the line (7623ae9)
    • sql: use isoformat for timestamp literals (70d0ba6)
    • type-system: infer null datatype for empty sequence of expressions (f67d5f9)
    • use bounded precision for decimal aggregations (596acfb)

    Performance Improvements

    • analysis: add _projection as cached_property to avoid reconstruction of projections (98510c8)
    • lineage: ensure that expressions are not traversed multiple times in most cases (ff9708c)

    Reverts

    • ci: install sqlite3 on ubuntu (1f2705f)
  • 3.0.2(Apr 28, 2022)

  • 3.0.1(Apr 28, 2022)

  • 3.0.0(Apr 25, 2022)

    3.0.0 (2022-04-25)

    ⚠ BREAKING CHANGES

    • ir: The following are breaking changes due to simplifying expression internals
      • ibis.expr.datatypes.DataType.scalar_type and DataType.column_type factory methods have been removed, DataType.scalar and DataType.column class fields can be used to directly construct a corresponding expression instance (though prefer to use operation.to_expr())
      • `ibis.expr.types.ValueExpr._name` and `ValueExpr._dtype` fields are no longer accessible. While these were never supposed to be used directly, the `ValueExpr.has_name()`, `ValueExpr.get_name()` and `ValueExpr.type()` methods are now the only way to retrieve an expression's name and datatype.
      • ibis.expr.operations.Node.output_type is now a property rather than a method; decorate those methods with @property
      • ibis.expr.operations.ValueOp subclasses must define output_shape and output_dtype properties from now on (note the datatype abbreviation dtype in the property name)
      • ibis.expr.rules.cast(), scalar_like() and array_like() rules have been removed
    • api: Replace t["a"].distinct() with t[["a"]].distinct().
    • deps: The sqlalchemy lower bound is now 1.4
    • ir: Schema.names and Schema.types attributes now have tuple type rather than list
    • expr: Columns that were added or used in an aggregation or mutation would be alphabetically sorted in compiled SQL outputs. This was a vestige from when Python dicts didn't preserve insertion order. Now columns will appear in the order in which they were passed to aggregate or mutate
    • api: dt.float is now dt.float64; use dt.float32 for the previous behavior.
    • ir: Relation-based execute_node dispatch rules must now accept tuples of expressions.
    • ir: removed ibis.expr.lineage.{roots,find_nodes} functions
    • config: Use ibis.options.graphviz_repr = True to enable
    • hdfs: Use fsspec instead of HDFS from ibis
    • udf: Vectorized UDF coercion functions are no longer a public API.
    • The minimum supported Python version is now Python 3.8
    • config: register_option is no longer supported, please submit option requests upstream
    • backends: Read tables with pandas.read_hdf and use the pandas backend
    • The CSV backend is removed. Use Datafusion for CSV execution.
    • backends: Use the datafusion backend to read parquet files
    • Expr() -> Expr.pipe()
    • coercion functions previously in expr/schema.py are now in udf/vectorized.py
    • api: materialize is removed. Joins with overlapping columns now have suffixes.
    • kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
    • Any code that was relying implicitly on string-y behavior from UUID datatypes will need to add an explicit cast first.

    Features

    • add repr_html for expressions to print as tables in ipython (cd6fa4e)
    • add duckdb backend (667f2d5)
    • allow construction of decimal literals (3d9e865)
    • api: add ibis.asc expression (efe177e), closes #1454
    • api: add has_operation API to the backend (4fab014)
    • api: implement type for SortExpr (ab19bd6)
    • clickhouse: implement string concat for clickhouse (1767205)
    • clickhouse: implement StrRight operation (67749a0)
    • clickhouse: implement table union (e0008d7)
    • clickhouse: implement trim, pad and string predicates (a5b7293)
    • datafusion: implement Count operation (4797a86)
    • datatypes: unbounded decimal type (f7e6f65)
    • date: add ibis.date(y,m,d) functionality (26892b6), closes #386
    • duckdb/postgres/mysql/pyspark: implement .sql on tables for mixing sql and expressions (00e8087)
    • duckdb: add functionality needed to pass integer to interval test (e2119e8)
    • duckdb: implement _get_schema_using_query (93cd730)
    • duckdb: implement now() function (6924f50)
    • duckdb: implement regexp replace and extract (18d16a7)
    • implement force argument in sqlalchemy backend base class (9df7f1b)
    • implement coalesce for the pyspark backend (8183efe)
    • implement semi/anti join for the pandas backend (cb36fc5)
    • implement semi/anti join for the pyspark backend (3e1ba9c)
    • implement the remaining clickhouse joins (b3aa1f0)
    • ir: rewrite and speed up expression repr (45ce9b2)
    • mysql: implement _get_schema_from_query (456cd44)
    • mysql: move string join impl up to alchemy for mysql (77a8eb9)
    • postgres: implement _get_schema_using_query (f2459eb)
    • pyspark: implement Distinct for pyspark (4306ad9)
    • pyspark: implement log base b for pyspark (527af3c)
    • pyspark: implement percent_rank and enable testing (c051617)
    • repr: add interval info to interval repr (df26231)
    • sqlalchemy: implement ilike (43996c0)
    • sqlite: implement date_truncate (3ce4f2a)
    • sqlite: implement ISO week of year (714ff7b)
    • sqlite: implement string join and concat (6f5f353)
    • support of arrays and tuples for clickhouse (db512a8)
    • ver: dynamic version identifiers (408f862)

    Bug Fixes

    • added wheel to pyproject toml for venv users (b0b8e5c)
    • allow major version changes in CalVer dependencies (9c3fbe5)
    • annotable: allow optional arguments at any position (778995f), closes #3730
    • api: add ibis.map and .struct (327b342), closes #3118
    • api: map string multiplication with integer to repeat method (b205922)
    • api: thread suffixes parameter to individual join methods (31a9aff)
    • change TimestampType to Timestamp (e0750be)
    • clickhouse: disconnect from clickhouse when computing version (11cbf08)
    • clickhouse: use a context manager for execution (a471225)
    • combine windows during windowization (7fdd851)
    • conform epoch_seconds impls to expression return type (18a70f1)
    • context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b), closes #3108
    • dask: fix asof joins for newer version of dask (50711cc)
    • dask: workaround dask bug (a0f3bd9)
    • deps: update dependency atpublic to v3 (3fe8f0d)
    • deps: update dependency datafusion to >=0.4,<0.6 (3fb2194)
    • deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361)
    • deps: update dependency graphviz to >=0.16,<0.21 (3014445)
    • duckdb: add casts to literals to fix binding errors (1977a55), closes #3629
    • duckdb: fix array column type discovery on leaf tables and add tests (15e5412)
    • duckdb: fix log with base b impl (4920097)
    • duckdb: support both 0.3.2 and 0.3.3 (a73ccce)
    • enforce the schema's column names in apply_to (b0f334d)
    • expose ops.IfNull for mysql backend (156c2bd)
    • expr: add more binary operators to char list and implement fallback (b88184c)
    • expr: fix formatting of table info using tabulate (b110636)
    • fix float vs real data type detection in sqlalchemy (24e6774)
    • fix list_schemas argument (69c1abf)
    • fix postgres udfs and reenable ci tests (7d480d2)
    • fix tablecolumn execution for filter following join (064595b)
    • format: remove some newlines from formatted expr repr (ed4fa78)
    • histogram: cross_join needs onclause=True (5d36a58), closes #622
    • ibis.expr.signature.Parameter is not pickleable (828fd54)
    • implement coalesce properly in the pandas backend (aca5312)
    • implement count on tables for pyspark (7fe5573), closes #2879
    • infer coalesce types when a non-null expression occurs after the first argument (c5f2906)
    • mutate: do not lift table column that results from mutate (ba4e5e5)
    • pandas: disable range windows with order by (e016664)
    • pandas: don't reassign the same column, to silence SettingWithCopyWarning (75dc616)
    • pandas: implement percent_rank correctly (d8b83e7)
    • prevent unintentional cross joins in mutate + filter (83eef99)
    • pyspark: fix range windows (a6f2aa8)
    • regression in Selection.sort_by with resolved_keys (c7a69cd)
    • regression in sort_by with resolved_keys (63f1382), closes #3619
    • remove broken csv pre_execute (93b662a)
    • remove importorskip call for backend tests (2f0bcd8)
    • remove incorrect fix for pandas regression (339f544)
    • remove passing schema into register_parquet (bdcbb08)
    • repr: add ops.TimeAdd to repr binop lookup table (fd94275)
    • repr: allow ops.TableNode in fmt_value (6f57003)
    • reverse the predicate pushdown substitution (f3cd358)
    • sort_index to satisfy pandas 1.4.x (6bac0fc)
    • sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321)
    • sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21)
    • sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71)
    • sql: retain column names for named ColumnExprs (f1b4b6e), closes #3754
    • sql: walk right join trees and substitute joins with right-side joins with views (0231592)
    • store schema on the pandas backend to allow correct inference (35070be)

    Performance Improvements

    • datatypes: speed up str and hash (262d3d7)
    • fast path for simple column selection (d178498)
    • ir: global equality cache (13c2bb2)
    • ir: introduce CachedEqMixin to speed up equality checks (b633925)
    • repr: remove full tree repr from rule validator error message (65885ab)
    • speed up attribute access (89d1c05)
    • use assign instead of concat in projections when possible (985c242)

    Miscellaneous Chores

    • deps: increase sqlalchemy lower bound to 1.4 (560854a)
    • drop support for Python 3.7 (0afd138)

    Code Refactoring

    • api: make primitive types more cohesive (71da8f7)
    • api: remove distinct ColumnExpr API (3f48cb8)
    • api: remove materialize (24285c1)
    • backends: remove the hdf5 backend (ff34f3e)
    • backends: remove the parquet backend (b510473)
    • config: disable graphviz-repr-in-notebook by default (214ad4e)
    • config: remove old config code and port to pydantic (4bb96d1)
    • dt.UUID inherits from DataType, not String (2ba540d)
    • expr: preserve column ordering in aggregations/mutations (668be0f)
    • hdfs: replace HDFS with fsspec (cc6eddb)
    • ir: make Annotable immutable (1f2b3fa)
    • ir: make schema annotable (b980903)
    • ir: remove unused lineage roots and find_nodes functions (d630a77)
    • ir: simplify expressions by not storing dtype and name (e929f85)
    • kudu: remove support for use of kudu through kudu-python (36bd97f)
    • move coercion functions from schema.py to udf (58eea56), closes #3033
    • remove blanket call for Expr (3a71116), closes #2258
    • remove the csv backend (0e3e02e)
    • udf: make coerce functions in ibis.udf.vectorized private (9ba4392)
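The "make Annotable immutable" refactor above reflects a common IR design choice: expression nodes become value objects that can be safely hashed and cached, which is what enables equality-caching optimizations like those listed under Performance Improvements. In pure Python the idea can be sketched with a frozen dataclass (illustrative only; this is not ibis's actual Annotable):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """An immutable IR node: hashable, so it can serve as a cache key."""
    op: str
    args: tuple


a = Node("add", (1, 2))
b = Node("add", (1, 2))
assert a == b               # structural equality
assert hash(a) == hash(b)   # usable as dict/cache keys

try:
    a.op = "mul"            # mutation is rejected at runtime
except AttributeError:      # FrozenInstanceError subclasses AttributeError
    pass
```

Because equal nodes hash equally and can never change after construction, results of expensive comparisons or compilations can be memoized per node without invalidation logic.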
  • 2.1.1 (Jan 12, 2022)

  • 2.1.0 (Jan 12, 2022)

    2.1.0 (2022-01-12)

    Bug Fixes

    • consider all packages' entry points (b495cf6)
    • datatypes: infer bytes literal as binary #2915 (#3124) (887efbd)
    • deps: bump minimum dask version to 2021.10.0 (e6b5c09)
    • deps: constrain numpy to ensure wheels are used on windows (70c308b)
    • deps: update dependency clickhouse-driver to ^0.1 || ^0.2.0 (#3061) (a839d54)
    • deps: update dependency geoalchemy2 to >=0.6,<0.11 (4cede9d)
    • deps: update dependency pyarrow to v6 (#3092) (61e52b5)
    • don't force backends to override do_connect until 3.0.0 (4b46973)
    • execute materialized joins in the pandas and dask backends (#3086) (9ed937a)
    • literal: allow creating ibis literal with uuid (#3131) (b0f4f44)
    • restore the ability to have more than two option levels (#3151) (fb4a944)
    • sqlalchemy: fix correlated subquery compilation (43b9010)
    • sqlite: defer db connection until needed (#3127) (5467afa), closes #64

    Features

    • allow column_of to take a column expression (dbc34bb)
    • ci: More readable workflow job titles (#3111) (d8fd7d9)
    • datafusion: initial implementation for Arrow Datafusion backend (3a67840), closes #2627
    • datafusion: initial implementation for Arrow Datafusion backend (75876d9), closes #2627
    • make dayofweek impls conform to pandas semantics (#3161) (9297828)
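The dayofweek change above aligns backends on pandas' numbering, where Monday is 0 and Sunday is 6 — the same convention as the standard library's `date.weekday()`, and one less than ISO numbering. A quick stdlib-only illustration (not ibis code):

```python
from datetime import date

# pandas dayofweek semantics: Monday == 0, ..., Sunday == 6,
# matching datetime.date.weekday() and differing from
# date.isoweekday(), where Monday == 1.
d = date(2022, 1, 12)  # the 2.1.0 release date, a Wednesday
print(d.weekday())     # 2
print(d.isoweekday())  # 3
```

Pinning one convention across backends matters because SQL engines disagree among themselves (some count from Sunday, some from Monday, some from 1), so without a fixed semantic the same expression could return different values per backend.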

    Reverts

    • "ci: install gdal for fiona" (8503361)