A pandas-like deferred expression system, with first-class SQL support

Overview

Ibis: Python data analysis framework for Hadoop and SQL engines


Ibis is a toolbox that bridges the gap between local Python environments, remote storage, execution systems such as the Hadoop components (HDFS, Impala, Hive, Spark), and SQL databases. Its goal is to simplify analytical workflows and make you more productive.

Install Ibis from PyPI with:

pip install ibis-framework

or from conda-forge with:

conda install ibis-framework -c conda-forge

Ibis currently provides tools for interacting with the following systems (the backends exercised throughout this page):

  • Apache Impala (and HDFS)
  • PostgreSQL, MySQL, and SQLite (via SQLAlchemy)
  • ClickHouse
  • OmniSciDB (formerly MapD)
  • DuckDB
  • Apache Spark (PySpark)
  • pandas and Dask
  • Apache Arrow DataFusion

Learn more about using the library at http://ibis-project.org.
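For a sense of the deferred-expression workflow, here is a minimal, hedged sketch assuming a local SQLite database file mydb.db containing a table events with category and value columns (all names hypothetical):

    import ibis

    con = ibis.sqlite.connect("mydb.db")   # hypothetical database file
    events = con.table("events")           # deferred table expression

    # Nothing executes until .execute() is called; Ibis compiles to SQL.
    expr = (
        events[events.value > 0]
        .group_by("category")
        .aggregate(total=events.value.sum())
    )
    df = expr.execute()                    # returns a pandas DataFrame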

Comments
  • RLS: 1.3

    we should do a release soon, as last was in mid-summer.

    anyone interested in being release manager? also an opportunity to update docs on how-to-release.

    opened by jreback 48
  • Move Impala backend to backends/ directory

    Did a first iteration of the move from impala to backends/impala. Some imports aren't resolving, but it seems that would have been the case even before the move; to be discussed. I still need to work on the documentation update as well.

    refactor backends - impala 
    opened by matthewmturner 41
  • refactor: use nix to manage development dependencies

    This is a PR to change the way local development is done in Ibis (while preserving as much backwards compatibility with conda and non-conda/non-nix workflows as possible!), accomplishing the following things:

    The main change here is the addition of a development environment that is self-contained and derived from a single source of truth (pyproject.toml), using nix.

    See docs/web/contribute.md for more details.

    Note that both conda and setuptools-based development workflows are still supported as first class citizens.

    DONE:

    1. ~A move to poetry for dependency management.~ Done in #2937. It is still possible and supported to use setup.py and setuptools if you like; this functionality is checked in CI.
    2. ~More extensive CI testing, with minimal additional CI time used. It looks like roughly 3 minutes of additional CI time on average, with more of the library being tested on Windows in particular.~ Done in #2937.
    3. ~Automatic generation of a conda recipe for every release, uploaded to GitHub releases (this is the remaining 1% that isn't automated, but it can be automated by pushing a feedstock pull request directly from CI).~ I plan to do this in conda-forge directly, using their integrated bot automation.
    4. ~Automated Dependabot updates for github-actions and poetry dependencies.~ This is done using Renovate in #3053.
    5. ~Generating a setup.py file from the project's pyproject.toml file.~ ~CI now just checks that it doesn't need to be regenerated, since we want to ensure this is tested where relevant.~ Done in #3054; necessary for Renovate dependency update PRs.
    6. ~Automatic generation of a conda environment file, uploaded to GitHub releases, useful for developers using conda.~ We have automatically generated lockfiles for Linux, macOS, and Windows, for all three currently supported versions of Python (3.7, 3.8, 3.9). These live under conda-lock and can be used to set up a development environment very quickly, since no solve step is required. Done in #3077, #3080, #3083, #3087.

    Follow ups:

    1. workflow_dispatch (click-to-run) GitHub Actions workflow that cuts a release to PyPI and GitHub Releases.
    2. Automatic license updates once a year or on-demand. This is optional, but it's another thing we can automate at some point.
    3. ~Automatic nix dependency updates via PR submission every six hours or on-demand. This has to be done in a follow up.~ Done in https://github.com/ibis-project/ibis/pull/3182

    Happy to address any concerns here, let's automate everything!

    ci dependencies developer-tools released 
    opened by cpcloud 39
  • WIP: Add Support for ODBC Connection

    Closes #985

    Some caveats:

    • I’m not sure the ODBC connection belongs in the impala module, since all databases are supported provided there is a suitable ODBC driver. I should probably create an ODBCConnection class and derive a subclass for each client.
    • I chose turbodbc over pyodbc for two reasons:
      • I’m unable to create a weakref from a pyodbc connection object
      • turbodbc supports fetchnumpy, which I believe integrates nicely with pandas.
    • There is no ping on the turbodbc cursor
    • I had to modify _column_batches_to_dataframe to make space for turbodbc
    • If someone is willing to point me at a test connection, that would be great.
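    For context, here is a minimal sketch of the turbodbc fetch path these caveats refer to, assuming a configured DSN named impala (hypothetical):

        import pandas as pd
        import turbodbc

        con = turbodbc.connect(dsn="impala")   # hypothetical DSN
        cursor = con.cursor()
        cursor.execute("SELECT * FROM some_table LIMIT 10")

        # fetchallnumpy() returns an OrderedDict of column name -> NumPy
        # (masked) array, which converts cheaply to a pandas DataFrame.
        df = pd.DataFrame(cursor.fetchallnumpy())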

    cc @mariusvniekerk

    • [x] nthreads option

    • [ ] unit tests

    feature 
    opened by napjon 38
  • [MapD] Added Geospatial functions

    This PR solves #1665 and #1707.

    It adds geospatial functions to the main expression structure and defines them inside the MapD backend (a usage sketch follows the checklist below).

    References:

    • https://github.com/Quansight/mapd/issues/21
    • https://www.omnisci.com/docs/latest/5_geospatial_functions.html

    Depends on #1666 (PR #1666 was used as the base for the current PR)

    Geospatial functions

    • Geometry/Geography Constructors
      • [x] ST_GeomFromText(WKT) - using literals
      • [x] ST_GeogFromText(WKT) - using literals
    • Geometry Editors
      • ~ST_Transform (Returns a new geometry with its coordinates transformed to a different spatial reference system.)~
      • ~ST_SetSRID (Sets the SRID on a geometry to a particular integer value.)~
    • Geometry Accessors
      • [x] ST_X (Return the X coordinate of the point, or NULL if not available. Input must be a point.)
      • [x] ST_Y (Return the Y coordinate of the point, or NULL if not available. Input must be a point.)
      • [x] ST_XMin (Returns X minima of a bounding box 2d or 3d or a geometry.)
      • [x] ST_XMax (Returns X maxima of a bounding box 2d or 3d or a geometry.)
      • [x] ST_YMin (Returns Y minima of a bounding box 2d or 3d or a geometry.)
      • [x] ST_YMax (Returns Y maxima of a bounding box 2d or 3d or a geometry.)
      • [x] ST_StartPoint (Returns the first point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING.)
      • [x] ST_EndPoint (Returns the last point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING.)
      • [x] ST_PointN (Return the Nth point in a single linestring in the geometry. Negative values are counted backwards from the end of the LineString, so that -1 is the last point. Returns NULL if there is no linestring in the geometry.)
      • [x] ST_NPoints (Return the number of points in a geometry. Works for all geometries.)
      • [x] ST_NRings (If the geometry is a polygon or multi-polygon returns the number of rings. It counts the outer rings as well.)
      • [x] ST_SRID (Returns the spatial reference identifier for the ST_Geometry)
    • Spatial Relationships and Measurements
      • [x] ST_Distance
      • [x] ST_Contains
      • [x] ST_Area
      • [x] ST_Perimeter
      • [x] ST_Length
      • [x] ST_MaxDistance
    • Extra
      • ~CastToGeography~ TODO: will be added in a new PR.
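    To make the checklist concrete, here is a hedged sketch of how such functions surface on Ibis expressions, assuming a table with a point column geom and a polygon column boundary (all names hypothetical):

        t = con.table("geo_table")            # hypothetical connection/table

        expr = t.mutate(
            x=t.geom.x(),                     # ST_X
            y=t.geom.y(),                     # ST_Y
            area=t.boundary.area(),           # ST_Area
            npoints=t.boundary.n_points(),    # ST_NPoints
        )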
    expressions 
    opened by xmnlab 35
  • Implementation for insert method for SQLAlchemy backends

    Closes #2613

    The implementation is similar to what is currently implemented for the Impala backend. Please take a look and merge it if it meets your requirements. I have also added comments in the code on what will and won't work, so that enhancements can be made later on. Thanks.
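    A sketch of the intended usage, mirroring the Impala-style signature (table and file names are illustrative):

        import ibis
        import pandas as pd

        con = ibis.sqlite.connect("mydb.db")        # any SQLAlchemy backend
        df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

        con.insert("my_table", df)                  # append rows
        con.insert("my_table", df, overwrite=True)  # replace existing contents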

    feature backends - sqlite 
    opened by harsharaj96 33
  • FEAT: Remove execution_type param and add gpu_device and ipc to execute method for OmniSciDB backend

    In this PR:

    • Removed the execution_type parameter and added gpu_device: int and ipc: bool to the OmniSciDB backend's execute method (resolves #1919)
    • Fixed a categorical data type issue when using cudf.DataFrame output (resolves #1920)
    • Added a mock test for execution_type using pytest-mock
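    A hedged sketch of the execute signature this PR proposes (connection parameters hypothetical):

        con = ibis.omniscidb.connect(host="localhost", port=6274)
        expr = con.table("t").count()

        df = expr.execute(ipc=True)                 # IPC transfer, pandas output
        gdf = expr.execute(gpu_device=0, ipc=True)  # GPU IPC, cudf.DataFrame output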
    opened by xmnlab 33
  • FEAT: PostGIS support

    Fixes #1786

    This is in an extremely early state, but I wanted to open this for comment now as I'm not likely to get the chance to work much on it for a few days.

    feature backends - postgres 
    opened by ian-r-rose 32
  • Docstrings checking

    According to pydocstyle, there are currently 3445 docstring issues in the project.

    I have started to fix them in #1996.

    I will open one PR per backend to fix its docstrings.

    opened by xmnlab 29
  • feat(api): add `read` top level function for interactive analysis with CSV and Parquet files

    This PR adds a read top-level API function for reading the following kinds of things (a usage sketch follows the list):

    1. Local CSV, TSV, TXT and Parquet files; CSV/TSV/TXT files can be gzip-compressed
    2. Remote versions of any of those
    3. Globs of 1., with mixing of compressed and uncompressed files allowed
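    A usage sketch of the interactive flow this PR describes, using the read spelling from the PR title and a hypothetical local file:

        import ibis

        t = ibis.read("data/events.csv.gz")  # also Parquet, globs, remote URLs
        t.group_by("category").aggregate(n=t.count()).execute()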
    feature ux backends - duckdb 
    opened by cpcloud 26
  • RLS: 2.0

    Next release, 2.0.

    Issues that need to be closed before the release:

    • #2379 Entrypoints
    • #2448 Backends as conda subpackages
    • #2356 Move omnisci backend to another repo

    I will set the milestone on these issues, and any others, after #2525 is discussed.

    xref: https://github.com/ibis-project/ibis/issues/2321#issuecomment-726724955

    opened by datapythonista 26
  • fix(pandas): use the correct aggregation context for window functions

    This PR fixes an issue with the pandas backend where the Transform aggregation context was used instead of the Moving context. Transform should only be used when there's a group_by present and no order_by.

    A backend test and associated data are also added to catch regressions.

    Closes #4676.
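    For illustration, this is the kind of expression affected: an ordered (moving) window with no grouping, which must use the Moving context rather than Transform (names hypothetical):

        import ibis

        t = con.table("prices")
        w = ibis.window(order_by=t.ts, preceding=2, following=0)  # moving window
        expr = t.mutate(avg3=t.price.mean().over(w))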

    bug tests backends - pandas backends 
    opened by cpcloud 1
  • refactor: remove the `JSONB` type

    This type isn't well supported. Currently, its only purpose is to allow users to consume tables with a postgres-native JSONB type, but the expression types provide no novel APIs beyond binary operations (which is an implementation detail of JSONB). This commit removes the JSONB type and uses the existing JSON type for table consumption.

    BREAKING CHANGE: The JSONB type is replaced by the JSON type.

    refactor backends - postgres breaking change type system 
    opened by cpcloud 1
  • chore: enable `flake-bugbear` and `flake8-2020` and fix lints

    This PR enables the flake8-bugbear plugin and fixes the associated lints, most of which are assert False usage and calling functions in default arguments. flake8-2020 is also enabled; there were no existing violations of it.
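    For illustration, the two lint patterns mentioned above and their typical fixes, as a hedged sketch (not code from the PR):

        def unreachable() -> None:
            # B011: `assert False` is stripped under `python -O`; raise instead.
            raise AssertionError("unreachable")

        def append_row(row, rows=None):  # B008: avoid rows=list() as a default,
            if rows is None:             # since it is evaluated once at function
                rows = []                # definition time and then shared.
            rows.append(row)
            return rows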

    developer-tools 
    opened by cpcloud 1
  • bug: fix execution return type and casting in many backends

    This PR closes a long-standing issue seen in #2064.

    While fixing that issue, there were a number of other casting issues related to pandas (and thus dask) and pyspark that are also fixed here.

    Happily, a bunch of tests that were previously marked as notyet or notimpl started passing once these issues were fixed.

    There are still a couple failures that I need to address, but otherwise this PR is ready for review.

    Closes #2064.

    backends 
    opened by cpcloud 1
Releases (latest: 3.2.0)
  • 3.2.0(Sep 15, 2022)

    3.2.0 (2022-09-15)

    Features

    • add api to get backend entry points (0152f5e)
    • api: add and_ and or_ helpers (94bd4df)
    • api: add argmax and argmin column methods (b52216a)
    • api: add distinct to Intersection and Difference operations (cd9a34c)
    • api: add ibis.memtable API for constructing in-memory table expressions (0cc6948)
    • api: add ibis.sql to easily get a formatted SQL string (d971cc3)
    • api: add Table.unpack() and StructValue.lift() APIs for projecting struct fields (ced5f53)
    • api: allow transmute-style select method (d5fc364)
    • api: implement all bitwise operators (7fc5073)
    • api: promote psql to a show_sql public API (877a05d)
    • clickhouse: add dataframe external table support for memtables (bc86aa7)
    • clickhouse: add enum, ipaddr, json, lowcardinality to type parser (8f0287f)
    • clickhouse: enable support for working window functions (310a5a8)
    • clickhouse: implement argmin and argmax (ee7c878)
    • clickhouse: implement bitwise operations (348cd08)
    • clickhouse: implement struct scalars (1f3efe9)
    • dask: implement StringReplace execution (1389f4b)
    • dask: implement ungrouped argmin and argmax (854aea7)
    • deps: support duckdb 0.5.0 (47165b2)
    • duckdb: handle query parameters in ibis.connect (fbde95d)
    • duckdb: implement argmin and argmax (abf03f1)
    • duckdb: implement bitwise xor (ca3abed)
    • duckdb: register tables from pandas/pyarrow objects (36e48cc)
    • duckdb: support unsigned integer types (2e67918)
    • impala: implement bitwise operations (c5302ab)
    • implement dropna for SQL backends (8a747fb)
    • log: make BaseSQLBackend._log print by default (12de5bb)
    • mysql: register BLOB types (1e4fb92)
    • pandas: implement argmin and argmax (bf9b948)
    • pandas: implement NotContains on grouped data (976dce7)
    • pandas: implement StringReplace execution (578795f)
    • pandas: implement Contains with a group by (c534848)
    • postgres: implement bitwise xor (9b1ebf5)
    • pyspark: add option to treat nan as null in aggregations (bf47250)
    • pyspark: implement ibis.connect for pyspark (a191744)
    • pyspark: implement Intersection and Difference (9845a3c)
    • pyspark: implement bitwise operators (33cadb1)
    • sqlalchemy: implement bitwise operator translation (bd9f64c)
    • sqlalchemy: make ibis.connect work with sqlalchemy backends (b6cefb9)
    • sqlalchemy: properly implement Intersection and Difference (2bc0b69)
    • sql: implement StringReplace translation (29daa32)
    • sqlite: implement bitwise xor and bitwise not (58c42f9)
    • support table.sort_by(ibis.random()) (693005d)
    • type-system: infer pandas' string dtype (5f0eb5d)
    • ux: add duckdb as the default backend (8ccb81d)
    • ux: use rich to format Table.info() output (67234c3)
    • ux: use sqlglot for pretty printing SQL (a3c81c5)
    • variadic union, intersect, & difference functions (05aca5a)
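    Two of the additions above in a quick, hedged sketch (ibis.memtable plus the formatted-SQL helper; the docs entry below spells it to_sql):

        import ibis
        import pandas as pd

        t = ibis.memtable(pd.DataFrame({"a": [1, 2, 3], "b": list("xyz")}))
        expr = (
            t.filter(ibis.and_(t.a > 1, t.b != "z"))
             .group_by("b")
             .aggregate(n=t.count())
        )
        print(ibis.to_sql(expr))  # SQL for the default (duckdb) backend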

    Bug Fixes

    • api: make sure column names that are already inferred are not overwritten (6f1cb16)
    • api: support deferred objects in existing API functions (241ce6a)
    • backend: ensure that chained limits respect prior limits (02a04f5)
    • backends: ensure select after filter works (e58ca73)
    • backends: only recommend installing ibis-foo when foo is a known backend (ac6974a)
    • base-sql: fix String-generating backend string concat implementation (3cf78c1)
    • clickhouse: add IPv4/IPv6 literal inference (0a2f315)
    • clickhouse: cast repeat times argument to UInt64 (b643544)
    • clickhouse: fix listing tables from databases with no tables (08900c3)
    • compilers: make sure memtable rows have names in the SQL string compilers (18e7f95)
    • compiler: use repr for SQL string VALUES data (75af658)
    • dask: ensure predicates are computed before projections (5cd70e1)
    • dask: implement timestamp-date binary comparisons (48d5058)
    • dask: set dask upper bound due to large scale test breakage (796c645), closes #9221
    • decimal: add decimal type inference (3fe3fd8)
    • deps: update dependency duckdb-engine to >=0.1.8,<0.4.0 (113dc8f)
    • deps: update dependency duckdb-engine to >=0.1.8,<0.5.0 (ef97c9d)
    • deps: update dependency parsy to v2 (9a06131)
    • deps: update dependency shapely to >=1.6,<1.8.4 (0c787d2)
    • deps: update dependency shapely to >=1.6,<1.8.5 (d08c737)
    • deps: update dependency sqlglot to v5 (f210bb8)
    • deps: update dependency sqlglot to v6 (5ca4533)
    • duckdb: add missing types (59bad07)
    • duckdb: ensure that in-memory connections remain in their creating thread (39bc537)
    • duckdb: use fetch_arrow_table() to be able to handle big timestamps (85a76eb)
    • fix bug in pandas & dask difference implementation (88a78fa)
    • fix dask where implementation (49f8845)
    • impala: add date column dtype to impala to ibis type dict (c59e94e), closes #4449
    • pandas where supports scalar for left (48f6c1e)
    • pandas: fix anti-joins (10a659d)
    • pandas: implement timestamp-date binary comparisons (4fc666d)
    • pandas: properly handle empty groups when aggregating with GroupConcat (6545f4d)
    • pyspark: fix broken StringReplace implementation (22cb297)
    • pyspark: make sure ibis.connect works with pyspark (a7ab107)
    • pyspark: translate predicates before projections (b3d1c80)
    • sqlalchemy: fix float64 type mapping (8782773)
    • sqlalchemy: handle reductions with multiple arguments (5b2039b)
    • sqlalchemy: implement SQLQueryResult translation (786a50f)
    • sql: fix sql compilation after making InMemoryTable a subclass of PhysicalTable (aac9524)
    • squash several bugs in sort_by asc/desc handling (222b2ba)
    • support chained set operations in SQL backends (227aed3)
    • support filters on InMemoryTable exprs (abfaf1f)
    • typo: in BaseSQLBackend.compile docstring (0561b13)

    Deprecations

    • right kwarg in union/intersect/difference (719a5a1)
    • duckdb: deprecate path argument in favor of database (fcacc20)
    • sqlite: deprecate path argument in favor of database (0f85919)

    Performance

    • pandas: remove reexecution of alias children (64efa53)
    • pyspark: ensure that pyspark DDL doesn't use VALUES (422c98d)
    • sqlalchemy: register DataFrames cheaply where possible (ee9f1be)

    Documentation

    • add to_sql (e2821a5)
    • add back constraints for transitive doc dependencies and fix docs (350fd43)
    • add coc reporting information (c2355ba)
    • add community guidelines documentation (fd0893f)
    • add HeavyAI to the readme (4c5ca80)
    • add how-to bfill and ffill (ff84027)
    • add how-to for ibis+duckdb register (73a726e)
    • add how-to section to docs (33c4b93)
    • duckdb: add installation note for duckdb >= 0.5.0 (608b1fb)
    • fix memtable docstrings (72bc0f5)
    • fix flake8 line length issues (fb7af75)
    • fix markdown (4ab6b95)
    • fix relative links in tutorial (2bd075f), closes #4064 #4201
    • make attribution style uniform across the blog (05561e0)
    • move the blog out to the top level sidebar for visibility (417ba64)
    • remove underspecified UDF doc page (0eb0ac0)
    Assets: Source code (tar.gz), Source code (zip), ibis_framework-3.2.0-py3-none-any.whl (756.51 KB)
  • 3.1.0(Jul 26, 2022)

    3.1.0 (2022-07-26)

    Features

    • add __getattr__ support to StructValue (75bded1)
    • allow selection subclasses to define new node args (2a7dc41)
    • api: accept Schema objects in public ibis.schema (0daac6c)
    • api: add .tables accessor to BaseBackend (7ad27f0)
    • api: add e function to public API (3a07e70)
    • api: add ops.StructColumn operation (020bfdb)
    • api: add cume_dist operation (6b6b185)
    • api: add toplevel ibis.connect() (e13946b)
    • api: handle literal timestamps with timezone embedded in string (1ae976b)
    • api: ibis.connect() default to duckdb for parquet/csv extensions (ff2f088)
    • api: make struct metadata more convenient to access (3fd9bd8)
    • api: support tab completion for backends (eb75fc5)
    • api: underscore convenience api (81716da)
    • api: unnest (98ecb09)
    • backends: allow column expressions from non-foreign tables on the right side of isin/notin (e1374a4)
    • base-sql: implement trig and math functions (addb2c1)
    • clickhouse: add ability to pass arbitrary kwargs to Clickhouse do_connect (583f599)
    • clickhouse: implement ops.StructColumn operation (0063007)
    • clickhouse: implement array collect (8b2577d)
    • clickhouse: implement ArrayColumn (1301f18)
    • clickhouse: implement bit aggs (f94a5d2)
    • clickhouse: implement clip (12dfe50)
    • clickhouse: implement covariance and correlation (a37c155)
    • clickhouse: implement degrees (7946c0f)
    • clickhouse: implement proper type serialization (80f4ab9)
    • clickhouse: implement radians (c7b7f08)
    • clickhouse: implement strftime (222f2b5)
    • clickhouse: implement struct field access (fff69f3)
    • clickhouse: implement trig and math functions (c56440a)
    • clickhouse: support subsecond timestamp literals (e8698a6)
    • compiler: restore intersect_class and difference_class overrides in base SQL backend (2c46a15)
    • dask: implement trig functions (e4086bb)
    • dask: implement zeroifnull (38487db)
    • datafusion: implement negate (69dd64d)
    • datafusion: implement trig functions (16803e1)
    • duckdb: add register method to duckdb backend to load parquet and csv files (4ccc6fc)
    • duckdb: enable find_in_set test (377023d)
    • duckdb: enable group_concat test (4b9ad6c)
    • duckdb: implement ops.StructColumn operation (211bfab)
    • duckdb: implement approx_count_distinct (03c89ad)
    • duckdb: implement approx_median (894ce90)
    • duckdb: implement arbitrary first and last aggregation (8a500bc)
    • duckdb: implement NthValue (1bf2842)
    • duckdb: implement strftime (aebc252)
    • duckdb: return the ir.Table instance from DuckDB's register API (0d05d41)
    • mysql: implement FindInSet (e55bbbf)
    • mysql: implement StringToTimestamp (169250f)
    • pandas: implement bitwise aggregations (37ff328)
    • pandas: implement degrees (25b4f69)
    • pandas: implement radians (6816b75)
    • pandas: implement trig functions (1fd52d2)
    • pandas: implement zeroifnull (48e8ed1)
    • postgres/duckdb: implement covariance and correlation (464d3ef)
    • postgres: implement ArrayColumn (7b0a506)
    • pyspark: implement approx_count_distinct (1fe1d75)
    • pyspark: implement approx_median (07571a9)
    • pyspark: implement covariance and correlation (ae818fb)
    • pyspark: implement degrees (f478c7c)
    • pyspark: implement nth_value (abb559d)
    • pyspark: implement nullifzero (640234b)
    • pyspark: implement radians (18843c0)
    • pyspark: implement trig functions (fd7621a)
    • pyspark: implement Where (32b9abb)
    • pyspark: implement xor (550b35b)
    • pyspark: implement zeroifnull (db13241)
    • pyspark: topk support (9344591)
    • sqlalchemy: add degrees and radians (8b7415f)
    • sqlalchemy: add xor translation rule (2921664)
    • sqlalchemy: allow non-primitive arrays (4e02918)
    • sqlalchemy: implement approx_count_distinct as count distinct (4e8bcab)
    • sqlalchemy: implement clip (8c02639)
    • sqlalchemy: implement trig functions (34c1514)
    • sqlalchemy: implement Where (7424704)
    • sqlalchemy: implement zeroifnull (4735e9a)
    • sqlite: implement BitAnd, BitOr and BitXor (e478479)
    • sqlite: implement cotangent (01e7ce7)
    • sqlite: implement degrees and radians (2cf9c5e)
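    A hedged sketch of the new top-level ibis.connect() (URLs and paths hypothetical); per the entries above, the backend is picked from the connection string, and parquet/csv extensions default to duckdb:

        import ibis

        con = ibis.connect("duckdb://:memory:")      # scheme selects the backend
        con2 = ibis.connect("data/events.parquet")   # file extension -> duckdb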

    Bug Fixes

    • api: bring back null datatype parsing (fc131a1)
    • api: compute the type from both branches of Where expressions (b8f4120)
    • api: ensure that Deferred objects work in aggregations (bbb376c)
    • api: ensure that nulls can be cast to any type to allow caller promotion (fab4393)
    • api: make ExistSubquery and NotExistsSubquery pure boolean operations (dd70024)
    • backends: make execution transactional where possible (d1ea269)
    • clickhouse: cast empty result dataframe (27ae68a)
    • clickhouse: handle empty IN and NOT IN expressions (2c892eb)
    • clickhouse: return null instead of empty string for group_concat when values are filtered out (b826b40)
    • compiler: fix bool bool comparisons (1ac9a9e)
    • dask/pandas: allow limit to be None (9f91d6b)
    • dask: aggregation with multi-key groupby fails on dask backend (4f8bc70)
    • datafusion: handle predicates in aggregates (4725571)
    • deps: update dependency datafusion to >=0.4,<0.7 (f5b244e)
    • deps: update dependency duckdb to >=0.3.2,<0.5.0 (57ee818)
    • deps: update dependency duckdb-engine to >=0.1.8,<0.3.0 (3e379a0)
    • deps: update dependency geoalchemy2 to >=0.6.3,<0.13 (c04a533)
    • deps: update dependency geopandas to >=0.6,<0.12 (b899c37)
    • deps: update dependency Shapely to >=1.6,<1.8.3 (87a49ad)
    • deps: update dependency toolz to >=0.11,<0.13 (258a641)
    • don't mask udf module in init.py (3e567ba)
    • duckdb: ensure that paths with non-extension . chars are parsed correctly (9448fd3)
    • duckdb: fix struct datatype parsing (5124763)
    • duckdb: force string_agg separator to be a constant (21cdf2f)
    • duckdb: handle multiple dotted extensions; quote names; consolidate implementations (1494246)
    • duckdb: remove timezone function invocation (33d38fc)
    • geospatial: ensure that later versions of numpy are compatible with geospatial code (33f0afb)
    • impala: make delimited tables explicitly declare stored as textfile (04086a4), closes #4260
    • impala: remove broken nth_value implementation (dbc9cc2)
    • ir: don't attempt fusion when projections aren't exactly equivalent (3482ba2)
    • mysql: cast mysql timestamp literals to ensure correct return type (8116e04)
    • mysql: implement integer to timestamp using from_unixtime (1b43004)
    • pandas/dask: look at pre_execute for has_operation reporting (cb44efc)
    • pandas: execute negate on bool as not (330ab4f)
    • pandas: fix struct inference from dict in the pandas backend (5886a9a)
    • pandas: force backend options registration on trace.enable() calls (8818fe6)
    • pandas: handle empty boolean column casting in Series conversion (f697e3e)
    • pandas: handle struct columns with NA elements (9a7c510)
    • pandas: handle the case of selection from a join when remapping overlapping column names (031c4c6)
    • pandas: perform correct equality comparison (d62e7b9)
    • postgres/duckdb: cast after milliseconds computation instead of after extraction (bdd1d65)
    • pyspark: handle predicates in Aggregation (842c307)
    • pyspark: prevent spark from trying to convert timezone of naive timestamps (dfb4127)
    • pyspark: remove xpassing test for #2453 (c051e28)
    • pyspark: specialize implementation of has_operation (5082346)
    • pyspark: use empty check for collect_list in GroupConcat rule (df66acb)
    • repr: allow DestructValue selections to be formatted by fmt (4b45d87)
    • repr: when formatting DestructValue selections, use struct field names as column names (d01fe42)
    • sqlalchemy: fix parsing and construction of nested array types (e20bcc0)
    • sqlalchemy: remove unused second argument when creating temporary views (8766b40)
    • sqlite: register conversion to isoformat for pandas.Timestamp (fe95dca)
    • sqlite: test case with whitespace at the end of the line (7623ae9)
    • sql: use isoformat for timestamp literals (70d0ba6)
    • type-system: infer null datatype for empty sequence of expressions (f67d5f9)
    • use bounded precision for decimal aggregations (596acfb)

    Performance Improvements

    • analysis: add _projection as cached_property to avoid reconstruction of projections (98510c8)
    • lineage: ensure that expressions are not traversed multiple times in most cases (ff9708c)

    Reverts

    • ci: install sqlite3 on ubuntu (1f2705f)
    Assets: Source code (tar.gz), Source code (zip), ibis_framework-3.1.0-py3-none-any.whl (734.97 KB)
  • 3.0.2(Apr 28, 2022)

  • 3.0.1(Apr 28, 2022)

  • 3.0.0(Apr 25, 2022)

    3.0.0 (2022-04-25)

    ⚠ BREAKING CHANGES

    • ir: The following are breaking changes due to simplifying expression internals
      • ibis.expr.datatypes.DataType.scalar_type and DataType.column_type factory methods have been removed, DataType.scalar and DataType.column class fields can be used to directly construct a corresponding expression instance (though prefer to use operation.to_expr())
      • The ibis.expr.types.ValueExpr._name and ValueExpr._dtype fields are no longer accessible. They were never supposed to be used directly; the ValueExpr.has_name(), ValueExpr.get_name() and ValueExpr.type() methods are now the only way to retrieve an expression's name and datatype.
      • ibis.expr.operations.Node.output_type is now a property rather than a method; decorate those methods with @property
      • ibis.expr.operations.ValueOp subclasses must define output_shape and output_dtype properties from now on (note the datatype abbreviation dtype in the property name)
      • ibis.expr.rules.cast(), scalar_like() and array_like() rules have been removed
    • api: Replace t["a"].distinct() with t[["a"]].distinct().
    • deps: The sqlalchemy lower bound is now 1.4
    • ir: Schema.names and Schema.types attributes now have tuple type rather than list
    • expr: Columns that were added or used in an aggregation or mutation would be alphabetically sorted in compiled SQL outputs. This was a vestige from when Python dicts didn't preserve insertion order. Now columns will appear in the order in which they were passed to aggregate or mutate
    • api: dt.float is now dt.float64; use dt.float32 for the previous behavior.
    • ir: Relation-based execute_node dispatch rules must now accept tuples of expressions.
    • ir: removed ibis.expr.lineage.{roots,find_nodes} functions
    • config: Use ibis.options.graphviz_repr = True to enable
    • hdfs: Use fsspec instead of HDFS from ibis
    • udf: Vectorized UDF coercion functions are no longer a public API.
    • The minimum supported Python version is now Python 3.8
    • config: register_option is no longer supported, please submit option requests upstream
    • backends: Read tables with pandas.read_hdf and use the pandas backend
    • The CSV backend is removed. Use Datafusion for CSV execution.
    • backends: Use the datafusion backend to read parquet files
    • Expr() -> Expr.pipe()
    • coercion functions previously in expr/schema.py are now in udf/vectorized.py
    • api: materialize is removed. Joins with overlapping columns now have suffixes.
    • kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
    • Any code that was relying implicitly on string-y behavior from UUID datatypes will need to add an explicit cast first.
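    A hedged sketch of the ValueOp migration described above (operation and rule names illustrative):

        import ibis.expr.datatypes as dt
        import ibis.expr.operations as ops
        import ibis.expr.rules as rlz

        class MyOp(ops.ValueOp):
            arg = rlz.double              # hypothetical argument rule

            # 3.0 expects these properties (note the dtype abbreviation)
            # instead of an output_type() method:
            output_dtype = dt.float64
            output_shape = rlz.shape_like("arg")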

    Features

    • add repr_html for expressions to print as tables in ipython (cd6fa4e)
    • add duckdb backend (667f2d5)
    • allow construction of decimal literals (3d9e865)
    • api: add ibis.asc expression (efe177e), closes #1454
    • api: add has_operation API to the backend (4fab014)
    • api: implement type for SortExpr (ab19bd6)
    • clickhouse: implement string concat for clickhouse (1767205)
    • clickhouse: implement StrRight operation (67749a0)
    • clickhouse: implement table union (e0008d7)
    • clickhouse: implement trim, pad and string predicates (a5b7293)
    • datafusion: implement Count operation (4797a86)
    • datatypes: unbounded decimal type (f7e6f65)
    • date: add ibis.date(y,m,d) functionality (26892b6), closes #386
    • duckdb/postgres/mysql/pyspark: implement .sql on tables for mixing sql and expressions (00e8087)
    • duckdb: add functionality needed to pass integer to interval test (e2119e8)
    • duckdb: implement _get_schema_using_query (93cd730)
    • duckdb: implement now() function (6924f50)
    • duckdb: implement regexp replace and extract (18d16a7)
    • implement force argument in sqlalchemy backend base class (9df7f1b)
    • implement coalesce for the pyspark backend (8183efe)
    • implement semi/anti join for the pandas backend (cb36fc5)
    • implement semi/anti join for the pyspark backend (3e1ba9c)
    • implement the remaining clickhouse joins (b3aa1f0)
    • ir: rewrite and speed up expression repr (45ce9b2)
    • mysql: implement _get_schema_from_query (456cd44)
    • mysql: move string join impl up to alchemy for mysql (77a8eb9)
    • postgres: implement _get_schema_using_query (f2459eb)
    • pyspark: implement Distinct for pyspark (4306ad9)
    • pyspark: implement log base b for pyspark (527af3c)
    • pyspark: implement percent_rank and enable testing (c051617)
    • repr: add interval info to interval repr (df26231)
    • sqlalchemy: implement ilike (43996c0)
    • sqlite: implement date_truncate (3ce4f2a)
    • sqlite: implement ISO week of year (714ff7b)
    • sqlite: implement string join and concat (6f5f353)
    • support of arrays and tuples for clickhouse (db512a8)
    • ver: dynamic version identifiers (408f862)

    Bug Fixes

    • added wheel to pyproject toml for venv users (b0b8e5c)
    • allow major version changes in CalVer dependencies (9c3fbe5)
    • annotable: allow optional arguments at any position (778995f), closes #3730
    • api: add ibis.map and .struct (327b342), closes #3118
    • api: map string multiplication with integer to repeat method (b205922)
    • api: thread suffixes parameter to individual join methods (31a9aff)
    • change TimestampType to Timestamp (e0750be)
    • clickhouse: disconnect from clickhouse when computing version (11cbf08)
    • clickhouse: use a context manager for execution (a471225)
    • combine windows during windowization (7fdd851)
    • conform epoch_seconds impls to expression return type (18a70f1)
    • context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b), closes #3108
    • dask: fix asof joins for newer version of dask (50711cc)
    • dask: workaround dask bug (a0f3bd9)
    • deps: update dependency atpublic to v3 (3fe8f0d)
    • deps: update dependency datafusion to >=0.4,<0.6 (3fb2194)
    • deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361)
    • deps: update dependency graphviz to >=0.16,<0.21 (3014445)
    • duckdb: add casts to literals to fix binding errors (1977a55), closes #3629
    • duckdb: fix array column type discovery on leaf tables and add tests (15e5412)
    • duckdb: fix log with base b impl (4920097)
    • duckdb: support both 0.3.2 and 0.3.3 (a73ccce)
    • enforce the schema's column names in apply_to (b0f334d)
    • expose ops.IfNull for mysql backend (156c2bd)
    • expr: add more binary operators to char list and implement fallback (b88184c)
    • expr: fix formatting of table info using tabulate (b110636)
    • fix float vs real data type detection in sqlalchemy (24e6774)
    • fix list_schemas argument (69c1abf)
    • fix postgres udfs and reenable ci tests (7d480d2)
    • fix tablecolumn execution for filter following join (064595b)
    • format: remove some newlines from formatted expr repr (ed4fa78)
    • histogram: cross_join needs onclause=True (5d36a58), closes #622
    • ibis.expr.signature.Parameter is not pickleable (828fd54)
    • implement coalesce properly in the pandas backend (aca5312)
    • implement count on tables for pyspark (7fe5573), closes #2879
    • infer coalesce types when a non-null expression occurs after the first argument (c5f2906)
    • mutate: do not lift table column that results from mutate (ba4e5e5)
    • pandas: disable range windows with order by (e016664)
    • pandas: don't reassign the same column to silence SettingWithCopyWarning warning (75dc616)
    • pandas: implement percent_rank correctly (d8b83e7)
    • prevent unintentional cross joins in mutate + filter (83eef99)
    • pyspark: fix range windows (a6f2aa8)
    • regression in Selection.sort_by with resolved_keys (c7a69cd)
    • regression in sort_by with resolved_keys (63f1382), closes #3619
    • remove broken csv pre_execute (93b662a)
    • remove importorskip call for backend tests (2f0bcd8)
    • remove incorrect fix for pandas regression (339f544)
    • remove passing schema into register_parquet (bdcbb08)
    • repr: add ops.TimeAdd to repr binop lookup table (fd94275)
    • repr: allow ops.TableNode in fmt_value (6f57003)
    • reverse the predicate pushdown substitution (f3cd358)
    • sort_index to satisfy pandas 1.4.x (6bac0fc)
    • sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321)
    • sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21)
    • sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71)
    • sql: retain column names for named ColumnExprs (f1b4b6e), closes #3754
    • sql: walk right join trees and substitute joins with right-side joins with views (0231592)
    • store schema on the pandas backend to allow correct inference (35070be)

    Performance Improvements

    • datatypes: speed up str and hash (262d3d7)
    • fast path for simple column selection (d178498)
    • ir: global equality cache (13c2bb2)
    • ir: introduce CachedEqMixin to speed up equality checks (b633925)
    • repr: remove full tree repr from rule validator error message (65885ab)
    • speed up attribute access (89d1c05)
    • use assign instead of concat in projections when possible (985c242)

    Miscellaneous Chores

    • deps: increase sqlalchemy lower bound to 1.4 (560854a)
    • drop support for Python 3.7 (0afd138)

    Code Refactoring

    • api: make primitive types more cohesive (71da8f7)
    • api: remove distinct ColumnExpr API (3f48cb8)
    • api: remove materialize (24285c1)
    • backends: remove the hdf5 backend (ff34f3e)
    • backends: remove the parquet backend (b510473)
    • config: disable graphviz-repr-in-notebook by default (214ad4e)
    • config: remove old config code and port to pydantic (4bb96d1)
    • dt.UUID inherits from DataType, not String (2ba540d)
    • expr: preserve column ordering in aggregations/mutations (668be0f)
    • hdfs: replace HDFS with fsspec (cc6eddb)
    • ir: make Annotable immutable (1f2b3fa)
    • ir: make schema annotable (b980903)
    • ir: remove unused lineage roots and find_nodes functions (d630a77)
    • ir: simplify expressions by not storing dtype and name (e929f85)
    • kudu: remove support for use of kudu through kudu-python (36bd97f)
    • move coercion functions from schema.py to udf (58eea56), closes #3033
    • remove blanket call for Expr (3a71116), closes #2258
    • remove the csv backend (0e3e02e)
    • udf: make coerce functions in ibis.udf.vectorized private (9ba4392)
    Assets: Source code (tar.gz), Source code (zip)
  • 2.1.1(Jan 12, 2022)

  • 2.1.0(Jan 12, 2022)

    2.1.0 (2022-01-12)

    Bug Fixes

    • consider all packages' entry points (b495cf6)
    • datatypes: infer bytes literal as binary #2915 (#3124) (887efbd)
    • deps: bump minimum dask version to 2021.10.0 (e6b5c09)
    • deps: constrain numpy to ensure wheels are used on windows (70c308b)
    • deps: update dependency clickhouse-driver to ^0.1 || ^0.2.0 (#3061) (a839d54)
    • deps: update dependency geoalchemy2 to >=0.6,<0.11 (4cede9d)
    • deps: update dependency pyarrow to v6 (#3092) (61e52b5)
    • don't force backends to override do_connect until 3.0.0 (4b46973)
    • execute materialized joins in the pandas and dask backends (#3086) (9ed937a)
    • literal: allow creating ibis literal with uuid (#3131) (b0f4f44)
    • restore the ability to have more than two option levels (#3151) (fb4a944)
    • sqlalchemy: fix correlated subquery compilation (43b9010)
    • sqlite: defer db connection until needed (#3127) (5467afa), closes #64

    Features

    • allow column_of to take a column expression (dbc34bb)
    • ci: More readable workflow job titles (#3111) (d8fd7d9)
    • datafusion: initial implementation for Arrow Datafusion backend (3a67840), closes #2627
    • datafusion: initial implementation for Arrow Datafusion backend (75876d9), closes #2627
    • make dayofweek impls conform to pandas semantics (#3161) (9297828)

    Reverts

    • "ci: install gdal for fiona" (8503361)
    Assets: Source code (tar.gz), Source code (zip), ibis_framework-2.1.0-py3-none-any.whl (682.63 KB)
Owner
Ibis Project
Python Expression Language Framework