Python utility to extract differences between two pandas dataframes.

Overview

Pandas Diff

Installation

Install pandas_diff with pip:

  pip install pandas_diff

Usage/Examples

import pandas as pd
import pandas_diff as pd_diff

# Create two example dataframes
df_infinity = pd.DataFrame([
    {"hero": "hulk", "power": "strength"},
    {"hero": "black_widow", "power": "spy"},
    {"hero": "thor", "hammers": 0},
    {"hero": "thor", "hammers": 1},
])
df_endgame = pd.DataFrame([
    {"hero": "hulk", "power": "smart"},
    {"hero": "captain marvel", "power": "strength"},
    {"hero": "thor", "hammers": 2},
])

# Get differences, using the key "hero"
df = pd_diff.get_diffs(df_infinity, df_endgame, "hero")

df

   operation  object_keys   object_values                                        object_json  attribute_changed  old_value  new_value
0     create       [hero]  captain marvel  {'hero': 'captain marvel', 'power': 'strength'...                NaN        NaN        NaN
1     delete       [hero]     black_widow  {'hero': 'black_widow', 'power': 'spy', 'hamme...                NaN        NaN        NaN
2     modify       [hero]            thor     {'hero': 'thor', 'power': nan, 'hammers': 2.0}            hammers          1          2
3     modify       [hero]            hulk  {'hero': 'hulk', 'power': 'smart', 'hammers': ...              power   strength      smart
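
To make the three operations in the output above concrete, here is a minimal plain-Python sketch of a keyed diff. It is an illustration only and is not how pandas_diff is implemented: the function name `keyed_diff`, the simplified result fields, and the rule that a later duplicate key overwrites an earlier one are all assumptions made for this example.

```python
def keyed_diff(old_rows, new_rows, key):
    """Classify records as create, delete, or modify by comparing on `key`.

    Hypothetical helper for illustration; not part of pandas_diff.
    """
    # Index each list of records by its key value.
    # Assumption: a later record with the same key overwrites an earlier one.
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    diffs = []

    # Keys present only in the new data were created.
    for k in sorted(new_by_key.keys() - old_by_key.keys()):
        diffs.append({"operation": "create", "key": k})

    # Keys present only in the old data were deleted.
    for k in sorted(old_by_key.keys() - new_by_key.keys()):
        diffs.append({"operation": "delete", "key": k})

    # Keys present in both are compared attribute by attribute.
    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        old, new = old_by_key[k], new_by_key[k]
        for attr in sorted(old.keys() | new.keys()):
            if old.get(attr) != new.get(attr):
                diffs.append({
                    "operation": "modify",
                    "key": k,
                    "attribute_changed": attr,
                    "old_value": old.get(attr),
                    "new_value": new.get(attr),
                })
    return diffs


old = [
    {"hero": "hulk", "power": "strength"},
    {"hero": "black_widow", "power": "spy"},
    {"hero": "thor", "hammers": 1},
]
new = [
    {"hero": "hulk", "power": "smart"},
    {"hero": "captain marvel", "power": "strength"},
    {"hero": "thor", "hammers": 2},
]

diffs = keyed_diff(old, new, "hero")
for d in diffs:
    print(d)
```

The sketch emits one record per changed attribute, which mirrors the one-row-per-change layout of the table above; the real library also returns the full object as JSON, which is omitted here.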

Features

  • Support for standalone app
Releases (v1.4.0)
Owner
Jaime Valero
Devops, Machine learning learner and sysadmin. I also cook omelettes.