Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Overview

Rubicon

Purpose

Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way. Rubicon's git integration associates these inputs and outputs directly with the model code that produced them, ensuring full auditability and reproducibility for developers and stakeholders alike. While you experiment, the Rubicon dashboard makes it easy to explore, filter, visualize, and share recorded work.


Components

Rubicon is composed of three parts:

  • A Python library, powered by fsspec, for storing and retrieving model inputs, outputs, and analyses on filesystems
  • A dashboard, built with dash, for exploring, comparing, and visualizing logged data
  • A process, built on intake, for sharing a selected subset of logged data with collaborators or reviewers

Workflow

Use the Rubicon library to capture model inputs and outputs over time. It can be easily integrated into existing Python models or pipelines and supports both concurrent logging (so multiple experiments can be logged in parallel) and asynchronous communication with S3 (so network reads and writes won’t block).

Meanwhile, periodically review the logged data within the Rubicon dashboard to steer the model tweaking process in the right direction. The dashboard lets you quickly spot trends by exploring and filtering your logged results and visualizes how the model inputs impacted the model outputs.

When the model is ready for review, Rubicon makes it easy to share specific subsets of the data with model reviewers and stakeholders, giving them the context necessary for a complete model review and approval.

Use

Here's a simple example:

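from collections import namedtuple

from rubicon import Rubicon
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder definitions (assumptions) so the example runs end to end;
# swap in your own metadata type, hyperparameters, data, and model.
SklearnTrainingMetadata = namedtuple("SklearnTrainingMetadata", "module method")
n_estimators, n_features, random_state = 100, 5, 42
X_train, X_test, y_train, y_test = train_test_split(
    *load_wine(return_X_y=True), random_state=random_state
)
rfc = RandomForestClassifier(
    n_estimators=n_estimators, max_features=n_features, random_state=random_state
).fit(X_train, y_train)

# Store results on the local filesystem and record git state automatically
rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)

project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)

experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["my_model_name"],
)

# Log the model inputs
experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)

# Log the model output
accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)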

Then explore the project by running the dashboard:

rubicon ui --root-dir /rubicon-root

Documentation

For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.

Install

rubicon is available on conda-forge via conda and on PyPI via pip.

conda config --add channels conda-forge
conda install rubicon-ml

or

pip install rubicon-ml

Develop

rubicon uses conda to manage environments. First, install conda. Then use conda to set up a development environment:

conda env create -f ci/environment.yml
conda activate rubicon-dev

Testing

The tests are separated into unit and integration tests. Run them directly in the activated dev environment via pytest tests/unit or pytest tests/integration, or run pytest on its own to execute all of them.

Note: some integration tests are intentionally marked to control when they are run (i.e., not during CI/CD). These tests include:

  • Integration tests that connect to physical filesystems (local, S3). Configure the root_dir appropriately for these tests (tests/integration/test_async_rubicon.py, tests/integration/test_rubicon.py), then run them with:

    pytest -m "physical_filesystem_test"
    
  • Integration tests for the dashboard. To run these locally, you'll need to install one of the WebDrivers: follow the Install instructions in the Dash Testing Docs, or install via brew with brew install --cask chromedriver. You may have to update your permissions in Security & Privacy to install with brew. Then run the tests with:

    pytest -m "dashboard_test"
    

    Note: The --headless flag can be added to run the dashboard tests in headless mode.

Code Formatting

Install pre-commit, then run pre-commit install in the repository to set up the git hook scripts that automatically run black, flake8, and isort during commits.

Now pre-commit will run automatically on git commit and will ensure consistent code format throughout the project. You can format without committing via pre-commit run or skip these checks with git commit --no-verify.

Contributors


  • Mike McCarty
  • Sri Ranganathan
  • Joe Wolfe
  • Ryan Soley
  • Diane Lee

Comments
  • Edgetest action

    What

    • This PR adds the edgetest action to keep the basic requirements up to date, provided the tests pass.
    • I had to refactor the setup a bit to be PEP517 compliant.

    How to Test

    • The local install works for me, but a second set of eyes on this would be really helpful @ryanSoley @shania-m
    opened by fdosani 15
  • auto versioning using git tags

    What

    • in version.py and rubicon/_version.py, use git to get the version from the latest tag
      • this needs to be run from the repo with tags cloned too, which means it'll only work for devs who've installed from source (or the CI with the fetch_depth maxed out for now, in our case)
    • to deal with that, setup.py calls version.py's _write_version_file function when the package is bundled
      • this replaces rubicon/_version.py with a function that returns a hardcoded string (fetched from the git tags in the build environment)
      • this change won't ever need to be committed to the repo, because the current git solution will always work for anyone installed from source
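The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names get_version_from_git and write_version_file and the use of git describe are assumptions made here, since the description only names _write_version_file and rubicon/_version.py.

```python
import subprocess
from pathlib import Path


def get_version_from_git(repo_dir="."):
    """Return the most recent tag reachable from HEAD, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--abbrev=0"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # No git executable, no repository, or no tags were fetched
        return None
    return result.stdout.strip() or None


def write_version_file(version, path):
    """Overwrite a version module with a function returning a hardcoded string."""
    source = f"def get_version():\n    return {version!r}\n"
    Path(path).write_text(source)
    return source
```

At build time, a setup script could call write_version_file(get_version_from_git(), ...) so that the bundled package no longer needs a git repository to report its version.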

    How to Test

    • from the repo's root, run pip install -e ., launch a python interpreter, and import rubicon
      • rubicon.__version__ should return the latest version
      • navigate out of the repo and try the same thing - get_version will fail since there's no git repo
    • build rubicon - python setup.py sdist bdist_wheel - and install the wheel file
      • now rubicon.__version__ should return the latest version from a python interpreter started anywhere
    enhancement 
    opened by ryanSoley 9
  • "Rubicon" name collides with existing project in the Python ecosystem

    Describe the bug

    I was just made aware of this project via a PyCon US announcement email.

    The problem: the name you've chosen for this project collides with an existing project in the Python ecosystem.

    I've been using the name Rubicon in the Python ecosystem since 2014. I'm the owner of the Rubicon record in PyPI, as well as some related projects:

    • https://pypi.org/project/rubicon/
    • https://pypi.org/project/rubicon-java/
    • https://pypi.org/project/rubicon-objc/

    These projects are in active use in the Python community, and the Java subproject received funding (indirectly) from the PSF through their support of the BeeWare Android port.

    I can only assume this is something you were at least partially aware of, because you've chosen the name rubicon-ml for your PyPI package, and changed the name of the package in setup.py.

    Although the projects are in a different domain (language bridging vs numerical processing), I'd argue there is potential for confusion since they're both active projects in the same language ecosystem, and there is some usage of BeeWare tooling in the numerical processing community.

    I humbly request you choose a different name for your project that doesn't collide with my pre-existing usage.

    bug needs triage 
    opened by freakboy3742 7
  • Use conda incubator action for environment setup

    Unpin Python in environment file to make sure Python version is not always 3.8

    closes: #42


    What

    • Uses the conda-incubator action for more flexible miniconda setup
    • Unset strict python version in environment file so the version matrix checks all the versions
    • Add percy to conda instead of using pip

    How to Test

    • I think if the CI passes it works?
    opened by gforsyth 5
  • python-3.10.6-h582c2e5_0_cpython.tar.bz2: 3 vulnerabilities (highest severity is: 9.8)

    Vulnerable Library - python-3.10.6-h582c2e5_0_cpython.tar.bz2

    General purpose programming language

    Library home page: https://api.anaconda.org/download/conda-forge/python/3.10.6/linux-64/python-3.10.6-h582c2e5_0_cpython.tar.bz2

    Path to dependency file: /environment.yml

    Path to vulnerable library: /onda3/pkgs/python-3.10.6-h582c2e5_0_cpython.tar.bz2,/home/wss-scanner/anaconda3/pkgs/python-3.10.6-h582c2e5_0_cpython.tar.bz2

    Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

    Vulnerabilities

    | CVE | Severity | CVSS | Dependency | Type | Fixed in | Remediation Available |
    | ------------- | ------------- | ----- | ----- | ----- | --- | --- |
    | CVE-2015-20107 | High | 9.8 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | N/A | ❌ |
    | CVE-2020-10735 | High | 7.5 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | N/A | ❌ |
    | CVE-2021-28861 | High | 7.4 | python-3.10.6-h582c2e5_0_cpython.tar.bz2 | Direct | v3.11 | ❌ |

    Details

    CVE-2015-20107

    Dependency Hierarchy:

    • :x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

    Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

    Found in base branch: main

    Vulnerability Details

    In Python (aka CPython) through 3.10.4, the mailcap module does not add escape characters into commands discovered in the system mailcap file. This may allow attackers to inject shell commands into applications that call mailcap.findmatch with untrusted input (if they lack validation of user-provided filenames or arguments).
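To illustrate the class of problem described above, here is a minimal sketch of quoting an untrusted filename before substituting it into a mailcap-style command template. The helper build_viewer_command is hypothetical and is not the CPython fix; it only shows how shlex.quote keeps shell metacharacters in the filename from being interpreted.

```python
import shlex


def build_viewer_command(template, filename):
    """Substitute a filename into a '%s' command template with shell quoting."""
    return template.replace("%s", shlex.quote(filename))


# The ';' would otherwise end the viewer command and start a second one
unsafe_name = "report.txt; echo injected"
print(build_viewer_command("less %s", unsafe_name))
# prints: less 'report.txt; echo injected'
```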

    Publish Date: 2022-04-13

    URL: CVE-2015-20107

    CVSS 3 Score Details (9.8)

    Base Score Metrics:

    • Exploitability Metrics:
      • Attack Vector: Network
      • Attack Complexity: Low
      • Privileges Required: None
      • User Interaction: None
      • Scope: Unchanged
    • Impact Metrics:
      • Confidentiality Impact: High
      • Integrity Impact: High
      • Availability Impact: High

    For more information on CVSS3 Scores, click here.

    CVE-2020-10735

    Dependency Hierarchy:

    • :x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

    Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

    Found in base branch: main

    Vulnerability Details

    A flaw was found in python. In algorithms with quadratic time complexity using non-binary bases, when using int("text"), a system could take 50ms to parse an int string with 100,000 digits and 5s for 1,000,000 digits (float, decimal, int.from_bytes(), and int() for binary bases 2, 4, 8, 16, and 32 are not affected). The highest threat from this vulnerability is to system availability.
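One way an application can limit its exposure is to cap the length of untrusted integer strings before parsing them. The sketch below is an illustration under assumptions: the helper parse_int_bounded is hypothetical, and the 4300-digit cap is chosen to match the default of the configurable limit (sys.set_int_max_str_digits) that later CPython releases introduced.

```python
MAX_DIGITS = 4300  # assumed cap, matching CPython's later default digit limit


def parse_int_bounded(text, max_digits=MAX_DIGITS):
    """Parse a base-10 integer, rejecting inputs long enough to be costly."""
    digits = text.strip().lstrip("+-")
    if len(digits) > max_digits:
        raise ValueError(
            f"integer string has {len(digits)} digits; limit is {max_digits}"
        )
    return int(text)
```

Checking the length first is cheap (linear in the input size), so the quadratic conversion is never started for oversized input.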

    Publish Date: 2022-09-09

    URL: CVE-2020-10735

    CVSS 3 Score Details (7.5)

    Base Score Metrics:

    • Exploitability Metrics:
      • Attack Vector: Network
      • Attack Complexity: Low
      • Privileges Required: None
      • User Interaction: None
      • Scope: Unchanged
    • Impact Metrics:
      • Confidentiality Impact: None
      • Integrity Impact: None
      • Availability Impact: High

    CVE-2021-28861

    Dependency Hierarchy:

    • :x: python-3.10.6-h582c2e5_0_cpython.tar.bz2 (Vulnerable Library)

    Found in HEAD commit: 2d403d6e2be8f1e80c4330791e6ff42d48bd4930

    Found in base branch: main

    Vulnerability Details

    ** DISPUTED ** Python 3.x through 3.10 has an open redirection vulnerability in lib/http/server.py because it does not guard against multiple slashes (/) at the beginning of the URI path, which may lead to information disclosure. NOTE: this is disputed by a third party because the http.server.html documentation page states "Warning: http.server is not recommended for production. It only implements basic security checks."
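The underlying issue is that a redirect target beginning with two slashes can be read by a browser as a protocol-relative URL pointing at another host. A common mitigation is to collapse the leading run of slashes before the path is used. The helper below is a hypothetical sketch of that idea, not a claim about how any particular Python release patches http.server.

```python
def collapse_leading_slashes(path):
    """Reduce a run of leading '/' characters to a single '/'."""
    if path.startswith("//"):
        return "/" + path.lstrip("/")
    return path


print(collapse_leading_slashes("//example.com/%2f.."))
# prints: /example.com/%2f..
```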

    Publish Date: 2022-08-23

    URL: CVE-2021-28861

    CVSS 3 Score Details (7.4)

    Base Score Metrics:

    • Exploitability Metrics:
      • Attack Vector: Network
      • Attack Complexity: Low
      • Privileges Required: None
      • User Interaction: Required
      • Scope: Changed
    • Impact Metrics:
      • Confidentiality Impact: High
      • Integrity Impact: None
      • Availability Impact: None

    Suggested Fix

    Type: Upgrade version

    Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28861

    Release Date: 2022-08-23

    Fix Resolution: v3.11

    security vulnerability 
    opened by mend-for-github-com[bot] 4
  • Issue with the edgetest action and Dask

    Describe the bug

    @ryanSoley @ak-gupta I was digging into the edgetest action a bit more and I was able to recreate the bug we were seeing.

    Steps/Code to reproduce bug

    Running the following:

    conda create -n test python=3.9 pip conda
    conda activate test
    pip install dask==2022.2.0 prefect
    

    If I do a pip list I end up with:

    dask                    2022.2.0
    prefect                 1.1.0
    

    Then if I upgrade the following:

    pip install dask prefect --upgrade
    
    Requirement already satisfied: dask in ~/miniconda3/envs/test/lib/python3.9/site-packages (2022.2.0)
    Collecting dask
      Using cached dask-2022.3.0-py3-none-any.whl (1.1 MB)
    Requirement already satisfied: prefect in ~/miniconda3/envs/test/lib/python3.9/site-packages (1.1.0)
    Requirement already satisfied: partd>=0.3.10 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (1.2.0)
    Requirement already satisfied: fsspec>=0.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (2022.2.0)
    Requirement already satisfied: cloudpickle>=1.1.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (2.0.0)
    Requirement already satisfied: packaging>=20.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (21.3)
    Requirement already satisfied: toolz>=0.8.2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (0.11.2)
    Requirement already satisfied: pyyaml>=5.3.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from dask) (6.0)
    Requirement already satisfied: requests>=2.25 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.27.1)
    Requirement already satisfied: python-box>=5.1.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (6.0.1)
    Requirement already satisfied: pendulum>=2.0.4 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.1.2)
    Requirement already satisfied: marshmallow>=3.0.0b19 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (3.15.0)
    Requirement already satisfied: toml>=0.9.4 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.10.2)
    Requirement already satisfied: docker>=3.4.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (5.0.3)
    Requirement already satisfied: distributed>=2.17.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2022.2.0)
    Requirement already satisfied: python-slugify>=1.2.6 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (6.1.1)
    Requirement already satisfied: importlib-resources>=3.0.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (5.4.0)
    Requirement already satisfied: croniter>=0.3.24 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.3.4)
    Requirement already satisfied: urllib3>=1.26.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.26.9)
    Requirement already satisfied: mypy-extensions>=0.4.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.4.3)
    Requirement already satisfied: pytz>=2018.7 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2022.1)
    Requirement already satisfied: msgpack>=0.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (1.0.3)
    Requirement already satisfied: tabulate>=0.8.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (0.8.9)
    Requirement already satisfied: python-dateutil>=2.7.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (2.8.2)
    Requirement already satisfied: click>=7.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (8.0.4)
    Requirement already satisfied: marshmallow-oneofschema>=2.0.0b2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from prefect) (3.0.1)
    Requirement already satisfied: psutil>=5.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (5.9.0)
    Requirement already satisfied: tblib>=1.6.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (1.7.0)
    Requirement already satisfied: jinja2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (3.0.3)
    Requirement already satisfied: setuptools in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (60.10.0)
    Requirement already satisfied: zict>=0.1.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (2.1.0)
    Requirement already satisfied: tornado>=6.0.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (6.1)
    Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from distributed>=2.17.0->prefect) (2.4.0)
    Requirement already satisfied: websocket-client>=0.32.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from docker>=3.4.1->prefect) (1.3.1)
    Requirement already satisfied: zipp>=3.1.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from importlib-resources>=3.0.0->prefect) (3.7.0)
    Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from packaging>=20.0->dask) (3.0.7)
    Requirement already satisfied: locket in ~/miniconda3/envs/test/lib/python3.9/site-packages (from partd>=0.3.10->dask) (0.2.1)
    Requirement already satisfied: pytzdata>=2020.1 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from pendulum>=2.0.4->prefect) (2020.1)
    Requirement already satisfied: six>=1.5 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from python-dateutil>=2.7.0->prefect) (1.16.0)
    Requirement already satisfied: text-unidecode>=1.3 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from python-slugify>=1.2.6->prefect) (1.3)
    Requirement already satisfied: certifi>=2017.4.17 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (2021.10.8)
    Requirement already satisfied: charset-normalizer~=2.0.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (2.0.12)
    Requirement already satisfied: idna<4,>=2.5 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from requests>=2.25->prefect) (3.3)
    Requirement already satisfied: heapdict in ~/miniconda3/envs/test/lib/python3.9/site-packages (from zict>=0.1.3->distributed>=2.17.0->prefect) (1.0.1)
    Requirement already satisfied: MarkupSafe>=2.0 in ~/miniconda3/envs/test/lib/python3.9/site-packages (from jinja2->distributed>=2.17.0->prefect) (2.1.1)
    

    Expected behavior

    dask should be upgraded to 2022.3.0, but due to some interaction with prefect it isn't. I've tested with other packages, and dask seems to be the only one affected.

    Additional context

    I will need to dig in a bit more, but I'm thinking this isn't an edgetest issue but something to do with prefect. I'd appreciate any thoughts or insights either of you might have.

    bug needs triage 
    opened by fdosani 4
  • added buttons to select all or remove all columns in UI table

    closes: #61


    What

    • Introduced buttons to select all columns or hide all columns in Rubicon UI experiment tables

    How to Test

    • When select all columns button is clicked, all columns appear in the table
    • When clear all columns button is clicked, no columns appear in the table
    opened by shania-m 4
  • Logging MultiIndex Dataframes Fails

    Describe the bug

    It appears that internally, rubicon's .log_dataframe() converts pandas dataframes to dask dataframes regardless of the situation. This can cause issues in scenarios where dask does not support certain dataframe features, such as MultiIndex dataframes.

    Steps/Code to reproduce bug

    import pandas as pd
    from rubicon.client import Rubicon
    # Create sample data
    df = pd.DataFrame([[0,1,'a'],[1,1,'b'],[2,2,'c'],[3,2,'d']], columns=['a', 'b', 'c'])
    df = df.set_index(['b', 'a']) # Set multiindex
    df
         c
    b a   
    1 0  a
      1  b
    2 2  c
      3  d
    
    # Log dataframe to rubicon
    rubicon = Rubicon(persistence="memory")
    project = rubicon.get_or_create_project("test")
    exp = project.log_experiment('test_exp')
    exp.log_dataframe(df)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/client/mixin.py", line 251, in log_dataframe
        self.repository.create_dataframe(dataframe, df, project_name, experiment_id=experiment_id)
      File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/repository/base.py", line 426, in create_dataframe
        data = self._convert_to_dask_dataframe(data)
      File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/rubicon/repository/base.py", line 396, in _convert_to_dask_dataframe
        return dd.from_pandas(df, npartitions=1)
      File "/Users/ouz343/miniconda3/envs/lustr/lib/python3.8/site-packages/dask/dataframe/io/io.py", line 202, in from_pandas
        raise NotImplementedError("Dask does not support MultiIndex Dataframes.")
    NotImplementedError: Dask does not support MultiIndex Dataframes.
    

    Additional context

    I'm not sure why pandas dataframes need to be converted to dask dataframes every time during logging, but the solution would involve avoiding the conversion when the dataframe has a MultiIndex, since dask does not support it.

    bug 
    opened by Lazea 4
  • Automatic sklearn pipeline logging

    Is your feature request related to a problem? Please describe

    One way to log training data to Rubicon would be to extend the scikit-learn.pipeline so information could be logged before and/or after each step. We could extend the class and override the fit and predict methods to add optional hooks before and after.

    Describe the solution you'd like

    Something like...

    from sklearn.pipeline import Pipeline


    class RubiconPipeline(Pipeline):
        def before_fit(self, X, y=None, **fit_params):
            # logs info from self.steps
            ...

        def after_fit(self, X, y=None, **fit_params):
            # logs info from self.steps after fitting
            ...

        def fit(self, X, y=None, **fit_params):
            # run the optional hooks around the regular Pipeline.fit
            self.before_fit(X, y)
            retval = super().fit(X, y=y, **fit_params)
            self.after_fit(X, y)
            return retval
    

    Additional context

    Three cases to consider:

    1. Inferred logging from inspecting X's, y's and estimator object
    2. Logging through an extended common Rubicon/SKLearn API (optionally call .rubicon_log methods on estimators)
    3. Logging through user defined functions (UDFs) optionally provided to RubiconPipeline.__init__
    development feature 
    opened by joe-wolfe21 4
  • reorganize the existing docs

    Is your documentation request related to a problem? Please describe

    we would like to update the rubicon-ml docs to follow the Diataxis framework

    Describe the solution you'd like

    once #207 is completed and we have a plan for reorganizing the docs, this issue will track the actual reorganization work

    documentation 
    opened by ryanSoley 3
  • `jupyter-dash` proof-of-concept

    I've been thinking about how we could get a live example of the dashboard hosted for users for a while now. I saw how the dask examples use JupyterLab through binder to show off their task graphs and stuff, so I thought it'd be great if we had a way to run the UI in JupyterLab. Of course we can launch it from lab, but that runs it on a localhost port which may not always be accessible.

    Then I came across this blog and thought if we could just use this it'd solve it.

    https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e

    I was gonna raise this as an issue but it ended up being super easy to implement, so here it is.

    What

    • uses jupyter-dash to instantiate the dashboard app
      • the default option, "external", runs it exactly the same as dash.Dash would
      • now there are options for "jupyterlab" and "inline" which display the dashboard in a new JLab window or inline in a notebook respectively

    How to Test

    • run through the added notebook (should we even keep it?)
    opened by ryanSoley 3
  • return rubicon objects with proper parents

    Is your enhancement request related to a problem? Please describe

    the rubicon objects that can optionally be returned by RubiconJSON are currently utilizing a NoOpParent instead of the proper experiment/project parent

    https://github.com/capitalone/rubicon-ml/blob/jsonpath-poc/rubicon_ml/client/rubicon_json.py#L24

    this means that any operations that need to actually reach out to the filesystem will not be possible, essentially making the objects read-only

    Describe the solution you'd like

    we should try to associate the proper parents with each returned object so that they will be fully functional rubicon objects

    this will likely require the inspection of the match.value objects returned from each query. the JSON in match.value should have an experiment_id (feature, metric, parameter), or a parent_id (experiment, artifact, dataframe)

    projects are the exception - they require a config object rather than a parent. ‼️ this is actually incorrect in the current implementation and I didn't catch it before merging to the integration branch, so we'll have to address it too ‼️

    there are a few steps that'll be required in the process:

    • for each match in the results object:
      • extract the experiment_id or parent_id from the result
      • if the queried object is not a project:
        • if RubiconJSON is instantiated with experiments as an input and the parent of the queried object is an experiment, the parent should be in that list
        • if RubiconJSON is instantiated with projects as an input and the parent of the queried object is a project, the parent should be in that list
        • if RubiconJSON is instantiated with projects as an input and the parent of the queried object is an experiment, the parent should be retrievable from one of those projects using project.experiment(id=...
        • if RubiconJSON is instantiated with top level rubicon objects as an input, we'll need to leverage get_project(id=... and experiment(id=... to retrieve the proper parents whether they be a project or experiment
      • if the queried object is a project:
        • we must be retrieving it from a top level rubicon input to RubiconJSON - so we can just take the config object off that top level rubicon object and dump it into the new project
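The lookup steps above can be sketched as follows. This is only an illustration under assumptions: Experiment and Project here are simplified stand-ins rather than the real rubicon classes, resolve_parent is a hypothetical helper, and the cases that need a top level rubicon object or a project config are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    id: str


@dataclass
class Project:
    id: str
    experiments: dict = field(default_factory=dict)  # experiment id -> Experiment

    def experiment(self, id):
        return self.experiments[id]


def resolve_parent(match_value, projects=(), experiments=()):
    """Find the parent for one matched JSON value, or None if it is unknown."""
    # Features, metrics, and parameters carry an experiment_id;
    # experiments, artifacts, and dataframes carry a parent_id.
    parent_id = match_value.get("parent_id")
    wanted = match_value.get("experiment_id") or parent_id

    # Case 1: the parent is one of the experiments given as input
    for experiment in experiments:
        if experiment.id == wanted:
            return experiment
    for project in projects:
        # Case 2: the parent is one of the projects given as input
        if project.id == parent_id:
            return project
        # Case 3: the parent is an experiment inside one of those projects
        if wanted in project.experiments:
            return project.experiment(id=wanted)
    return None
```

Returning None when nothing matches leaves room for the fallback mentioned in the issue, where the object simply keeps its NoOpParent.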

    Describe alternatives you've considered

    if this ends up being infeasible, it's fine to just leave the NoOpParents

    enhancement development 
    opened by ryanSoley 0
  • add example & docs for `RubiconJSON`

    Is your documentation request related to a problem? Please describe

    since this is a completely new feature, we'll need a new section in the docs for it

    Describe the solution you'd like

    design a notebook showing how the new RubiconJSON class works (maybe just adapt the poc one we're basing this all off) for the documentation. add said notebook to the docs. add any new, public python methods to the API reference

    documentation example 
    opened by ryanSoley 0
  • add python 3.11 to test matrix

    What

    • adds python 3.11 to test matrix in the testing workflow

    How to Test

    • make sure the CI on this branch runs tests for four python versions, 3.8-3.11
    enhancement 
    opened by ryanSoley 1
  • add example showing rubicon w/ DataProfiler

    Is your feature request related to a problem? Please describe

    after seeing the DataProfiler demo at PyCon, we decided we could show an example of rubicon tracking data profiles over a project/experiments' lifetime

    Describe the solution you'd like

    • create an example that shows experiments profiling each incoming dataset and logging the profiles to rubicon
    • use rubicon to illustrate a change in data profiles over experiments
    • reference new data profiler integration example in "logging training metadata" example, as it is basically an extension of logging training metadata
    • title the new example "Integrate with DataProfiler" and add it to the "How to..." section of the docs
    documentation example 
    opened by ryanSoley 0
  • Validate fsspec backends work with Rubicon

    Rubicon-ml leverages fsspec for persistence. This issue includes:

    1. Determine which fsspec backends apply to rubicon
    2. Validate each backend works with rubicon-ml
    3. add working backends to docs
    documentation discussion 
    opened by shania-m 1
  • add test for the value error in `project`, `metric`, `experiment` and the other getters

    add test for the value error in `project`, `metric`, `experiment` and the other getters

    Describe the solution you'd like

    Create a single test function that verifies each getter raises a ValueError when neither name nor id is passed as a parameter.

    Describe alternatives you've considered

    Alternatively, a separate test for this ValueError could be written for each getter, but one shared test used for all of them would most likely be more efficient.
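A minimal sketch of the shared check described above, assuming each getter accepts optional `name` and `id` keyword arguments and raises `ValueError` when both are missing. `FakeProject` is a hypothetical stand-in used only to keep the example self-contained; the getter names and signatures in rubicon-ml itself should be confirmed against the library before adapting this.

```python
# Hypothetical stand-in for a rubicon-ml entity. The real getters and their
# signatures should be taken from the library, not from this sketch.
class FakeProject:
    def _get(self, kind, name=None, id=None):
        # Mirror the behavior under test: one of `name` or `id` is required.
        if name is None and id is None:
            raise ValueError(f"`name` OR `id` required to get a {kind}.")
        return {"kind": kind, "name": name, "id": id}

    def experiment(self, name=None, id=None):
        return self._get("experiment", name=name, id=id)

    def artifact(self, name=None, id=None):
        return self._get("artifact", name=name, id=id)

    def dataframe(self, name=None, id=None):
        return self._get("dataframe", name=name, id=id)


def getter_raises_value_error(getter):
    """Return True if calling `getter` with no name and no id raises ValueError."""
    try:
        getter()
    except ValueError:
        return True
    return False


def check_getters(entity, getter_names):
    """Map each getter name to whether it rejects a call without name or id."""
    return {
        getter_name: getter_raises_value_error(getattr(entity, getter_name))
        for getter_name in getter_names
    }


results = check_getters(FakeProject(), ["experiment", "artifact", "dataframe"])
assert all(results.values()), results
```

In a pytest suite, the same idea could be expressed with `pytest.mark.parametrize` over the getter names, so that one test body covers every getter and each failure is reported separately.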

    enhancement development 
    opened by andreafehrman 1
Releases(0.4.3)
  • 0.4.3(Dec 14, 2022)

    changelog

    • s3fs dependency now optional and installed via the s3 extra (#326)
    • renamed ui extra to viz to reflect module name change (#326)
    • dependency updates via edgetest
  • 0.4.2(Dec 12, 2022)

    changelog

    • json encode/decode numpy objects (#321)
    • dependency updates via edgetest (#320)

    bugfixes

    • fix tag display in experiments table (#318)
  • 0.4.1(Dec 2, 2022)

    changelog

    • added type checking for tags (#304)
    • update existing intake catalogs (#308)
    • log python objects as artifacts directly (#310)
    • dependency updates via edgetest (#305)

    bugfixes

    • update setup.cfg (#316)
  • 0.4.0(Nov 16, 2022)

    changelog

    • address existing deprecations (#286)
    • deprecate async submodule (#287)
    • add new examples & example cleanup (#292, #293, #295)
    • add failure modes (#301)
    • dependency updates via edgetest (#283, #289, #291, #296, #297, #300)

    bugfixes

    • fix Binder examples (#284)
    • fix tag removal bug (#298)
  • 0.3.10(Sep 29, 2022)

    changelog

    • added tagging to features (#278)
    • added tagging to parameters (#280)
    • dependency updates via edgetest (#279, #281)

    bugfixes

    • fixes a bug where add_tags and remove_tags did not work properly on entities whose names contain underscores (#280)
  • 0.3.9(Sep 23, 2022)

    changelog

    • added tagging to metrics (#273, #276)
    • dependency updates via edgetest (#267, #274)

    bugfixes

    • artifacts can now be retrieved by tags (#275)
  • 0.3.8(Sep 16, 2022)

    changelog

    • tags can now be applied to artifacts (#268)
    • dependency updates via edgetest (#257, #259, #260, #264, #265)

    bugfixes

    • fixes duplicate source registration error from newest intake release (#262)
  • 0.3.7(Jul 6, 2022)

    changelog

    • get edgetest working (#225, #230)
    • dependency updates via edgetest (#231, #238, #239, #241, #242, #245, #248, #250)
    • documentation and example updates (#218, #222, #234, #235, #246, #249)
  • 0.3.6(Apr 12, 2022)

  • 0.3.5(Mar 29, 2022)

  • 0.3.4(Mar 17, 2022)

    changelog

    • Added edgetest action for up-to-date requirements (#195)

    bugfixes

    • Update intake dependency to include msgpack when using pip (#199)
  • 0.3.3(Mar 17, 2022)

    changelog

    • Added make_pipeline to rubicon_ml.sklearn.pipeline to create pipelines (#185)
    • RubiconPipeline constructor also takes memory and verbose arguments, without **kwargs (#185)
    • Added multiple scores and fits to pipelines in rubicon_ml.sklearn.pipeline (#186)
    • Support score_samples on pipelines in rubicon_ml.sklearn.pipeline (#192)
    • Add pipeline slicing on pipelines in rubicon_ml.sklearn.pipeline (#194)

    bugfixes

    • Support NoneType values in correlation plot (#189)
  • 0.3.2(Feb 16, 2022)

  • 0.3.1(Jan 24, 2022)

  • 0.3.0(Jan 21, 2022)

    changelog

    • adds new viz module to visualize logged data (#149)
      • more info in our docs
      • deprecates ui module in favor of viz
    • removes old rubicon module that was deprecated in favor of rubicon_ml in #93 (#169)
  • 0.2.11(Nov 29, 2021)

    changelog

    • add ability to instantiate dashboard with Rubicon object (#119)
    • support Dash 2.0.0 (#121)
    • preserve logging order on fetches (#129)
    • add ability to get all rubicon-ml entities by name and ID (#128, #131, #133, #135, #141, #152, #153)
    • add storage_options passthru to prefect task (#155)

    bugfixes

    • fix local dataframe logging (#156)
  • 0.2.10(Aug 24, 2021)

    changelog

    • add passthrough for dash.Dash keyword arguments to the Dashboard (#117)

    bugfixes

    • get dashboard working behind Jupyter proxies (#116)
  • 0.2.9(Aug 19, 2021)

  • 0.2.8(Jul 22, 2021)

  • 0.2.7(Jul 8, 2021)

    changelog

    • log estimator parameters passed to fit in the Scikit-learn pipeline (#111)

    bugfixes

    • properly serialize date types when logging (#108)
    • properly serialize datetime types with null fields (#111)
  • 0.2.6(Jun 9, 2021)

    changelog

    • add runnable binder example (#99)

    bugfixes

    • check for root_dir before initializing in-memory filesystem (#104)
    • address whitesource vulnerability (#100)
  • 0.2.5(May 21, 2021)

  • 0.2.4(May 19, 2021)

  • 0.2.3(May 17, 2021)

  • 0.2.2(Apr 29, 2021)

    changelog

    • adds test suite for example notebooks (#90)

    bugfixes

    • ignore non-rubicon files within data directories (#84)
    • ignore non-rubicon files in async repo (#91)
    • fix pytest warnings (#92)
  • 0.2.1(Apr 19, 2021)

    bugfixes

    • revisit examples (#79)
      • ensures all examples in the notebooks directory are working with the latest version of rubicon
      • the asynchronous S3 client can now read data back
      • the dashboard now works with an in-memory filesystem with a default root_dir
  • 0.2.0(Apr 12, 2021)

  • 0.1.8(Apr 7, 2021)

  • 0.1.7(Apr 5, 2021)

  • 0.1.6(Apr 1, 2021)

    changelog

    • support hiding cols within experiment table and comparison plot (#60)

    bugfixes

    • fixes a bug related to dataframe plotting using hvplot (#62)
Owner
Capital One
We’re an open source-first organization — actively using, contributing to and managing open source software projects.