Embeddinghub is a database built for machine learning embeddings.

Overview

What is Embeddinghub?

Embeddinghub is a database built for machine learning embeddings. It is built with four goals in mind.

  • Store embeddings durably and with high availability
  • Allow for approximate nearest neighbor operations
  • Enable other operations like partitioning, sub-indices, and averaging
  • Manage versioning, access control, and rollbacks painlessly



Features

  • Supported Operations: Run approximate nearest neighbor lookups, average multiple embeddings, partition tables (spaces), cache locally while training, and more.
  • Storage: Store and index billions of vector embeddings in our storage layer.
  • Versioning: Create, manage, and rollback different versions of your embeddings.
  • Access Control: Encode different business logic and user management directly into Embeddinghub.
  • Monitoring: Keep track of how embeddings are being used, latency, throughput, and feature drift over time.
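
As a rough illustration of what "averaging multiple embeddings" means, here is a minimal plain-Python sketch. It does not use the Embeddinghub API; the function name `average_embeddings` and the example vectors are assumptions made for this illustration only.

```python
from typing import List, Sequence


def average_embeddings(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dims = len(vectors[0])
    if any(len(v) != dims for v in vectors):
        raise ValueError("all vectors must have the same number of dimensions")
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical example: a profile vector built from two item embeddings.
user_profile = average_embeddings([[1, 0, 0], [1, 1, 0]])
print(user_profile)
```

An averaged vector like this can then be stored or queried like any other embedding.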

What is an Embedding?

Embeddings are dense numerical representations of real-world objects and relationships, expressed as vectors. The vector space quantifies the semantic similarity between categories: embedding vectors that are close to each other are considered similar. Sometimes they are used directly, for example in a “Similar items to this” section of an e-commerce store. Other times, embeddings are passed to other models. In those cases, the model can share learnings across similar items rather than treating them as two completely unique categories, as is the case with one-hot encodings. For this reason, embeddings can be used to accurately represent sparse data like clickstreams, text, and e-commerce purchases as features to downstream models.
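
To make "close to each other" concrete, the following sketch scores toy vectors with cosine similarity, one common closeness measure. The vectors and the choice of cosine similarity are assumptions for illustration; the text above does not say which metric Embeddinghub uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings, chosen by hand for this example.
apple = [1, 0, 0]
orange = [1, 1, 0]
chicken = [-1, -1, 0]

fruit_score = cosine_similarity(apple, orange)
other_score = cosine_similarity(apple, chicken)

# The two fruits point in a similar direction, so they score higher.
print(fruit_score > other_score)
```

A one-hot encoding would give every pair of distinct items the same score, which is why it cannot express that two items are related.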

Further Reading



Getting Started

Step 1: Install Embeddinghub client

Install the Python SDK via pip

pip install embeddinghub

Step 2: Deploy Docker container (optional)

The Embeddinghub client can be used without a server. This is useful when using embeddings in a research environment where a database server is not necessary. If that’s the case for you, skip ahead to the next step.

Otherwise, we can use this docker command to run Embeddinghub locally and to map the container's main port to our host's port.

docker run -p 7462:7462 featureformcom/embeddinghub

Step 3: Initialize Python Client

If you deployed a Docker container, you can initialize the Python client with the default config.

import embeddinghub as eh

hub = eh.connect(eh.Config())

Otherwise, you can use a LocalConfig to store and index embeddings locally.

hub = eh.connect(eh.LocalConfig("data/"))

Step 4: Create a Space

Embeddings are written to and retrieved from Spaces. When creating a Space we can also specify a version; otherwise a default version is used.

space = hub.create_space("quickstart", dims=3)

Step 5: Upload Embeddings

We will create a dictionary of four embeddings and upload them to our new quickstart space.

embeddings = {
    "apple": [1, 0, 0],
    "orange": [1, 1, 0],
    "potato": [0, 1, 0],
    "chicken": [-1, -1, 0],
}
space.multiset(embeddings)

Step 6: Get nearest neighbors

Now we can compare apples to oranges and get the nearest neighbors.

neighbors = space.nearest_neighbors(key="apple", num=2)
print(neighbors)

Contributing


Report Issues

Please help us by reporting any issues you may have while using Embeddinghub.


License

Comments
  • [Bug]: Inability to do time-based join, and potentially to name time column different than 'ts'

    Expected Behavior

    Showing some numerical values from features and labels in the console.

    (Documented in detail in this repo, which contains a full bug reproduction setup: https://github.com/samuell/bugs/tree/main/20220831-ff-join-bug#readme)

    Actual Behavior

    Got this stack trace:

    $ python train.py 
    Traceback (most recent call last):
      File "/home/sal/proj/sav/2022/bug-reproductions/20220831-ff-join-bug/train.py", line 5, in <module>
        train_data = client.training_set("traindata", "default")
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/featureform/serving.py", line 49, in training_set
        return self._local_training_set(name, version)
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/featureform/serving.py", line 112, in _local_training_set
        trainingset_df = pd.merge_asof(trainingset_df, df.sort_values(['ts']), direction='backward',
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
        return func(*args, **kwargs)
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 6259, in sort_values
        k = self._get_label_or_level_values(by, axis=axis)
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/pandas/core/generic.py", line 1779, in _get_label_or_level_values
        raise KeyError(key)
    KeyError: 'ts'
    

    After renaming all occurrences of "ts" to "time" in data.csv and defs.py, and running

    rm -rf .featureform && sleep 1 && featureform apply --local defs.py
    python train.py
    

    .. I instead got this:

    $ python train.py 
    Traceback (most recent call last):
      File "/home/sal/proj/sav/2022/bug-reproductions/20220831-ff-join-bug/train.py", line 5, in <module>
        train_data = client.training_set("traindata", "default")
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/featureform/serving.py", line 49, in training_set
        return self._local_training_set(name, version)
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/featureform/serving.py", line 112, in _local_training_set
        trainingset_df = pd.merge_asof(trainingset_df, df.sort_values(['ts']), direction='backward',
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 580, in merge_asof
        op = _AsOfMerge(
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1740, in __init__
        _OrderedMerge.__init__(
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1623, in __init__
        _MergeOperation.__init__(
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 681, in __init__
        self._validate_specification()
      File "/home/sal/.cache/pypoetry/virtualenvs/20220831-ff-join-bug-JAwT4QYU-py3.9/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1809, in _validate_specification
        raise MergeError(
    pandas.errors.MergeError: Incompatible merge dtype, dtype('O') and dtype('O'), both sides must have numeric dtype
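
A note on the second traceback: dtype('O') on both sides suggests the time column is still held as strings, while pandas.merge_asof needs an ordered datetime-like, integer, or float key. The sketch below shows the idea of a backward as-of join in plain Python, parsing the timestamps first. It is not Featureform's implementation; the helper names and the sample rows are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def parse(ts):
    """Turn a 'YYYY-MM-DD HH:MM:SS' string into a comparable datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def asof_backward(label_rows, feature_rows):
    """Attach to each label the latest feature value at or before its time."""
    features = sorted((parse(t), value) for t, value in feature_rows)
    times = [t for t, _ in features]
    joined = []
    for ts, label in sorted((parse(t), label) for t, label in label_rows):
        # Index of the first feature strictly after ts; step back one for "as of".
        i = bisect_right(times, ts)
        value = features[i - 1][1] if i else None
        joined.append((ts, label, value))
    return joined


# Hypothetical sample rows: (time, value) pairs.
feature_rows = [("2022-08-25 00:00:01", 0.000), ("2022-08-25 00:00:03", 0.002)]
label_rows = [("2022-08-25 00:00:02", 1), ("2022-08-25 00:00:03", 0)]

joined = asof_backward(label_rows, feature_rows)
print([(label, value) for _, label, value in joined])
```

In pandas, the analogous step would be converting the time column with pd.to_datetime on both frames before calling merge_asof.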
    

    Steps To Reproduce

    1. Make sure to have poetry installed (or install featureform 1.2.0 in another way).
    2. Run the following commands:
      git clone https://github.com/samuell/bugs.git
      cd bugs/20220831-ff-join-bug
      poetry install
      poetry shell
      featureform apply --local defs.py
      python train.py
      

    What mode are you running Featureform in?

    Local

    What version of Python are you running?

    3.9

    Featureform Python Package Version

    1.2.0

    Featureform Helm Chart Version

    No response

    Kubernetes Version

    No response

    Relevant log output

    No response

    bug 
    opened by samuell 8
  • Add logging and remove unused functions

    Description

    This PR adds logging to spark.go, covering the essential operations of the Spark offline store. It also removes unused functions.

    The logging is needed to debug the store and to show its progress as it runs; removing the unused functions cleans up the code and adds clarity.

    Type of change

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [x] I have performed a self-review of my code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [x] My changes generate no new warnings
    • [x] I have added tests that prove my fix is effective or that my feature works
    • [x] New and existing unit tests pass locally with my changes
    • [x] I have fixed any merge conflicts
    opened by Sami1309 3
  • Separate offline and online feature registration

    If a feature is registered without a provider, it defaults to being registered in the offline store. With an online provider (the only kind that is valid here), it is first registered offline if that has not happened yet, and then registered online.

    Also, for some reason the pb file was committed and then I deleted it, so the diff shows it as a big delete, even though the file is generated during testing anyway.

    opened by Sami1309 3
  • Adding end to end tests for SQL & DF KCF transformations

    Description

    Adding end to end tests for Kubernetes Computed Features. They test both SQL and DF transformations.

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [x] I have performed a self-review of my code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [x] I have made corresponding changes to the documentation
    • [x] My changes generate no new warnings
    • [x] I have added tests that prove my fix is effective or that my feature works
    • [x] New and existing unit tests pass locally with my changes
    • [x] I have fixed any merge conflicts
    opened by ahmadnazeri 2
  • Adding Pandas SQL Transformation for Local

    Description

    This adds a Pandas SQL transformation for local mode. The dataframe code already exists, but it targets Spark; I will modify it to match what is needed for pandas.

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [X] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [X] I have performed a self-review of my code
    • [X] I have commented my code, particularly in hard-to-understand areas
    • [X] I have made corresponding changes to the documentation
    • [X] My changes generate no new warnings
    • [X] I have added tests that prove my fix is effective or that my feature works
    • [X] New and existing unit tests pass locally with my changes
    • [X] I have fixed any merge conflicts
    opened by ahmadnazeri 2
  • [Feature] Docker Container Quickstart

    Description

    • Enables Featureform to run in a single container.
    • Contains a Quickstart guide using the single container Featureform

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [x] Documentation update

    Checklist:

    • [x] I have performed a self-review of my code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [x] I have made corresponding changes to the documentation
    • [x] My changes generate no new warnings
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [x] New and existing unit tests pass locally with my changes
    • [ ] I have fixed any merge conflicts
    opened by sdreyer 2
  • Featureform in a single container

    Description

    This pull request packages Featureform into a single container using nginx and supervisord. The goal is to decrease the time it takes to run Featureform locally.

    Type of change

    Does this correspond to an open issue?

    No

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [x] I have performed a self-review of my code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] My changes generate no new warnings
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] I have fixed any merge conflicts
    opened by sdreyer 2
  • Unimplement coalesce from spark script and make necessary changes to spark runner

    Description

    Removes the .coalesce option called when saving resources in the Spark script. Resources are now saved in multiple parts instead of being merged into a single part. This requires changing how iterators parse through files, as well as the functional definition of a resource in the offline store: a resource is now defined as the parent directory containing all the parts, not a single parquet file.

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [x] I have performed a self-review of my code
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [x] My changes generate no new warnings
    • [x] I have added tests that prove my fix is effective or that my feature works
    • [x] New and existing unit tests pass locally with my changes
    • [x] I have fixed any merge conflicts
    opened by Sami1309 2
  • [Bug]: 500 Server Error on some pages in the dashboard

    Expected Behavior

    Expect to see a normal page saying for example "No entities registered" or similar.

    Actual Behavior

    Some pages instead return a "500 Server Error" in some situations, such as:

    • http://127.0.0.1:3000/entities
    • http://127.0.0.1:3000/users

    Example screenshot from the first of the URLs above:

    image

    Steps To Reproduce

    Add the following code to definitions.py:

    import featureform as ff
    
    ff.register_user("myself").make_default_owner()
    
    local = ff.register_local()
    
    dummydata = local.register_file(
        name="dummydata",
        variant="default",
        description="This is a dummy dataset in CSV format",
        path="data.csv",
    )
    
    person = ff.register_entity("person")
    
    dummydata.register_resources(
        entity="person",
        entity_column="person_id",
        inference_store=local,
        features=[
            {
                "name": "foo",
                "variant": "default",
                "column": "foo",
                "type": "float32",
                "description": "The Foo feature",
            },
            {
                "name": "bar",
                "variant": "default",
                "column": "bar",
                "type": "float32",
                "description": "The Bar feature",
            },
        ],
        timestamp_column="time",
    )
    

    Run:

    featureform apply --local definitions.py
    featureform dash
    

    Open in a browser: http://127.0.0.1:3000/entities

    Open in a browser: http://127.0.0.1:3000/users

    What mode are you running Featureform in?

    Local

    What version of Python are you running?

    3.9

    Featureform Python Package Version

    1.1.12

    Featureform Helm Chart Version

    No response

    Kubernetes Version

    No response

    Relevant log output

    When accessing http://127.0.0.1:3000/entities 
    
    
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /entities HTTP/1.1" 200 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/css/85a2addfd2efc882.css HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/chunks/webpack-b5a50f2710bf3333.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/chunks/framework-3412d1150754b2fb.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/chunks/main-2715d0c23f47c019.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/chunks/pages/_app-b217558d3c27b2cc.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/chunks/pages/%5Btype%5D-3c144c056366a40b.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/jgzCj9eheZJ2ZWZELvsjO/_buildManifest.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/jgzCj9eheZJ2ZWZELvsjO/_ssgManifest.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /_next/static/media/Matter-Regular.f1ae4ce5.ttf HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:39] "GET /static/FeatureForm_Logo_Full_Black.svg HTTP/1.1" 304 -
    [2022-09-11 00:06:40,120] ERROR in app: Exception on /data/entities [GET]
    Traceback (most recent call last):
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask/app.py", line 2525, in wsgi_app
        response = self.full_dispatch_request()
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask/app.py", line 1822, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
        return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask_cors/decorator.py", line 128, in wrapped_function
        resp = make_response(f(*args, **kwargs))
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/featureform/dashboard_metadata.py", line 331, in GetMetadataList
        allData.append(entities(row))
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/featureform/dashboard_metadata.py", line 243, in entities
        label_list = sqlObject.query_resource( "label_variant", "entity", rowData['name'])
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/featureform/sqlite_metadata.py", line 197, in query_resource
        raise ValueError(f"{type} with {column}: {resource} not found")
    ValueError: label_variant with entity: person not found
    127.0.0.1 - - [11/Sep/2022 00:06:40] "GET /data/entities HTTP/1.1" 500 -
    127.0.0.1 - - [11/Sep/2022 00:06:40] "GET /_next/static/chunks/pages/index-c2ee5b1681e97e4b.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:06:40] "GET /static/favicon.ico HTTP/1.1" 304 -
    

    When accessing http://127.0.0.1:3000/users

    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /users HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/css/85a2addfd2efc882.css HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/chunks/webpack-b5a50f2710bf3333.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/chunks/pages/_app-b217558d3c27b2cc.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/chunks/framework-3412d1150754b2fb.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/chunks/main-2715d0c23f47c019.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/chunks/pages/%5Btype%5D-3c144c056366a40b.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/jgzCj9eheZJ2ZWZELvsjO/_buildManifest.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/jgzCj9eheZJ2ZWZELvsjO/_ssgManifest.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /static/FeatureForm_Logo_Full_Black.svg HTTP/1.1" 304 -
    [2022-09-11 00:07:27,451] ERROR in app: Exception on /data/users [GET]
    Traceback (most recent call last):
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask/app.py", line 2525, in wsgi_app
        response = self.full_dispatch_request()
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask/app.py", line 1822, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
        return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/flask_cors/decorator.py", line 128, in wrapped_function
        resp = make_response(f(*args, **kwargs))
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/featureform/dashboard_metadata.py", line 335, in GetMetadataList
        allData.append(users(row))
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/featureform/dashboard_metadata.py", line 275, in users
        variant_organiser(label_variant(sqlObject.query_resource( "label_variant", "owner", rowData['name']))[2]),
      File "/home/sal/.cache/pypoetry/virtualenvs/03-ff-feature-descriptions-6H_O3yPY-py3.9/lib/python3.9/site-packages/featureform/sqlite_metadata.py", line 197, in query_resource
        raise ValueError(f"{type} with {column}: {resource} not found")
    ValueError: label_variant with owner: default_user not found
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /data/users HTTP/1.1" 500 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /_next/static/chunks/pages/index-c2ee5b1681e97e4b.js HTTP/1.1" 304 -
    127.0.0.1 - - [11/Sep/2022 00:07:27] "GET /static/favicon.ico HTTP/1.1" 304 -
    
    bug 
    opened by samuell 2
  • [Bug]: Can not show information about transformation in the UI

    Expected Behavior

    I expect to see some metadata information about the transformation when clicking on it.

    Actual Behavior

    I only see the three animated dots, as if data were still loading:

    image

    Steps To Reproduce

    Add this code to a file named definitions.py:

    import featureform as ff
    
    ff.register_user("myself").make_default_owner()
    
    local = ff.register_local()
    
    dummydata = local.register_file(
        name="dummydata",
        variant="default",
        description="",
        path="data.csv",
    )
    
    person = ff.register_entity("person")
    
    dummydata.register_resources(
        entity="person",
        entity_column="person_id",
        inference_store=local,
        features=[
            {
                "name": "foo",
                "variant": "default",
                "column": "foo",
                "type": "float32",
            },
            {
                "name": "bar",
                "variant": "default",
                "column": "bar",
                "type": "float32",
            },
        ],
        timestamp_column="time",
    )
    
    
    @local.df_transformation(variant="default", inputs=[("dummydata", "default")])
    def compute_fooplusbar(df):
        df["fooplusbar"] = df["foo"] + df["bar"]
        return df
    
    
    compute_fooplusbar.register_resources(
        entity=person,
        entity_column="person_id",
        inference_store=local,
        features=[
            {
                "name": "fooplusbar",
                "variant": "default",
                "column": "fooplusbar",
                "type": "float32",
            },
        ],
        labels=[
            {
                "name": "fooplusbar",
                "variant": "default",
                "column": "fooplusbar",
                "type": "float32",
            },
        ],
        timestamp_column="time",
    )
    

    Run:

    featureform apply --local definitions.py
    featureform dash
    

    Enter the URL http://127.0.0.1:3000/sources/compute_fooplusbar

    ... or alternatively, open http://127.0.0.1:3000, click "features", and then click "compute_fooplusbar" on the "source" row in the metadata table.

    What mode are you running Featureform in?

    Local

    What version of Python are you running?

    3.9

    Featureform Python Package Version

    1.1.12

    bug 
    opened by samuell 2
  • [Bug]: Querying for features based on transformations with entity name throws errors

    Summary

    I get errors when I try to query features that are registered on a df_transformation. It seems the entity name does not work there. If I use the entity column name instead, I no longer get errors and actually get some output, although it is not correct. See below, and also this folder in a separate repo for all the code needed to reproduce this.

    Expected Behavior

    Should get this output:

    -------Foo-------
    0.002
    -------Bar-------
    0.004
    -------FooPlusBar-------
    0.006
    

    Actual Behavior

    I get this output:

    -------Foo-------
    0.002
    -------Bar-------
    0.004
    -------FooPlusBar-------
    Traceback (most recent call last):
      File "/home/sal/proj/sav/2022/bug-reproductions/02-ff-entity-match/client.py", line 15, in <module>
        fooplusbar = client.features([("fooplusbar", "default")], {"person": "samuel"})
      File "/home/sal/.cache/pypoetry/virtualenvs/20220903-ff-entity-match-9koWNEX7-py3.9/lib/python3.9/site-packages/featureform/serving.py", line 104, in features
        return self.impl.features(features, entities)
      File "/home/sal/.cache/pypoetry/virtualenvs/20220903-ff-entity-match-9koWNEX7-py3.9/lib/python3.9/site-packages/featureform/serving.py", line 304, in features
        all_features_list = self.add_feature_dfs_to_list(feature_variant_list, entity_id)
      File "/home/sal/.cache/pypoetry/virtualenvs/20220903-ff-entity-match-9koWNEX7-py3.9/lib/python3.9/site-packages/featureform/serving.py", line 319, in add_feature_dfs_to_list
        raise ValueError(
    ValueError: Could not set entity column. No column name person exists in compute_fooplusbar-default
    

    Steps To Reproduce

    The code to reproduce this is available in this repo, but adding the reproduction info here as well:

    Save this data to file named data.csv:

    time,foo,bar,person_id
    2022-08-25 00:00:01,0.000,0.001,samuel
    2022-08-25 00:00:02,0.001,0.002,samuel
    2022-08-25 00:00:03,0.002,0.004,samuel
    

    Put this in defs.py:

    import featureform as ff
    
    ff.register_user("myself").make_default_owner()
    
    local = ff.register_local()
    
    dummydata = local.register_file(
        name="dummydata",
        variant="default",
        description="",
        path="data.csv",
    )
    
    person = ff.register_entity("person")
    
    dummydata.register_resources(
        entity="person",
        entity_column="person_id",
        inference_store=local,
        features=[
            {
                "name": "foo",
                "variant": "default",
                "column": "foo",
                "type": "float32",
            },
            {
                "name": "bar",
                "variant": "default",
                "column": "bar",
                "type": "float32",
            },
        ],
        timestamp_column="time",
    )
    
    @local.df_transformation(variant="default", inputs=[("dummydata", "default")])
    def compute_fooplusbar(df):
        df["fooplusbar"] = df["foo"] + df["bar"]
        return df
    
    compute_fooplusbar.register_resources(
        entity=person,
        entity_column="person_id",
        inference_store=local,
        features=[
            {
                "name": "fooplusbar",
                "variant": "default",
                "column": "fooplusbar",
                "type": "float32",
            },
        ],
        labels=[
            {
                "name": "fooplusbar",
                "variant": "default",
                "column": "fooplusbar",
                "type": "float32",
            },
        ],
        timestamp_column="time",
    )
    

    Put this in client.py:

    import featureform as ff
    
    client = ff.ServingClient(local=True)
    
    print("-"*7 + "Foo" + "-"*7)
    foo = client.features([("foo", "default")], {"person": "samuel"})
    print(f"{foo[0]:.3f}")
    
    print("-"*7 + "Bar" + "-"*7)
    bar = client.features([("bar", "default")], {"person": "samuel"})
    print(f"{bar[0]:.3f}")
    
    print("-"*7 + "FooPlusBar" + "-"*7)
    fooplusbar = client.features([("fooplusbar", "default")], {"person": "samuel"})
    print(f"{fooplusbar[0]:.3f}")
    

    Run:

    featureform apply --local defs.py
    python client.py
    

    What mode are you running Featureform in?

    Local

    What version of Python are you running?

    3.9

    Featureform Python Package Version

    1.1.12

    Other info

    Consider using the entity column name (person_id) instead of the entity name (person) for the feature based on a transformation, that is, putting this code into client_works_but_gives_wrong_value.py:

    import featureform as ff
    
    client = ff.ServingClient(local=True)
    
    print("-"*7 + "Foo" + "-"*7)
    foo = client.features([("foo", "default")], {"person": "samuel"})
    print(f"{foo[0]:.3f}")
    
    print("-"*7 + "Bar" + "-"*7)
    bar = client.features([("bar", "default")], {"person": "samuel"})
    print(f"{bar[0]:.3f}")
    
    print("-"*7 + "FooPlusBar" + "-"*7)
    # *** NOTE below that we use "person_id" (the entity COLUMN NAME) instead of "person" (the entity NAME): ***
    fooplusbar = client.features([("fooplusbar", "default")], {"person_id": "samuel"})
    print(f"{fooplusbar[0]:.3f}")
    

    Executing it with python produces output, but the FooPlusBar value is not correct:

    Expected output:

    -------Foo-------
    0.002
    -------Bar-------
    0.004
    -------FooPlusBar-------
    0.006
    

    Actual output:

    -------Foo-------
    0.002
    -------Bar-------
    0.004
    -------FooPlusBar-------
    0.001
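The expected FooPlusBar value can be checked independently of Featureform. The sketch below recomputes it in plain Python, assuming the served value comes from the row with the latest timestamp for the entity (the report does not confirm this). The function name `latest_fooplusbar` and the sample row are hypothetical; samuel's foo and bar values are the ones printed above.

```python
from datetime import datetime

# Hypothetical row mirroring the columns registered in defs.py.
rows = [
    {"person_id": "samuel", "foo": 0.002, "bar": 0.004, "time": "2022-01-01T00:00:00"},
]


def latest_fooplusbar(rows, entity_id):
    """Return foo + bar from the most recent row for the given entity."""
    matching = [row for row in rows if row["person_id"] == entity_id]
    if not matching:
        raise KeyError(entity_id)
    newest = max(matching, key=lambda row: datetime.fromisoformat(row["time"]))
    return newest["foo"] + newest["bar"]


print(f"{latest_fooplusbar(rows, 'samuel'):.3f}")  # prints 0.006
```

This matches the expected output of 0.006 rather than the 0.001 that the client returns.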
    
    bug 
    opened by samuell 2
  • adding backend changes for EMR and fixing some linting issues

    Description

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [X] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [ ] I have performed a self-review of my code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] My changes generate no new warnings
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] I have fixed any merge conflicts
    opened by ahmadnazeri 1
  • Additional tests for KCF custom image

    Description

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [ ] I have performed a self-review of my code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] My changes generate no new warnings
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] I have fixed any merge conflicts
    opened by sdreyer 1
  • Moved filestore to its own package

    Description

    Split k8s.go into multiple files. The end goal is to have a Filestore package

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [ ] I have performed a self-review of my code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] My changes generate no new warnings
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] I have fixed any merge conflicts
    opened by sdreyer 1
  • Added latest tag to docker deployment

    Description

    Adds latest tag to docker image deployments so specific images don't need to be found

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [ ] I have performed a self-review of my code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] My changes generate no new warnings
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] I have fixed any merge conflicts
    opened by sdreyer 1
  • Bump decode-uri-component from 0.2.0 to 0.2.2 in /dashboard

    Bumps decode-uri-component from 0.2.0 to 0.2.2.

    Release notes

    Sourced from decode-uri-component's releases.

    v0.2.2

    • Prevent overwriting previously decoded tokens 980e0bf

    https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.1...v0.2.2

    v0.2.1

    • Switch to GitHub workflows 76abc93
    • Fix issue where decode throws - fixes #6 746ca5d
    • Update license (#1) 486d7e2
    • Tidelift tasks a650457
    • Meta tweaks 66e1c28

    https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.1


    dependencies javascript 
    opened by dependabot[bot] 1
  • Implement get_status() RFC

    Description

    Type of change

    Does this correspond to an open issue?

    Select type(s) of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] Documentation update

    Checklist:

    • [ ] I have performed a self-review of my code
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [ ] My changes generate no new warnings
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] New and existing unit tests pass locally with my changes
    • [ ] I have fixed any merge conflicts
    opened by Sami1309 1
Releases (v0.4.3)
  • v0.4.3 (Dec 22, 2022)

    What's Changed

    Bugfixes

    • Bugfix for blank Feature and Training Set Pages

    Full Changelog: https://github.com/featureform/featureform/compare/v0.4.2...v0.4.3

  • v0.4.2 (Dec 22, 2022)

    What's Changed

    Features

    • KCF Custom Image Per Transformation

    Bugfixes

    • Fixed Naming and Typos In k8s.go
    • Removed Excessive Logging In Materialize Jobs
    • Fixed K8s Logging Errors

    Full Changelog: https://github.com/featureform/featureform/compare/v0.4.1...v0.4.2

  • v0.4.1 (Dec 7, 2022)

    • Custom KCF Docker Image Support (https://github.com/featureform/featureform/pull/563)

    Full Changelog: https://github.com/featureform/featureform/compare/v0.4.0...v0.4.1

  • v0.4.1-rc0 (Dec 6, 2022)

  • v0.4.0 (Nov 23, 2022)

    What's Changed

    • Databricks support by @Sami1309 in https://github.com/featureform/featureform/pull/526
    • MongoDB support by @Sami1309 in https://github.com/featureform/featureform/pull/526
    • Lag Features for KCF by @ahmadnazeri in https://github.com/featureform/featureform/pull/520
    • Lag Features for Localmode by @ahmadnazeri in https://github.com/featureform/featureform/pull/535
    • Logging Included In Helm Chart by @sdreyer in https://github.com/featureform/featureform/pull/529
    • Status functions for resources by @sdreyer in https://github.com/featureform/featureform/pull/551

    Bugfixes

    • Empty Parquet Files Panicking by @sdreyer in https://github.com/featureform/featureform/pull/519
    • Coordinator restarts when ETCD token expires by @sdreyer in https://github.com/featureform/featureform/pull/534
    • Documentation Updates by @sdreyer in https://github.com/featureform/featureform/pull/542
    • Fix for Invalid Redis Reads by @sdreyer in https://github.com/featureform/featureform/pull/549
    • Fix for Invalid SQL Transformation Description by @sdreyer in https://github.com/featureform/featureform/pull/550
    • Fix outdated pydoc by @simba-git in https://github.com/featureform/featureform/pull/548

    Full Changelog: https://github.com/featureform/featureform/compare/v0.3.0...v0.4.0

  • v0.3.0 (Oct 31, 2022)

    What's Changed

    • Fix error where dashboard doesn't show error messages by @Sami1309 in https://github.com/featureform/featureform/pull/431
    • Ability to use a URL in featureform apply by @sdreyer in https://github.com/featureform/featureform/pull/456
    • Unimplement coalesce from spark script by @Sami1309 in https://github.com/featureform/featureform/pull/454
    • Docker Container Quickstart Docs by @sdreyer in https://github.com/featureform/featureform/pull/460
    • Add wait condition for training set's dependent features and label by @Sami1309 in https://github.com/featureform/featureform/pull/461
    • Featureform in a single container by @sdreyer in https://github.com/featureform/featureform/pull/457
    • Docker Container Quickstart by @sdreyer in https://github.com/featureform/featureform/pull/464
    • Make offline tests parallel by @Sami1309 in https://github.com/featureform/featureform/pull/429
    • Change newserving -> serving by @sdreyer in https://github.com/featureform/featureform/pull/466
    • Demo Notebook Cluster Reset by @sdreyer in https://github.com/featureform/featureform/pull/462
    • Changed Job Search Error To Info by @sdreyer in https://github.com/featureform/featureform/pull/467
    • Remove error for fetch with no return by @Sami1309 in https://github.com/featureform/featureform/pull/469
    • Adding end to end tests for Spark SQL and DF. by @ahmadnazeri in https://github.com/featureform/featureform/pull/470
    • Add implementation for training sets and tests by @Sami1309 in https://github.com/featureform/featureform/pull/468
    • Feature and label description registration by @Sami1309 in https://github.com/featureform/featureform/pull/458
    • Adding pre-commit functionality by @ahmadnazeri in https://github.com/featureform/featureform/pull/292
    • Feature/include label timestamp for training set by @imanthorpe in https://github.com/featureform/featureform/pull/481
    • Fixed entity used in serving in localmode quickstart by @sdreyer in https://github.com/featureform/featureform/pull/500
    • Optional Nginx Ingress Install by @sdreyer in https://github.com/featureform/featureform/pull/501
    • Added honor to robot exclusions for featureform.com by @sdreyer in https://github.com/featureform/featureform/pull/514
    • Feature/k8s computed features by @Sami1309 in https://github.com/featureform/featureform/pull/511

    New Contributors

    • @imanthorpe made their first contribution in https://github.com/featureform/featureform/pull/481

    Full Changelog: https://github.com/featureform/featureform/compare/v0.2.0...v0.3.0

Owner
Featureform
We turn features into a first-class component of the ML process!