SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker.

Overview

SageMaker Python SDK

With the SDK, you can train and deploy models using popular deep learning frameworks such as Apache MXNet and TensorFlow. You can also train and deploy models with Amazon algorithms, which are scalable implementations of core machine learning algorithms that are optimized for SageMaker and GPU training. If you have your own algorithms built into SageMaker-compatible Docker containers, you can train and host models using these as well.

For detailed documentation, including the API reference, see Read the Docs.

Table of Contents

  1. Installing SageMaker Python SDK
  2. Using the SageMaker Python SDK
  3. Using MXNet
  4. Using TensorFlow
  5. Using Chainer
  6. Using PyTorch
  7. Using Scikit-learn
  8. Using XGBoost
  9. SageMaker Reinforcement Learning Estimators
  10. SageMaker SparkML Serving
  11. Amazon SageMaker Built-in Algorithm Estimators
  12. Using SageMaker AlgorithmEstimators
  13. Consuming SageMaker Model Packages
  14. BYO Docker Containers with SageMaker Estimators
  15. SageMaker Automatic Model Tuning
  16. SageMaker Batch Transform
  17. Secure Training and Inference with VPC
  18. BYO Model
  19. Inference Pipelines
  20. Amazon SageMaker Operators in Apache Airflow
  21. SageMaker Autopilot
  22. Model Monitoring
  23. SageMaker Debugger
  24. SageMaker Processing

Installing the SageMaker Python SDK

The SageMaker Python SDK is published to PyPI and can be installed with pip as follows:

pip install sagemaker

You can install from source by cloning this repository and running a pip install command in the root directory of the repository:

git clone https://github.com/aws/sagemaker-python-sdk.git
cd sagemaker-python-sdk
pip install .
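After either install path, it can be handy to confirm which version ended up in the active environment. The following is a minimal sketch that uses only the standard library; it assumes Python 3.8 or later (where importlib.metadata is available), and the helper name installed_version is made up for this example.

```python
from importlib import metadata  # standard library in Python 3.8+
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("sagemaker")
    if version is None:
        print("sagemaker is not installed in this environment")
    else:
        print(f"sagemaker {version} is installed")
```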

Supported Operating Systems

SageMaker Python SDK supports Unix/Linux and Mac.

Supported Python Versions

SageMaker Python SDK is tested on:

  • Python 3.6
  • Python 3.7
  • Python 3.8

AWS Permissions

As a managed service, Amazon SageMaker performs operations on your behalf on the AWS hardware that is managed by Amazon SageMaker. Amazon SageMaker can perform only operations that the user permits. You can read more about which permissions are necessary in the AWS Documentation.

The SageMaker Python SDK should not require any additional permissions aside from what is required for using SageMaker. However, if you are using an IAM role with a path in it, you should grant permission for iam:GetRole.
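To tell whether a given role falls into the "IAM role with a path" case, you can inspect its ARN. The sketch below is illustrative only: role_has_path is a hypothetical helper, not part of the SDK, and the ARNs in the usage note use placeholder account IDs and role names.

```python
def role_has_path(role_arn: str) -> bool:
    """Return True if an IAM role ARN contains a path before the role name."""
    # ARN layout: arn:partition:service:region:account-id:resource
    fields = role_arn.split(":", 5)
    if len(fields) != 6:
        raise ValueError(f"not a valid ARN: {role_arn!r}")
    # For roles, the resource looks like "role/<optional/path/>RoleName".
    parts = fields[5].split("/")
    return parts[0] == "role" and len(parts) > 2
```

For example, role_has_path("arn:aws:iam::123456789012:role/service-role/MyRole") is True, while the same call for "arn:aws:iam::123456789012:role/MyRole" is False. Only the first kind of role would need the extra iam:GetRole permission described above.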

Licensing

SageMaker Python SDK is licensed under the Apache 2.0 License. It is copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/

Running tests

SageMaker Python SDK has unit tests and integration tests.

You can install the libraries needed to run the tests by running pip install --upgrade .[test] or, for Zsh users: pip install --upgrade .\[test\]

Unit tests

We run unit tests with tox, a tool that runs unit tests against multiple Python versions and also checks that the code follows our style guidelines. We run tox with all of our supported Python versions, so to run the unit tests with the same configuration that we use, you need interpreters for those Python versions installed.

To run the unit tests with tox, run:

tox tests/unit

Integration tests

To run the integration tests, the following prerequisites must be met:

  1. AWS account credentials are available in the environment for the boto3 client to use.
  2. The AWS account has an IAM role named SageMakerRole. It should have the AmazonSageMakerFullAccess policy attached as well as a policy with the necessary permissions to use Elastic Inference.

We recommend selectively running just those integration tests you'd like to run. You can filter by individual test function names with:

tox -- -k 'test_i_care_about'

You can also run all of the integration tests in sequence with the following command, which may take a while:

tox -- tests/integ

You can also run them in parallel:

tox -- -n auto tests/integ

Git Hooks

To enable all git hooks in the .githooks directory, run these commands in the repository directory:

find .git/hooks -type l -exec rm {} \;
find .githooks -type f -exec ln -sf ../../{} .git/hooks/ \;

To enable an individual git hook, simply move it from the .githooks/ directory to the .git/hooks/ directory.
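On systems where the find and ln commands above are not convenient, roughly the same effect can be had from Python. This is a sketch under stated assumptions: link_git_hooks is a hypothetical helper, it only looks at the top level of .githooks and .git/hooks (the find commands recurse), and it needs a platform that supports symbolic links.

```python
import os
from pathlib import Path


def link_git_hooks(repo_root: str) -> list:
    """Symlink every file in .githooks into .git/hooks; return the linked names."""
    root = Path(repo_root)
    source_dir = root / ".githooks"
    hooks_dir = root / ".git" / "hooks"

    # Like the first find command: drop symlinks left over from an earlier run.
    for entry in hooks_dir.iterdir():
        if entry.is_symlink():
            entry.unlink()

    linked = []
    for hook in sorted(source_dir.iterdir()):
        if not hook.is_file():
            continue
        link = hooks_dir / hook.name
        if link.exists() or link.is_symlink():
            link.unlink()  # same effect as the -f flag in `ln -sf`
        # Relative target, resolved from inside .git/hooks: ../../.githooks/<name>
        os.symlink(os.path.join("..", "..", ".githooks", hook.name), link)
        linked.append(hook.name)
    return linked
```

Calling link_git_hooks(".") from the repository root returns the names of the hooks it linked.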

Building Sphinx docs

Set up a Python environment, and install the dependencies listed in doc/requirements.txt:

# conda
conda create -n sagemaker python=3.7
conda activate sagemaker
conda install sphinx=3.1.1 sphinx_rtd_theme=0.5.0

# pip
pip install -r doc/requirements.txt

Clone/fork the repo, and install your local version:

pip install --upgrade .

Then cd into the sagemaker-python-sdk/doc directory and run:

make html

You can edit the templates for any of the pages in the docs by editing the .rst files in the doc directory and then running make html again.

Preview the site with a Python web server:

cd _build/html
python -m http.server 8000

View the website by visiting http://localhost:8000

SageMaker SparkML Serving

With SageMaker SparkML Serving, you can now perform predictions against a SparkML model in SageMaker. To host a SparkML model in SageMaker, it must be serialized with the MLeap library.

For more information on MLeap, see https://github.com/combust/mleap .

Supported major version of Spark: 2.4 (MLeap version - 0.9.6)

Here is an example of how to create an instance of the SparkMLModel class and use its deploy() method to create an endpoint that can perform predictions against your trained SparkML model.

from sagemaker.sparkml.model import SparkMLModel

# schema is a JSON string describing the model's input and output columns
sparkml_model = SparkMLModel(model_data='s3://path/to/model.tar.gz', env={'SAGEMAKER_SPARKML_SCHEMA': schema})
model_name = 'sparkml-model'
endpoint_name = 'sparkml-endpoint'
predictor = sparkml_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)

Once the model is deployed, we can invoke the endpoint with a CSV payload like this:

payload = 'field_1,field_2,field_3,field_4,field_5'
predictor.predict(payload)

For more information about the different content-type and Accept formats as well as the structure of the schema that SageMaker SparkML Serving recognizes, please see SageMaker SparkML Serving Container.
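The schema string and the CSV payload used above can both be assembled with the standard library. The sketch below is an assumption-laden illustration: the "input"/"output" layout and the "double" type are what we understand the serving container to expect, so check them against the container documentation before relying on them. The helper names and column names are invented for this example.

```python
import csv
import io
import json


def build_schema(input_columns, output_name):
    """Build a schema JSON string from (name, type) pairs for the input columns.

    The key names used here are an assumption; verify them against the
    SageMaker SparkML Serving Container documentation.
    """
    return json.dumps({
        "input": [{"name": name, "type": kind} for name, kind in input_columns],
        "output": {"name": output_name, "type": "double"},
    })


def to_csv_row(values):
    """Serialize one record as a single CSV line, quoting fields when needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow(values)
    return buffer.getvalue()


schema = build_schema([("age", "double"), ("city", "string")], "prediction")
payload = to_csv_row([42.0, "Seattle, WA"])
```

Using the csv module instead of joining with commas keeps fields that contain commas (such as "Seattle, WA") intact.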

Comments
  • feature: Adding serial inference pipeline support to RegisterModel Step

    Issue #, if available: #2291, #2014, #2485

    Description of changes: This change enables multiple models to be packed together and registered as a single model package version, so that the models could be run as an inference pipeline.

    • Updated the inputs for RegisterModelStep to also take a model object or a PipelineModel for Serial Inference pipeline
    • Performs repack in a loop for containers in a serial inference pipeline, if required.
    • Ability to invoke RegisterModelStep without an Estimator.
    • The environment variables are auto-updated by the framework classes into the model package

    Testing done: test_register_model_sip - Able to add more than 1 model during registration of model package.

    Merge Checklist

    Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

    General

    • [X] I have read the CONTRIBUTING doc
    • [X] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [X] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [X] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by sreedes 198
  • feature: processors that support multiple Python files, requirements.txt, and dependencies.

    Issue #, if available: #1248, #2117

    Description of changes: Propose processing classes that have feature parity with estimators. These classes allow SDK users to run a Python job that consists of multiple Python scripts, a requirements.txt file, and additional dependencies.

    Documentation provided as docstrings.

    Testing done: on my own AWS account, ran processing jobs using the proposed classes (FrameworkProcessor and its subclasses). The testing scripts are located here, and usage is as outlined in #1248 (this comment).

    Merge Checklist

    General

    • [x] I have read the CONTRIBUTING doc
    • [x] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [x] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [x] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by verdimrc 175
  • feature: Inferentia Neuron support for HuggingFace

    Issue #, if available:

    Description of changes:

    Added the necessary changes to incorporate the inf-neuron/hf image into the huggingface framework

    Testing done:

    • Locally tested the feature to extract image string with the following code, where the output matched with the released DLC image
    import sagemaker, boto3
    sagemaker.image_uris.retrieve(framework="huggingface",
        region=boto3.Session().region_name,
        version="4.12.3",py_version="py37",
        base_framework_version="pytorch1.9.1",
        inference_tool="neuron",
        image_scope="inference")
    '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuron:1.9.1-transformers4.12.3-neuron-py37-sdk1.17.1-ubuntu18.04'
    
    
    from sagemaker.huggingface import HuggingFaceModel
    import sagemaker
    import boto3
    sess = sagemaker.Session()
    
    role = sagemaker.get_execution_role()
    
    huggingface_model = HuggingFaceModel(
        model_data='s3://sagemaker-us-west-2-267274314323/hf-sagemaker-inf/model.tar.gz',
        transformers_version='4.12',
        pytorch_version='1.9',
        py_version='py37',
        role=role,
        sagemaker_session=sess
    )
    
    huggingface_model._is_compiled_model = True
    huggingface_model.prepare_container_def("ml.inf.xlarge")
    
    {'Image': '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuron:1.9.1-transformers4.12.3-neuron-py37-sdk1.17.1-ubuntu18.04', 'Environment': {'SAGEMAKER_PROGRAM': '', 'SAGEMAKER_SUBMIT_DIRECTORY': '', 'SAGEMAKER_CONTAINER_LOG_LEVEL': '20', 'SAGEMAKER_REGION': 'us-west-2'}, 'ModelDataUrl': 's3://hf-sagemaker-inference/inferentia/model.tar.gz'}
    
    
    import sagemaker, boto3
    from sagemaker.huggingface import HuggingFaceModel
    sess = sagemaker.Session()
    role = sagemaker.get_execution_role()
    huggingface_model = HuggingFaceModel(
        model_data='s3://sagemaker-us-west-2-267274314323/hf-sagemaker-inf/model.tar.gz',                
        transformers_version='4.12.3',
        pytorch_version='1.9.1',
        py_version='py37',
        role=role, 
        sagemaker_session=sess)
    huggingface_model._is_compiled_model = True
    predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.inf1.xlarge")
    huggingface_model.image_uri
    
    
    '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuron:1.9.1-transformers4.12.3-neuron-py37-sdk1.17.1-ubuntu18.04'
    
    
    • Tested the changes in this PR locally, where it passed all the tests (attached the local output below)
    JT:sagemaker-python-sdk jeniyat$ ./.githooks/pre-push
    GLOB sdist-make: /Users/jeniyat/Desktop/HuggingFace/source_repo/sagemaker-python-sdk/setup.py
    ✔ OK black-check in 10.19 seconds
    ✔ OK twine in 12.616 seconds
    ✔ OK pylint in 22.618 seconds
    ✔ OK docstyle in 24.769 seconds
    ✔ OK flake8 in 41.238 seconds
    __________________________________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________________________________
      flake8: commands succeeded
      pylint: commands succeeded
      docstyle: commands succeeded
      black-check: commands succeeded
      twine: commands succeeded
      congratulations :)
    =================== flake8,pylint,docstyle,black-check,twine execution time ===================
    44 seconds
    
    GLOB sdist-make: /Users/jeniyat/Desktop/HuggingFace/source_repo/sagemaker-python-sdk/setup.py
    ✔ OK doc8 in 7.752 seconds
    ✔ OK sphinx in 4 minutes, 57.083 seconds
    __________________________________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________________________________
      sphinx: commands succeeded
      doc8: commands succeeded
      congratulations :)
    =================== sphinx,doc8 execution time ===================
    4 minutes and 59 seconds
    

    Merge Checklist

    General

    • [x] I have read the CONTRIBUTING doc
    • [x] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
    • [x] I used the commit message format described in CONTRIBUTING
    • [x] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [x] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [x] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
    • [x] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [x] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    HuggingFace 
    opened by jeniyat 136
  • feature: Add ModelStep for SageMaker Model Building Pipeline

    Description of changes: feature: Add ModelStep for SageMaker Model Building Pipeline

    Testing done: unit tests and integ tests

    Merge Checklist

    General

    • [X] I have read the CONTRIBUTING doc
    • [X] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
    • [X] I used the commit message format described in CONTRIBUTING
    • [X] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [X] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [X] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
    • [X] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [X] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by qidewenwhen 113
  • change: add type annotations for Lineage

    Issue #, if available: https://sim.amazon.com/issues/AML-96242

    Description of changes: Added Type Annotations for .py files under lineage directory

    Testing done:

    tox -e py39 -- tests/unit/sagemaker/lineage

    collected 45 items

    tests/unit/sagemaker/lineage/test_action.py ........... [ 24%]
    tests/unit/sagemaker/lineage/test_artifact.py ............ [ 51%]
    tests/unit/sagemaker/lineage/test_association.py ....... [ 66%]
    tests/unit/sagemaker/lineage/test_context.py ........... [ 91%]
    tests/unit/sagemaker/lineage/test_dataset_artifact.py . [ 93%]
    tests/unit/sagemaker/lineage/test_endpoint_context.py . [ 95%]
    tests/unit/sagemaker/lineage/test_model_artifact.py . [ 97%]
    tests/unit/sagemaker/lineage/test_visualizer.py . [100%]

    -> tox -e py39 -- tests/integ/sagemaker/lineage

    collected 28 items

    tests/integ/sagemaker/lineage/test_action.py ....... [ 25%]
    tests/integ/sagemaker/lineage/test_artifact.py ........ [ 53%]
    tests/integ/sagemaker/lineage/test_association.py sss [ 64%]
    tests/integ/sagemaker/lineage/test_context.py ....... [ 89%]
    tests/integ/sagemaker/lineage/test_dataset_artifact.py . [ 92%]
    tests/integ/sagemaker/lineage/test_endpoint_context.py . [ 96%]
    tests/integ/sagemaker/lineage/test_model_artifact.py . [100%]

    Merge Checklist

    General

    • [ ] I have read the CONTRIBUTING doc
    • [ ] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [ ] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [ ] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by stisac 106
  • change: add data wrangler image uri

    Issue #, if available: N/A

    Description of changes:

    Adding SageMaker Data Wrangler image URL configs.

    Testing done: tox

    Merge Checklist

    General

    • [x] I have read the CONTRIBUTING doc
    • [x] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [ ] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by chenliu0831 103
  • feature: add HuggingFace model and predictor

    Issue #, if available:

    Description of changes: Created model.py for HF. Updated image URIs and integ test.

    Testing done: unit tests

    Merge Checklist

    General

    • [x] I have read the CONTRIBUTING doc
    • [x] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [x] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by ahsan-z-khan 102
  • fix: jumpstart amt tracking

    Issue #, if available:

    Description of changes: Hyperparameter Tuning jobs launched with JumpStart artifacts (scripts, model) will be tagged with the artifact uris and the base name will also be modified to "sagemaker-jumpstart".

    Testing done:

    Merge Checklist

    General

    • [ ] I have read the CONTRIBUTING doc
    • [ ] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
    • [ ] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [ ] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [ ] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
    • [ ] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by evakravi 100
  • feature: Support for remote docker host

    Issue #, if available:

    Description of changes: This PR adds support for a remote Docker host when using the SageMaker Python SDK in local mode by replacing the hardcoded value "localhost" with sagemaker.utils.local.get_docker_host(). This allows the use of a remote Docker server.
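The idea of replacing a hardcoded "localhost" with a lookup can be sketched as follows. This is not the SDK's get_docker_host() implementation: resolve_docker_host is a hypothetical stand-in, and it assumes the remote daemon is advertised through the DOCKER_HOST environment variable as a tcp:// URL.

```python
import os
from urllib.parse import urlparse


def resolve_docker_host(environ=None) -> str:
    """Return the host where containers are reachable, defaulting to localhost."""
    environ = os.environ if environ is None else environ
    endpoint = environ.get("DOCKER_HOST", "")
    parsed = urlparse(endpoint)
    # Only a tcp:// endpoint names a remote machine in this sketch; unix
    # sockets, other schemes, and an unset variable fall back to localhost.
    if parsed.scheme == "tcp" and parsed.hostname:
        return parsed.hostname
    return "localhost"
```

A caller would then build its serving URL from resolve_docker_host() instead of the literal string "localhost".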

    Testing done:

    Merge Checklist

    General

    • [x] I have read the CONTRIBUTING doc
    • [x] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
    • [x] I used the commit message format described in CONTRIBUTING
    • [x] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [x] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [x] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
    • [x] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [x] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by awssamdwar 100
  • Lzao/debugger rule removal

    Issue #, if available: This is a draft.

    Description of changes:

    This change adds a new field DisableProfiler in ProfilerConfig. It also disables the default profiler rules. The changes can be merged after the service package changes are deployed.

    Testing done: All the relevant unit/integration tests are updated accordingly to reflect the changes.

    Merge Checklist

    General

    • [x] I have read the CONTRIBUTING doc
    • [ ] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
    • [x] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [ ] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
    • [ ] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    do-not-merge 
    opened by zaoliu-aws 99
  • fix: update `sagemaker.serverless` integration test

    Issue #, if available:

    Description of changes: "CODEBUILD_BUILD_ID" not in os.environ was evaluating to True during builds, so the test was being skipped.

    Made two fixes:

    • Stored cat image in S3 (it was previously hosted at some third-party website)
    • Changed the condition for skipping the test

    Note that this branch is based on the branch bveeramani:rename-delete-endpoint. See #2529.

    Testing done: Integration test.

    Merge Checklist

    General

    • [x] I have read the CONTRIBUTING doc
    • [x] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [ ] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [x] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    bug serverless 
    opened by bveeramani 92
  • ParameterString should be able to be an empty string

    Describe the feature you'd like: sagemaker.workflow.parameters.ParameterString should accept an empty string as its default value and as an input. In SageMaker local mode we can pass an empty string as a parameter to the script processor, so it is unclear why this works in local mode but raises an error in the cloud.

    Describe alternatives you've considered: We pass the string "(empty)" to represent empty values, which seems redundant.
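The "(empty)" workaround mentioned above amounts to a small encode/decode pair around the pipeline parameter. A minimal sketch follows; the function names are hypothetical, and it assumes the literal string "(empty)" never occurs as a real value.

```python
EMPTY_SENTINEL = "(empty)"  # placeholder taken from the workaround described above


def encode_parameter(value: str) -> str:
    """Replace an empty string with the sentinel before passing it to the pipeline."""
    return EMPTY_SENTINEL if value == "" else value


def decode_parameter(value: str) -> str:
    """Turn the sentinel back into an empty string inside the processing script."""
    return "" if value == EMPTY_SENTINEL else value
```

The caller encodes the value when starting the pipeline and the script decodes it on arrival, which is the redundancy the issue asks to remove.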

    opened by idanmoradarthas 0
  • Lambda boto3 wrappers doesn't have all the option that we need to create a lambda

    Describe the feature you'd like: sagemaker.lambda_helper.Lambda should expose more of the options that the aws lambda create-function CLI command offers. The missing options that we need are:

    • environment
    • vpc-config
    • architectures

    How would this feature be used? Please describe: The class sagemaker.lambda_helper.Lambda would gain the missing attributes.

    Describe alternatives you've considered: Today we create the Lambda function as part of the automation that upserts our pipeline object in CodePipeline, and the pipeline has a step for that same Lambda function so that it is executed as part of the pipeline.

    Additional context:
    https://docs.aws.amazon.com/cli/latest/reference/lambda/create-function.html
    https://sagemaker.readthedocs.io/en/stable/api/utility/lambda_helper.html
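Until the helper class gains these options, one alternative is to call the Lambda API through boto3 directly, which accepts all three. The sketch below only assembles the keyword arguments; build_create_function_kwargs is a hypothetical helper, the default runtime is an arbitrary choice, and every argument value you pass is your own.

```python
def build_create_function_kwargs(*, name, role_arn, handler, zip_bytes,
                                 runtime="python3.9", environment=None,
                                 subnet_ids=None, security_group_ids=None,
                                 architectures=None):
    """Assemble keyword arguments for the boto3 Lambda create_function call."""
    kwargs = {
        "FunctionName": name,
        "Runtime": runtime,
        "Role": role_arn,
        "Handler": handler,
        "Code": {"ZipFile": zip_bytes},
    }
    if environment:
        kwargs["Environment"] = {"Variables": dict(environment)}
    if subnet_ids or security_group_ids:
        kwargs["VpcConfig"] = {
            "SubnetIds": list(subnet_ids or []),
            "SecurityGroupIds": list(security_group_ids or []),
        }
    if architectures:
        kwargs["Architectures"] = list(architectures)  # e.g. ["arm64"]
    return kwargs

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("lambda").create_function(**kwargs)
```

Keeping the argument assembly separate from the API call makes it easy to inspect the request before sending it.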

    opened by idanmoradarthas 0
  • I have lost all my jobs / runs / trials since last week

    Hello,

    I am not sure if this is related to a recent update, but as of last week there was an update in the UI and now I can't find the tab with "Experiments / trial names / jobs". There's only an "Experiment" tab and it says "no Jobs" even though there were a lot of jobs before.

    Also, when I try to read the docs there is nothing that looks like what I was using before. Here are images of the script I was using to launch my jobs.

    Can anyone explain what is happening?

    [Images 1 to 4: screenshots of the launch script]

    bug 
    opened by arminvburren 0
  • ScriptProcessor does not check local_code config before uploading code to S3

    Describe the bug When a LocalSession or LocalPipelineSession is configured to use local code, as follows

    session.config = {'local': {'local_code': True}}
    

    the code passed to a pipeline ProcessingStep or directly to the run method of a processor (ScriptProcessor, FrameworkProcessor, ...) should not be uploaded to S3.

    However, ScriptProcessor does not honor this. Its _include_code_in_inputs method (which is called unconditionally by the _normalize_args of the base class Processor, which in turn is called both when running directly and through a pipeline) unconditionally tries to upload the code to S3. https://github.com/aws/sagemaker-python-sdk/blob/554952eac259979dc714a1a9002653ced342b876/src/sagemaker/processing.py#L625

    Compare this to the Model class, used for example in the TrainingStep. Its _upload_code method checks the session configuration and does not upload to S3 when local code is enabled. https://github.com/aws/sagemaker-python-sdk/blob/554952eac259979dc714a1a9002653ced342b876/src/sagemaker/model.py#L532
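
    The kind of check being asked for can be sketched as follows. This is a simplified, self-contained illustration rather than the SDK's code: get_config_value here is a local stand-in for a dotted-path config lookup, and should_upload_code is a hypothetical name.

```python
def get_config_value(key_path, config):
    """Walk a dotted key path through a nested dict; return None if any part is missing."""
    current = config
    for key in key_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def should_upload_code(session_config, is_local_session):
    """Return False when a local session is configured to use local code."""
    local_code = get_config_value("local.local_code", session_config)
    return not (is_local_session and bool(local_code))


if __name__ == "__main__":
    config = {"local": {"local_code": True}}
    print(should_upload_code(config, is_local_session=True))   # local code: skip the S3 upload
    print(should_upload_code(config, is_local_session=False))  # cloud session: upload as usual
```

    A processor that honored the setting would consult such a check before attempting the S3 upload.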

    To reproduce In the absence of any AWS credentials (which should not be needed when running completely locally), the following code will fail to upload the processing.py script to S3 (botocore.exceptions.NoCredentialsError). Note that, in addition to the following code, a processing.py file must exist in the working directory (but its contents don't matter).

    Code
    import boto3
    import sagemaker
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.pipeline_context import LocalPipelineSession
    from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor
    from sagemaker.workflow.steps import ProcessingStep
    
    role = 'arn:aws:iam::123456789012:role/MyRole'
    
    local_pipeline_session = LocalPipelineSession(boto_session = boto3.Session(region_name = 'eu-west-1'))
    local_pipeline_session.config = {'local': {'local_code': True}}
    
    script_processor = ScriptProcessor(
        image_uri = 'docker.io/library/python:3.8',
        command = ['python'],
        instance_type = 'local',
        instance_count = 1,
        sagemaker_session = local_pipeline_session,
        role = role,
    )
    
    processing_step = ProcessingStep(
        name = 'Processing Step',
        processor = script_processor,
        code = 'processing.py',
        inputs = [
            ProcessingInput(
                source = './input-data',
                destination = '/opt/ml/processing/input',
            )
        ],
        outputs = [
            ProcessingOutput(
                source = '/opt/ml/processing/output',
                destination = './output-data',
            )
        ],
    )
    
    pipeline = Pipeline(
        name = 'MyPipeline',
        steps = [processing_step],
        sagemaker_session = local_pipeline_session
    )
    
    pipeline.upsert(role_arn = role)
    
    pipeline_run = pipeline.start()
    

    System information A description of your system. Please provide:

    • SageMaker Python SDK version: 2.126.0
    bug 
    opened by lodo1995 0
  • Fixed hashing problem for frameworkprocessors with identical source d…

    Describe the bug When a pipeline has more than one FrameworkProcessor with an identical source directory and identical dependencies but different entry points, some of these FrameworkProcessors use the wrong entry point. They all use the same entry point: that of the processing job (using a FrameworkProcessor) that was created last during pipeline creation.

    To reproduce Create a pipeline with multiple steps that use a FrameworkProcessor. Make sure these processors use the same source directory and the same dependencies (at least those dependencies that feed into the hash used for the S3 URI; they are specified in src/workflow/utilities.py in the get_code_hash method) but different entry points. All of the processing jobs whose FrameworkProcessors have these nearly identical arguments will then generate the same S3 URI for uploading the source directory (sourcedir.tar.gz) and the run script (runproc). Because each upload overwrites the previous one, all but the last of these processing jobs end up with the wrong entry point. The source directory is also overwritten every time, but since the copies are identical this causes no problem.

    Expected behavior That each of the processing jobs starts executing the entry point specified in the processor.

    System information A description of your system. Please provide:

    • SageMaker Python SDK version: 2.125.0
    • Framework name (eg. PyTorch) or algorithm (eg. KMeans): TensorFlow
    • Framework version: 2.9
    • Python version: 3.9
    • CPU or GPU: CPU
    • Custom Docker image (Y/N): N

    Additional context I have written a fix for my problem which involves including the code in the hash that generates the s3 uri to ensure that the uris are different. After testing my pipeline with the fix, the problem no longer occurred.

    Description of changes: I have added an if statement that checks whether code is defined. If it is, the code is used to generate the hash alongside the already existing inputs.
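
    The idea behind the fix can be illustrated with a small, self-contained sketch. It is not the SDK's get_code_hash: the function s3_key_hash is hypothetical and, for brevity, hashes path names rather than file contents. It only shows why adding the entry point to the hash separates otherwise identical processors.

```python
import hashlib
from typing import Iterable, Optional


def s3_key_hash(
    source_dir: str,
    dependencies: Iterable[str],
    entry_point: Optional[str] = None,
) -> str:
    """Hash the inputs that determine the upload location (simplified: names only)."""
    digest = hashlib.sha256()
    digest.update(source_dir.encode())
    for dep in sorted(dependencies):
        digest.update(b"\0" + dep.encode())
    # Including the entry point keeps processors with the same source
    # directory and dependencies from sharing one upload location.
    if entry_point is not None:
        digest.update(b"\0entry:" + entry_point.encode())
    return digest.hexdigest()


if __name__ == "__main__":
    deps = ["requirements.txt"]
    # Without the entry point, both steps map to the same hash.
    print(s3_key_hash("src", deps) == s3_key_hash("src", deps))
    # With the entry point, the hashes differ.
    print(s3_key_hash("src", deps, "train.py") == s3_key_hash("src", deps, "evaluate.py"))
```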

    Testing done: Tested using a pipeline that could not execute entirely as a result of the described issue, after the fix it was able to execute.

    Merge Checklist

    Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

    General

    • [ ] I have read the CONTRIBUTING doc
    • [ ] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
    • [ ] I used the commit message format described in CONTRIBUTING
    • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
    • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

    Tests

    • [ ] I have added tests that prove my fix is effective or that my feature works (if appropriate)
    • [ ] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
    • [ ] I have checked that my tests are not configured for a specific region or account (if appropriate)
    • [ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by SeppeHannen 0
  • TransformStep transforms files it should not

    Describe the bug When using the output of a processing step as input to a transformation step (step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri), the transformation step transforms files from directories at the same level as the input directory (test_small_sample_with_info). It should not.

    To reproduce

    [...]
    step_process_args = pyspark_processor.run(
        submit_app="source/preprocess.py",
        submit_py_files=["source/preprocess_utils.py",
                         "source/spark_utils.py"],
        outputs=[
            ProcessingOutput(
                output_name="train",
                source="/opt/ml/processing/output/train",
                destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "train"]),
            ),
            ProcessingOutput(
                output_name="validation",
                source="/opt/ml/processing/output/validation",
                destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "validation"]),
            ),
            ProcessingOutput(
                output_name="test",
                source="/opt/ml/processing/output/test",
                destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "test"]),
            ),
            ProcessingOutput(
                output_name="test_small_sample",
                source="/opt/ml/processing/output/test_small_sample",
                destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "test_small_sample"]),
            ),
            ProcessingOutput(
                output_name="train_with_info",
                source="/opt/ml/processing/output/train_with_info",
                destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "train_with_info"]),
            ),
            ProcessingOutput(
                output_name="validation_with_info",
                source="/opt/ml/processing/output/validation_with_info",
                destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "validation_with_info"]),
            ),
            ProcessingOutput(
                output_name="test_with_info",
                source="/opt/ml/processing/output/test_with_info",
                destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "test_with_info"]),
            ),
            ProcessingOutput(
                output_name="test_small_sample_with_info",
                source="/opt/ml/processing/output/test_small_sample_with_info",
                destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "test_small_sample_with_info"]),
            ),
        ],
        arguments=[
            "--aws_account",
            aws_account,
            "--aws_env",
            aws_env,
            "--project_name",
            project_name,
            "--mode",
            "training",
        ],
    )
    
    step_process = ProcessingStep(
        name="PySparkPreprocessing", step_args=step_process_args,
    
    [...]
    
    transformer = Transformer(
        model_name=model_step.properties.ModelName,
        instance_count=transformer_instance_count,
        instance_type=transformer_instance_type,
        strategy="MultiRecord",
        assemble_with="Line",
        output_path=s3_training_pipeline_transform_output_path,
        accept="text/csv",
        max_concurrent_transforms=max_concurrent_transforms,
        max_payload=max_payload,
        sagemaker_session=pipeline_session,
        base_transform_job_name=[MASKED],
    )
    
    step_transform = TransformStep(
        name=[MASKED],
        transformer=transformer,
        inputs=TransformInput(
            data=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            content_type="text/csv",
            split_type="Line",
            input_filter="$[1:]",
        ),
        depends_on=[model_step],
        cache_config=cache_config,
    )
    

    Expected behavior Transform only the csv file in the test dir.

    Screenshots or logs image

    test_small_sample_with_info should not be transformed.

    System information A description of your system. Please provide:

    • SageMaker Python SDK version: 2.125.0
    • Framework name (eg. PyTorch) or algorithm (eg. KMeans):
    • Framework version:
    • Python version: 3.9.12
    • CPU or GPU: CPU
    • Custom Docker image (Y/N): Y

    Additional context A workaround is to specify the URI of the csv file directly, like this:

    inputs=TransformInput(
        data=Join(
            on="/",
            values=[step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri, "data.csv"],
        ),
        [...]

    But this is not the approach proposed in this example: sagemaker-pipeline-model-monitor-clarify-steps
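
    One plausible explanation, which the report itself does not confirm, is that the input URI is treated as a plain S3 key prefix, so that a prefix ending in "test" also matches sibling keys that merely start with "test". The sketch below reproduces that string-prefix behavior with made-up keys; keys_under_prefix is a hypothetical helper and no AWS call is involved.

```python
def keys_under_prefix(keys, prefix):
    """Return the keys a plain string-prefix match would select."""
    return [key for key in keys if key.startswith(prefix)]


# Made-up object keys mirroring the output layout in the report.
keys = [
    "preprocess/test/data.csv",
    "preprocess/test_with_info/data.csv",
    "preprocess/test_small_sample_with_info/data.csv",
    "preprocess/train/data.csv",
]

if __name__ == "__main__":
    # A prefix without a trailing slash also matches the sibling directories.
    print(keys_under_prefix(keys, "preprocess/test"))
    # A more specific prefix selects only the intended file.
    print(keys_under_prefix(keys, "preprocess/test/"))
```

    Under that assumption, pointing the input at the file itself, as in the workaround above, narrows the match to a single object.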

    bug 
    opened by HarryPommier 0
Releases(v2.126.0)
  • v2.126.0(Dec 22, 2022)

    Features

    • AutoGluon 0.6.1 image_uris

    Bug Fixes and Other Changes

    • Fix broken link in doc
    • Do not specify S3 path for disabled profiler

    Documentation Changes

    • fix the incorrect property reference
  • v2.125.0(Dec 19, 2022)

    Features

    • add RandomSeed to support reproducible HPO

    Bug Fixes and Other Changes

    • Correct SageMaker Clarify API docstrings by changing JSONPath to JMESPath
  • v2.124.0(Dec 16, 2022)

    Features

    • Doc update for TableFormatEnum
    • Add p4de to smddp supported instance types
    • Add disable_profiler field in config and propagate changes
    • Added doc update for dataset builder

    Bug Fixes and Other Changes

    • Use Async Inference Config when available for endpoint update

    Documentation Changes

    • smdistributed libraries release notes
  • v2.123.0(Dec 15, 2022)

  • v2.122.0(Dec 14, 2022)

    Features

    • Feature Store dataset builder, delete_record, get_record, list_feature_group
    • Add OSU region to frameworks for DLC

    Bug Fixes and Other Changes

    • the Hyperband support fix for the HPO
    • unpin packaging version
    • Remove content type image/jpg from analysis configuration schema
  • v2.121.2(Dec 12, 2022)

    Bug Fixes and Other Changes

    • Update for Tensorflow Serving 2.11 inference DLCs
    • Revert "fix: type hint of PySparkProcessor init"
    • Skip Bad Transform Test
  • v2.121.1(Dec 9, 2022)

  • v2.121.0(Dec 8, 2022)

    Features

    • Algorithms Region Expansion OSU/DXB

    Bug Fixes and Other Changes

    • FrameworkProcessor S3 uploads
    • Add constraints file for apache-airflow
  • v2.120.0(Dec 7, 2022)

    Features

    • Add Neo image uri config for Pytorch 1.12
    • Adding support for SageMaker Training Compiler in PyTorch estimator starting 1.12
    • Update registries with new region account number mappings.
    • Add DXB region to frameworks by DLC

    Bug Fixes and Other Changes

    • support idempotency for framework and spark processors
  • v2.119.0(Dec 3, 2022)

    Features

    • Add Code Owners file
    • Added transform with monitoring pipeline step in transformer
    • Update TF 2.9 and TF 2.10 inference DLCs
    • make estimator accept json file as modelparallel config
    • SageMaker Training Compiler does not support p4de instances
    • Add support for SparkML v3.3

    Bug Fixes and Other Changes

    • Fix bug forcing uploaded tar to be named sourcedir
    • Update local_requirements.txt PyYAML version
    • refactoring : using with statement
    • Allow Py 3.7 for MMS Test Docker env
    • fix PySparkProcessor init params type
    • type hint of PySparkProcessor init
    • Return ARM XGB/SKLearn tags if image_scope is inference_graviton
    • Update scipy to 1.7.3 to support M1 development envs
    • Fixing type hints for Spark processor that has instance type/count params in reverse order
    • Add DeepAR ap-northeast-3 repository.
    • Fix AsyncInferenceConfig documentation typo
    • fix ml_inf to ml_inf1 in Neo multi-version support
    • Fix type annotations
    • add neo mvp region accounts
  • v2.118.0(Dec 1, 2022)

    Features

    • Update boto3 version to 1.26.20
    • support table format option for create feature group.
    • Support Amazon SageMaker Model Cards
    • support monitoring alerts api
    • Support Amazon SageMaker AutoMLStep

    Bug Fixes and Other Changes

    • integration test in anticipation of ProfilerConfig API changes
    • Add more integ test logic for AutoMLStep
    • update get_execution_role_arn to use role from DefaultSpaceSettings
    • bug on AutoMLInput to allow PipelineVariable
    • FinalMetricDataList is missing from the training job search resu…
    • add integration tests for Model Card
    • update AutoMLStep with cache improvement

    Documentation Changes

    • automlstep doc update
  • v2.117.0(Nov 15, 2022)

  • v2.116.0(Oct 28, 2022)

    Features

    • support customized timeout for model data download and inference container startup health check for Hosting Endpoints
    • Trainium Neuron support for PyTorch
    • Pipelines cache keys update
    • Caching Improvements for SM Pipeline Workflows
  • v2.115.0(Oct 27, 2022)

    Features

    • Add support for TF 2.10 training
    • Disable profiler for Trainium instance type
    • support the Hyperband strategy with the StrategyConfig
    • support the GridSearch strategy for hyperparameter optimization

    Bug Fixes and Other Changes

    • Update Graviton supported instance families
  • v2.114.0(Oct 26, 2022)

    Features

    • Graviton support for XGB and SKLearn frameworks
    • Graviton support for PyTorch and Tensorflow frameworks
    • do not expand estimator role when it is pipeline parameter
    • added support for batch transform with model monitoring

    Bug Fixes and Other Changes

    • regex in tuning integs
    • remove debugger environment var set up
    • adjacent slash in s3 key
    • Fix Repack step auto install behavior
    • Add retry for airflow ParsingError

    Documentation Changes

    • doc fix
  • v2.113.0(Oct 21, 2022)

    Features

    • support torch_distributed distribution for Trainium instances

    Bug Fixes and Other Changes

    • bump apache-airflow from 2.4.0 to 2.4.1 in /requirements/extras

    Documentation Changes

    • fix kwargs and descriptions of the smdmp checkpoint function
    • add the doc for the MonitorBatchTransformStep
  • v2.112.2(Oct 11, 2022)

  • v2.112.1(Oct 10, 2022)

    Bug Fixes and Other Changes

    • fix(local-mode): loosen docker requirement to allow 6.0.0
    • CreateModelPackage API error for Scikit-learn and XGBoost frameworks
  • v2.112.0(Oct 9, 2022)

    Features

    • added monitor batch transform step (pipeline)

    Bug Fixes and Other Changes

    • Add PipelineVariable annotation to framework estimators
  • v2.111.0(Oct 5, 2022)

    Features

    • Edit test file for supporting TF 2.10 training

    Bug Fixes and Other Changes

    • support kms key in processor pack local code
    • security issue by bumping apache-airflow from 2.3.4 to 2.4.0
    • instance count retrieval logic
    • Add regex for short-form sagemaker-xgboost tags
    • Upgrade attrs>=20.3.0,<23
    • Add PipelineVariable annotation to Amazon estimators

    Documentation Changes

    • add context for pytorch
  • v2.110.0(Sep 27, 2022)

    Features

    • Support KeepAlivePeriodInSeconds for Training APIs
    • added ANALYSIS_CONFIG_SCHEMA_V1_0 in clarify
    • add model monitor image accounts for ap-southeast-3

    Bug Fixes and Other Changes

    • huggingface release test
    • Fixing the logic to return instanceCount for heterogeneousClusters
    • Disable type hints in doc signature and add PipelineVariable annotations in docstring
    • estimator hyperparameters in script mode

    Documentation Changes

    • Added link to example notebook for Pipelines local mode
  • v2.109.0(Sep 9, 2022)

    Features

    • add search filters

    Bug Fixes and Other Changes

    • local pipeline step argument parsing bug
    • support fail_on_violation flag for check steps
    • fix links per app security scan
    • Add PipelineVariable annotation for all processor subclasses

    Documentation Changes

    • the SageMaker model parallel library 1.11.0 release
  • v2.108.0(Sep 2, 2022)

    Features

    • Adding support in HuggingFace estimator for Training Compiler enhanced PyTorch 1.11

    Bug Fixes and Other Changes

    • add sagemaker clarify image account for cgk region
    • set PYTHONHASHSEED env variable to fixed value to fix intermittent failures in release pipeline
    • trcomp fixtures to override default fixtures for integ tests

    Documentation Changes

    • add more info about volume_size
  • v2.107.0(Aug 29, 2022)

    Features

    • support python 3.10, update airflow dependency

    Bug Fixes and Other Changes

    • Add retry in session.py to check if training is finished

    Documentation Changes

    • remove Other tab in Built-in algorithms section and mi…
  • v2.106.0(Aug 24, 2022)

    Features

    • Implement Kendra Search in RTD website

    Bug Fixes and Other Changes

    • Add primitive_or_expr() back to conditions
    • remove specifying env-vars when creating model from model package
    • Add CGK in config for Spark Image
  • v2.105.0(Aug 19, 2022)

    Features

    • Added endpoint_name to clarify.ModelConfig
    • adding workgroup functionality to athena query

    Bug Fixes and Other Changes

    • disable debugger/profiler in cgk region
    • using unique name for lineage test to unblock PR checks

    Documentation Changes

    • update first-party algorithms and structural updates
  • v2.104.0(Aug 17, 2022)

    Features

    • local mode executor implementation
    • Pipelines local mode setup
    • Add PT 1.12 support
    • added _AnalysisConfigGenerator for clarify

    Bug Fixes and Other Changes

    • yaml safe_load sagemaker config
    • pipelines local mode minor bug fixes
    • add local mode integ tests
    • implement local JsonGet function
    • Add Pipeline annotation in model base class and tensorflow estimator
    • Allow users to customize trial component display names for pipeline launched jobs
    • Update localmode code to decode urllib response as UTF8

    Documentation Changes

    • New content for Pipelines local mode
    • Correct documentation error
  • v2.103.0(Aug 5, 2022)

    Features

    • AutoGluon 0.4.3 and 0.5.2 image_uris

    Bug Fixes and Other Changes

    • Revert "change: add a check to prevent launching a modelparallel job on CPU only instances"
    • Add gpu capability to local
    • Link PyTorch 1.11 to 1.11.0
  • v2.102.0(Aug 4, 2022)

    Features

    • add warnings for xgboost specific rules in debugger rules
    • Add PyTorch DDP distribution support
    • Add test for profiler enablement with debugger_hook false

    Bug Fixes and Other Changes

    • Two letter language code must be supported
    • add a check to prevent launching a modelparallel job on CPU only instances
    • Allow StepCollection added in ConditionStep to be depended on
    • Add PipelineVariable annotation in framework models
    • skip managed spot training mxnet nb

    Documentation Changes

    • smdistributed libraries currency updates
  • v2.101.1(Jul 28, 2022)

    Bug Fixes and Other Changes

    • added more ml frameworks supported by SageMaker Workflows
    • test: Vspecinteg2
    • Add PipelineVariable annotation in amazon models
High performance Python GLMs with all the features!

QuantCo 200 Dec 14, 2022
Scikit learn library models to account for data and concept drift.

liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d

7 Nov 18, 2021
TensorFlow implementation of an arbitrary order Factorization Machine

This is a TensorFlow implementation of an arbitrary order (=2) Factorization Machine based on paper Factorization Machines with libFM. It supports: d

Mikhail Trofimov 785 Dec 21, 2022
ETNA – time series forecasting framework

ETNA Time Series Library Predict your time series the easiest way Homepage | Documentation | Tutorials | Contribution Guide | Release Notes ETNA is an

Tinkoff.AI 675 Jan 08, 2023
Pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code

pandas-method-chaining pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code. It is a fork from pandas-v

Francis 5 May 14, 2022
Cool Python features for machine learning that I used to be too afraid to use. Will be updated as I have more time / learn more.

python-is-cool A gentle guide to the Python features that I didn't know existed or was too afraid to use. This will be updated as I learn more and bec

Chip Huyen 3.3k Jan 05, 2023
K-Means clustering example with Python and Scikit-learn

Unsupervised-Machine-Learning Flat Clustering K-Means clustering example with Python and Scikit-learn Flat clustering Clustering algorithms group a se

Emin 1 Dec 13, 2021
A Python implementation of the Robotics Toolbox for MATLAB

Robotics Toolbox for Python A Python implementation of the Robotics Toolbox for MATLAB® GitHub repository Documentation Wiki (examples and details) Sy

Peter Corke 1.2k Jan 07, 2023
This is a curated list of medical data for machine learning

Medical Data for Machine Learning This is a curated list of medical data for machine learning. This list is provided for informational purposes only,

Andrew L. Beam 5.4k Dec 26, 2022
Coursera Machine Learning - Python code

Coursera Machine Learning This repository contains python implementations of certain exercises from the course by Andrew Ng. For a number of assignmen

Jordi Warmenhoven 859 Dec 10, 2022
A Python toolbox to churn out organic alkalinity calculations with minimal brain engagement.

Organic Alkalinity Sausage Machine A Python toolbox to churn out organic alkalinity calculations with minimal brain engagement. Getting started To mak

Charles Turner 1 Feb 01, 2022
Stacked Generalization (Ensemble Learning)

Stacking (stacked generalization) Overview ikki407/stacking - Simple and useful stacking library, written in Python. User can use models of scikit-lea

Ikki Tanaka 192 Dec 23, 2022
ArviZ is a Python package for exploratory analysis of Bayesian models

ArviZ (pronounced "AR-vees") is a Python package for exploratory analysis of Bayesian models. Includes functions for posterior analysis, data storage, model checking, comparison and diagnostics

ArviZ 1.3k Jan 05, 2023
Data science, Data manipulation and Machine learning package.

duality Data science, Data manipulation and Machine learning package. Use permitted according to the terms of use and conditions set by the attached l

David Kundih 3 Oct 19, 2022
Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

Christoph Molnar 326 Jan 02, 2023
Tangram makes it easy for programmers to train, deploy, and monitor machine learning models.

Tangram Website | Discord Tangram makes it easy for programmers to train, deploy, and monitor machine learning models. Run tangram train to train a mo

Tangram 1.4k Jan 05, 2023
Machine learning algorithms implementation

Machine learning algorithms implementation This repository consists of implementation of various machine learning algorithms. The algorithms implemen

Karun Dawadi 1 Jan 03, 2022
SynapseML - an open source library to simplify the creation of scalable machine learning pipelines

Synapse Machine Learning SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. Sy

Microsoft 3.9k Dec 30, 2022
LILLIE: Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms

LILLIE: Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms Based on the work by Smith et al. (2021) Query

5 Aug 06, 2022
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

Facebook 15.4k Jan 07, 2023