Python client for using Prefect Cloud with Saturn Cloud

Overview

prefect-saturn

GitHub Actions PyPI Version

prefect-saturn is a Python package that makes it easy to run Prefect Cloud flows on a Dask cluster with Saturn Cloud. For a detailed tutorial, see "Fault-Tolerant Data Pipelines with Prefect Cloud ".

Installation

prefect-saturn is available on PyPi.

pip install prefect-saturn

prefect-saturn can be installed directly from GitHub

pip install git+https://github.com/saturncloud/[email protected]

Getting Started

prefect-saturn is intended for use inside a Saturn Cloud environment, such as a Jupyter notebook.

import prefect
from prefect import Flow, task
from prefect_saturn import PrefectCloudIntegration


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("hello prefect-saturn")


flow = Flow("sample-flow", tasks=[hello_task])

project_name = "sample-project"
integration = PrefectCloudIntegration(
    prefect_cloud_project_name=project_name
)
flow = integration.register_flow_with_saturn(flow)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Customize Dask

You can customize the size and behavior of the Dask cluster used to run prefect flows. prefect_saturn.PrefectCloudIntegration.register_flow_with_saturn() accepts to arguments to accomplish this:

For example, the code below tells Saturn that this flow should run on a Dask cluster with 3 xlarge workers, and that prefect should shut down the cluster once the flow run has finished.

flow = integration.register_flow_with_saturn(
    flow=flow,
    dask_cluster_kwargs={
        "n_workers": 3,
        "worker_size": "xlarge",
        "autoclose": True
    }
)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Contributing

See CONTRIBUTING.md for documentation on how to test and contribute to prefect-saturn.

Comments
  • [CU-feu7x7] saturn labels

    [CU-feu7x7] saturn labels

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Automatically add saturn-specific labels to the flow environment.

    How does this PR improve prefect-saturn?

    Along with the related change in saturn, this allows flows to be assigned to the correct cluster agent even if there are agents in multiple clusters that are all using the same prefect-cloud tenant.

    opened by bhperry 7
  • Bump development version

    Bump development version

    • [X] passes make lint
    • [X] adds tests to tests/ (if appropriate)

    What does this PR change?

    • Bumps version to 0.5.1.9000 for development

    How does this PR improve prefect-saturn?

    With this change, users can rely on the behavior that installations from source control will have a newer version than the newest release available on PyPI, but guaranteed to be older than the next release on PyPI.

    opened by dotNomad 6
  • [CU-frwyvw] Set flow instance size

    [CU-frwyvw] Set flow instance size

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Adds instance size argument for flow node

    How does this PR improve prefect-saturn?

    Enables users to select a larger node size for their flow to run on. This means we are not forcing users to farm all of the work out to dask-clusters, they can instead run flows with a local executor and not be constrained by a medium tier node.

    opened by bhperry 4
  • pickle is a bad choice for hashing.

    pickle is a bad choice for hashing.

    Without this PR, modifying the python version could lead to different hashes for flow metadata. This PR hashes identifying information for flows using json, rather than pickle. This PR does change the hashing function. As a result re-registering a flow will create a new flow in Saturn Cloud.

    opened by hhuuggoo 3
  • add expectation that BASE_URL does not end in a slash

    add expectation that BASE_URL does not end in a slash

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Changes PrefectCloudIntegration to expect the environment BASE_URL to NOT end in a trailing slash. In all Saturn production environments, BASE_URL does not end in a trailing slash.

    This PR also bumps the version of prefect-saturn to 0.2.0, since this change in behavior is a breaking change.

    How does this PR improve prefect-saturn?

    Without this fix, prefect-saturn is currently broken in Saturn production environments.

    Why can't we just keep the code that adds or removes slashes as needed?

    This project uses prefect's WebhookStorage (https://docs.prefect.io/orchestration/execution/storage_options.html#webhook). That type of storage uses template strings to reference environment variables, like this:

    flow = Flow(
        "some-flow",
        storage=Webhook(
            build_request_kwargs={
                "url": "${BASE_URL}/api/whatever",
            },
            build_request_http_method="POST",
           ....
        )
    )
    

    There is not place in there where we can introduce Saturn-written code that checks the environment variable BASE_URL and deals with missing or extra trailing slashes. So an assumption about whether or not the environment variable BASE_URL ends in a slash has to be hard-coded into the codebase here.

    opened by jameslamb 2
  • Fix tenant_id attr for Prefect 15

    Fix tenant_id attr for Prefect 15

    • [x] passes make lint
    • [ ] adds tests to tests/ (if appropriate)

    I don't believe any tests need to be added.

    What does this PR change?

    This PR changes the Client()._active_tenant_id > Client().tenant_id (when using Prefect >= 0.15) since _active_tenant_id is no longer a property of Prefect's Client in 0.15.

    How does this PR improve prefect-saturn?

    This allows prefect-saturn to work with prefect in version 0.15 which removed the private attribute.

    opened by dotNomad 1
  • [DEV-1227] Replace pyyaml

    [DEV-1227] Replace pyyaml

    Unlike PyYAML, ruamel.yaml supports:

    • YAML <= 1.2. PyYAML only supports YAML <= 1.1 This is vital, as YAML 1.2 intentionally breaks backward compatibility with YAML 1.1 in several edge cases. This would usually be a bad thing. In this case, this renders YAML 1.2 a strict superset of JSON. Since YAML 1.1 is not a strict superset of JSON, this is a good thing.
    • Roundtrip preservation When calling yaml.dump() to dump a dictionary loaded by a prior call to yaml.load():

    See more details at https://yaml.readthedocs.io/en/latest/pyyaml.html

    opened by wreis 1
  • Switch from Docker storage to Webhook storage

    Switch from Docker storage to Webhook storage

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR replaces Docker storage with Webhook storage.

    This PR also adds a small working example to the README, to show users how it works. A longer-form tutorial will be up on https://www.saturncloud.io/docs/ some time in the next week.

    How does this PR improve prefect-saturn?

    The use of Webhook storage will make the integration between Prefect Cloud and Saturn Cloud much faster and less error-prone. It eliminates several hacks and special cases, should be much quicker, and removes an awkward race condition. With Docker storage, after calling .register_flow_with_saturn() you had to wait for a k8s job that built and pushed the image to complete. That could take up to 10 minutes, and nothing in prefect-saturn or the Saturn UI allowed you to see logs or other progress of that job.

    That job was also very brittle...it required docker-in-docker trickery, patching user-chosen images with several build-time-only dependencies, and running a sequence of multiple gnarly shell scripts.

    With Webhook storage, all of that complexity is eliminated. When you call .register_flow_with_saturn(), the flow is serialized with cloudpickle, sent to Saturn, and stored as bytes in an object store. At run time, when flow.storage.get_flow() is called, Saturn retrieves the binary content of the flow from an object store and sends it back over the wire. That's it! Everything is synchronous, so no weird race conditions, and it's just passing bytes around, so the storage process goes from 10+ minutes to under a second.

    opened by jameslamb 1
  • Add Saturn project details and remove unnecessary stuff

    Add Saturn project details and remove unnecessary stuff

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    • removes fields from saturn_details that are unnecessary as of #8
    • renames saturn_details to storage_details since that now only contains information for building storage
    • bump version floor to Python 3.7 (all the Saturn images are Python 3.7)
    • set ignore_healthchecks=True on the storage object.
      • This avoids a weird error where the image being built with flow.storage.build() doesn't have stuff like the Saturn start script in it, which could cause errors.
      • We don't need to care about that because every job in the flow's lifecycle will have the start script added to it's command / args

    How does this PR improve prefect-saturn?

    This PR removes unnecessary fields that could have become dependencies in users' code, reducing the surface for breaking changes.

    The ignore_healthchecks thing makes it possible for users to rely on the start script to install libraries.

    opened by jameslamb 1
  • change strategy for identifying flows

    change strategy for identifying flows

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR changes the strategy for uniquely identifying flows. Now the flow_hash sent to Saturn Cloud is the sha256 hash of:

    • project name
    • flow name
    • Prefect Cloud tenant id

    This means that flow_hash is equivalent to the Prefect Cloud concept flow_group_id. This PR proposes creating this hash ourselves because Prefect Cloud's flow_group_id can't be known until you've registered a flow with Prefect Cloud, and prefect-saturn needs to register with Saturn first.

    How does this PR improve prefect-saturn?

    This PR provides a reliable way to uniquely identify all versions of the same flow. It improves on the previous model, where tenant id was not considered. The previous model could have caused conflicts in the case where two flows with identical names and code, in Prefect Cloud projects with the same name but in different tenants, would get the same hash and conflict.

    Because this PR no longer considers the task graph in the hash, it also means that the hash will not change as a flow's task graph changes. That means pushing Docker storage to a container registry should be a lot faster, since it'll be more likely to hit the registry's cache.

    Notes for Reviewers

    I had to introduce prefect.client.Client in this PR, and then mock it with patch(). A lot of the diff in the test files is just whitespace, the result of adding in a with patch(....). I recommend reviewing with whitespace changes hidden.

    opened by jameslamb 1
  • Add more metadata to flows

    Add more metadata to flows

    This PR includes a few updates that improve execution and testing for the end-to-end experience with Prefect Cloud.

    What does this PR change?

    This PR adds more details from Saturn to the flow, so the agent running it knows how to configure the first job that loads the flow.

    How does this PR improve prefect-saturn?

    This PR allows flows executed from Prefect Cloud to take advantage of Saturn-y customization features like env secrets, filesystem secrets, and a custom start script.

    opened by jameslamb 1
Releases(v0.6.0)
  • v0.6.0(Nov 4, 2021)

    What's Changed

    • No default dask cluster by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/50
    • Add encoding kwarg to open by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/52

    New Contributors

    • @jsignell made their first contribution in https://github.com/saturncloud/prefect-saturn/pull/50

    Full Changelog: https://github.com/saturncloud/prefect-saturn/compare/v0.5.1...v0.6.0

    Source code(tar.gz)
    Source code(zip)
  • v0.5.1(Jul 15, 2021)

  • v0.5.0(Apr 23, 2021)

    Breaking

    None

    Features

    • replace pyyaml with ruamel-yaml (#28)
    • add support for KubernetesRun "RunConfig", if using this library with prefect >= 0.13.10 (#34, #36, #37)
      • this does not break compatibility with prefect >0.13.0,<=0.13.9

    Bug Fixes

    • fix deprecation warnings from prefect 0.14.x (#32)
      • some modules were reorganized from prefect 0.13.x to 0.14.x, and using the 0.13.x-style paths raises deprecation warnings

    Docs

    • support Python 3.8 (#33)
      • this library was already compatible with Python 3.8, but that is now tested on every build and documented in the package classifiers
    • fix keywords in package metadata (#39)
      • this improves the discoverability of this project on PyPi
    Source code(tar.gz)
    Source code(zip)
  • v0.4.4(Dec 9, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • set an explicit default of autoclose = False for dask_cluster_kwargs (#25)
      • this ensures that, by default, flows registered with prefect-saturn leave their Dask cluster up at the end of execution
      • this avoids the risk of one flow run closing down a Dask cluster that is in use by another flow run
      • this was already prefect-saturn's behavior, but only indirectly because autoclose defaults to False in dask-saturn. Not that is directly the default in prefect-saturn.
    • add tests on describe_sizes() (#24)

    Docs

    • added more docs in the README on how to customize the Dask cluster used by DaskExecutor (#25)
    Source code(tar.gz)
    Source code(zip)
  • v0.4.3(Dec 9, 2020)

    Breaking

    None

    Features

    • You can now set the instance size for the node that runs flow.run() (#23)
      • PrefectCloudIntegration.register_flow_with_saturn() gets a new keyword argument `instance_size
      • use new function describe_sizes() to list the valid options

    Bug Fixes

    None

    Docs

    • added documentation on changing the size of the instance that a flow runs on
    Source code(tar.gz)
    Source code(zip)
  • v0.4.2(Nov 19, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • fix broken installations from source distribution (prefect-saturn-*.tar.gz) (#22)

    Docs

    • package LICENSE file with package artifacts (#22)
    Source code(tar.gz)
    Source code(zip)
  • v0.4.1(Nov 6, 2020)

  • v0.4.0(Oct 21, 2020)

  • v0.3.0(Oct 16, 2020)

  • v0.2.0(Sep 18, 2020)

    Breaking

    • prefect-saturn now expects that the environment variable BASE_URL does not end in a slash (#17)

    Features

    None

    Bug Fixes

    None

    Docs

    None

    Source code(tar.gz)
    Source code(zip)
  • v0.1.1(Aug 31, 2020)

  • v0.1.0(Aug 13, 2020)

    Breaking

    • Moved .add_storage() and .add_environment() internal, and made .register_flow_with_saturn() do more. (#14) Now the interface is just like this:

      integration = PrefectCloudIntegration("some-project")
      flow.register_flow_with_saturn()
      flow.register(project_name="some-project", labels=["saturn-cloud"])
      
    • Replaced Docker storage with Webhook storage (#14)

    • Bumped prefect version floor to 0.13.0 (the first release that had Webhook) (#14)

    Features

    None

    Bug Fixes

    None

    Docs

    • Added a minimal working example in README (#14)
    Source code(tar.gz)
    Source code(zip)
  • v0.0.2(Jul 23, 2020)

    • moved some details of building KubernetesJobEnvironment into Saturn's back-end and out of this library
    • removed unnecessary elements in saturn_details
    • renamed saturn_details to storage_details since it now only contains things needed for building storage
    Source code(tar.gz)
    Source code(zip)
  • v0.0.1(Jul 21, 2020)

Owner
Saturn Cloud
End-to-End Data Science in Python Featuring Dask and Jupyter
Saturn Cloud
Discord bot for playing Werewolf game on League of Legends.

LoLWolf LoL人狼をプレイするときのDiscord用botです。 (Discord bot for playing Werewolf game on League of Legends.) 以下のボタンを押してbotをあなたのDiscordに招待することで誰でも簡単に使用することができます。

Hatsuka 4 Oct 18, 2021
Data from popular CS:GO website hltv.org

Welcome to hltv-data 👋 🎮 Data from popular CS:GO website hltv.org Install pip install hltv-data Usage The public methods can be reached using HLTVCl

Dariusz Choruży 28 Dec 23, 2022
AminoLab Library For AminoApps using aminoapps.com/api

AminoLab AminoLab Api For AminoApps using aminoapps.com/api Installing pip install AminoLab Example #Login import AminoLab client = AminoLab.Client()

10 Sep 26, 2022
Source code from thenewboston Discord Bot with Python tutorial series.

Project Setup Follow the steps below to set up the project on your environment. Local Development Create a virtual environment with Python 3.7 or high

Bucky Roberts 24 Aug 19, 2022
A discord bot written in python

arch-bot A discord bot written in python prefix: . help: .help Installation Requirements A discord bot token Your user id Python installed. For window

3 Jan 10, 2022
🎀 First and most powerfull open source clicktune botter

CTB 🖤 Follow me here: Discord | YouTube | Twitter | Github 🐺 Features: /* *- The first *- Fast *- Proxy support: http/s, socks4/5, premieum (w

Iтѕ_Ѵιcнч#1337 22 Aug 29, 2022
An unofficial library for discord components (under-development)

discord-components An unofficial library for discord components (under-development) Welcome! Discord components are cool, but discord.py will support

11 Jun 14, 2021
Simple Telegram Bot To Get Feedback from users & Some Other Features

FeedbackBot Simple Telegram Bot To Get Feedback from users & Some Other Features. Features Get Feedback from users Reply to user's feedback Customisab

Arun 18 Dec 29, 2022
Using Streamlit to build a simple UI on top of the OpenSea API

OpenSea API Explorer Using Streamlit to build a simple UI on top of the OpenSea API. 🤝 Contributing Contributions, issues and feature requests are we

Gavin Capriola 1 Jan 04, 2022
Generate Heroku-like random names to use in your python applications

HaikunatorPY Generate Heroku-like random names to use in your python applications. Installation pip install haikunator Usage Haikunator is pretty sim

Atrox 116 Nov 15, 2022
A python discord client interaction emulator for the DC29 badge code channel

dc29-discord-signalbot A python discord client interaction emulator for the DC29 badge code channel Prep Open Developer mode Open the developer mode f

8 Aug 23, 2021
Image captioning service for healthcare domains in Vietnamese using VLP

Image captioning service for healthcare domains in Vietnamese using VLP This service is a web service that provides image captioning services for heal

CS-UIT AI Club 2 Nov 04, 2021
Unofficial GoPro API Library for Python - connect to GoPro via WiFi.

GoPro API for Python Unofficial GoPro API Library for Python - connect to GoPro cameras via WiFi. Compatibility: HERO3 HERO3+ HERO4 (including HERO Se

Konrad Iturbe 1.3k Jan 01, 2023
D-Ticket is a discord bot for ticket system

D-Ticket Discord Bot D-Ticket is a discord bot for ticket management system. This is not final product is currently being in development stay connecte

DeViL 1 Jan 06, 2022
A Python interface to AFL, allowing for easy injection of testcases and other functionality.

Fuzzer This module provides a Python wrapper for interacting with AFL (American Fuzzy Lop: http://lcamtuf.coredump.cx/afl/). It supports starting an A

Shellphish 614 Dec 26, 2022
Discord-selfbot - Very basic discord self bot

discord-selfbot Very basic discord self bot still being actively developed requi

nana 4 Apr 07, 2022
Python API Client for Close

Close API A convenient Python wrapper for the Close API. API docs: http://developer.close.com Support: Close 56 Nov 30, 2022

Stream Telegram files to web

Telegram File Stream Bot A Telegram bot to stream files to web Demo Bot » Report a Bug | Request Feature Table of Contents About this Bot Original Rep

Wrench 572 Jan 09, 2023
A visualization of people a user follows on Twitter

Twitter-Map This software allows the user to create maps of Twitter accounts. Installation git clone Oliver Greenwood 12 Jul 20, 2022

Asynchronous Python Wrapper for the Ufile API

Ufile.io Asynchronous Python Wrapper for the Ufile API (Unofficial).

Gautam Kumar 16 Aug 31, 2022