Overview

Hello from magnus

Magnus provides four capabilities for data teams:

  • Compute execution plan: A DAG representation of the work that you want to get done. Individual nodes of the DAG can be simple Python or shell tasks, complex and deeply nested parallel branches, or embedded DAGs themselves.

  • Run log store: A place to store run logs for reporting or for re-running older runs. Along with the status of execution, the run logs capture code identifiers (commits, Docker image digests, etc.), data hashes and configuration settings for reproducibility and audit.

  • Data Catalogs: A way to pass data between nodes of the graph during execution. The catalog also versions the data used by a particular run.

  • Secrets: A framework to provide secrets/credentials at run time to the nodes of the graph.

Design decisions:

  • Easy to extend: All four capabilities are just definitions and can be implemented in many flavors.

    • Compute execution plan: You can run the DAG on your local computer or in containers on your local computer, offload the work to cloud providers, or translate the DAG to AWS Step Functions or Argo Workflows.

    • Run log Store: The run logs can be stored in memory, on the file system, in S3, in a database, etc.

    • Data Catalogs: The data files generated as part of a run can be stored on file systems or S3, or the catalog can be extended to fit your needs.

    • Secrets: The secrets needed for your code to work can come from dotenv or AWS, or the framework can be extended to fit your needs.

  • Pipeline as contract: Once a DAG is defined and proven to work locally or in some other environment, no code change is needed to deploy it to other environments. This lets data teams prove the correctness of the DAG in dev environments while infrastructure teams find a suitable way to deploy it.

  • Reproducibility: The run log store and data catalogs hold the version, code commits and data files used for a run, making it easy to re-run an older run or debug a failed run. The debug environment need not be the same as the original environment.

  • Easy switch: Your infrastructure landscape changes over time. With magnus, you can switch infrastructure by changing configuration, not code.
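
The "Easy to extend" idea above can be illustrated with a small Python sketch. This is not the magnus API: the class and function names (`RunLogStore`, `InMemoryRunLogStore`, `FileSystemRunLogStore`, `record_success`) are hypothetical and only show how one definition can have several interchangeable flavors.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class RunLogStore(ABC):
    """Hypothetical definition: what any run log store must be able to do."""

    @abstractmethod
    def put(self, run_id: str, run_log: dict) -> None:
        """Persist the run log of a run."""

    @abstractmethod
    def get(self, run_id: str) -> dict:
        """Return the run log of a run."""


class InMemoryRunLogStore(RunLogStore):
    """Flavor 1: keep run logs in a dictionary for the life of the process."""

    def __init__(self) -> None:
        self._logs = {}

    def put(self, run_id: str, run_log: dict) -> None:
        self._logs[run_id] = run_log

    def get(self, run_id: str) -> dict:
        return self._logs[run_id]


class FileSystemRunLogStore(RunLogStore):
    """Flavor 2: write each run log to <folder>/<run_id>.json."""

    def __init__(self, folder: Path) -> None:
        self._folder = Path(folder)
        self._folder.mkdir(parents=True, exist_ok=True)

    def put(self, run_id: str, run_log: dict) -> None:
        (self._folder / f"{run_id}.json").write_text(json.dumps(run_log))

    def get(self, run_id: str) -> dict:
        return json.loads((self._folder / f"{run_id}.json").read_text())


def record_success(store: RunLogStore, run_id: str) -> dict:
    """Calling code depends only on the definition, never on a flavor."""
    store.put(run_id, {"run_id": run_id, "status": "SUCCESS"})
    return store.get(run_id)
```

Because `record_success` only sees the `RunLogStore` definition, swapping the in-memory flavor for the file-system one changes no calling code, which is the property the design decisions describe.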

Magnus does not aim to replace existing, well-constructed orchestrators such as AWS Step Functions or Argo. It complements them in a unified, simple and intuitive way.

Documentation

More details about the project and how to use it are available here.

Installation

pip

magnus is a Python package and can be installed like any other.

pip install magnus

Example Run

To give you a flavour of how magnus works, let's create a simple pipeline.

Copy the contents of this yaml into getting-started.yaml.


!!! Note

The execution below creates a folder called 'data' in the current working directory. The command as given should work on Linux/macOS; on Windows, please change it accordingly.


dag:
  description: Getting started
  start_at: step parameters
  steps:
    step parameters:
      type: task
      command_type: python-lambda
      command: "lambda x: {'x': int(x) + 1}"
      next: step shell
    step shell:
      type: task
      command_type: shell
      command: mkdir data ; env >> data/data.txt # For Linux/macOS
      next: success
      catalog:
        put:
          - "*"
    success:
      type: success
    fail:
      type: fail

And let's run the pipeline using:

 magnus execute --file getting-started.yaml --x 3

You should see a list of warnings, but your terminal output should look similar to this:

{
    "run_id": "20220118114608",
    "dag_hash": "ce0676d63e99c34848484f2df1744bab8d45e33a",
    "use_cached": false,
    "tag": null,
    "original_run_id": "",
    "status": "SUCCESS",
    "steps": {
        "step parameters": {
            "name": "step parameters",
            "internal_name": "step parameters",
            "status": "SUCCESS",
            "step_type": "task",
            "message": "",
            "mock": false,
            "code_identities": [
                {
                    "code_identifier": "c5d2f4aa8dd354740d1b2f94b6ee5c904da5e63c",
                    "code_identifier_type": "git",
                    "code_identifier_dependable": false,
                    "code_identifier_url": " ",
                    "code_identifier_message": " "
                }
            ],
            "attempts": [
                {
                    "attempt_number": 0,
                    "start_time": "2022-01-18 11:46:08.530138",
                    "end_time": "2022-01-18 11:46:08.530561",
                    "duration": "0:00:00.000423",
                    "status": "SUCCESS",
                    "message": ""
                }
            ],
            "user_defined_metrics": {},
            "branches": {},
            "data_catalog": []
        },
        "step shell": {
            "name": "step shell",
            "internal_name": "step shell",
            "status": "SUCCESS",
            "step_type": "task",
            "message": "",
            "mock": false,
            "code_identities": [
                {
                    "code_identifier": "c5d2f4aa8dd354740d1b2f94b6ee5c904da5e63c",
                    "code_identifier_type": "git",
                    "code_identifier_dependable": false,
                    "code_identifier_url": " ",
                    "code_identifier_message": " "
                }
            ],
            "attempts": [
                {
                    "attempt_number": 0,
                    "start_time": "2022-01-18 11:46:08.576522",
                    "end_time": "2022-01-18 11:46:08.588158",
                    "duration": "0:00:00.011636",
                    "status": "SUCCESS",
                    "message": ""
                }
            ],
            "user_defined_metrics": {},
            "branches": {},
            "data_catalog": [
                {
                    "name": "data.txt",
                    "data_hash": "8f25ba24e56f182c5125b9ede73cab6c16bf193e3ad36b75ba5145ff1b5db583",
                    "catalog_relative_path": "20220118114608/data.txt",
                    "catalog_handler_location": ".catalog",
                    "stage": "put"
                }
            ]
        },
        "success": {
            "name": "success",
            "internal_name": "success",
            "status": "SUCCESS",
            "step_type": "success",
            "message": "",
            "mock": false,
            "code_identities": [
                {
                    "code_identifier": "c5d2f4aa8dd354740d1b2f94b6ee5c904da5e63c",
                    "code_identifier_type": "git",
                    "code_identifier_dependable": false,
                    "code_identifier_url": " ",
                    "code_identifier_message": " "
                }
            ],
            "attempts": [
                {
                    "attempt_number": 0,
                    "start_time": "2022-01-18 11:46:08.639563",
                    "end_time": "2022-01-18 11:46:08.639680",
                    "duration": "0:00:00.000117",
                    "status": "SUCCESS",
                    "message": ""
                }
            ],
            "user_defined_metrics": {},
            "branches": {},
            "data_catalog": []
        }
    },
    "parameters": {
        "x": 4
    },
    "run_config": {
        "executor": {
            "type": "local",
            "config": {}
        },
        "run_log_store": {
            "type": "buffered",
            "config": {}
        },
        "catalog": {
            "type": "file-system",
            "config": {}
        },
        "secrets": {
            "type": "do-nothing",
            "config": {}
        }
    }
}
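
The final "parameters" value of x = 4 in the run log follows from the first step, whose command is the lambda `lambda x: {'x': int(x) + 1}`. A rough Python equivalent of that arithmetic is sketched below. It assumes the value from `--x 3` reaches the step as the string "3" and that the returned dictionary is merged back into the run's parameters; how magnus does this internally is not shown here.

```python
# The command of "step parameters", written as a plain Python lambda.
step_parameters = lambda x: {'x': int(x) + 1}

# Assumption: the command-line value from `--x 3` arrives as the string "3".
parameters = {"x": "3"}

# Assumption: the dictionary returned by the step updates the run's parameters.
parameters.update(step_parameters(**parameters))

print(parameters)  # {'x': 4}
```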

You should see a data folder created with a file called data.txt in it, as specified by the command in step shell.

You should also see a .catalog folder created, containing a single folder named after the run_id of this run.
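
One way to check the cataloged file against the "data_catalog" entry in the run log is sketched below. The helper names are hypothetical, and the sketch assumes that "data_hash" is a SHA-256 hex digest of the file contents. The run log does not state the algorithm, so a mismatch may simply mean that assumption is wrong.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def catalog_entry_matches(entry: dict) -> bool:
    """Check that a cataloged file exists and matches its recorded hash.

    `entry` is one item of a step's "data_catalog" list in the run log.
    Assumption: "data_hash" is a SHA-256 hex digest of the file contents.
    """
    cataloged = Path(entry["catalog_handler_location"]) / entry["catalog_relative_path"]
    return cataloged.is_file() and sha256_of_file(cataloged) == entry["data_hash"]
```

For the run above, the entry would be the dictionary with "name": "data.txt" from the "step shell" section, and the file would be looked up at .catalog/20220118114608/data.txt relative to the current working directory.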

To understand more about the input and output, please head over to the documentation.
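
As a quick way to read a run log like the one printed by this example, the sketch below pulls out each step's status and the duration of its last attempt. It relies only on the field names visible in that output ("steps", "status", "attempts", "duration"); the function name `summarise_run_log` is hypothetical, and the JSON string is a trimmed copy of the example output.

```python
import json


def summarise_run_log(run_log: dict) -> list:
    """Return (step name, status, duration of last attempt) for each step."""
    summary = []
    for name, step in run_log["steps"].items():
        attempts = step.get("attempts", [])
        duration = attempts[-1]["duration"] if attempts else None
        summary.append((name, step["status"], duration))
    return summary


# A trimmed copy of the example run log, keeping only the fields used here.
run_log = json.loads("""
{
    "run_id": "20220118114608",
    "status": "SUCCESS",
    "steps": {
        "step parameters": {"status": "SUCCESS", "attempts": [{"duration": "0:00:00.000423"}]},
        "step shell": {"status": "SUCCESS", "attempts": [{"duration": "0:00:00.011636"}]},
        "success": {"status": "SUCCESS", "attempts": [{"duration": "0:00:00.000117"}]}
    }
}
""")

for name, status, duration in summarise_run_log(run_log):
    print(f"{name}: {status} in {duration}")
```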
