Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

Overview

Hello from magnus

Magnus provides four capabilities for data teams:

  • Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

  • Run log store: A place to store run logs for reporting or re-running older runs. Along with capturing the status of execution, the run logs also capture code identifiers (commits, docker image digests etc), data hashes and configuration settings for reproducibility and audit.

  • Data Catalogs: A way to pass data between nodes of the graph during execution and also serves the purpose of versioning the data used by a particular run.

  • Secrets: A framework to provide secrets/credentials at run time to the nodes of the graph.

Design decisions:

  • Easy to extend: All the four capabilities are just definitions and can be implemented in many flavors.

    • Compute execution plan: You can choose to run the DAG on your local computer, in containers of local computer or off load the work to cloud providers or translate the DAG to AWS step functions or Argo workflows.

    • Run log Store: The actual implementation of storing the run logs could be in-memory, file system, S3, database etc.

    • Data Catalogs: The data files generated as part of a run could be stored on file-systems, S3 or could be extended to fit your needs.

    • Secrets: The secrets needed for your code to work could be in dotenv, AWS or extended to fit your needs.

  • Pipeline as contract: Once a DAG is defined and proven to work in local or some environment, there is absolutely no code change needed to deploy it to other environments. This enables the data teams to prove the correctness of the dag in dev environments while infrastructure teams to find the suitable way to deploy it.

  • Reproducibility: Run log store and data catalogs hold the version, code commits, data files used for a run making it easy to re-run an older run or debug a failed run. Debug environment need not be the same as original environment.

  • Easy switch: Your infrastructure landscape changes over time. With magnus, you can switch infrastructure by just changing a config and not code.

Magnus does not aim to replace existing and well constructed orchestrators like AWS Step functions or argo but complements them in a unified, simple and intuitive way.

Documentation

More details about the project and how to use it available here.

Installation

pip

magnus is a python package and should be installed as any other.

pip install magnus

Example Run

To give you a flavour of how magnus works, lets create a simple pipeline.

Copy the contents of this yaml into getting-started.yaml.


!!! Note

The below execution would create a folder called 'data' in the current working directory. The command as given should work in linux/macOS but for windows, please change accordingly.


> data/data.txt # For Linux/macOS next: success catalog: put: - "*" success: type: success fail: type: fail">
dag:
  description: Getting started
  start_at: step parameters
  steps:
    step parameters:
      type: task
      command_type: python-lambda
      command: "lambda x: {'x': int(x) + 1}"
      next: step shell
    step shell:
      type: task
      command_type: shell
      command: mkdir data ; env >> data/data.txt # For Linux/macOS
      next: success
      catalog:
        put:
          - "*"
    success:
      type: success
    fail:
      type: fail

And let's run the pipeline using:

 magnus execute --file getting-started.yaml --x 3

You should see a list of warnings but your terminal output should look something similar to this:

", "code_identifier_message": " " } ], "attempts": [ { "attempt_number": 0, "start_time": "2022-01-18 11:46:08.530138", "end_time": "2022-01-18 11:46:08.530561", "duration": "0:00:00.000423", "status": "SUCCESS", "message": "" } ], "user_defined_metrics": {}, "branches": {}, "data_catalog": [] }, "step shell": { "name": "step shell", "internal_name": "step shell", "status": "SUCCESS", "step_type": "task", "message": "", "mock": false, "code_identities": [ { "code_identifier": "c5d2f4aa8dd354740d1b2f94b6ee5c904da5e63c", "code_identifier_type": "git", "code_identifier_dependable": false, "code_identifier_url": " ", "code_identifier_message": " " } ], "attempts": [ { "attempt_number": 0, "start_time": "2022-01-18 11:46:08.576522", "end_time": "2022-01-18 11:46:08.588158", "duration": "0:00:00.011636", "status": "SUCCESS", "message": "" } ], "user_defined_metrics": {}, "branches": {}, "data_catalog": [ { "name": "data.txt", "data_hash": "8f25ba24e56f182c5125b9ede73cab6c16bf193e3ad36b75ba5145ff1b5db583", "catalog_relative_path": "20220118114608/data.txt", "catalog_handler_location": ".catalog", "stage": "put" } ] }, "success": { "name": "success", "internal_name": "success", "status": "SUCCESS", "step_type": "success", "message": "", "mock": false, "code_identities": [ { "code_identifier": "c5d2f4aa8dd354740d1b2f94b6ee5c904da5e63c", "code_identifier_type": "git", "code_identifier_dependable": false, "code_identifier_url": " ", "code_identifier_message": " " } ], "attempts": [ { "attempt_number": 0, "start_time": "2022-01-18 11:46:08.639563", "end_time": "2022-01-18 11:46:08.639680", "duration": "0:00:00.000117", "status": "SUCCESS", "message": "" } ], "user_defined_metrics": {}, "branches": {}, "data_catalog": [] } }, "parameters": { "x": 4 }, "run_config": { "executor": { "type": "local", "config": {} }, "run_log_store": { "type": "buffered", "config": {} }, "catalog": { "type": "file-system", "config": {} }, "secrets": { "type": "do-nothing", "config": {} } } }">
{
    "run_id": "20220118114608",
    "dag_hash": "ce0676d63e99c34848484f2df1744bab8d45e33a",
    "use_cached": false,
    "tag": null,
    "original_run_id": "",
    "status": "SUCCESS",
    "steps": {
        "step parameters": {
            "name": "step parameters",
            "internal_name": "step parameters",
            "status": "SUCCESS",
            "step_type": "task",
            "message": "",
            "mock": false,
            "code_identities": [
                {
                    "code_identifier": "c5d2f4aa8dd354740d1b2f94b6ee5c904da5e63c",
                    "code_identifier_type": "git",
                    "code_identifier_dependable": false,
                    "code_identifier_url": "
        
         "
        ,
                    "code_identifier_message": "
        
         "
        
                }
            ],
            "attempts": [
                {
                    "attempt_number": 0,
                    "start_time": "2022-01-18 11:46:08.530138",
                    "end_time": "2022-01-18 11:46:08.530561",
                    "duration": "0:00:00.000423",
                    "status": "SUCCESS",
                    "message": ""
                }
            ],
            "user_defined_metrics": {},
            "branches": {},
            "data_catalog": []
        },
        "step shell": {
            "name": "step shell",
            "internal_name": "step shell",
            "status": "SUCCESS",
            "step_type": "task",
            "message": "",
            "mock": false,
            "code_identities": [
                {
                    "code_identifier": "c5d2f4aa8dd354740d1b2f94b6ee5c904da5e63c",
                    "code_identifier_type": "git",
                    "code_identifier_dependable": false,
                    "code_identifier_url": "
        
         "
        ,
                    "code_identifier_message": "
        
         "
        
                }
            ],
            "attempts": [
                {
                    "attempt_number": 0,
                    "start_time": "2022-01-18 11:46:08.576522",
                    "end_time": "2022-01-18 11:46:08.588158",
                    "duration": "0:00:00.011636",
                    "status": "SUCCESS",
                    "message": ""
                }
            ],
            "user_defined_metrics": {},
            "branches": {},
            "data_catalog": [
                {
                    "name": "data.txt",
                    "data_hash": "8f25ba24e56f182c5125b9ede73cab6c16bf193e3ad36b75ba5145ff1b5db583",
                    "catalog_relative_path": "20220118114608/data.txt",
                    "catalog_handler_location": ".catalog",
                    "stage": "put"
                }
            ]
        },
        "success": {
            "name": "success",
            "internal_name": "success",
            "status": "SUCCESS",
            "step_type": "success",
            "message": "",
            "mock": false,
            "code_identities": [
                {
                    "code_identifier": "c5d2f4aa8dd354740d1b2f94b6ee5c904da5e63c",
                    "code_identifier_type": "git",
                    "code_identifier_dependable": false,
                    "code_identifier_url": "
        
         "
        ,
                    "code_identifier_message": "
        
         "
        
                }
            ],
            "attempts": [
                {
                    "attempt_number": 0,
                    "start_time": "2022-01-18 11:46:08.639563",
                    "end_time": "2022-01-18 11:46:08.639680",
                    "duration": "0:00:00.000117",
                    "status": "SUCCESS",
                    "message": ""
                }
            ],
            "user_defined_metrics": {},
            "branches": {},
            "data_catalog": []
        }
    },
    "parameters": {
        "x": 4
    },
    "run_config": {
        "executor": {
            "type": "local",
            "config": {}
        },
        "run_log_store": {
            "type": "buffered",
            "config": {}
        },
        "catalog": {
            "type": "file-system",
            "config": {}
        },
        "secrets": {
            "type": "do-nothing",
            "config": {}
        }
    }
}

You should see that data folder being created with a file called data.txt in it. This is according to the command in step shell.

You should also see a folder .catalog being created with a single folder corresponding to the run_id of this run.

To understand more about the input and output, please head over to the documentation.

Pretrained models for Jax/Haiku; MobileNet, ResNet, VGG, Xception.

Pre-trained image classification models for Jax/Haiku Jax/Haiku Applications are deep learning models that are made available alongside pre-trained we

Alper Baris CELIK 14 Dec 20, 2022
The repository contains source code and models to use PixelNet architecture used for various pixel-level tasks. More details can be accessed at .

PixelNet: Representation of the pixels, by the pixels, and for the pixels. We explore design principles for general pixel-level prediction problems, f

Aayush Bansal 196 Aug 10, 2022
Code for the paper "Asymptotics of ℓ2 Regularized Network Embeddings"

README Code for the paper Asymptotics of L2 Regularized Network Embeddings. Requirements Requires Stellargraph 1.2.1, Tensorflow 2.6.0, scikit-learm 0

Andrew Davison 0 Jan 06, 2022
Automatic self-diagnosis program (python required)Automatic self-diagnosis program (python required)

auto-self-checker 자동으로 자가진단 해주는 프로그램(python 필요) 중요 이 프로그램이 실행될때에는 절대로 마우스포인터를 움직이거나 키보드를 건드리면 안된다(화면인식, 마우스포인터로 직접 클릭) 사용법 프로그램을 구동할 폴더 내의 cmd창에서 pip

1 Dec 30, 2021
Code for KHGT model, AAAI2021

KHGT Code for KHGT accepted by AAAI2021 Please unzip the data files in Datasets/ first. To run KHGT on Yelp data, use python labcode_yelp.py For Movi

32 Nov 29, 2022
Implementation of ICCV21 paper: PnP-DETR: Towards Efficient Visual Analysis with Transformers

Implementation of ICCV 2021 paper: PnP-DETR: Towards Efficient Visual Analysis with Transformers arxiv This repository is based on detr Recently, DETR

twang 113 Dec 27, 2022
Implementation of DropLoss for Long-Tail Instance Segmentation in Pytorch

[AAAI 2021]DropLoss for Long-Tail Instance Segmentation [AAAI 2021] DropLoss for Long-Tail Instance Segmentation Ting-I Hsieh*, Esther Robb*, Hwann-Tz

Tim 37 Dec 02, 2022
Aspect-Sentiment-Multiple-Opinion Triplet Extraction (NLPCC 2021)

The code and data for the paper "Aspect-Sentiment-Multiple-Opinion Triplet Extraction" Requirements Python 3.6.8 torch==1.2.0 pytorch-transformers==1.

慢半拍 5 Jul 02, 2022
Reference code for the paper "Cross-Camera Convolutional Color Constancy" (ICCV 2021)

Cross-Camera Convolutional Color Constancy, ICCV 2021 (Oral) Mahmoud Afifi1,2, Jonathan T. Barron2, Chloe LeGendre2, Yun-Ta Tsai2, and Francois Bleibe

Mahmoud Afifi 76 Jan 07, 2023
PyTorch Implementation of SSTNs for hyperspectral image classifications from the IEEE T-GRS paper "Spectral-Spatial Transformer Network for Hyperspectral Image Classification: A FAS Framework."

PyTorch Implementation of SSTN for Hyperspectral Image Classification Paper links: SSTN published on IEEE T-GRS. Also, you can directly find the imple

Zilong Zhong 54 Dec 19, 2022
Contains modeling practice materials and homework for the Computational Neuroscience course at Okinawa Institute of Science and Technology

A310 Computational Neuroscience - Okinawa Institute of Science and Technology, 2022 This repository contains modeling practice materials and homework

Sungho Hong 1 Jan 24, 2022
YOLOv5 in PyTorch > ONNX > CoreML > TFLite

This repository represents Ultralytics open-source research into future object detection methods, and incorporates lessons learned and best practices evolved over thousands of hours of training and e

Ultralytics 34.1k Dec 31, 2022
Implementation of paper "Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal"

Patch-wise Adversarial Removal Implementation of paper "Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal

4 Oct 12, 2022
An off-line judger supporting distributed problem repositories

Thaw 中文 | English Thaw is an off-line judger supporting distributed problem repositories. Everyone can use Thaw release problems with license on GitHu

countercurrent_time 2 Jan 09, 2022
Learning What and Where to Draw

###Learning What and Where to Draw Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee This is the code for our NIPS 201

Scott Ellison Reed 337 Nov 18, 2022
Benchmark library for high-dimensional HPO of black-box models based on Weighted Lasso regression

LassoBench LassoBench is a library for high-dimensional hyperparameter optimization benchmarks based on Weighted Lasso regression. Note: LassoBench is

Kenan Šehić 5 Mar 15, 2022
YOLOv7 - Framework Beyond Detection

🔥🔥🔥🔥 YOLO with Transformers and Instance Segmentation, with TensorRT acceleration! 🔥🔥🔥

JinTian 3k Jan 01, 2023
ICRA 2021 "Towards Precise and Efficient Image Guided Depth Completion"

PENet: Precise and Efficient Depth Completion This repo is the PyTorch implementation of our paper to appear in ICRA2021 on "Towards Precise and Effic

232 Dec 25, 2022
2D Time independent Schrodinger equation solver for arbitrary shape of well

Schrodinger Well Python Python solver for timeless Schrodinger equation for well with arbitrary shape https://imgur.com/a/jlhK7OZ Pictures of circular

WeightAn 24 Nov 18, 2022