Convert monolithic Jupyter notebooks into Ploomber pipelines.

Last update: Dec 16, 2022

Soorgeon

3-minute video tutorial.

Try the interactive demo:

Note: Soorgeon is in alpha; help us make it better.

Install

pip install soorgeon

Usage

# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in CSV files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in Parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if omitted, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py
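The options above can also be passed together in a single call. The sketch below wraps one such invocation in a shell function. It assumes the flags combine freely. The helper name `refactor_notebook` and the chosen values are illustrative only and are not part of Soorgeon.

```shell
#!/usr/bin/env bash
# Minimal sketch: combine the documented refactor options in one call.
# Assumption: the flags can be passed together; refactor_notebook is a
# hypothetical helper name, not part of Soorgeon.
set -euo pipefail

refactor_notebook() {
  local notebook="$1"
  # Store df-prefixed variables as Parquet, write task output under
  # 'output', and generate tasks as .py files
  soorgeon refactor "$notebook" \
    --df-format parquet \
    --product-prefix output \
    --file-format py
}
```

Usage: `refactor_notebook nb.ipynb`.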

To learn more, check out our guide.

Examples

git clone https://github.com/ploomber/soorgeon
cd soorgeon

Exploratory data analysis notebook:

cd examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

cd examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
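Both examples follow the same steps: change into the example directory, refactor, install requirements, and build. The sketch below collects these steps into one function. It assumes it is run from the root of the cloned repository. `run_example` is a hypothetical helper name, not part of Soorgeon or Ploomber.

```shell
#!/usr/bin/env bash
# Minimal sketch: refactor and build one bundled example in a single step.
# Assumption: run from the root of the cloned soorgeon repository;
# run_example is a hypothetical helper name.
set -euo pipefail

run_example() {
  local name="$1"   # e.g. "exploratory" or "machine-learning"
  (
    # Subshell keeps the caller's working directory unchanged
    cd "examples/$name"
    soorgeon refactor nb.ipynb
    pip install -r requirements.txt
    ploomber build
  )
}
```

Usage: `run_example machine-learning`. The subshell lets you run several examples in a row without `cd ..` bookkeeping. With `set -e`, the first failing step stops the run.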

To learn more, check out our guide.

Owner: Ploomber