Python Research Framework

Related tags

Machine Learningpyfra
Overview

pyfra

The Python Research Framework.

Design Philosophy

Research code has some of the fastest shifting requirements of any type of code. It's nearly impossible to plan ahead of time the proper abstractions, because it is exceedingly likely that in the course of the project what you originally thought was your main focus suddenly no longer is. Further, research code (especially in ML) often involves big and complicated pipelines, typically involving many different machines, which are either run by hand or using shell scripts that are far more complicated than any shell script ever should be.

Therefore, the objective of pyfra is to make it as fast and low-friction as possible to write research code involving complex pipelines over many machines. This entails making it as easy as possible to implement a research idea in reality, at the cost of fine-grained control and the long-term maintainability of the system. In other words, pyfra expects that code will either be rapidly obsoleted by newer code, or rewritten using some other framework once it is no longer a research project and requirements have settled down.

Pyfra is in its very early stages of development. The interface may change rapidly and without warning.

Features:

  • Spin up an internal webserver complete with a permissions system using only a few lines of code
  • Extremely elegant shell integration—run commands on any server seamlessly. All the best parts of bash and python combined
  • Automated remote environment setup, so you never have to worry about provisioning machines by hand again
  • (WIP) Tools for painless functional programming in python
  • (Coming soon) High level API for experiment management/scheduling and resource provisioning
  • (Coming soon) Idempotent resumable data pipelines with no cognitive overhead

Example code

from pyfra import *

loc = Remote()
rem = Remote("[email protected]")
nas = Remote("[email protected]")

@page("Run experiment", dropdowns={'server': ['local', 'remote']})
def run_experiment(server: str, config_file: str, some_numerical_value: int, some_checkbox: bool):
    r = loc if server == 'local' else rem

    r.sh("git clone https://github.com/EleutherAI/gpt-neox")
    
    # rsync as a function can do local-local, local-remote, and remote-remote
    rsync(config_file, r.file("gpt-neox/configs/my-config.yml"))
    rsync(nas.file('some_data_file'), r.file('gpt-neox/data/whatever'))
    
    return r.sh('cd gpt-neox; python3 main.py')

@page("Write example file and copy")
def example():
    rem.fwrite("testing.txt", "hello world")
    
    # tlocal files can be specified as just a string
    rsync(rem.file('testing123.txt'), 'test1.txt')
    rsync(rem.file('testing123.txt'), loc.file('test2.txt'))

    loc.sh('cat test1.txt')
    
    assert fread('test1.txt') == fread('test2.txt')
    
    # fread, fwrite, etc can take a `rem.file` instead of a string filename.
    # you can also use all *read and *write functions directly on the remote too.
    assert fread('test1.txt') == fread(rem.file('testing123.txt'))
    assert fread('test1.txt') == rem.fread('testing123.txt')

    # ls as a function returns a list of files (with absolute paths) on the selected remote.
    # the returned value is displayed on the webpage.
    return '\n'.join(rem.ls('/'))

@page("List files in some directory")
def list_files(directory):
    return sh(f"ls -la {directory | quote}")


# start internal webserver
webserver()

Installation

pip3 install git+https://github.com/EleutherAI/pyfra/

The version of PyPI is not up to date, do not use it.

Webserver screenshots

image image

Comments
  • Try to install sudo in _install

    Try to install sudo in _install

    Sudo is installed in setup.apt(), which is not run when python_version=None is set for an env. This PR tries to install the sudo package on _install which solves this issue.

    opened by kurumuz 1
  • Styling updates 2

    Styling updates 2

    This should fix some issues that were noticed recently.

    • increases the width of the content in the middle
    • all button icons are now the same (until we figure out better solution)
    • content that is overflowing should now be scrollable
    opened by jprester 0
  • Update styling

    Update styling

    I made some updates to styling for the admin dashboard pages.

    Stuff I did:

    • changed the styling to look like design mockup
    • moved ids to classes in css. Ids should be used for javascript selector
    • added some svg icons
    • made the UI somewhat responsive
    opened by jprester 0
  • docs: docs are empty

    docs: docs are empty

    Screenshot from the RTD page:

    image

    I recommend checking the raw output of the build on the RTD dashboard.

    Probably some library installation issue when running setup.

    opened by TomFrederik 0
  • Type annotations

    Type annotations

    Type annotations are a must-have for public facing library exports, as they allow users to infer a lot of information about calls/return values independent of documentation, as well as help with code completions.

    opened by hugbubby 0
Releases(v0.3.0)
  • v0.3.0(Dec 9, 2021)

    What's new

    • Envs now resume where they left off (and Remotes have an option for turning this behaviour on)
    • @stage caching added

    Breaking Changes

    • delegation promoted to full submodule and experiment removed
    • pyfra.functional removed
    • pyfra.web deprecated and moved to contrib
    • contrib revamp

    Full Changelog: https://github.com/EleutherAI/pyfra/compare/8e775df36ca8f2ae39b0b7add9c30eab446207b1...9616e835578f8ad04a6d9c3b405777fc4b7e0853

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0rc6(Sep 1, 2021)

Owner
EleutherAI
EleutherAI
ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

ClearML - Auto-Magical Suite of tools to streamline your ML workflow Experiment Manager, MLOps and Data-Management ClearML Formerly known as Allegro T

ClearML 4k Jan 09, 2023
This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing variance.

minvar_invest_portfolio This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing var

1 Jan 06, 2022
(3D): LeGO-LOAM, LIO-SAM, and LVI-SAM installation and application

SLAM-application: installation and test (3D): LeGO-LOAM, LIO-SAM, and LVI-SAM Tested on Quadruped robot in Gazebo ● Results: video, video2 Requirement

EungChang-Mason-Lee 203 Dec 26, 2022
Python implementation of Weng-Lin Bayesian ranking, a better, license-free alternative to TrueSkill

Python implementation of Weng-Lin Bayesian ranking, a better, license-free alternative to TrueSkill This is a port of the amazing openskill.js package

Open Debates Project 156 Dec 14, 2022
Python Research Framework

Python Research Framework

EleutherAI 106 Dec 13, 2022
JMP is a Mixed Precision library for JAX.

Mixed precision training [0] is a technique that mixes the use of full and half precision floating point numbers during training to reduce the memory bandwidth requirements and improve the computatio

DeepMind 108 Dec 31, 2022
Official code for HH-VAEM

HH-VAEM This repository contains the official Pytorch implementation of the Hierarchical Hamiltonian VAE for Mixed-type Data (HH-VAEM) model and the s

Ignacio Peis 8 Nov 30, 2022
The MLOps is the process of continuous integration and continuous delivery of Machine Learning artifacts as a software product, keeping it inside a loop of Design, Model Development and Operations.

MLOps The MLOps is the process of continuous integration and continuous delivery of Machine Learning artifacts as a software product, keeping it insid

Maykon Schots 25 Nov 27, 2022
Uses WiFi signals :signal_strength: and machine learning to predict where you are

Uses WiFi signals and machine learning (sklearn's RandomForest) to predict where you are. Even works for small distances like 2-10 meters.

Pascal van Kooten 5k Jan 09, 2023
Python module for performing linear regression for data with measurement errors and intrinsic scatter

Linear regression for data with measurement errors and intrinsic scatter (BCES) Python module for performing robust linear regression on (X,Y) data po

Rodrigo Nemmen 56 Sep 27, 2022
Meerkat provides fast and flexible data structures for working with complex machine learning datasets.

Meerkat makes it easier for ML practitioners to interact with high-dimensional, multi-modal data. It provides simple abstractions for data inspection, model evaluation and model training supported by

Robustness Gym 115 Dec 12, 2022
Learning --> Numpy January 2022 - winter'22

Numerical-Python Numpy NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along

Shahzaneer Ahmed 0 Mar 12, 2022
UpliftML: A Python Package for Scalable Uplift Modeling

UpliftML is a Python package for scalable unconstrained and constrained uplift modeling from experimental data. To accommodate working with big data, the package uses PySpark and H2O models as base l

Booking.com 254 Dec 31, 2022
cuML - RAPIDS Machine Learning Library

cuML - GPU Machine Learning Algorithms cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions t

RAPIDS 3.1k Dec 28, 2022
scikit-learn is a python module for machine learning built on top of numpy / scipy

About scikit-learn is a python module for machine learning built on top of numpy / scipy. The purpose of the scikit-learn-tutorial subproject is to le

Gael Varoquaux 122 Dec 12, 2022
AtsPy: Automated Time Series Models in Python (by @firmai)

Automated Time Series Models in Python (AtsPy) SSRN Report Easily develop state of the art time series models to forecast univariate data series. Simp

Derek Snow 465 Jan 02, 2023
Painless Machine Learning for python based on scikit-learn

PlainML Painless Machine Learning Library for python based on scikit-learn. Install pip install plainml Example from plainml import KnnModel, load_ir

1 Aug 06, 2022
A Python toolkit for rule-based/unsupervised anomaly detection in time series

Anomaly Detection Toolkit (ADTK) Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection. As

Arundo Analytics 888 Dec 30, 2022
Dual Adaptive Sampling for Machine Learning Interatomic potential.

DAS Dual Adaptive Sampling for Machine Learning Interatomic potential. How to cite If you use this code in your research, please cite this using: Hong

6 Jul 06, 2022
Tools for diffing and merging of Jupyter notebooks.

nbdime provides tools for diffing and merging of Jupyter Notebooks.

Project Jupyter 2.3k Jan 03, 2023