Python Research Framework

Related tags

Machine Learningpyfra
Overview

pyfra

The Python Research Framework.

Design Philosophy

Research code has some of the fastest shifting requirements of any type of code. It's nearly impossible to plan ahead of time the proper abstractions, because it is exceedingly likely that in the course of the project what you originally thought was your main focus suddenly no longer is. Further, research code (especially in ML) often involves big and complicated pipelines, typically involving many different machines, which are either run by hand or using shell scripts that are far more complicated than any shell script ever should be.

Therefore, the objective of pyfra is to make it as fast and low-friction as possible to write research code involving complex pipelines over many machines. This entails making it as easy as possible to implement a research idea in reality, at the cost of fine-grained control and the long-term maintainability of the system. In other words, pyfra expects that code will either be rapidly obsoleted by newer code, or rewritten using some other framework once it is no longer a research project and requirements have settled down.

Pyfra is in its very early stages of development. The interface may change rapidly and without warning.

Features:

  • Spin up an internal webserver complete with a permissions system using only a few lines of code
  • Extremely elegant shell integration—run commands on any server seamlessly. All the best parts of bash and python combined
  • Automated remote environment setup, so you never have to worry about provisioning machines by hand again
  • (WIP) Tools for painless functional programming in python
  • (Coming soon) High level API for experiment management/scheduling and resource provisioning
  • (Coming soon) Idempotent resumable data pipelines with no cognitive overhead

Example code

from pyfra import *

loc = Remote()
rem = Remote("[email protected]")
nas = Remote("[email protected]")

@page("Run experiment", dropdowns={'server': ['local', 'remote']})
def run_experiment(server: str, config_file: str, some_numerical_value: int, some_checkbox: bool):
    r = loc if server == 'local' else rem

    r.sh("git clone https://github.com/EleutherAI/gpt-neox")
    
    # rsync as a function can do local-local, local-remote, and remote-remote
    rsync(config_file, r.file("gpt-neox/configs/my-config.yml"))
    rsync(nas.file('some_data_file'), r.file('gpt-neox/data/whatever'))
    
    return r.sh('cd gpt-neox; python3 main.py')

@page("Write example file and copy")
def example():
    rem.fwrite("testing.txt", "hello world")
    
    # tlocal files can be specified as just a string
    rsync(rem.file('testing123.txt'), 'test1.txt')
    rsync(rem.file('testing123.txt'), loc.file('test2.txt'))

    loc.sh('cat test1.txt')
    
    assert fread('test1.txt') == fread('test2.txt')
    
    # fread, fwrite, etc can take a `rem.file` instead of a string filename.
    # you can also use all *read and *write functions directly on the remote too.
    assert fread('test1.txt') == fread(rem.file('testing123.txt'))
    assert fread('test1.txt') == rem.fread('testing123.txt')

    # ls as a function returns a list of files (with absolute paths) on the selected remote.
    # the returned value is displayed on the webpage.
    return '\n'.join(rem.ls('/'))

@page("List files in some directory")
def list_files(directory):
    return sh(f"ls -la {directory | quote}")


# start internal webserver
webserver()

Installation

pip3 install git+https://github.com/EleutherAI/pyfra/

The version of PyPI is not up to date, do not use it.

Webserver screenshots

image image

Comments
  • Try to install sudo in _install

    Try to install sudo in _install

    Sudo is installed in setup.apt(), which is not run when python_version=None is set for an env. This PR tries to install the sudo package on _install which solves this issue.

    opened by kurumuz 1
  • Styling updates 2

    Styling updates 2

    This should fix some issues that were noticed recently.

    • increases the width of the content in the middle
    • all button icons are now the same (until we figure out better solution)
    • content that is overflowing should now be scrollable
    opened by jprester 0
  • Update styling

    Update styling

    I made some updates to styling for the admin dashboard pages.

    Stuff I did:

    • changed the styling to look like design mockup
    • moved ids to classes in css. Ids should be used for javascript selector
    • added some svg icons
    • made the UI somewhat responsive
    opened by jprester 0
  • docs: docs are empty

    docs: docs are empty

    Screenshot from the RTD page:

    image

    I recommend checking the raw output of the build on the RTD dashboard.

    Probably some library installation issue when running setup.

    opened by TomFrederik 0
  • Type annotations

    Type annotations

    Type annotations are a must-have for public facing library exports, as they allow users to infer a lot of information about calls/return values independent of documentation, as well as help with code completions.

    opened by hugbubby 0
Releases(v0.3.0)
  • v0.3.0(Dec 9, 2021)

    What's new

    • Envs now resume where they left off (and Remotes have an option for turning this behaviour on)
    • @stage caching added

    Breaking Changes

    • delegation promoted to full submodule and experiment removed
    • pyfra.functional removed
    • pyfra.web deprecated and moved to contrib
    • contrib revamp

    Full Changelog: https://github.com/EleutherAI/pyfra/compare/8e775df36ca8f2ae39b0b7add9c30eab446207b1...9616e835578f8ad04a6d9c3b405777fc4b7e0853

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0rc6(Sep 1, 2021)

Owner
EleutherAI
EleutherAI
Exemplary lightweight and ready-to-deploy machine learning project

Exemplary lightweight and ready-to-deploy machine learning project

snapADDY GmbH 6 Dec 20, 2022
A library of sklearn compatible categorical variable encoders

Categorical Encoding Methods A set of scikit-learn-style transformers for encoding categorical variables into numeric by means of different techniques

2.1k Jan 07, 2023
TorchDrug is a PyTorch-based machine learning toolbox designed for drug discovery

A powerful and flexible machine learning platform for drug discovery

MilaGraph 1.1k Jan 08, 2023
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. 10x Larger Models 10x Faster Trainin

Microsoft 8.4k Dec 30, 2022
🌊 River is a Python library for online machine learning.

River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on strea

OnlineML 4k Jan 03, 2023
This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain

B DEVA DEEKSHITH 1 Nov 03, 2021
Simple Machine Learning Tool Kit

Getting started smltk (Simple Machine Learning Tool Kit) package is implemented for helping your work during data preparation testing your model The g

Alessandra Bilardi 1 Dec 30, 2021
Python implementation of the rulefit algorithm

RuleFit Implementation of a rule based prediction algorithm based on the rulefit algorithm from Friedman and Popescu (PDF) The algorithm can be used f

Christoph Molnar 326 Jan 02, 2023
Sequence learning toolkit for Python

seqlearn seqlearn is a sequence classification toolkit for Python. It is designed to extend scikit-learn and offer as similar as possible an API. Comp

Lars 653 Dec 27, 2022
Greykite: A flexible, intuitive and fast forecasting library

The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

LinkedIn 1.4k Jan 15, 2022
We have a dataset of user performances. The project is to develop a machine learning model that will predict the salaries of baseball players.

Salary-Prediction-with-Machine-Learning 1. Business Problem Can a machine learning project be implemented to estimate the salaries of baseball players

Ayşe Nur Türkaslan 9 Oct 14, 2022
Repositório para o #alurachallengedatascience1

1° Challenge de Dados - Alura A Alura Voz é uma empresa de telecomunicação que nos contratou para atuar como cientistas de dados na equipe de vendas.

Sthe Monica 16 Nov 10, 2022
database for artificial intelligence/machine learning data

AIDB v0.0.1 database for artificial intelligence/machine learning data Overview aidb is a database designed for large dataset for machine learning pro

Aarush Gupta 1 Oct 24, 2021
An easier way to build neural search on the cloud

Jina is geared towards building search systems for any kind of data, including text, images, audio, video and many more. With the modular design & multi-layer abstraction, you can leverage the effici

Jina AI 17k Jan 01, 2023
Skforecast is a python library that eases using scikit-learn regressors as multi-step forecasters

Skforecast is a python library that eases using scikit-learn regressors as multi-step forecasters. It also works with any regressor compatible with the scikit-learn API (pipelines, CatBoost, LightGBM

Joaquín Amat Rodrigo 297 Jan 09, 2023
Scikit learn library models to account for data and concept drift.

liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d

7 Nov 18, 2021
A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

Sebastian Raschka 4.2k Dec 29, 2022
PySurvival is an open source python package for Survival Analysis modeling

PySurvival What is Pysurvival ? PySurvival is an open source python package for Survival Analysis modeling - the modeling concept used to analyze or p

Square 265 Dec 27, 2022
PyHarmonize: Adding harmony lines to recorded melodies in Python

PyHarmonize: Adding harmony lines to recorded melodies in Python About To use this module, the user provides a wav file containing a melody, the key i

Julian Kappler 2 May 20, 2022
Optimal Randomized Canonical Correlation Analysis

ORCCA Optimal Randomized Canonical Correlation Analysis This project is for the python version of ORCCA algorithm. It depends on Numpy for matrix calc

Yinsong Wang 1 Nov 21, 2021