MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

Last update: Dec 27, 2022

Overview

The collaboration platform for Machine Learning

MLReef

MLReef is an ML/DL development platform containing four main sections:

Data-Management - Fully versioned data hosting and processing infrastructure
Publishing code repositories - Containerized and versioned script repositories for immutable use in data pipelines
Experiment Manager - Experiment tracking, environments and results
ML-Ops - Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)

Sign up & start experimenting in minutes.

To find out more about how MLReef can streamline your Machine Learning development lifecycle, visit our homepage.

Data Management

Host your data using Git / Git LFS repositories.
- Work concurrently on data
- Fully versioned or LFS version control
- Full view on data processing and visualization history
Connect your external storage to MLReef and use your data directly in pipelines
Data set management (access, history, pipelines)
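
As a rough illustration of the Git LFS hosting mentioned above, the sketch below generates the `.gitattributes` rules that route large files through Git LFS (the same rule format that `git lfs track` writes). The file patterns and helper names are hypothetical and not part of MLReef; this is a minimal sketch, not how MLReef configures repositories internally.

```python
from pathlib import Path


def lfs_attribute_lines(patterns):
    """Return .gitattributes lines that route the given patterns through Git LFS."""
    return [f"{pattern} filter=lfs diff=lfs merge=lfs -text" for pattern in patterns]


def write_gitattributes(repo_dir, patterns):
    """Append missing LFS rules to <repo_dir>/.gitattributes and return the file path."""
    path = Path(repo_dir) / ".gitattributes"
    existing = path.read_text().splitlines() if path.exists() else []
    # Only add rules that are not already present, so repeated calls are idempotent.
    new_lines = [line for line in lfs_attribute_lines(patterns) if line not in existing]
    path.write_text("\n".join(existing + new_lines) + "\n")
    return path


# Hypothetical patterns for image data and model weights.
for line in lfs_attribute_lines(["*.png", "*.h5"]):
    print(line)
```

Calling `write_gitattributes(repo_dir, patterns)` inside a cloned data repository would persist these rules; the file still has to be committed with Git as usual.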

Publishing Code

Adding only parameter annotations to your code...

# Example of parameter annotations for an image crop function
@data_processor(
    name="Resnet50",
    author="MLReef",
    command="resnet50",
    type="ALGORITHM",
    description="CNN Model resnet50",
    visibility="PUBLIC",
    input_type="IMAGE",
    output_type="MODEL"
)
@parameter(name='input-path', type='str', required=True, defaultValue='train', description="input path")
@parameter(name='output-path', type='str', required=True, defaultValue='output', description="output path")
@parameter(name='height', type='int', required=True, defaultValue=224, description="height of cropped images in px")
@parameter(name='width', type='int', required=True, defaultValue=224, description="width of cropped images in px")
def init_params():
    pass

...and publishing your scripts gets you the following:

Containerization of your scripts
- Always working scripts including easy hyperparameter access in pipelines
- Execution environment (including specific packages & versions)
- Hyper-parameters
  - ArgParser for command line parameters with currently used values
  - Explicit parameters dictionary
  - Input validation and guides
Multiple containers based on version and code branches
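
To make the hyper-parameter items above more concrete, here is a minimal sketch of a command-line parser that mirrors the four annotated parameters and exposes them as an explicit dictionary. It uses only the standard-library `argparse` module; the function names `build_parser` and `parse_params` are hypothetical, and this is not the code MLReef generates for published scripts. The sketch treats each parameter as optional with a default, which is an assumption about how `required` and `defaultValue` interact.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser mirroring the four annotated parameters."""
    parser = argparse.ArgumentParser(description="CNN Model resnet50")
    parser.add_argument("--input-path", type=str, default="train", help="input path")
    parser.add_argument("--output-path", type=str, default="output", help="output path")
    # type=int gives basic input validation: non-integer values are rejected.
    parser.add_argument("--height", type=int, default=224, help="height of cropped images in px")
    parser.add_argument("--width", type=int, default=224, help="width of cropped images in px")
    return parser


def parse_params(argv=None) -> dict:
    """Return the parameters as an explicit dictionary of currently used values."""
    return vars(build_parser().parse_args(argv))


# Override one value and keep the defaults for the rest.
params = parse_params(["--height", "128"])
print(params)
```

Note that `argparse` converts dashes to underscores, so `--input-path` is available as `params["input_path"]`.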

Experiment Manager

Complete experiment setup log
- Full source control info including non-committed local changes
- Execution environment (including specific packages & versions)
- Hyper-parameters
Full experiment output automatic capture
- Artifacts storage and standard-output logs
- Performance metrics on individual experiments and comparative graphs for all experiments
- Detailed view on logs and outputs generated
Extensive platform support and integrations
- Supports all Python-based ML/DL frameworks, for example PyTorch, TensorFlow, Keras, or scikit-learn

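The experiment setup log described above could look roughly like the following sketch, which records the hyper-parameters together with the execution environment (interpreter, platform, and selected package versions). All function names and the record layout are assumptions made for illustration; MLReef's actual log format is not shown in this document.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def capture_environment(packages):
    """Record interpreter, OS, and installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


def experiment_record(hyperparameters, packages=()):
    """Combine hyper-parameters and environment into one JSON-serializable record."""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "hyperparameters": dict(hyperparameters),
        "environment": capture_environment(packages),
    }


# Hypothetical hyper-parameters; "pip" is used only as an example package name.
record = experiment_record({"height": 224, "width": 224}, packages=["pip"])
print(json.dumps(record, indent=2))
```

Storing one such record per run is enough to compare environments and hyper-parameters across experiments; source control info and metrics would need to be added separately.
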
ML-Ops

Concurrent computing pipelining
Governance and control
- Access and user management
- Single permission management
- Resource management
Model management

MLReef Architecture

The MLReef ML components within the ML life cycle:

Data storage components, currently based on Git and Git LFS.
Model development based on working modules (published by the community or your team), data management, pipelines for data processing, data visualization, and experiments (hosted or on-prem), and model management.
ML-Ops orchestration, experiment and workflow reproducibility, and scalability.

Why MLReef?

MLReef is our solution to a problem we share with countless other researchers and developers in the machine learning/deep learning universe: Training production-grade deep learning models is a tangled process. MLReef tracks and controls the process by associating code version control, research projects, performance metrics, and model provenance.

We designed MLReef on best data science practices combined with the knowledge gained from DevOps and a deep focus on collaboration.

Use it on a daily basis to boost collaboration and visibility in your team
Create a job in the cloud from any code repository with a click of a button
Automate processes and create pipelines to collect your experimentation logs, outputs, and data
Make your ML life cycle transparent by cataloging it all on the MLReef platform

Getting Started as a Developer

Please read the Contribution Guidelines carefully
Clone the mlreef Git repository to your local machine
Read the architecture document

To start developing, continue with the developer guide

Canonical source

The canonical source of MLReef, where all development takes place, is hosted on gitlab.com/mlreef/mlreef.

License

MIT License (see the License for more information)

Documentation, Community and Support

More information is available in the official documentation and on YouTube.

For examples and use cases, check these use cases or start the tutorial after registering.

If you have any questions, post on our Slack channel or tag your questions on Stack Overflow with the 'mlreef' tag.

For feature requests or bug reports, please use GitLab issues.

Additionally, you can always reach out to us via [email protected]

Contributing

Merge Requests are always welcome ❤️ See more details in the MLReef Contribution Guidelines.
