On Anytime Learning At Macroscale

Learning from sequential data dumps

Key Requirements

Python 3.7
PyTorch 1.9.0
Hydra 1.1.0 (pip install hydra-core and pip install hydra-submitit-launcher)

Structure

├── crlapi           
  ├── benchmark.py    # Creates the data stream, feeds it to the model and evaluates it
  ├── core.py         # Abstract classes
  ├── logger.py   
  ├── sl
    ├── architectures
      ├── ...         # NN architectures used in this project
    ├── clmodels
      ├── ...         # Models (e.g., Single, gEns, ...)
    ├── streams
      ├── ...         # CIFAR and MNIST stream implementations
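The role of benchmark.py (create the data stream, feed it to the model, evaluate it) can be pictured with a minimal, self-contained sketch: megabatches are fed to a model one at a time, and the model is evaluated on a fixed test set after each one, so a score is available at every point in the stream. All names here (CLModel, ThresholdModel, run_benchmark) are hypothetical stand-ins rather than the crlapi API, and the toy threshold classifier replaces the neural networks in crlapi/sl/architectures.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Sequence, Tuple

Example = Tuple[float, int]  # toy (feature, label) pair standing in for an image and its class


class CLModel(ABC):
    """Hypothetical interface for a model that is updated one megabatch at a time."""

    @abstractmethod
    def update(self, megabatch: Sequence[Example]) -> "CLModel":
        """Train on a new megabatch and return the (possibly new) model."""

    @abstractmethod
    def predict(self, x: float) -> int:
        """Return the predicted label for one input."""


class ThresholdModel(CLModel):
    """Toy binary classifier: thresholds x at the midpoint of the two class means seen so far."""

    def __init__(self) -> None:
        self.sums = [0.0, 0.0]
        self.counts = [0, 0]

    def update(self, megabatch: Sequence[Example]) -> "ThresholdModel":
        for x, y in megabatch:
            self.sums[y] += x
            self.counts[y] += 1
        return self

    def predict(self, x: float) -> int:
        if 0 in self.counts:  # a class has not been seen yet: fall back to label 0
            return 0
        midpoint = (self.sums[0] / self.counts[0] + self.sums[1] / self.counts[1]) / 2
        return int(x > midpoint)


def accuracy(model: CLModel, test_set: Sequence[Example]) -> float:
    return sum(model.predict(x) == y for x, y in test_set) / len(test_set)


def run_benchmark(model: CLModel, stream: Iterable[Sequence[Example]],
                  test_set: Sequence[Example]) -> List[float]:
    """Feed the megabatches in order and record test accuracy after each one."""
    history = []
    for megabatch in stream:
        model = model.update(megabatch)
        history.append(accuracy(model, test_set))
    return history


stream = [[(0.0, 0), (0.2, 0)], [(1.0, 1), (0.9, 1)]]  # first megabatch holds only class 0
test_set = [(0.1, 0), (0.8, 1)]
history = run_benchmark(ThresholdModel(), stream, test_set)
print(history)  # [0.5, 1.0]
```

In this toy stream the first megabatch contains only class 0, so the model can only predict 0 (50% test accuracy); once the second megabatch brings class 1, accuracy reaches 100%.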

Running Experiments

To run experiments, call the dataset-specific run file and pass it the configuration of the run. The configurations are placed in the parent directory (../configs), which is structured as follows:

    ├── configs
        ├── mnist
           ├── run.py                 # run file
           ├── test_usage_gmoe.yaml   # This is the "gMoE" model
           ├── test_finetune_mlp.yaml # This is the "Single Model"
           ... 
        ├── cifar
           ├── run.py                 # run file
           ├── test_finetune_vgg.yaml # This is the "Single Model"
           ├── test_usage_gmoe.yaml   # This is the "gMoE" model
           ...

For example, to launch an MNIST gMoE run, use the following command from the parent directory (i.e., after cd ..):

PYTHONPATH=./ python configs/mnist/run.py -cn test_usage_gmoe n_megabatches=2 replay=1 clmodel.max_epochs=200
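In this command, -cn (short for Hydra's --config-name) selects the configuration file (here test_usage_gmoe.yaml), and each key=value argument overrides a field of that configuration, with dots addressing nested groups such as clmodel. The sketch below is a deliberately simplified illustration of those override semantics, not Hydra's implementation: it handles only plain dotted keys and, like Hydra's default behavior, rejects keys that are not already in the config. The helper names and the base config values are hypothetical.

```python
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Best-effort conversion of an override value: int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'a.b=value' style overrides to a nested dict in place and return it.

    Simplified: no lists, quoting, '+key' additions, '~key' deletions or sweeps.
    """
    for override in overrides:
        dotted_key, raw_value = override.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node[key]  # KeyError if the config group does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = parse_value(raw_value)
    return config


# Hypothetical base values; the real ones live in the YAML files under configs/.
base = {"n_megabatches": 1, "replay": 0, "device": "cuda", "clmodel": {"max_epochs": 10}}
cfg = apply_overrides(base, ["n_megabatches=2", "replay=1", "clmodel.max_epochs=200"])
print(cfg)  # {'n_megabatches': 2, 'replay': 1, 'device': 'cuda', 'clmodel': {'max_epochs': 200}}
```

Rejecting unknown keys catches typos such as clmodel.max_epoch=200 early instead of silently adding a field that nothing reads.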

Important arguments

n_megabatches : the number of megabatches the data is split into; n_megabatches=1 corresponds to regular training on the full dataset
replay : whether to use replay (replay=1) or not (replay=0)
clmodel.init_from_scratch : whether to reinitialize the model at every megabatch (MB); should only be used with replay=1
device : cuda or cpu, depending on your hardware
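One way to read these arguments is sketched below, under three assumptions: the dataset is cut into n_megabatches contiguous chunks that arrive in order; with replay=1 the model trains on every megabatch seen so far, while with replay=0 it trains only on the newest one; and init_from_scratch resets the model before each megabatch, as described above. The helper names are hypothetical and not part of crlapi; the sketch only plans which data each stage uses, not the training itself.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_into_megabatches(data: Sequence[T], n_megabatches: int) -> List[List[T]]:
    """Cut data into n_megabatches contiguous chunks whose sizes differ by at most one."""
    size, remainder = divmod(len(data), n_megabatches)
    chunks, start = [], 0
    for i in range(n_megabatches):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(list(data[start:end]))
        start = end
    return chunks


def training_sets(megabatches: List[List[T]], replay: bool) -> Iterator[List[T]]:
    """Yield each stage's training data: all megabatches so far (replay) or only the newest."""
    seen: List[T] = []
    for mb in megabatches:
        seen.extend(mb)
        yield list(seen) if replay else list(mb)


def plan_stages(data: Sequence[T], n_megabatches: int, replay: bool,
                init_from_scratch: bool) -> List[Tuple[bool, List[T]]]:
    """Return one (reinitialize_model, training_examples) pair per megabatch."""
    if init_from_scratch and not replay:
        # Without replay, a freshly reinitialized model would only ever see the newest megabatch.
        raise ValueError("init_from_scratch should only be used with replay=1")
    megabatches = split_into_megabatches(data, n_megabatches)
    return [(init_from_scratch, train) for train in training_sets(megabatches, replay)]


stages = plan_stages(list(range(6)), n_megabatches=3, replay=True, init_from_scratch=True)
for reinit, train in stages:
    print(reinit, train)
# True [0, 1]
# True [0, 1, 2, 3]
# True [0, 1, 2, 3, 4, 5]
```

With n_megabatches=1 the plan collapses to a single stage over the full dataset, which matches the regular full-dataset training described for that setting.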

License

alma is released under the MIT license. See LICENSE for additional details. See also our Terms of Use and Privacy Policy.

Owner

Meta Research