A Lightweight Hyperparameter Optimization Tool 🚀

Last update: Jan 08, 2023

Lightweight Hyperparameter Optimization 🚀

The mle-hyperopt package provides a simple and intuitive API for hyperparameter optimization of your Machine Learning Experiment (MLE) pipeline. It supports real, integer & categorical search variables and single- or multi-objective optimization.

Core features include the following:

API Simplicity: strategy.ask(), strategy.tell() interface & space definition.
Strategy Diversity: Grid, random, coordinate search, SMBO & wrapping around FAIR's nevergrad.
Search Space Refinement based on the top performing configs via strategy.refine(top_k=10).
Export of configurations to execute via e.g. python train.py --config_fname config.yaml.
Store & reload search logs via strategy.save(<log_fname>), strategy.load(<log_fname>).

For a quickstart, check out the notebook blog 📖.

The API 🎮

from mle_hyperopt import RandomSearch

# Instantiate random search class
strategy = RandomSearch(real={"lrate": {"begin": 0.1,
                                        "end": 0.5,
                                        "prior": "log-uniform"}},
                        integer={"batch_size": {"begin": 32,
                                                "end": 128,
                                                "prior": "uniform"}},
                        categorical={"arch": ["mlp", "cnn"]})

# Simple ask - eval - tell API (train_network is your own training function)
configs = strategy.ask(5)
values = [train_network(**c) for c in configs]
strategy.tell(configs, values)
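
The snippet above relies on a user-defined train_network function. To show the ask - eval - tell loop in isolation, here is a minimal pure-Python sketch. TinyRandomSearch and the toy objective are hypothetical stand-ins, not part of mle-hyperopt; they only mimic the shape of the interface and assume minimization.

```python
import math
import random


class TinyRandomSearch:
    """Hypothetical stand-in mimicking the ask/tell interface (not the library class)."""

    def __init__(self, real, integer, categorical, seed=0):
        self.real, self.integer, self.categorical = real, integer, categorical
        self.rng = random.Random(seed)
        self.log = []  # (config, value) pairs collected via tell()

    def _sample_real(self, spec):
        # "log-uniform": sample uniformly in log space, then map back
        if spec["prior"] == "log-uniform":
            lo, hi = math.log(spec["begin"]), math.log(spec["end"])
            return math.exp(self.rng.uniform(lo, hi))
        return self.rng.uniform(spec["begin"], spec["end"])

    def ask(self, n):
        configs = []
        for _ in range(n):
            c = {k: self._sample_real(s) for k, s in self.real.items()}
            c.update({k: self.rng.randint(s["begin"], s["end"])
                      for k, s in self.integer.items()})
            c.update({k: self.rng.choice(v) for k, v in self.categorical.items()})
            configs.append(c)
        return configs

    def tell(self, configs, values):
        self.log.extend(zip(configs, values))

    def best(self):
        # Lower is better in this sketch
        return min(self.log, key=lambda pair: pair[1])


def train_network(lrate, batch_size, arch):
    """Toy objective standing in for a real training run."""
    penalty = 0.0 if arch == "cnn" else 0.1
    return (lrate - 0.2) ** 2 + 1.0 / batch_size + penalty


strategy = TinyRandomSearch(
    real={"lrate": {"begin": 0.1, "end": 0.5, "prior": "log-uniform"}},
    integer={"batch_size": {"begin": 32, "end": 128, "prior": "uniform"}},
    categorical={"arch": ["mlp", "cnn"]},
)
configs = strategy.ask(5)
values = [train_network(**c) for c in configs]
strategy.tell(configs, values)
best_config, best_value = strategy.best()
```

The point of the split is that the evaluation step sits entirely in user code, so it can be replaced by anything that maps a configuration to a score.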

Implemented Search Types 🔭

Search Type	Description	`search_config`
`GridSearch`	Search over list of discrete values	-
`RandomSearch`	Random search over variable ranges	`refine_after`, `refine_top_k`
`CoordinateSearch`	Coordinate-wise optimization with fixed defaults	`order`, `defaults`
`SMBOSearch`	Sequential model-based optimization	`base_estimator`, `acq_function`, `n_initial_points`
`NevergradSearch`	Multi-objective nevergrad wrapper	`optimizer`, `budget_size`, `num_workers`

Variable Types & Hyperparameter Spaces 🌍

Variable	Type	Space Specification
`real`	Real-valued	`Dict`: `begin`, `end`, `prior`/`bins` (grid)
`integer`	Integer-valued	`Dict`: `begin`, `end`, `prior`/`bins` (grid)
`categorical`	Categorical	`List`: Values to search over
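
To make the grid variant of the space specification concrete, the sketch below expands begin/end/bins entries into explicit grid points and combines them with a categorical list. It assumes evenly spaced points that include both endpoints; expand_grid is a hypothetical helper, and the package may discretize differently internally.

```python
import itertools


def expand_grid(begin, end, bins, as_int=False):
    """Return `bins` evenly spaced values from begin to end (both included)."""
    if bins < 2:
        return [begin]
    step = (end - begin) / (bins - 1)
    values = [begin + i * step for i in range(bins)]
    return [int(round(v)) for v in values] if as_int else values


# Assumed example specs, following the begin/end/bins convention of the table
lrate_grid = expand_grid(0.0, 0.5, 5)
batch_grid = expand_grid(32, 128, 4, as_int=True)
arch_grid = ["mlp", "cnn"]

# A full grid is the Cartesian product of all per-variable value lists
full_grid = [
    {"lrate": lr, "batch_size": bs, "arch": arch}
    for lr, bs, arch in itertools.product(lrate_grid, batch_grid, arch_grid)
]
```

With 5, 4 and 2 values per variable, the product contains 40 configurations, which is why grid sizes grow quickly as variables are added.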

Installation ⏳

A PyPI installation is available via:

pip install mle-hyperopt

Alternatively, you can clone this repository and afterwards 'manually' install it:

git clone https://github.com/mle-infrastructure/mle-hyperopt.git
cd mle-hyperopt
pip install -e .

Further Options 🚴

Saving & Reloading Logs 🏪

# Storing & reloading of results from a .json log
strategy.save("search_log.json")
strategy = RandomSearch(..., reload_path="search_log.json")

# Or manually add info after class instantiation
strategy = RandomSearch(...)
strategy.load("search_log.json")

Search Decorator 🧶

from mle_hyperopt import hyperopt

@hyperopt(strategy_type="grid",
          num_search_iters=25,
          real={"x": {"begin": 0., "end": 0.5, "bins": 5},
                "y": {"begin": 0., "end": 0.5, "bins": 5}})
def circle(config):
    # Squared distance of (x, y) from the origin
    distance = abs((config["x"] ** 2 + config["y"] ** 2))
    return distance

strategy = circle()

Storing Configuration Files 📑

# Store 2 proposed configurations - eval_0.yaml, eval_1.yaml
strategy.ask(2, store=True)
# Store with explicit configuration filenames - conf_0.yaml, conf_1.yaml
strategy.ask(2, store=True, config_fnames=["conf_0.yaml", "conf_1.yaml"])

Retrieving Top Performers & Visualizing Results 📉

# Get the top k best performing configurations
ids, configs, values = strategy.get_best(top_k=4)

# Plot timeseries of best performing score over search iterations
strategy.plot_best()

# Print out ranking of best performers
strategy.print_ranking(top_k=3)

Refining the Search Space of Your Strategy 🪓

# Refine the search space after 5 & 10 iterations based on top 2 configurations
strategy = RandomSearch(real={"lrate": {"begin": 0.1,
                                        "end": 0.5,
                                        "prior": "log-uniform"}},
                        integer={"batch_size": {"begin": 1,
                                                "end": 5,
                                                "prior": "uniform"}},
                        categorical={"arch": ["mlp", "cnn"]},
                        search_config={"refine_after": [5, 10],
                                       "refine_top_k": 2})

# Or do so manually using `refine` method
strategy.tell(...)
strategy.refine(top_k=2)

Note that the search space refinement is only implemented for random, SMBO and nevergrad-based search strategies.
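
One way to picture refinement is to shrink each numeric range to the span covered by the top-k configurations and to keep only the categorical values seen among them. The sketch below assumes exactly that rule, assumes minimization, and assumes log entries with "params" and "objective" keys; refine_space is hypothetical and may not match what strategy.refine does internally.

```python
def refine_space(log, real, integer, categorical, top_k=2):
    """Shrink a search space around the top_k lowest-objective entries."""
    top = sorted(log, key=lambda entry: entry["objective"])[:top_k]
    top_configs = [entry["params"] for entry in top]

    def shrink(space):
        refined = {}
        for name, spec in space.items():
            vals = [c[name] for c in top_configs]
            # Keep the prior, tighten the range to what the top configs cover
            refined[name] = {**spec, "begin": min(vals), "end": max(vals)}
        return refined

    new_categorical = {}
    for name, choices in categorical.items():
        seen = {c[name] for c in top_configs}
        new_categorical[name] = [v for v in choices if v in seen]
    return shrink(real), shrink(integer), new_categorical


# Assumed toy log for illustration
log = [
    {"params": {"lrate": 0.12, "batch_size": 2, "arch": "cnn"}, "objective": 0.30},
    {"params": {"lrate": 0.40, "batch_size": 5, "arch": "mlp"}, "objective": 0.90},
    {"params": {"lrate": 0.20, "batch_size": 3, "arch": "cnn"}, "objective": 0.25},
]
new_real, new_integer, new_categorical = refine_space(
    log,
    real={"lrate": {"begin": 0.1, "end": 0.5, "prior": "log-uniform"}},
    integer={"batch_size": {"begin": 1, "end": 5, "prior": "uniform"}},
    categorical={"arch": ["mlp", "cnn"]},
    top_k=2,
)
```

A rule like this trades exploration for exploitation: a small top_k narrows the space quickly but can discard good regions that were only sampled sparsely.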

Development & Milestones for Next Release

You can run the test suite via python -m pytest -vv tests/. If you find a bug or are missing your favourite feature, feel free to contact me @RobertTLange or create an issue 🤗.

Robust type checking with isinstance(self.log[0]["objective"], (float, int, np.integer, np.float))
Add improvement method indicating if score is better than best stored one
Fix logging message when log is stored
Add save option for best plot
Make json serializer more robust for numpy data types
Make sure search space refinement works for different batch sizes
Add args, kwargs into decorator
Check why SMBO can propose same config multiple times. Add Hutter reference.

Comments

[FEATURE] Hyperband

Hi! I was wondering if the Hyperband hyperparameter algorithm is something you want implemented.

I'm willing to spend some time working on it if there's interest.

opened by colligant 5
[FEATURE] Option to pickle the whole strategy
Right now strategy.save produces a JSON with the log. Any reason you didn't opt for (or have an option of) pickling the whole strategy? Two motivations for this:

Not having to re-init the strategy with all the args/kwargs

Not having to loop through tell! SMBO can take quite some time to do this.
opened by alexander-soare 4
Type checking strategy.log could be made more flexible?
Yay first issue! Congrats Robert, this is a great interface. Haven't used a hyperopt library in a while and this felt so easy to pick up.

For example https://github.com/RobertTLange/mle-hyperopt/blob/57eb806e95c854f48f8faac2b2dc182d2180d393/mle_hyperopt/search.py#L251

raises an error if my objective is numpy.float64. Also noticed https://github.com/RobertTLange/mle-hyperopt/blob/57eb806e95c854f48f8faac2b2dc182d2180d393/mle_hyperopt/search.py#L206

Could we just have

isinstance(strategy.log[0]['objective'], (float, int))

which would cover the numpy types?
opened by alexander-soare 4
Successive Halving, Hyperband, PBT
[x] Robust type checking with isinstance(self.log[0]["objective"], (float, int, np.integer, np.float))

[x] Add improvement method indicating if score is better than best stored one

[x] Fix logging message when log is stored

[x] Add save option for best plot

[x] Make json serializer more robust for numpy data types

[x] Add possibility to save as .pkl file by providing filename in .save method ending with .pkl (issue #2)

[x] Add args, kwargs into decorator

[x] Adds synchronous Successive Halving (SuccessiveHalvingSearch - issue #3)

[x] Adds synchronous HyperBand (HyperbandSearch - issue #3)

[x] Adds synchronous PBT (PBTSearch - issue #4 )
opened by RobertTLange 1
[Feature] Synchronous PBT

Move PBT ask/tell functionality from mle-toolbox experimental to mle-hyperopt. Is there any literature/empirical evidence for the importance of being asynchronous?
enhancement

opened by RobertTLange 1

Releases(v0.0.7)

v0.0.7(Feb 20, 2022)
Added

Log reloading helper for post-processing.

Fixed

Bug fix in mle-search with imports of dependencies. Needed to append path.

Bug fix with cleaning nested dictionaries. Have to make sure not to delete entire sub-dictionary.

v0.0.6(Feb 20, 2022)
Added

Adds a command line interface for running a sequential search given a python script <script>.py containing a function main(config), a default configuration file <base>.yaml & a search configuration <search>.yaml. The main function should return a single scalar performance score. You can then start the search via:

mle-search <script>.py --base_config <base>.yaml --search_config <search>.yaml --num_iters <search_iters>

Or short via:

mle-search <script>.py -base <base>.yaml -search <search>.yaml -iters <search_iters>

Adds doc-strings to all functionalities.
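
The script passed to mle-search only needs a main(config) function that returns one scalar score. A minimal sketch of such a script, assuming the configuration can be indexed like a dictionary and using hypothetical keys lrate and batch_size (in a real run these would come from the base and search YAML files):

```python
def main(config):
    """Return a single scalar performance score for one configuration."""
    lrate = config["lrate"]
    batch_size = config["batch_size"]
    # Toy stand-in for a training run: lowest score at lrate = 0.3
    return (lrate - 0.3) ** 2 + 0.01 * batch_size


if __name__ == "__main__":
    # Quick manual check outside of mle-search
    print(main({"lrate": 0.3, "batch_size": 32}))
```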

Changed

Make it possible to optimize parameters in nested dictionaries. Added helpers flatten_config and unflatten_config for converting between 'sub1/sub2/vname' and {sub1: {sub2: {vname: v}}}.

Make start-up message also print fixed parameter settings.

Cleaned up decorator with the help of Strategies wrapper.
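
The nested-dictionary mapping mentioned above ('sub1/sub2/vname' on one side, nested sub-dictionaries on the other) can be sketched as follows. These are illustrative reimplementations assuming a '/' separator; flatten_sketch and unflatten_sketch are hypothetical names, and the package's own flatten_config and unflatten_config may differ in signature and edge-case handling.

```python
def flatten_sketch(nested, sep="/", prefix=""):
    """Turn {a: {b: {c: v}}} into {"a/b/c": v}."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the path
            flat.update(flatten_sketch(value, sep, path))
        else:
            flat[path] = value
    return flat


def unflatten_sketch(flat, sep="/"):
    """Turn {"a/b/c": v} back into {a: {b: {c: v}}}."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            # Create intermediate dictionaries on demand
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


config = {"train": {"optim": {"lrate": 0.1}, "epochs": 10}, "arch": "mlp"}
flat = flatten_sketch(config)
restored = unflatten_sketch(flat)
```

Flattening lets a search strategy treat every leaf as an ordinary named variable, and unflattening restores the structure the training code expects.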

v0.0.5(Jan 5, 2022)
Added

Adds possibility to store and reload entire strategies as pkl file (as asked for in issue #2).

Adds improvement method indicating if score is better than best stored one

Adds save option for best plot

Adds args, kwargs into decorator

Adds synchronous Successive Halving (SuccessiveHalvingSearch - issue #3)

Adds synchronous HyperBand (HyperbandSearch - issue #3)

Adds synchronous PBT (PBTSearch - issue #4)

Adds option to save log in tell method

Adds small torch mlp example for SH/Hyperband/PBT w. logging/scheduler

Adds print welcome/update message for strategy specific info

Changed

Major internal restructuring:

clean_data: Get rid of extra data provided in configuration file

tell_search: Update model of search strategy (e.g. SMBO/Nevergrad)

log_search: Add search specific log data to evaluation log

update_search: Refine search space/change active strategy etc.

Also allow to store checkpoint of trained models in tell method.

Fix logging message when log is stored

Make json serializer more robust for numpy data types

Robust type checking with isinstance(self.log[0]["objective"], (float, int, np.integer, np.float))

Update NB to include mle-scheduler example

Make PBT explore robust for integer/categorical valued hyperparams

Calculate total batches & their sizes for hyperband

v0.0.4(Dec 10, 2021)

v0.0.3(Oct 24, 2021)
Fixes CoordinateSearch active grid search dimension updating. We have to account for the fact that previous coordinates are not evaluated again after switching the active variable.

Generalizes NevergradSearch to wrap around all search strategies.

Adds rich logging to all console print statements.

Updates documentation and adds text to getting_started.ipynb.


v0.0.2(Oct 20, 2021)

Fixes import bug when using PyPI installation.
Enhances documentation and test coverage.
Adds search space refinement for nevergrad and smbo search strategies via refine_after and refine_top_k:

strategy = SMBOSearch(
    real={"lrate": {"begin": 0.1, "end": 0.5, "prior": "uniform"}},
    integer={"batch_size": {"begin": 1, "end": 5, "prior": "uniform"}},
    categorical={"arch": ["mlp", "cnn"]},
    search_config={
        "base_estimator": "GP",
        "acq_function": "gp_hedge",
        "n_initial_points": 5,
        "refine_after": 5,
        "refine_top_k": 2,
    },
    seed_id=42,
    verbose=True,
)

Adds additional strategy boolean option maximize_objective to maximize instead of performing default black-box minimization.


v0.0.1(Oct 16, 2021)

Base API implementation:

from mle_hyperopt import RandomSearch

# Instantiate random search class
strategy = RandomSearch(real={"lrate": {"begin": 0.1,
                                        "end": 0.5,
                                        "prior": "log-uniform"}},
                        integer={"batch_size": {"begin": 32,
                                                "end": 128,
                                                "prior": "uniform"}},
                        categorical={"arch": ["mlp", "cnn"]})

# Simple ask - eval - tell API
configs = strategy.ask(5)
values = [train_network(**c) for c in configs]
strategy.tell(configs, values)
