An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.





Overview

Hyperactive features a collection of optimization algorithms that can be used for a variety of optimization problems. The following table shows examples of its capabilities:


Optimization Techniques | Tested and Supported Packages | Optimization Applications
Local Search            | Machine Learning              | Feature Engineering
Global Search           | Deep Learning                 | Machine Learning
Population Methods      | Parallel Computing            | Deep Learning
Sequential Methods      |                               | Data Collection
                        |                               | Visualization
                        |                               | Miscellaneous

The examples above do not necessarily use realistic datasets or training procedures. Their purpose is fast execution of the proposed solution and giving the user ideas for interesting use cases.


Hyperactive is very easy to use:

Regular training:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_diabetes


# load_boston was removed in scikit-learn 1.2; load_diabetes is used instead
data = load_diabetes()
X, y = data.data, data.target


gbr = DecisionTreeRegressor(max_depth=10)
score = cross_val_score(gbr, X, y, cv=3).mean()







With Hyperactive:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_diabetes
from hyperactive import Hyperactive

# load_boston was removed in scikit-learn 1.2; load_diabetes is used instead
data = load_diabetes()
X, y = data.data, data.target

def model(opt):
    gbr = DecisionTreeRegressor(max_depth=opt["max_depth"])
    return cross_val_score(gbr, X, y, cv=3).mean()


search_space = {"max_depth": list(range(3, 25))}

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()

Installation

The most recent version of Hyperactive is available on PyPI:


pip install hyperactive

Example

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from hyperactive import Hyperactive

# load_boston was removed in scikit-learn 1.2; load_diabetes is used instead
data = load_diabetes()
X, y = data.data, data.target

# define the model in a function
def model(opt):
    # pass the suggested parameter to the machine learning model
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"]
    )
    scores = cross_val_score(gbr, X, y, cv=3)

    # return a single numerical value, which gets maximized
    return scores.mean()


# search space determines the ranges of parameters you want the optimizer to search through
search_space = {"n_estimators": list(range(10, 200, 5))}

# start the optimization run
hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()

Hyperactive API reference


Basic Usage

Hyperactive(verbosity, distribution, n_processes)
  • verbosity = ["progress_bar", "print_results", "print_times"]

    • Possible parameter types: (list, False)
    • The verbosity list determines what part of the optimization information will be printed in the command line.
  • distribution = "multiprocessing"

    • Possible parameter types: (str, dict, callable)

    • Access the parallel processing in three ways:

      • Via a str "multiprocessing" or "joblib" to choose one of the two.
      • Via a dictionary with one key "multiprocessing" or "joblib" and a value that is the input argument of Pool and Parallel. The default argument is a good example of this.
      • Via your own parallel processing function that will be used instead of those for multiprocessing and joblib. The wrapper function must work similarly to the following two functions:

      Multiprocessing:

      from multiprocessing import Pool


      def multiprocessing_wrapper(process_func, search_processes_paras, **kwargs):
          n_jobs = len(search_processes_paras)

          pool = Pool(n_jobs, **kwargs)
          results = pool.map(process_func, search_processes_paras)

          return results

      Joblib:

      from joblib import Parallel, delayed


      def joblib_wrapper(process_func, search_processes_paras, **kwargs):
          n_jobs = len(search_processes_paras)

          jobs = [
              delayed(process_func)(**info_dict)
              for info_dict in search_processes_paras
          ]
          results = Parallel(n_jobs=n_jobs, **kwargs)(jobs)

          return results
  • n_processes = "auto"

    • Possible parameter types: (str, int)
    • The maximum number of processes that are allowed to run simultaneously. If n_processes is an int, at most n_processes jobs run simultaneously instead of all at once. So if n_processes=10 and n_jobs_total=35, the schedule would look like this: 10 - 10 - 10 - 5. This saves computational resources if the total number of jobs is large. If "auto", n_processes is the sum of all n_jobs (from .add_search(...)).
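The schedule from the n_processes description can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic described above, not Hyperactive's actual scheduler, and the helper name batch_schedule is hypothetical.

```python
def batch_schedule(n_jobs_total, n_processes):
    """Return the number of jobs in each successive batch.

    Hypothetical helper: it only illustrates the arithmetic from the text
    and is not part of Hyperactive.
    """
    if n_processes == "auto":
        # "auto" means every job gets its own process, so there is one batch
        n_processes = n_jobs_total

    full_batches, remainder = divmod(n_jobs_total, n_processes)
    schedule = [n_processes] * full_batches
    if remainder:
        # the last batch holds whatever jobs are left over
        schedule.append(remainder)
    return schedule


print(batch_schedule(35, 10))      # [10, 10, 10, 5]
print(batch_schedule(35, "auto"))  # [35]
```

A smaller n_processes lengthens the schedule but lowers the peak number of simultaneous processes.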
.add_search(objective_function, search_space, n_iter, optimizer, n_jobs, initialize, max_score, early_stopping, random_state, memory, memory_warm_start, progress_board)
  • objective_function

    • Possible parameter types: (callable)
    • The objective function defines the optimization problem. The optimization algorithm will try to maximize the numerical value that is returned by the objective function by trying out different parameters from the search space.
  • search_space

    • Possible parameter types: (dict)
    • Defines the space where the optimization algorithm can search for the best parameters for the given objective function.
  • n_iter

    • Possible parameter types: (int)
    • The number of iterations that will be performed during the optimization run. Each iteration consists of an optimization step, which decides which parameter set is evaluated next, and an evaluation step, which runs the objective function with the chosen parameter set and returns the score.
  • optimizer = "default"

    • Possible parameter types: ("default", initialized optimizer object)

    • Instance of optimization class that can be imported from Hyperactive. "default" corresponds to the random search optimizer. The following classes can be imported and used:

      • HillClimbingOptimizer
      • StochasticHillClimbingOptimizer
      • RepulsingHillClimbingOptimizer
      • RandomSearchOptimizer
      • RandomRestartHillClimbingOptimizer
      • RandomAnnealingOptimizer
      • SimulatedAnnealingOptimizer
      • ParallelTemperingOptimizer
      • ParticleSwarmOptimizer
      • EvolutionStrategyOptimizer
      • BayesianOptimizer
      • TreeStructuredParzenEstimators
      • DecisionTreeOptimizer
      • EnsembleOptimizer
    • Example:

      ...
      
      opt_hco = HillClimbingOptimizer(epsilon=0.08)
      hyper = Hyperactive()
      hyper.add_search(..., optimizer=opt_hco)
      hyper.run()
      
      ...
  • n_jobs = 1

    • Possible parameter types: (int)
    • Number of jobs to run in parallel. Those jobs are optimization runs that work independently of one another (no information sharing). If n_jobs == -1, the maximum available number of CPU cores is used.
  • initialize = {"grid": 4, "random": 2, "vertices": 4}

    • Possible parameter types: (dict)

    • The initialization dictionary automatically determines a number of parameters that will be evaluated in the first n iterations (n is the sum of the values in initialize). The initialize keywords are the following:

      • grid

        • Initializes positions in a grid-like pattern. Positions that cannot be put into a grid are randomly positioned.
      • vertices

        • Initializes positions at the vertices of the search space. Positions that cannot be put into a new vertex are randomly positioned.
      • random

        • Number of randomly initialized positions.
      • warm_start

        • List of parameter dictionaries that marks additional start points for the optimization run.

      Example:

      ... 
      search_space = {
          "x1": list(range(10, 150, 5)),
          "x2": list(range(2, 12)),
      }
      
      ws1 = {"x1": 10, "x2": 2}
      ws2 = {"x1": 15, "x2": 10}
      
      hyper = Hyperactive()
      hyper.add_search(
          model,
          search_space,
          n_iter=30,
          initialize={"grid": 4, "random": 10, "vertices": 4, "warm_start": [ws1, ws2]},
      )
      hyper.run()
  • max_score = None

    • Possible parameter types: (float, None)
    • Maximum score until the optimization stops. The score will be checked after each completed iteration.
  • early_stopping = None

    • Possible parameter types: (dict, None)

    • Stops the optimization run early if it did not achieve any score improvement within the last iterations. The early_stopping parameter is a dictionary that accepts three keys:

      • n_iter_no_change: Required int parameter. This marks the last n iterations to look for an improvement over the iterations that came before n. If the best score of the entire run is within those last n iterations, the run will continue (until other stopping criteria are met); otherwise the run will stop.
      • tol_abs: Optional float parameter. The score must have improved by at least this absolute tolerance in the last n iterations over the best score in the iterations before n. This is an absolute value, so 0.1 means an improvement of 0.8 -> 0.9 is acceptable but 0.81 -> 0.9 would stop the run.
      • tol_rel: Optional float parameter. The score must have improved by at least this relative tolerance (in percent) in the last n iterations over the best score in the iterations before n. This is a relative value, so 10 means an improvement of 0.8 -> 0.88 is acceptable but 0.8 -> 0.87 would stop the run.
  • random_state = None

    • Possible parameter types: (int, None)

    • Random state for random processes in the random, numpy and scipy modules.

  • memory = True

    • Possible parameter types: (bool, "share")
    • Whether or not to use the "memory"-feature. The memory is a dictionary, which gets filled with parameters and scores during the optimization run. If the optimizer encounters a parameter that is already in the dictionary it just extracts the score instead of reevaluating the objective function (which can take a long time). If memory is set to "share" and there are multiple jobs for the same objective function then the memory dictionary is automatically shared between the different processes.
  • memory_warm_start = None

    • Possible parameter types: (pandas dataframe, None)

    • Pandas dataframe that contains score and parameter information that will be automatically loaded into the memory-dictionary.

      example:

      score   x1    x2    x...
      0.756   0.1   0.2   ...
      0.823   0.3   0.1   ...
      ...     ...   ...   ...
      ...     ...   ...   ...
  • progress_board = None

    • Possible parameter types: (initialized ProgressBoard object, None)
    • Initialize the ProgressBoard class and pass the object to the progress_board-parameter.
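The early_stopping rule described in the parameter list above can be written out as a small check on the score history. This is a sketch of one reading of that description, not Hyperactive's implementation; the function name should_stop is hypothetical, n_iter_no_change is assumed to be at least 1, and the use of math.isclose to absorb floating-point rounding is my own assumption.

```python
import math


def should_stop(scores, n_iter_no_change, tol_abs=None, tol_rel=None):
    """Return True if the run should stop early (illustrative sketch only)."""
    if len(scores) <= n_iter_no_change:
        # not enough history to compare the last n iterations with earlier ones
        return False

    best_before = max(scores[:-n_iter_no_change])
    best_recent = max(scores[-n_iter_no_change:])
    improvement = best_recent - best_before

    if improvement <= 0:
        # the best score of the run is not within the last n iterations
        return True

    def reaches(value, tol):
        # isclose absorbs rounding noise, e.g. 0.9 - 0.8 is not exactly 0.1
        return value > tol or math.isclose(value, tol)

    if tol_abs is not None and not reaches(improvement, tol_abs):
        return True

    if tol_rel is not None and best_before != 0:
        # relative improvement in percent of the earlier best score
        relative = improvement / abs(best_before) * 100
        if not reaches(relative, tol_rel):
            return True

    return False


print(should_stop([0.5, 0.8, 0.9], n_iter_no_change=1, tol_abs=0.1))   # False
print(should_stop([0.5, 0.81, 0.9], n_iter_no_change=1, tol_abs=0.1))  # True
print(should_stop([0.5, 0.8, 0.88], n_iter_no_change=1, tol_rel=10))   # False
print(should_stop([0.5, 0.8, 0.87], n_iter_no_change=1, tol_rel=10))   # True
```

The four calls mirror the tol_abs and tol_rel examples given in the parameter description.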
.run(max_time)
  • max_time = None
    • Possible parameter types: (float, None)
    • Maximum number of seconds until the optimization stops. The time will be checked after each completed iteration.

Special Parameters

Objective Function

Each iteration consists of two steps:

  • The optimization step: decides what position in the search space (parameter set) to evaluate next
  • The evaluation step: calls the objective function, which returns the score for the given position in the search space
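The two steps can be sketched as a plain random search loop. This is a simplified stand-in written for illustration, not Hyperactive's code: the function random_search and its seed argument are hypothetical, and the memory dictionary only imitates the idea behind the memory parameter described in the API reference.

```python
import random


def objective_function(opt):
    # toy objective: the maximum is at x1 = 0
    return -(opt["x1"] ** 2)


search_space = {"x1": list(range(-10, 11))}


def random_search(objective_function, search_space, n_iter, seed=0):
    rng = random.Random(seed)
    memory = {}  # maps already evaluated positions to their scores
    best_para, best_score = None, float("-inf")

    for _ in range(n_iter):
        # optimization step: decide which position to evaluate next
        para = {name: rng.choice(values) for name, values in search_space.items()}

        # evaluation step: score the position, reusing a known score if possible
        key = tuple(sorted(para.items()))
        if key not in memory:
            memory[key] = objective_function(para)
        score = memory[key]

        if score > best_score:
            best_para, best_score = para, score

    return best_para, best_score


best_para, best_score = random_search(objective_function, search_space, n_iter=50)
print(best_para, best_score)
```

Other optimizers differ only in the optimization step: they use the scores seen so far to pick the next position instead of choosing it at random.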

The objective function has one argument that is often called "para", "params" or "opt". This argument is your access to the parameter set that the optimizer has selected in the corresponding iteration.

def objective_function(opt):
    # get x1 and x2 from the argument "opt"
    x1 = opt["x1"]
    x2 = opt["x2"]

    # calculate the score with the parameter set
    score = -(x1 * x1 + x2 * x2)

    # return the score
    return score

The objective function always needs a score, which shows how "good" or "bad" the current parameter set is. But you can also return some additional information with a dictionary:

def objective_function(opt):
    x1 = opt["x1"]
    x2 = opt["x2"]

    score = -(x1 * x1 + x2 * x2)

    other_info = {
      "x1 squared" : x1**2,
      "x2 squared" : x2**2,
    }

    return score, other_info

When you look at the results (a pandas dataframe with all iteration information) after the run has ended, you will see the additional information in it. A dictionary is required because Hyperactive needs to know the names of the additional parameters. The score does not need a name, because it is always called "score" in the results. You can run this example script if you want to give it a try.
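To see why the names matter, here is a rough sketch of how one row of such a results table could be assembled from the parameter set and the return value of the objective function. The helper to_result_row is hypothetical and uses plain dictionaries instead of a pandas dataframe; it is not how Hyperactive builds its results internally.

```python
def objective_function(opt):
    x1, x2 = opt["x1"], opt["x2"]
    score = -(x1 * x1 + x2 * x2)
    other_info = {"x1 squared": x1**2, "x2 squared": x2**2}
    return score, other_info


def to_result_row(para, returned):
    """Hypothetical helper: merge parameters, score and extra info into one row."""
    if isinstance(returned, tuple):
        score, other_info = returned
    else:
        # the objective function returned only a score
        score, other_info = returned, {}
    # the dictionary keys become the column names of the row
    return {"score": score, **para, **other_info}


para = {"x1": 2, "x2": 3}
row = to_result_row(para, objective_function(para))
print(row)
# {'score': -13, 'x1': 2, 'x2': 3, 'x1 squared': 4, 'x2 squared': 9}
```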

Search Space Dictionary

The search space defines what values the optimizer can select during the search. These selected values will be inside the objective function argument and can be accessed like in a dictionary. The values in each search space dimension should always be in a list. If you use np.arange you should put it in a list afterwards:

import numpy as np

search_space = {
    "x1": list(np.arange(-100, 101, 1)),
    "x2": list(np.arange(-100, 101, 1)),
}

A special feature of Hyperactive is shown in the next example. Search space dimensions can contain not just numeric values, but also strings and functions. This gives you a lot of flexibility in how you set up your studies.

def func1():
  # do stuff
  return stuff
  

def func2():
  # do stuff
  return stuff


search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "str": ["a string", "another string"],
    "function" : [func1, func2],
}

If you want to put other types of variables (like numpy arrays, pandas dataframes, lists, ...) into the search space you can do that via functions:

def array1():
    return np.array([0, 1, 2])


def array2():
    return np.array([3, 4, 5])


search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "str": ["a string", "another string"],
    "numpy_array" : [array1, array2],
}

Each function contains a numpy array and returns it. Inside the objective function, call the selected function (e.g. opt["numpy_array"]()) to get the array.
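The following sketch shows how such a function-valued dimension is used inside the objective function. To stay free of dependencies it uses plain lists instead of numpy arrays, and a random choice stands in for the optimizer; the names small_data and large_data are made up for this illustration.

```python
import random


def small_data():
    return [0, 1, 2]


def large_data():
    return [0, 1, 2, 3, 4, 5]


search_space = {
    "x": list(range(-100, 101)),
    "data": [small_data, large_data],
}


def objective_function(opt):
    # opt["data"] is the selected function; call it to obtain the object
    data = opt["data"]()
    return -abs(opt["x"]) + len(data)


# stand-in for the optimizer: pick one value per search space dimension
rng = random.Random(0)
opt = {name: rng.choice(values) for name, values in search_space.items()}
print(opt["data"].__name__, objective_function(opt))
```

Because only the function object sits in the search space, the wrapped data is created when the objective function calls it.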

Optimizer Classes

Each of the following optimizer classes can be initialized and passed to the "add_search" method via the "optimizer" argument. During this initialization the optimizer class accepts additional parameters. You can read more about each optimization strategy and its parameters in the Optimization Tutorial.

  • HillClimbingOptimizer
  • RepulsingHillClimbingOptimizer
  • SimulatedAnnealingOptimizer
  • RandomSearchOptimizer
  • RandomRestartHillClimbingOptimizer
  • RandomAnnealingOptimizer
  • ParallelTemperingOptimizer
  • ParticleSwarmOptimizer
  • EvolutionStrategyOptimizer
  • BayesianOptimizer
  • TreeStructuredParzenEstimators
  • DecisionTreeOptimizer
Progress Board

The progress board enables the visualization of search data during the optimization run. This will help you to understand what is happening during the optimization and give an overview of the explored parameter sets and scores.

  • filter_file
    • Possible parameter types: (None, True)
    • If the filter_file parameter is True, Hyperactive will create a file in the current directory that allows filtering parameters or the score by setting an upper or lower bound.

The following script provides an example:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes

from hyperactive import Hyperactive
# import the ProgressBoard
from hyperactive.dashboards import ProgressBoard

# load_boston was removed in scikit-learn 1.2; load_diabetes is used instead
data = load_diabetes()
X, y = data.data, data.target


def model(opt):
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
        min_samples_split=opt["min_samples_split"],
    )
    scores = cross_val_score(gbr, X, y, cv=3)

    return scores.mean()


search_space = {
    "n_estimators": list(range(50, 150, 5)),
    "max_depth": list(range(2, 12)),
    "min_samples_split": list(range(2, 22)),
}

# create an instance of the ProgressBoard
progress_board = ProgressBoard()

hyper = Hyperactive()

# pass the instance of the ProgressBoard to .add_search(...)
hyper.add_search(
    model,
    search_space,
    n_iter=120,
    progress_board=progress_board,
)

# a terminal will open, which opens a dashboard in your browser
hyper.run()

Result Attributes

.best_para(objective_function)
  • objective_function

    • (callable)
  • returns: dictionary

  • Parameter dictionary of the best score of the given objective_function found in the previous optimization run.

    example:

    {
      'x1': 0.2, 
      'x2': 0.3,
    }
.best_score(objective_function)
  • objective_function
    • (callable)
  • returns: int or float
  • Numerical value of the best score of the given objective_function found in the previous optimization run.
.results(objective_function)
  • objective_function

    • (callable)
  • returns: Pandas dataframe

  • The dataframe contains score, parameter information, iteration times and evaluation times of the given objective_function found in the previous optimization run.

    example:

    score   x1    x2    x...   eval_times   iter_times
    0.756   0.1   0.2   ...    0.953        1.123
    0.823   0.3   0.1   ...    0.948        1.101
    ...     ...   ...   ...    ...          ...
    ...     ...   ...   ...    ...          ...

Roadmap

v2.0.0 ✔️
  • Change API
v2.1.0 ✔️
  • Save memory of evaluations for later runs (long term memory)
  • Warm start sequence based optimizers with long term memory
  • Gaussian process regressors from various packages (gpy, sklearn, GPflow, ...) via wrapper
v2.2.0 ✔️
  • Add basic dataset meta-features to long term memory
  • Add helper-functions for memory
    • connect two different model/dataset hashes
    • split two different model/dataset hashes
    • delete memory of model/dataset
    • return best known model for dataset
    • return search space for best model
    • return best parameter for best model
v2.3.0 ✔️
  • Tree-structured Parzen Estimator
  • Decision Tree Optimizer
  • Add "max_sample_size" and "skip_retrain" parameters for SMBO (sequential model-based optimization) to decrease optimization time
v3.0.0 ✔️
  • New API
    • expand usage of objective-function
    • No passing of training data into Hyperactive
    • Removing "long term memory"-support (better to do in separate package)
    • More intuitive selection of optimization strategies and parameters
    • Separate optimization algorithms into other package
    • expand api so that optimizer parameter can be changed at runtime
    • add extensive testing procedure (similar to Gradient-Free-Optimizers)
v3.1.0 ✔️
  • Decouple number of runs from active processes (Thanks to PartiallyTyped)
v3.2.0 ✔️
  • Dashboard for visualization of search-data at runtime via streamlit (Progress-Board)
v3.3.0 ✔️
  • Early stopping
  • Shared memory dictionary between processes with the same objective function
Upcoming Features
  • "long term memory" for search-data storage and usage
  • Data collector tool to use inside the objective function
  • Dashboard for visualization of stored search-data
  • Data collector tool to store data (from inside the objective function) into csv- or sql-files

Experimental algorithms

The following algorithms are of my own design and, to my knowledge, do not yet exist in the technical literature. If any of these algorithms already exists, please let me know by opening an issue.

Random Annealing

A combination of simulated annealing and random search.


FAQ

Known Errors + Solutions

Read this before opening a bug-issue
  • Are you sure the bug is located in Hyperactive?

    The error might be located in the optimization backend. Look at the error message from the command line. If one of the last messages looks like this:

    • File "/.../gradient_free_optimizers/...", line ...

    Then you should post the bug report in Gradient-Free-Optimizers.

    Otherwise you can post the bug report in Hyperactive.

  • Do you have the correct Hyperactive version?

    The API of Hyperactive changes with every major version update (e.g. v2.2 -> v3.0). Check which version of Hyperactive you have. If your major version is older, you have two options:

    Recommended: You could just update your Hyperactive version with:

    pip install hyperactive --upgrade

    This way you can use all the new documentation and examples from the current repository.

    Or you could continue using the old version and use an old repository branch as documentation. You can do that by selecting the corresponding branch at the top right of the repository (the default is "master" or "main"). So if your major version is older (e.g. v2.1.0), you can select the 2.x.x branch to get the old repository for that version.

MemoryError: Unable to allocate ... for an array with shape (...)

This is expected with the current implementation of sequential model-based optimizers (SMBO). For all sequential model-based algorithms you have to keep an eye on the search space size:

search_space_size = 1
for value_ in search_space.values():
    search_space_size *= len(value_)
    
print("search_space_size", search_space_size)

Reduce the search space size to resolve this error.
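One way to reduce the size is to use a coarser step in each dimension. The sketch below compares two made-up search spaces; the dimension names and ranges are arbitrary examples, and math.prod requires Python 3.8 or newer.

```python
import math


def search_space_size(search_space):
    # same calculation as above: product of the lengths of all dimensions
    return math.prod(len(values) for values in search_space.values())


fine = {
    "x1": list(range(0, 1000)),
    "x2": list(range(0, 1000)),
    "x3": list(range(0, 100)),
}

# same ranges, but with a coarser step in every dimension
coarse = {
    "x1": list(range(0, 1000, 10)),
    "x2": list(range(0, 1000, 10)),
    "x3": list(range(0, 100, 5)),
}

print(search_space_size(fine))    # 100000000
print(search_space_size(coarse))  # 200000
```

Because the sizes of the dimensions multiply, thinning out each dimension a little shrinks the total search space considerably.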

TypeError: cannot pickle '_thread.RLock' object

Setting distribution to "joblib" may fix this problem:

hyper = Hyperactive(distribution="joblib")
Command line full of warnings

These are very often warnings from sklearn or numpy. They do not indicate bad performance of Hyperactive, and your code will most likely run fine. Those warnings are very difficult to silence.

Put this at the very top of your script:

def warn(*args, **kwargs):
    pass


import warnings

warnings.warn = warn


Citing Hyperactive

@Misc{hyperactive2021,
  author =   {{Simon Blanke}},
  title =    {{Hyperactive}: An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.},
  howpublished = {\url{https://github.com/SimonBlanke}},
  year = {since 2019}
}

License

LICENSE

Comments
  • Printing of Results from Runs - Preference to be able to provide additional parameters to objective function not through search space

    Is your feature request related to a problem? Please describe. Yes, when using the "print results" parameter for verbosity within the Hyperactive initialization the parameter set printed includes all of the parameters used. In my case, one of the parameters within the search space is the dataframe that I am passing to the objective_function. I can't find any other way to pass the dataframe that the objective function is being performed on without including it in the search space.

    Describe the solution you'd like

    1. Either the ability to edit which parameters will be printed from the print_results within the parameter set or
    2. The ability to pass extra parameters to the objective function without including them in the search space (Preferred) This could perhaps be done through the "initialize" parameter if it was opened up to more arguments than grid, vertices, and random, perhaps **kwargs so that any user parameters could be added. As it is written now, if you were to add an additional parameter to add_search, it might complicate things rather than just having the optimizer only look within search_space, you would have to make that change everywhere and for each optimizer which is a ton of work, but if you were to add it for initialize, it may mean less changes.

    Describe alternatives you've considered I have considered not printing the results because the dataframe printed makes my results not as clean in the console.

    Additional context If I do not include the dataframe in the search parameters then I can't run my objective_function. But if I do, I can't include memory = True because the fact that my dataframe is now something stored means that it would consume a ton of memory very quickly.

    enhancement 
    opened by mlittmanabbvie 20
  • TypeError: unsupported operand type(s) for -: 'function' and 'function'

    Look into the FAQ of the readme. Can the bug be resolved by one of those solutions? Not in the FAQs

    Describe the bug

    TypeError: unsupported operand type(s) for -: 'function' and 'function'

    Code to reproduce the behavior

    '''
      from hyperactive import Hyperactive
      from hyperactive import RepulsingHillClimbingOptimizer, RandomAnnealingOptimizer

      class Parameters:
          def __init__(self):
              self.x = 5
              
      finp = Parameters()
      def ret_df():
          return df
      
      def func_minl(opts):
          return opts['slope'] + opts['exp']
        
      h = Hyperactive(["progress_bar", "print_results", "print_times"])
      search_space = {'exp':list(range(0, 5)),
                     'slope': list(np.arange(.001,10,step = .05)),
                     'freq_mult':list(np.arange(1,2.5,.005)),
                     'clust':[5],
                      'df': [ret_df],
                      'finp': [finp],
                      'asc': [False],
                      'use_pca':[False],
                      'last':[False],
                      'disc_type':['type']
                      }
      h.add_search(func_minl, search_space = search_space, n_iter = 10, optimizer = 
          RepulsingHillClimbingOptimizer(epsilon=0.05,
          distribution="normal",n_neighbours=3,rand_rest_p=0.03,repulsion_factor=3), n_jobs = 1, max_score = None,initialize = 
          {'warm_start' : [{'exp':2,'slope':5,'freq_mult':1.5,'clust':5,
                      'df': ret_df,
                      'finp': finp,
                      'asc': False,
                      'use_pca':False,
                      'last':False,
                      'disc_type':'type'
                      }]}, early_stopping = {'tol_rel':1, 'n_iter_no_change':3},random_state = 0, memory= True, memory_warm_start = None)
    

    '''

    Error message from command line

    When adding a dataframe as a parameter in the search space, by using a function as mentioned in the documentation, I am receiving an error:

    --> 973 h.add_search(func_minl, search_space = search_space, n_iter = maxiter, optimizer = RepulsingHillClimbingOptimizer(epsilon=0.05,
        974 distribution="normal",
        975 n_neighbours=3,

    ~\Anaconda3\lib\site-packages\hyperactive\hyperactive.py in add_search(self, objective_function, search_space, n_iter, search_id, optimizer, n_jobs, initialize, max_score, early_stopping, random_state, memory, memory_warm_start, progress_board)
        148         self.check_list(search_space)
        149
    --> 150         optimizer.init(search_space, initialize, progress_collector)
        151
        152         self._add_search_processes(

    ~\Anaconda3\lib\site-packages\hyperactive\optimizers\gfo_wrapper.py in init(self, search_space, initialize, progress_collector)
         76         self.trafo = HyperGradientTrafo(search_space)
         77
    ---> 78         initialize = self.trafo.trafo_initialize(initialize)
         79         search_space_positions = self.trafo.search_space_positions
         80

    ~\Anaconda3\lib\site-packages\hyperactive\optimizers\hyper_gradient_trafo.py in trafo_initialize(self, initialize)
        113         for warm_start_ in warm_start:
        114             value = self.para2value(warm_start_)
    --> 115             position = self.value2position(value)
        116             pos_para = self.value2para(position)
        117

    ~\Anaconda3\lib\site-packages\hyperactive\optimizers\hyper_gradient_trafo.py in value2position(self, value)
         16         position = []
         17         for n, space_dim in enumerate(self.search_space_values):
    ---> 18             pos = np.abs(value[n] - np.array(space_dim)).argmin()
         19             position.append(int(pos))
         20

    TypeError: unsupported operand type(s) for -: 'function' and 'function'

    System information:

    • OS Platform and Distribution
    • Windows 10
    • Python version 3.8.8
    • Hyperactive version 3.3.2

    Additional context

    bug 
    opened by mlittmanabbvie 12
  • para2value - 'NoneType' object is not subscriptable

    Describe the bug In some optimizations, I get this error. I can run the same optimization. Sometimes it works perfectly and other times it suddenly throws this. Couldn't retrace why it's None.

    Error message from command line

    ============================== EXCEPTION TRACEBACK:
      File "/usr/local/bin/jesse", line 33, in <module>
        sys.exit(load_entry_point('jesse', 'console_scripts', 'jesse')())
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/home/home/src/jesse/jesse/__init__.py", line 379, in optimize_hyperactive
        optimize_mode_hyperactive(start_date, finish_date, optimal_total, cpu, optimizer, iterations)
      File "/home/home/src/jesse/jesse/modes/optimize_hyperactive_mode/__init__.py", line 307, in optimize_mode_hyperactive
        optimizer.run()
      File "/home/home/src/jesse/jesse/modes/optimize_hyperactive_mode/__init__.py", line 268, in run
        hyper.run()
      File "/usr/local/lib/python3.9/site-packages/hyperactive/hyperactive.py", line 201, in run
        self.results_list = run_search(self.process_infos, self.distribution)
      File "/usr/local/lib/python3.9/site-packages/hyperactive/run_search.py", line 42, in run_search
        results_list = single_process(_process_, process_infos)
      File "/usr/local/lib/python3.9/site-packages/hyperactive/distribution.py", line 10, in single_process
        results = [process_func(**search_processes_infos[0])]
      File "/usr/local/lib/python3.9/site-packages/hyperactive/process.py", line 25, in _process_
        optimizer.search(
      File "/usr/local/lib/python3.9/site-packages/hyperactive/optimizers.py", line 167, in search
        self._convert_results2hyper()
      File "/usr/local/lib/python3.9/site-packages/hyperactive/optimizers.py", line 80, in _convert_results2hyper
        value = self.trafo.para2value(self.optimizer.best_para)
      File "/usr/local/lib/python3.9/site-packages/hyperactive/hyper_gradient_trafo.py", line 52, in para2value
        value.append(para[para_name])
    =========================================================================
    
     Uncaught Exception: TypeError: 'NoneType' object is not subscriptable
    

    System information:

    • OS Platform and Distribution - Ubuntu (Docker)
    • Python version 3.8
    • Hyperactive version 3.0.5.1
    bug 
    opened by cryptocoinserver 12
  • Population size

    Look into the FAQ of the readme. Can the bug be resolved by one of those solutions?

    Describe the bug

    Code to reproduce the behavior

    Error message from command line

    System information:

    • OS Platform and Distribution
    • Python version
    • Hyperactive version

    Additional context Dear Simon, I have a question regarding the population size of PSO: search_space, initialize={"grid": 4, "random": 2, "vertices": 4}, population=10, inertia=0.5, cognitive_weight=0.5, social_weight=0.5, temp_weight=0.2, rand_rest_p=0.03, ) I cannot increase the population size; it is automatically reduced to 10 when I set it to 20 or greater. I tried to modify the file search.py:

    if random_state is None:
        random_state = np.random.randint(0, high=2 ** 32 - 2, dtype=np.int64)
    

    but this did not change anything. Could you help me debug this issue? I would like to investigate the effect of the population size on the run time and the cost function of my problem. Thank you for your help.

    bug 
    opened by vanquanTRAN 8
  • Data type

    Data type

    Hello, and thanks for this great library. In CNN workflows we usually use ImageDataGenerator, which keeps the data and labels together in a DataFrame, so there is no separate x and y. The same holds for the training, validation and test sets. Can this library support that, or do we need separate data and labels as in the examples? Support for the DataFrameIterator type (the output of ImageDataGenerator) would be amazing.

    question 
    opened by aminamani10 7
  • information displayed in the optimization process

    information displayed in the optimization process

    Thank you for sharing your code and for your extraordinary development work. I am trying to modify your code to show the best score at each iteration, which would let us plot n_iter versus best score: print('Iteration {}: Best Cost = {}'.format(best_iter, best_score)). But I cannot get it to work. Could you help me with this issue? Thank you for your help.
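
    One way to get an iteration-versus-best-score curve without touching the optimizer internals is to post-process the scores in evaluation order. The sketch below is a minimal illustration in plain Python; `scores` is a made-up list standing in for the per-iteration scores of a finished run, not output from Hyperactive.

```python
from itertools import accumulate

# Hypothetical per-iteration scores (higher is better), in evaluation order.
scores = [0.61, 0.58, 0.70, 0.66, 0.73, 0.71]

# Running maximum: the best score seen up to and including each iteration.
best_so_far = list(accumulate(scores, max))

for iteration, best in enumerate(best_so_far, start=1):
    print("Iteration {}: Best Cost = {}".format(iteration, best))
```

    The resulting `best_so_far` list can be passed directly to a plotting library as the y-values, with the iteration index as x.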

    enhancement 
    opened by vanquanTRAN 7
  • Question of Particle Swarm Optimizer

    Question of Particle Swarm Optimizer

    Hello, I have a question about the Particle Swarm Optimization for hyperparameter optimization tasks. My understanding is that the number of particles multiplied by the number of iterations equals the total number of models run, but I can't change the number of particles. I saw that the default number of particles for PSO is 10, but when I set the number of iterations to 5, the model only runs 5 times, not 50. Also, I saw that I can add the 'population' parameter to change the number of particles, but it didn't work when I tried it. Here is my code, is there a problem?

     def cnn(params):
        nn = tf.keras.models.Sequential(
            [
                tf.keras.layers.Conv1D(filters=params["filters1"],kernel_size=params["kernel_size1"],activation=params["activation"],input_shape=cnn_train_data.shape[1:],padding="same"),
                tf.keras.layers.MaxPooling1D(pool_size=params["pool_size1"]),
                tf.keras.layers.Conv1D(filters=params["filters2"],kernel_size=params["kernel_size2"],activation=params["activation"],padding="same"),
                tf.keras.layers.MaxPooling1D(pool_size=params["pool_size2"]),
                tf.keras.layers.LSTM(units=params["units1"]),
                tf.keras.layers.Dense(units=params["units2"],activation=params["activation"]),
                tf.keras.layers.Dropout(rate=params["rate"]),
                tf.keras.layers.Dense(label.max() + 1, activation='softmax'),
            ]
        )
    
        nn.compile(optimizer=params["optimizer"], loss='categorical_crossentropy', metrics=["accuracy"])
        nn.fit(cnn_train_data, dnn_train_label, epochs=params["epochs"],batch_size=params["batch_size"])
        _, score = nn.evaluate(x=cnn_test_data, y=dnn_test_label)
    
        return score
    
    
    search_space = {
        "optimizer": ["adam","sgd"],
        "batch_size": list(range(100,1000)),
        "epochs" :list(range(10, 100)),
        "activation": ["sigmoid", "relu", "tanh"],
        "filters1": list(range(4, 256)),
        "pool_size1": list(range(1, 4)),
        "kernel_size1": list(range(1, 7)),
        "filters2": list(range(4, 256)),
        "pool_size2": list(range(1, 4)),
        "kernel_size2": list(range(1, 7)),
        "units1": list(range(1,256)),
        "units2": list(range(1,128)),
        "rate": [0.1,0.2,0.3,0.4],
    }
    
    optimizer = ParticleSwarmOptimizer(
        inertia=0.4,
        cognitive_weight=2.0,
        social_weight=2.0,
        temp_weight=0.3,
        rand_rest_p=0.05,
        population=5,
    )
    
    
    hyper = Hyperactive()
    hyper.add_search(cnn, search_space, optimizer=optimizer,n_iter=10)
    hyper.run()
    
    question 
    opened by a7258258 6
  • New feature: Save search-data during optimization run

    New feature: Save search-data during optimization run

    The user @DavidFricker asked for a way to save search-data during an ongoing optimization run in the issue SimonBlanke/Hyperactive#12. This is a feature I would like to implement as a helper-class in Hyperactive.
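
    Until such a helper exists, one possible approach is to wrap the objective function so that every evaluation is appended to a CSV file as it happens. The following is only a sketch under assumptions: `log_evaluations` is a hypothetical helper, not part of Hyperactive, and the parameter set is a plain dict standing in for the `opt` argument.

```python
import csv
import os
import tempfile


def log_evaluations(objective, csv_path, fieldnames):
    """Wrap `objective` so each call appends its parameters and score to a CSV file."""

    def wrapped(opt):
        score = objective(opt)
        # Write the header only when the file is created.
        write_header = not os.path.isfile(csv_path)
        with open(csv_path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(fieldnames) + ["score"])
            if write_header:
                writer.writeheader()
            row = {name: opt[name] for name in fieldnames}
            row["score"] = score
            writer.writerow(row)
        return score

    return wrapped


def parabola(opt):
    return -(opt["x"] ** 2)


csv_path = os.path.join(tempfile.mkdtemp(), "search_data.csv")
logged_parabola = log_evaluations(parabola, csv_path, fieldnames=["x"])

# Simulate three evaluations; the file grows after every call.
for x in (-2, 0, 3):
    logged_parabola({"x": x})

with open(csv_path, newline="") as f:
    rows = list(csv.DictReader(f))
```

    Because the file is flushed on every call, the data survives an interrupted run. The wrapper is a closure, so it has the same serialization limits as a lambda when several processes are used, and concurrent appends from parallel searches would need a separate file per process or a lock.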

    enhancement help wanted 
    opened by SimonBlanke 6
  • Feature: Passing extra parameters to the optimization function

    Feature: Passing extra parameters to the optimization function

    Is your feature request related to a problem? Please describe. There are situations in which the optimization function is governed by different external parameters - e.g. if the optimization score is calculated as s = alpha * x + beta, where depending on the initial conditions, alpha and beta are different, it can become handy to be able to pass those as either values of the optimization function or of the opt input variable.

    Describe alternatives you've considered Right now this can be done by creating a lambda function depending on the initial condition that wraps the hyperactive optimization call - e.g. optim_func = lambda opt: true_optim_func(opt, alpha=1, beta=external_var), and changing the lambda dynamically. This, however, does not work if we want to use n_jobs!=1, as mp.Pool cannot serialize the lambda function.

    Describe the solution you'd like

    • Use multiprocess, dill, pathos or the like to allow serialization of the method and allowing to choose between a process vs thread model.
    • Allow passing additional arguments to the optimization function.

    Additional context This would even allow extra functionality like optimizing symbolic functions that are externally referenced.
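
    As an illustration of the serialization point, the lambda workaround can often be replaced by `functools.partial` around a module-level function, which the standard `pickle` module can serialize while a lambda cannot. This is a minimal sketch using the names from the description above; whether Hyperactive's own parallel code path accepts a partial has not been verified here.

```python
from functools import partial


def true_optim_func(opt, alpha, beta):
    # Score s = alpha * x + beta, with x taken from the sampled parameters.
    return alpha * opt["x"] + beta


external_var = 0.5  # stand-in for a value that is only known at runtime

# Bind the external parameters; the result takes only `opt`, like the lambda did.
optim_func = partial(true_optim_func, alpha=1, beta=external_var)

score = optim_func({"x": 2.0})
```

    Changing the initial conditions then means building a new partial rather than redefining a lambda.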

    edit: assessing if this type of alternative works.

    enhancement 
    opened by 23pointsNorth 5
  • No score variance over 50 iterations despite multiple parameters switched

    No score variance over 50 iterations despite multiple parameters switched

    I attempted to tune an XGBoost binary classifier on tf-idf data, adjusting n_estimators, max_depth, and learning_rate, and there was zero variation in the score across all 50 iterations. When I tweak the parameters by hand and run a single training instance, the score does vary. Note: I have also tried this with the default optimizer for 20 iterations and different ranges for the parameter tuning, and it gave me the same results: the score is always 0.6590446358653093.

    SYSTEM DETAILS: Amazon SageMaker Hyperactive ver: 3.0.5.1 Python ver: 3.6.13

    Here is my code:

    freq_df, y_labels = jc.prep_train_data('raw_data.pkl', remove_stopwords=False)
    
    def model(opt):
        clf_xgb = xgb.XGBClassifier(objective='binary:logistic',
                                #eta=0.4,
                                #max_depth=8,
                                subsample=0.5,
                                base_score=np.mean(y_labels),
                                eval_metric = 'logloss',
                                missing=None,
                                use_label_encoder=False,
                                seed=42)
        
        scores = cross_val_score(clf_xgb, freq_df, y_labels, cv=5) # default is 5, hyperactive example is 3
    
        return scores.mean()
    
    # Configure the range of hyperparameters we want to test out
    search_space = {
        "n_estimators": list(range(500, 5000, 100)),
        "max_depth": list(range(6, 12)),
        "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7],
    }
    
    # Configure the optimizer
    optimizer = SimulatedAnnealingOptimizer(
        epsilon=0.1,
        distribution="laplace",
        n_neighbours=4,
        rand_rest_p=0.1,
        p_accept=0.15,
        norm_factor="adaptive",
        annealing_rate=0.999,
        start_temp=0.8)
    
    # Execute optimization
    hyper = Hyperactive()
    hyper.add_search(model, search_space, n_iter=50, optimizer=optimizer)
    hyper.run()
    
    # Print-out the results and save them to a dataframe
    results_filename = "xgboost_hyperactive_results.csv"
    
    search_data = hyper.results(model)
    search_data.to_csv(results_filename, index=0)
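
    One thing that stands out in the snippet above is that model(opt) never reads from opt: n_estimators, max_depth and learning_rate are sampled from the search space but are not passed to XGBClassifier, so every iteration trains the same classifier. The toy sketch below (plain Python, no XGBoost; both objective functions are made up for illustration) shows the difference between an objective that ignores opt and one that uses it.

```python
def constant_objective(opt):
    # Ignores `opt`, like the snippet above: same score for every candidate.
    return 0.5


def dependent_objective(opt):
    # Reads the sampled values, so the score changes with the parameters.
    return -((opt["max_depth"] - 8) ** 2) - abs(opt["learning_rate"] - 0.3)


candidates = [
    {"max_depth": depth, "learning_rate": rate}
    for depth in (6, 8, 11)
    for rate in (0.1, 0.3)
]

constant_scores = {constant_objective(c) for c in candidates}
dependent_scores = {dependent_objective(c) for c in candidates}

print(len(constant_scores), "distinct score(s) when opt is ignored")
print(len(dependent_scores), "distinct score(s) when opt is used")
```

    In the XGBoost case this would presumably mean passing values such as opt["n_estimators"], opt["max_depth"] and opt["learning_rate"] as keyword arguments to the classifier.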
    
    question 
    opened by suciokhan 5
  • Decouple number of runs from active processes

    Decouple number of runs from active processes

    Hi Simon. This is what I had in mind: basically, it allows joblib or multiprocessing to use n_jobs processes to execute n_run "jobs". Note that it is a breaking change.

    Solves:

    Using n_jobs to get a large number of runs results in spawning too many processes which end up filling the memory and cause thrashing. The patch limits the number of active processes to n_jobs while still allowing hyperactive to execute n_runs times the optimization process.
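
    The idea of decoupling the two numbers can be illustrated with a fixed-size worker pool. The sketch below is not the patch itself: it uses threads from `concurrent.futures` as a stand-in for the joblib or multiprocessing workers, and `run_once` is a hypothetical placeholder for one optimization run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_once(run_id):
    # Placeholder for a single optimization run; returns a dummy result.
    return run_id * run_id


n_jobs = 2  # maximum number of workers active at the same time
n_runs = 8  # total number of runs to execute

# At most `n_jobs` runs are in flight; the remaining runs wait in the queue.
with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    results = list(pool.map(run_once, range(n_runs)))
```

    With this structure, memory use scales with n_jobs rather than with n_runs, which is the effect the description above is after.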

    opened by PartiallyTyped 5
  • Progress Bar visual error when running in parallel

    Progress Bar visual error when running in parallel

    The tqdm-based progress-bar shows visual errors when running in parallel (with multiprocessing pool). It appears as if new progress-bars are created but not finished. I am not sure if this needs to be fixed in Hyperactive or tqdm.

    This problem seems to be related to the following issues tqdm/tqdm#811, tqdm/tqdm#285

    I tried a few things to solve this but nothing worked:

    Passing a lock to multiprocessing Pool:

    Pool(initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),))
    

    If I initialize the progress-bars before starting the multiprocessing by passing them to the parallel function I get the following error:

    TypeError: cannot pickle '_io.TextIOWrapper' object
    
    bug 
    opened by SimonBlanke 1
  • Change Optimization parameters at runtime

    Change Optimization parameters at runtime

    This adds a way to change the parameters of the optimization algorithms during runtime (e.g. epsilon from the hill-climbing optimizer). My idea is to enable this within the objective function. This way the user can change parameters based on conditions/data each time the objective function is called (evaluated). As seen in issue SimonBlanke/Hyperactive#49 there are already some parameters that can be changed via the objective-function argument, but this is not standardized, tested or documented, yet.

    enhancement 
    opened by SimonBlanke 0
  • add ray multiprocessing support

    add ray multiprocessing support

    The popular python package ray has a multiprocessing feature that could be used to run optimization-processes in parallel:

    from ray.util.multiprocessing import Pool
    
    def f(index):
        return index
    
    pool = Pool()
    for result in pool.map(f, range(100)):
        print(result)
    

    This Pool-API is very similar to the regular multiprocessing API. I will look further into whether it is possible to integrate Ray so that its features can be used in Hyperactive.

    enhancement 
    opened by SimonBlanke 0
  • New feature: Optimization Strategies

    New feature: Optimization Strategies

    I would like to introduce a new feature to Hyperactive to chain together multiple optimization algorithms. This will be called an Optimization Strategy in the future.

    The API for this feature could look like this:

    opt_strat = OptimizationStrategy()
    opt_strat.add_optimizer(RandomSearchOptimizer(), duration=0.5)
    opt_strat.add_optimizer(HillClimbingOptimizer(), duration=0.5)
    
    hyper = Hyperactive()
    hyper.add_search(model, search_space, n_iter=20, optimizer=opt_strat)
    hyper.run()
    

    The duration will be the fraction of n_iter passed to add_search(...). Each optimizer will automatically pass the memory to the next one.
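
    To make the arithmetic concrete: with n_iter=20 and two optimizers at duration=0.5 each, every optimizer would get 10 iterations. A minimal sketch of such a split, assuming the durations sum to 1; `split_iterations` is a hypothetical helper and the rounding rule (round down, remainder to the last optimizer) is an assumption, not a decided part of the feature.

```python
def split_iterations(n_iter, durations):
    """Turn duration fractions into per-optimizer iteration counts.

    Rounds each share down and gives the leftover iterations to the last optimizer.
    """
    counts = [int(n_iter * duration) for duration in durations]
    counts[-1] += n_iter - sum(counts)
    return counts


counts = split_iterations(20, [0.5, 0.5])
uneven = split_iterations(25, [0.3, 0.7])

print(counts)
print(uneven)
```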

    This feature-idea is in an early stage and might change in the future.

    enhancement 
    opened by SimonBlanke 1
  • New feature: save optimizer object to continue optimization run at a later time.

    New feature: save optimizer object to continue optimization run at a later time.

    Explanation

    It would be very useful if Hyperactive has the ability to save the optimization backend (via pickle, dill, cloudpickle, ...) to disk and load it later into Hyperactive to continue the optimization run.

    So the goal is, that the optimizer can be saved during one code execution and loaded at a later time during a second code execution. The optimization run should behave as if there was no break between the two optimization runs.

    The optimization backend of Hyperactive is Gradient-Free-Optimizers. So I first confirmed that GFO optimizer-objects can be saved and loaded in two different code executions. In the following script the optimizer-object is saved if it does not exist, yet. This code must then be executed a second time. The optimizer-object is loaded and continues the search.

    Save and load GFO-optimizer

    import os
    import numpy as np
    
    from gradient_free_optimizers import RandomSearchOptimizer
    
    import dill as pkl
    
    file_name = "./optimizer.pkl"
    
    def load(file_name):
        if os.path.isfile(file_name):
            with open(file_name, "rb") as pickle_file:
                return pkl.load(pickle_file)
        else:
            print("---> Warning: No file found in path:", file_name)
    
    def save(file_name, data):
        with open(file_name, "wb") as f:
            pkl.dump(data, f)
    
    
    def parabola_function(para):
        loss = para["x"] * para["x"]
        return -loss
    
    search_space = {"x": np.arange(-10, 10, 0.1)}
    
    opt_loaded = load(file_name)
    if opt_loaded:
        print("Optimizer loaded!")
        opt_loaded.search(parabola_function, n_iter=100)
    
    else:
        opt = RandomSearchOptimizer(search_space)
        opt.search(parabola_function, n_iter=10000)
    
        save(file_name, opt)
        print("Optimizer saved!")
    

    The code above works fine!

    So let's now try to access the optimizer object from within Hyperactive, save it, and load it during a second code execution:

    Save and load optimizer (GFO-wrapper) from within Hyperactive

    import os
    import numpy as np
    
    from hyperactive import Hyperactive
    
    import dill as pkl
    
    file_name = "./optimizer.pkl"
    
    def load(file_name):
        if os.path.isfile(file_name):
            with open(file_name, "rb") as pickle_file:
                return pkl.load(pickle_file)
        else:
            print("---> Warning: No file found in path:", file_name)
    
    def save(file_name, data):
        with open(file_name, "wb") as f:
            pkl.dump(data, f)
    
    
    def parabola_function(para):
        loss = para["x"] * para["x"]
        return -loss
    
    search_space = {"x": list(np.arange(-10, 10, 0.1))}
    
    opt_loaded = load(file_name)
    if opt_loaded:
        print("Optimizer loaded!")
        # do stuff
    
    else:
        hyper = Hyperactive()
        hyper.add_search(parabola_function, search_space, n_iter=100)
        hyper.run()
    
        # access the optimizer attribute from the list of results
        optimizer = hyper.opt_pros[0]._optimizer  # not official API
    
        save(file_name, optimizer)
        print("Optimizer saved!")
    

    If you execute the code above twice, you will probably encounter the error message shown further down. The reason why this error occurs is a mystery to me. There is a FileNotFoundError even though the file is present. I do not have expert knowledge about pickling processes/functions, so I would be very grateful to get help with this problem.

    If you take a look at the type of hyper.opt_pros[0]._optimizer from Hyperactive you can see, that it is the same GFO optimizer-object as in the GFO stand-alone-code (the first example).

    My guess would be that the optimizer-class in Hyperactive receives parameters that cannot be pickled by dill (or cloudpickle) for some reason. The source code where GFO receives parameters within Hyperactive can be found here.

    Traceback (most recent call last):
      File "hyper_pkl_optimizer.py", line 33, in <module>
        opt_loaded = load(file_name)
      File "hyper_pkl_optimizer.py", line 15, in load
        return pkl.load(pickle_file)
      File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/dill/_dill.py", line 373, in load
        return Unpickler(file, ignore=ignore, **kwds).load()
      File "/home/simon/anaconda3/envs/dev/lib/python3.8/site-packages/dill/_dill.py", line 646, in load
        obj = StockUnpickler.load(self)
      File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/managers.py", line 959, in RebuildProxy
        return func(token, serializer, incref=incref, **kwds)
      File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/managers.py", line 809, in __init__
        self._incref()
      File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/managers.py", line 863, in _incref
        conn = self._Client(self._token.address, authkey=self._authkey)
      File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/connection.py", line 502, in Client
        c = SocketClient(address)
      File "/home/simon/anaconda3/envs/dev/lib/python3.8/multiprocessing/connection.py", line 630, in SocketClient
        s.connect(address)
    FileNotFoundError: [Errno 2] No such file or directory
    

    So the goal is now to fix the problem with the second code example and enable the correct saving and loading of the optimizer-object from Hyperactive.
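
    The traceback passes through multiprocessing.managers.RebuildProxy, which suggests that the pickled object holds a reference to a multiprocessing manager proxy; on load, the proxy tries to reconnect to a manager process from the first run that no longer exists. One common pattern for such cases is to exclude the process-bound attribute from the pickled state. The following is a minimal sketch with a toy class; the attribute name `shared_memory` is hypothetical and not taken from the Hyperactive or GFO source.

```python
class OptimizerWrapper:
    """Toy stand-in for an optimizer object; not the Hyperactive class."""

    def __init__(self, search_space, shared_memory=None):
        self.search_space = search_space
        self.best_score = None
        # Hypothetical attribute that would hold a manager proxy.
        self.shared_memory = shared_memory

    def __getstate__(self):
        # pickle and dill call this to obtain the state to serialize.
        state = self.__dict__.copy()
        state["shared_memory"] = None  # drop the process-bound handle
        return state

    def __setstate__(self, state):
        # Called on load; the handle has to be re-created by the caller.
        self.__dict__.update(state)


opt = OptimizerWrapper({"x": [1, 2, 3]}, shared_memory=object())
opt.best_score = -1.0

# Inspect what would be written to disk, then rebuild an object from it.
state = opt.__getstate__()
restored = OptimizerWrapper.__new__(OptimizerWrapper)
restored.__setstate__(state)
```

    The search state (here `search_space` and `best_score`) survives, while the handle is left empty and would have to be set up again in the second code execution.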

    enhancement help wanted 
    opened by SimonBlanke 0
Releases(4.3)
Owner
Simon Blanke
Physicist, software developer for driving assistance systems and machine learning enthusiast.
Simon Blanke