Use evolutionary algorithms instead of gridsearch in scikit-learn

Last update: Jan 03, 2023

Overview

sklearn-deap

Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameters for your estimator. Instead of trying out every possible combination of parameters, evolve only the combinations that give the best results.

Here is an ipython notebook comparing EvolutionaryAlgorithmSearchCV against GridSearchCV and RandomizedSearchCV.

It is implemented using the deap library: https://github.com/deap/deap
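To make the idea concrete, here is a minimal, self-contained sketch of an evolutionary search over a parameter grid. It is not the library's implementation and does not use deap; the function `evolve`, the toy `score` function, and all parameter names are hypothetical and chosen only for illustration. It uses tournament selection, uniform crossover, and per-gene mutation, and it caches scores so that each combination is evaluated at most once.

```python
import random


def evolve(score, grid, population_size=8, generations=5,
           mutation_prob=0.1, tournament_size=3, seed=1):
    """Toy evolutionary search over a dict of candidate lists (illustrative only)."""
    rng = random.Random(seed)
    names = sorted(grid)
    cache = {}  # individual (tuple of indices) -> score

    def decode(individual):
        return {name: grid[name][i] for name, i in zip(names, individual)}

    def fitness(individual):
        if individual not in cache:
            cache[individual] = score(**decode(individual))
        return cache[individual]

    def random_gene(name):
        return rng.randrange(len(grid[name]))

    population = [tuple(random_gene(n) for n in names)
                  for _ in range(population_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < population_size:
            # Tournament selection: the fittest of a random sample becomes a parent.
            mum = max(rng.sample(population, tournament_size), key=fitness)
            dad = max(rng.sample(population, tournament_size), key=fitness)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mum, dad)]
            # Mutation: occasionally replace a gene with a random value.
            for position, name in enumerate(names):
                if rng.random() < mutation_prob:
                    child[position] = random_gene(name)
            offspring.append(tuple(child))
        population = offspring
    for individual in population:  # score the final generation as well
        fitness(individual)

    best = max(cache, key=cache.get)
    return decode(best), cache[best], len(cache)


def score(x, y):
    # Hypothetical objective with its maximum (0) at x=3, y=-1.
    return -((x - 3) ** 2) - (y + 1) ** 2


grid = {"x": list(range(-5, 6)), "y": list(range(-5, 6))}  # 121 combinations
best_params, best_score, tried = evolve(score, grid)
print(best_params, best_score, "combinations evaluated:", tried)
```

With these settings at most 48 of the 121 combinations are scored (8 individuals in the initial population plus 8 offspring in each of 5 generations), which is the trade-off the library is built around: fewer evaluations, at the cost of no guarantee of finding the global optimum.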

Install

To install the library use pip:

pip install sklearn-deap

or clone the repo and run the following in your shell:

python setup.py install

Usage examples

Example of usage:

import sklearn.datasets
import numpy as np
import random

data = sklearn.datasets.load_digits()
X = data["data"]
y = data["target"]

from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold

paramgrid = {"kernel": ["rbf"],
             "C"     : np.logspace(-9, 9, num=25, base=10),
             "gamma" : np.logspace(-9, 9, num=25, base=10)}

random.seed(1)

from evolutionary_search import EvolutionaryAlgorithmSearchCV
cv = EvolutionaryAlgorithmSearchCV(estimator=SVC(),
                                   params=paramgrid,
                                   scoring="accuracy",
                                   cv=StratifiedKFold(n_splits=4),
                                   verbose=1,
                                   population_size=50,
                                   gene_mutation_prob=0.10,
                                   gene_crossover_prob=0.5,
                                   tournament_size=3,
                                   generations_number=5,
                                   n_jobs=4)
cv.fit(X, y)

Output:

    Types [1, 2, 2] and maxint [0, 24, 24] detected
    --- Evolve in 625 possible combinations ---
    gen	nevals	avg     	min    	max
    0  	50    	0.202404	0.10128	0.962716
    1  	26    	0.383083	0.10128	0.962716
    2  	31    	0.575214	0.155259	0.962716
    3  	29    	0.758308	0.105732	0.976071
    4  	22    	0.938086	0.158041	0.976071
    5  	26    	0.934201	0.155259	0.976071
    Best individual is: {'kernel': 'rbf', 'C': 31622.776601683792, 'gamma': 0.001}
    with fitness: 0.976071229827
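The numbers in this log can be put side by side with the cost of an exhaustive grid search. The short calculation below only re-uses figures printed above (the grid sizes and the `nevals` column); note that `nevals` counts evaluation requests, so it is an upper bound on the number of distinct combinations that were scored.

```python
import math

# Sizes of the value lists in paramgrid: 1 kernel, 25 values of C, 25 of gamma.
grid_sizes = {"kernel": 1, "C": 25, "gamma": 25}
total_combinations = math.prod(grid_sizes.values())

# "nevals" column of the log above, generations 0 through 5.
nevals = [50, 26, 31, 29, 22, 26]
evaluated = sum(nevals)

print(total_combinations)                               # 625
print(evaluated)                                        # 184
print(round(100 * evaluated / total_combinations, 1))   # 29.4
```

In this run the search requested roughly 29% of the evaluations a full grid search over the same 625 combinations would need.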

Example for maximizing just some function:

from evolutionary_search import maximize

def func(x, y, m=1., z=False):
    return m * (np.exp(-(x**2 + y**2)) + float(z))

param_grid = {'x': [-1., 0., 1.], 'y': [-1., 0., 1.], 'z': [True, False]}
args = {'m': 1.}
best_params, best_score, score_results, _, _ = maximize(func, param_grid, args, verbose=False)

Output:

best_params = {'x': 0.0, 'y': 0.0, 'z': True}
best_score  = 2.0
score_results = (({'x': 1.0, 'y': -1.0, 'z': True}, 1.1353352832366128),
 ({'x': -1.0, 'y': 1.0, 'z': True}, 1.3678794411714423),
 ({'x': 0.0, 'y': 1.0, 'z': True}, 1.3678794411714423),
 ({'x': -1.0, 'y': 0.0, 'z': True}, 1.3678794411714423),
 ({'x': 1.0, 'y': 1.0, 'z': True}, 1.1353352832366128),
 ({'x': 0.0, 'y': 0.0, 'z': False}, 2.0),
 ({'x': -1.0, 'y': -1.0, 'z': False}, 0.36787944117144233),
 ({'x': 1.0, 'y': 0.0, 'z': True}, 1.3678794411714423),
 ({'x': -1.0, 'y': -1.0, 'z': True}, 1.3678794411714423),
 ({'x': 0.0, 'y': -1.0, 'z': False}, 1.3678794411714423),
 ({'x': 1.0, 'y': -1.0, 'z': False}, 1.1353352832366128),
 ({'x': 0.0, 'y': 0.0, 'z': True}, 2.0),
 ({'x': 0.0, 'y': -1.0, 'z': True}, 2.0))
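Because this grid is tiny, the reported best parameters can be cross-checked by brute force. The sketch below does not call `maximize`; it simply enumerates every combination with `itertools.product`. As an assumption made to keep it dependency-free, it uses `math.exp` in place of `np.exp`.

```python
import itertools
import math


def func(x, y, m=1.0, z=False):
    return m * (math.exp(-(x ** 2 + y ** 2)) + float(z))


param_grid = {"x": [-1.0, 0.0, 1.0], "y": [-1.0, 0.0, 1.0], "z": [True, False]}
args = {"m": 1.0}

# Build every combination of the grid as a dict of keyword arguments.
names = list(param_grid)
candidates = [dict(zip(names, values))
              for values in itertools.product(*param_grid.values())]

best_params = max(candidates, key=lambda params: func(**params, **args))
best_score = func(**best_params, **args)

print(len(candidates))          # 18
print(best_params, best_score)  # {'x': 0.0, 'y': 0.0, 'z': True} 2.0
```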

Comments

Added cv_results. Fixed some documentation.

In __init__.py I added cv_results_ based on the logbook generated in _fit. This is a compatibility feature with sklearn's GridSearchCV and the like, in the interest of consistency.

Other than that, I added a test file I used outside of ipython notebook which could eventually use the true python test library, and fixed some errors in the notebook which look like simple version errors.

opened by ryanpeach 16
`.cv_results_` does not include info from first generation

I think there's a fenceposting/off-by-one error somewhere.

When I pass in generations_number = 1, it's actually 0-indexed, and gives me 2 generations. Similarly, if I pass in 2 generations, I actually get 3.

Then, when I examine the cv_results_ property, I noticed that I only get the results from all generations after the first generation (the 0-indexed generation).

This is most apparent if you set generations_number = 1.

I looked through the code quickly, but didn't see any obvious source of it. Hopefully someone who knows the library can find it more easily!
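One plausible explanation for the extra generation, offered here as an assumption rather than a confirmed diagnosis: deap's eaSimple evaluates and logs the initial population as generation 0 and then runs the requested number of generations on top of it. The README output above shows the same pattern (generations 0 through 5 for generations_number=5). The hypothetical helper below mimics only the shape of that loop.

```python
def logged_generations(ngen):
    """Generation labels recorded by a loop shaped like deap's eaSimple."""
    labels = [0]                      # the initial population is logged as gen 0
    for gen in range(1, ngen + 1):    # then ngen rounds of select/vary/evaluate
        labels.append(gen)
    return labels


print(logged_generations(1))        # [0, 1]
print(len(logged_generations(5)))   # 6
```

Under that reading, generations_number counts evolution steps rather than logged rows, which would explain why one more generation is printed than was requested.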

opened by ClimbsRocks 12
Better Parallelism

I wrote this because parallelism wasn't working on my Windows laptop. After some reading I found out that, at least on Windows, you need to create your Pool inside an if __name__ == "__main__" block in order to prevent recursive execution. Deap also identifies other kinds of multiprocessing maps you may want to pass to it, so now the user has every option to implement parallelism however they want by passing their "map" function to pmap.
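The pattern described here can be sketched with the standard library alone. The names `fitness` and `evaluate_all` are hypothetical stand-ins, not part of sklearn-deap; the point is only that the evaluation code accepts any callable with the signature of the built-in `map`, and that the process pool is created under the main guard.

```python
from multiprocessing import Pool


def fitness(x):
    # Stand-in for an expensive model evaluation (hypothetical).
    return x * x


def evaluate_all(individuals, map_fn=map):
    # Any callable with the signature of the built-in map can be plugged in,
    # e.g. pool.map from multiprocessing.
    return list(map_fn(fitness, individuals))


if __name__ == "__main__":
    # On Windows, worker processes re-import this module. The guard keeps
    # them from creating pools of their own.
    with Pool(processes=2) as pool:
        print(evaluate_all(range(5), map_fn=pool.map))  # [0, 1, 4, 9, 16]
    print(evaluate_all(range(5)))                       # serial fallback
```

Keeping `fitness` at module level matters as well: worker processes need to be able to import it by name.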

Yes, it's divergent from sklearn, but sklearn has a fully implemented special parallelism library for their n_jobs parameters that would be both a challenge and potentially incompatible with deap, so what I have implemented is deap's way of doing things.

opened by ryanpeach 8
Error Message While Calling fit() Method

AttributeError: can't set attribute

The traceback points to the fit() method:

    def fit(self, X, y=None):
        self.best_estimator_ = None
    --> self.best_score_ = -1
        self.best_params_ = None
        for possible_params in self.possible_params:
            self._fit(X, y, possible_params)
        if self.refit:
            self.best_estimator_ = clone(self.estimator)
            self.best_estimator_.set_params(**self.best_params_)
            self.best_estimator_.fit(X, y)

opened by tasyacute 8
Can't get attribute 'Individual'

Trying to test the example code on the Indian Pima Diabetes dataset in Jupyter notebook 5.0.0, Python 3.6, I'm getting an error. The kernel is busy but no processes are running. Turning on debug mode shows:

    ...
      File "c:\users\szymon\anaconda3\envs\tensorflow\lib\multiprocessing\queues.py", line 345, in get
        return ForkingPickler.loads(res)
    AttributeError: Can't get attribute 'Individual' on <module 'deap.creator' from 'c:\\users\\szymon\\anaconda3\\envs\\tensorflow\\lib\\site-packages\\deap\\creator.py'>
      File "c:\users\szymon\anaconda3\envs\tensorflow\lib\multiprocessing\pool.py", line 108, in worker
        task = get()

opened by szymonk92 7
Doubts about encoding correctness
I have some doubts about the correctness of the current parameter encoding (to chromosome).

Let's assume that we have 2 categorical parameters f1 and f2:

    Enc   f1  f2
    0000  a   1
    0001  a   2
    0010  a   3
    0011  a   4
    0100  a   5
    0101  b   1
    0110  b   2
    0111  b   3
    1000  b   4
    1001  b   5
    1010  c   1
    1011  c   2
    1100  c   3
    1101  c   4
    1110  c   5

If we use any crossover operator, for example a two-point crossover between some points:

    (a, 4) 0011      0111 (b, 3)
              x
    (c, 3) 1100      1000 (b, 4)

After crossover we get b in both children, but neither parent has b as its first parameter.
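The example above can be reproduced in a few lines. This sketch assumes the flat encoding shown in the table (index = position of f1 times 5, plus position of f2, written as 4 bits); the helper names are hypothetical and are not taken from the library.

```python
F1 = ["a", "b", "c"]
F2 = [1, 2, 3, 4, 5]


def encode(f1, f2, bits=4):
    # One joint index for the pair, written as a fixed-width bit string.
    index = F1.index(f1) * len(F2) + F2.index(f2)
    return format(index, "0{}b".format(bits))


def decode(chromosome):
    index = int(chromosome, 2)
    return F1[index // len(F2)], F2[index % len(F2)]


def two_point_crossover(p1, p2, start, stop):
    # Swap the segment [start, stop) between the two parents.
    c1 = p1[:start] + p2[start:stop] + p1[stop:]
    c2 = p2[:start] + p1[start:stop] + p2[stop:]
    return c1, c2


p1, p2 = encode("a", 4), encode("c", 3)      # '0011', '1100'
c1, c2 = two_point_crossover(p1, p2, 1, 2)   # swap the second bit only
print(c1, decode(c1))  # 0111 ('b', 3)
print(c2, decode(c2))  # 1000 ('b', 4)
```

Because both parameters share one bit string, exchanging bits mixes them arithmetically instead of exchanging whole parameter values, which is how a value present in neither parent appears in the children.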
opened by olologin 7

Can't instantiate abstract class EvolutionaryAlgorithmSearchCV with abstract methods _run_search

I use Python 2.7.15 to run test.py and I get the error TypeError: Can't instantiate abstract class EvolutionaryAlgorithmSearchCV with abstract methods _run_search

Could you please help me find what I missed? Below are all the packages I have installed:

Package                            Version
---------------------------------- -----------
appdirs                            1.4.3
appnope                            0.1.0
asn1crypto                         0.24.0
attrs                              18.2.0
Automat                            0.7.0
backports-abc                      0.5
backports.shutil-get-terminal-size 1.0.0
bleach                             2.1.4
certifi                            2018.8.24
cffi                               1.11.5
configparser                       3.5.0
constantly                         15.1.0
cryptography                       2.3.1
Cython                             0.28.5
deap                               1.2.2
decorator                          4.3.0
entrypoints                        0.2.3
enum34                             1.1.6
functools32                        3.2.3.post2
futures                            3.2.0
html5lib                           1.0.1
hyperlink                          18.0.0
idna                               2.7
incremental                        17.5.0
ipaddress                          1.0.22
ipykernel                          4.10.0
ipython                            5.8.0
ipython-genutils                   0.2.0
ipywidgets                         7.4.2
Jinja2                             2.10
jsonschema                         2.6.0
jupyter                            1.0.0
jupyter-client                     5.2.3
jupyter-console                    5.2.0
jupyter-core                       4.4.0
MarkupSafe                         1.0
mistune                            0.8.3
mkl-fft                            1.0.6
mkl-random                         1.0.1
nbconvert                          5.3.1
nbformat                           4.4.0
notebook                           5.6.0
numpy                              1.15.2
pandas                             0.23.4
pandocfilters                      1.4.2
pathlib2                           2.3.2
pexpect                            4.6.0
pickleshare                        0.7.4
pip                                10.0.1
prometheus-client                  0.3.1
prompt-toolkit                     1.0.15
ptyprocess                         0.6.0
pyasn1                             0.4.4
pyasn1-modules                     0.2.2
pycparser                          2.19
Pygments                           2.2.0
pyOpenSSL                          18.0.0
python-dateutil                    2.7.3
pytz                               2018.5
pyzmq                              17.1.2
qtconsole                          4.4.1
scandir                            1.9.0
scikit-learn                       0.20.0
scipy                              1.1.0
Send2Trash                         1.5.0
service-identity                   17.0.0
setuptools                         40.2.0
simplegeneric                      0.8.1
singledispatch                     3.4.0.3
six                                1.11.0
sklearn-deap                       0.2.2
terminado                          0.8.1
testpath                           0.3.1
tornado                            5.1.1
traitlets                          4.3.2
Twisted                            17.5.0
wcwidth                            0.1.7
webencodings                       0.5.1
wheel                              0.31.1
widgetsnbextension                 3.4.2
zope.interface                     4.5.0

opened by dongchirua 6

Sklearn Deprecation

cross_validation has been replaced with model_selection and will soon be deprecated. I'm already getting a warning. I tried to simply change the import, but they have moved a few other things around and also changed how some functions seem to fundamentally work.

opened by ryanpeach 6
What does it take to parallelize the search?

Great tool! Allows me to drastically expand the search space over using GridSearchCV. Really promising for deep learning, as well as standard scikit-learn interfaced ML models.

Because I'm searching over a large space, this obviously involves training a bunch of models, and doing a lot of computations. scikit-learn's model training parallelizes this to ease the pain somewhat.

I tried using the toolbox.register('map', pool.map) approach as described by deap, but didn't see any parallelization.

Is there a different approach I should take instead? Or is that a feature that hasn't been built yet? If so, what are the steps needed to get parallelization working?

opened by ClimbsRocks 5
What's wrong with my data?

With the following code:

    paramgrid = {"n_jobs": -1,
                 "max_features": ['auto', 'log2'],
                 "n_estimators": [10, 100, 500, 1000],
                 "min_samples_split": [2, 5, 10],
                 "max_leaf_nodes": [1, 5, 10, 20, 50]}

    # min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None

    cv = EvolutionaryAlgorithmSearchCV(estimator=RandomForestClassifier(),
                                       params=paramgrid,
                                       scoring="accuracy",
                                       cv=StratifiedKFold(y, n_folds=10),
                                       verbose=True,
                                       population_size=50,
                                       gene_mutation_prob=0.10,
                                       tournament_size=3,
                                       generations_number=10)

    cv.fit(X, y)

I get the following error:

    TypeError                                 Traceback (most recent call last)
    in <module>()
         20                                    )
         21
    ---> 22 cv.fit(X,y)

    /root/anaconda2/lib/python2.7/site-packages/evolutionary_search/__init__.pyc in fit(self, X, y)
        276         self.best_params_ = None
        277         for possible_params in self.possible_params:
    --> 278             self._fit(X, y, possible_params)
        279         if self.refit:
        280             self.best_estimator_ = clone(self.estimator)

    /root/anaconda2/lib/python2.7/site-packages/evolutionary_search/__init__.pyc in _fit(self, X, y, parameter_dict)
        301         toolbox = base.Toolbox()
        302
    --> 303         name_values, gene_type, maxints = _get_param_types_maxint(parameter_dict)
        304         if self.gene_type is None:
        305             self.gene_type = gene_type

    /root/anaconda2/lib/python2.7/site-packages/evolutionary_search/__init__.pyc in _get_param_types_maxint(params)
         33     types = []
         34     for _, possible_values in name_values:
    ---> 35         if isinstance(possible_values[0], float):
         36             types.append(param_types.Numerical)
         37         else:

    TypeError: 'int' object has no attribute '__getitem__'
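The last frame of the traceback indexes each grid entry with possible_values[0], so every value in the grid apparently has to be an indexable collection of candidates. In the grid above, "n_jobs": -1 is a bare integer, which would fail in exactly this way. The helper below is a hypothetical pre-flight check, not part of the library, that lists such entries before fit() is called.

```python
def non_sequence_params(paramgrid):
    """Return the names whose value is not an indexable collection of candidates."""
    return [name for name, values in paramgrid.items()
            if not hasattr(values, "__getitem__")
            or isinstance(values, (str, bytes))]


paramgrid = {"n_jobs": -1,
             "max_features": ["auto", "log2"],
             "n_estimators": [10, 100, 500, 1000]}
print(non_sequence_params(paramgrid))  # ['n_jobs']
```

A likely fix, under that assumption, is to wrap the value in a list ("n_jobs": [-1]) or to set it directly on the estimator, for example RandomForestClassifier(n_jobs=-1), and leave it out of the grid.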

opened by M4k34B3tt3rW0r1D 4
Python3 compatibility is broken

There are two old-style print statements in __init__.py that break compatibility with Python 3.

I added brackets to turn them into function calls and that seemed to fix it, but I have not done extensive testing to see if there are any other compatibility issues.

opened by davekirby 4
ValueError when calling cv.fit() for optimising a neural network

Hi,

I am trying to optimise a neural network (Keras, TensorFlow), but I'm getting an error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

I have checked my input data for NaNs, infinities and large or small values. There aren't any. I have forced the input data to be np.float32 before passing it to .fit().

I've used this algorithm before without any problems or special data prep, so I'm not sure where the error is creeping in.

The relevant bit of the code is: codetxt.txt

I should also say that when I manually try to just .fit() my model, it works fine. The issue is something to do with how the cross validation is working.

The full traceback is:

    Traceback (most recent call last):
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/evolutionary_search/cv.py", line 104, in _evalFunction
        error_score=error_score)[0]
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 568, in _fit_and_score
        test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 610, in _score
        score = scorer(estimator, X_test, y_test)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/metrics/scorer.py", line 98, in __call__
        **self._kwargs)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/metrics/regression.py", line 239, in mean_squared_error
        y_true, y_pred, multioutput)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/metrics/regression.py", line 77, in _check_reg_targets
        y_pred = check_array(y_pred, ensure_2d=False)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 573, in check_array
        allow_nan=force_all_finite == 'allow-nan')
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
        raise ValueError(msg_err.format(type_err, X.dtype))
    ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "NN_GSCV-DL2.py", line 308, in <module>
        grid_result = cv.fit(X_train, y_train)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/evolutionary_search/cv.py", line 363, in fit
        self._fit(X, y, possible_params)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/evolutionary_search/cv.py", line 453, in _fit
        halloffame=hof, verbose=self.verbose)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/site-packages/deap/algorithms.py", line 150, in eaSimple
        fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/home/users/hf832176/.conda/envs/tb_env6/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
    ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
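Reading the first traceback, the finiteness check fails on y_pred inside mean_squared_error, which suggests that it is the model's predictions, not the input data, that contain NaN or infinity. One possible cause, stated here as a guess, is that some parameter combinations make the network diverge during training. A hypothetical helper like the one below could be run on the predictions of a suspect configuration to locate the first bad value.

```python
import math


def first_non_finite(values):
    """Return (index, value) of the first NaN or infinity, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index, value
    return None


print(first_non_finite([0.1, 0.7, 1.3]))            # None
print(first_non_finite([0.1, float("inf"), 1.3]))   # (1, inf)
```

If the predictions do turn out to be non-finite for only some configurations, narrowing the search ranges for those parameters is one way to confirm the diagnosis.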

opened by tbloch1 0
Does not work with pipelines

For tuning a single estimator this tool is awesome. But the standard gridsearch can actually accept a pipeline as an estimator, which allows you to evaluate different classifiers as parameters.

For some reason, this breaks with EvolutionaryAlgorithmSearchCV.

For example, set a pipeline like this:

    pipe = Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler()),
        ('classify', LogisticRegression())
    ])

Then define a parameter grid to include different classifiers:

    param_grid_rf_big = [
        {'classify': [RandomForestClassifier(), ExtraTreesClassifier()],
         'classify__n_estimators': [500],
         'classify__max_features': ['log2', 'sqrt', None],
         'classify__min_samples_split': [2, 3],
         'classify__min_samples_leaf': [1, 2, 3],
         'classify__criterion': ['gini']}
    ]

When you pass this to EvolutionaryAlgorithmSearchCV you should be able to set the estimator to 'pipe' and the params to 'param_grid_rf_big' and let it evaluate. This works with GridSearchCV, but not with EvolutionaryAlgorithmSearchCV.

opened by dth5 4
stuck after gen 1...

I have some datasets where the search gets stuck forever, on gen 1 for instance. Does it happen to you too? How can I figure out what the problem is? Python is still running and using a lot of CPU, but after hours nothing happens. Any idea what could be the issue?

opened by fcoppey 1

Releases(0.3.0)

0.3.0 (Jul 30, 2021)
Fix bug with new version of sklearn (again)

0.2.3 (Nov 27, 2018)
Fix bug with new version of sklearn

0.2.2 (Oct 21, 2017)
Fix parallelism in Windows

0.2.1 (Sep 14, 2017)
Add again properties best_score_ and best_params_

0.2.0 (Apr 21, 2017)
Make sklearn-deap compatible with sklearn model_selection
Include some tests
New function maximize

0.1.8 (Jan 18, 2017)
Fix cross validation with score_cache

0.1.7 (Nov 23, 2016)
Fix score_cache

0.1.6 (Oct 18, 2016)
Now scikit-learn==0.18 can be used

0.1.5 (Oct 12, 2016)
Fix best_score_

0.1.4 (Aug 23, 2016)
Fix Python 3 compatibility

0.1.3 (Jan 27, 2016)
Add caches

0.1.2 (Jan 6, 2016)
More fixes for pip

0.1.1 (Jan 6, 2016)
Some fixes for pip

0.1 (Jan 6, 2016)
First version!

Owner

rsteca

