MLBox is a powerful Automated Machine Learning python library.

Overview

MLBox is a powerful Automated Machine Learning python library. It provides the following features:

  • Fast reading and distributed data preprocessing/cleaning/formatting
  • Highly robust feature selection and leak detection
  • Accurate hyper-parameter optimization in high-dimensional space
  • State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM,...)
  • Prediction with models interpretation

For more details, please refer to the official documentation.


How to Contribute

MLBox has been developed and used by many active community members. Your help is very valuable to make it better for everyone.

  • Check out the call for contributions to see what can be improved, or open an issue if you want something added.
  • Contribute to the tests to make it more reliable.
  • Contribute to the documentation to make it clearer for everyone.
  • Contribute to the examples to share your experience with other users.
  • Open an issue if you run into problems during development.

For more details, please refer to CONTRIBUTING.

Comments
  • Trying to install and getting xgboost errors

    System is a Kaggle kernel, which is Ubuntu-based and seems to be the desired environment.

    I ran this:

    !apt-get install -y build-essential
    !pip install cmake
    # Quote version specifiers: an unquoted ">" is a shell redirect, which silently drops the constraint
    !pip install "xgboost>=0.6a2"
    !pip install "lightgbm>=2.0.2"
    !pip install mlbox
    

    Resulting in this:

    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xib6_1h7/xgboost/

    Can you please help me out? I see your examples are also Kaggle-based, but they don't include the install steps. Do you somehow install the packages from setup within the kernel?

    opened by TimusLetap 19
  • Setup script exited with usage: setup.py [global_opts] error: no commands supplied

    Hi AxeldeRomblay,

    System information: Ubuntu 16.04

    I was trying to install MLBox to give it a try. The steps I took were: 1. clone the repository; 2. run the setup.py file.

    The build reaches 100%, but then throws a couple of errors. I am posting the actual error below.

    [100%] Linking CXX shared library /tmp/easy_install-1gz4zwpu/lightgbm-2.0.2/lightgbm/lib_lightgbm.so
    [100%] Built target _lightgbm
    Install lib_lightgbm from: ['lightgbm/lib_lightgbm.so']
    error: Setup script exited with usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: setup.py --help [cmd1 cmd2 ...]
       or: setup.py --help-commands
       or: setup.py cmd --help

    error: no commands supplied.

    opened by NeerajSarwan 15
  • Install fails. Can't tell if it is MLbox or XGboost that doesnt work

    Hi Axel,

    We're trying to install MLbox and get the following error:

    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-ob6fq6jv/xgboost/
    Error installing the following packages:
    ['xgboost==0.6a2']
    Please install them manually
    

    Now, given that xgboost is installed and works, I suppose two issues can cause the error:

    1. The version of xgboost is too old. We're running 0.6.
    2. MLbox can't find the installed xgboost version.

    Indeed, MLbox seems to try to install xgboost 0.6a2 even though a version is already installed, which is surprising.

    Maybe it is me. Thank you for your help.

    opened by brcacrm 13
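    On hypothesis 1: under PEP 440 ordering, a pre-release such as 0.6a2 sorts before the final release 0.6, so an installed 0.6 is newer than 0.6a2, not older. It satisfies >=0.6a2 but not the exact pin xgboost==0.6a2 shown in the error output, which would be consistent with a reinstall attempt. The sketch below uses a hypothetical version_key helper that covers only a small subset of PEP 440 (real tooling uses the packaging library) to illustrate the ordering.

```python
import re
from typing import Tuple

# Hypothetical helper, NOT the packaging library: it handles only "N(.N)*" with an
# optional aN / bN / rcN pre-release suffix, which is enough for "0.6a2" vs "0.6".
_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL = (3, 0)  # a final release sorts after all of its pre-releases


def version_key(version: str) -> Tuple[Tuple[int, ...], Tuple[int, int]]:
    """Return a sort key that orders a small subset of PEP 440 versions."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # Trailing zeros are insignificant in PEP 440, so "0.6" == "0.6.0".
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]
    if match.group(2) is None:
        pre = _FINAL
    else:
        pre = (_PRE_RANK[match.group(2)], int(match.group(3)))
    return release, pre


installed, pinned = "0.6", "0.6a2"
print(version_key(installed) > version_key(pinned))   # True: 0.6 is newer than 0.6a2
print(version_key(installed) == version_key(pinned))  # False: ==0.6a2 is not satisfied
```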
  • Hub connection request timed out

    I tried to run the code in the picture below, but I got the error TimeoutError: Hub connection request timed out. I'm using Python 2.7 under Ubuntu 16.04.

    hub connection time out

    Thanks for your help

    opened by ilyes495 10
  • Cleaning takes too long time on multi-cores cpu

    Cleaning takes 276 s for the House Prices dataset on an Intel E5-2683 v3, which has 14 cores and 28 threads. I guess the problem may be caused by n_jobs=-1 here:

        if (self.verbose):
            print("cleaning data ...")

        df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_list)(df[col]) for col in df.columns),
                       axis=1)

        df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_float_and_dates)(df[col]) for col in df.columns), axis=1)

    I don't know how to fix it; maybe add an n_jobs argument to the Reader class? Looking forward to your response. Thank you.

    opened by a1a2y3 8
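    A minimal sketch of the n_jobs suggestion above, assuming a hypothetical clean_columns helper (not part of MLBox) that applies a converter to every column. With n_jobs=1 it skips the worker pool entirely, avoiding the start-up and scheduling overhead that can dominate when each per-column task is small. It follows joblib's convention (-1 means all CPUs) but uses the standard library's ThreadPoolExecutor instead of joblib.Parallel to stay dependency-free, so real timings will differ from joblib's process-based default backend.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Column = List[str]


def clean_columns(
    columns: Dict[str, Column],
    convert: Callable[[Column], list],
    n_jobs: int = 1,
) -> Dict[str, list]:
    """Apply `convert` to every column, following joblib's n_jobs convention.

    Hypothetical helper: n_jobs=1 runs serially, n_jobs=-1 uses every CPU,
    n_jobs=k (k > 1) uses k worker threads.
    """
    if n_jobs == 1:
        # Serial fast path: no pool start-up or scheduling overhead.
        return {name: convert(values) for name, values in columns.items()}
    if n_jobs == -1:
        workers = os.cpu_count() or 1
    elif n_jobs > 1:
        workers = n_jobs
    else:
        raise ValueError("n_jobs must be -1 or a positive integer")
    # Threads keep the sketch dependency-free; joblib's default backend uses processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(convert, columns.values())  # preserves column order
        return dict(zip(columns.keys(), results))


def to_float(values: Column) -> List[float]:
    return [float(v) for v in values]


frame = {"a": ["1", "2"], "b": ["3.5", "4"]}
print(clean_columns(frame, to_float))             # serial
print(clean_columns(frame, to_float, n_jobs=-1))  # same result, pooled
```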
  • FYI: ColumnTransformer

    We'll have a ColumnTransformer in sklearn pretty soon that will make it easier to treat different columns differently. That should make it much simpler to have different pipelines for categorical and continuous data, which seems to be one of the big issues MLBox addresses.

    opened by amueller 8
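    ColumnTransformer shipped in scikit-learn 0.20. A minimal sketch, assuming scikit-learn >= 0.20 and NumPy, with made-up toy data: one-hot encode a categorical column and standardise a continuous column inside a single transformer. Setting sparse_threshold=0 forces a dense result.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: column 0 is a categorical code, column 1 is a continuous feature.
X = np.array([[0, 1.0], [1, 2.0], [0, 3.0]])

ct = ColumnTransformer(
    [
        ("categorical", OneHotEncoder(), [0]),  # one-hot encode column 0
        ("continuous", StandardScaler(), [1]),  # standardise column 1
    ],
    sparse_threshold=0,  # always return a dense array
)
Xt = ct.fit_transform(X)
print(Xt.shape)  # (3, 3): two one-hot columns plus one scaled column
```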
  • Code implementation frozen

    Hello,

    I tried running the code from https://www.analyticsvidhya.com/blog/2017/07/mlbox-library-automated-machine-learning/. The engines start, but execution freezes (the code is still running, but no task gets done). I get the following message on my screen:

    screen

    I tried adding time.sleep, but nothing changes. I'm on Windows 10 Pro, Python 3.5 with Anaconda. Do you have any idea why?

    opened by yousseferahim 7
  • Testing with Predicting Blood Donation challenge

    Hi, I'm doing some tests with this challenge: https://www.drivendata.org/competitions/2/warm-up-predict-blood-donations/

    With minimal understanding, I rank around 700 out of 2400! I still need to document some questions: how to get feature importances, and how to set up stacking.

    Rgds Bruno Seznec

    opened by brunosez 7
  • TypeError: 'generator' object is not subscriptable

    When running in a Python 3.6 environment in a Jupyter notebook on Ubuntu 14.04, I get the following:

    from mlbox.preprocessing import *
    from mlbox.optimisation import *
    from mlbox.prediction import *

    paths = ["train.csv", "test.csv"]
    target_name = "target"

    data = Reader(sep=",").train_test_split(paths, target_name)  # reading

    space = {
        'ne__numerical_strategy': {"space": [0, 'mean']},

        'ce__strategy': {"space": ["label_encoding", "random_projection", "entity_embedding"]},

        'fs__strategy': {"space": ["variance", "rf_feature_importance"]},
        'fs__threshold': {"search": "choice", "space": [0.1, 0.2, 0.3]},

        'est__strategy': {"space": ["XGBoost"]},
        'est__max_depth': {"search": "choice", "space": [5, 6]},
        'est__subsample': {"search": "uniform", "space": [0.6, 0.9]}
    }

    opt = Optimiser(scoring='roc_auc', n_folds=4)

    best = opt.optimise(space, data, max_evals=5)

    TypeError                                 Traceback (most recent call last)
     in ()
         16 opt = Optimiser(scoring = 'roc_auc', n_folds = 4)
         17
    ---> 18 best = opt.optimise(space, data, max_evals = 5)
         19

    ~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/mlbox/optimisation/optimiser.py in optimise(self, space, df, max_evals)
        565 space=hyper_space,
        566 algo=tpe.suggest,
    --> 567 max_evals=max_evals)
        568
        569 # Displaying best_params

    ~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
        312
        313 domain = base.Domain(fn, space,
    --> 314 pass_expr_memo_ctrl=pass_expr_memo_ctrl)
        315
        316 rval = FMinIter(algo, domain, trials, max_evals=max_evals,

    ~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/base.py in __init__(self, fn, expr, workdir, pass_expr_memo_ctrl, name, loss_target)
        784 before = pyll.dfs(self.expr)
        785 # -- raises exception if expr contains cycles
    --> 786 pyll.toposort(self.expr)
        787 vh = self.vh = VectorizeHelper(self.expr, self.s_new_ids)
        788 # -- raises exception if v_expr contains cycles

    ~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/pyll/base.py in toposort(expr)
        713 G.add_edges_from([(n_in, node) for n_in in node.inputs()])
        714 order = nx.topological_sort(G)
    --> 715 assert order[-1] == expr
        716 return order
        717

    opened by NickBuchny 6
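    The last frame points at the likely mechanism: hyperopt's toposort indexes the return value of nx.topological_sort, which networkx 2.x changed to return a generator instead of a list. The stdlib snippet below reproduces the same TypeError with graphlib.TopologicalSorter (Python 3.9+), used here only as a stand-in for networkx on a made-up graph, and shows that materialising the result with list() first avoids it. It suggests checking the installed networkx and hyperopt versions; it is not a confirmed fix.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Made-up graph: node -> set of predecessors; "expr" is the final node.
graph = {"b": {"a"}, "expr": {"a", "b"}}

order = TopologicalSorter(graph).static_order()  # returns a generator
try:
    order[-1]  # same pattern as `assert order[-1] == expr` in the traceback
except TypeError as exc:
    print(exc)  # 'generator' object is not subscriptable

# Materialising the generator first restores list-style indexing.
order = list(TopologicalSorter(graph).static_order())
print(order[-1])  # expr
```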
  • Redundant results and model overfitting

    1. We are getting the same results regardless of the max_evals value or a seed change.
    2. We have increased n_folds and also reduced max_evals to see whether we get different results, but every combination of parameter settings gives the same results. I think the model is overfitting during training. Is there any way to check for this and stop it to get better results?
    3. In our use case we have not used the ne, ce, and fs params in the 'space' settings. Is there a way to use stacking regression without them? We cannot resolve the errors we get when using stacking with only the params related to algorithm selection in the regression strategy.

    I will be grateful if you can help me resolve the above issues. Thanks.

    opened by mahatibharadwaj 5
  • Error while computing the cross validation mean score.

    Hi,

    I am interested in MLBox and tried it on a Kaggle classification project. When I reached the step of optimising the hyperparameters, the following error message appeared: 'An error occurred while computing the cross validation mean score. Check the parameter values and your scoring function.'

    Here's the code I used:

    import numpy as np
    from mlbox.preprocessing import Reader, Drift_thresholder
    from mlbox.optimisation import Optimiser

    paths = ['train_path', 'test_path']
    target_name = 'target_name'

    rd = Reader(sep=",")
    df = rd.train_test_split(paths, target_name)

    dft = Drift_thresholder()
    df = dft.fit_transform(df)

    space = {
        'ne__numerical_strategy': {"search": "choice", "space": ['mean', 'median']},
        'ne__categorical_strategy': {"search": "choice", "space": [np.nan]},
        'ce__strategy': {"search": "choice",
                         "space": ['label_encoding', 'entity_embedding', 'random_projection']},
        'est__strategy': {"search": "choice", "space": ["LightGBM"]},
        'est__n_estimators': {"search": "choice", "space": [150]},
        'est__colsample_bytree': {"search": "uniform", "space": [0.8, 0.95]},
        'est__subsample': {"search": "uniform", "space": [0.8, 0.95]},
        'est__max_depth': {"search": "choice", "space": [5, 6, 7, 8, 9]},
        'est__learning_rate': {"search": "choice", "space": [0.07]},
    }

    opt = Optimiser(scoring="roc_auc", n_folds=5)
    best_params = opt.optimise(space, df, 15)

    Can you help me fix it? Thanks!

    opened by YAOLI0407 5
  • Import error while using MLBox inside google collab

    I'm facing a sudden import error and can't figure out why it occurs.

    Here is the problem:

    • Google Colab discards the latest versions of MLBox due to a dependency failure
    • it automatically downgrades the dependencies
    • it can't fulfill all the import * actions
    • it throws a type error when I manually install the new MLBox version

    Screenshot of the issue: Screenshot 2022-12-11 122655

    Sklearn version: 1.0.2, MLBox version: 0.5.1

    Please look into it and help me resolve this issue.

    opened by prathikshetty2002 0
  • Bump tensorflow from 2.0.0 to 2.9.3

    Bumps tensorflow from 2.0.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Project dependencies may have API risk issues

    Project dependencies may have API risk issues

    Hi, in MLBox, inappropriate dependency version constraints can cause risks.

    Below are the dependencies and version constraints that the project is using:

    numpy==1.18.2
    scipy==1.4.1
    matplotlib==3.0.3
    hyperopt==0.2.3
    pandas==0.25.3
    joblib==0.14.1
    scikit-learn==0.22.1
    tensorflow==2.0.0
    lightgbm==2.3.1
    tables==3.5.2
    xlrd==1.2.0
    

    The == constraint introduces a risk of dependency conflicts because the allowed version range is too strict. Conversely, a constraint with no upper bound (or *) introduces a risk of missing-API errors, because the latest versions of the dependencies may remove some APIs.

    After further analysis of this project: the version constraint of matplotlib can be changed to >=1.3.0,<=3.0.3. The version constraint of joblib can be changed to ==0.7.0d. The version constraint of joblib can be changed to >=0.3.6.dev,<=1.1.0. The version constraint of scikit-learn can be changed to >=0.20rc1,<=0.20.4.

    These suggestions reduce dependency conflicts as much as possible while allowing the latest versions that do not cause errors in the project.

    The current project invokes all of the following methods.

    The calling methods from the matplotlib
    matplotlib.use
    
    The calling methods from the joblib
    joblib.delayed
    joblib.Parallel
    
    The calling methods from the scikit-learn
    sklearn.tree.DecisionTreeRegressor
    sklearn.ensemble.RandomForestRegressor
    sklearn.linear_model.LinearRegression
    sklearn.linear_model.Ridge
    sklearn.ensemble.ExtraTreesRegressor
    sklearn.ensemble.AdaBoostClassifier
    sklearn.preprocessing.LabelEncoder
    joblib.delayed
    sklearn.ensemble.RandomForestClassifier
    sklearn.impute.SimpleImputer
    sklearn.preprocessing.LabelEncoder.fit_transform
    sklearn.tree.DecisionTreeClassifier
    sklearn.ensemble.BaggingClassifier
    sklearn.ensemble.AdaBoostRegressor
    sklearn.linear_model.LogisticRegression
    joblib.Parallel
    sklearn.ensemble.ExtraTreesClassifier
    sklearn.ensemble.BaggingRegressor
    sklearn.linear_model.Lasso
    sklearn.metrics.roc_auc_score
    sklearn.metrics.make_scorer
    
    All called methods
    self.__Lnum.df.fillna
    self.fit_transform
    x.col.self.__Enc.col.df.apply.tolist
    self.set_params
    readme_file.read
    col.df_train.apply
    setattr
    drift.DriftThreshold.get_support
    estimator.fit
    encoding.categorical_encoder.Categorical_encoder
    pandas.datetime
    i.col.x.get_embeddings.col.df.apply.tolist
    clf.predict_proba
    numpy.arange
    print
    y_train.drop.drop
    est.get_params.items
    tensorflow.keras.layers.Dense
    mlbox.preprocessing.Reader.train_test_split
    serie_to_df.hour.astype
    self.transform
    y.value_counts
    warnings.warn
    pandas.datetime.serie.pandas.DatetimeIndex.total_seconds
    pandas.Series.describe
    mlbox.optimisation.make_scorer
    self.get_estimator
    convert_list
    sklearn.ensemble.RandomForestRegressor.fit
    numpy.shape
    self.__cv.split
    classifier.Classifier
    col.df.apply
    self.clean
    model.regression.feature_selector.Reg_feature_selector
    self.__classifier.score
    serie.pandas.DatetimeIndex.dayofweek.astype
    tensorflow.keras.layers.concatenate
    pandas.read_csv
    pipe.append
    self.__regressor.predict
    len
    self.fit
    mlbox.preprocessing.Drift_thresholder
    tensorflow.keras.models.Model.get_weights
    self.__classifier.get_params.keys
    hyperopt.hp.choice
    self.__set_regressor
    pp.set_params.predict
    y_train.drop.apply
    matplotlib.pyplot.savefig
    pandas.read_json
    model.get_estimator.get_params.items
    selected_col.append
    lightgbm.LGBMRegressor
    tensorflow.keras.layers.Reshape
    df_train.drop_duplicates.keys
    path.split
    drift_estimator.DriftEstimator
    pandas.DataFrame.head
    sorted.remove
    col.df_train.dropna.unique
    dropout1.Dropout
    keepList.append
    hyperopt.hp.uniform
    y_train.pd.get_dummies.astype
    pandas.Series.value_counts
    time.time
    sklearn.metrics.roc_auc_score
    open.close
    zip
    d.copy
    sklearn.linear_model.LinearRegression
    sklearn.ensemble.ExtraTreesRegressor
    pandas.DatetimeIndex
    regressor.Regressor
    pandas.Series.nunique
    convert_float_and_dates.delayed
    tuples.dict.items
    col.df_train.dropna
    pandas.concat.to_hdf
    y.apply
    str
    ValueError
    version_file.read
    self.__K.values
    serie_to_df.dayofweek.astype
    self.__plot_feature_importances
    space.keys
    self.__classifier.get_params
    df_train.drop_duplicates.drop_duplicates
    numpy.exp
    p.startswith
    numpy.intersect1d
    range
    mock.Mock
    numpy.random.seed
    self.level_estimator.predict
    self.__regress_params.items
    params.keys
    numpy.abs
    sklearn.pipeline.Pipeline
    serie_to_df.second.astype
    self.__classif_params.items
    df.value_counts
    pandas.DataFrame
    sklearn.model_selection.cross_val_score
    serie_to_df.minute.astype
    sklearn.pipeline.Pipeline.fit
    self.level_estimator.predict_proba
    self.__regressor.fit
    reg.fit
    serie.pandas.DatetimeIndex.minute.astype
    filter
    y_train.drop.value_counts
    lightgbm.LGBMClassifier
    self.__regressor.transform
    mlbox.prediction.Predictor
    self.get_params
    tensorflow.keras.layers.Embedding
    est.get_estimator.get_params
    col.self.__K.Reshape
    os.mkdir
    drift.DriftThreshold.fit
    sklearn.model_selection.StratifiedKFold.split
    model.regression.regressor.Regressor.get_params
    pickle.load
    tensorflow.keras.layers.Dropout
    numpy.int
    sum
    model.regression.stacking_regressor.StackingRegressor
    reg.get_params
    pp.set_params.set_params
    numpy.sort
    sklearn.model_selection.cross_val_predict
    matplotlib.pyplot.yticks
    serie.apply.tolist
    pandas.concat.keys
    self.__classifier.predict
    fh.read.splitlines
    params.items
    reg.predict
    matplotlib.pyplot.barh
    params.update
    est.feature_importances.values
    pandas.DataFrame.idxmax
    encoding.na_encoder.NA_encoder.get_params
    list.x.type.serie.apply.sum
    pandas.SparseDataFrame
    model.classification.feature_selector.Clf_feature_selector
    convert_list.delayed
    self.__imp.transform
    self.__set_classifier
    self.__classifier.fit
    sklearn.linear_model.Ridge
    self.n_jobs.Parallel
    open.write
    operator.itemgetter
    ds.drifts.items
    dropList.append
    numpy.sum
    sorted
    sklearn.ensemble.ExtraTreesClassifier
    df_train.shape.df_train.isnull.sum.sort_values.max
    model.get_params.items
    serie.pandas.DatetimeIndex.second.astype
    drift_estimator.DriftEstimator.score
    self.get_estimator.estimator_weights_.sum
    mlbox.prediction.Predictor.fit_predict
    model.classification.stacking_classifier.StackingClassifier
    self.__Lcat.df.fillna
    col.df_train.nunique
    df_train.sample
    self.__regressor.score
    model.regression.feature_selector.Reg_feature_selector.get_params
    pandas.concat
    pandas.concat.values
    sklearn.metrics.SCORERS.keys
    matplotlib.pyplot.show
    sklearn.ensemble.BaggingClassifier
    model.get_estimator.get_params
    model.classification.classifier.Classifier
    S.append
    pp.set_params.fit
    stck.STCK.get_params.copy
    open
    est.get_estimator.get_params.items
    fh.read
    tensorflow.keras.models.Model.compile
    clf.fit
    max
    numpy.log
    sklearn.ensemble.AdaBoostClassifier
    sklearn.preprocessing.LabelEncoder
    importance_bag.append
    serie_to_df.month.astype
    int
    enumerate
    self.__cross_val_predict_proba
    get_embeddings
    self.__imp.fit
    df_train.shape.df_train.isnull.sum.sort_values
    sync_fit
    y_train.nunique.Dense
    sklearn.linear_model.LogisticRegression
    serie_to_df.day.astype
    sklearn.linear_model.Lasso
    min
    set
    df_test.sample
    df_train.drop_duplicates.isnull
    df_train.std
    numpy.random.shuffle
    hyperopt.fmin.items
    tensorflow.keras.layers.Input
    serie.pandas.DatetimeIndex.month.astype
    pandas.get_dummies
    pandas.to_datetime
    pandas.Series
    mlbox.optimisation.Optimiser.optimise
    self.__classifier.predict_proba
    sklearn.ensemble.RandomForestClassifier.fit
    self.level_estimator.fit
    sys.path.insert
    self.__regressor.get_params
    model.regression.regressor.Regressor.get_estimator
    serie.pandas.DatetimeIndex.hour.astype
    setuptools.setup
    df_test.index.nunique
    self.__Lcat.df_train.isnull
    sklearn.impute.SimpleImputer
    sklearn.preprocessing.LabelEncoder.fit_transform
    pandas.DataFrame.to_csv
    pandas.datetime.serie_to_df.total_seconds
    embeddings.append
    list
    col.df_train.unique
    self.__regressor.get_params.keys
    stck.STCK.get_params
    mlbox.optimisation.Optimiser
    encoding.na_encoder.NA_encoder
    pickle.dump
    Mock
    sklearn.ensemble.RandomForestRegressor
    col.self.__K.col.self.__Enc.len.Embedding
    self.get_params.keys
    sklearn.ensemble.RandomForestClassifier
    joblib.delayed
    tensorflow.keras.models.Model.fit
    df.drop
    df_train.isnull.sum
    numpy.zeros
    self.__Lcat.df_train.isnull.sum
    sys.modules.update
    serie.apply.apply
    copy.copy
    mlbox.preprocessing.Drift_thresholder.fit_transform
    df_train.drop_duplicates.to_hdf
    joblib.Parallel
    numpy.round
    tensorflow.keras.models.Model
    serie.pandas.DatetimeIndex.year.astype
    col.self.__Enc.keys
    sklearn.pipeline.Pipeline.transform
    df_train.drop_duplicates.values
    sklearn.model_selection.StratifiedKFold
    df_train.index.nunique
    sync_fit.delayed
    sklearn.tree.DecisionTreeRegressor
    pandas.read_hdf
    os.path.dirname
    self.clean.drop_duplicates
    drift.DriftThreshold.drifts
    df.name.pred.apply
    pandas.Series.values
    matplotlib.use
    serie.pandas.DatetimeIndex.day.astype
    pandas.read_excel
    sklearn.tree.DecisionTreeClassifier
    type
    dict
    drift_estimator.DriftEstimator.fit
    numpy.std
    numpy.mean
    sklearn.ensemble.BaggingRegressor
    clf.get_params
    os.getcwd
    estimator.predict_proba
    serie_to_df.year.astype
    self.level_estimator.get_params
    model.get_params
    matplotlib.pyplot.grid
    callable
    self.__save_feature_importances
    min.Dense
    mlbox.optimisation.Optimiser.evaluate
    model.regression.regressor.Regressor.feature_importances
    hyperopt.fmin
    drift.DriftThreshold
    sklearn.ensemble.AdaBoostRegressor
    matplotlib.pyplot.text
    y_train.nunique
    y_train.index.nunique
    matplotlib.pyplot.title
    stck.STCK.get_params.copy.keys
    pp.set_params.predict_proba
    sklearn.metrics.make_scorer
    matplotlib.pyplot.close
    model.regression.regressor.Regressor
    pickle.load.inverse_transform
    dropout2.Dropout
    drift.DriftThreshold.transform
    target_name.df.isnull
    var.df_train.nunique
    self.__classifier.predict_log_proba
    numpy.percentile
    sklearn.model_selection.KFold
    model.get_estimator
    int.Dense
    self.fit_transform.drop
    matplotlib.pyplot.figure
    df.apply
    self.evaluate
    is_null.df.drop
    p.split
    convert_float_and_dates
    mlbox.preprocessing.Reader
    col.df_train.mode
    getattr
    df_test.df_train.pd.concat.drop
    inputs.append
    

    @developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
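    To make the distinction between exact pins and unbounded constraints concrete, here is a minimal sketch with a hypothetical classify helper (not a PyDeps or pip API) that labels simple requirement lines as exact pins, bounded ranges, or constraints without an upper bound. It only inspects comparison operators and ignores extras, environment markers and URLs.

```python
import re

# PEP 440 comparison operators; longer operators are listed first so that
# "<=" is not read as "<" and "===" is not read as "==".
_OPERATORS = re.compile(r"===|==|!=|~=|<=|>=|<|>")


def classify(requirement: str) -> str:
    """Label a simple requirement line by how tightly it constrains versions.

    Hypothetical helper for illustration only.
    """
    ops = _OPERATORS.findall(requirement)
    if "==" in ops or "===" in ops:
        return "exact pin (conflict risk)"
    if any(op in ("<", "<=", "~=") for op in ops):
        return "bounded range"
    return "no upper bound (missing-API risk)"


for req in ["numpy==1.18.2", "matplotlib>=1.3.0,<=3.0.3", "pandas>=0.25", "joblib"]:
    print(f"{req:28} {classify(req)}")
```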
  • Bump joblib from 0.14.1 to 1.2.0

    Bumps joblib from 0.14.1 to 1.2.0.

    Changelog

    Sourced from joblib's changelog.

    Release 1.2.0

    • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

    • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

    • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

    • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

    • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

    • Vendor loky 3.3.0 which fixes several bugs including:

      • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

      • avoiding leaking worker processes in case of nested loky parallel calls;

      • reliably spawning the correct number of reusable workers.

    Release 1.1.0

    • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

    • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

    ... (truncated)

    Commits
    • 5991350 Release 1.2.0
    • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
    • cea26ff CI test the future loky-3.3.0 branch (#1338)
    • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
    • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
    • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
    • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
    • ac09691 [MAINT] various test updates (#1334)
    • 4a314b1 Vendor loky 3.2.0 (#1333)
    • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.18.2 to 1.22.0

    Bumps numpy from 1.18.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements; highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.
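A small sketch of the new quantile methods, assuming NumPy 1.22 or later is installed, where the method keyword selects the estimator. The sample data is made up for illustration:

```python
import numpy as np

# Assumes NumPy >= 1.22, where np.quantile / np.percentile take a `method=` keyword.
data = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative sample, not from the release notes

# The median (q = 0.5) falls between 2.0 and 3.0, so each method resolves it differently.
# "inverted_cdf" is one of the methods added in 1.22; the others were
# previously available as `interpolation=` choices.
methods = ("linear", "lower", "higher", "midpoint", "inverted_cdf")
results = {m: float(np.quantile(data, 0.5, method=m)) for m in methods}

for name, value in results.items():
    print(f"{name:>12}: {value}")
```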

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10; Python 3.7 has been dropped. Note that 32-bit wheels are only provided for Python 3.8 and 3.9 on Windows; all other wheels are 64-bit on account of Ubuntu, Fedora, and other Linux distributions dropping 32-bit support. All 64-bit wheels are also linked with 64-bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.
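The wheel matrix described in these notes can be encoded as a quick check. This is a hypothetical helper (numpy_122_has_wheel is our name, not a NumPy API), and it only reflects the prebuilt wheels as described here; source builds and platform-specific gaps are not modeled:

```python
import platform
import struct
import sys


def numpy_122_has_wheel(version, bits, is_windows):
    """Hypothetical helper: does the NumPy 1.22.0 wheel matrix cover this interpreter?

    Encodes only what the release notes state: Python 3.8-3.10, 64-bit wheels,
    plus 32-bit wheels for Python 3.8 and 3.9 on Windows.
    """
    major_minor = tuple(version[:2])
    if not (3, 8) <= major_minor <= (3, 10):
        return False  # 3.7 was dropped; newer versions are outside this release's range
    if bits == 64:
        return True
    return bits == 32 and is_windows and major_minor in {(3, 8), (3, 9)}


# Pointer size of the running interpreter gives its bitness (32 or 64).
bits = struct.calcsize("P") * 8
print(numpy_122_has_wheel(sys.version_info, bits, platform.system() == "Windows"))
```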

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)
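A sketch of how code might migrate off the removed spellings, assuming NumPy 1.22 or later. The replacement spellings and the check_dtype helper are our own suggestion, not something stated in the release notes:

```python
import numpy as np

# Removed numeric-style dtype strings mapped to spellings that are still accepted.
# This mapping is an assumption for illustration.
REPLACEMENTS = {
    "Uint32": "uint32",
    "Uint64": "uint64",
    "Datetime64": "datetime64",
    "Bytes0": "S",  # flexible bytes dtype, the same type as np.bytes_
    "Str0": "U",    # flexible unicode dtype, the same type as np.str_
}


def check_dtype(name):
    """Return np.dtype(name), or None if NumPy rejects the string with TypeError."""
    try:
        return np.dtype(name)
    except TypeError:
        return None


for old, new in REPLACEMENTS.items():
    print(f"{old!r}: {check_dtype(old)} -> {new!r}: {check_dtype(new)}")
```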

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17; users should use numpy.genfromtxt instead, with the appropriate value for the usemask parameter.

    (gh-19615)
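A minimal migration sketch for these removals, assuming NumPy 1.22 or later; the sample CSV text is made up for illustration. pickle.loads replaces numpy.loads, and usemask=False / usemask=True stand in for the old ndfromtxt / mafromtxt behavior:

```python
import io
import pickle

import numpy as np

arr = np.arange(4)

# numpy.loads(...) is gone; pickle.loads reads the same pickled bytes.
restored = pickle.loads(pickle.dumps(arr))

csv_text = "1,2\n3,\n"  # second row has a missing value

# Former ndfromtxt(...) -> genfromtxt(..., usemask=False): missing float fields become nan.
plain = np.genfromtxt(io.StringIO(csv_text), delimiter=",", usemask=False)

# Former mafromtxt(...) -> genfromtxt(..., usemask=True): missing fields are masked.
masked = np.genfromtxt(io.StringIO(csv_text), delimiter=",", usemask=True)

print(restored, plain, masked, sep="\n")
```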

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 0
  • ModuleNotFoundError: No module named 'mlbox.preprocessing'


    Hi,

    Even after running !pip install mlbox, I get an error message when using mlbox.

    1. The package shows up in the !pip list output:

    mlbox 0.8.5

    2. Error message:

    ModuleNotFoundError                       Traceback (most recent call last)
          1 #https://mlbox.readthedocs.io/en/latest/index.html
          2 # importing the required libraries
    ----> 3 from mlbox.preprocessing import *
          4 from mlbox.optimisation import *
          5 from mlbox.prediction import *

    ModuleNotFoundError: No module named 'mlbox.preprocessing'

    opened by mrajkumar18 0
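One way to narrow down an error like this (a suggestion, not a fix confirmed in the thread) is to check which interpreter the notebook is running and where it resolves mlbox from. A folder named mlbox in the working directory, for example, would shadow the installed package. describe_module is a hypothetical helper built on the standard library's importlib.util.find_spec:

```python
import importlib.util
import sys


def describe_module(name):
    """Report where top-level module `name` would be imported from in this interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    if spec.origin is not None:
        return spec.origin
    # Namespace packages (e.g. a bare local "mlbox" folder without __init__.py) have no origin file.
    return f"namespace package at {list(spec.submodule_search_locations)}"


print("interpreter:", sys.executable)
print("mlbox resolves to:", describe_module("mlbox"))
```

If the path printed for mlbox is not inside the environment that pip installed into, the import is picking up something other than the installed 0.8.5 package.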
Releases (v0.8.1)
Iterative stochastic gradient descent (SGD) linear regressor with regularization

SGD-Linear-Regressor Iterative stochastic gradient descent (SGD) linear regressor with regularization Dataset: Kaggle “Graduate Admission 2” https://w

Zechen Ma 1 Oct 29, 2021
PLUR is a collection of source code datasets suitable for graph-based machine learning.

PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets suitable for graph-based machine learning. We provide scripts for downloading, processing, and loading the

Google Research 76 Nov 25, 2022
A simple machine learning python sign language detection project.

SST Coursework 2022 About the app A python application that utilises the tensorflow object detection algorithm to achieve automatic detection of ameri

Xavier Koh 2 Jun 30, 2022
An easier way to build neural search on the cloud

Jina is geared towards building search systems for any kind of data, including text, images, audio, video and many more. With the modular design & multi-layer abstraction, you can leverage the effici

Jina AI 17k Jan 01, 2023
Neural Machine Translation (NMT) tutorial with OpenNMT-py

Neural Machine Translation (NMT) tutorial with OpenNMT-py. Data preprocessing, model training, evaluation, and deployment.

Yasmin Moslem 29 Jan 09, 2023
This project makes it possible to read out EVN's smart meter (Netz Niederösterreich) via the customer interface.

SmartMeterEVN This project makes it possible to read out EVN's smart meter (Netz Niederösterreich) via the customer interface. Smart meters are

greenMike 43 Dec 04, 2022
ML-powered Loan-Marketer Customer Filtering Engine

In the loan-marketing business, employees are required to call users to get them to buy loans in several fields and of several magnitudes. If employees are calling everybody in the network it is also very length

Sagnik Roy 13 Jul 02, 2022
This is an auto-ML tool specialized in detecting outliers

Auto-ML tool specialized in detecting outliers Description This tool will allow you, with a Dash visualization, to compare 10 models of machine le

1 Nov 03, 2021
fastFM: A Library for Factorization Machines

Citing fastFM The library fastFM is an academic project. The time and resources spent developing fastFM are therefore justified by the number of citat

1k Dec 24, 2022
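A reproduction of Li Hang's Statistical Learning Methods (统计学习方法)

This project reproduces the algorithms in every chapter of Li Hang's Statistical Learning Methods. Features: Note summaries: each file opens with a summary of the core ideas. Pythonic: the implementations are written as cleanly as possible, with coding style following PEP8 almost strictly. Step by step: the earlier algorithms compute with lists, which keeps them quite readable; the later ones compute almost entirely with numpy.array, and are supported by detailed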

58 Oct 22, 2021
Simulate & classify transient absorption spectroscopy (TAS) spectral features for bulk semiconducting materials (Post-DFT)

PyTASER PyTASER is a Python (3.9+) library and set of command-line tools for classifying spectral features in bulk materials, post-DFT. The goal of th

Materials Design Group 4 Dec 27, 2022
Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.

Call of Duty World League: Search & Destroy Outcome Predictions Growing up as an avid Call of Duty player, I was always curious about what factors led

Brett Vogelsang 2 Jan 18, 2022
Turning images into '9-pan' palettes using KMeans clustering from sklearn.

img2palette Turning images into '9-pan' palettes using KMeans clustering from sklearn. Requirements We require: Pillow, for opening and processing ima

Samuel Vidovich 2 Jan 01, 2022
OptaPy is an AI constraint solver for Python to optimize planning and scheduling problems.

OptaPy is an AI constraint solver for Python to optimize the Vehicle Routing Problem, Employee Rostering, Maintenance Scheduling, Task Assignment, School Timetabling, Cloud Optimization, Conference S

OptaPy 208 Dec 27, 2022
whylogs: A Data and Machine Learning Logging Standard

whylogs: A Data and Machine Learning Logging Standard whylogs is an open source standard for data and ML logging whylogs logging agent is the easiest

WhyLabs 2k Jan 06, 2023
Software Engineer Salary Prediction

Based on 2021 stack overflow data, this machine learning web application helps one predict the salary based on years of experience, level of education and the country they work in.

Jhanvi Mimani 1 Jan 08, 2022
Empyrial is a Python-based open-source quantitative investment library dedicated to financial institutions and retail investors

By Investors, For Investors. Want to read this in Chinese? Click here Empyrial is a Python-based open-source quantitative investment library dedicated

Santosh 640 Dec 31, 2022
Applied Machine Learning for Graduate Program in Computer Science (PPGCC)

Applied Machine Learning for Graduate Program in Computer Science (PPGCC) - Federal University of Santa Catarina

Jônatas Negri Grandini 1 Dec 22, 2021
Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

Time series analysis today is an important cornerstone of quantitative science in many disciplines, including natural and life sciences as well as eco

Christoph Mark 129 Dec 24, 2022