FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically

Last update: Jan 09, 2023

Overview

FLAML - Fast and Lightweight AutoML

FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically. It frees users from selecting learners and hyperparameters for each learner. Its simple and lightweight design makes it easy to extend, for example by adding customized learners or metrics. FLAML is powered by a new, cost-effective hyperparameter optimization and learner selection method invented by Microsoft Research. It leverages the structure of the search space to choose a search order optimized for both cost and error. For example, the system tends to propose cheap configurations at the beginning of the search, but quickly moves to configurations with high model complexity and large sample size when they are needed later in the search. As another example, it favors cheap learners at the beginning but penalizes them later if their error improves slowly. The cost-bounded search and cost-based prioritization make a big difference in search efficiency under budget constraints.
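
To make the idea of a cost-bounded search order concrete, here is a toy sketch in plain Python. It is not FLAML's algorithm: the `est_cost` field, the `cost_bounded_search` helper and the toy error function are all hypothetical, and the sketch only illustrates trying cheap configurations first and stopping once the budget would be exceeded.

```python
from typing import Callable, Dict, List, Optional, Tuple


def cost_bounded_search(
    candidates: List[Dict],
    evaluate: Callable[[Dict], Tuple[float, float]],
    budget: float,
) -> Tuple[Optional[Dict], float, float]:
    """Try candidates from cheapest to most expensive within a cost budget.

    Each candidate carries a hypothetical "est_cost" field, and `evaluate`
    returns (error, actual_cost). Toy illustration only, not FLAML's method.
    """
    best_config, best_error, spent = None, float("inf"), 0.0
    for config in sorted(candidates, key=lambda c: c["est_cost"]):
        if spent + config["est_cost"] > budget:
            break  # every remaining candidate is at least as expensive
        error, cost = evaluate(config)
        spent += cost
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error, spent


# Toy problem: more trees cost more but lower the error.
candidates = [{"n_trees": n, "est_cost": float(n)} for n in (64, 4, 16, 256)]
evaluate = lambda c: (1.0 / c["n_trees"], c["est_cost"])
best, err, spent = cost_bounded_search(candidates, evaluate, budget=100.0)
print(best, err, spent)
```

With a budget of 100 cost units, the sketch evaluates the 4-, 16- and 64-tree candidates and never pays for the 256-tree one.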

FLAML has a .NET implementation as well from ML.NET Model Builder. This ML.NET blog describes the improvement brought by FLAML.

Installation

FLAML requires Python version >= 3.6. It can be installed from pip:

pip install flaml

To run the notebook example, install flaml with the [notebook] option:

pip install flaml[notebook]

Quickstart

With three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.

from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")

You can restrict the learners and use FLAML as a fast hyperparameter tuning tool for XGBoost, LightGBM, Random Forest etc. or a customized learner.

automl.fit(X_train, y_train, task="classification", estimator_list=["lgbm"])

You can also run generic ray-tune style hyperparameter tuning for a custom function.

from flaml import tune
tune.run(train_with_config, config={…}, low_cost_partial_config={…}, time_budget_s=3600)

Advantages

For common machine learning tasks like classification and regression, find quality models with small computational resources.
Users can choose their desired customizability: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), full customization (arbitrary training and evaluation code).
Allow human guidance in hyperparameter tuning: the search respects priors on certain subspaces while remaining able to explore other subspaces. Read more about the hyperparameter optimization methods in FLAML here. They can be used beyond the AutoML context, including in distributed HPO frameworks such as ray tune or nni.
Support online AutoML: automatic hyperparameter tuning for online learning algorithms. Read more about the online AutoML method in FLAML here.

Examples

A basic classification example.

from flaml import AutoML
from sklearn.datasets import load_iris
# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "test/iris.log",
}
X_train, y_train = load_iris(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict_proba(X_train))
# Export the best model
print(automl.model)

A basic regression example.

from flaml import AutoML
from sklearn.datasets import load_diabetes  # load_boston was removed in scikit-learn 1.2
# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'r2',
    "task": 'regression',
    "log_file_name": "test/diabetes.log",
}
X_train, y_train = load_diabetes(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict(X_train))
# Export the best model
print(automl.model)

Time series forecasting.

# pip install flaml[forecast]
import numpy as np
from flaml import AutoML
X_train = np.arange('2014-01', '2021-01', dtype='datetime64[M]')
y_train = np.random.random(size=72)
automl = AutoML()
automl.fit(X_train=X_train[:72],  # a single column of timestamp
           y_train=y_train,  # value for each timestamp
           period=12,  # time horizon to forecast, e.g., 12 months
           task='forecast', time_budget=15,  # time budget in seconds
           log_file_name="test/forecast.log",
          )
print(automl.predict(X_train[72:]))

Learning to rank.

from sklearn.datasets import fetch_openml
from flaml import AutoML
X_train, y_train = fetch_openml(name="credit-g", return_X_y=True)
# not a real learning to rank dataset
groups = [200] * 4 + [100] * 2    # group counts
automl = AutoML()
automl.fit(
    X_train, y_train, groups=groups,
    task='rank', time_budget=10,    # in seconds
)
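
The `groups` argument in the example above is given as group counts. As a minimal sketch, assuming the usual convention that each count covers a run of consecutive rows, the hypothetical helper below turns the counts into row ranges and shows that they must add up to the number of training rows.

```python
from itertools import accumulate

groups = [200] * 4 + [100] * 2  # group counts, as in the ranking example


def group_slices(counts):
    """Turn per-group row counts into (start, stop) row ranges."""
    stops = list(accumulate(counts))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


slices = group_slices(groups)
print(sum(groups))  # total number of rows the counts must cover
print(slices)
```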

More examples can be found in notebooks.

Documentation

Please find the API documentation here.

Please find demo and tutorials of FLAML here.

For more technical details, please check our papers.

FLAML: A Fast and Lightweight AutoML Library. Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. MLSys 2021.

@inproceedings{wang2021flaml,
    title={FLAML: A Fast and Lightweight AutoML Library},
    author={Chi Wang and Qingyun Wu and Markus Weimer and Erkang Zhu},
    year={2021},
    booktitle={MLSys},
}

Frugal Optimization for Cost-related Hyperparameters. Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.
Economical Hyperparameter Optimization With Blended Search Strategy. Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.
ChaCha for Online AutoML. Qingyun Wu, Chi Wang, John Langford, Paul Mineiro and Marco Rossi. ICML 2021.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

If you are new to GitHub, here is a detailed help source on getting involved with development on GitHub.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Developing

Setup

git clone https://github.com/microsoft/FLAML.git
pip install -e .[test,notebook]

Coverage

Any code you commit should generally not significantly impact coverage. To run all unit tests:

coverage run -m pytest test

Then view the coverage report with coverage report -m or coverage html. If all the tests pass, please also test-run notebook/flaml_automl to make sure your commit does not break the notebook example.

Authors

Chi Wang
Qingyun Wu

Contributors (alphabetical order): Amir Aghaei, Vijay Aski, Sebastien Bubeck, Surajit Chaudhuri, Nadiia Chepurko, Ofer Dekel, Alex Deng, Anshuman Dutt, Nicolo Fusi, Jianfeng Gao, Johannes Gehrke, Niklas Gustafsson, Silu Huang, Dongwoo Kim, Christian Konig, John Langford, Menghao Li, Mingqin Li, Zhe Liu, Naveen Gaur, Paul Mineiro, Vivek Narasayya, Jake Radzikowski, Marco Rossi, Amin Saied, Neil Tenenholtz, Olga Vrousgou, Markus Weimer, Yue Wang, Qingyun Wu, Qiufeng Yin, Haozhe Zhang, Minjia Zhang, XiaoYun Zhang, Eric Zhu, and open-source contributors.

License

MIT License

Comments

Feature Request : Make FLAML installable with Conda

Basically the title.

Gherkin style :

AS A ML developer
WHEN I run conda install flaml (or conda -c conda-forge install flaml)
THEN it installs FLAML in my current Conda environment
- AND I can execute the following code :

from flaml import AutoML
from sklearn.datasets import load_iris
# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "test/iris.log",
}
X_train, y_train = load_iris(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict_proba(X_train))
# Export the best model
print(automl.model)

help wanted

opened by fleuryc 35

Kernel panic / Segmentation fault when trying to run example

Hi! I tried to install and run a simple example, but ran into a Segmentation Fault: 11 error when using Python 3.9.6:


from sklearn import datasets

def X_y():
    X, y = datasets.make_classification(n_samples=100, n_features=20,
                                        n_informative=2, n_redundant=2, random_state=0)

    return X, y

from flaml import AutoML
X, y = X_y()  # build the dataset; X and y are otherwise undefined
automl = AutoML()
automl.fit(X, y, task="classification")

Running this in both my Jupyter notebook and via bash gives me:

$ python3 innodays_flaml.py
[flaml.automl: 09-24 10:45:44] {1431} INFO - Evaluation method: cv
[flaml.automl: 09-24 10:45:44] {1477} INFO - Minimizing error metric: 1-accuracy
[flaml.automl: 09-24 10:45:44] {1514} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree', 'lrl1']
[flaml.automl: 09-24 10:45:44] {1746} INFO - iteration 0, current learner lgbm
Segmentation fault: 11

opened by angela97lin 26

A few questions on running FLAML distributedly via Ray on compute clusters in AzureML
Hi @sonichi:

My team are running FLAML distributedly using Ray on compute clusters in AzureML. We have a few questions since we've never used FLAML in this environment before and hope you could provide some insights.

With this setting, what is the best way to log and register the optimal model returned of each learner in AzureML using mlflow? Shall we simply do mlflow.sklearn.log_model(automl.best_model_for_estimator('LearnerA'), "BestModelLearnerA") and then mlflow.register_model(model_uri=f"{run.info.artifact_uri}/LearnerA", name='flaml-LearnerA') ?

Where is the log file and how to change the directory for it?

Thank you!
opened by flippercy 19
Why not use early_stop_rounds?

For LGBM and XGBoost, num_estimators is sampled between 4 and min(32768, int(data_size)).

Instead, have you considered setting num_estimators (alias num_boost_round) to a large value (say 32768) and using evals and early_stop_rounds during training? This would allow the learning algorithm (rather than the tuning algorithm) to directly find the number of boosting rounds that is optimal, given the data and all other hyper parameter values. (The algorithms will output the best num_estimators/num_boost_round via best_iteration.)

And then, when you refit on the full dataset, you can use the (average/mode/etc) best_iteration that were used during CV.
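
The proposal above can be sketched without any boosting library. The per-fold validation curves below are made-up numbers, and `best_iteration` is a hypothetical helper that mimics patience-based early stopping; it is not the LightGBM or XGBoost implementation.

```python
from statistics import mean


def best_iteration(val_losses, patience):
    """Return the 1-based round with the lowest validation loss, stopping
    once `patience` consecutive rounds fail to improve on the best loss."""
    best_round, best_loss = 0, float("inf")
    for round_no, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_round, best_loss = round_no, loss
        elif round_no - best_round >= patience:
            break  # early stop: later rounds are never evaluated
    return best_round


# Hypothetical validation loss per boosting round, one curve per CV fold.
fold_curves = [
    [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58],
    [0.8, 0.6, 0.5, 0.51, 0.52, 0.53, 0.40],
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.51, 0.52],
]
per_fold = [best_iteration(curve, patience=3) for curve in fold_curves]
refit_rounds = round(mean(per_fold))  # rounds to use when refitting on all data
print(per_fold, refit_rounds)
```

In the second curve the stop fires before the late improvement in round 7 is seen, which is the usual trade-off of a small patience value.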

opened by stepthom 14
The results of the demo case differ between ray tune and flaml.

I ran the case studies on the page https://github.com/microsoft/FLAML/tree/main/flaml/tune, which use CFO with both flaml.tune and ray.tune. The results differ between the two, so is there something wrong with embedding flaml into ray.tune's framework?

opened by zuoxiaojiang 13
reproducibility and random state for AutoML.fit()

Hello, I wonder if it is possible to reproduce the results of "flaml.AutoML.fit()"? If possible, could you please kindly let me know how to set up the random_state (or seed) for the "flaml.AutoML.fit()"? Thanks!

opened by zzheng93 13

Using Scikit-learn APIs directly

Almost yesterday, I had a short conversation with @sonichi about this, and in general it is better to provide such features more easily to the users... Maybe you (FLAML maintainers) don't have any contest, but you should have features so that more developers will use your product...

Anyway, the files I imported from them:

Pre-processing: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/preprocessing/__init__.py
Model selection: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/model_selection/__init__.py
Metrics: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/metrics/__init__.py

I don't think that there is a need to write a test, but we should be careful that a method (class and/or function) is not deprecated...

A quick example that shows how to use these APIs:

import pandas as pd
from flaml.utils import preprocessing
from flaml.utils import model_selection
from flaml.utils import metrics
from flaml import AutoML

# Loading data (nothing changed)
df = pd.read_csv('<a_random_dataset_that_needs_preprocessing.csv>')
X = df[['field_no1', 'field_no2', 'field_no3', 'field_no4']]
y = df['field_no5']

# Preprocessing
le = preprocessing.LabelEncoder()
X['field_no3'] = le.fit_transform(X['field_no3'])
y = le.fit_transform(y)  # y is a single column; encode it directly

# Separating the train and test data
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=.2)

# Training phase (nothing changed)
automl = AutoML()
automl.fit(X_train, y_train, task='classification')

# Measuring accuracy
y_pred = automl.predict(X_test)
print(metrics.classification_report(y_test, y_pred))

Or:

from flaml.utils import (
    LabelEncoder,
    train_test_split,
    classification_report,
)
from flaml import AutoML

Or even:

from flaml import (
    LabelEncoder,
    train_test_split,
    classification_report,
    AutoML,
)

opened by sheikhartin 11

Crash with ValueError when ensemble=True

When I set ensemble=True, and my data has categorical features, I get the following error at the end of the FLAML run:

[flaml.automl: 07-08 09:40:44] {1141} INFO -  at 9373.5s,       best extra_tree's error=0.2056, best rf's error=0.1950
[flaml.automl: 07-08 09:40:44] {993} INFO - iteration 52, current learner rf
[flaml.automl: 07-08 09:41:42] {1141} INFO -  at 9431.7s,       best rf's error=0.1950, best rf's error=0.1950
[flaml.automl: 07-08 09:41:42] {993} INFO - iteration 53, current learner rf
[flaml.automl: 07-08 09:42:11] {1141} INFO -  at 9460.7s,       best rf's error=0.1950, best rf's error=0.1950
[flaml.automl: 07-08 09:42:11] {993} INFO - iteration 54, current learner rf
[flaml.automl: 07-08 09:50:15] {1141} INFO -  at 9944.4s,       best rf's error=0.1949, best rf's error=0.1949
[flaml.automl: 07-08 09:50:15] {1187} INFO - selected model: RandomForestClassifier(criterion='entropy', max_features=0.7294599478674504,
                       n_estimators=347, n_jobs=10)
[flaml.automl: 07-08 09:50:15] {1197} INFO - [('rf', <flaml.model.RandomForestEstimator object at 0x7fca69effaf0>), ('extra_tree', <flaml.model.ExtraTreeEstimator object at 0x7fca8cc1f8e0>), ('lgbm', <flaml.model.LGBMEstimator object at 0x7fc799985190>), ('catboost', <flaml.model.CatBoostEstimator object at 0x7fca8cc884f0>), ('xgboost', <flaml.model.XGBoostSklearnEstimator object at 0x7fca8cd0e610>)]
/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/xgboost/sklearn.py:888: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/xgboost/sklearn.py:888: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
Traceback (most recent call last):
  File "search.py", line 212, in <module>
    dump_json(data_sheet_file, data_sheet)
  File "search.py", line 208, in main
    with open(data_sheet_file) as f:
  File "search.py", line 163, in run_data_sheet
    run['flaml_settings'] = jsonpickle.encode(automl_settings, unpicklable=False, keys=True)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/automl.py", line 943, in fit
    self._search()
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/automl.py", line 1212, in _search
    stacker.fit(self._X_train_all, self._y_train_all,
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 441, in fit
    return super().fit(X, self._le.transform(y), sample_weight)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 196, in fit
    _fit_single_estimator(self.final_estimator_, X_meta, y,
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_base.py", line 39, in _fit_single_estimator
    estimator.fit(X, y)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/model.py", line 296, in fit
    self._fit(X_train, y_train, **kwargs)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/flaml/model.py", line 78, in _fit
    model.fit(X_train, y_train, **kwargs)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 304, in fit
    X, y = self._validate_data(X, y, multi_output=True,
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/base.py", line 433, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 871, in check_X_y
    X = check_array(X, accept_sparse=accept_sparse,
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 673, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "/global/home/hpc3552/.conda/envs/myenv/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: '__OTHER__'

This error does not occur if ensemble=False or if I remove (or encode) the categorical features from my dataset

My guess is that FLAML properly encodes categorical features when training the base estimators (LGBM, RF, etc), but not when training the stacking classifier.

opened by stepthom 11

Catboost not respecting custom_hp

Hi,

I have configured a custom_hp for catboost:

custom_hp = {
    "catboost": {
        'n_estimators': {
            "domain": tune.randint(lower=1000, upper=3000),
            "init_value": 2001,
            "low_cost_init_value": 2000,
        },
        'learning_rate': {
            "domain": tune.uniform(lower=0.1, upper=1.0),
            "init_value": 0.01,
            "low_cost_init_value": .1,
        },
        'colsample_bylevel': {
            "domain": tune.uniform(lower=0.1, upper=1.0),
            "init_value": 0.1,
        },
        'depth': {
            "domain": tune.randint(lower=1, upper=12),
            "init_value": 2,
            "low_cost_init_value": 2,
        },
        'l2_leaf_reg': {
            "domain": tune.uniform(lower=0.1, upper=20),
            "init_value": 3.0,
            "low_cost_init_value": 1.0,
        },
        'bootstrap_type': {
            "domain": tune.choice(['Bayesian', 'Bernoulli', 'MVS']),
            "init_value": "Bayesian",
            "low_cost_init_value": "Bayesian",
        },
        'grow_policy': {
            "domain": tune.choice(['Lossguide', 'Depthwise']),
            "init_value": "Lossguide",
            "low_cost_init_value": "Lossguide",
        },
    },
}

In the first iteration of the model fit, self.params shows n_estimators as 2001, as defined by init_value. In subsequent iterations, however, n_estimators takes a very low value such as 3. Initially I thought this was because my low_cost_init_value was set too low, at 2, but as you can see I set it to the minimum for the domain, 2000.

Here is screen shot of first iteration where self.params are consistent with custom_hp:

On second iteration onward, the n_estimators are very low and outside of the custom_hp domain:

And here is the log after first iteration:

{"record_id": 0, "iter_per_learner": 1, "logged_metric": {"sharpe": -0.00794463007228285, "correlation": -0.00015975141980052537}, "trial_time": 95.14761304855347, "wall_clock_time": 95.15061020851135, "validation_loss": 0.00794463007228285, "config": {"early_stopping_rounds": 10, "n_estimators": 1, "colsample_bylevel": 0.1, "depth": 2, "l2_leaf_reg": 2.9999999999999996, "bootstrap_type": "Bayesian", "grow_policy": "Lossguide", "learning_rate": 0.42154280932622956}, "learner": "catboost", "sample_size": 70417}
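
A quick way to confirm the mismatch is to compare the logged config against the configured ranges. The sketch below copies the numeric bounds from custom_hp and the numeric values from the log record above; the `out_of_domain` helper is hypothetical, and treating each range as lower-inclusive and upper-exclusive is an assumption of this sketch.

```python
# Bounds copied from the numeric entries of custom_hp in this report.
# Assumption: each range is lower-inclusive and upper-exclusive.
bounds = {
    "n_estimators": (1000, 3000),
    "learning_rate": (0.1, 1.0),
    "colsample_bylevel": (0.1, 1.0),
    "depth": (1, 12),
    "l2_leaf_reg": (0.1, 20),
}


def out_of_domain(config, bounds):
    """Return {name: value} for logged values that fall outside their range."""
    return {
        name: config[name]
        for name, (lower, upper) in bounds.items()
        if name in config and not (lower <= config[name] < upper)
    }


# Numeric part of the "config" entry from the logged record.
logged = {
    "n_estimators": 1,
    "colsample_bylevel": 0.1,
    "depth": 2,
    "l2_leaf_reg": 2.9999999999999996,
    "learning_rate": 0.42154280932622956,
}
violations = out_of_domain(logged, bounds)
print(violations)
```

Only n_estimators is flagged, which matches the behavior described in this report.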

opened by jmrichardson 10

Unable to work with FLAML in Kaggle

Error:

ImportError                               Traceback (most recent call last)
/tmp/ipykernel_33/3768597154.py in <module>
----> 1 from flaml import AutoML
      2 automl = AutoML()

/opt/conda/lib/python3.7/site-packages/flaml/__init__.py in <module>
      1 from flaml.searcher import CFO, BlendSearch, FLOW2, BlendSearchTuner
----> 2 from flaml.automl import AutoML, logger_formatter
      3 from flaml.onlineml.autovw import AutoVW
      4 from flaml.version import __version__
      5 import logging

/opt/conda/lib/python3.7/site-packages/flaml/automl.py in <module>
     22 import logging
     23 import json
---> 24 from .ml import (
     25     compute_estimator,
     26     train_estimator,

/opt/conda/lib/python3.7/site-packages/flaml/ml.py in <module>
      6 import numpy as np
      7 import pandas as pd
----> 8 from sklearn.metrics import (
      9     mean_squared_error,
     10     r2_score,

ImportError: cannot import name 'mean_absolute_percentage_error' from 'sklearn.metrics' (/opt/conda/lib/python3.7/site-packages/sklearn/metrics/__init__.py)
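
An ImportError like this one usually points to an older scikit-learn, since mean_absolute_percentage_error was added to sklearn.metrics in release 0.24. The helper below is a hypothetical sketch (the name `supports_mape` is not part of any library) that checks a version string against that threshold.

```python
import re


def supports_mape(sklearn_version: str) -> bool:
    """True if this scikit-learn version exports
    mean_absolute_percentage_error (added in release 0.24)."""
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 24)


print(supports_mape("0.23.2"))
print(supports_mape("0.24.0"))
```

In a live environment, pass `sklearn.__version__` to the helper; if it returns False, upgrading scikit-learn is the likely fix.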

opened by GDGauravDutta 10
Do you expect FLAML to run across multiple nodes in AML using RAY

On your page https://microsoft.github.io/FLAML/docs/Examples/Integrate%20-%20AzureML/#use-ray-to-distribute-across-a-cluster you describe the process of configuring an AML cluster with Ray and using FLAML against it.

Is it your expectation that this configuration, as you have it, will be distributed Ray or parallel Ray on a single node?

I was under the impression you needed a compute cluster with a VNET to allow a Ray cluster?

Also I ran your sample and modified it slightly to log the node id of the machine executing (I did this via a custom metric so it ran for each iteration) and it always only logged the same node that my flaml script is executing on even though I have 2 nodes available.

opened by camer314 10
New tuning API in ray 2

There seems to be a big change in ray tune API: https://docs.ray.io/en/latest/tune/api_docs/execution.html#tuner How does it affect flaml when using ray 2 as the backend?

@Yard1 your insight would be appreciated.

opened by sonichi 1
fix #871: call check_spark only when necessary
Why are these changes needed?

Currently, check_spark is called even when use_spark=False and n_concurrent_trials=1, which creates an unnecessary Spark session. With this PR, check_spark is called only when use_spark=True or (n_concurrent_trials>1 and ray is not available).
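
The condition described in this PR can be written as a small predicate. This is a sketch of the stated rule only: `should_check_spark` and its `ray_available` argument are hypothetical names, not FLAML's actual code.

```python
def should_check_spark(use_spark: bool, n_concurrent_trials: int, ray_available: bool) -> bool:
    """Decide whether the Spark check is needed, per the rule in this PR:
    Spark was requested explicitly, or parallel trials are requested
    and ray is not available to run them."""
    return use_spark or (n_concurrent_trials > 1 and not ray_available)


# Default sequential run: no Spark session should be created.
print(should_check_spark(use_spark=False, n_concurrent_trials=1, ray_available=False))
# Parallel run without ray: Spark is the remaining backend to check.
print(should_check_spark(use_spark=False, n_concurrent_trials=4, ray_available=False))
```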

Related issue number

Closes #871

Checks

[x] I've used pre-commit to lint the changes in this PR, or I've made sure lint with flake8 output is two 0s.

[ ] I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.

[ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.

[x] I've made sure all auto checks have passed.
opened by thinkall 4

`AutoML.fit` always creates a spark session even when it doesn't need to

Currently, AutoML.fit always creates a spark session even when it doesn't need to in the following line: https://github.com/microsoft/FLAML/blob/90aea9c28b6100faf86f2c204e53a68e87f10c66/flaml/automl/automl.py#L2607

check_spark calls SparkSession.builder.getOrCreate and creates a spark session if it doesn't exist:

https://github.com/microsoft/FLAML/blob/90aea9c28b6100faf86f2c204e53a68e87f10c66/flaml/tune/spark/utils.py#L48

I think we can skip check_spark in the following cases:

When use_ray = True
When n_concurrent_trials = 1

Code to reproduce:

from flaml import AutoML
from sklearn.datasets import load_iris

automl = AutoML()
X, y = load_iris(as_frame=True, return_X_y=True)
automl.fit(X, y, task="classification")

Output:

check Spark installation...This line should appear only once.

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/05 14:41:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[flaml.automl.automl: 01-05 14:41:48] {2712} INFO - task = classification
[flaml.automl.automl: 01-05 14:41:48] {2714} INFO - Data split method: stratified
[flaml.automl.automl: 01-05 14:41:48] {2717} INFO - Evaluation method: cv
[flaml.automl.automl: 01-05 14:41:48] {2844} INFO - Minimizing error metric: log_loss
...

opened by harupy 2

notebook test
Why are these changes needed?

Add tests for some notebooks. Removed warning message about Spark which is unnecessary for non-Spark users.

Related issue number

#851

Checks

[x] I've used pre-commit to lint the changes in this PR, or I've made sure lint with flake8 output is two 0s.

[x] I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.

[x] I've added tests (if relevant) corresponding to the changes introduced in this PR.

[ ] I've made sure all auto checks have passed.
opened by sonichi 6
HistGradientBoosting support

Right now it can only be added by writing a custom class.

Question: will support for HistGradientBoostingClassifier and HistGradientBoostingRegressor be added to flamlize_estimator?

opened by glevv 1

Releases(v1.1.0)

v1.1.0(Dec 30, 2022)
Highlights

Spark is now supported as a new parallel tuning backend.

New tuning capability: targeted tuning with multiple lexicographic objectives. Check out documentation and an example for this new tuning capability.

New metrics: roc_auc_weighted, roc_auc_ovr_weighted, roc_auc_ovo_weighted.

New reproducible learner selection method when time_budget is not specified.

AutoML-related functionality is moved into a new automl subpackage.

Thanks to all contributors who contributed to this release!
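
As a rough illustration of what tuning with lexicographic objectives means, the toy sketch below picks a trial by filtering on each objective in priority order, keeping only trials within a tolerance of the best value. It is not FLAML's implementation; the `lexicographic_best` helper, the trial fields and the tolerance values are all hypothetical, and every objective is assumed to be minimized.

```python
def lexicographic_best(trials, objectives, tolerances):
    """Pick a trial by objective priority (all objectives minimized).

    For each objective in order, keep only trials whose value is within
    that objective's tolerance of the best surviving value.
    """
    survivors = list(trials)
    for name in objectives:
        best = min(t[name] for t in survivors)
        survivors = [t for t in survivors if t[name] <= best + tolerances[name]]
    # Break any remaining tie by comparing objectives in priority order.
    return min(survivors, key=lambda t: tuple(t[n] for n in objectives))


trials = [
    {"id": "a", "error": 0.10, "latency": 30},
    {"id": "b", "error": 0.11, "latency": 5},
    {"id": "c", "error": 0.20, "latency": 1},
]
chosen = lexicographic_best(trials, ["error", "latency"], {"error": 0.02, "latency": 0})
strict = lexicographic_best(trials, ["error", "latency"], {"error": 0.0, "latency": 0})
print(chosen["id"], strict["id"])
```

With a small tolerance on error, the faster trial "b" wins; with zero tolerance, the lowest-error trial "a" is chosen regardless of latency.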

What's Changed

Bump actions/checkout from 2 to 3 by @dependabot in https://github.com/microsoft/FLAML/pull/699

fix dependably alert by @skzhang1 in https://github.com/microsoft/FLAML/pull/818

fix typo by @skzhang1 in https://github.com/microsoft/FLAML/pull/823

install editable package in codespace by @sonichi in https://github.com/microsoft/FLAML/pull/826

skip test_hf_data in py 3.6 by @sonichi in https://github.com/microsoft/FLAML/pull/832

fix typo of output directory by @thinkall in https://github.com/microsoft/FLAML/pull/828

catch TFT logger bugs by @int-chaos in https://github.com/microsoft/FLAML/pull/833

roc_auc_weighted metric addition by @shreyas36 in https://github.com/microsoft/FLAML/pull/827

make performance test reproducible by @sonichi in https://github.com/microsoft/FLAML/pull/837

Refactor into automl subpackage by @markharley in https://github.com/microsoft/FLAML/pull/809

Edit the announcement of AAAI-23 tutorial and the KDD tutorial announcement. by @HangHouCheong in https://github.com/microsoft/FLAML/pull/820

Use get to avoid KeyError by @sonichi in https://github.com/microsoft/FLAML/pull/824

Update doc by @skzhang1 in https://github.com/microsoft/FLAML/pull/843

fix bug related to choice by @sonichi in https://github.com/microsoft/FLAML/pull/848

FAQ about OOM by @sonichi in https://github.com/microsoft/FLAML/pull/849

Update .NET documentation links by @luisquintanilla in https://github.com/microsoft/FLAML/pull/847

Added an info reminding user that if no time_budget and no max_iter is specified, then effectively zero-shot AutoML is used by @jingdong00 in https://github.com/microsoft/FLAML/pull/850

Fix example tune-pytorch where the checkpoint path may be named differently by @jingdong00 in https://github.com/microsoft/FLAML/pull/853

Format errors on the web. by @skzhang1 in https://github.com/microsoft/FLAML/pull/855

Add supporting using Spark as the backend of parallel training by @thinkall in https://github.com/microsoft/FLAML/pull/846

Info and naming by @sonichi in https://github.com/microsoft/FLAML/pull/864

New Contributors

@thinkall made their first contribution in https://github.com/microsoft/FLAML/pull/828

@markharley made their first contribution in https://github.com/microsoft/FLAML/pull/809

@HangHouCheong made their first contribution in https://github.com/microsoft/FLAML/pull/820

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.14...v1.1.0
v1.0.14(Nov 16, 2022)
Highlights

Preparing alpha release of multi-objective hyperparameter tuning with lexicographic preference.

Fixed issues related to zero-shot automl.

Multiple improvements to documentation.

What's Changed

Discord Badge Added by @royninja in https://github.com/microsoft/FLAML/pull/760

fix bug in current nlp documentation by @liususan091219 in https://github.com/microsoft/FLAML/pull/763

Multiple objectives hyperparameter tuning with lexicographic preference by @Anonymous-submission-repo in https://github.com/microsoft/FLAML/pull/752

Indentation corrected by @Kirito-Excalibur in https://github.com/microsoft/FLAML/pull/778

Included hint to escape brackets for pip setup by @evensure in https://github.com/microsoft/FLAML/pull/786

Docs by @velezbeltran in https://github.com/microsoft/FLAML/pull/765

Bump actions/setup-python from 2 to 4 by @dependabot in https://github.com/microsoft/FLAML/pull/700

Bump codecov/codecov-action from 1 to 3 by @dependabot in https://github.com/microsoft/FLAML/pull/697

Removed extra | in documentation by @satya-vinay in https://github.com/microsoft/FLAML/pull/790

fix_alert by @skzhang1 in https://github.com/microsoft/FLAML/pull/793

Fixed typo by @ElinaAndreeva in https://github.com/microsoft/FLAML/pull/797

fix_alerts by @skzhang1 in https://github.com/microsoft/FLAML/pull/799

Documentation about classification/regression task #753 by @royninja in https://github.com/microsoft/FLAML/pull/802

Added a link to documentation webpage in notebook time_series_forcast by @jingdong00 in https://github.com/microsoft/FLAML/pull/791

Fix issues related to zero-shot automl by @sonichi in https://github.com/microsoft/FLAML/pull/783

added the models used for forecasting in documentation by @shreyas36 in https://github.com/microsoft/FLAML/pull/811

Add performance test for LexiFlow by @Anonymous-submission-repo in https://github.com/microsoft/FLAML/pull/812

New Contributors

@royninja made their first contribution in https://github.com/microsoft/FLAML/pull/760

@Anonymous-submission-repo made their first contribution in https://github.com/microsoft/FLAML/pull/752

@Kirito-Excalibur made their first contribution in https://github.com/microsoft/FLAML/pull/778

@evensure made their first contribution in https://github.com/microsoft/FLAML/pull/786

@velezbeltran made their first contribution in https://github.com/microsoft/FLAML/pull/765

@satya-vinay made their first contribution in https://github.com/microsoft/FLAML/pull/790

@ElinaAndreeva made their first contribution in https://github.com/microsoft/FLAML/pull/797

@jingdong00 made their first contribution in https://github.com/microsoft/FLAML/pull/791

@shreyas36 made their first contribution in https://github.com/microsoft/FLAML/pull/811

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.13...v1.0.14
v1.0.13(Oct 13, 2022)
Highlights

Log search_state.config directly to MLflow instead of as a key-dictionary pair

Move searcher and scheduler into tune

Move import location for Ray 2

Fix NLP dimension mismatch bug

What's Changed

Dockerfile building problem by @skzhang1 in https://github.com/microsoft/FLAML/pull/719

Update Contribute.md by @vijaya-lakshmi-venkatraman in https://github.com/microsoft/FLAML/pull/716

Move import location for Ray 2 by @sonichi in https://github.com/microsoft/FLAML/pull/721

Fix issue 728 add hyperlink to GitHub location by @Libens-bufo in https://github.com/microsoft/FLAML/pull/731

Update model.py by @vijaya-lakshmi-venkatraman in https://github.com/microsoft/FLAML/pull/739

Issue724 by @liususan091219 in https://github.com/microsoft/FLAML/pull/745

log search_state.config directly instead of under tag config by @prithvikannan in https://github.com/microsoft/FLAML/pull/747

move searcher and scheduler into tune by @sonichi in https://github.com/microsoft/FLAML/pull/746

updating the data collator for seq-regression to handle the dim mismatch problem by @liususan091219 in https://github.com/microsoft/FLAML/pull/751

Update Contribute by @sonichi in https://github.com/microsoft/FLAML/pull/741

Remove NLP classification head by @liususan091219 in https://github.com/microsoft/FLAML/pull/756

New Contributors

@vijaya-lakshmi-venkatraman made their first contribution in https://github.com/microsoft/FLAML/pull/716

@Libens-bufo made their first contribution in https://github.com/microsoft/FLAML/pull/731

@prithvikannan made their first contribution in https://github.com/microsoft/FLAML/pull/747

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.12...v1.0.13
v1.0.12(Sep 6, 2022)
Highlights

Fix MLflow bug to support the case where search_state.metric_for_logging is None

Support customized cross-validation strategy

Fix SARIMAX seasonal_order parameter name in the wrapper

Thanks to all the contributors for this release!

What's Changed

chore: Auto update github actions with dependabot by @iemejia in https://github.com/microsoft/FLAML/pull/688

talks and tutorials by @qingyun-wu in https://github.com/microsoft/FLAML/pull/694

updating nlp notebook by @liususan091219 in https://github.com/microsoft/FLAML/pull/693

"intermediate_results" TypeError: argument of type 'NoneType' is not iterable by @liususan091219 in https://github.com/microsoft/FLAML/pull/695

Update Research.md by @sonichi in https://github.com/microsoft/FLAML/pull/701

Bump actions/setup-node from 2 to 3 by @dependabot in https://github.com/microsoft/FLAML/pull/698

Bump actions/cache from 1 to 3 by @dependabot in https://github.com/microsoft/FLAML/pull/696

Support customized cross-validation strategy by @skzhang1 in https://github.com/microsoft/FLAML/pull/669

Add $schema to cgmanifest.json by @JamieMagee in https://github.com/microsoft/FLAML/pull/708

Fix SARIMAX seasonal_order parameter name in the wrapper by @EgorKraevTransferwise in https://github.com/microsoft/FLAML/pull/711

New Contributors

@iemejia made their first contribution in https://github.com/microsoft/FLAML/pull/688

@JamieMagee made their first contribution in https://github.com/microsoft/FLAML/pull/708

@EgorKraevTransferwise made their first contribution in https://github.com/microsoft/FLAML/pull/711

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.11...v1.0.12
v1.0.11(Aug 21, 2022)
Highlights

Preserve the checkpoint when deleting AutoML objects.

Create no eval set when setting use_best_model to False for catboost.

What's Changed

add guideline collection by @qingyun-wu in https://github.com/microsoft/FLAML/pull/687

LightGBM notebook update by @sonichi in https://github.com/microsoft/FLAML/pull/690

Add preserve_checkpoint to preserve the checkpoint after del by @liususan091219 in https://github.com/microsoft/FLAML/pull/692

use_best_model for catboost by @sonichi in https://github.com/microsoft/FLAML/pull/679

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.10...v1.0.11
v1.0.10(Aug 16, 2022)
This release contains several new features to highlight:

A major new feature is support for multiple time series in one dataset, with a new task named "ts_forecast_panel" and a neural network estimator from pytorch-forecasting.

Allow disabling shuffle for custom splitter.

Allow explicit specification of whether the choices of a hyperparameter have an inherent order.

Allow skipping data transformation to avoid overhead.

Support AzureML pipeline tuning.

Allow log file name to be specified in tune.run and perform logging when ray is used.

There are other improvements for the transformer estimator and bug fixes for config constraints.

What's Changed

Fixing the issue that FLAML trial number is significantly smaller than Transformers.hyperparameter_search by @liususan091219 in https://github.com/microsoft/FLAML/pull/657

make test result more stable by @sonichi in https://github.com/microsoft/FLAML/pull/646

Add pipeline tuner component and dependencies. by @ruizhuanguw in https://github.com/microsoft/FLAML/pull/671

Skip transform by @jmrichardson in https://github.com/microsoft/FLAML/pull/665

pull request template by @sonichi in https://github.com/microsoft/FLAML/pull/668

Update Research.md by @liususan091219 in https://github.com/microsoft/FLAML/pull/672

Documentation on search space and parallel/sequential tuning by @qingyun-wu in https://github.com/microsoft/FLAML/pull/675

time series forecasting with panel datasets by @int-chaos in https://github.com/microsoft/FLAML/pull/541

categorical choice can be ordered or unordered by @sonichi in https://github.com/microsoft/FLAML/pull/677

Disable shuffle for custom CV by @jmrichardson in https://github.com/microsoft/FLAML/pull/659

update time series forecast notebook by @int-chaos in https://github.com/microsoft/FLAML/pull/682

check config constraints for the initial config by @sonichi in https://github.com/microsoft/FLAML/pull/685

log_file_name in tune.run() by @sonichi in https://github.com/microsoft/FLAML/pull/681

updating nlp notebook by @liususan091219 in https://github.com/microsoft/FLAML/pull/683

VW version requirement and documentation on config_constraints vs metric_constraints by @qingyun-wu in https://github.com/microsoft/FLAML/pull/686

New Contributors

@jmrichardson made their first contribution in https://github.com/microsoft/FLAML/pull/665

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.9...v1.0.10
v1.0.9(Jul 31, 2022)
Highlights

Add the feature names and importance in AutoML

Update NLP search space and fix several bugs in NLP tasks

Respect kwargs in AutoML.predict()

What's Changed

Feature names and importances by @sonichi in https://github.com/microsoft/FLAML/pull/621

fix NER roberta bug by @liususan091219 in https://github.com/microsoft/FLAML/pull/632

updating search space by @liususan091219 in https://github.com/microsoft/FLAML/pull/633

Bump terser from 5.10.0 to 5.14.2 in /website by @dependabot in https://github.com/microsoft/FLAML/pull/642

This PR fixes the frequent NLP bugs in the other PRs by @liususan091219 in https://github.com/microsoft/FLAML/pull/647

added "**kwargs" to "predict" by @zzheng93 in https://github.com/microsoft/FLAML/pull/641

Fix alerts by @skzhang1 in https://github.com/microsoft/FLAML/pull/644

Update .NET documentation by @luisquintanilla in https://github.com/microsoft/FLAML/pull/643

Fix HPO evaluation bug by @liususan091219 in https://github.com/microsoft/FLAML/pull/645

New Contributors

@dependabot made their first contribution in https://github.com/microsoft/FLAML/pull/642

@zzheng93 made their first contribution in https://github.com/microsoft/FLAML/pull/641

@luisquintanilla made their first contribution in https://github.com/microsoft/FLAML/pull/643

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.8...v1.0.9
v1.0.8(Jul 10, 2022)
Support latest xgboost version

Reproducibility improvement for blendsearch

Allow custom GroupKFold object as split_type

Bug fix in token classification tasks such as NER

Allow FLAML_sample_size in starting_points

What's Changed

log msg about ensemble by @sonichi in https://github.com/microsoft/FLAML/pull/597

support latest xgboost version by @sonichi in https://github.com/microsoft/FLAML/pull/599

Fix automl settings in scikit-learn pipeline integration example by @ZviBaratz in https://github.com/microsoft/FLAML/pull/602

update got version by @sonichi in https://github.com/microsoft/FLAML/pull/607

min eci depends on cost_attr; cost_attr in ls by @sonichi in https://github.com/microsoft/FLAML/pull/612

Replaced !pip calls with %pip magic command by @ZviBaratz in https://github.com/microsoft/FLAML/pull/604

cath URLError by @sonichi in https://github.com/microsoft/FLAML/pull/613

Updated pre-commit hooks by @ZviBaratz in https://github.com/microsoft/FLAML/pull/609

Py36 by @sonichi in https://github.com/microsoft/FLAML/pull/614

Allow custom GroupKFold object as split_type by @sonichi in https://github.com/microsoft/FLAML/pull/616

Typo fix by @ZviBaratz in https://github.com/microsoft/FLAML/pull/618

use relative url in doc by @sonichi in https://github.com/microsoft/FLAML/pull/620

This PR will solve issue, code example format in the doc #622 by @31Sanskrati in https://github.com/microsoft/FLAML/pull/623

fix ner bug; refactor post processing of TransformersEstimator prediction by @liususan091219 in https://github.com/microsoft/FLAML/pull/615

isinstance(x, int) -> isinstance(x, (int, np.integer)) by @liususan091219 in https://github.com/microsoft/FLAML/pull/627

Allow FLAML_sample_size in starting_points by @qingyun-wu in https://github.com/microsoft/FLAML/pull/619

disable max_len for ner by @liususan091219 in https://github.com/microsoft/FLAML/pull/629

fix #630 by @adi611 in https://github.com/microsoft/FLAML/pull/631

New Contributors

@ZviBaratz made their first contribution in https://github.com/microsoft/FLAML/pull/602

@31Sanskrati made their first contribution in https://github.com/microsoft/FLAML/pull/623

@adi611 made their first contribution in https://github.com/microsoft/FLAML/pull/631

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.7...v1.0.8
v1.0.7(Jun 17, 2022)
Add support for Python 3.10.

Enable ensemble when using ray.

Enable nested tuning runs.

Make BlendSearch reproducible when constructed outside tune.run().

Fix a resource limit issue on some macOS versions.

Bug fix in NLP.

Make set_search_properties() compatible with ray tune.

What's Changed

enable ensemble when using ray by @sonichi in https://github.com/microsoft/FLAML/pull/583

update time from start when using ray by @sonichi in https://github.com/microsoft/FLAML/pull/586

Class variables, cost_attr, and reproducibility by @qingyun-wu in https://github.com/microsoft/FLAML/pull/587

backup & recover global vars for nested tune.run by @sonichi in https://github.com/microsoft/FLAML/pull/584

fixing a bug in nlp/utils.py by @liususan091219 in https://github.com/microsoft/FLAML/pull/590

fix resource limit issue by @sonichi in https://github.com/microsoft/FLAML/pull/589

Modified setup instructions by @daniel-555 in https://github.com/microsoft/FLAML/pull/593

Add python 3.10 in the CI by @sonichi in https://github.com/microsoft/FLAML/pull/591

trying to fix the indexerror for ner by @liususan091219 in https://github.com/microsoft/FLAML/pull/596

Update documentation for NLP by @liususan091219 in https://github.com/microsoft/FLAML/pull/594

set_search_properties by @sonichi in https://github.com/microsoft/FLAML/pull/595

New Contributors

@daniel-555 made their first contribution in https://github.com/microsoft/FLAML/pull/593

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.6...v1.0.7
v1.0.6(Jun 9, 2022)
What's Changed

init value type match by @sonichi in https://github.com/microsoft/FLAML/pull/575

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.5...v1.0.6
v1.0.5(Jun 7, 2022)
What's Changed

fixing trainable and update function, completing NOTE by @liususan091219 in https://github.com/microsoft/FLAML/pull/566

Update fit_kwargs_by_estimator example in Task-Oriented-AutoML.md by @liususan091219 in https://github.com/microsoft/FLAML/pull/561

add zeroshot notebook by @sonichi in https://github.com/microsoft/FLAML/pull/569

set holiday version <0.14 for prophet by @sonichi in https://github.com/microsoft/FLAML/pull/573

Updated doc by @PrajwalBorkar in https://github.com/microsoft/FLAML/pull/572

install openml for notebook example by @sonichi in https://github.com/microsoft/FLAML/pull/574

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.4...v1.0.5
v1.0.4(Jun 2, 2022)
What's Changed

Update documentation for FAQ about how to handle imbalanced data by @liususan091219 in https://github.com/microsoft/FLAML/pull/560

update doc about scheduler exception by @sonichi in https://github.com/microsoft/FLAML/pull/564

version update by @sonichi in https://github.com/microsoft/FLAML/pull/567

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.3...v1.0.4
v1.0.3(May 31, 2022)
Data files needed for zero-shot AutoML are included in this release. When no search budget is given via time_budget/max_iter, zero-shot automl is used automatically.

What's Changed

align indent and add missing quotation by @sonichi in https://github.com/microsoft/FLAML/pull/555

solve issue #542. fix pickle.UnpickingError while blendsearch warm start by @LinWencong in https://github.com/microsoft/FLAML/pull/554

Documentation, test and bugfix by @qingyun-wu in https://github.com/microsoft/FLAML/pull/556

Removed cat_hp_cost by @PrajwalBorkar in https://github.com/microsoft/FLAML/pull/559

Update Tune-User-Defined-Function.md by @sonichi in https://github.com/microsoft/FLAML/pull/562

use zeroshot when no budget is given; custom_hp by @sonichi in https://github.com/microsoft/FLAML/pull/563

simplify warmstart in blendsearch by @sonichi in https://github.com/microsoft/FLAML/pull/558

include .json file in flaml.default package by @sonichi in https://github.com/microsoft/FLAML/pull/565

New Contributors

@LinWencong made their first contribution in https://github.com/microsoft/FLAML/pull/554

@PrajwalBorkar made their first contribution in https://github.com/microsoft/FLAML/pull/559

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.2...v1.0.3
v1.0.2(May 20, 2022)
What's Changed

docstr cleanup #523: removed lines 259 to 260 in a1c49ca by @elbowgreasel in https://github.com/microsoft/FLAML/pull/524

refactoring TransformersEstimator to support default and custom_hp by @liususan091219 in https://github.com/microsoft/FLAML/pull/511

Bump cross-fetch from 3.1.4 to 3.1.5 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/529

fixing use_ray in automl.py by @liususan091219 in https://github.com/microsoft/FLAML/pull/531

handle non-flaml scheduler in flaml.tune by @qingyun-wu in https://github.com/microsoft/FLAML/pull/532

test reproducibility from retrain by @sonichi in https://github.com/microsoft/FLAML/pull/533

fix the post-processing bug in NER by @liususan091219 in https://github.com/microsoft/FLAML/pull/534

fixing roberta add_prefix_space bug by @liususan091219 in https://github.com/microsoft/FLAML/pull/546

choose n_jobs for ensemble according to n_jobs per learner by @sonichi in https://github.com/microsoft/FLAML/pull/551

Quick-fix by @Qiaochu-Song in https://github.com/microsoft/FLAML/pull/539

fix indentation in automl.py by @harish445 in https://github.com/microsoft/FLAML/pull/553

New Contributors

@elbowgreasel made their first contribution in https://github.com/microsoft/FLAML/pull/524

@Qiaochu-Song made their first contribution in https://github.com/microsoft/FLAML/pull/539

@harish445 made their first contribution in https://github.com/microsoft/FLAML/pull/553

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.1...v1.0.2
v1.0.1(Apr 24, 2022)
What's Changed

use ffill in forecasting example by @sonichi in https://github.com/microsoft/FLAML/pull/508

Handling fractional gpu_per_trial for NLP by @liususan091219 in https://github.com/microsoft/FLAML/pull/513

Fix AttributeError: readonly attribute for Python 3.10.4 by @jayshanker2000 in https://github.com/microsoft/FLAML/pull/518

max choice is n-1 by @sonichi in https://github.com/microsoft/FLAML/pull/521

allow evaluated_rewards shorter than points_to_evaluate by @sonichi in https://github.com/microsoft/FLAML/pull/522

New Contributors

@jayshanker2000 made their first contribution in https://github.com/microsoft/FLAML/pull/518

Full Changelog: https://github.com/microsoft/FLAML/compare/v1.0.0...v1.0.1
v1.0.0(Mar 31, 2022)
What's Changed

zero-shot AutoML in readme by @sonichi in https://github.com/microsoft/FLAML/pull/474

update documentation for time series forecasting by @int-chaos in https://github.com/microsoft/FLAML/pull/472

metric constraints in flaml.automl by @qingyun-wu in https://github.com/microsoft/FLAML/pull/479

import from lightgbm by @sonichi in https://github.com/microsoft/FLAML/pull/489

fixing bug for ner by @liususan091219 in https://github.com/microsoft/FLAML/pull/463

doc update (#490) by @sonichi in https://github.com/microsoft/FLAML/pull/492

adding evaluation by @liususan091219 in https://github.com/microsoft/FLAML/pull/495

version number and doc by @sonichi in https://github.com/microsoft/FLAML/pull/497

fixing a few bugs in nlp by @liususan091219 in https://github.com/microsoft/FLAML/pull/503

Bug fix and add documentation for metric_constraints by @qingyun-wu in https://github.com/microsoft/FLAML/pull/498

fixing some bug in NLP by @liususan091219 in https://github.com/microsoft/FLAML/pull/506

handle failing trials by @sonichi in https://github.com/microsoft/FLAML/pull/505

Update notebook and test by @qingyun-wu in https://github.com/microsoft/FLAML/pull/507

Bump minimist from 1.2.5 to 1.2.6 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/502

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.10.0...v1.0.0
v0.10.0(Mar 2, 2022)
This release contains an important new feature: zero-shot AutoML and meta learning. It provides a new way of doing AutoML without tuning. You can now use the existing training API from lightgbm, xgboost etc. while getting the benefit of AutoML in choosing high-performance hyperparameter configurations per task. It is recommended for everyone currently using lightgbm, xgboost or random forest, regardless of previous experience in AutoML. This feature also enables continuous improvement of AutoML from historical AutoML experiments.
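To make the idea of choosing a configuration per task without tuning more concrete, here is a minimal sketch of one way such a choice could work: describe the dataset by a few meta-features and pick the configuration that did well on the most similar past task. Everything in it is an assumption for illustration. The `PORTFOLIO` table, its values, the meta-features and the `suggest_config` helper are hypothetical and are not FLAML's data, API or matching rule.

```python
import math

# Hypothetical portfolio built from past experiments:
# (meta-features of a reference task, configuration that worked well on it).
# Meta-features are (log10 of row count, log10 of column count); all values are made up.
PORTFOLIO = [
    ((3.0, 1.0), {"n_estimators": 100, "num_leaves": 15, "learning_rate": 0.1}),
    ((5.0, 1.5), {"n_estimators": 500, "num_leaves": 63, "learning_rate": 0.05}),
    ((6.0, 2.5), {"n_estimators": 2000, "num_leaves": 255, "learning_rate": 0.02}),
]


def meta_features(n_rows, n_cols):
    """Describe a dataset by coarse, scale-free properties."""
    return (math.log10(n_rows), math.log10(n_cols))


def suggest_config(n_rows, n_cols):
    """Return the config of the reference task closest to this dataset; no tuning is run."""
    target = meta_features(n_rows, n_cols)
    _, config = min(PORTFOLIO, key=lambda entry: math.dist(entry[0], target))
    return dict(config)  # copy so callers cannot mutate the portfolio


# A mid-sized dataset is matched to the second reference task.
config = suggest_config(n_rows=100_000, n_cols=30)
```

The lookup costs almost nothing at training time, which is the appeal of the zero-shot approach: the expensive search happens once, offline, on historical tasks.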

Other changes can be found below.

What's Changed

Typo on the webpage's Getting Started section by @cammarb in https://github.com/microsoft/FLAML/pull/457

Bump follow-redirects from 1.14.7 to 1.14.8 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/459

Docstr update by @qingyun-wu in https://github.com/microsoft/FLAML/pull/460

update regression metrics in notebooks by @sonichi in https://github.com/microsoft/FLAML/pull/454

make AutoML.classes_ an array by @sonichi in https://github.com/microsoft/FLAML/pull/467

Bump prismjs from 1.25.0 to 1.27.0 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/471

Zero-shot AutoML by @sonichi in https://github.com/microsoft/FLAML/pull/468

don't init global search with points_to_evaluate unless evaluated_rewards is provided; handle callbacks in fit kwargs by @sonichi in https://github.com/microsoft/FLAML/pull/469

New Contributors

@cammarb made their first contribution in https://github.com/microsoft/FLAML/pull/457

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.7...v0.10.0
v0.9.7(Feb 12, 2022)
What's Changed

Update Task-Oriented-AutoML.md by @vvijayalakshmi21 in https://github.com/microsoft/FLAML/pull/446

Update Task-Oriented-AutoML.md by @vvijayalakshmi21 in https://github.com/microsoft/FLAML/pull/447

Update Tune-User-Defined-Function.md by @vvijayalakshmi21 in https://github.com/microsoft/FLAML/pull/448

corrected typo in example xgboost documentation by @MichaelMarien in https://github.com/microsoft/FLAML/pull/449

bump ray version to 1.10 by @sonichi in https://github.com/microsoft/FLAML/pull/450

fix a bug when using ray & update ray on aml by @sonichi in https://github.com/microsoft/FLAML/pull/455

New Contributors

@vvijayalakshmi21 made their first contribution in https://github.com/microsoft/FLAML/pull/446

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.6...v0.9.7
v0.9.6(Jan 31, 2022)
What's Changed

reducing AutoConfig.from_pretrained by @liususan091219 in https://github.com/microsoft/FLAML/pull/411

Set use_ray to True for logging to databricks by @liususan091219 in https://github.com/microsoft/FLAML/pull/414

Bump nanoid from 3.1.30 to 3.2.0 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/420

bump version of node-fetch to 3.1.1 in website/ by @sonichi in https://github.com/microsoft/FLAML/pull/423

Use Ray _BackwardsCompatibleNumpyRng if possible by @Yard1 in https://github.com/microsoft/FLAML/pull/421

remove FLAML sample size from config by @sonichi in https://github.com/microsoft/FLAML/pull/418

max_iter < 2 -> no search; sign in metric constraints; test and example for forecasting by @sonichi in https://github.com/microsoft/FLAML/pull/415

remove redundant imports by @liususan091219 in https://github.com/microsoft/FLAML/pull/426

Support time series forecasting for discrete target variable by @int-chaos in https://github.com/microsoft/FLAML/pull/416

homepage update by @sonichi in https://github.com/microsoft/FLAML/pull/425

fix a broken link in README.md by @m13uz in https://github.com/microsoft/FLAML/pull/439

adding catch for HTTP error by @liususan091219 in https://github.com/microsoft/FLAML/pull/432

Change the upper bound for "lags" hyperparameter for sklearn forecast models by @int-chaos in https://github.com/microsoft/FLAML/pull/437

Gpu support for xgboost by @sonichi in https://github.com/microsoft/FLAML/pull/442

data in csv by @sonichi in https://github.com/microsoft/FLAML/pull/430

note about preview feature by @sonichi in https://github.com/microsoft/FLAML/pull/431

New Contributors

@m13uz made their first contribution in https://github.com/microsoft/FLAML/pull/439

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.5...v0.9.6
v0.9.5(Jan 17, 2022)
What's Changed

fixing load best model at the end by @liususan091219 in https://github.com/microsoft/FLAML/pull/389

Regression forecast debug by @int-chaos in https://github.com/microsoft/FLAML/pull/391

set verbose for transformers by @liususan091219 in https://github.com/microsoft/FLAML/pull/392

Logging multiple checkpoints by @liususan091219 in https://github.com/microsoft/FLAML/pull/394

postcss version update by @sonichi in https://github.com/microsoft/FLAML/pull/385

fixing default metric for regression + change verbosity for transformers by @liususan091219 in https://github.com/microsoft/FLAML/pull/397

fix issues in logging, bug in space.py, constraint sign, and improve code coverage by @sonichi in https://github.com/microsoft/FLAML/pull/388

moving intermediate_results logging from model.py to huggingface/trainer.py by @liususan091219 in https://github.com/microsoft/FLAML/pull/403

Update flaml/nlp/README.md by @liususan091219 in https://github.com/microsoft/FLAML/pull/404

Logo by @qingyun-wu in https://github.com/microsoft/FLAML/pull/399

update browser icon by @qingyun-wu in https://github.com/microsoft/FLAML/pull/407

adding logging of training loss by @liususan091219 in https://github.com/microsoft/FLAML/pull/406

Bump shelljs from 0.8.4 to 0.8.5 in /website by @sonichi in https://github.com/microsoft/FLAML/pull/402

Sklearn api x by @MichaelMarien in https://github.com/microsoft/FLAML/pull/405

New Contributors

@MichaelMarien made their first contribution in https://github.com/microsoft/FLAML/pull/405

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.4...v0.9.5
v0.9.4(Jan 8, 2022)
This release enables regression models for time series forecasting. It also fixes bugs in nlp tasks, such as serialization of transformer models and automatic metrics.
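A regression model can forecast a series once the series is recast as a supervised problem, typically by using the previous few values ("lags") as features for the next value. The release notes do not describe how FLAML builds these features, so the following is only a sketch of the general technique; `make_lagged_dataset` is a hypothetical helper, not a FLAML function.

```python
def make_lagged_dataset(series, lags):
    """Turn a 1-D series into (X, y): each row of X holds the `lags` values before y."""
    if lags < 1 or lags >= len(series):
        raise ValueError("lags must be between 1 and len(series) - 1")
    X = [list(series[i - lags:i]) for i in range(lags, len(series))]
    y = [series[i] for i in range(lags, len(series))]
    return X, y


X, y = make_lagged_dataset([10, 12, 13, 15, 18], lags=2)
# X == [[10, 12], [12, 13], [13, 15]] and y == [13, 15, 18]
```

Any regressor with a fit/predict interface can then be trained on `X` and `y`, and the number of lags becomes one more hyperparameter to tune.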

What's Changed

citation file by @sonichi in https://github.com/microsoft/FLAML/pull/364

Fix several issues for nlp tasks by @sonichi in https://github.com/microsoft/FLAML/pull/380

serialize TransformerEstimator by @sonichi in https://github.com/microsoft/FLAML/pull/381

Time series forecasting with sklearn regressors by @int-chaos in https://github.com/microsoft/FLAML/pull/362

fixing auto metric bug by @liususan091219 in https://github.com/microsoft/FLAML/pull/387

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.3...v0.9.4
v0.9.3(Jan 3, 2022)
What's Changed

Finish the Multiple Choice Classification by @oberonbot in https://github.com/microsoft/FLAML/pull/367

logging by @sonichi in https://github.com/microsoft/FLAML/pull/371

adding token classification by @liususan091219 and @siddheshshaji in https://github.com/microsoft/FLAML/pull/376

New Contributors

@oberonbot and @siddheshshaji made their first contribution in https://github.com/microsoft/FLAML/pull/367

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.2...v0.9.3
v0.9.2(Dec 26, 2021)
New Features:

New task: text summarization

Reproducibility of hyperparameter search sequence

Run flaml in azureml + ray

What's Changed

url update for doc edit by @sonichi in https://github.com/microsoft/FLAML/pull/345

Adding the NLP task summarization by @liususan091219 @XinZofStevens @GideonWu0105 in https://github.com/microsoft/FLAML/pull/346

reproducibility for random sampling by @sonichi in https://github.com/microsoft/FLAML/pull/349

doc update by @sonichi in https://github.com/microsoft/FLAML/pull/352

azureml + ray by @sonichi in https://github.com/microsoft/FLAML/pull/344

Fixing the bug in custom metric by @liususan091219 in https://github.com/microsoft/FLAML/pull/356

Simplify lgbm example by @ruizhuanguw in https://github.com/microsoft/FLAML/pull/358

fixing custom metric by @liususan091219 in https://github.com/microsoft/FLAML/pull/357

Example by @sonichi in https://github.com/microsoft/FLAML/pull/359

New Contributors

@ruizhuanguw @XinZofStevens @GideonWu0105 made their first contribution in https://github.com/microsoft/FLAML/pull/358

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.1...v0.9.2
v0.9.1(Dec 17, 2021)
This release contains several feature improvements and bug fixes. For example,

support for custom data splitter.

evaluation_function can receive the incumbent result in local search and perform domain-specific early stopping by comparing with it. As soon as the comparison result (better or worse) is known, the evaluation can be stopped.

support and automate huggingface metrics.

use CFO in tune.run if BlendSearch (bs) is not installed.

fixed a bug in modifying n_estimators to satisfy constraints.

new documentation website.
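The incumbent-based early stopping mentioned in the list above can be sketched as follows. The example assumes the loss is a sum of non-negative per-fold losses, so a partial sum that already exceeds the incumbent can never end up better. How FLAML actually passes the incumbent to the evaluation function is not stated in these notes; here it is a plain argument, and `evaluate_with_early_stop` is a hypothetical name.

```python
from typing import Optional, Sequence


def evaluate_with_early_stop(fold_losses: Sequence[float],
                             incumbent_total: Optional[float]) -> dict:
    """Sum per-fold losses, stopping once the outcome of the comparison is known."""
    total = 0.0
    folds_used = 0
    for loss in fold_losses:
        total += loss
        folds_used += 1
        # Losses are non-negative, so the total can only grow: once it passes the
        # incumbent, this config is known to be worse and the remaining folds are skipped.
        if incumbent_total is not None and total > incumbent_total:
            return {"loss": total, "folds_used": folds_used, "stopped_early": True}
    return {"loss": total, "folds_used": folds_used, "stopped_early": False}


# With an incumbent total of 0.8, the evaluation stops after the second fold.
result = evaluate_with_early_stop([0.5, 0.5, 0.5], incumbent_total=0.8)
```

The saving comes from skipping folds on configurations that cannot win; a configuration that does beat the incumbent is still evaluated in full.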

What's Changed

Update flaml_pytorch_cifar10.ipynb by @sonichi in https://github.com/microsoft/FLAML/pull/328

adding HF metrics by @liususan091219 in https://github.com/microsoft/FLAML/pull/335

train at least one iter when not trained by @sonichi in https://github.com/microsoft/FLAML/pull/336

use cfo in tune.run if bs is not installed by @sonichi in https://github.com/microsoft/FLAML/pull/334

Makes the evaluation_function could receive the incumbent best result as input in Tune by @Shao-kun-Zhang in https://github.com/microsoft/FLAML/pull/339

support for customized splitters by @wuchihsu in https://github.com/microsoft/FLAML/pull/333

Deploy a new doc website by @sonichi, @qingyun-wu and @Shao-kun-Zhang in https://github.com/microsoft/FLAML/pull/338

version update by @sonichi in https://github.com/microsoft/FLAML/pull/341

New Contributors

@Shao-kun-Zhang made their first contribution in https://github.com/microsoft/FLAML/pull/339

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.9.0...v0.9.1
v0.9.0(Dec 7, 2021)
Revise flaml.tune API

Add a “scheduler” argument (a user can choose from “flaml”, “asha” or a customized scheduler)

Rename "prune_attr" to "resource_attr"

Rename “training_function” to “evaluation_function”

Remove the “report_intermediate_result” argument (covered by “scheduler” instead)

Add tests for the supported schedulers

Re-run notebooks that use schedulers

Add save_best_config() to save best config in a json file
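For code written against the older flaml.tune argument names, the renames listed above amount to a small mapping. The helper below is a hypothetical migration aid, not part of FLAML. It assumes the old names were passed as keyword arguments, and because the notes do not say which scheduler value replaces "report_intermediate_result", it only flags that argument instead of guessing.

```python
def migrate_tune_kwargs(old_kwargs):
    """Return (new_kwargs, notes) with pre-0.9.0 argument names mapped to 0.9.0 names."""
    renames = {
        "prune_attr": "resource_attr",
        "training_function": "evaluation_function",
    }
    new_kwargs = {}
    notes = []
    for name, value in old_kwargs.items():
        if name == "report_intermediate_result":
            # Removed in 0.9.0 and covered by "scheduler"; the right scheduler
            # depends on the use case, so the caller has to choose it.
            notes.append("report_intermediate_result was removed; choose a scheduler instead")
            continue
        new_kwargs[renames.get(name, name)] = value
    return new_kwargs, notes


new_kwargs, notes = migrate_tune_kwargs(
    {"prune_attr": "epochs", "report_intermediate_result": True, "metric": "loss"}
)
```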

What's Changed

add save_best_config() by @sonichi in https://github.com/microsoft/FLAML/pull/324

tune api for schedulers by @qingyun-wu in https://github.com/microsoft/FLAML/pull/322

add init.py in nlp by @sonichi in https://github.com/microsoft/FLAML/pull/325

rename training_function by @qingyun-wu in https://github.com/microsoft/FLAML/pull/327

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.8.2...v0.9.0
v0.8.2(Dec 4, 2021)
What's Changed

include default value in rf search space by @sonichi in https://github.com/microsoft/FLAML/pull/317

adding TODOs for NLP module, so students can implement other tasks easier by @liususan091219 in https://github.com/microsoft/FLAML/pull/321

pred_time_limit clarification and logging by @sonichi in https://github.com/microsoft/FLAML/pull/319

bug fix in confg2params by @sonichi in https://github.com/microsoft/FLAML/pull/323

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.8.1...v0.8.2
v0.8.1(Nov 28, 2021)
What's Changed

Update test_regression.py by @fengsxy in https://github.com/microsoft/FLAML/pull/306

Add conda forge minimal test by @MichalChromcak in https://github.com/microsoft/FLAML/pull/309

fixing config2params for transformersestimator by @liususan091219 in https://github.com/microsoft/FLAML/pull/316

Code quality improvement based on #275 by @abnsy and @sonichi in https://github.com/microsoft/FLAML/pull/313

skip cv preparation if eval_method is holdout by @sonichi in https://github.com/microsoft/FLAML/pull/314

New Contributors

@fengsxy made their first contribution in https://github.com/microsoft/FLAML/pull/306

@abnsy made their first contribution in https://github.com/microsoft/FLAML/pull/313

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.8.0...v0.8.1
v0.8.0(Nov 23, 2021)
In this release, we add two NLP tasks, sequence classification and sequence regression, to flaml.AutoML, using transformer-based neural networks. Previously, the NLP module was detached from flaml.AutoML and had a separate API. We redesigned the API so that the NLP tasks are accessed through the same API as other tasks, which also makes it easy to add more NLP tasks in the future. Thanks for the hard work, @liususan091219!

We've also continued to make more performance & feature improvements. Examples:

We added a variation of the XGBoost search space that uses a limited max_depth. It includes the default configuration from the XGBoost library. The new search space leads to significantly better performance on some regression datasets.

We allow arguments for flaml.AutoML to be passed to the constructor. This enables multioutput regression by combining sklearn's MultiOutputRegressor with flaml's AutoML.

We made further memory optimizations, while allowing users to keep the best model per estimator in memory through the "model_history" option.
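To see why constructor arguments matter for multioutput regression, consider how a per-output wrapper works: it can only call `fit(X, y)` on fresh copies of the base estimator, so every setting has to be recoverable from the constructor. The sketch below uses toy, standard-library stand-ins. `MeanRegressor` and `PerOutputRegressor` are hypothetical classes written for this example; they are neither FLAML's nor scikit-learn's code.

```python
class MeanRegressor:
    """Toy stand-in for an estimator whose settings live on the constructor."""

    def __init__(self, shrink=1.0):
        # Settings are stored at construction time so a wrapper can rebuild us.
        self.shrink = shrink

    def get_params(self):
        return {"shrink": self.shrink}

    def fit(self, X, y):
        # "Training" here is just a shrunken mean of the targets.
        self.mean_ = self.shrink * sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class PerOutputRegressor:
    """Toy wrapper: fits one fresh copy of the base estimator per target column."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, Y):
        self.estimators_ = []
        for j in range(len(Y[0])):
            # Rebuild the estimator from its constructor parameters only.
            fresh = type(self.estimator)(**self.estimator.get_params())
            fresh.fit(X, [row[j] for row in Y])
            self.estimators_.append(fresh)
        return self

    def predict(self, X):
        columns = [est.predict(X) for est in self.estimators_]
        return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    X = [[0], [1], [2]]
    Y = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    model = PerOutputRegressor(MeanRegressor(shrink=0.5)).fit(X, Y)
    print(model.predict([[5]]))
```

With the real libraries, the same pattern would presumably mean constructing flaml.AutoML with its settings and wrapping it in sklearn.multioutput.MultiOutputRegressor; the exact settings to pass depend on the task and are not shown here.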

What's Changed

Unify regression and classification for XGBoost by @sonichi in https://github.com/microsoft/FLAML/pull/276

when max_iter=1, skip search only if retrain_final by @sonichi in https://github.com/microsoft/FLAML/pull/280

example update by @sonichi in https://github.com/microsoft/FLAML/pull/281

Merge exp into flaml by @liususan091219 in https://github.com/microsoft/FLAML/pull/210

add best_loss_per_estimator by @qingyun-wu in https://github.com/microsoft/FLAML/pull/286

model_history -> save_best_model_per_estimator by @sonichi in https://github.com/microsoft/FLAML/pull/283

datetime feature engineering by @sonichi in https://github.com/microsoft/FLAML/pull/285

add warmstart test by @qingyun-wu in https://github.com/microsoft/FLAML/pull/298

empty search space by @sonichi in https://github.com/microsoft/FLAML/pull/295

multioutput regression by @sonichi in https://github.com/microsoft/FLAML/pull/292

add max_depth to xgboost search space by @sonichi in https://github.com/microsoft/FLAML/pull/282

custom metric function clarification by @sonichi in https://github.com/microsoft/FLAML/pull/300

checkpoint naming in nonray mode, fix ray mode, delete checkpoints in nonray mode by @liususan091219 in https://github.com/microsoft/FLAML/pull/293

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.7.1...v0.8.0
v0.7.1(Nov 8, 2021)
What's Changed

make default verbose level > 0 when using ray by @sonichi in https://github.com/microsoft/FLAML/pull/272

default to cfo for single estimator by @sonichi in https://github.com/microsoft/FLAML/pull/273

update docstr by @sonichi and @qingyun-wu in https://github.com/microsoft/FLAML/pull/274

fixed a bug in #278 by @sonichi in https://github.com/microsoft/FLAML/pull/274

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.7.0...v0.7.1
v0.7.0(Nov 4, 2021)
New feature: multivariate time series forecasting.

What's Changed

Fix exception in CFO's _create_condition if all candidate start points didn't return yet by @Yard1 in https://github.com/microsoft/FLAML/pull/263

Integrate multivariate time series forecasting by @int-chaos in https://github.com/microsoft/FLAML/pull/254

Update Dockerfile by @wuchihsu in https://github.com/microsoft/FLAML/pull/269

limit time and memory consumption by @sonichi in https://github.com/microsoft/FLAML/pull/264

New Contributors

@wuchihsu made their first contribution in https://github.com/microsoft/FLAML/pull/269

Full Changelog: https://github.com/microsoft/FLAML/compare/v0.6.9...v0.7.0
