LightGBM + Optuna: no brainer

Last update: Dec 15, 2022

Overview

AutoLGBM

LightGBM + Optuna: no brainer

auto train LightGBM directly from CSV files
auto tune LightGBM using Optuna
auto serve the best LightGBM model using FastAPI

NOTE: PRs are currently accepted. If there are issues/problems, please solve with a PR.

Inspired by Abhishek Thakur's AutoXGB.

Installation

Install using pip

pip install autolgbm

Usage

Training a model using AutoLGBM is a piece of cake. All you need is some tabular data.

Parameters

###############################################################################
### required parameters
###############################################################################

# path to training data
train_filename = "data_samples/binary_classification.csv"

# path to output folder to store artifacts
output = "output"

###############################################################################
### optional parameters
###############################################################################

# path to test data. if specified, the model will be evaluated on the test data
# and test_predictions.csv will be saved to the output folder
# if not specified, only OOF predictions will be saved
# test_filename = "test.csv"
test_filename = None

# task: classification or regression
# if not specified, the task will be inferred automatically
# task = "classification"
# task = "regression"
task = None

# an id column
# if not specified, the id column will be generated automatically with the name `id`
# idx = "id"
idx = None

# targets is a list of strings
# if not specified, the target column will be assumed to be named `target`
# and the problem will be treated as one of: binary classification, multiclass classification,
# or single column regression
# targets = ["target"]
# targets = ["target1", "target2"]
targets = ["income"]

# features is a list of strings
# if not specified, all columns except `id`, `targets` & `kfold` columns will be used
# features = ["col1", "col2"]
features = None

# categorical_features is a list of strings
# if not specified, categorical columns will be inferred automatically
# categorical_features = ["col1", "col2"]
categorical_features = None

# use_gpu is boolean
# if not specified, GPU is not used
# use_gpu = True
# use_gpu = False
use_gpu = True

# number of folds to use for cross-validation
# default is 5
num_folds = 5

# random seed for reproducibility
# default is 42
seed = 42

# number of optuna trials to run
# default is 1000
# num_trials = 1000
num_trials = 100

# time_limit for optuna trials in seconds
# if not specified, timeout is not set and all trials are run
# time_limit = None
time_limit = 360

# if fast is set to True, the hyperparameter tuning will use only one fold
# however, the model will be trained on all folds in the end
# to generate OOF predictions and test predictions
# default is False
# fast = False
fast = False
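
To make `num_folds`, `seed`, and `fast` concrete, here is a minimal, standard-library-only sketch of the behaviour the comments above describe: every row gets a fold id, tuning validates on either one fold or all folds, and the final training pass always covers every fold. The helper names (`assign_folds`, `folds_for_tuning`, `folds_for_final_training`) are hypothetical and are not part of AutoLGBM; the library's real splitting logic may differ (for example, it may stratify by target).

```python
import random


def assign_folds(num_rows, num_folds=5, seed=42):
    """Return a list mapping each row index to a fold id in [0, num_folds)."""
    # Round-robin ids give near-equal fold sizes; shuffling with a seeded
    # RNG makes the assignment reproducible across runs.
    fold_ids = [i % num_folds for i in range(num_rows)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids


def folds_for_tuning(num_folds=5, fast=False):
    """Folds used as validation sets while searching hyperparameters."""
    # fast=True: each trial is scored on a single fold only.
    return [0] if fast else list(range(num_folds))


def folds_for_final_training(num_folds=5):
    """The final fit always covers every fold, so OOF predictions are complete."""
    return list(range(num_folds))


folds = assign_folds(num_rows=10, num_folds=5, seed=42)
print(folds_for_tuning(num_folds=5, fast=True))   # [0]
print(folds_for_tuning(num_folds=5, fast=False))  # [0, 1, 2, 3, 4]
```

With `fast = True`, each trial costs roughly one model fit instead of `num_folds` fits, at the price of a noisier validation score.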

Python API

To train a new model, you can run:

from autolgbm import AutoLGBM


# required parameters:
train_filename = "data_samples/binary_classification.csv"
output = "output"

# optional parameters
test_filename = None
task = None
idx = None
targets = ["income"]
features = None
categorical_features = None
use_gpu = True
num_folds = 5
seed = 42
num_trials = 100
time_limit = 360
fast = False

# Now it's time to train the model!
algbm = AutoLGBM(
    train_filename=train_filename,
    output=output,
    test_filename=test_filename,
    task=task,
    idx=idx,
    targets=targets,
    features=features,
    categorical_features=categorical_features,
    use_gpu=use_gpu,
    num_folds=num_folds,
    seed=seed,
    num_trials=num_trials,
    time_limit=time_limit,
    fast=fast,
)
algbm.train()
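
Before starting a long tuning run, it can help to confirm that the CSV really contains the columns passed as `targets` and `idx`. The sketch below uses only the standard library. `check_columns` is a hypothetical helper, not an AutoLGBM function, and the in-memory CSV with made-up rows stands in for a real training file.

```python
import csv
import io


def check_columns(csv_file, targets, idx=None):
    """Return the requested columns that are missing from the CSV header."""
    # Read only the header row; an empty file yields an empty header.
    header = next(csv.reader(csv_file), [])
    wanted = list(targets) + ([idx] if idx is not None else [])
    return [col for col in wanted if col not in header]


# Tiny in-memory stand-in for a training file (illustrative rows only).
sample = io.StringIO("age,education,income\n39,Bachelors,<=50K\n50,Masters,>50K\n")

missing = check_columns(sample, targets=["income"], idx=None)
print(missing)  # []
```

For a file on disk, open it with `open(train_filename, newline="")` and pass the file object to `check_columns`. A non-empty result means the named columns should be fixed before calling `train()`.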

CLI

Train the model using the autolgbm train command. The parameters are the same as above.

autolgbm train \
 --train_filename datasets/30train.csv \
 --output outputs/30days \
 --test_filename datasets/30test.csv \
 --use_gpu

You can also serve the trained model using the autolgbm serve command.

autolgbm serve --model_path outputs/mll --host 0.0.0.0 --debug
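
Since the server is built on FastAPI, it will normally publish an OpenAPI schema at `/openapi.json` (FastAPI's default, unless the app disables it). The sketch below lists the routes from such a schema using only the standard library. The base URL and port in the usage comment, and the sample schema with its `/predict` path, are assumptions for illustration; the README does not state which port or endpoints `autolgbm serve` uses.

```python
import json
from urllib.request import urlopen


def fetch_schema(base_url):
    """Download the OpenAPI schema that FastAPI serves by default."""
    with urlopen(f"{base_url.rstrip('/')}/openapi.json") as resp:
        return json.load(resp)


def list_routes(openapi_schema):
    """Return the sorted URL paths declared in an OpenAPI schema dict."""
    return sorted(openapi_schema.get("paths", {}))


# Hypothetical schema, shaped like what a FastAPI app returns.
example_schema = {"paths": {"/predict": {"post": {}}, "/": {"get": {}}}}
print(list_routes(example_schema))  # ['/', '/predict']

# Against a running server (host and port here are assumptions):
# print(list_routes(fetch_schema("http://localhost:8080")))
```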

To know more about a command, run:

`autolgbm <command> --help`

autolgbm train --help


usage: autolgbm <command> [<args>] train [-h] --train_filename TRAIN_FILENAME [--test_filename TEST_FILENAME] --output
                                        OUTPUT [--task {classification,regression}] [--idx IDX] [--targets TARGETS]
                                        [--num_folds NUM_FOLDS] [--features FEATURES] [--use_gpu] [--fast]
                                        [--seed SEED] [--time_limit TIME_LIMIT]

optional arguments:
  -h, --help            show this help message and exit
  --train_filename TRAIN_FILENAME
                        Path to training file
  --test_filename TEST_FILENAME
                        Path to test file
  --output OUTPUT       Path to output directory
  --task {classification,regression}
                        User defined task type
  --idx IDX             ID column
  --targets TARGETS     Target column(s). If there are multiple targets, separate by ';'
  --num_folds NUM_FOLDS
                        Number of folds to use
  --features FEATURES   Features to use, separated by ';'
  --use_gpu             Whether to use GPU for training
  --fast                Whether to use fast mode for tuning params. Only one fold will be used if fast mode is set
  --seed SEED           Random seed
  --time_limit TIME_LIMIT
                        Time limit for optimization
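
The help text above says multiple targets and features are separated by ';'. Because ';' is also a shell command separator, such values need quoting on the command line. The sketch below assembles an `autolgbm train` invocation from Python values and lets `shlex.join` handle the quoting. `build_train_command` is a hypothetical helper, and it uses only flags that appear in the help output above.

```python
import shlex


def build_train_command(train_filename, output, targets=None, features=None,
                        num_folds=None, use_gpu=False, fast=False):
    """Assemble an `autolgbm train` argument list from Python values."""
    cmd = ["autolgbm", "train",
           "--train_filename", train_filename,
           "--output", output]
    if targets:
        cmd += ["--targets", ";".join(targets)]    # ';'-separated, per the help text
    if features:
        cmd += ["--features", ";".join(features)]  # ';'-separated, per the help text
    if num_folds is not None:
        cmd += ["--num_folds", str(num_folds)]
    if use_gpu:
        cmd.append("--use_gpu")
    if fast:
        cmd.append("--fast")
    return cmd


cmd = build_train_command("train.csv", "output",
                          targets=["target1", "target2"], fast=True)
print(shlex.join(cmd))
# autolgbm train --train_filename train.csv --output output --targets 'target1;target2' --fast
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which bypasses the shell and so needs no quoting at all.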

Owner

Rishiraj Acharya