Deep Learning with PyTorch made easy 🚀!

Overview

carefree-learn


Carefree?

carefree-learn aims to provide CAREFREE usages for both users and developers. It also provides a corresponding repo for production.

User Side

Machine Learning 📈

import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)

Computer Vision 🖼️

import cflearn

data = cflearn.cv.MNISTData(batch_size=16, transform="to_tensor")
m = cflearn.api.resnet18_gray(10).fit(data)

Developer Side

This is a WIP section :D

Production Side

carefree-learn can be deployed easily because

  • It can be exported to the onnx format with one line of code (m.to_onnx(...)); a minimal sketch follows this list.
  • A native repo called carefree-learn-deploy can do the rest of the job, using FastAPI, uvicorn and docker as its backend.
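
A minimal export sketch (m.to_onnx is mentioned above; the exact export-path argument below is an assumption):

import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)
# one-line onnx export; the destination argument is assumed here
m.to_onnx("m_onnx")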

Please refer to Quick Start and Developer Guides for detailed information.

Why carefree-learn?

carefree-learn is a general Deep Learning framework based on PyTorch. Since v0.2.x, carefree-learn has extended its usage from tabular datasets to (almost) all kinds of datasets. Meanwhile, the APIs remain (almost) the same as in v0.1.x: still simple, powerful and easy to use!

Here are some main advantages that carefree-learn holds:

Machine Learning 📈

  • Provides a scikit-learn-like interface with much more 'carefree' usages, including:
    • Automatically deals with data pre-processing.
    • Automatically handles datasets saved in files (.txt, .csv); see the sketch after this list.
    • Supports Distributed Training, which means hyper-parameter tuning can be very efficient in carefree-learn.
  • Includes some brand-new techniques which may boost vanilla Neural Network (NN) performance on tabular datasets.
  • Supports many convenient deep learning features, including:
    • Early stopping.
    • Model persistence.
    • Learning rate schedulers.
    • And more...
  • Makes full use of the WIP ecosystem cf*, such as:
    • carefree-toolkit: provides many standalone utility classes & functions which can be leveraged in your own projects.
    • carefree-data: a lightweight tool to read -> convert -> process ANY tabular dataset. It also utilizes cython to accelerate critical procedures.
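
As a hedged illustration of the file-handling claim above (whether cflearn.api.fit_ml accepts a file path directly, and the assumed layout of train.csv, are both assumptions here):

import cflearn

# assumption: fit_ml can consume a .csv file directly, with pre-processing
# (type detection, imputation, encoding) handled automatically
m = cflearn.api.fit_ml("train.csv", carefree=True)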

From the above, it follows that carefree-learn can be treated as a minimal Automatic Machine Learning (AutoML) solution for tabular datasets when fully utilized. However, this does not come at the cost of flexibility. In fact, the functionalities we've mentioned are all wrapped into individual modules in carefree-learn, which allows users to customize them easily.

Computer Vision 🖼️

  • Also provides a scikit-learn-like interface with much more 'carefree' usages.
  • Provides many out-of-the-box pre-trained models and carefully hand-crafted training defaults for reproduction & finetuning.
  • Seamlessly supports efficient ddp: simply call cflearn.api.run_ddp("run.py"), where run.py is your normal training script (see the sketch after this list).
  • A bunch of utility functions for research and production.
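
A minimal sketch of the ddp workflow (the contents of run.py below are an assumption; any normal training script should work):

# run.py: a normal training script (assumed contents)
import cflearn

data = cflearn.cv.MNISTData(batch_size=16, transform="to_tensor")
cflearn.api.resnet18_gray(10).fit(data)

# launch.py: spawns the distributed workers for run.py
import cflearn

cflearn.api.run_ddp("run.py")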

Installation

carefree-learn requires Python 3.6 or higher.

Pre-Installing PyTorch

carefree-learn requires pytorch>=1.9.0. Please refer to PyTorch for installation instructions; it is highly recommended to pre-install PyTorch with conda.

pip installation

After installing PyTorch, installing carefree-learn is rather easy:

If you pre-installed PyTorch with conda, remember to activate the corresponding environment!

pip install carefree-learn
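
To verify the installation, you can run a quick version check (the __version__ attribute is shown in the v0.1.10 release notes below):

import cflearn

print(cflearn.__version__)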

Docker

Prepare

carefree-learn has already been published on DockerHub, so it can be pulled directly:

docker pull carefree0910/carefree-learn:dev

or can be built locally:

docker build -t carefree0910/carefree-learn:dev .

Run

docker run --rm -it --gpus all carefree0910/carefree-learn:dev

Examples

  • Iris – perhaps the best known database to be found in the pattern recognition literature.
  • Titanic – the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works.
  • Operations - toy datasets for us to illustrate how to build your own models in carefree-learn.

Citation

If you use carefree-learn in your research, we would greatly appreciate it if you cited this library using this Bibtex:

@misc{carefree-learn,
  year={2020},
  author={Yujian He},
  title={carefree-learn, Deep Learning with PyTorch made easy},
  howpublished={\url{https://github.com/carefree0910/carefree-learn/}},
}

License

carefree-learn is MIT licensed, as found in the LICENSE file.

Comments
  • AttributeError: module 'cflearn' has no attribute 'Ensemble'

    When I git clone your repo, pip3 install it (Ubuntu 18.04.5), and run test_titanic.py without any changes, I get this error:

    [email protected]:~/carefree-learn/examples/titanic# sudo python3 test_titanic.py
    Traceback (most recent call last):
      File "test_titanic.py", line 64, in <module>
        test_adaboost()
      File "test_titanic.py", line 60, in test_adaboost
        _test("adaboost", _adaboost_core)
      File "test_titanic.py", line 44, in _test
        data, pattern = _core(train_file)
      File "test_titanic.py", line 36, in _adaboost_core
        ensemble = cflearn.Ensemble(TaskTypes.CLASSIFICATION, config)
    AttributeError: module 'cflearn' has no attribute 'Ensemble'

    opened by impulsecorp 23
  • AutoML Scores

    The scores in your Titanic demo, with the new AutoML system, are not as good as they were before. I tried it now using https://github.com/carefree0910/carefree-learn/blob/dev/examples/titanic/test_titanic.py, submitted to Kaggle, and got: Optuna: 0.77751, HPO: 0.75598, AdaBoost: 0.67703.

    enhancement 
    opened by impulsecorp 7
  • AttributeError: module 'cflearn' has no attribute 'make'

    On my Ubuntu 18.04.4 server, when I run your quick start code, I get this error:

    [email protected]:~/newautoml# python3 cflearn.py
    Traceback (most recent call last):
      File "cflearn.py", line 1, in <module>
        import cflearn
      File "/root/newautoml/cflearn.py", line 5, in <module>
        m = cflearn.make().fit(x, y)
    AttributeError: module 'cflearn' has no attribute 'make'

    opened by impulsecorp 5
  • Split?

    Hi, I am new to carefree and enjoying it so far. I am using cv_split=.2. My data is not IID and is temporal, so I want to make sure the split doesn't shuffle/stratify. It appears from your code that it does not shuffle:

    split = self.tr_data.split(self._cv_split)

    Is this correct?

    opened by jmrichardson 4
  • example of how to use `Element`?

    Can you include an example in the docs of how to use Elements?

    I don't understand how the config management fits into Auto and fit, or am I not supposed to use a custom config with Auto?

    Is Elements just the internal structure that you use to store a config json, or am I supposed to create an Elements object and then update values in it?

    good first issue 
    opened by Data-drone 3
  • remove the obsolete TODO comments within pipeline.py

    The pending task of the TODO comment has already been resolved in the earlier version (https://github.com/carefree0910/carefree-learn/commit/dc6fb7b141ca7e0efee5b5714d5c10bf845c18c6).

    opened by beyondacm 2
  • AutoML Mode Question

    Maybe I am misunderstanding how it works, but when I run your latest version in automl mode (using https://github.com/carefree0910/carefree-learn/blob/dev/examples/titanic/automl.py), it runs the same methods it always did (fcnn_optuna, tree_dnn_optuna, etc.) and gets the same Kaggle score as before (0.78947). But in your latest version you wrote that you "Implemented more models (Linear, TreeLinear, Wide and Deep, RNN, Transformer, etc.)". Shouldn't those be in the automl part?

    opened by impulsecorp 2
  • TypeError: estimate() got an unexpected keyword argument 'pipelines'

    When running the tutorial code:

    #%%
    import cflearn
    
    from cfdata.tabular import *
    
    # prepare iris dataset
    iris = TabularDataset.iris()
    iris = TabularData.from_dataset(iris)
    # split 10% of the data as validation data
    split = iris.split(0.1)
    train, valid = split.remained, split.split
    x_tr, y_tr = train.processed.xy
    x_cv, y_cv = valid.processed.xy
    data = x_tr, y_tr, x_cv, y_cv
    
    m = cflearn.make().fit(*data)
    # Make label predictions
    m.predict(x_cv)
    # Make probability predictions
    m.predict_prob(x_cv)
    # Estimate performance
    cflearn.estimate(x_cv, y_cv, pipelines=m)
    

    We get:

    Traceback (most recent call last):
      File "C:\ProgramData\miniconda\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-4-78d46f42bbd0>", line 24, in <module>
        cflearn.estimate(x_cv, y_cv, pipelines=m)
    TypeError: estimate() got an unexpected keyword argument 'pipelines'
    
    opened by Vaunorage 2
  • AttributeError: module 'cflearn' has no attribute 'Auto'

    When running the tutorial code:

    import cflearn
    
    from cfdata.tabular import *
    
    # prepare iris dataset
    iris = TabularDataset.iris()
    iris = TabularData.from_dataset(iris)
    # split 10% of the data as validation data
    split = iris.split(0.1)
    train, valid = split.remained, split.split
    x_tr, y_tr = train.processed.xy
    x_cv, y_cv = valid.processed.xy
    data = x_tr, y_tr, x_cv, y_cv
    
    #%%
    fcnn = cflearn.make().fit(*data)
    
    # 'overfit' validation set
    auto = cflearn.Auto(TaskTypes.CLASSIFICATION).fit(*data, num_jobs=2)
    
    # estimate manually
    predictions = auto.predict(x_cv)
    print("accuracy:", (y_cv == predictions).mean())
    
    # estimate with `cflearn`
    cflearn.estimate(
        x_cv,
        y_cv,
        pipelines=fcnn,
        other_patterns={"auto": auto.pattern},
    )
    

    Get this error :

    File "C:\ProgramData\miniconda\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 4, in auto = cflearn.Auto(TaskTypes.CLASSIFICATION).fit(*data, num_jobs=2) AttributeError: module 'cflearn' has no attribute 'Auto'

    opened by Vaunorage 2
  • Managing CUDA memory?

    I was trying this on a kaggle dataset and ran into CUDA Out of Memory issues.

    How can I adjust the Auto and fit functions to make sure that this doesn't happen?

    opened by Data-drone 1
  • Integrate DeepSpeed

    This is mainly for downstream usages, because in most cases neural networks do not need distributed training when targeting tabular datasets.

    enhancement 
    opened by carefree0910 1
Releases (v0.3.2)
  • v0.3.2(Oct 3, 2022)

  • v0.3.1(Sep 13, 2022)

  • v0.3.0(Jun 20, 2022)

  • v0.2.5(Jun 20, 2022)

  • v0.2.4(Jun 16, 2022)

  • v0.2.3(Apr 29, 2022)

  • v0.2.2(Jan 29, 2022)

  • v0.2.1(Oct 29, 2021)

    Release Notes

    We're happy to announce that carefree-learn has released v0.2.x, which makes it capable of solving not only tabular tasks, but also other general deep learning tasks!

    Introduction

    Deep Learning with PyTorch made easy 🚀!

    Like many similar projects, carefree-learn can be treated as a high-level library to help with training neural networks in PyTorch. However, carefree-learn does more than that.

    • carefree-learn is highly customizable for developers. We have already wrapped (almost) every single functionality / process into a single module (a Python class), and they can be replaced or enhanced either directly from the source code or from your local code with the help of some pre-defined functions provided by carefree-learn (see Register Mechanism).
    • carefree-learn supports easy-to-use saving and loading. By default, everything will be wrapped into a .zip file, and the onnx format is natively supported; a hedged sketch follows this list.
    • carefree-learn supports Distributed Training.
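
    As a hedged illustration of the saving & loading claim above (the entry points below, m.save and cflearn.api.load, are assumptions for illustration rather than the verified API):

    import cflearn
    import numpy as np

    x = np.random.random([1000, 10])
    y = np.random.random([1000, 1])
    m = cflearn.api.fit_ml(x, y, carefree=True)

    # assumption: `save` packs everything into a single .zip file,
    # and `load` restores the pipeline from it
    m.save("model")
    m2 = cflearn.api.load("model")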

    Apart from these, carefree-learn also has quite a few specific advantages in each area:

    Machine Learning 📈

    • carefree-learn provides an end-to-end pipeline for tabular tasks, which AUTOMATICALLY deals with the following (this part is mainly handled by carefree-data, though):
      • Detection of redundant feature columns which can be excluded (all SAME, all DIFFERENT, etc.).
      • Detection of feature column types (whether a feature column is a string column / numerical column / categorical column).
      • Imputation of missing values.
      • Encoding of string columns and categorical columns (Embedding or One Hot Encoding).
      • Pre-processing of numerical columns (Normalize, Min Max, etc.).
      • And much more...
    • carefree-learn can help you deal with almost ANY kind of tabular dataset, no matter how dirty and messy it is. It can be trained either directly with some numpy arrays, or indirectly with some files located on your machine. This makes carefree-learn stand out from similar projects.

    When we say ANY, it means that carefree-learn can even train on one single sample.

    For example

    import cflearn
    
    toy = cflearn.ml.make_toy_model()
    data = toy.data.cf_data.converted
    print(f"x={data.x}, y={data.y}")  # x=[[0.]], y=[[1.]]
    


    This is especially useful when we need to write unit tests or to verify whether our custom modules (e.g. custom pre-processes) are correctly integrated into carefree-learn.

    For example

    import cflearn
    import numpy as np
    
    # here we implement a custom processor
    @cflearn.register_processor("plus_one")
    class PlusOne(cflearn.Processor):
        @property
        def input_dim(self) -> int:
            return 1
    
        @property
        def output_dim(self) -> int:
            return 1
    
        def fit(self, columns: np.ndarray) -> cflearn.Processor:
            return self
    
        def _process(self, columns: np.ndarray) -> np.ndarray:
            return columns + 1
    
        def _recover(self, processed_columns: np.ndarray) -> np.ndarray:
            return processed_columns - 1
    
    # we need to specify that we use the custom process method to process our labels
    toy = cflearn.ml.make_toy_model(cf_data_config={"label_process_method": "plus_one"})
    data = toy.data.cf_data
    y = data.converted.y
    processed_y = data.processed.y
    print(f"y={y}, new_y={processed_y}")  # y=[[1.]], new_y=[[2.]]
    

    There is one more thing we'd like to mention: carefree-learn is Pandas-free. The reasons why we excluded Pandas are listed in carefree-data.


    Computer Vision 🖼️

    • carefree-learn also provides an end-to-end pipeline for computer vision tasks, and:
      • Supports native torchvision datasets.

        data = cflearn.cv.MNISTData(transform="to_tensor")
        

        Currently only mnist is supported, but more datasets will be added in the future (if needed)!

      • Focuses on the ImageFolderDataset for customization, which:

        • Automatically splits the dataset into train & valid.
        • Supports generating labels in parallel, which is very useful when calculating labels is time consuming.

        See IFD introduction for more details.

    • carefree-learn supports various kinds of Callbacks, which can be used for saving intermediate visualizations / results; a hypothetical sketch follows this list.
      • For instance, carefree-learn implements an ArtifactCallback, which can dump artifacts to disk during training.
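
    A hypothetical sketch of a custom callback (every name below, including the TrainerCallback base class, the register decorator, and the after_step hook, is an assumption used for illustration; refer to the Callbacks documentation for the real API):

    import cflearn

    # all names here are assumptions, not the verified carefree-learn API
    @cflearn.TrainerCallback.register("print_step")
    class PrintStep(cflearn.TrainerCallback):
        def after_step(self, trainer) -> None:
            # e.g. dump intermediate visualizations / results here
            print("current step:", trainer.state.step)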

    Examples

    Machine Learning 📈:

    import cflearn
    import numpy as np

    x = np.random.random([1000, 10])
    y = np.random.random([1000, 1])
    m = cflearn.api.fit_ml(x, y, carefree=True)

    Computer Vision 🖼️:

    import cflearn

    data = cflearn.cv.MNISTData(batch_size=16, transform="to_tensor")
    m = cflearn.api.resnet18_gray(10).fit(data)

    Please refer to Quick Start and Developer Guides for detailed information.

    Migration Guide

    From v0.1.x to v0.2.x, the design principle of carefree-learn changed in two aspects:

    Framework

    • The DataLayer in v0.1.x has changed to the more general DataModule in v0.2.x.
    • The Model in v0.1.x, which was constructed by pipes, has changed to a general Model.

    These changes are made because we want to make carefree-learn compatible with general deep learning tasks (e.g. computer vision tasks).

    Data Module

    Internally, the Pipeline trains & predicts on a DataModule in v0.2.x, but carefree-learn also provides useful APIs to keep the user experience as close to v0.1.x as possible:

    Train

    v0.1.x:

    import cflearn
    import numpy as np

    x = np.random.random([1000, 10])
    y = np.random.random([1000, 1])
    m = cflearn.make().fit(x, y)

    v0.2.x:

    import cflearn
    import numpy as np

    x = np.random.random([1000, 10])
    y = np.random.random([1000, 1])
    m = cflearn.api.fit_ml(x, y, carefree=True)

    Predict

    v0.1.x:

    predictions = m.predict(x)

    v0.2.x:

    predictions = m.predict(cflearn.MLInferenceData(x))
    

    Evaluate

    v0.1.x:

    cflearn.evaluate(x, y, metrics=["mae", "mse"], pipelines=m)

    v0.2.x:

    cflearn.ml.evaluate(cflearn.MLInferenceData(x, y), metrics=["mae", "mse"], pipelines=m)
    

    Model

    It's not very straightforward to migrate models from v0.1.x to v0.2.x, so if you require such a migration, feel free to submit an issue and we will analyze the problems case by case!

  • v0.1.16(Apr 9, 2021)

    Release Notes

    carefree-learn 0.1.16 improved overall performance.

    Optimizer

    MADGRAD (4466c9f) & Ranger (acdeec4) are now introduced.

    Reference: Best-Deep-Learning-Optimizers.

    Misc

    • Fixed ddp when np.ndarray is provided (969a6c8).
    • Fixed RNN when bidirectional is True (be974df) (6ef49f7).

    • Optimized Transformer (00bd2c4) (dc6abc4) (aec1846).
    • Re-designed the reduce part (b99d4a2).
    • Summary will now be written to disk (d5435e9).
    • batch_indices will be injected to forward_results (7a40dcc).
  • v0.1.15(Mar 18, 2021)

    Release Notes

    carefree-learn 0.1.15 improved overall performance.

    DDP

    Since PyTorch is introducing the ZeRO optimizer, we decided to remove the deepspeed dependency and use native DDP from PyTorch.

    results = cflearn.ddp(tr_file, world_size=2)
    predictions = results.m.predict(te_file)
    

    JitLSTM

    Since the native RNNs of PyTorch do not support dropouts on w_ih and w_hh, we followed the official implementation of the jit version of LSTM and implemented these dropouts.

    m = cflearn.make(
        "rnn",
        model_config={
            "pipe_configs": {
                "rnn": {
                    "extractor": {
                        "cell": "JitLSTM"
                    }
                }
            }
        }
    )
    

    Misc

    • Fixed NNB when std is 0 (177363e).
    • Fixed summary in some edge cases (945ca15, f95f667, 2768153).
    • Introduced ONNXWrapper for more general ONNX exports (226de5b).

    • Optimized Transformer (b09916b).
    • Upgraded PyTorch dependency (a596031).
    • Supported reuse_extractor in PipeInfo (149aa49).
    • Implemented HighwayBlock (3dad99e, 436ebab) and Introduced *FCNNConfig (e0670f7).
    • Implemented Initializer.orthogonal (1019114) and Optimized initializations of RNN (2193706).
  • v0.1.14(Mar 6, 2021)

    Release Notes

    carefree-learn 0.1.14 improved overall performance.

    Summary

    For non-distributed training, carefree-learn will now print out model summaries by default (inspired by torchsummary):

    ========================================================================================================================
    Layer (type)                             Input Shape                             Output Shape    Trainable Param #
    ------------------------------------------------------------------------------------------------------------------------
    RNN                                       [-1, 5, 1]                                [-1, 256]              198,912
        GRU                                   [-1, 5, 1]           [[-1, 5, 256], [-1, 128, 256]]              198,912
    FCNNHead                                   [-1, 256]                                  [-1, 1]              395,777
      MLP                                      [-1, 256]                                  [-1, 1]              395,777
          Mapping-0                            [-1, 256]                                [-1, 512]              132,096
            Linear                             [-1, 256]                                [-1, 512]              131,072
            BN                                 [-1, 512]                                [-1, 512]                1,024
            ReLU                               [-1, 512]                                [-1, 512]                    0
            Dropout                            [-1, 512]                                [-1, 512]                    0
          Mapping-1                            [-1, 512]                                [-1, 512]              263,168
            Linear                             [-1, 512]                                [-1, 512]              262,144
            BN                                 [-1, 512]                                [-1, 512]                1,024
            ReLU                               [-1, 512]                                [-1, 512]                    0
            Dropout                            [-1, 512]                                [-1, 512]                    0
          Linear                               [-1, 512]                                  [-1, 1]                  513
    ========================================================================================================================
    Total params: 594,689
    Trainable params: 594,689
    Non-trainable params: 0
    ------------------------------------------------------------------------------------------------------------------------
    Input size (MB): 0.00
    Forward/backward pass size (MB): 0.30
    Params size (MB): 2.27
    Estimated Total Size (MB): 2.57
    ------------------------------------------------------------------------------------------------------------------------
    

    Zoo.search

    Now carefree-learn supports empirical HPO via Zoo.search (2c19505), which can achieve good performance without searching a large search space; a hypothetical usage sketch follows.
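
    A hypothetical usage sketch (the signature of Zoo.search below is an assumption, not the verified API):

    import cflearn
    from cfdata.tabular import TabularDataset

    x, y = TabularDataset.iris().xy
    # assumption: Zoo.search picks empirically good settings for the given
    # model type; the call below is illustrative only
    m = cflearn.Zoo.search(x, y, model="fcnn")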

    Misc

    • Upgraded numpy dependency in pyproject.toml to try to avoid building issues.
    • Optimized default Zoo settings (9e051d5).
    • Fixed & Optimized Trainer when max_epoch is specified (ea44d88).
  • v0.1.13(Feb 8, 2021)

  • v0.1.12(Jan 25, 2021)

  • v0.1.11(Jan 18, 2021)

    carefree-learn 0.1.11 is mainly a patch release which supports more customization. However, these features are still at an early stage and are likely to change in the future. If you want to customize carefree-learn, it is still highly recommended to clone this repo and install it in edit mode (pip install -e .).

  • v0.1.10(Jan 13, 2021)

    Release Notes

    carefree-learn 0.1.10 improved overall performance and deepspeed accessibility.

    Versioning

    carefree-learn now supports checking its version via __version__:

    import cflearn
    
    cflearn.__version__  # '0.1.10'
    

    Distributed Training

    carefree-learn now provides out-of-the-box API for distributed training with deepspeed:

    import cflearn
    import numpy as np
    
    x = np.random.random([1000000, 10])
    y = np.random.random([1000000, 1])
    m = cflearn.deepspeed(x, y, cuda="0,1,2,3").m
    

    ⚠️⚠️⚠️ However it is not recommended to use this API unless you have to (e.g. when training some very large models). ⚠️⚠️⚠️

    Misc

    • Supported use_final_bn in FCNNHead (#75).
    • Ensured that models are always in eval mode in inference (#67).
    • Supported specifying resource_config of Parallel in Experiment (#68).
    • Implemented profile_forward for Pipeline.

    • Fixed other bugs.
    • Accelerated DNDF with Function (#69).
    • Patience of TrainMonitor will now depend on dataset size (#43).
    • Checkpoints will be logged earlier now when using warmup (#74).
  • v0.1.9(Dec 26, 2020)

    Release Notes

    carefree-learn 0.1.9 improved overall performance and accessibility.

    ModelConfig

    carefree-learn now introduces ModelConfig to manage configurations more easily.

    Modify extractor_config, head_config, etc.

    v0.1.8:

    head_config = {...}
    cflearn.make(
        model_config={
            "pipe_configs": {
                "fcnn": {"head": head_config},
            },
        },
    )

    v0.1.9:

    head_config = {...}
    cflearn.ModelConfig("fcnn").switch().head_config = head_config
    

    Switch to a preset config

    v0.1.8:

    # Not accessible, must register a new model
    #  with the corresponding config:
    cflearn.register_model(
        "pruned_fcnn",
        pipes=[
            cflearn.PipeInfo(
                "fcnn",
                head_config="pruned",
            )
        ],
    )
    cflearn.make("pruned_fcnn")

    v0.1.9:

    cflearn.ModelConfig("fcnn").switch().replace(head_config="pruned")
    cflearn.make("fcnn")
    

    Misc

    • Enhanced LossBase (#66).
    • Introduced callbacks to Trainer (#65).
    • Enhanced Auto and support specifying extra_config with json file path (752f419).

    • Fixed other bugs.
    • Optimized Transformer (adce2f9).
    • Optimized performance of TrainMonitor (91dfc43).
    • Optimized performance of Auto (47caa48, 9dfa204, 274b28d and #61, #63, #64).
  • v0.1.8(Dec 16, 2020)

    Release Notes

    carefree-learn 0.1.8 mainly registered all PyTorch schedulers and enhanced mlflow integration.

    Backward Compatibility Breaking

    carefree-learn now keeps a copy of the original user-defined configs (#48), which changes the saved config file:

    v0.1.7 (config.json):

    {
        "data_config": {
            "label_name": "Survived"
        },
        "cuda": 0,
        "model": "tree_dnn",
        // the `binary_config` was injected into `config.json`
        "binary_config": {
            "binary_metric": "acc",
            "binary_threshold": 0.49170631170272827
        }
    }

    v0.1.8 (config_bundle.json):

    {
        "config": {
            "data_config": {
                "label_name": "Survived"
            },
            "cuda": 0,
            "model": "tree_dnn"
        },
        "increment_config": {},
        "binary_config": {
            "binary_metric": "acc",
            "binary_threshold": 0.49170631170272827
        }
    }
    

    New Schedulers

    carefree-learn newly supports the following schedulers based on PyTorch schedulers:

    These schedulers can be utilized easily by specifying scheduler=... in any high-level API of carefree-learn, e.g.:

    m = cflearn.make(scheduler="cyclic").fit(x, y)
    

    Better mlflow Integration

    In order to utilize mlflow better, carefree-learn now handles some best practices for you under the hood, e.g.:

    • Makes the initialization of mlflow multi-thread safe in distributed training.
    • Automatically handles the run_name in distributed training.
    • Automatically handles the parameters for log_params.
    • Updates the artifacts periodically.

    The (brief) documentation for mlflow integration can be found here.

  • v0.1.7.1(Dec 12, 2020)

    Release Notes

    carefree-learn 0.1.7 integrated mlflow and cleaned up the Experiment API, which completes the machine learning lifecycle.

    • v0.1.7.1: Hotfixed a critical bug which would load the worst saved checkpoint.

    mlflow

    mlflow can help us visualize, reproduce, and serve our models. In carefree-learn, we can quickly play with mlflow by specifying mlflow_config as an empty dict:

    import cflearn
    import numpy as np
    
    x = np.random.random([1000, 10])
    y = np.random.random([1000, 1])
    m = cflearn.make(mlflow_config={}).fit(x, y)
    

    After that, we can execute mlflow ui in the current working directory to inspect the tracking results (e.g. loss curves, metric curves, etc.).

    We're planning to add documentation for the mlflow integration; it should be available in v0.1.8.

    Experiment

    The Experiment API was embarrassingly user-unfriendly before, but it has been cleaned up and has been ready to use since v0.1.7. Please refer to the documentation for more details.

    Misc

    • Integrated DeepSpeed for distributed training on one single model (experimental).
    • Enhanced Protocol for downstream usages (e.g. Quantitative Trading, Computer Vision, etc.) (experimental).

    • Fixed other bugs.
    • Optimized TrainMonitor (#39)
    • Optimized some default settings.
  • v0.1.6(Nov 30, 2020)

    Release Notes

    carefree-learn 0.1.6 is mainly a hot-fix version for 0.1.5.

    Misc

    • Simplified Pipeline.load (0033bda).
    • Generalized input_sample (828e985).
    • Implemented register_aggregator.
  • v0.1.5(Nov 29, 2020)

    Release Notes

    ⚠️⚠️⚠️ This release is broken and could hardly perform customizations. We'll release v0.1.6 ASAP ⚠️⚠️⚠️

    draw

    We can now visualize every model built by carefree-learn with the draw API (see here); a hypothetical sketch follows.
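
    A hypothetical sketch (whether draw is a method on the model and what arguments it takes are assumptions; see the linked docs for the actual usage):

    import cflearn
    import numpy as np

    x = np.random.random([1000, 10])
    y = np.random.random([1000, 1])
    m = cflearn.make().fit(x, y)
    # assumption: `draw` renders the model structure to an image file;
    # the export-path argument is illustrative
    m.draw("fcnn.png")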

    Aggregator

    We can now customize how the results from each head are aggregated.

    We plan to add documentation for Aggregator in v0.1.6.

    Protocol

    carefree-learn now supports Protocol. With Protocols it is possible to port other frameworks' models (e.g. models from scikit-learn) to carefree-learn, as well as utilize carefree-learn on other forms of input data.

    We plan to add documentation for Protocols in v0.1.6.

    Updated 2020.12.13: documentation for Protocol is delayed because it may lack users. Please feel free to contact me if you are interested in this set of features!

    Misc

    • Implemented PrefetchLoader.

    • Fixed QuantileFCNN.
    • Fixed other bugs.
    • Optimized some default settings.
  • v0.1.4(Nov 26, 2020)

    Release Notes

    carefree-learn 0.1.4 fixed some critical bugs in 0.1.3 and introduced some new features (evaluating multiple models, customizing losses, etc.).

    Backward Compatibility Breaking

    carefree-learn now deals with lists of pipelines instead of a single pipeline in most APIs (#27):

    v0.1.3:

    cflearn.save(m)        # -> cflearn^_^fcnn.zip
    print(cflearn.load())  # -> {'fcnn': FCNN()}

    v0.1.4:

    cflearn.save(m)        # -> cflearn^_^fcnn^_^0000.zip
    print(cflearn.load())  # -> {'fcnn': [FCNN()]}
    

    Misc

    • Supported customizing new losses.
    • Enhanced cflearn.evaluate; it can now evaluate on multiple pipelines.
    • Changed default parallel settings to non-parallel.
    • Supported specifying loss and loss_config in Elements.
    • Optimized auto metric settings; they now depend on the loss.
    • Implemented QuantileFCNN for quantile regression (experimental).

    • Fixed Pipeline.load.
    • Fixed the configuration stuff.
    • Fixed other bugs.
    • Optimized some default settings.
  • v0.1.3(Nov 22, 2020)

    Release Notes

    carefree-learn 0.1.3 focuses on performance and accessibility.

    pipe

    A new abstraction, pipe, was introduced to carefree-learn, which significantly improved accessibility for developers. Now we can introduce a new model to carefree-learn with only one line of code:

    cflearn.register_model("wnd_full", pipes=[cflearn.PipeInfo("fcnn"), cflearn.PipeInfo("linear")])
    m = cflearn.make("wnd_full")
    

    Please refer to Build Your Own Models for detailed information.

    Auto

    carefree-learn now provides a high level AutoML API:

    import cflearn
    from cfdata.tabular import TabularDataset
    
    x, y = TabularDataset.iris().xy
    auto = cflearn.Auto("clf").fit(x, y)
    predictions = auto.predict(x)
    

    Production

    carefree-learn now supports onnx export, and provides a high-level API, Pack, to pack everything up for production:

    import cflearn
    from cfdata.tabular import TabularDataset
    
    x, y = TabularDataset.iris().xy
    m = cflearn.make().fit(x, y)
    cflearn.Pack.pack(m, "onnx")
    

    This piece of code will generate an onnx.zip in the working directory with the following file structure:

    |--- preprocessor
       |-- ...
    |--- binary_config.json
    |--- m.onnx
    |--- output_names.json
    |--- output_probabilities.txt
    

    With onnx.zip we can make predictions (inference) on the fly:

    predictor = cflearn.Pack.get_predictor("onnx")
    predictions = predictor.predict(x)
    

    Misc

    • carefree-learn should be ~3x faster than before on small datasets, thanks to optimizations on categorical encodings.
    • DNDF in carefree-learn is highly optimized and should be ~3x faster than before.
    • APIs have been re-designed and are much easier to use now.
    • Much better documented than before (documentations).
    • Implemented more models (Linear, TreeLinear, Wide and Deep, RNN, Transformer, etc.).
    • Implemented more modules (CrossBlock, ConditionalBlocks, MonotonousMapping, MLP.funnel, etc.).

    • Fixed some bugs.
    • Optimized some default settings.
  • v0.1.2(Aug 10, 2020)

  • v0.1.1(Aug 1, 2020)

    Release Notes

    Experiments

    Experiments is much more powerful and much easier to use now:

    Updated 2020.12.13: Experiment is more useful now in v0.1.7! Please refer to the documentation for more details.

    import cflearn
    import numpy as np
    
    from cfdata.tabular import *
    
    def main():
        x, y = TabularDataset.iris().xy
        experiments = cflearn.Experiments()
        experiments.add_task(x, y, model="fcnn")
        experiments.add_task(x, y, model="fcnn")
        experiments.add_task(x, y, model="tree_dnn")
        experiments.add_task(x, y, model="tree_dnn")
        results = experiments.run_tasks(num_jobs=2)
        # {'fcnn': [Task(fcnn_0), Task(fcnn_1)], 'tree_dnn': [Task(tree_dnn_0), Task(tree_dnn_1)]}
        print(results)
        ms = {k: list(map(cflearn.load_task, v)) for k, v in results.items()}
        # {'fcnn': [FCNN(), FCNN()], 'tree_dnn': [TreeDNN(), TreeDNN()]}
        print(ms)
        # experiments could be saved & loaded easily
        saving_folder = "__temp__"
        experiments.save(saving_folder)
        loaded = cflearn.Experiments.load(saving_folder)
        ms_loaded = {k: list(map(cflearn.load_task, v)) for k, v in loaded.tasks.items()}
        # {'fcnn': [FCNN(), FCNN()], 'tree_dnn': [TreeDNN(), TreeDNN()]}
        print(ms_loaded)
        assert np.allclose(ms["fcnn"][1].predict(x), ms_loaded["fcnn"][1].predict(x))
    
    if __name__ == '__main__':
        main()
    

    We can see that experiments.run_tasks returns a bunch of Tasks, which can be easily transferred into models through cflearn.load_task.

    It is important to wrap the code in main() on some platforms (e.g. Windows), because running code in parallel will cause issues otherwise. Here's an explanation.

    Benchmark

    The Benchmark class is implemented for easier benchmark testing:

    Updated 2020.12.13: Benchmark was moved to a separate repo (carefree-learn-benchmark).

    import cflearn
    import numpy as np
    
    from cfdata.tabular import *
    
    def main():
        x, y = TabularDataset.iris().xy
        benchmark = cflearn.Benchmark(
            "foo",
            TaskTypes.CLASSIFICATION,
            models=["fcnn", "tree_dnn"]
        )
        benchmarks = {
            "fcnn": {"default": {}, "sgd": {"optimizer": "sgd"}},
            "tree_dnn": {"default": {}, "adamw": {"optimizer": "adamw"}}
        }
        msg1 = benchmark.k_fold(3, x, y, num_jobs=2, benchmarks=benchmarks).comparer.log_statistics()
        """
        ~~~  [ info ] Results
        ===============================================================================================================================
        |        metrics         |                       acc                        |                       auc                        |
        --------------------------------------------------------------------------------------------------------------------------------
        |                        |      mean      |      std       |     score      |      mean      |      std       |     score      |
        --------------------------------------------------------------------------------------------------------------------------------
        |    fcnn_foo_default    |    0.780000    | -- 0.032660 -- |    0.747340    |    0.914408    |    0.040008    |    0.874400    |
        --------------------------------------------------------------------------------------------------------------------------------
        |      fcnn_foo_sgd      |    0.113333    |    0.080554    |    0.032780    |    0.460903    |    0.061548    |    0.399355    |
        --------------------------------------------------------------------------------------------------------------------------------
        |   tree_dnn_foo_adamw   | -- 0.833333 -- |    0.077172    | -- 0.756161 -- | -- 0.944698 -- | -- 0.034248 -- | -- 0.910451 -- |
        --------------------------------------------------------------------------------------------------------------------------------
        |  tree_dnn_foo_default  |    0.706667    |    0.253684    |    0.452983    |    0.924830    |    0.060007    |    0.864824    |
        ================================================================================================================================
        """
        # save & load
        saving_folder = "__temp__"
        benchmark.save(saving_folder)
        loaded_benchmark, loaded_results = cflearn.Benchmark.load(saving_folder)
        msg2 = loaded_results.comparer.log_statistics()
        assert msg1 == msg2
    
    if __name__ == '__main__':
        main()
    

    Misc

    • Integrated trains.
    • Integrated Tracker from carefree-toolkit.
    • Integrated native amp from PyTorch.
    • Implemented FocalLoss.
    • Implemented cflearn.zoo.

    • Introduced CI.
    • Fixed some bugs.
    • Simplified some APIs.
    • Optimized some default settings.
  • v0.1.0(Jul 6, 2020)
