A standard framework for modelling Deep Learning Models for tabular data

Overview

PyTorch Tabular

PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:

  • Low Resistance Usability
  • Easy Customization
  • Scalable and Easier to Deploy

It has been built on the shoulders of giants like PyTorch (obviously) and PyTorch Lightning.


Installation

Although the installation includes PyTorch, the recommended way is to first install PyTorch from pytorch.org, picking the right CUDA version for your machine.

Once you have PyTorch installed, just use:

 pip install pytorch_tabular[all]

to install the complete library with extra dependencies.

And:

 pip install pytorch_tabular

for the bare essentials.

The sources for pytorch_tabular can be downloaded from the GitHub repo.

You can either clone the public repository:

git clone git://github.com/manujosephv/pytorch_tabular

Once you have a copy of the source, you can install it with:

python setup.py install
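
A quick way to confirm the environment before training is an import check. This is a minimal sketch; it assumes only that torch and pytorch_tabular import cleanly:

    import torch
    import pytorch_tabular

    # Confirms the package imports and whether a CUDA device is visible.
    print(pytorch_tabular.__name__)
    print("CUDA available:", torch.cuda.is_available())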

Documentation

For complete documentation with tutorials, visit https://pytorch-tabular.readthedocs.io/en/latest/

Available Models

  • CategoryEmbeddingModel: a feed-forward network with learned embeddings for categorical features
  • NODE: Neural Oblivious Decision Ensembles [1]
  • TabNet: Attentive Interpretable Tabular Learning [2]
  • FTTransformer: Feature-Tokenizer Transformer

To implement new models, see the How to implement new models tutorial. It covers basic as well as advanced architectures.

Usage

from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig, ExperimentConfig

data_config = DataConfig(
    target=['target'], #target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True, # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
    gpus=1, # number of GPUs to use; 0 means CPU
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU", # Activation between layers
    learning_rate = 1e-3
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_from_checkpoint("examples/basic")
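
Inference on unseen data afterwards is a single call; a minimal sketch, assuming a hypothetical frame new_df with the same columns as the training data:

    # predict returns a copy of the input frame with prediction columns appended
    new_preds = loaded_model.predict(new_df)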

Blog

PyTorch Tabular – A Framework for Deep Learning for Tabular Data

Future Roadmap (Contributions are Welcome)

  1. Add GaussRank as Feature Transformation
  2. Add ability to use custom activations in CategoryEmbeddingModel
  3. Add differential dropouts (layer-wise) in CategoryEmbeddingModel
  4. Add Fourier Encoding for cyclic time variables
  5. Integrate Optuna Hyperparameter Tuning
  6. Add Text and Image Modalities for mixed modal problems
  7. Integrate Wide and Deep model
  8. Integrate TabTransformer

References and Citations

[1] Sergei Popov, Stanislav Morozov, Artem Babenko. "Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data". arXiv:1909.06312 (2019).

[2] Sercan O. Arik, Tomas Pfister. "TabNet: Attentive Interpretable Tabular Learning". arXiv:1908.07442 (2019).

Comments
  • pytorch_lightning.utilities.exceptions.MisconfigurationException: GPU not found

    Excellent work.

    You can check how we are using it: https://github.com/arita37/dsa2/tree/multi

    We want to set Lightning to CPU usage. Here is the error:

      File "D:\_devs\Python01\ana3\envs\py36\lib\site-packages\pytorch_tabular\tabular_model.py", line 444, in fit
        reset,
      File "D:\_devs\Python01\ana3\envs\py36\lib\site-packages\pytorch_tabular\tabular_model.py", line 385, in _pre_fit
        self._prepare_trainer(max_epochs, min_epochs)
      File "D:\_devs\Python01\ana3\envs\py36\lib\site-packages\pytorch_tabular\tabular_model.py", line 328, in _prepare_trainer
        **trainer_args_config,
      File "D:\_devs\Python01\ana3\envs\py36\lib\site-packages\pytorch_lightning\trainer\connectors\env_vars_connector.py", line 41, in overwrite_by_env_vars
        return fn(self, **kwargs)
      File "D:\_devs\Python01\ana3\envs\py36\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 345, in __init__
        deterministic,
      File "D:\_devs\Python01\ana3\envs\py36\lib\site-packages\pytorch_lightning\accelerators\accelerator_connector.py", line 111, in on_trainer_init
        self.trainer.data_parallel_device_ids = device_parser.parse_gpu_ids(self.trainer.gpus)
      File "D:\_devs\Python01\ana3\envs\py36\lib\site-packages\pytorch_lightning\utilities\device_parser.py", line 76, in parse_gpu_ids
        gpus = _sanitize_gpu_ids(gpus)
      File "D:\_devs\Python01\ana3\envs\py36\lib\site-packages\pytorch_lightning\utilities\device_parser.py", line 137, in _sanitize_gpu_ids
        """)
    pytorch_lightning.utilities.exceptions.MisconfigurationException:
                    You requested GPUs: [0]
                    But your machine only has: []
    
    device = 'cpu'
    use_cuda = True
    if use_cuda and torch.cuda.is_available():
        print('cuda ready...')
        device = 'cuda:0'
    
    good first issue 
    opened by arita37 11
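
    A workaround consistent with this report is to choose the device at config time. A minimal sketch, assuming TrainerConfig's gpus follows PyTorch Lightning semantics (1 = one GPU, 0 = CPU):

        import torch
        from pytorch_tabular.config import TrainerConfig

        # Fall back to CPU automatically when no CUDA device is visible.
        trainer_config = TrainerConfig(
            batch_size=1024,
            max_epochs=100,
            gpus=1 if torch.cuda.is_available() else 0,
        )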
  • GitHub workflow for testing and linting

    Added workflows for linting and testing.

    • The testing workflow seems to pass. Do check out testing.yml and let me know if there's a better way to set things up.
    • The linting workflow fails. Are there some flake8 options I need to use to prevent this?
    opened by wsad1 10
  • Unable to use GPU, regardless of TrainerConfig(gpus=) setting

    I am unable to use GPUs whether I set 0 or 1 for the gpus parameter. I think the issue may lie in an internal call to the distributed.py script. As the warning states, the --gpus flag seems not to be invoked.

    C:\Users\sunis\miniconda3\envs\numerai\lib\site-packages\pytorch_lightning\utilities\distributed.py:45: UserWarning: GPU available but not used. Set the --gpus flag when calling the script.
      warnings.warn(*args, **kwargs)
    GPU available: True, used: False
    TPU available: False, using: 0 TPU cores
    
    opened by Sunishchal 10
  • model.save_model pickle error: _pickle.PicklingError: Can't pickle <function <lambda> at 0x000002716F0FF620>: it's not found as pytorch_tabular.models.node.utils.<lambda>

    I know that lambda functions are not pickle-able, but I haven't defined any lambda functions in my file nor in my model params.

    model.model.save_model(path + "/model/torch_checkpoint")
    

    File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\pytorch_tabular\tabular_model.py", line 643, in save_model joblib.dump(self.callbacks, os.path.join(dir, "callbacks.sav")) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 480, in dump NumpyPickler(f, protocol=protocol).dump(value) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 409, in dump self.save(obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 476, in save f(self, obj) # Call unbound method with explicit self File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 781, in save_list self._batch_appends(obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 805, in _batch_appends save(x) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 521, in save self.save_reduce(obj=obj, *rv) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 634, in save_reduce save(state) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 476, in save f(self, obj) # Call unbound method with explicit self File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 821, in save_dict self._batch_setitems(obj.items()) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 847, in _batch_setitems save(v) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 521, in save self.save_reduce(obj=obj, *rv) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 610, in save_reduce save(args) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 476, in save f(self, obj) # Call unbound method with explicit self File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 736, in save_tuple save(element) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 521, in save self.save_reduce(obj=obj, *rv) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 634, in save_reduce save(state) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 476, in save f(self, obj) # Call unbound method with explicit self File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 821, in save_dict self._batch_setitems(obj.items()) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 847, in _batch_setitems save(v) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 521, in save self.save_reduce(obj=obj, *rv) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 634, in save_reduce save(state) File 
"C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 476, in save f(self, obj) # Call unbound method with explicit self File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 821, in save_dict self._batch_setitems(obj.items()) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 847, in _batch_setitems save(v) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 521, in save self.save_reduce(obj=obj, *rv) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 631, in save_reduce self._batch_setitems(dictitems) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 847, in _batch_setitems save(v) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 521, in save self.save_reduce(obj=obj, *rv) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 634, in save_reduce save(state) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 476, in save f(self, obj) # Call unbound method with explicit self File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 821, in save_dict self._batch_setitems(obj.items()) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 847, in _batch_setitems save(v) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 521, in save self.save_reduce(obj=obj, *rv) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 631, in save_reduce self._batch_setitems(dictitems) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 852, in _batch_setitems save(v) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 521, in save self.save_reduce(obj=obj, *rv) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 634, in save_reduce save(state) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 476, in save f(self, obj) # Call unbound method with explicit self File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 821, in save_dict self._batch_setitems(obj.items()) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 847, in _batch_setitems save(v) File "C:\Users\asus\anaconda3\envs\dsa2\lib\site-packages\joblib\numpy_pickle.py", line 282, in save return Pickler.save(self, obj) File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 476, in save f(self, obj) # Call unbound method with explicit self File "C:\Users\asus\anaconda3\envs\dsa2\lib\pickle.py", line 922, in save_global (obj, module_name, name)) _pickle.PicklingError: Can't pickle <function at 0x000002716F0FF620>: it's not found as pytorch_tabular.models.node.utils.

    opened by N950 10
  • Unable to Install on Windows Due to scikit-learn Error

    I'm unable to install pytorch_tabular on Windows due to a scikit-learn error. Installation worked fine under WSL.

    copying sklearn\datasets\tests\data\openml\292\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz -> build\bdist.win-amd64\wheel.\sklearn\datasets\tests\data\openml\292
    error: could not create 'build\bdist.win-amd64\wheel.\sklearn\datasets\tests\data\openml\292\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz': No such file or directory

    ERROR: Failed building wheel for scikit-learn
    Failed to build scikit-learn
    ERROR: Could not build wheels for scikit-learn which use PEP 517 and cannot be installed directly

    opened by BSalita 9
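
    One workaround consistent with this kind of failure (a sketch, not a confirmed fix) is to install a prebuilt scikit-learn wheel before pulling in pytorch_tabular, so pip never attempts a source build:

        pip install --upgrade pip
        pip install scikit-learn
        pip install pytorch_tabular[all]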
  • Imbalanced learning: mu-parameter not used, leads to unweighted crossentropy-function in "mildly" unbalanced cases

    Hi, the utils function get_class_weighted_cross_entropy(y_train, mu=0.15) does not actually use the mu parameter, but sets it to 0.15 regardless. See line 29 ("weights = _make_smooth_weights_for_balanced_classes(y_train, mu=0.15)"): https://github.com/manujosephv/pytorch_tabular/blob/9092543e2d8a45fc284a84d91d7cc4753d449853/pytorch_tabular/utils.py#L29

    In my binary-classification case with a 1:10 imbalance, this leads to weights of 1:1 for the two classes. Also, you might want to set mu higher by default, to get actual weights for non-extreme imbalances like mine. I am using a mu > 1 to get different weights, but that does not work due to the bug (I am setting the weights manually for now).

    To Reproduce: run get_class_weighted_cross_entropy(y_train, mu=2) with a 1:10 imbalanced, binary y_train.

    Expected behavior: different weights for the two classes. Instead it returns cross-entropy with weight=[1, 1].

    opened by JulianRein 8
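
    Until the bug is fixed, a workaround sketch is to compute smoothed class weights by hand and pass a weighted loss to fit (which accepts a loss argument). The smoothing rule below mirrors the library's apparent intent and is an assumption on my part:

        import numpy as np
        import torch

        def smooth_class_weights(y, mu=2.0):
            # w_c = max(1, log(mu * N / n_c))  -- assumed smoothing rule
            counts = y.value_counts()
            total = counts.sum()
            return {c: max(1.0, float(np.log(mu * total / n))) for c, n in counts.items()}

        weights = smooth_class_weights(train["target"], mu=2.0)
        # assumes integer class labels 0..k-1
        weight_tensor = torch.tensor([weights[c] for c in sorted(weights)], dtype=torch.float)
        loss = torch.nn.CrossEntropyLoss(weight=weight_tensor)
        tabular_model.fit(train=train, validation=val, loss=loss)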
  • Cannot create .tmp in Google Colab

    Trying to fit NODE models on Colab results in an error due to the inability to create .tmp in the current directory:

    OSError                                   Traceback (most recent call last)
    <ipython-input-24-184a686ccb34> in <module>()
         45     model_config=model_config,
         46     optimizer_config=optimizer_config,
    ---> 47     trainer_config=trainer_config,
         48 )
    
    2 frames
    /usr/lib/python3.7/os.py in makedirs(name, mode, exist_ok)
        221             return
        222     try:
    --> 223         mkdir(name, mode)
        224     except OSError:
        225         # Cannot rely on checking for EEXIST, since the operating system
    
    OSError: [Errno 95] Operation not supported: '.tmp'
    

    There is not an obvious place to pass an alternative location.

    Running pytorch_tabular 0.6.0 and pytorch 1.9 on Colab Pro.

    opened by fonnesbeck 8
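
    A workaround sketch consistent with this report: the NODE implementation creates a .tmp folder in the current working directory, and some mounted Colab paths refuse mkdir, so switch to Colab's local disk before building the model. The path below is an assumption about a writable location:

        import os
        os.chdir("/content")  # Colab's local disk allows mkdir, unlike some mounted drives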
  • Import error

    When I import

    from pytorch_tabular import TabularModel
    

    I get this error:

    14 import random
         15 
    ---> 16 from pytorch_tabular import TabularModel
         17 from pytorch_tabular.models import CategoryEmbeddingModelConfig, NodeConfig, NODEModel
         18 from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig, ExperimentConfig, ModelConfig
    
    
    /usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/utils.py in <module>()
         20 from torchmetrics.utilities.data import dim_zero_mean as _dim_zero_mean
         21 from torchmetrics.utilities.data import dim_zero_sum as _dim_zero_sum
    ---> 22 from torchmetrics.utilities.data import get_num_classes as _get_num_classes
         23 from torchmetrics.utilities.data import select_topk as _select_topk
         24 from torchmetrics.utilities.data import to_categorical as _to_categorical
    
    ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/usr/local/lib/python3.7/dist-packages/torchmetrics/utilities/data.py)
    

    How can I fix it?

    opened by sonnguyen129 7
  • GPU usage

    Dear Manu,

    I've set up the whole model and submitted it to our supercomputer, but one error remains, related to the number of GPUs requested. Here is the error:

    Multi-Target Regression: using the first target(ds_serv_int) to encode the categorical columns
    /gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/category_encoders/utils.py:21: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
      elif pd.api.types.is_categorical(cols):
    GPU available: False, used: False
    TPU available: False, using: 0 TPU cores
    Traceback (most recent call last):
      File "/gpfs/workdir/bachv/Notebooks/DL_DS_Poitiers/poitiers_dureeSejour_DL.py", line 168, in <module>
        tabular_model.fit(train=train, 
      File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_tabular/tabular_model.py", line 440, in fit
        train_loader, val_loader = self._pre_fit(
      File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_tabular/tabular_model.py", line 389, in _pre_fit
        self._prepare_trainer(max_epochs, min_epochs)
      File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_tabular/tabular_model.py", line 328, in _prepare_trainer
        self.trainer = pl.Trainer(
      File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
        return fn(self, **kwargs)
      File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 333, in __init__
        self.accelerator_connector.on_trainer_init(
      File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 111, in on_trainer_init
        self.trainer.data_parallel_device_ids = device_parser.parse_gpu_ids(self.trainer.gpus)
      File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/utilities/device_parser.py", line 76, in parse_gpu_ids
        gpus = _sanitize_gpu_ids(gpus)
      File "/gpfs/users/bachv/.conda/envs/hopia-ia/lib/python3.9/site-packages/pytorch_lightning/utilities/device_parser.py", line 134, in _sanitize_gpu_ids
        raise MisconfigurationException(f"""
    pytorch_lightning.utilities.exceptions.MisconfigurationException: 
                    You requested GPUs: [0]
                    But your machine only has: []
    

    My code is attached:

    Thank you in advance for any hint you might have!

    Check that the following modules are installed
    
    conda install pytorch torchvision -c pytorch
    
    
    pip install pytorch_tabular[all]
    or
    git clone git://github.com/manujosephv/pytorch_tabular
    +
    python setup.py install
    
    
    pip install torch_optimizer  # not available on conda
    
    conda install -c conda-forge scikit-learn 
    
    conda install -c conda-forge pandas
    
    conda install -c conda-forge seaborn 
    
    conda install -c conda-forge numpy 
    
    conda install -c conda-forge matplotlib 
    
    
    ### Import useful libraries
    
    #PyTorch Tabular
    from pytorch_tabular import TabularModel
    from pytorch_tabular.models import CategoryEmbeddingModelConfig, NodeConfig
    from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig, ExperimentConfig
    from torch_optimizer import QHAdam
    
    
    #Scikit Learn
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    from sklearn.preprocessing import PowerTransformer
    
    #other
    import random
    import numpy as np
    import pandas as pd
    import os
    import sys
    
    
    
    
    '''
    Note: consider whether 1-minute stays in a department should be removed (data error?) and how to handle discharge dates later than the extraction date.
    '''
    
    
    
    
    ### Using PyTorch Tabular
    
    ''' Source: https://pytorch-tabular.readthedocs.io/en/latest/ '''
    
    
    ## Utility function
    
    def print_metrics(y_true, y_pred, tag):
        if isinstance(y_true, pd.DataFrame) or isinstance(y_true, pd.Series):
            y_true = y_true.values
        if isinstance(y_pred, pd.DataFrame) or isinstance(y_pred, pd.Series):
            y_pred = y_pred.values
        if y_true.ndim>1:
            y_true=y_true.ravel()
        if y_pred.ndim>1:
            y_pred=y_pred.ravel()
        val_acc = mean_squared_error(y_true, y_pred)
        val_f1 = mean_absolute_error(y_true, y_pred)
        print(f"{tag} MSE: {val_acc} | {tag} MAE: {val_f1}")
    
    
    
    ## Data preparation 
    
    from preprocessing_poitiers_dureeSejour import profilage_sejour
    
    '''We study category 1 here, to determine whether analyzing it alone gives convincing results; otherwise we will study the following categories. '''
    
    data = profilage_sejour(1) # broad diagnosis category (level 1)
    
    list_columns = list(data.columns)
    target_cols = ['ds_serv_int', 'ds_tot_int']   # columns we want to predict
    cat_col_names = ['id_service']             # to be confirmed 
    date_col_names = ['date_debut','date_entree_service','date_sortie_service']
    col_not_num = cat_col_names + date_col_names + target_cols 
    num_col_names = [x for x in list_columns if x not in col_not_num]
    
    date_col_list = [('date_debut','T'),('date_entree_service','T'),('date_sortie_service','T')]
    
    train, test = train_test_split(data, random_state=42)
    train, val = train_test_split(train, random_state=42)
    
    
    '''Since we will use a higher learning rate than in the basic PyTorch Tabular usage, we increase the number of epochs (from 20 to 50). '''
    
    batch_size = 512
    steps_per_epoch = int(train.shape[0]/batch_size)
    epochs = 50
    
    
    ## Configurations 
    
    # Data configuration
    data_config = DataConfig(
        target=target_cols, #target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
        continuous_cols=num_col_names,
        categorical_cols=cat_col_names,
        date_columns= date_col_list,
        encode_date_columns = True, 
        validation_split = 0.2,         #80% Train + test 20% validation
        continuous_feature_transform="quantile_normal",
    
    )
    
    # Training configuration
    trainer_config = TrainerConfig(
        auto_lr_find=False, # check whether this is relevant?
        batch_size=batch_size,
        max_epochs=epochs,
        early_stopping=None,        # check whether this is useful?
        accumulate_grad_batches=2,
        gpus=1, # index of the GPU to use; 0 means CPU
    )
    
    
    # Learning-rate scheduler configuration
    optimizer_config = OptimizerConfig(
        lr_scheduler="OneCycleLR",  # PyTorch one-cycle learning-rate policy (updated every batch) 
        lr_scheduler_params={"max_lr":2e-3,     # maximum learning rate in the cycle
            "epochs": epochs, 
            "steps_per_epoch":steps_per_epoch}
    )
    
    
    
    
    # Model configuration 
    ''' Here NODE - source: "Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data" - 09/2020 - https://arxiv.org/abs/1909.06312 '''
    
    
    model_config = NodeConfig(
        task="regression",
        num_layers=2, # number of dense layers
        num_trees=1024, # number of trees in each layer
        depth=5, # depth of each tree
        embed_categorical=False, # if True, uses a learned embedding, else LeaveOneOutEncoding for categorical columns
        learning_rate = 1e-3,
        target_range=None
    )
    
    # Using PyTorch Tabular
    tabular_model = TabularModel(
        data_config=data_config,
        model_config=model_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
    )
    
    
    ## Model training 
    
    tabular_model.fit(train=train, 
                      validation=val, 
                      target_transform=PowerTransformer(method="yeo-johnson"), 
                      optimizer=QHAdam,         # Quasi-Hyperbolic Adam (see report - https://paperswithcode.com/method/qhadam)
                      optimizer_params={"nus": (0.7, 1.0), "betas": (0.95, 0.998)})
    
    
    ## Results 
    
    result = tabular_model.evaluate(test)   # evaluate the df using the loss and metrics set in the configuration
    
    pred_df = tabular_model.predict(test)
    pred_df.head()
    
    print("Durée de séjour par service")
    print_metrics(test['ds_serv_int'], pred_df["ds_serv_int_prediction"], tag="Holdout")
    print("Durée de séjour totale")
    print_metrics(test['ds_tot_int'], pred_df["ds_tot_int_prediction"], tag="Holdout")
    
    
    
    
    ## Saving the model 
    
    model_folder = os.path.join( "/Users", "victoire", "CodingProjects", "ML_Hopia", "Projet3A", "Models") # change as needed 
    
    tabular_model.save_model(model_folder)  # save_model is a method on the TabularModel instance
    
    
    ## Using the model on new data 
    
    ''' 
    new_data =
    
    tabular_model.predict(new_data)
    
    '''
    
    
    ## Loading the saved model
    
    '''
    model_folder = os.path.join( "/Users", "victoire", "CodingProjects", "ML_Hopia", "Projet3A", "Models") # change as needed 
    
    loaded_model = TabularModel.load_from_checkpoint(model_folder)
    '''
    
    good first issue 
    opened by vicbach 7
  • Error in Google Colab

    ! pip install pytorch_tabular[all]

    from pytorch_tabular import TabularModel

    ImportError: cannot import name 'Batch' from 'torchtext.data' (/usr/local/lib/python3.7/dist-packages/torchtext/data/__init__.py)

    opened by franz101 6
  • Bug in categorical_encoders.py?

    Love the repo! I'm trying to use this to predict customer behavior and have a data set with some continuous and some categorical data as well as some dates.

    Describe the bug: When I have a date_column and encode_date_columns=True for a classification objective, the following error occurs:

    Traceback (most recent call last):
      File "1.py", line 53, in <module>
        tabular_model.fit(train=train, validation=val)
      File "/Users/andrewpierno/opt/anaconda3/envs/churn/lib/python3.7/site-packages/pytorch_tabular/tabular_model.py", line 455, in fit
        reset,
      File "/Users/andrewpierno/opt/anaconda3/envs/churn/lib/python3.7/site-packages/pytorch_tabular/tabular_model.py", line 383, in _pre_fit
        train, validation, test, target_transform, train_sampler
      File "/Users/andrewpierno/opt/anaconda3/envs/churn/lib/python3.7/site-packages/pytorch_tabular/tabular_model.py", line 294, in _prepare_dataloader
        self.datamodule.setup("fit")
      File "/Users/andrewpierno/opt/anaconda3/envs/churn/lib/python3.7/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
        return fn(*args, **kwargs)
      File "/Users/andrewpierno/opt/anaconda3/envs/churn/lib/python3.7/site-packages/pytorch_tabular/tabular_datamodule.py", line 267, in setup
        self.validation, stage="inference"
      File "/Users/andrewpierno/opt/anaconda3/envs/churn/lib/python3.7/site-packages/pytorch_tabular/tabular_datamodule.py", line 182, in preprocess_data
        data = self.categorical_encoder.transform(data)
      File "/Users/andrewpierno/opt/anaconda3/envs/churn/lib/python3.7/site-packages/pytorch_tabular/categorical_encoders.py", line 41, in transform
        assert all(c in X.columns for c in self.cols)
    AssertionError
    

    I put some logging in around X.columns and self.cols

    X is:
    Index(['interval', 'amount', 'status', 'target', '_Month', '_Quarter',
           '_Is_quarter_end', '_Is_year_end'],
          dtype='object')
    self cols is:
    ['interval', 'status', '_Month', '_Quarter', '_Is_quarter_end', '_Is_year_end']
    [Next call happens immediately after]
    X is:
    Index(['interval', 'amount', 'status', 'target', '_Month', '_Quarter',
           '_Is_quarter_start', '_Is_year_start'],
          dtype='object')
    self cols is:
    ['interval', 'status', '_Month', '_Quarter', '_Is_quarter_end', '_Is_year_end']
    

    To Reproduce: Steps to reproduce the behavior:

    1. Here is the script
    from pytorch_tabular import TabularModel
    from pytorch_tabular.models import CategoryEmbeddingModelConfig
    from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
    from sklearn.model_selection import train_test_split
    import pandas as pd
    
    
    df = pd.read_csv('./repro.csv')
    data = df
    
    target_cols = ['target'] 
    cat_col_names = ['interval', 'status']
    continuous_col_names = ['amount']
    date_col_list = [('date', 'M')] # Note: other timeframes don't work either.
    
    train, test = train_test_split(data, random_state=42)
    train, val = train_test_split(train, random_state=42)
    
    data_config = DataConfig(
        target=target_cols, #target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
        continuous_cols=continuous_col_names,
        categorical_cols=cat_col_names,
        date_columns=date_col_list,
        encode_date_columns=True,
        #    validation_split = 0.2,         #80% Train + test 20% validation
        num_workers=8,
        # continuous_feature_transform="quantile_normal",
    )
    
    
    trainer_config = TrainerConfig(
        auto_lr_find=True, # Runs the LRFinder to automatically derive a learning rate
        batch_size=8,
        max_epochs=100,
        gpus=0, # index of the GPU to use. 0, means CPU
    )
    optimizer_config = OptimizerConfig()
    
    model_config = CategoryEmbeddingModelConfig(
        task="classification",
        layers="1024-512-512",  # Number of nodes in each layer
        activation="LeakyReLU", # Activation between each layers
        learning_rate = 1e-3
    )
    
    tabular_model = TabularModel(
        data_config=data_config,
        model_config=model_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
    )
    
    tabular_model.fit(train=train, validation=val)
    result = tabular_model.evaluate(test)
    pred_df = tabular_model.predict(test)
    tabular_model.save_model("examples/basic")
    # loaded_model = TabularModel.load_from_checkpoint("examples/basic")
    # result = loaded_model.evaluate(test)
    
    2. Here is the data

    https://docs.google.com/spreadsheets/d/1jfV_p0pRXv0zkQLvaQXuDvVtw21FblT7YK83760SPDQ/edit?usp=sharing

    3. Run example.py and see the error: assert all(c in X.columns for c in self.cols)

    Expected behavior: assert all(c in X.columns for c in self.cols) should pass when using date_columns and encode_date_columns=True.

    Desktop: ios

    opened by wrannaman 6
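
    Until the encoder bug is fixed, one workaround sketch is to derive the date features yourself before handing data to the library, so the fitted encoder always sees the same columns; the derived names here are illustrative assumptions:

        import pandas as pd

        # Derive stable date features up front, then disable the library's date encoding.
        df["date"] = pd.to_datetime(df["date"])
        df["_Month"] = df["date"].dt.month
        df["_Quarter"] = df["date"].dt.quarter
        cat_col_names = ["interval", "status", "_Month", "_Quarter"]
        # ...and set encode_date_columns=False (and no date_columns) in DataConfig.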
  • Problem with model saved with "save_model_for_inference"

    I have successfully trained an FTTransformer. Now I want to save the model in PyTorch format and use it for inference on new data.

    This is the code:

    csv_path = "C:/Users/path/to/dataset.csv"
    df = pd.read_csv(csv_path, sep=";")
    df.drop("Class", axis=1, inplace=True)
    features = list(df.columns)
    
    train, test = model_selection.train_test_split(df, test_size=0.2, shuffle=True)
    train, val = model_selection.train_test_split(train, test_size=0.2, shuffle=True)
    
    model = torch.load("path/inference/model.pt") #model saved with "save_model_for_inference"
    
    model.eval()
    
    with torch.no_grad():
        res = model(test)
    
    print(res)
    
    

    But I get this error:

    Traceback (most recent call last):
      File "C:\Users\mfraccaroli\Miniconda3\envs\nvidia_pytorch\lib\site-packages\pandas\core\indexes\base.py", line 3803, in get_loc
        return self._engine.get_loc(casted_key)
      [... pandas index lookup frames elided ...]
    KeyError: 'continuous'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "C:\Users\mfraccaroli\Desktop\UniLrn_vs_pytorch\inference.py", line 38, in <module>
        res = model(test)
      File "C:\Users\mfraccaroli\Miniconda3\envs\nvidia_pytorch\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\mfraccaroli\Miniconda3\envs\nvidia_pytorch\lib\site-packages\pytorch_tabular-0.7.0-py3.10.egg\pytorch_tabular\models\base_model.py", line 224, in forward
        x = self.compute_backbone(x)
      File "C:\Users\mfraccaroli\Miniconda3\envs\nvidia_pytorch\lib\site-packages\pytorch_tabular-0.7.0-py3.10.egg\pytorch_tabular\models\base_model.py", line 191, in compute_backbone
        x = self.backbone(x)
      File "C:\Users\mfraccaroli\Miniconda3\envs\nvidia_pytorch\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\mfraccaroli\Miniconda3\envs\nvidia_pytorch\lib\site-packages\pytorch_tabular-0.7.0-py3.10.egg\pytorch_tabular\models\ft_transformer\ft_transformer.py", line 164, in forward
        continuous_data, categorical_data = x["continuous"], x["categorical"]
      File "C:\Users\mfraccaroli\Miniconda3\envs\nvidia_pytorch\lib\site-packages\pandas\core\frame.py", line 3805, in __getitem__
        indexer = self.columns.get_loc(key)
      File "C:\Users\mfraccaroli\Miniconda3\envs\nvidia_pytorch\lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
        raise KeyError(key) from err
    KeyError: 'continuous'

    This is the code I wrote to train the model:

    from pytorch_tabular import TabularModel
    from pytorch_tabular.models import FTTransformerModel, FTTransformerConfig
    from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig, ExperimentConfig
    import pandas as pd
    from sklearn import model_selection
    from torchmetrics import Accuracy, AUROC, PrecisionRecallCurve, Precision, Recall
    from pytorch_tabular.utils import get_balanced_sampler, get_class_weighted_cross_entropy
    import torch
    
    def Normalize(df):
        means, stds = {}, {}
        for n in df.columns:
            means[n], stds[n] = df[n].mean(), df[n].std()
            df[n] = (df[n]-means[n]) / (1e-7 + stds[n])
        return df
    
    csv_path = "C:/path/to/dataset.csv"
    df = pd.read_csv(csv_path, sep=";")
    df.drop("Unnamed: 43", axis=1, inplace=True)
    features = list(df.columns)
    
    train, test = model_selection.train_test_split(df, test_size=0.2, shuffle=True)
    train, val = model_selection.train_test_split(train, test_size=0.2, shuffle=True)
    
    data_config = DataConfig(
        target=['Class'], #target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
        continuous_cols=features,
    )
    
    trainer_config = TrainerConfig(
        auto_lr_find=True, # Runs the LRFinder to automatically derive a learning rate
        batch_size=128,
        max_epochs=100,
        gpus=1, #index of the GPU to use. 0, means CPU
    )
    optimizer_config = OptimizerConfig()
    
    model_config = FTTransformerConfig(
        task="classification",
        learning_rate = 1e-3,
        metrics=["auroc", "accuracy", "precision", "recall"],
        metrics_params=[{"num_classes": 2}, {}, {}, {}],
        attn_feature_importance=True,
    )
    
    exp_config = ExperimentConfig(
        project_name="Unbalanced_Classification",
        run_name="FT-Transformer",
        log_target="tensorboard",
    )
    
    tabular_model = TabularModel(
        data_config=data_config,
        model_config=model_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
        experiment_config=exp_config,
    )
    
    tabular_model.fit(train=train, validation=val)
    result = tabular_model.evaluate(test)
    pred_df = tabular_model.predict(test)
    pred_df.to_excel("pred_result.xlsx")
    tabular_model.save_model("examples/basic")
    
    tabular_model.save_model_for_inference("path/inference/model.pt", kind="pytorch")
    

    Any idea how to fix this?

    opened by micheleFraccaroli 0
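
    The traceback shows the bare PyTorch module expects a dict of tensors keyed "continuous" and "categorical", not a DataFrame. A hedged sketch of feeding it manually; in practice the saved datamodule's preprocessing would also have to be applied first (an assumption):

        import torch

        batch = {
            "continuous": torch.tensor(test[features].values, dtype=torch.float),
            # no categorical columns in this setup, so an empty long tensor
            "categorical": torch.empty(len(test), 0, dtype=torch.long),
        }
        model.eval()
        with torch.no_grad():
            res = model(batch)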
  • How to use PCA in conjunction with models?

    Hi!

    I've been using this library in some experiments, and I came across a question. How can I apply PCA considering the training and testing sets?

    Currently, my split strategy is as follows:

    train, test = train_test_split(df, random_state=42, stratify=df['emotion'])

    I tried to adopt the following logic to apply the PCA:

    pca = PCA(0.9)
    train = pca.fit_transform(train.loc[:, ~train.columns.isin(['emotion'])])
    test = pca.transform(test.loc[:, ~test.columns.isin(['emotion'])])
    pca.n_components_

    However, this strategy proved to be ineffective. Is there any way to do this procedure? Sorry for the naive question. I'm still learning how to work with these subjects.

    opened by PSCM 0
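
    One way to combine PCA with pytorch_tabular is to fit PCA on the training features only and rebuild named DataFrames, since the library consumes pandas frames; a sketch with illustrative column names:

        import pandas as pd
        from sklearn.decomposition import PCA

        feature_cols = [c for c in train.columns if c != "emotion"]
        pca = PCA(n_components=0.9).fit(train[feature_cols])

        def to_frame(df):
            comps = pca.transform(df[feature_cols])
            out = pd.DataFrame(comps, columns=[f"pc_{i}" for i in range(comps.shape[1])], index=df.index)
            out["emotion"] = df["emotion"].values  # reattach the target column
            return out

        train_pca, test_pca = to_frame(train), to_frame(test)
        # DataConfig's continuous_cols would then be the pc_* columns.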
  • AttributeError: 'FTTransformerBackbone' object has no attribute 'cont_embedding_layer'

    Hi,

    I have tested the FTTransformer.

    I have a dataframe that I divided into train, test, and validation; I have a target variable with two classes. The idea is to do classification and then extract the feature importance.

    I have tested NODE with the same dataset and it worked. For the FTTransformer I tried this code:

    data_config = DataConfig(
        target=['target'],
        continuous_cols=data.columns.tolist()[:concat.shape[1]],
        #categorical_cols=cat_col,
        continuous_feature_transform="quantile_normal",
        normalize_continuous_features=True
    )
    trainer_config = TrainerConfig(
        auto_lr_find=True, 
        batch_size=1024,
        max_epochs=1000,
        auto_select_gpus=False,
        gpus=0, 
    )
    
    
    optimizer_config = OptimizerConfig()
    model_config = FTTransformerConfig(
         task="classification",
         metrics=["f1", "accuracy"],
         #embedding_initialization=None,
         embedding_bias=True,
         share_embedding = True,
         share_embedding_strategy="fraction",
         shared_embedding_fraction=0.25,
         metrics_params=[{"num_classes": 2, "average": "macro"}, {}],
     )
    tabular_model = TabularModel(
        data_config=data_config,
        model_config=model_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
    )
    tabular_model.fit(train=train, test=val) #fitting the model
    
    

    I got an error and I am not sure where the error is coming from:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-39-b2d72c9b96cf> in <module>
         36 )
         37 #tabular_model.fit(train=train, test=val) #fitting the model
    ---> 38 tabular_model.fit(train=train)
         39 results = pd.DataFrame(index = measures, columns = ["base"])
         40 base=results_NODE(model= tabular_model, X_test = test,  y=  test['target'])
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_tabular/tabular_model.py in fit(self, train, validation, test, loss, metrics, optimizer, optimizer_params, train_sampler, target_transform, max_epochs, min_epochs, reset, seed)
        445         self.model.train()
        446         if self.config.auto_lr_find and (not self.config.fast_dev_run):
    --> 447             self.trainer.tune(self.model, train_loader, val_loader)
        448             # Parameters in models needs to be initialized again after LR find
        449             self.model.data_aware_initialization(self.datamodule)
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in tune(self, model, train_dataloader, val_dataloaders, datamodule, scale_batch_size_kwargs, lr_find_kwargs)
        684         )
        685 
    --> 686         result = self.tuner._tune(model, scale_batch_size_kwargs=scale_batch_size_kwargs, lr_find_kwargs=lr_find_kwargs)
        687 
        688         assert self.state.stopped
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py in _tune(self, model, scale_batch_size_kwargs, lr_find_kwargs)
         52         if self.trainer.auto_lr_find:
         53             lr_find_kwargs.setdefault('update_attr', True)
    ---> 54             result['lr_find'] = lr_find(self.trainer, model, **lr_find_kwargs)
         55 
         56         self.trainer.state.status = TrainerStatus.FINISHED
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/tuner/lr_finder.py in lr_find(trainer, model, min_lr, max_lr, num_training, mode, early_stop_threshold, update_attr)
        248 
        249     # Fit, lr & loss logged in callback
    --> 250     trainer.tuner._run(model)
        251 
        252     # Prompt if we stopped early
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py in _run(self, *args, **kwargs)
         62         self.trainer.state.status = TrainerStatus.RUNNING  # last `_run` call might have set it to `FINISHED`
         63         self.trainer.training = True
    ---> 64         self.trainer._run(*args, **kwargs)
         65         self.trainer.tuning = True
         66 
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in _run(self, model)
        754 
        755         # dispatch `start_training` or `start_evaluating` or `start_predicting`
    --> 756         self.dispatch()
        757 
        758         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
        795             self.accelerator.start_predicting(self)
        796         else:
    --> 797             self.accelerator.start_training(self)
        798 
        799     def run_stage(self):
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
         94 
         95     def start_training(self, trainer: 'pl.Trainer') -> None:
    ---> 96         self.training_type_plugin.start_training(trainer)
         97 
         98     def start_evaluating(self, trainer: 'pl.Trainer') -> None:
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
        142     def start_training(self, trainer: 'pl.Trainer') -> None:
        143         # double dispatch to initiate the training loop
    --> 144         self._results = trainer.run_stage()
        145 
        146     def start_evaluating(self, trainer: 'pl.Trainer') -> None:
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_stage(self)
        805         if self.predicting:
        806             return self.run_predict()
    --> 807         return self.run_train()
        808 
        809     def _pre_training_routine(self):
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
        840             self.progress_bar_callback.disable()
        841 
    --> 842         self.run_sanity_check(self.lightning_module)
        843 
        844         self.checkpoint_connector.has_trained = False
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
       1105 
       1106             # run eval step
    -> 1107             self.run_evaluation()
       1108 
       1109             self.on_sanity_check_end()
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, on_epoch)
        960                 # lightning module methods
        961                 with self.profiler.profile("evaluation_step_and_end"):
    --> 962                     output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
        963                     output = self.evaluation_loop.evaluation_step_end(output)
        964 
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_step(self, batch, batch_idx, dataloader_idx)
        172             model_ref._current_fx_name = "validation_step"
        173             with self.trainer.profiler.profile("validation_step"):
    --> 174                 output = self.trainer.accelerator.validation_step(args)
        175 
        176         # capture any logged information
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in validation_step(self, args)
        224 
        225         with self.precision_plugin.val_step_context(), self.training_type_plugin.val_step_context():
    --> 226             return self.training_type_plugin.validation_step(*args)
        227 
        228     def test_step(self, args: List[Union[Any, int]]) -> Optional[STEP_OUTPUT]:
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in validation_step(self, *args, **kwargs)
        159 
        160     def validation_step(self, *args, **kwargs):
    --> 161         return self.lightning_module.validation_step(*args, **kwargs)
        162 
        163     def test_step(self, *args, **kwargs):
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_tabular/models/base_model.py in validation_step(self, batch, batch_idx)
        184     def validation_step(self, batch, batch_idx):
        185         y = batch["target"]
    --> 186         y_hat = self(batch)["logits"]
        187         _ = self.calculate_loss(y, y_hat, tag="valid")
        188         _ = self.calculate_metrics(y, y_hat, tag="valid")
    
    ~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
       1052         # Do not call functions when jit is used
       1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_tabular/models/ft_transformer/ft_transformer.py in forward(self, x)
        227 
        228     def forward(self, x: Dict):
    --> 229         x = self.backbone(x)
        230         x = self.dropout(x)
        231         y_hat = self.output_layer(x)
    
    ~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
       1052         # Do not call functions when jit is used
       1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    ~/anaconda3/lib/python3.7/site-packages/pytorch_tabular/models/ft_transformer/ft_transformer.py in forward(self, x)
        171             x_cont = torch.mul(
        172                 continuous_data.unsqueeze(2),
    --> 173                 self.cont_embedding_layer(cont_idx),
        174             )
        175             if self.hparams.embedding_bias:
    
    ~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
       1129                 return modules[name]
       1130         raise AttributeError("'{}' object has no attribute '{}'".format(
    -> 1131             type(self).__name__, name))
       1132 
       1133     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:
    
    AttributeError: 'FTTransformerBackbone' object has no attribute 'cont_embedding_layer'
    

    Thank you for your help

    opened by SalvatoreRa 1
  • Support for MPS as accelerator

    Is your feature request related to a problem? Please describe. I prototype networks on my Apple M1 laptop, and it looks like PyTorch and PyTorch Lightning already support MPS. I'd like to use it to speed up my work.

    Describe the solution you'd like: In config.py we need to change the interface a bit, and in tabular_model.py line 331 we should get away with not changing anything.

    Describe alternatives you've considered: Not using MPS is meh.

    opened by Aceticia 1
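
    For reference, recent PyTorch builds expose an availability check for Apple-silicon MPS; a one-line probe, assuming a PyTorch version that ships the backend:

        import torch
        print(torch.backends.mps.is_available())  # True on supported Apple-silicon builds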
  • Masking certain features in attention based models

    Is your feature request related to a problem? Please describe. In my dataset, certain columns of certain rows (as opposed to the entire column) have invalid values. Currently I'm not sure how to solve it.

    Describe the solution you'd like: We can pass in an optional mask dataframe, similar to how PyTorch's own transformers handle masking tokens such as padding.

    Describe alternatives you've considered: I considered just removing the rows or replacing them with a fixed value. It's probably not ideal but it somewhat works.

    Additional context: I don't know if this is possible yet, so I'm mostly asking a question here. If it's not planned I might be able to help out on this. I imagine the implementation might be relatively easy for the transformer- / attention-based models.

    opened by Aceticia 2
Releases: v0.7.0