PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf

TabNet : Attentive Interpretable Tabular Learning

This is a PyTorch implementation of TabNet (Arik, S. O., & Pfister, T. (2019). TabNet: Attentive Interpretable Tabular Learning. arXiv preprint arXiv:1908.07442): https://arxiv.org/pdf/1908.07442.pdf.

Any questions? Want to contribute? Want to talk with us? You can join us on Slack.

Installation

Easy installation

You can install using pip by running: pip install pytorch-tabnet

Source code

If you want to use it locally within a Docker container:

  • git clone git@github.com:dreamquark-ai/tabnet.git

  • cd tabnet to get inside the repository


CPU only

  • make start to build and get inside the container

GPU

  • make start-gpu to build and get inside the GPU container

  • poetry install to install all the dependencies, including jupyter

  • make notebook inside the same terminal. You can then follow the link to a jupyter notebook with tabnet installed.

What problems does pytorch-tabnet handle?

  • TabNetClassifier : binary classification and multi-class classification problems
  • TabNetRegressor : simple and multi-task regression problems
  • TabNetMultiTaskClassifier: multi-task multi-classification problems

How to use it?

TabNet is now scikit-compatible, so training a TabNetClassifier or TabNetRegressor is really easy.

from pytorch_tabnet.tab_model import TabNetClassifier, TabNetRegressor

clf = TabNetClassifier()  # or TabNetRegressor()
clf.fit(
  X_train, y_train,
  eval_set=[(X_valid, y_valid)]
)
preds = clf.predict(X_test)

or for TabNetMultiTaskClassifier :

from pytorch_tabnet.multitask import TabNetMultiTaskClassifier
clf = TabNetMultiTaskClassifier()
clf.fit(
  X_train, y_train,
  eval_set=[(X_valid, y_valid)]
)
preds = clf.predict(X_test)

The targets in y_train/y_valid should all be of a single type (e.g. they must all be strings or all be integers).

Default eval_metric

A few classic evaluation metrics are implemented (see further below for custom ones):

  • binary classification metrics : 'auc', 'accuracy', 'balanced_accuracy', 'logloss'
  • multiclass classification : 'accuracy', 'balanced_accuracy', 'logloss'
  • regression: 'mse', 'mae', 'rmse', 'rmsle'

Important Note : 'rmsle' automatically clips negative predictions to 0, because the model can predict negative values. To reproduce the reported scores, apply np.clip(clf.predict(X_predict), a_min=0, a_max=None) to your predictions.
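A minimal sketch of why the clipping matters, using NumPy only. The `rmsle` helper below is a hypothetical stand-in written for illustration (it is not the library's implementation), and the target and prediction values are made up.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared log error; negative predictions are clipped to 0 first."""
    y_pred = np.clip(y_pred, a_min=0, a_max=None)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

y_true = np.array([1.0, 2.0, 3.0])
raw_preds = np.array([1.0, -0.5, 3.0])  # a regressor may output negative values

# Clip the predictions the same way before using them downstream,
# so they agree with the score computed on the clipped values.
clipped_preds = np.clip(raw_preds, a_min=0, a_max=None)
score = rmsle(y_true, raw_preds)
```

Because the helper clips internally, scoring the raw and the clipped predictions gives the same value; only the clipped predictions are consistent with that score.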

Custom evaluation metrics

You can create a metric for your specific need. Here is an example for the Gini score (note that you need to specify whether this metric should be maximized or not):

from pytorch_tabnet.metrics import Metric
from sklearn.metrics import roc_auc_score

class Gini(Metric):
    def __init__(self):
        self._name = "gini"
        self._maximize = True

    def __call__(self, y_true, y_score):
        auc = roc_auc_score(y_true, y_score[:, 1])
        return max(2*auc - 1, 0.)

clf = TabNetClassifier()
clf.fit(
  X_train, y_train,
  eval_set=[(X_valid, y_valid)],
  eval_metric=[Gini]
)

A specific customization example notebook is available here : https://github.com/dreamquark-ai/tabnet/blob/develop/customizing_example.ipynb
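To make the value returned by the Gini metric concrete, here is a dependency-free sketch of the same formula, `max(2 * auc - 1, 0)`. The `pairwise_auc` and `gini` helpers are hypothetical names introduced for illustration; `y_score` here is a flat list of positive-class scores, which corresponds to `y_score[:, 1]` in the metric above.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(y_true, y_score):
    """Gini score derived from AUC, floored at 0 like the metric above."""
    return max(2 * pairwise_auc(y_true, y_score) - 1, 0.0)

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]   # made-up positive-class scores
result = gini(y_true, y_score)    # 3 of 4 pairs are ordered correctly
```

A perfect ranking gives a Gini of 1, while a random or inverted ranking is floored at 0, which is why the metric is declared with `_maximize = True`.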

Semi-supervised pre-training

Added later to TabNet's original paper, semi-supervised pre-training is now available via the class TabNetPretrainer:

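import torch
from pytorch_tabnet.pretraining import TabNetPretrainer
from pytorch_tabnet.tab_model import TabNetClassifier

# TabNetPretrainer
unsupervised_model = TabNetPretrainer(
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    mask_type='entmax'  # or "sparsemax"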
)

unsupervised_model.fit(
    X_train=X_train,
    eval_set=[X_valid],
    pretraining_ratio=0.8,
)

clf = TabNetClassifier(
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    scheduler_params={"step_size":10, # how to use learning rate scheduler
                      "gamma":0.9},
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    mask_type='sparsemax' # This will be overwritten if using pretrain model
)

clf.fit(
    X_train=X_train, y_train=y_train,
    eval_set=[(X_train, y_train), (X_valid, y_valid)],
    eval_name=['train', 'valid'],
    eval_metric=['auc'],
    from_unsupervised=unsupervised_model
)

The loss function has been normalized to be independent of pretraining_ratio, batch_size and the number of features in the problem. A self-supervised loss greater than 1 means that your model is reconstructing worse than predicting the mean for each feature; a loss below 1 means that the model is doing better than predicting the mean.

A complete example can be found within the notebook pretraining_example.ipynb.

/!\ : the current implementation tries to reconstruct the original inputs, but Batch Normalization applies a random transformation that can't be deduced from a single row, which makes the reconstruction harder. Lowering the batch_size might make the pretraining easier.
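The "loss of 1 equals predicting the mean" property can be illustrated with a small NumPy sketch. It assumes the normalization divides the masked squared error by each feature's variance over the batch and averages over the reconstructed features; the function name and the toy data are hypothetical, and this is not the library's exact code.

```python
import numpy as np

def normalized_reconstruction_loss(x, x_hat, mask):
    """Masked squared error, scaled by each feature's batch variance.

    x, x_hat: arrays of shape (n_rows, n_features)
    mask: 1 where a feature was hidden from the model and must be reconstructed
    """
    variances = x.var(axis=0)
    variances[variances == 0] = 1.0  # avoid dividing by zero on constant columns
    scaled_errors = mask * (x_hat - x) ** 2 / variances
    per_row = scaled_errors.sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    return float(per_row.mean())

rng = np.random.default_rng(0)
# Features on very different scales, to show that the loss does not depend on them.
x = rng.normal(size=(256, 5)) * np.array([1.0, 10.0, 0.1, 3.0, 50.0])
mask = np.ones_like(x)  # reconstruct every feature

mean_baseline = np.broadcast_to(x.mean(axis=0), x.shape)
loss_mean = normalized_reconstruction_loss(x, mean_baseline, mask)  # close to 1
loss_perfect = normalized_reconstruction_loss(x, x, mask)           # exactly 0
```

Under this assumption, a model that always outputs the per-feature mean scores about 1 regardless of feature scale, so any value below 1 indicates that the reconstruction carries real information.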

Model parameters

  • n_d : int (default=8)

    Width of the decision prediction layer. Bigger values give the model more capacity, at the risk of overfitting. Values typically range from 8 to 64.

  • n_a: int (default=8)

    Width of the attention embedding for each mask. According to the paper, n_d=n_a is usually a good choice.

  • n_steps : int (default=3)

    Number of steps in the architecture (usually between 3 and 10)

  • gamma : float (default=1.3)

    This is the coefficient for feature reuse in the masks. A value close to 1 minimizes the correlation of mask selection between layers. Values range from 1.0 to 2.0.

  • cat_idxs : list of int (default=[] - Mandatory for embeddings)

    List of categorical features indices.

  • cat_dims : list of int (default=[] - Mandatory for embeddings)

    List of categorical features number of modalities (number of unique values for a categorical feature) /!\ no new modalities can be predicted

  • cat_emb_dim : list of int (optional)

    List of embedding sizes, one for each categorical feature (default=1).

  • n_independent : int (default=2)

    Number of independent Gated Linear Units layers at each step. Usual values range from 1 to 5.

  • n_shared : int (default=2)

    Number of shared Gated Linear Units at each step. Usual values range from 1 to 5.

  • epsilon : float (default 1e-15)

    Should be left untouched.

  • seed : int (default=0)

    Random seed for reproducibility

  • momentum : float (default=0.02)

    Momentum for batch normalization. Typical values range from 0.01 to 0.4.

  • clip_value : float (default None)

    If a float is given this will clip the gradient at clip_value.

  • lambda_sparse : float (default = 1e-3)

    This is the extra sparsity loss coefficient as proposed in the original paper. The bigger this coefficient is, the sparser your model will be in terms of feature selection. Depending on the difficulty of your problem, reducing this value could help.

  • optimizer_fn : torch.optim (default=torch.optim.Adam)

    PyTorch optimizer function

  • optimizer_params: dict (default=dict(lr=2e-2))

    Parameters compatible with optimizer_fn, used to initialize the optimizer. Since Adam is the default optimizer, this defines the initial learning rate used for training. As mentioned in the original paper, a large initial learning rate of 0.02 with decay is a good option.

  • scheduler_fn : torch.optim.lr_scheduler (default=None)

    PyTorch scheduler used to change the learning rate during training.

  • scheduler_params : dict

    Dictionary of parameters to apply to the scheduler_fn. Ex : {"gamma": 0.95, "step_size": 10}

  • model_name : str (default = 'DreamQuarkTabNet')

    Name of the model, used when saving to disk. You can customize this to easily retrieve and reuse your trained models.

  • saving_path : str (default = './')

    Path defining where to save models.

  • verbose : int (default=1)

    Verbosity for notebook plots. Set to 1 to see every epoch, 0 to disable output.

  • device_name : str (default='auto')

    'cpu' for cpu training, 'gpu' for gpu training, 'auto' to automatically detect gpu.

  • mask_type: str (default='sparsemax')

    Either "sparsemax" or "entmax" : this is the masking function to use for selecting features.
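As an illustration of how cat_idxs and cat_dims relate to the data, here is a small sketch on a made-up array. It assumes the categorical columns are already label-encoded with codes running from 0 to n_modalities - 1, so the number of modalities equals the largest code plus one; the column layout and the `has_unseen_modality` helper are hypothetical.

```python
import numpy as np

# Hypothetical toy dataset: columns 0 and 2 are label-encoded categoricals,
# column 1 is numeric.
X_train = np.array([
    [0, 0.5, 2],
    [1, 1.2, 0],
    [2, 0.1, 1],
    [1, 3.3, 2],
    [3, 0.7, 0],
])

cat_idxs = [0, 2]
# Number of modalities per categorical column, assuming codes 0..n-1.
cat_dims = [int(X_train[:, idx].max()) + 1 for idx in cat_idxs]

def has_unseen_modality(X, cat_idxs, cat_dims):
    """True if any categorical code falls outside the range seen at training time."""
    return any(
        X[:, idx].min() < 0 or X[:, idx].max() >= dim
        for idx, dim in zip(cat_idxs, cat_dims)
    )
```

The two lists would then be passed to the model as the cat_idxs and cat_dims parameters. A check like `has_unseen_modality` is one way to catch new modalities before prediction, since the embeddings cannot handle codes that were absent at training time.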

Fit parameters

  • X_train : np.array

    Training features

  • y_train : np.array

    Training targets

  • eval_set: list of tuple

    List of eval tuple sets (X, y).
    The last one is used for early stopping.

  • eval_name: list of str

    List of eval set names.

  • eval_metric : list of str

    List of evaluation metrics.
    The last metric is used for early stopping.

  • max_epochs : int (default = 200)

    Maximum number of epochs for training.

  • patience : int (default = 15)

    Number of consecutive epochs without improvement before performing early stopping.

    If patience is set to 0, then no early stopping will be performed.

    Note that if patience is enabled, the best weights from the best epoch are automatically loaded at the end of fit.

  • weights : int or dict (default=0)

    /!\ Only for TabNetClassifier. Sampling parameter:
    0 : no sampling
    1 : automated sampling with inverse class occurrences
    dict : keys are classes, values are weights for each class

  • loss_fn : torch.loss or list of torch.loss

    Loss function for training (defaults to mse for regression and cross entropy for classification). When using TabNetMultiTaskClassifier, you can pass a list with the same length as the number of tasks; each task is then assigned its own loss function.

  • batch_size : int (default=1024)

    Number of examples per batch. Large batch sizes are recommended.

  • virtual_batch_size : int (default=128)

    Size of the mini batches used for "Ghost Batch Normalization". /!\ virtual_batch_size should divide batch_size

  • num_workers : int (default=0)

    Number of workers used in torch.utils.data.DataLoader

  • drop_last : bool (default=False)

    Whether to drop last batch if not complete during training

  • callbacks : list of callback function

    List of custom callbacks

  • pretraining_ratio : float

    /!\ TabNetPretrainer only : fraction of input features to mask during pretraining.

    Should be between 0 and 1. The bigger it is, the harder the reconstruction task.
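Two of the fit parameters above lend themselves to a quick sanity check before training. The sketch below shows one plausible reading of "inverse class occurrences" (each class weighted by 1 / count) and verifies that virtual_batch_size divides batch_size. The `inverse_class_weights` helper and the toy labels are hypothetical, and the library's own sampling logic may differ in detail.

```python
from collections import Counter

def inverse_class_weights(y):
    """Map each class to 1 / (number of occurrences)."""
    counts = Counter(y)
    return {label: 1.0 / count for label, count in counts.items()}

y_train = [0, 0, 0, 0, 1]  # imbalanced toy labels
class_weights = inverse_class_weights(y_train)
# Per-sample weights: every class ends up with the same total weight.
sample_weights = [class_weights[label] for label in y_train]

# virtual_batch_size should divide batch_size, so that each batch splits
# evenly into "ghost" batches for Ghost Batch Normalization.
batch_size, virtual_batch_size = 1024, 128
n_ghost_batches = batch_size // virtual_batch_size
```

A dictionary shaped like `class_weights` (keys are classes, values are weights) matches the dict form accepted by the weights parameter described above.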
    
Comments
  • scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) RuntimeError: CUDA error: device-side assert triggered

    Describe the bug

    When running on GPU, TabNet crashes with scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) RuntimeError: CUDA error: device-side assert triggered

    What is the current behavior?

    It works when the matrix I use contains only integers but fails with floats. I also made sure that NaN values are imputed and that there are no Inf values. The largest value also fits into float32, and I set the batch size to a very low value.

    If the current behavior is a bug, please provide the steps to reproduce.

    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)

    Expected behavior

    Screenshots

    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),batch_size = 10)
    Traceback (most recent call last):

      File "", line 1, in
        tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),batch_size = 10)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 329, in fit
        fit_params_steps = self._check_fit_params(**fit_params)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 248, in _check_fit_params
        "=sample_weight)`.".format(pname))

    ValueError: Pipeline.fit does not accept the batch_size parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight).

    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)
    No early stopping will be performed, last training weights will be used.
    Traceback (most recent call last):

      File "", line 1, in
        tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 335, in fit
        self._final_estimator.fit(Xt, y, **fit_params_last_step)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 173, in fit
        self._train_epoch(train_dataloader)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 349, in _train_epoch
        batch_logs = self._train_batch(X, y)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 384, in _train_batch
        output, M_loss = self.network(X)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 276, in forward
        return self.tabnet(x)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 151, in forward
        out = self.feat_transformers[step](masked_x)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 375, in forward
        x = self.shared(x)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)

      File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 409, in forward
        scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device))

    RuntimeError: CUDA error: device-side assert triggered

    bug 
    opened by ThomasWolf0701 25
  • Performance of pytorch-tabnet on forest cover type dataset

    Running out of the box the forest_example, the results differ significantly from the ones in the original paper. Specifically, I get the following:

    preds = clf.predict_proba(X_test)
    y_true = y_test
    test_acc = accuracy_score(y_pred=np.argmax(preds, axis=1), y_true=y_true)
    print(f"BEST VALID SCORE FOR {dataset_name} : {clf.best_cost}")
    BEST VALID SCORE FOR EPIGN : -0.8830427851320214
    
    print(f"FINAL TEST SCORE FOR {dataset_name} : {test_acc}")
    FINAL TEST SCORE FOR EPIGN : 0.0499728922661205
    

    Do you get similar results? Many thanks.

    question 
    opened by meechos 25
  • Models don't accept model_name, saving_path

    Describe the bug

    Models don't accept model_name, saving_path as initialization arguments.

    What is the current behavior?

    See above.

    If the current behavior is a bug, please provide the steps to reproduce.

    clf: TabNetClassifier = TabNetClassifier(saving_path="/home/user123/dev/", device_name="cpu")

    Expected behavior

    Models should accept model_name, saving_path as initialization arguments as specified in the documentation.

    Additional context

    On a related note: How can models be persisted? The mentioned init parameters strongly suggest that it is possible, but I couldn't find any information on this, neither in the documentation nor in the code.

    documentation 
    opened by rmitsch 23
  • Unable to score on CPU if model trained on GPU?

    Describe the bug

    getting errors: RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:47

    when trying to run .predict_proba on a GPU trained model even though it was loaded for CPU.

    all_clf=torch.load(filename,map_location=torch.device('cpu'))

    Expected behavior

    Is that the expected behavior, or should I be able to score on CPU?

    Thanks

    bug 
    opened by tmontana 21
  • RandomizedSearchCV with pytorch-tabnet

    It appears that the TabNetClassifier does not have a get_params method for hyperparameter estimation.

    Is this reproducible on your end?

    Many thanks

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-33-03d6c8d15377> in <module>()
          4 
          5 start = time()
    ----> 6 randomSearch.fit(X_train, y_train)
          7 
          8 
    
    1 frames
    /usr/local/lib/python3.6/dist-packages/sklearn/base.py in clone(estimator, safe)
         65                             "it does not seem to be a scikit-learn estimator "
         66                             "as it does not implement a 'get_params' methods."
    ---> 67                             % (repr(estimator), type(estimator)))
         68     klass = estimator.__class__
         69     new_object_params = estimator.get_params(deep=False)
    
    TypeError: Cannot clone object 'TabNetClassifier(n_d=32, n_a=32, n_steps=5,
                     lr=0.02, seed=0,
                     gamma=1.5, n_independent=2, n_shared=2,
                     cat_idxs=[],
                     cat_dims=[],
                     cat_emb_dim=1,
                     lambda_sparse=0.0001, momentum=0.3,
                     clip_value=2.0,
                     verbose=1, device_name="auto",
                     model_name="DreamQuarkTabNet", epsilon=1e-15,
                     optimizer_fn=<class 'torch.optim.adam.Adam'>,
                     scheduler_params={'gamma': 0.95, 'step_size': 20},
                     scheduler_fn=<class 'torch.optim.lr_scheduler.StepLR'>, saving_path="./")' (type <class 'pytorch_tabnet.tab_model.TabNetClassifier'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
    
    question 
    opened by meechos 19
  • Add a `conda` install option for `pytorch-tabnet`

    Feature request

    A conda installation option could be quite helpful as an alternative alongside the ability to install pytorch-tabnet with pip. I have already done the work necessary to make pytorch-tabnet available on conda-forge. (PR: https://github.com/conda-forge/staged-recipes/pull/17292)

    Now you can install pytorch-tabnet as:

    conda install -c conda-forge pytorch-tabnet
    

    Note: the PR was just merged. Give it some 2-3 hours and it will be ready for use.

    :bulb: I will send a PR to update the docs/readme.

    enhancement 
    opened by sugatoray 16
  • Hyperparameter Tuning

    Hi,

    I would like to know whether it is worth fine-tuning the hyperparameters of TabNet for a binary classification task. If it is, which approach would you suggest taking?

    Best,

    Balázs

    enhancement 
    opened by balazsgonczy 14
  • Running out of memory during training

    When training with custom eval metric (pearson corr), after first evaluation my colab session runs out of memory.

    What is the current behavior?

    Training of TabNetRegressor starts fine, but after the first evaluation round I run out of memory. I am training the model on a 16 GB GPU, with approximately 40 GB of free RAM. RAM consumption steadily increases during training. I am training on a fairly large dataset (11 GB).

    Expected behavior

    I would expect that the RAM consumption is more or less constant during training, once the model is initialized.

    Screenshots

    max_epochs = 2
    batch_size = 1028
    model = TabNetRegressor(
                           optimizer_fn=torch.optim.Adam,
                           optimizer_params=dict(lr=1e-2)
                          )
    
    model.fit(
        X_train=factors_train[features].to_numpy(), y_train=factors_train.target.to_numpy().reshape((-1,1)),
        eval_set=[(factors_test[features].to_numpy(), factors_test.target.to_numpy().reshape((-1,1)))],
        eval_name=['test'],
        eval_metric=[PearsonCorrMetric],
        max_epochs=max_epochs , patience=5,
        batch_size=batch_size,
        virtual_batch_size=128,
        num_workers=0,
        drop_last=False
    )
    
    class PearsonCorrMetric(Metric):
        def __init__(self):
            self._name = "pearson_corr"
            self._maximize = True

        def __call__(self, y_true, y_score):
            return corr_score(y_true, y_score)[1]
    
    def corr_score(y_true, y_pred):
        return "score", np.corrcoef(y_true, y_pred)[0,1], True
    

    Other relevant information:
    poetry version: ?
    python version: 3.8
    Operating System: Ubuntu
    Additional tools:

    Additional context

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   40C    P0    24W / 300W |      0MiB / 16160MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    
    help wanted 
    opened by Kayne88 13
  • setting class weights for multiclass classification

    If I am training TabNet for multiclass classification with unbalanced labels, how do I set the class weights?

    For example, with LightGBM I can set the class_weight = 'balanced' parameter when defining the model.

    opened by puzzlecollector 11
  • TypeError: __init__() got an unexpected keyword argument 'n_indep_decoder'

    Hi,

    I am trying to load a saved Tabnet model trained with your github repo.

    Code:

    from pytorch_tabnet.tab_model import TabNetClassifier
    loaded_clf = TabNetClassifier()
    loaded_clf.load_model("C:/Users/goncz/Desktop/test_model.zip")

    Error message:

    Device used : cpu

    TypeError                                 Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_3000/3925200397.py in
          1 loaded_clf = TabNetClassifier()
    ----> 2 loaded_clf.load_model("C:/Users/goncz/Desktop/test_model.zip")
          3

    ~\AppData\Local\Programs\Orange\lib\site-packages\pytorch_tabnet\abstract_model.py in load_model(self, filepath)
        409             raise KeyError("Your zip file is missing at least one component")
        410
    --> 411         self.__init__(**loaded_params["init_params"])
        412
        413         self._set_network()

    TypeError: __init__() got an unexpected keyword argument 'n_indep_decoder'

    bug 
    opened by balazsgonczy 11
  • Difference problem between Local explainability and Global explainability

    Hi, I trained TabNet on 4700-dimensional features and checked the global explainability. (Because it is sparse, I applied unique(); the s shows the index in the global explainability matrix.) I then passed the training data to the .explain(training data) function and summed explain_matrix across the rows, but I get a totally different result from the global explainability. For example, the index with the maximum global explainability value is 0 in the output of the explain() function.

    bug 
    opened by yaoching0 11
  • Need Help Regarding TabNetEncoder output and TabNetPretraining

    Hi, Optimox. As you answered previously, I can use TabNetEncoder to produce custom sizes of embedding. Upon testing the layer, I realised it produces three outputs, and I don't know which one is the most important.

    The paper that I am trying to reproduce also used unsupervised training. The TabNetPretraining layer also produces three different outputs. I checked the outputs' meaning on GitHub, but I still don't know which one is important or how to pass it to TabNetEncoder.

    The paper that I'm trying to reproduce is here (https://ieeexplore.ieee.org/document/9658729). Basically, the methods that I need to reproduce from the paper are:

    • Unsupervised learning using TabNet
    • Encode the value using TabNet
    • Get feature importance

    I already managed the transformer part, but I'm still stuck at the TabNet part.

    Lastly, what is the difference between forward and forward mask?

    Thank you.

    help wanted 
    opened by Hazqeel09 1
  • chore(deps): bump certifi from 2021.10.8 to 2022.12.7

    Bumps certifi from 2021.10.8 to 2022.12.7.

    dependencies 
    opened by dependabot[bot] 0
  • Add tensorboard logging

    Feature request

    What is the expected behavior? Add support for tensorboard log files as an optional option to model parameters

    What is motivation or use case for adding/changing the behavior? Better tracking of different runs when tuning hyper parameters

    How should this be implemented in your opinion? Utilize existing torch.utils.tensorboard methods provided by pytorch

    Are you willing to work on this yourself? Not sure

    enhancement 
    opened by KAEYL98 0
  • Errors thrown when using custom loss function (rmsle)

    Hello! I'm super excited to dig into using pytorch_tabnet, but I've been banging my head against a wall for the past 2 nights on this issue, so I'm putting out a call for assistance.

    I've got everything setup properly and confirmed that my data has no missing values and no values outside the defined dimensions.

    I can train properly using the default (MSELoss) loss function, but for my particular problem I need to use either mean squared log error or, ideally, root mean squared log error.

    I've defined a custom loss function as follows:

    def rmsle_loss(y_pred, y_true):
        return torch.sqrt(nn.functional.mse_loss(torch.log(y_pred + 1), torch.log(y_true + 1)))
    

    And I'm applying it to the model with the loss_fn=rmsle_loss parameter to .fit().

    However - when I do this, I'm getting these dreaded errors.

    Using CPU: index -1 is out of bounds for dimension 1 with size 22

    Using GPU: CUDA error: device-side assert triggered

    Both of these are being thrown at line 94 in sparsemax.py:

    tau = input_cumsum.gather(dim, support_size - 1)
    

    Note this ONLY happens when I'm using a custom loss function. I am able to train the model just fine using the default loss function, but since that's not ideal for my domain, I really need to use the custom function. As I mentioned above, I've confirmed that there are no inf, NA, or out-of-bounds data in my training set.

    Any thoughts? Help would be deeply appreciated!

    bug 
    opened by noahlh 10
  • chore: update release script

    opened by Optimox 0
  • chore(deps): update python:3.7-slim-buster docker digest

    This PR contains the following updates:

    | Package | Type   | Update | Change             |
    |---------|--------|--------|--------------------|
    | python  | final  | digest | 50de4af -> 70cf8d0 |
    | python  | docker | digest | fecbb1a -> 2b017ac |


    Configuration

    📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

    👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.


    deps 
    opened by renovate[bot] 0
Releases(v4.0)
  • v4.0(Sep 14, 2022)

    What's Changed

    • fixes minor README typos and improves readability and consistency by @discdiver in https://github.com/dreamquark-ai/tabnet/pull/272
    • feat: raise error in case cat_dims and cat_idxs are incoherent by @eduardocarvp in https://github.com/dreamquark-ai/tabnet/pull/289
    • feat: pretraining matches paper by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/302
    • docs: add conventional commits to readme by @eduardocarvp in https://github.com/dreamquark-ai/tabnet/pull/286
    • WIP fix: update gpg key in docker file gpu by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/339
    • feat-312: replace prints by warnings by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/337
    • fix: custom loss using inplace operations by @eduardocarvp in https://github.com/dreamquark-ai/tabnet/pull/323
    • feat: disable tests in docker file gpu to save CI time by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/342
    • feat/336 : check if pandas df and drop_last default to True by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/338
    • feat: add warm_start matching scikit-learn by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/340
    • Added conda install instruction by @sugatoray in https://github.com/dreamquark-ai/tabnet/pull/347
    • Bugfix - pretraining oom by @eduardocarvp in https://github.com/dreamquark-ai/tabnet/pull/348
    • fix: feature importance not dependent from dataloader by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/372
    • feat: add augmentations inside the fit method by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/363
    • chore: update dockerfile_gpu to latest version by @Hartorn in https://github.com/dreamquark-ai/tabnet/pull/409
    • fix: README patience to 10 by @Optimox in https://github.com/dreamquark-ai/tabnet/pull/396
    • chore: fix release scripts by @Hartorn in https://github.com/dreamquark-ai/tabnet/pull/412

    New Contributors

    • @discdiver made their first contribution in https://github.com/dreamquark-ai/tabnet/pull/272
    • @sugatoray made their first contribution in https://github.com/dreamquark-ai/tabnet/pull/347

    Full Changelog: https://github.com/dreamquark-ai/tabnet/compare/v3.1.1...v4.0

  • v3.1.1(Feb 2, 2021)

  • v3.1.0(Jan 12, 2021)

  • v3.0.0(Dec 15, 2020)

  • v2.0.1(Oct 15, 2020)

  • v2.0.0(Oct 13, 2020)

  • v1.2.0(Jul 1, 2020)

  • v1.1.0(Jun 2, 2020)

  • v1.0.5(Mar 13, 2020)

  • v1.0.4(Feb 28, 2020)

  • v1.0.3(Feb 7, 2020)

  • v1.0.2(Feb 3, 2020)

  • v1.0.1(Jan 20, 2020)

  • v1.0.0(Dec 3, 2019)

    Bug Fixes

    • deps: update dependency numpy to v1.17.3 (eff6555)
    • deps: update dependency numpy to v1.17.4 (a80cf29)
    • deps: update dependency torch to v1.3.1 (18ec79b)
    • deps: update dependency tqdm to v4.37.0 (f8f04e7)
    • deps: update dependency tqdm to v4.38.0 (0bf45d2)
    • functional balanced version (fab7f16)
    • remove torch warnings (index should be bool) (f5817cf)

    Features

    • add gpu dockerfile and adapt makefile (8d14406)
    • update notebooks for new model format (43e2693)
  • v0.1.2(Nov 6, 2019)

Owner
DreamQuark
Sharpen decisions in Financial Services with explainable Deep Learning
A PyTorch implementation of EfficientNet

EfficientNet PyTorch Quickstart Install with pip install efficientnet_pytorch and load a pretrained EfficientNet with: from efficientnet_pytorch impor

Luke Melas-Kyriazi 7.2k Jan 06, 2023
Pytorch bindings for Fortran

Pytorch bindings for Fortran

Dmitry Alexeev 46 Dec 29, 2022
PyTorch Extension Library of Optimized Scatter Operations

PyTorch Scatter Documentation This package consists of a small extension library of highly optimized sparse update (scatter and segment) operations fo

Matthias Fey 1.2k Jan 07, 2023
A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision

🤗 Accelerate was created for PyTorch users who like to write the training loop of PyTorch models but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.

Hugging Face 3.5k Jan 08, 2023
A PyTorch implementation of L-BFGS.

PyTorch-LBFGS: A PyTorch Implementation of L-BFGS Authors: Hao-Jun Michael Shi (Northwestern University) and Dheevatsa Mudigere (Facebook) What is it?

Hao-Jun Michael Shi 478 Dec 27, 2022
3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers

3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers (BMVC 2021) Zai Shi*, Zhao Meng*, Yiran Xing, Yunpu Ma, Roger Wattenhofe

Zai Shi 36 Dec 21, 2022
ocaml-torch provides some ocaml bindings for the PyTorch tensor library.

ocaml-torch provides some ocaml bindings for the PyTorch tensor library. This brings to OCaml NumPy-like tensor computations with GPU acceleration and tape-based automatic differentiation.

Laurent Mazare 369 Jan 03, 2023
PyGCL: Graph Contrastive Learning Library for PyTorch

PyGCL is an open-source library for graph contrastive learning (GCL), which features modularized GCL components from published papers, standardized evaluation, and experiment management.

GCL: Graph Contrastive Learning Library for PyTorch 592 Jan 07, 2023
A Pytorch Implementation for Compact Bilinear Pooling.

CompactBilinearPooling-Pytorch A Pytorch Implementation for Compact Bilinear Pooling. Adapted from tensorflow_compact_bilinear_pooling Prerequisites I

169 Dec 23, 2022
Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute

Lambda Networks - Pytorch Implementation of λ Networks, a new approach to image recognition that reaches SOTA on ImageNet. The new method utilizes λ l

Phil Wang 1.5k Jan 07, 2023
High-level batteries-included neural network training library for Pytorch

Pywick High-Level Training framework for Pytorch Pywick is a high-level Pytorch training framework that aims to get you up and running quickly with st

382 Dec 06, 2022
Model summary in PyTorch similar to `model.summary()` in Keras

Keras style model.summary() in PyTorch Keras has a neat API to view the visualization of the model which is very helpful while debugging your network.

Shubham Chandel 3.7k Dec 29, 2022
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

878 Dec 30, 2022
PyTorch implementations of normalizing flow and its variants.

PyTorch implementations of normalizing flow and its variants.

Tatsuya Yatagawa 55 Dec 01, 2022
Implements pytorch code for the Accelerated SGD algorithm.

AccSGD This is the code associated with Accelerated SGD algorithm used in the paper On the insufficiency of existing momentum schemes for Stochastic O

205 Jan 02, 2023
An optimizer that trains as fast as Adam and as good as SGD.

AdaBound An optimizer that trains as fast as Adam and as good as SGD, for developing state-of-the-art deep learning models on a wide variety of popula

LoLo 2.9k Dec 27, 2022
Code snippets created for the PyTorch discussion board

PyTorch misc Collection of code snippets I've written for the PyTorch discussion board. All scripts were tested using the PyTorch 1.0 preview and torc

461 Dec 26, 2022
The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.

Tensor Sensor See article Clarifying exceptions and visualizing tensor operations in deep learning code. One of the biggest challenges when writing co

Terence Parr 704 Dec 14, 2022
An implementation of Performer, a linear attention-based transformer, in Pytorch

Performer - Pytorch An implementation of Performer, a linear attention-based transformer variant with a Fast Attention Via positive Orthogonal Random

Phil Wang 900 Dec 22, 2022