deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Overview

deep-table

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Design

Architecture

As shown below, each pretraining/fine-tuning model is decomposed into two modules: Encoder and Head.

Encoder

Encoder has Embedding and Backbone.

  • Embedding makes continuous/categorical features tokenized or simply normalized.
  • Backbone processes the tokenized features.

Pretraining/Fine-tuning Head

Pretraining/Fine-tuning Head uses Encoder module for training.

Implemented Methods

Available Modules

Encoder - Embedding

  • FeatureEmbedding
  • TabTransformerEmbedding

Encoder - Backbone

  • MLPBackbone
  • FTTransformerBackbone
  • SAINTBackbone

Model - Head

  • MLPHeadModel

Model - Pretraining

  • DenoisingPretrainModel
  • SAINTPretrainModel
  • TabTransformerPretrainModel
  • VIMEPretrainModel

How To Use

Step 0. Install

python setup.py install

# Installation with pip
pip install -e .

Step 1. Define config.json

You have to define three configs at least.

  1. encoder
  2. model
  3. trainer

Minimum configurations are as follows:

from omegaconf import OmegaConf

encoder_config = OmegaConf.create({
    "embedding": {
        "name": "FeatureEmbedding",
    },
    "backbone": {
        "name": "FTTransformerBackbone",
    }
})

model_config = OmegaConf.create({
    "name": "MLPHeadModel"
})

trainer_config = OmegaConf.create({
    "max_epochs": 1,
})

Other parameters can be changed also by config.json if you want.

Step 2. Define Datamodule

from deep_table.data.data_module import TabularDatamodule


datamodule = TabularDatamodule(
    train=train_df,
    validation=val_df,
    test=test_df,
    task="binary",
    dim_out=1,
    categorical_cols=["education", "occupation", ...],
    continuous_cols=["age", "hours-per-week", ...],
    target=["income"],
    num_categories=110,
)

Step 3. Run Training

>> {'accuracy': array([0.8553...]), 'AUC': array([0.9111...]), 'F1 score': array([0.9077...]), 'cross_entropy': array([0.3093...])} ">
from deep_table.estimators.base import Estimator
from deep_table.utils import get_scores


estimator = Estimator(
    encoder_config,      # Encoder architecture
    model_config,        # model settings (learning rate, scheduler...)
    trainer_config,      # training settings (epoch, gpu...)
)

estimator.fit(datamodule)
predict = estimator.predict(datamodule.dataloader(split="test"))
get_scores(predict, target, task="binary")
>>> {'accuracy': array([0.8553...]),
     'AUC': array([0.9111...]),
     'F1 score': array([0.9077...]),
     'cross_entropy': array([0.3093...])}

If you want to train a model with pretraining, write as follows:

from deep_table.estimators.base import Estimator
from deep_table.utils import get_scores


pretrain_model_config = OmegaConf.create({
    "name": "SAINTPretrainModel"
})

pretrain_model = Estimator(encoder_config, pretrain_model_config, trainer_config)
pretrain_model.fit(datamodule)

estimator = Estimator(encoder_config, model_config, trainer_config)
estimator.fit(datamodule, from_pretrained=pretrain_model)

See notebooks/train_adult.ipynb for more details.

Custom Datasets

You can use your own datasets.

  1. Prepare datasets and create DataFrame
  2. Preprocess DataFrame
  3. Create your own datamodules using TabularDatamodule

Example code is shown below.

import pandas as pd

import os,sys; sys.path.append(os.path.abspath(".."))
from deep_table.data.data_module import TabularDatamodule
from deep_table.preprocess import CategoryPreprocessor


# 0. Prepare datasets and create DataFrame
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

# 1. Preprocessing pd.DataFrame
category_preprocesser = CategoryPreprocessor(categorical_columns=["species"], use_unk=False)
iris = category_preprocesser.fit_transform(iris)

# 2. TabularDatamodule
datamodule = TabularDatamodule(
    train=iris.iloc[:20],
    val=iris.iloc[20:40],
    test=iris.iloc[40:],
    task="multiclass",
    dim_out=3,
    categorical_columns=[],
    continuous_columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    target=["species"],
    num_categories=0,
)

See notebooks/custom_dataset.ipynb for the full training example.

Custom Models

You can also use your Embedding/Backbone/Model. Set arguments as shown below.

estimator = Estimator(
    encoder_config, model_config, trainer_config,
    custom_embedding=YourEmbedding, custom_backbone=YourBackbone, custom_model=YourModel
)

If custom models are set, the attributes name in corresponding configs will be overwritten.

See notebooks/custom_model.ipynb for more details.

Code for the SIGGRAPH 2022 paper "DeltaConv: Anisotropic Operators for Geometric Deep Learning on Point Clouds."

DeltaConv [Paper] [Project page] Code for the SIGGRAPH 2022 paper "DeltaConv: Anisotropic Operators for Geometric Deep Learning on Point Clouds" by Ru

98 Nov 26, 2022
Image process framework based on plugin like imagej, it is esay to glue with scipy.ndimage, scikit-image, opencv, simpleitk, mayavi...and any libraries based on numpy

Introduction ImagePy is an open source image processing framework written in Python. Its UI interface, image data structure and table data structure a

ImagePy 1.2k Dec 29, 2022
A novel Engagement Detection with Multi-Task Training (ED-MTT) system

A novel Engagement Detection with Multi-Task Training (ED-MTT) system which minimizes MSE and triplet loss together to determine the engagement level of students in an e-learning environment.

Onur Çopur 12 Nov 11, 2022
Breast cancer is been classified into benign tumour and malignant tumour.

Breast cancer is been classified into benign tumour and malignant tumour. Logistic regression is applied in this model.

1 Feb 04, 2022
Cartoon-StyleGan2 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation

Fine-tuning StyleGAN2 for Cartoon Face Generation

Jihye Back 520 Jan 04, 2023
we propose EfficientDerain for high-efficiency single-image deraining

EfficientDerain we propose EfficientDerain for high-efficiency single-image deraining Requirements python 3.6 pytorch 1.6.0 opencv-python 4.4.0.44 sci

Qing Guo 126 Dec 07, 2022
Neural network for recognizing the gender of people in photos

Neural Network For Gender Recognition How to test it? Install requirements.txt file using pip install -r requirements.txt command Run nn.py using pyth

Valery Chapman 1 Sep 18, 2022
Regulatory Instruments for Fair Personalized Pricing.

Fair pricing Source code for WWW 2022 paper Regulatory Instruments for Fair Personalized Pricing. Installation Requirements Linux with Python = 3.6 p

Renzhe Xu 6 Oct 26, 2022
a Lightweight library for sequential learning agents, including reinforcement learning

SaLinA: SaLinA - A Flexible and Simple Library for Learning Sequential Agents (including Reinforcement Learning) TL;DR salina is a lightweight library

Facebook Research 405 Dec 17, 2022
Fully-automated scripts for collecting AI-related papers

AI-Paper-collector Fully-automated scripts for collecting AI-related papers List of Conferences to crawel ACL: 21-19 (including findings) EMNLP: 21-19

Gordon Lee 776 Jan 08, 2023
An Approach to Explore Logistic Regression Models

User-centered Regression An Approach to Explore Logistic Regression Models This tool applies the potential of Attribute-RadViz in identifying correlat

0 Nov 12, 2021
Unofficial Tensorflow Implementation of ConvNeXt from A ConvNet for the 2020s

Tensorflow Implementation of "A ConvNet for the 2020s" This is the unofficial Tensorflow Implementation of ConvNeXt from "A ConvNet for the 2020s" pap

DK 11 Oct 12, 2022
Implementation of CVPR'21: RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction

RfD-Net [Project Page] [Paper] [Video] RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction Yinyu Nie, Ji Hou, Xiaoguang Han, Matthi

Yinyu Nie 162 Jan 06, 2023
SegNet model implemented using keras framework

keras-segnet Implementation of SegNet-like architecture using keras. Current version doesn't support index transferring proposed in SegNet article, so

185 Aug 30, 2022
DenseNet Implementation in Keras with ImageNet Pretrained Models

DenseNet-Keras with ImageNet Pretrained Models This is an Keras implementation of DenseNet with ImageNet pretrained weights. The weights are converted

Felix Yu 568 Oct 31, 2022
Artifacts for paper "MMO: Meta Multi-Objectivization for Software Configuration Tuning"

MMO: Meta Multi-Objectivization for Software Configuration Tuning This repository contains the data and code for the following paper that is currently

0 Nov 17, 2021
Data manipulation and transformation for audio signal processing, powered by PyTorch

torchaudio: an audio library for PyTorch The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the

1.9k Dec 28, 2022
Unofficial implementation of the Involution operation from CVPR 2021

involution_pytorch Unofficial PyTorch implementation of "Involution: Inverting the Inherence of Convolution for Visual Recognition" by Li et al. prese

Rishabh Anand 46 Dec 07, 2022
Churn prediction

Churn-prediction Churn-prediction Data preprocessing:: Label encoder is used to normalize the categorical variable Data Transformation:: For each data

1 Sep 28, 2022
Python implementation of MULTIseq barcode alignment using fuzzy string matching and GMM barcode assignment

Python implementation of MULTIseq barcode alignment using fuzzy string matching and GMM barcode assignment.

MT Schmitz 2 Feb 11, 2022