deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Overview

deep-table

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Design

Architecture

As shown below, each pretraining/fine-tuning model is decomposed into two modules: Encoder and Head.

Encoder

Encoder has Embedding and Backbone.

  • Embedding makes continuous/categorical features tokenized or simply normalized.
  • Backbone processes the tokenized features.

Pretraining/Fine-tuning Head

Pretraining/Fine-tuning Head uses Encoder module for training.

Implemented Methods

Available Modules

Encoder - Embedding

  • FeatureEmbedding
  • TabTransformerEmbedding

Encoder - Backbone

  • MLPBackbone
  • FTTransformerBackbone
  • SAINTBackbone

Model - Head

  • MLPHeadModel

Model - Pretraining

  • DenoisingPretrainModel
  • SAINTPretrainModel
  • TabTransformerPretrainModel
  • VIMEPretrainModel

How To Use

Step 0. Install

python setup.py install

# Installation with pip
pip install -e .

Step 1. Define config.json

You have to define three configs at least.

  1. encoder
  2. model
  3. trainer

Minimum configurations are as follows:

from omegaconf import OmegaConf

encoder_config = OmegaConf.create({
    "embedding": {
        "name": "FeatureEmbedding",
    },
    "backbone": {
        "name": "FTTransformerBackbone",
    }
})

model_config = OmegaConf.create({
    "name": "MLPHeadModel"
})

trainer_config = OmegaConf.create({
    "max_epochs": 1,
})

Other parameters can be changed also by config.json if you want.

Step 2. Define Datamodule

from deep_table.data.data_module import TabularDatamodule


datamodule = TabularDatamodule(
    train=train_df,
    validation=val_df,
    test=test_df,
    task="binary",
    dim_out=1,
    categorical_cols=["education", "occupation", ...],
    continuous_cols=["age", "hours-per-week", ...],
    target=["income"],
    num_categories=110,
)

Step 3. Run Training

>> {'accuracy': array([0.8553...]), 'AUC': array([0.9111...]), 'F1 score': array([0.9077...]), 'cross_entropy': array([0.3093...])} ">
from deep_table.estimators.base import Estimator
from deep_table.utils import get_scores


estimator = Estimator(
    encoder_config,      # Encoder architecture
    model_config,        # model settings (learning rate, scheduler...)
    trainer_config,      # training settings (epoch, gpu...)
)

estimator.fit(datamodule)
predict = estimator.predict(datamodule.dataloader(split="test"))
get_scores(predict, target, task="binary")
>>> {'accuracy': array([0.8553...]),
     'AUC': array([0.9111...]),
     'F1 score': array([0.9077...]),
     'cross_entropy': array([0.3093...])}

If you want to train a model with pretraining, write as follows:

from deep_table.estimators.base import Estimator
from deep_table.utils import get_scores


pretrain_model_config = OmegaConf.create({
    "name": "SAINTPretrainModel"
})

pretrain_model = Estimator(encoder_config, pretrain_model_config, trainer_config)
pretrain_model.fit(datamodule)

estimator = Estimator(encoder_config, model_config, trainer_config)
estimator.fit(datamodule, from_pretrained=pretrain_model)

See notebooks/train_adult.ipynb for more details.

Custom Datasets

You can use your own datasets.

  1. Prepare datasets and create DataFrame
  2. Preprocess DataFrame
  3. Create your own datamodules using TabularDatamodule

Example code is shown below.

import pandas as pd

import os,sys; sys.path.append(os.path.abspath(".."))
from deep_table.data.data_module import TabularDatamodule
from deep_table.preprocess import CategoryPreprocessor


# 0. Prepare datasets and create DataFrame
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

# 1. Preprocessing pd.DataFrame
category_preprocesser = CategoryPreprocessor(categorical_columns=["species"], use_unk=False)
iris = category_preprocesser.fit_transform(iris)

# 2. TabularDatamodule
datamodule = TabularDatamodule(
    train=iris.iloc[:20],
    val=iris.iloc[20:40],
    test=iris.iloc[40:],
    task="multiclass",
    dim_out=3,
    categorical_columns=[],
    continuous_columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    target=["species"],
    num_categories=0,
)

See notebooks/custom_dataset.ipynb for the full training example.

Custom Models

You can also use your Embedding/Backbone/Model. Set arguments as shown below.

estimator = Estimator(
    encoder_config, model_config, trainer_config,
    custom_embedding=YourEmbedding, custom_backbone=YourBackbone, custom_model=YourModel
)

If custom models are set, the attributes name in corresponding configs will be overwritten.

See notebooks/custom_model.ipynb for more details.

Vision Transformer for 3D medical image registration (Pytorch).

ViT-V-Net: Vision Transformer for Volumetric Medical Image Registration keywords: vision transformer, convolutional neural networks, image registratio

Junyu Chen 192 Dec 20, 2022
Shared Attention for Multi-label Zero-shot Learning

Shared Attention for Multi-label Zero-shot Learning Overview This repository contains the implementation of Shared Attention for Multi-label Zero-shot

dathuynh 26 Dec 14, 2022
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥

face.evoLVe: High-Performance Face Recognition Library based on PaddlePaddle & PyTorch Evolve to be more comprehensive, effective and efficient for fa

Zhao Jian 3.1k Jan 02, 2023
Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures.

NLP_0-project Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures1. We are a "democratic" and c

3 Mar 16, 2022
A model that attempts to learn and benefit from data collected on card counting.

A model that attempts to learn and benefit from data collected on card counting. A decision tree like model is built to win more often than loose and increase the bet of the player appropriately to c

1 Dec 17, 2021
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
Code for Paper "Evidential Softmax for Sparse MultimodalDistributions in Deep Generative Models"

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models Abstract Many applications of generative models rely on the marginali

Stanford Intelligent Systems Laboratory 9 Jun 06, 2022
Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation (NeurIPS 2021)

Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation (NeurIPS 2021) The implementation of Reducing Infromation Bottleneck for W

Jungbeom Lee 81 Dec 16, 2022
VoxHRNet - Whole Brain Segmentation with Full Volume Neural Network

VoxHRNet This is the official implementation of the following paper: Whole Brain Segmentation with Full Volume Neural Network Yeshu Li, Jonathan Cui,

Microsoft 12 Nov 24, 2022
NAACL2021 - COIL Contextualized Lexical Retriever

COIL Repo for our NAACL paper, COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List. The code covers learning

Luyu Gao 108 Dec 31, 2022
[SIGGRAPH 2021 Asia] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning

DeepVecFont This is the official Pytorch implementation of the paper: Yizhi Wang and Zhouhui Lian. DeepVecFont: Synthesizing High-quality Vector Fonts

Yizhi Wang 146 Dec 18, 2022
Bayesian Optimization using GPflow

Note: This package is for use with GPFlow 1. For Bayesian optimization using GPFlow 2 please see Trieste, a joint effort with Secondmind. GPflowOpt GP

GPflow 257 Dec 26, 2022
paper: Hyperspectral Remote Sensing Image Classification Using Deep Convolutional Capsule Network

DC-CapsNet This is a tensorflow and keras based implementation of DC-CapsNet for HSI in the Remote Sensing Letters R. Lei et al., "Hyperspectral Remot

LEI 7 Nov 29, 2022
Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite and .pb from .tflite.

tflite2tensorflow Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite and .pb from .tflite. 1. Supported Layers No. TFLite Layer TF

Katsuya Hyodo 214 Dec 29, 2022
Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021)

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision (ICCV 2021) PyTorch implementation of Learning RAW-to-sRGB Mappings with Inaccurat

Zhilu Zhang 53 Dec 20, 2022
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes

FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes This repository contains the source code accompanying the paper: FlexConv: C

Robert-Jan Bruintjes 96 Dec 12, 2022
Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Implicit3DUnderstanding (Im3D) [Project Page] Holistic 3D Scene Understanding from a Single Image with Implicit Representation Cheng Zhang, Zhaopeng C

Cheng Zhang 149 Jan 08, 2023
A fast Protein Chain / Ligand Extractor and organizer.

Are you tired of using visualization software, or full blown suites just to separate protein chains / ligands ? Are you tired of organizing the mess o

Amine Abdz 9 Nov 06, 2022
This repo is to present various code demos on how to use our Graph4NLP library.

Deep Learning on Graphs for Natural Language Processing Demo The repository contains code examples for DLG4NLP tutorials at NAACL 2021, SIGIR 2021, KD

Graph4AI 143 Dec 23, 2022
This is an implementation for the CVPR2020 paper "Learning Invariant Representation for Unsupervised Image Restoration"

Learning Invariant Representation for Unsupervised Image Restoration (CVPR 2020) Introduction This is an implementation for the paper "Learning Invari

GarField 88 Nov 07, 2022