This is the code used in the paper "Entity Embeddings of Categorical Variables".

Last update: Nov 29, 2022

Overview

If you want the original version of the code used for the Kaggle competition, please use the Kaggle branch.

To run the code, first download and unzip the train.csv and store.csv files from Kaggle and put them in this folder.
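Before moving on, it can help to confirm that both files are where the scripts expect them. The following is a minimal sketch, assuming the files keep their Kaggle names and sit in the current working directory. The helper name `missing_data_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# File names as given above; the helper itself is illustrative only.
REQUIRED_FILES = ("train.csv", "store.csv")


def missing_data_files(folder="."):
    """Return the required CSV files that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All data files found.")
```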

If you use Anaconda, you can install the dependencies as in the following example:

conda create --name ee python=3.7 pip
conda activate ee
pip install scikit-learn xgboost tensorflow keras jupyter matplotlib

Please refer to the Keras documentation for more details on how to install Keras.
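To check that the environment is complete before running anything, one option is to ask Python whether each package can be found. This is a rough sketch, not part of the repository: the helper name `missing_modules` is made up for illustration, and the list uses import names, which differ from the pip names in one case (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the packages installed above.
# Note: scikit-learn is imported as "sklearn".
MODULES = ("sklearn", "xgboost", "tensorflow", "keras", "matplotlib")


def missing_modules(names=MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it inside the activated `ee` environment so that it inspects the right interpreter.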

Next, run the following scripts to extract the csv files and prepare the features:

python3 extract_csv_files.py
python3 prepare_features.py
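The scripts themselves are not reproduced here. As general background, embedding layers usually take integer indices as input, so categorical values are typically mapped to integers during feature preparation. The sketch below illustrates that idea only; it is an assumption about the kind of step involved, not the contents of prepare_features.py, and the function names and toy values are invented for the example.

```python
def fit_category_index(values):
    """Map each distinct category to an integer index, in sorted order."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def encode(values, category_index):
    """Replace each category by its integer index."""
    return [category_index[value] for value in values]


# Toy values; the state abbreviations are used only for illustration.
states = ["BY", "HE", "BY", "NW", "HE"]
state_index = fit_category_index(states)
encoded = encode(states, state_index)

print(state_index)  # {'BY': 0, 'HE': 1, 'NW': 2}
print(encoded)      # [0, 1, 0, 2, 1]
```

This is similar in spirit to scikit-learn's `LabelEncoder`, which also assigns indices to the sorted set of classes.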

To run the models:

python3 train_test_model.py

You can analyze the embeddings with plot_embeddings.ipynb. For example, the following shows the learned embedding of German states plotted in 2D and the map of Germany side by side. Considering that the algorithm knows nothing about German geography, the remarkable resemblance between the two demonstrates the power of the algorithm for abductive reasoning. I expect entity embedding will be a very useful tool for studying the relationships among genomes, proteins, drugs, and diseases, and I would love to see its applications in biology and medicine one day.

Figure: visualization of the entity embedding of German states in 2D (left) and a map of Germany (right).
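For readers new to the idea, an entity embedding assigns each category a learned vector, and looking up that vector gives the same result as multiplying a one-hot encoding of the category by a weight matrix. The sketch below demonstrates that equivalence in plain Python. The weights are made-up numbers chosen for illustration, not learned values from the paper or from this repository, and the helper names are hypothetical.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def row_times_matrix(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    columns = len(matrix[0])
    return [
        sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(columns)
    ]


# Hypothetical 2-D embedding weights for three categories (made-up numbers).
embedding = [
    [0.5, -1.0],
    [1.5, 0.25],
    [-0.75, 2.0],
]

index = 2
looked_up = embedding[index]  # direct table lookup
via_one_hot = row_times_matrix(one_hot(index, len(embedding)), embedding)

print(looked_up == via_one_hot)  # True
```

With two embedding dimensions, each category is a point in the plane, which is what makes a 2D plot like the one above possible.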


Owner

Cheng Guo
