MolRep: A Deep Representation Learning Library for Molecular Property Prediction

Related tags

Deep LearningMolRep
Overview

MolRep: A Deep Representation Learning Library for Molecular Property Prediction

Summary

MolRep is a Python package for fairly measuring algorithmic progress on chemical property datasets. It currently provides a complete re-evaluation of 16 state-of-the-art deep representation models over 16 benchmark property datsaets.

architecture

If you found this package useful, please cite biorxiv for now:


Install & Usage

We provide a script to install the environment. You will need the conda package manager, which can be installed from here.

To install the required packages, follow there instructions (tested on a linux terminal):

  1. clone the repository

    git clone https://github.com/Jh-SYSU/MolRep

  2. cd into the cloned directory

    cd MolRep

  3. run the install script

    source install.sh [<your_cuda_version>]

Where <your_cuda_version> is an optional argument that can be either cpu, cu92, cu100, cu101. If you do not provide a cuda version, the script will default to cpu. The script will create a virtual environment named MolRep, with all the required packages needed to run our code. Important: do NOT run this command using bash instead of source!

Data

Data could be download from Google_Driver

Current Dataset

Dataset Task Task type #Molecule Splits Metric Reference
QM7 1 Regression 7160 Stratified MAE Wu et al.
QM8 12 Regression 21786 Random MAE Wu et al.
QM9 12 Regression 133885 Random MAE Wu et al.
ESOL 1 Regression 1128 Random RMSE Wu et al.
FreeSolv 1 Regression 642 Random RMSE Wu et al.
Lipophilicity 1 Regression 4200 Random RMSE Wu et al.
BBBP 1 Classification 2039 Scaffold ROC-AUC Wu et al.
Tox21 12 Classification 7831 Random ROC-AUC Wu et al.
SIDER 27 Classification 1427 Random ROC-AUC Wu et al.
ClinTox 2 Classification 1478 Random ROC-AUC Wu et al.
Liver injury 1 Classification 2788 Random ROC-AUC Xu et al.
Mutagenesis 1 Classification 6511 Random ROC-AUC Hansen et al.
hERG 1 Classification 4813 Random ROC-AUC Li et al.
MUV 17 Classification 93087 Random PRC-AUC Wu et al.
HIV 1 Classification 41127 Random ROC-AUC Wu et al.
BACE 1 Classification 1513 Random ROC-AUC Wu et al.

Methods

Current Methods

Self-/unsupervised Models

Methods Descriptions Reference
Mol2Vec Mol2Vec is an unsupervised approach to learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Jaeger et al.
N-Gram graph N-gram graph is a simple unsupervised representation for molecules that first embeds the vertices in the molecule graph and then constructs a compact representation for the graph by assembling the ver-tex embeddings in short walks in the graph. Liu et al.
FP2Vec FP2Vec is a molecular featurizer that represents a chemical compound as a set of trainable embedding vectors and combine with CNN model. Jeon et al.
VAE VAE is a framework for training two neural networks (encoder and decoder) to learn a mapping from high-dimensional molecular representation into a lower-dimensional space. Kingma et al.

Sequence Models

Methods Descriptions Reference
BiLSTM BiLSTM is an artificial recurrent neural network (RNN) architecture to encoding sequences from compound SMILES strings. Hochreiter et al.
SALSTM SALSTM is a self-attention mechanism with improved BiLSTM for molecule representation. Zheng et al
Transformer Transformer is a network based solely on attention mechanisms and dispensing with recurrence and convolutions entirely to encodes compound SMILES strings. Vaswani et al.
MAT MAT is a molecule attention transformer utilized inter-atomic distances and the molecular graph structure to augment the attention mechanism. Maziarka et al.

Graph Models

Methods Descriptions Reference
DGCNN DGCNN is a deep graph convolutional neural network that proposes a graph convolution model with SortPooling layer which sorts graph vertices in a consistent order to learning the embedding of molec-ular graph. Zhang et al.
GraphSAGE GraphSAGE is a framework for inductive representation learning on molecular graphs that used to generate low-dimensional representations for atoms and performs sum, mean or max-pooling neigh-borhood aggregation to updates the atom representation and molecular representation. Hamilton et al.
GIN GIN is the Graph Isomorphism Network that builds upon the limitations of GraphSAGE to capture different graph structures with the Weisfeiler-Lehman graph isomorphism test. Xu et al.
ECC ECC is an Edge-Conditioned Convolution Network that learns a different parameter for each edge label (bond type) on the molecular graph, and neighbor aggregation is weighted according to specific edge parameters. Simonovsky et al.
DiffPool DiffPool combines a differentiable graph encoder with its an adaptive pooling mechanism that col-lapses nodes on the basis of a supervised criterion to learning the representation of molecular graphs. Ying et al.
MPNN MPNN is a message-passing graph neural network that learns the representation of compound molecular graph. It mainly focused on obtaining effective vertices (atoms) embedding Gilmer et al.
D-MPNN DMPNN is another message-passing graph neural network that messages associated with directed edges (bonds) rather than those with vertices. It can make use of the bond attributes. Yang et al.
CMPNN CMPNN is the graph neural network that improve the molecular graph embedding by strengthening the message interactions between edges (bonds) and nodes (atoms). Song et al.

Training

To train a model by K-fold, run 5-fold-training_example.ipynb.

Testing

To test a pretrained model, run testing-example.ipynb.

Results

Results on Classification Tasks.

Datasets BBBP Tox21 SIDER ClinTox MUV HIV BACE
Mol2Vec 0.9213±0.0052 0.8139±0.0081 0.6043±0.0061 0.8572±0.0054 0.1178±0.0032 0.8413±0.0047 0.8284±0.0023
N-Gram graph 0.9012±0.0385 0.8371±0.0421 0.6482±0.0437 0.8753±0.0077 0.1011±0.0000 0.8378±0.0034 0.8472±0.0057
FP2Vec 0.8076±0.0032 0.8578±0.0076 0.6678±0.0068 0.8834±0.0432 0.0856±0.0031 0.7894±0.0052 0.8129±0.0492
VAE 0.8378±0.0031 0.8315±0.0382 0.6493±0.0762 0.8674±0.0124 0.0794±0.0001 0.8109±0.0381 0.8368±0.0762
BiLSTM 0.8391±0.0032 0.8279±0.0098 0.6092±0.0303 0.8319±0.0120 0.0382±0.0000 0.7962±0.0098 0.8263±0.0031
SALSTM 0.8482±0.0329 0.8253±0.0031 0.6308±0.0036 0.8317±0.0003 0.0409±0.0000 0.8034±0.0128 0.8348±0.0019
Transformer 0.9610±0.0119 0.8129±0.0013 0.6017±0.0012 0.8572±0.0032 0.0716±0.0017 0.8372±0.0314 0.8407±0.0738
MAT 0.9620±0.0392 0.8393±0.0039 0.6276±0.0029 0.8777±0.0149 0.0913±0.0001 0.8653±0.0054 0.8519±0.0504
DGCNN 0.9311±0.0434 0.7992±0.0057 0.6007±0.0053 0.8302±0.0126 0.0438±0.0000 0.8297±0.0038 0.8361±0.0034
GraphSAGE 0.9630±0.0474 0.8166±0.0041 0.6403±0.0045 0.9116±0.0146 0.1145±0.0000 0.8705±0.0724 0.9316±0.0360
GIN 0.8746±0.0359 0.8178±0.0031 0.5904±0.0000 0.8842±0.0004 0.0832±0.0000 0.8015±0.0328 0.8275±0.0034
ECC 0.9620±0.0003 0.8677±0.0090 0.6750±0.0092 0.8862±0.0831 0.1308±0.0013 0.8733±0.0025 0.8419±0.0092
DiffPool 0.8732±0.0391 0.8012±0.0130 0.6087±0.0130 0.8345±0.0233 0.0934±0.0001 0.8452±0.0042 0.8592±0.0391
MPNN 0.9321±0.0312 0.8440±0.014 0.6313±0.0121 0.8414±0.0294 0.0572±0.0001 0.8032±0.0092 0.8493±0.0013
DMPNN 0.9562±0.0070 0.8429±0.0391 0.6378±0.0329 0.8692±0.0051 0.0867±0.0032 0.8137±0.0072 0.8678±0.0372
CMPNN 0.9854±0.0215 0.8593±0.0088 0.6581±0.0020 0.9169±0.0065 0.1435±0.0002 0.8687±0.0003 0.8932±0.0019

More results will be updated soon.

Owner
AI-Health @NSCC-gz
AI-Health @NSCC-gz
Empowering journalists and whistleblowers

Onymochat Empowering journalists and whistleblowers Onymochat is an end-to-end encrypted, decentralized, anonymous chat application. You can also host

Samrat Dutta 19 Sep 02, 2022
The official implementation of ICCV paper "Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds".

Box-Aware Tracker (BAT) Pytorch-Lightning implementation of the Box-Aware Tracker. Box-Aware Feature Enhancement for Single Object Tracking on Point C

Kangel Zenn 5 Mar 26, 2022
Geneva is an artificial intelligence tool that defeats censorship by exploiting bugs in censors

Geneva is an artificial intelligence tool that defeats censorship by exploiting bugs in censors

Kevin Bock 1.5k Jan 06, 2023
A Keras implementation of YOLOv4 (Tensorflow backend)

keras-yolo4 请使用更完善的版本: https://github.com/miemie2013/Keras-YOLOv4 Please visit here for more complete model: https://github.com/miemie2013/Keras-YOLOv

384 Nov 29, 2022
AI Flow is an open source framework that bridges big data and artificial intelligence.

Flink AI Flow Introduction Flink AI Flow is an open source framework that bridges big data and artificial intelligence. It manages the entire machine

144 Dec 30, 2022
Python version of the amazing Reaction Mechanism Generator (RMG).

Reaction Mechanism Generator (RMG) Description This repository contains the Python version of Reaction Mechanism Generator (RMG), a tool for automatic

Reaction Mechanism Generator 284 Dec 27, 2022
A machine learning library for spiking neural networks. Supports training with both torch and jax pipelines, and deployment to neuromorphic hardware.

Rockpool Rockpool is a Python package for developing signal processing applications with spiking neural networks. Rockpool allows you to build network

SynSense 21 Dec 14, 2022
Reproducing code of hair style replacement method from Barbershorp.

Barbershorp Reproducing code of hair style replacement method from Barbershorp. Also reproduces II2S, an improved version of Image2StyleGAN. Requireme

1 Dec 24, 2021
Improving Contrastive Learning by Visualizing Feature Transformation, ICCV 2021 Oral

Improving Contrastive Learning by Visualizing Feature Transformation This project hosts the codes, models and visualization tools for the paper: Impro

Bingchen Zhao 83 Dec 15, 2022
Rule Based Classification Project For Python

Rule-Based-Classification-Project (ENG) Business Problem: A game company wants to create new level-based customer definitions (personas) by using some

Deniz Can OĞUZ 4 Oct 29, 2022
ADB-IP-ROTATION - Use your mobile phone to gain a temporary IP address using ADB and data tethering

ADB IP ROTATE This an Python script based on Android Debug Bridge (adb) shell sc

Dor Bismuth 2 Jul 12, 2022
Official PyTorch Implementation of Embedding Transfer with Label Relaxation for Improved Metric Learning, CVPR 2021

Embedding Transfer with Label Relaxation for Improved Metric Learning Official PyTorch implementation of CVPR 2021 paper Embedding Transfer with Label

Sungyeon Kim 37 Dec 06, 2022
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Welcome to AirSim AirSim is a simulator for drones, cars and more, built on Unreal Engine (we now also have an experimental Unity release). It is open

Microsoft 13.8k Jan 05, 2023
Code for the Paper: Conditional Variational Capsule Network for Open Set Recognition

Conditional Variational Capsule Network for Open Set Recognition This repository hosts the official code related to "Conditional Variational Capsule N

Guglielmo Camporese 35 Nov 21, 2022
Anderson Acceleration for Deep Learning

Anderson Accelerated Deep Learning (AADL) AADL is a Python package that implements the Anderson acceleration to speed-up the training of deep learning

Oak Ridge National Laboratory 7 Nov 24, 2022
Local Attention - Flax module for Jax

Local Attention - Flax Autoregressive Local Attention - Flax module for Jax Install $ pip install local-attention-flax Usage from jax import random fr

Phil Wang 16 Jun 16, 2022
This folder contains the implementation of the multi-relational attribute propagation algorithm.

MrAP This folder contains the implementation of the multi-relational attribute propagation algorithm. It requires the package pytorch-scatter. Please

6 Dec 06, 2022
The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization [Paper] accepted at the EMNLP 2021: Vision Guided Genera

CAiRE 42 Jan 07, 2023
PyTorch implementation of paper "StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement" (ICCV 2021 Oral)

StarEnhancer StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement (ICCV 2021 Oral) Abstract: Image enhancement is a subjective process w

IDKiro 133 Dec 28, 2022
Evolution Strategies in PyTorch

Evolution Strategies This is a PyTorch implementation of Evolution Strategies. Requirements Python 3.5, PyTorch = 0.2.0, numpy, gym, universe, cv2 Wh

Andrew Gambardella 333 Nov 14, 2022