Implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021).

Related tags

Deep LearningConfGF
Overview

ConfGF


License: MIT

[PDF] | [Slides]

The official implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021 Long talk)

Installation

Install via Conda (Recommended)

# Clone the environment
conda env create -f env.yml

# Activate the environment
conda activate confgf

# Install Library
git clone https://github.com/DeepGraphLearning/ConfGF.git
cd ConfGF
python setup.py install

Install Manually

# Create conda environment
conda create -n confgf python=3.7

# Activate the environment
conda activate confgf

# Install packages
conda install -y -c pytorch pytorch=1.7.0 torchvision torchaudio cudatoolkit=10.2
conda install -y -c rdkit rdkit==2020.03.2.0
conda install -y scikit-learn pandas decorator ipython networkx tqdm matplotlib
conda install -y -c conda-forge easydict
pip install pyyaml

# Install PyTorch Geometric
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-geometric==1.6.3

# Install Library
git clone https://github.com/DeepGraphLearning/ConfGF.git
cd ConfGF
python setup.py install

Dataset

Offical Dataset

The offical raw GEOM dataset is avaiable [here].

Preprocessed dataset

We provide the preprocessed datasets (GEOM, ISO17) in a [google drive folder]. For ISO17 dataset, we use the default split of [GraphDG].

Prepare your own GEOM dataset from scratch (optional)

Download the raw GEOM dataset and unpack it.

tar xvf ~/rdkit_folder.tar.gz -C ~/GEOM

Preprocess the raw GEOM dataset.

python script/process_GEOM_dataset.py --base_path GEOM --dataset_name qm9 --confmin 50 --confmax 500
python script/process_GEOM_dataset.py --base_path GEOM --dataset_name drugs --confmin 50 --confmax 100

The final folder structure will look like this:

GEOM
|___rdkit_folder  # raw dataset
|   |___qm9 # raw qm9 dataset
|   |___drugs # raw drugs dataset
|   |___summary_drugs.json
|   |___summary_qm9.json
|   
|___qm9_processed
|   |___train_data_40k.pkl
|   |___val_data_5k.pkl
|   |___test_data_200.pkl
|   
|___drugs_processed
|   |___train_data_39k.pkl
|   |___val_data_5k.pkl
|   |___test_data_200.pkl
|
iso17_processed
|___iso17_split-0_train_processed.pkl
|___iso17_split-0_test_processed.pkl
|
...

Training

All hyper-parameters and training details are provided in config files (./config/*.yml), and free feel to tune these parameters.

You can train the model with the following commands:

python -u script/train.py --config_path ./config/qm9_default.yml
python -u script/train.py --config_path ./config/drugs_default.yml
python -u script/train.py --config_path ./config/iso17_default.yml

The checkpoint of the models will be saved into a directory specified in config files.

Generation

We provide the checkpoints of three trained models, i.e., qm9_default, drugs_default and iso17_default in a [google drive folder].

You can generate conformations of a molecule by feeding its SMILES into the model:

python -u script/gen.py --config_path ./config/qm9_default.yml --generator ConfGF --smiles c1ccccc1
python -u script/gen.py --config_path ./config/qm9_default.yml --generator ConfGFDist --smiles c1ccccc1

Here we use the models trained on GEOM-QM9 to generate conformations for the benzene. The argument --generator indicates the type of the generator, i.e., ConfGF vs. ConfGFDist. See the ablation study (Table 5) in the original paper for more details.

You can also generate conformations for an entire test set.

python -u script/gen.py --config_path ./config/qm9_default.yml --generator ConfGF \
                        --start 0 --end 200 \

python -u script/gen.py --config_path ./config/qm9_default.yml --generator ConfGFDist \
                        --start 0 --end 200 \

python -u script/gen.py --config_path ./config/drugs_default.yml --generator ConfGF \
                        --start 0 --end 200 \

python -u script/gen.py --config_path ./config/drugs_default.yml --generator ConfGFDist \
                        --start 0 --end 200 \

Here start and end indicate the range of the test set that we want to use. All hyper-parameters related to generation can be set in config files.

Conformations of some drug-like molecules generated by ConfGF are provided below.

Get Results

The results of all benchmark tasks can be calculated based on generated conformations.

We report the results of each task in the following tables. Results of ConfGF and ConfGFDist are re-evaluated based on the current code base, which successfully reproduce the results reported in the original paper. Results of other models are taken directly from the original paper.

Task 1. Conformation Generation

The COV and MAT scores on the GEOM datasets can be calculated using the following commands:

python -u script/get_task1_results.py --input dir_of_QM9_samples --core 10 --threshold 0.5  

python -u script/get_task1_results.py --input dir_of_Drugs_samples --core 10 --threshold 1.25  

Table: COV and MAT scores on GEOM-QM9

QM9 COV-Mean (%) COV-Median (%) MAT-Mean (\AA) MAT-Median (\AA)
ConfGF 91.06 95.76 0.2649 0.2668
ConfGFDist 85.37 88.59 0.3435 0.3548
CGCF 78.05 82.48 0.4219 0.3900
GraphDG 73.33 84.21 0.4245 0.3973
CVGAE 0.09 0.00 1.6713 1.6088
RDKit 83.26 90.78 0.3447 0.2935

Table: COV and MAT scores on GEOM-Drugs

Drugs COV-Mean (%) COV-Median (%) MAT-Mean (\AA) MAT-Median (\AA)
ConfGF 62.54 71.32 1.1637 1.1617
ConfGFDist 49.96 48.12 1.2845 1.2827
CGCF 53.96 57.06 1.2487 1.2247
GraphDG 8.27 0.00 1.9722 1.9845
CVGAE 0.00 0.00 3.0702 2.9937
RDKit 60.91 65.70 1.2026 1.1252

Task 2. Distributions Over Distances

The MMD metrics on the ISO17 dataset can be calculated using the following commands:

python -u script/get_task2_results.py --input dir_of_ISO17_samples

Table: Distributions over distances

Method Single-Mean Single-Median Pair-Mean Pair-Median All-Mean All-Median
ConfGF 0.3430 0.2473 0.4195 0.3081 0.5432 0.3868
ConfGFDist 0.3348 0.2011 0.4080 0.2658 0.5821 0.3974
CGCF 0.4490 0.1786 0.5509 0.2734 0.8703 0.4447
GraphDG 0.7645 0.2346 0.8920 0.3287 1.1949 0.5485
CVGAE 4.1789 4.1762 4.9184 5.1856 5.9747 5.9928
RDKit 3.4513 3.1602 3.8452 3.6287 4.0866 3.7519

Visualizing molecules with PyMol

Start Setup

  1. pymol -R
  2. Display - Background - White
  3. Display - Color Space - CMYK
  4. Display - Quality - Maximal Quality
  5. Display Grid
    1. by object: use set grid_slot, int, mol_name to put the molecule into the corresponding slot
    2. by state: align all conformations in a single slot
    3. by object-state: align all conformations and put them in separate slots. (grid_slot dont work!)
  6. Setting - Line and Sticks - Ball and Stick on - Ball and Stick ratio: 1.5
  7. Setting - Line and Sticks - Stick radius: 0.2 - Stick Hydrogen Scale: 1.0

Show Molecule

  1. To show molecules

    1. hide everything
    2. show sticks
  2. To align molecules: align name1, name2

  3. Convert RDKit mol to Pymol

    from rdkit.Chem import PyMol
    v= PyMol.MolViewer()
    rdmol = Chem.MolFromSmiles('C')
    v.ShowMol(rdmol, name='mol')
    v.SaveFile('mol.pkl')

Make the trajectory for Langevin dynamics

  1. load a sequence of pymol objects named traj*.pkl into the PyMol, where traji.pkl is the i-th conformation in the trajectory.
  2. Join states: join_states mol, traj*, 0
  3. Delete useless object: delete traj*
  4. Movie - Program - State Loop - Full Speed
  5. Export the movie to a sequence of PNG files: File - Export Movie As - PNG Images
  6. Use photoshop to convert the PNG sequence to a GIF with the transparent background.

Citation

Please consider citing the following paper if you find our codes helpful. Thank you!

@inproceedings{shi*2021confgf,
title={Learning Gradient Fields for Molecular Conformation Generation},
author={Shi, Chence and Luo, Shitong and Xu, Minkai and Tang, Jian},
booktitle={International Conference on Machine Learning},
year={2021}
}

Contact

Chence Shi ([email protected])

Owner
MilaGraph
Research group led by Prof. Jian Tang at Mila-Quebec AI Institute (https://mila.quebec/) focusing on graph representation learning and graph neural networks.
MilaGraph
Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python

deepface Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid

Kushal Shingote 2 Feb 10, 2022
Can we do Customers Segmentation using PHP and Unsupervized Machine Learning ? Yes we can ! 🤡

Customers Segmentation using PHP and Rubix ML PHP Library Can we do Customers Segmentation using PHP and Unsupervized Machine Learning ? Yes we can !

Mickaël Andrieu 11 Oct 08, 2022
[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | 斗地主AI

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning DouZero is a reinforcement learning framework for DouDizhu (斗地主), t

Kwai Inc. 3.1k Jan 04, 2023
Code for "Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks", CVPR 2021

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks This repository contains the code that accompanies our CVPR 20

Despoina Paschalidou 161 Dec 20, 2022
TensorFlow, PyTorch and Numpy layers for generating Orthogonal Polynomials

OrthNet TensorFlow, PyTorch and Numpy layers for generating multi-dimensional Orthogonal Polynomials 1. Installation 2. Usage 3. Polynomials 4. Base C

Chuan 29 May 25, 2022
Multi agent DDPG algorithm written in Python + Pytorch

Multi agent DDPG algorithm written in Python + Pytorch. It also includes a Jupyter notebook, Tennis.ipynb, as a showcase.

Rogier Wachters 2 Feb 26, 2022
A Web API for automatic background removal using Deep Learning. App is made using Flask and deployed on Heroku.

Automatic_Background_Remover A Web API for automatic background removal using Deep Learning. App is made using Flask and deployed on Heroku. 👉 https:

Gaurav 16 Oct 29, 2022
[CVPR 2022 Oral] Balanced MSE for Imbalanced Visual Regression https://arxiv.org/abs/2203.16427

Balanced MSE Code for the paper: Balanced MSE for Imbalanced Visual Regression Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu CVPR 2022 (Oral) News

Jiawei Ren 267 Jan 01, 2023
This is the official code of our paper "Diversity-based Trajectory and Goal Selection with Hindsight Experience Relay" (PRICAI 2021)

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay This is the official implementation of our paper "Diversity-based Traje

Tianhong Dai 6 Jul 18, 2022
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19

2s-AGCN Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19 Note PyTorch version should be 0.3! For PyTor

LShi 547 Dec 26, 2022
Genetic feature selection module for scikit-learn

sklearn-genetic Genetic feature selection module for scikit-learn Genetic algorithms mimic the process of natural selection to search for optimal valu

Manuel Calzolari 260 Dec 14, 2022
Benchmarking the robustness of Spatial-Temporal Models

Benchmarking the robustness of Spatial-Temporal Models This repositery contains the code for the paper Benchmarking the Robustness of Spatial-Temporal

Yi Chenyu Ian 15 Dec 16, 2022
Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

LUNAR Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks" Adam Goodge, Bryan Hooi, Ng See Kiong and

Adam Goodge 25 Dec 28, 2022
The official implementation of Theme Transformer

Theme Transformer This is the official implementation of Theme Transformer. Checkout our demo and paper : Demo | arXiv Environment: using python versi

Ian Shih 85 Dec 08, 2022
Pywonderland - A tour in the wonderland of math with python.

A Tour in the Wonderland of Math with Python A collection of python scripts for drawing beautiful figures and animating interesting algorithms in math

Zhao Liang 4.1k Jan 03, 2023
OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion.

OstrichRL This is the repository accompanying the paper OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion. It contain

Vittorio La Barbera 51 Nov 17, 2022
ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels

ROCKET + MINIROCKET ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge D

298 Dec 26, 2022
View model summaries in PyTorch!

torchinfo (formerly torch-summary) Torchinfo provides information complementary to what is provided by print(your_model) in PyTorch, similar to Tensor

Tyler Yep 1.5k Jan 05, 2023
StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation

StyleGAN2 with adaptive discriminator augmentation (ADA) — Official TensorFlow implementation Training Generative Adversarial Networks with Limited Da

NVIDIA Research Projects 1.7k Dec 29, 2022
PyTorch implementation of MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning This is a PyTorch implementation of the MoCo paper: @Article{he2019moco, aut

Meta Research 3.7k Jan 02, 2023