Implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021).

Related tags

Deep LearningConfGF
Overview

ConfGF


License: MIT

[PDF] | [Slides]

The official implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021 Long talk)

Installation

Install via Conda (Recommended)

# Clone the environment
conda env create -f env.yml

# Activate the environment
conda activate confgf

# Install Library
git clone https://github.com/DeepGraphLearning/ConfGF.git
cd ConfGF
python setup.py install

Install Manually

# Create conda environment
conda create -n confgf python=3.7

# Activate the environment
conda activate confgf

# Install packages
conda install -y -c pytorch pytorch=1.7.0 torchvision torchaudio cudatoolkit=10.2
conda install -y -c rdkit rdkit==2020.03.2.0
conda install -y scikit-learn pandas decorator ipython networkx tqdm matplotlib
conda install -y -c conda-forge easydict
pip install pyyaml

# Install PyTorch Geometric
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-geometric==1.6.3

# Install Library
git clone https://github.com/DeepGraphLearning/ConfGF.git
cd ConfGF
python setup.py install

Dataset

Offical Dataset

The offical raw GEOM dataset is avaiable [here].

Preprocessed dataset

We provide the preprocessed datasets (GEOM, ISO17) in a [google drive folder]. For ISO17 dataset, we use the default split of [GraphDG].

Prepare your own GEOM dataset from scratch (optional)

Download the raw GEOM dataset and unpack it.

tar xvf ~/rdkit_folder.tar.gz -C ~/GEOM

Preprocess the raw GEOM dataset.

python script/process_GEOM_dataset.py --base_path GEOM --dataset_name qm9 --confmin 50 --confmax 500
python script/process_GEOM_dataset.py --base_path GEOM --dataset_name drugs --confmin 50 --confmax 100

The final folder structure will look like this:

GEOM
|___rdkit_folder  # raw dataset
|   |___qm9 # raw qm9 dataset
|   |___drugs # raw drugs dataset
|   |___summary_drugs.json
|   |___summary_qm9.json
|   
|___qm9_processed
|   |___train_data_40k.pkl
|   |___val_data_5k.pkl
|   |___test_data_200.pkl
|   
|___drugs_processed
|   |___train_data_39k.pkl
|   |___val_data_5k.pkl
|   |___test_data_200.pkl
|
iso17_processed
|___iso17_split-0_train_processed.pkl
|___iso17_split-0_test_processed.pkl
|
...

Training

All hyper-parameters and training details are provided in config files (./config/*.yml), and free feel to tune these parameters.

You can train the model with the following commands:

python -u script/train.py --config_path ./config/qm9_default.yml
python -u script/train.py --config_path ./config/drugs_default.yml
python -u script/train.py --config_path ./config/iso17_default.yml

The checkpoint of the models will be saved into a directory specified in config files.

Generation

We provide the checkpoints of three trained models, i.e., qm9_default, drugs_default and iso17_default in a [google drive folder].

You can generate conformations of a molecule by feeding its SMILES into the model:

python -u script/gen.py --config_path ./config/qm9_default.yml --generator ConfGF --smiles c1ccccc1
python -u script/gen.py --config_path ./config/qm9_default.yml --generator ConfGFDist --smiles c1ccccc1

Here we use the models trained on GEOM-QM9 to generate conformations for the benzene. The argument --generator indicates the type of the generator, i.e., ConfGF vs. ConfGFDist. See the ablation study (Table 5) in the original paper for more details.

You can also generate conformations for an entire test set.

python -u script/gen.py --config_path ./config/qm9_default.yml --generator ConfGF \
                        --start 0 --end 200 \

python -u script/gen.py --config_path ./config/qm9_default.yml --generator ConfGFDist \
                        --start 0 --end 200 \

python -u script/gen.py --config_path ./config/drugs_default.yml --generator ConfGF \
                        --start 0 --end 200 \

python -u script/gen.py --config_path ./config/drugs_default.yml --generator ConfGFDist \
                        --start 0 --end 200 \

Here start and end indicate the range of the test set that we want to use. All hyper-parameters related to generation can be set in config files.

Conformations of some drug-like molecules generated by ConfGF are provided below.

Get Results

The results of all benchmark tasks can be calculated based on generated conformations.

We report the results of each task in the following tables. Results of ConfGF and ConfGFDist are re-evaluated based on the current code base, which successfully reproduce the results reported in the original paper. Results of other models are taken directly from the original paper.

Task 1. Conformation Generation

The COV and MAT scores on the GEOM datasets can be calculated using the following commands:

python -u script/get_task1_results.py --input dir_of_QM9_samples --core 10 --threshold 0.5  

python -u script/get_task1_results.py --input dir_of_Drugs_samples --core 10 --threshold 1.25  

Table: COV and MAT scores on GEOM-QM9

QM9 COV-Mean (%) COV-Median (%) MAT-Mean (\AA) MAT-Median (\AA)
ConfGF 91.06 95.76 0.2649 0.2668
ConfGFDist 85.37 88.59 0.3435 0.3548
CGCF 78.05 82.48 0.4219 0.3900
GraphDG 73.33 84.21 0.4245 0.3973
CVGAE 0.09 0.00 1.6713 1.6088
RDKit 83.26 90.78 0.3447 0.2935

Table: COV and MAT scores on GEOM-Drugs

Drugs COV-Mean (%) COV-Median (%) MAT-Mean (\AA) MAT-Median (\AA)
ConfGF 62.54 71.32 1.1637 1.1617
ConfGFDist 49.96 48.12 1.2845 1.2827
CGCF 53.96 57.06 1.2487 1.2247
GraphDG 8.27 0.00 1.9722 1.9845
CVGAE 0.00 0.00 3.0702 2.9937
RDKit 60.91 65.70 1.2026 1.1252

Task 2. Distributions Over Distances

The MMD metrics on the ISO17 dataset can be calculated using the following commands:

python -u script/get_task2_results.py --input dir_of_ISO17_samples

Table: Distributions over distances

Method Single-Mean Single-Median Pair-Mean Pair-Median All-Mean All-Median
ConfGF 0.3430 0.2473 0.4195 0.3081 0.5432 0.3868
ConfGFDist 0.3348 0.2011 0.4080 0.2658 0.5821 0.3974
CGCF 0.4490 0.1786 0.5509 0.2734 0.8703 0.4447
GraphDG 0.7645 0.2346 0.8920 0.3287 1.1949 0.5485
CVGAE 4.1789 4.1762 4.9184 5.1856 5.9747 5.9928
RDKit 3.4513 3.1602 3.8452 3.6287 4.0866 3.7519

Visualizing molecules with PyMol

Start Setup

  1. pymol -R
  2. Display - Background - White
  3. Display - Color Space - CMYK
  4. Display - Quality - Maximal Quality
  5. Display Grid
    1. by object: use set grid_slot, int, mol_name to put the molecule into the corresponding slot
    2. by state: align all conformations in a single slot
    3. by object-state: align all conformations and put them in separate slots. (grid_slot dont work!)
  6. Setting - Line and Sticks - Ball and Stick on - Ball and Stick ratio: 1.5
  7. Setting - Line and Sticks - Stick radius: 0.2 - Stick Hydrogen Scale: 1.0

Show Molecule

  1. To show molecules

    1. hide everything
    2. show sticks
  2. To align molecules: align name1, name2

  3. Convert RDKit mol to Pymol

    from rdkit.Chem import PyMol
    v= PyMol.MolViewer()
    rdmol = Chem.MolFromSmiles('C')
    v.ShowMol(rdmol, name='mol')
    v.SaveFile('mol.pkl')

Make the trajectory for Langevin dynamics

  1. load a sequence of pymol objects named traj*.pkl into the PyMol, where traji.pkl is the i-th conformation in the trajectory.
  2. Join states: join_states mol, traj*, 0
  3. Delete useless object: delete traj*
  4. Movie - Program - State Loop - Full Speed
  5. Export the movie to a sequence of PNG files: File - Export Movie As - PNG Images
  6. Use photoshop to convert the PNG sequence to a GIF with the transparent background.

Citation

Please consider citing the following paper if you find our codes helpful. Thank you!

@inproceedings{shi*2021confgf,
title={Learning Gradient Fields for Molecular Conformation Generation},
author={Shi, Chence and Luo, Shitong and Xu, Minkai and Tang, Jian},
booktitle={International Conference on Machine Learning},
year={2021}
}

Contact

Chence Shi ([email protected])

Owner
MilaGraph
Research group led by Prof. Jian Tang at Mila-Quebec AI Institute (https://mila.quebec/) focusing on graph representation learning and graph neural networks.
MilaGraph
GARCH and Multivariate LSTM forecasting models for Bitcoin realized volatility with potential applications in crypto options trading, hedging, portfolio management, and risk management

Bitcoin Realized Volatility Forecasting with GARCH and Multivariate LSTM Author: Chi Bui This Repository Repository Directory ├── README.md

Chi Bui 113 Dec 29, 2022
Another pytorch implementation of FCN (Fully Convolutional Networks)

FCN-pytorch-easiest Trying to be the easiest FCN pytorch implementation and just in a get and use fashion Here I use a handbag semantic segmentation f

Y. Dong 158 Dec 21, 2022
RoFormer_pytorch

PyTorch RoFormer 原版Tensorflow权重(https://github.com/ZhuiyiTechnology/roformer) chinese_roformer_L-12_H-768_A-12.zip (提取码:xy9x) 已经转化为PyTorch权重 chinese_r

yujun 283 Dec 12, 2022
Emblaze - Interactive Embedding Comparison

Emblaze - Interactive Embedding Comparison Emblaze is a Jupyter notebook widget for visually comparing embeddings using animated scatter plots. It bun

CMU Data Interaction Group 77 Nov 24, 2022
Official pytorch code for SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal

SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal This is the official pytorch code for SSAT: A Symmetric Semantic-

ForeverPupil 57 Dec 13, 2022
Machine Learning Time-Series Platform

cesium: Open-Source Platform for Time Series Inference Summary cesium is an open source library that allows users to: extract features from raw time s

632 Dec 26, 2022
Repository for "Exploring Sparsity in Image Super-Resolution for Efficient Inference", CVPR 2021

SMSR Reposity for "Exploring Sparsity in Image Super-Resolution for Efficient Inference" [arXiv] Highlights Locate and skip redundant computation in S

Longguang Wang 225 Dec 26, 2022
FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics

FusionNet_Pytorch FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics Requirements Pytorch 0.1.11 Pyt

Choi Gunho 102 Dec 13, 2022
A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Duplicate Image Detection Getting Started Install dependencies pip install -r requirements.txt Run service python main.py Testing Test with pytest How

Matthew Podolak 21 Nov 11, 2022
Implementation for "Conditional entropy minimization principle for learning domain invariant representation features"

Implementation for "Conditional entropy minimization principle for learning domain invariant representation features". The code is reproduced from thi

1 Nov 02, 2022
An Official Repo of CVPR '20 "MSeg: A Composite Dataset for Multi-Domain Segmentation"

This is the code for the paper: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation (CVPR 2020, Official Repo) [CVPR PDF] [Journal PDF] J

226 Nov 05, 2022
A library of scripts that interact with the PythonTurtle module to create games, drawings, and more

TurtleLib TurtleLib is a library of scripts that interact with the PythonTurtle module to create games, drawings, and more! Using the Scripts Copy or

1 Jan 15, 2022
Prevent `CUDA error: out of memory` in just 1 line of code.

🐨 Koila Koila solves CUDA error: out of memory error painlessly. Fix it with just one line of code, and forget it. 🚀 Features 🙅 Prevents CUDA error

RenChu Wang 1.7k Jan 02, 2023
Implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

CrossViT : Cross-Attention Multi-Scale Vision Transformer for Image Classification This is an unofficial PyTorch implementation of CrossViT: Cross-Att

Rishikesh (ऋषिकेश) 103 Nov 25, 2022
PyTorch implementation of "VRT: A Video Restoration Transformer"

VRT: A Video Restoration Transformer Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool Computer

Jingyun Liang 837 Jan 09, 2023
Node for thenewboston digital currency network.

Project setup For project setup see INSTALL.rst Community Join the community to stay updated on the most recent developments, project roadmaps, and ra

thenewboston 27 Jul 08, 2022
Neural Message Passing for Computer Vision

Neural Message Passing for Quantum Chemistry Implementation of different models of Neural Networks on graphs as explained in the article proposed by G

Pau Riba 310 Nov 07, 2022
High-performance moving least squares material point method (MLS-MPM) solver.

High-Performance MLS-MPM Solver with Cutting and Coupling (CPIC) (MIT License) A Moving Least Squares Material Point Method with Displacement Disconti

Yuanming Hu 2.2k Dec 31, 2022
Using machine learning to predict and analyze high and low reader engagement for New York Times articles posted to Facebook.

How The New York Times can increase Engagement on Facebook Using machine learning to understand characteristics of news content that garners "high" Fa

Jessica Miles 0 Sep 16, 2021
Multi-Stage Spatial-Temporal Convolutional Neural Network (MS-GCN)

Multi-Stage Spatial-Temporal Convolutional Neural Network (MS-GCN) This code implements the skeleton-based action segmentation MS-GCN model from Autom

Benjamin Filtjens 8 Nov 29, 2022