DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

Overview

DockStream

alt text

Description

DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends. Docking execution and post hoc analysis can be automated via the benchmarking and analysis workflow. The flexilibity to specifiy a large variety of docking configurations allows tailored protocols for diverse end applications. DockStream can also parallelize docking across CPU cores, increasing throughput. DockStream is integrated with the de novo design platform, REINVENT, allowing one to incorporate docking into the generative process, thus providing the agent with 3D structural information.

Supported Backends

Ligand Embedders

Docking Backends

Note: The CCDC package, the OpenEye toolkit and Schrodinger's tools require you to obtain the respective software from those vendors.

Tutorials and Usage

Detailed Jupyter Notebook tutorials for all DockStream functionalities and workflows are provided in DockStreamCommunity. The DockStream repository here contains input JSON templates located in examples. The templates are organized as follows:

  • target_preparation: Preparing targets for docking
  • ligand_preparation: Generating 3D coordinates for ligands
  • docking: Docking ligands
  • integration: Combining different ligand embedders and docking backends into a single input JSON to run successively

Requirements

Two Conda environments are provided: DockStream via environment.yml and DockStreamFull via environment_full.yml. DockStream suffices for all use cases except when CCDC GOLD software is used, in which case DockStreamFull is required.

git clone <DockStream repository>
cd <DockStream directory>
conda env create -f environment.yml
conda activate DockStream

Enable use of OpenEye software (from REINVENT README)

You will need to set the environmental variable OE_LICENSE to activate the oechem license. One way to do this and keep it conda environment specific is: On the command-line, first:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Then edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh
export OE_LICENSE='/opt/scp/software/oelicense/1.0/oe_license.seq1'

and finally, edit ./etc/conda/deactivate.d/env_vars.sh :

#!/bin/sh
unset OE_LICENSE

Unit Tests

After cloning the DockStream repository, enable licenses, if applicable (OpenEye, CCDC, Schrodinger). Then execute the following:

python unit_tests.py

Contributors

Christian Margreitter ([email protected]) Jeff Guo ([email protected]) Alexey Voronov ([email protected])

Comments
  • Glide dockings using local machine

    Glide dockings using local machine

    Hi, I am trying to play with DockStream using Schrodinger. I am wondering if there is the possibility to use it in the local machine specifying $SCHRODINGER/glide instead of the tokens procedure.

    opened by Oulfin 6
  • Bug in Glide backend parallelization

    Bug in Glide backend parallelization

    First, thanks for contributing this nice toolbox.

    This is to report a bug in the following module:

    https://github.com/MolecularAI/DockStream/blob/7bdfd4a67f5c938e3222db59387e5a95e8a59e56/dockstream/core/Schrodinger/Glide_docker.py#L404

    while loop is used to process all sublists in batches. However, the number of processed sublists as recorded in jobs_submitted could be off because this variable is the cumulative sum of len(tmp_output_dirs), which could be smaller than len(cur_slice_sublists) if any of the sublists has no valid molecules to write out.

    The bug may cause some of the sublists get processed repeatedly, and in extreme cases may result in an infinite loop.

    I didn't check if any other backend uses similar logic to parallelize the run and may suffer from the same problem.

    opened by hshany 3
  • Question: Is it possible to feed an sdf file of prepared ligands straight into docking?

    Question: Is it possible to feed an sdf file of prepared ligands straight into docking?

    I'm trying to work out whether it's possible to put an sdf file of prepared ligands straight into a Glide run? i.e. not specifying an input_pool to the docking_runs list? (especially when using docker.py)

    opened by reskyner 2
  • Raise LigandPreparationFailed error

    Raise LigandPreparationFailed error

    For OpenEye Hybrid, it reported LigandPreparationFailed errors for both CORINA and OMEGA backend. One example is shown below: `File "/DockStream/dockstream/core/OpenEyeHybrid/Omega_ligand_preparator.py", line 66, in init raise LigandPreparationFailed("Cannot initialize OMEGA backend - abort.") dockstream.utils.dockstream_exceptions.LigandPreparationFailed: Cannot initialize OMEGA backend - abort.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "/DockStream/docker.py", line 132, in raise LigandPreparationFailed dockstream.utils.dockstream_exceptions.LigandPreparationFailed`

    Could you please help me with this problem? I tried both the provided receptor-ligand data files from DockStreamCommunity and my own dataset. Both reported same LigandPreparation error. Thank you in advance!

    opened by fangffRS 1
  • ADV 1.2.0 support

    ADV 1.2.0 support

    For DockStream to work with the new AutoDock-Vina 1.2.0 (https://pubs.acs.org/doi/10.1021/acs.jcim.1c00203), the "log-file" specification has to go:

    https://github.com/MolecularAI/DockStream/blob/efefbe52d3cecb8b6d1b72ab719aad1e4702833b/dockstream/core/AutodockVina/AutodockVina_docker.py#L275

    Should be backwards-compatible.

    opened by CMargreitter 1
  • Input file of the function

    Input file of the function "parse_maestro"

    First of all, thank you for your wonderful work in drug development area using AI. I am using Glide to get the result through DockStream. I think the the function parse_maestro in Glide_docker.py can be used to extract setting for docking(In DockStream, this setting is written json file). Is this right? If so, could you tell me the input file type for the parse_mastro?! (eg. maegz, mae, sdf, etc.) I tried the function with maegz (output from Glide docking), but I couldn't get the result. I want to use parse_maestro function to reproduce the setting which applied to previous docking simulation. I would be very grateful if you could give the answer to me. Thanks!

    opened by SejeongPark8354 0
  • Openbabel integration failed

    Openbabel integration failed

    I am trying to implement Dockstream with the vina backend, an exception is raised with openBabel executable.

    Traceback (most recent call last): File "DockStream/target_preparator.py", line 130, in prep = AutodockVinaTargetPreparator(conf=config, target=input_pdb_path, run_number=run_number) File "C:\Users\Y-8874903-E.ESTUDIANT\OneDrive - URV\Escritorio\PLIP interaction\DockStream\dockstream\core\AutodockVina\AutodockVina_target_preparator.py", line 56, in init raise TargetPreparationFailed("Cannot initialize OpenBabel external library, which should be part of the environment - abort.") dockstream.utils.dockstream_exceptions.TargetPreparationFailed: Cannot initialize OpenBabel external library, which should be part of the environment - abort.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last): File "DockStream/target_preparator.py", line 139, in raise TargetPreparationFailed() from e dockstream.utils.dockstream_exceptions.TargetPreparationFailed

    Follow all necessary steps mentioned in docs.

    opened by Crispae 1
  • Parallelization of ADV for docking

    Parallelization of ADV for docking

    Hello,

    I am trying to run first docking experiments together with reinvent. I am observing many ADV jobs getting started with -cpu 1 (hardcoded), but a few (1 or 2) take quite long and leave all other CPUs idle until the batch has finished and a new batch has started.

    This leaves quite some capacity of a e.g. 16-core machine unused - at least that is my impression when observing the run via top or ps. In the dockstream.config, parallelization.number_cores is set to 16.

    Are there better practical settings to better exploit larger machines with 16-64 CPUs ?

    Lars

    opened by LarsAC 3
  • No module named 'ccdc'

    No module named 'ccdc'

    I believe I successfully installed the normal (not Full) DockStream package as per your instructions on the github site, and then tried to run the unit test, but this fails with a complaint regarding the ccdc module missing (see below). But I want to use Glide so wouldn’t need (nor have) ccdc. I am doing this on Ubuntu 18.04.

    Dockstream/python ./unit_tests.py Traceback (most recent call last): File "./unit_tests.py", line 10, in from tests.Gold import * File "/media/data/evehom/Projects/CompChem/DockStream/tests/Gold/init.py", line 1, in from tests.Gold.test_Gold_target_preparation import * File "/media/data/evehom/Projects/CompChem/DockStream/tests/Gold/test_Gold_target_preparation.py", line 11, in from dockstream.core.Gold.Gold_target_preparator import GoldTargetPreparator File "/media/data/evehom/Projects/CompChem/DockStream/dockstream/core/Gold/Gold_target_preparator.py", line 3, in import ccdc ModuleNotFoundError: No module named 'ccdc'

    opened by Evert-Homan 4
Releases(v1.0.0)
Owner
AstraZeneca - Molecular AI
Software from the Molecular AI department at AstraZeneca R&D
AstraZeneca - Molecular AI
FastyAPI is a Stack boilerplate optimised for heavy loads.

FastyAPI A FastAPI based Stack boilerplate for heavy loads. Explore the docs » View Demo · Report Bug · Request Feature Table of Contents About The Pr

Ali Chaayb 47 Dec 27, 2022
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation(mCOLT/mRASP2), ACL2021

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation(mCOLT/mRASP2), ACL2021 The code for training mCOLT/mRASP2, a multilingua

104 Jan 01, 2023
Source code for "Understanding Knowledge Integration in Language Models with Graph Convolutions"

Graph Convolution Simulator (GCS) Source code for "Understanding Knowledge Integration in Language Models with Graph Convolutions" Requirements: PyTor

yifan 10 Oct 18, 2022
The code for the NeurIPS 2021 paper "A Unified View of cGANs with and without Classifiers".

Energy-based Conditional Generative Adversarial Network (ECGAN) This is the code for the NeurIPS 2021 paper "A Unified View of cGANs with and without

sianchen 22 May 28, 2022
Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)

Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu,

GEMS Lab: Graph Exploration & Mining at Scale, University of Michigan 70 Dec 18, 2022
Python periodic table module

elemenpy Hello! elements.py is a small Python periodic table module that is used for calling certain information about an element. Installation Instal

Eric Cheng 2 Dec 27, 2021
Package for extracting emotions from social media text. Tailored for financial data.

EmTract: Extracting Emotions from Social Media Text Tailored for Financial Contexts EmTract is a tool that extracts emotions from social media text. I

13 Nov 17, 2022
Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python

deepface Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid

Kushal Shingote 2 Feb 10, 2022
Notepy is a full-featured Notepad Python app

Notepy A full featured python text-editor Notable features Autocompletion for parenthesis and quote Auto identation Syntax highlighting Compile and ru

Mirko Rovere 11 Sep 28, 2022
Code to reproduce the experiments from our NeurIPS 2021 paper " The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective"

Code To run: python runner.py new --save SAVE_NAME --data PATH_TO_DATA_DIR --dataset DATASET --model model_name [options] --n 1000 - train - t

Geoff Pleiss 5 Dec 12, 2022
Neural Magic Eye: Learning to See and Understand the Scene Behind an Autostereogram, arXiv:2012.15692.

Neural Magic Eye Preprint | Project Page | Colab Runtime Official PyTorch implementation of the preprint paper "NeuralMagicEye: Learning to See and Un

Zhengxia Zou 56 Jul 15, 2022
How to Leverage Multimodal EHR Data for Better Medical Predictions?

How to Leverage Multimodal EHR Data for Better Medical Predictions? This repository contains the code of the paper: How to Leverage Multimodal EHR Dat

13 Dec 13, 2022
Ratatoskr: Worcester Tech's conference scheduling system

Ratatoskr: Worcester Tech's conference scheduling system In Norse mythology, Ratatoskr is a squirrel who runs up and down the world tree Yggdrasil to

4 Dec 22, 2022
OpenMMLab Computer Vision Foundation

English | 简体中文 Introduction MMCV is a foundational library for computer vision research and supports many research projects as below: MMCV: OpenMMLab

OpenMMLab 4.6k Jan 09, 2023
A simple and useful implementation of LPIPS.

lpips-pytorch Description Developing perceptual distance metrics is a major topic in recent image processing problems. LPIPS[1] is a state-of-the-art

So Uchida 121 Dec 24, 2022
The codebase for Data-driven general-purpose voice activity detection.

Data driven GPVAD Repository for the work in TASLP 2021 Voice activity detection in the wild: A data-driven approach using teacher-student training. S

Heinrich Dinkel 75 Nov 27, 2022
Human Pose estimation with TensorFlow framework

Human Pose Estimation with TensorFlow Here you can find the implementation of the Human Body Pose Estimation algorithm, presented in the DeeperCut and

Eldar Insafutdinov 1.1k Dec 29, 2022
AlgoVision - A Framework for Differentiable Algorithms and Algorithmic Supervision

NeurIPS 2021 Paper "Learning with Algorithmic Supervision via Continuous Relaxations"

Felix Petersen 76 Jan 01, 2023
PyTorch Code of "Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spatiotemporal Dynamics"

Memory In Memory Networks It is based on the paper Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spati

Yang Li 12 May 30, 2022
On Generating Extended Summaries of Long Documents

ExtendedSumm This repository contains the implementation details and datasets used in On Generating Extended Summaries of Long Documents paper at the

Georgetown Information Retrieval Lab 76 Sep 05, 2022