DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

Overview

DockStream

Description

DockStream is a docking wrapper that provides access to a collection of ligand embedders and docking backends. Docking execution and post hoc analysis can be automated via the benchmarking and analysis workflow. The flexibility to specify a large variety of docking configurations allows tailored protocols for diverse end applications. DockStream can also parallelize docking across CPU cores, increasing throughput. DockStream is integrated with the de novo design platform REINVENT, which makes it possible to incorporate docking into the generative process and thereby provide the agent with 3D structural information.

Supported Backends

Ligand Embedders

Docking Backends

Note: The CCDC package, the OpenEye toolkit and Schrodinger's tools require you to obtain the respective software from those vendors.

Tutorials and Usage

Detailed Jupyter Notebook tutorials for all DockStream functionalities and workflows are provided in DockStreamCommunity. The DockStream repository here contains input JSON templates located in examples. The templates are organized as follows:

  • target_preparation: Preparing targets for docking
  • ligand_preparation: Generating 3D coordinates for ligands
  • docking: Docking ligands
  • integration: Combining different ligand embedders and docking backends into a single input JSON to run successively
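As a rough illustration of the layout described above, the sketch below lists the JSON templates found under each category folder and loads one of them. It assumes the templates sit at examples/<category>/*.json relative to the repository root; the helper names list_templates and load_template are hypothetical and not part of DockStream.

```python
import json
from pathlib import Path

# Category folders named in the list above.
CATEGORIES = ("target_preparation", "ligand_preparation", "docking", "integration")


def list_templates(examples_dir):
    """Map each category to the sorted names of the JSON files it contains.

    Assumes a layout of <examples_dir>/<category>/*.json (an assumption, not
    something DockStream guarantees). A category whose folder is missing is
    reported with an empty list.
    """
    root = Path(examples_dir)
    found = {}
    for category in CATEGORIES:
        folder = root / category
        if folder.is_dir():
            found[category] = sorted(p.name for p in folder.glob("*.json"))
        else:
            found[category] = []
    return found


def load_template(path):
    """Parse one template file into a plain Python dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling list_templates("examples") from the repository root gives a quick overview of which templates are available before copying one and adapting its paths.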

Requirements

Two Conda environments are provided: DockStream via environment.yml and DockStreamFull via environment_full.yml. DockStream suffices for all use cases except when CCDC GOLD software is used, in which case DockStreamFull is required.

git clone <DockStream repository>
cd <DockStream directory>
conda env create -f environment.yml
conda activate DockStream

Enable use of OpenEye software (from REINVENT README)

You will need to set the environment variable OE_LICENSE to activate the oechem license. One way to do this while keeping it specific to the Conda environment is to first run the following on the command line:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Then edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh
export OE_LICENSE='/opt/scp/software/oelicense/1.0/oe_license.seq1'

Finally, edit ./etc/conda/deactivate.d/env_vars.sh:

#!/bin/sh
unset OE_LICENSE
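After reactivating the environment, it can help to confirm that the variable is visible to Python and points to an existing file. The following is a minimal sketch of such a check; the function name check_oe_license is hypothetical, and the check only looks at the file's presence, not at whether the license itself is valid.

```python
import os


def check_oe_license(env=None):
    """Return (ok, message) describing the state of the OE_LICENSE variable.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    Only the existence of the file is verified, not the license contents.
    """
    env = os.environ if env is None else env
    path = env.get("OE_LICENSE")
    if not path:
        return False, "OE_LICENSE is not set"
    if not os.path.isfile(path):
        return False, f"OE_LICENSE points to a missing file: {path}"
    return True, f"OE_LICENSE found: {path}"


if __name__ == "__main__":
    ok, message = check_oe_license()
    print(message)
```

Running this once inside the activated environment and once after deactivating it is a simple way to verify that both env_vars.sh scripts take effect.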

Unit Tests

After cloning the DockStream repository, enable licenses, if applicable (OpenEye, CCDC, Schrodinger). Then execute the following:

python unit_tests.py
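Because the test suite imports vendor-specific modules, a missing package can stop the run at import time (the 'ccdc' report in the comments is one example). The sketch below checks which optional modules can be found before the tests are started. The import names ccdc and openeye are assumptions about how the vendor packages are exposed in your environment, and the helper names are hypothetical.

```python
import importlib.util


def module_available(name):
    """True if the top-level module `name` can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def report_optional_modules(names=("ccdc", "openeye")):
    """Map each module name to whether it is importable in this environment.

    The default names are assumptions; adjust them to the packages you use.
    """
    return {name: module_available(name) for name in names}


if __name__ == "__main__":
    for name, present in report_optional_modules().items():
        print(f"{name}: {'available' if present else 'missing'}")
```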

Contributors

Christian Margreitter, Jeff Guo, Alexey Voronov

Comments
  • Glide dockings using local machine

    Hi, I am trying out DockStream with Schrodinger. I am wondering whether it is possible to use it on the local machine by specifying $SCHRODINGER/glide instead of the tokens procedure.

    opened by Oulfin 6
  • Bug in Glide backend parallelization

    First, thanks for contributing this nice toolbox.

    This is to report a bug in the following module:

    https://github.com/MolecularAI/DockStream/blob/7bdfd4a67f5c938e3222db59387e5a95e8a59e56/dockstream/core/Schrodinger/Glide_docker.py#L404

    A while loop is used to process all sublists in batches. However, the number of processed sublists recorded in jobs_submitted can be off, because this variable is the cumulative sum of len(tmp_output_dirs), which can be smaller than len(cur_slice_sublists) if any of the sublists has no valid molecules to write out.

    The bug may cause some of the sublists to be processed repeatedly and, in extreme cases, may result in an infinite loop.

    I didn't check if any other backend uses similar logic to parallelize the run and may suffer from the same problem.

    opened by hshany 3
  • Question: Is it possible to feed an sdf file of prepared ligands straight into docking?

    I'm trying to work out whether it's possible to put an sdf file of prepared ligands straight into a Glide run, i.e. without specifying an input_pool in the docking_runs list (especially when using docker.py).

    opened by reskyner 2
  • Raise LigandPreparationFailed error

    For OpenEye Hybrid, it reported LigandPreparationFailed errors for both the CORINA and OMEGA backends. One example is shown below:

      File "/DockStream/dockstream/core/OpenEyeHybrid/Omega_ligand_preparator.py", line 66, in __init__
        raise LigandPreparationFailed("Cannot initialize OMEGA backend - abort.")
    dockstream.utils.dockstream_exceptions.LigandPreparationFailed: Cannot initialize OMEGA backend - abort.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/DockStream/docker.py", line 132, in <module>
        raise LigandPreparationFailed
    dockstream.utils.dockstream_exceptions.LigandPreparationFailed

    Could you please help me with this problem? I tried both the receptor-ligand data files provided in DockStreamCommunity and my own dataset. Both reported the same LigandPreparation error. Thank you in advance!

    opened by fangffRS 1
  • ADV 1.2.0 support

    For DockStream to work with the new AutoDock-Vina 1.2.0 (https://pubs.acs.org/doi/10.1021/acs.jcim.1c00203), the "log-file" specification has to go:

    https://github.com/MolecularAI/DockStream/blob/efefbe52d3cecb8b6d1b72ab719aad1e4702833b/dockstream/core/AutodockVina/AutodockVina_docker.py#L275

    Should be backwards-compatible.

    opened by CMargreitter 1
  • Input file of the function "parse_maestro"

    First of all, thank you for your wonderful work on AI for drug development. I am using Glide through DockStream to obtain docking results. I think the function parse_maestro in Glide_docker.py can be used to extract the settings for docking (in DockStream, these settings are written in a JSON file). Is this right? If so, could you tell me the input file type for parse_maestro (e.g. maegz, mae, sdf)? I tried the function with a maegz file (output from Glide docking), but I couldn't get a result. I want to use the parse_maestro function to reproduce the settings that were applied to a previous docking simulation. I would be very grateful for an answer. Thanks!

    opened by SejeongPark8354 0
  • Openbabel integration failed

    I am trying to use DockStream with the Vina backend, but an exception is raised concerning the OpenBabel executable.

    Traceback (most recent call last):
      File "DockStream/target_preparator.py", line 130, in <module>
        prep = AutodockVinaTargetPreparator(conf=config, target=input_pdb_path, run_number=run_number)
      File "C:\Users\Y-8874903-E.ESTUDIANT\OneDrive - URV\Escritorio\PLIP interaction\DockStream\dockstream\core\AutodockVina\AutodockVina_target_preparator.py", line 56, in __init__
        raise TargetPreparationFailed("Cannot initialize OpenBabel external library, which should be part of the environment - abort.")
    dockstream.utils.dockstream_exceptions.TargetPreparationFailed: Cannot initialize OpenBabel external library, which should be part of the environment - abort.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "DockStream/target_preparator.py", line 139, in <module>
        raise TargetPreparationFailed() from e
    dockstream.utils.dockstream_exceptions.TargetPreparationFailed

    I followed all the necessary steps mentioned in the docs.

    opened by Crispae 1
  • Parallelization of ADV for docking

    Hello,

    I am trying to run my first docking experiments together with REINVENT. I observe many ADV jobs being started with -cpu 1 (hardcoded), but a few (1 or 2) take quite long and leave all other CPUs idle until the batch has finished and a new batch has started.

    This leaves a good part of the capacity of, e.g., a 16-core machine unused; at least that is my impression when observing the run via top or ps. In the dockstream.config, parallelization.number_cores is set to 16.

    Are there better practical settings to exploit larger machines with 16-64 CPUs?

    Lars

    opened by LarsAC 3
  • No module named 'ccdc'

    I believe I successfully installed the normal (not Full) DockStream package as per your instructions on the GitHub site, and then tried to run the unit tests, but this fails with a complaint about the missing ccdc module (see below). However, I want to use Glide, so I wouldn’t need (nor have) ccdc. I am doing this on Ubuntu 18.04.

    Dockstream/python ./unit_tests.py
    Traceback (most recent call last):
      File "./unit_tests.py", line 10, in <module>
        from tests.Gold import *
      File "/media/data/evehom/Projects/CompChem/DockStream/tests/Gold/__init__.py", line 1, in <module>
        from tests.Gold.test_Gold_target_preparation import *
      File "/media/data/evehom/Projects/CompChem/DockStream/tests/Gold/test_Gold_target_preparation.py", line 11, in <module>
        from dockstream.core.Gold.Gold_target_preparator import GoldTargetPreparator
      File "/media/data/evehom/Projects/CompChem/DockStream/dockstream/core/Gold/Gold_target_preparator.py", line 3, in <module>
        import ccdc
    ModuleNotFoundError: No module named 'ccdc'

    opened by Evert-Homan 4
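The batching problem described in the "Bug in Glide backend parallelization" comment can be illustrated with a small, self-contained sketch. This is not the DockStream code: write_batch and process_in_batches are hypothetical stand-ins, and only the variable names jobs_submitted, cur_slice_sublists and tmp_output_dirs are borrowed from the report. The sketch advances the offset by the number of sublists taken from the list rather than by the number of output directories written, which is one possible way to avoid the repeated processing the report describes.

```python
def write_batch(sublists):
    """Stand-in for writing input files: one output dir per non-empty sublist.

    A sublist without valid molecules (modeled here as an empty list)
    produces no output directory.
    """
    return [f"dir_{i}" for i, sub in enumerate(sublists) if sub]


def process_in_batches(all_sublists, number_cores):
    """Process sublists in slices of `number_cores` and return what was handled.

    Each returned item is a (cur_slice_sublists, tmp_output_dirs) pair.
    """
    if number_cores < 1:
        raise ValueError("number_cores must be at least 1")
    processed = []
    jobs_submitted = 0
    while jobs_submitted < len(all_sublists):
        cur_slice_sublists = all_sublists[jobs_submitted:jobs_submitted + number_cores]
        tmp_output_dirs = write_batch(cur_slice_sublists)
        processed.append((cur_slice_sublists, tmp_output_dirs))
        # Advance by the slice length. Adding len(tmp_output_dirs) instead
        # would under-count whenever a sublist is empty, so later slices
        # would overlap earlier ones or the loop would never terminate.
        jobs_submitted += len(cur_slice_sublists)
    return processed
```

With five sublists, two of them empty, and number_cores set to 2, the loop runs three times and touches every sublist exactly once, whereas counting by len(tmp_output_dirs) would revisit sublists and never get past a trailing empty one.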
Releases(v1.0.0)
Owner
AstraZeneca - Molecular AI
Software from the Molecular AI department at AstraZeneca R&D