Systemic Evolutionary Chemical Space Exploration for Drug Discovery

Overview

SECSE


SECSE: Systemic Evolutionary Chemical Space Explorer

plot

Chemical space exploration is a major task of the hit-finding process during the pursuit of novel chemical entities. Compared with other screening technologies, computational de novo design has become a popular approach to overcome the limitation of current chemical libraries. Here, we reported a de novo design platform named systemic evolutionary chemical space explorer (SECSE). The platform was conceptually inspired by fragment-based drug design, that miniaturized a “lego-building” process within the pocket of a certain target. The key of virtual hits generation was then turned into a computational search problem. To enhance search and optimization, human intelligence and deep learning were integrated. SECSE has the potential in finding novel and diverse small molecules that are attractive starting points for further validation.

Tutorials and Usage


  1. Set Environment Variables
    export $SECSE=path/to/SECSE
    if you use AutoDock Vina for docking: (download here)
    export $VINA=path/to/AutoDockVINA
    if you use Gilde for docking (additional installation & license required):
    export $SCHRODINGER=path/to/SCHRODINGER

  2. Give execution permissions to the SECSE directory
    chmod -R +X path/to/SECSE

  3. Input fragments: a tab split .smi file without header. See demo here.

  4. Parameters in config file:
    [DEFAULT]

    • workdir, working directory, create if not exists, otherwise overwrite, type=str
    • fragments, file path to seed fragments, smi format, type=str
    • num_gen, number of generations, type=int
    • num_per_gen, number of molecules generated each generation, type=int
    • seed_per_gen, number of selected seed molecules per generation, default=1000, type=int
    • start_gen, number of staring generation, default=0, type=int
    • docking_program, name of docking program, AutoDock-Vina (input vina) or Glide (input glide) , default=vina, type=str

    [docking]

    • target, protein PDBQT if use AutoDock Vina; Grid file if choose Glide, type=str
    • RMSD, docking pose RMSD cutoff between children and parent, default=2, type=float
    • delta_score, decreased docking score cutoff between children and parent, default=-1.0, type=float
    • score_cutoff, default=-9, type=float

    Parameters when docking by AutoDock Vina:

    • x, Docking box x, type=float
    • y, Docking box y, type=float
    • z, Docking box z, type=float
    • box_size_x, Docking box size x, default=20, type=float
    • box_size_y, Docking box size y, default=20, type=float
    • box_size_z, Docking box size z, default=20, type=float

    [deep learning]

    • mode, mode of deep learning modeling, 0: not use, 1: modeling per generation, 2: modeling overall after all the generation, default=0, type=int
    • dl_per_gen, top N predicted molecules for docking, default=100, type=int
    • dl_score_cutoff, default=-9, type=float

    [properties]

    • MW, molecular weights cutoff, default=450, type=int
    • logP_lower, minimum of logP, default=0.5, type=float
    • logP_upper, maximum of logP, default=7, type=float
    • chiral_center, maximum of chiral center,default=3, type=int
    • heteroatom_ratio, maximum of heteroatom ratio, default=0.35, type=float
    • rotatable_bound_num, maximum of rotatable bound, default=5, type=int
    • rigid_body_num, default=2, type=int

    Config file of a demo case phgdh_demo_vina.ini

  5. Run SECSE
    python $SECSE/run_secse.py --config path/to/config

  6. Output files

    • merged_docked_best_timestamp_with_grow_path.csv: selected molecules and growing path
    • selected.sdf: 3D conformers of all selected molecules

Dependencies


GNU Parallel installation

numpy~=1.20.3, pandas~=1.3.3, pandarallel~=1.5.2, tqdm~=4.62.2, biopandas~=0.2.9, openbabel~=3.1.1, rdkit~=2021.03.5, chemprop~=1.3.1, torch~=1.9.0+cu111

Citation


Lu, C.; Liu, S.; Shi, W.; Yu, J.; Zhou, Z.; Zhang, X.; Lu, X.; Cai, F.; Xia, N.; Wang, Y. Systemic Evolutionary Chemical Space Exploration For Drug Discovery. ChemRxiv 2021. This content is a preprint and has not been peer-reviewed.

License


SECSE is released under Apache License, Version 2.0.

You might also like...
ETMO: Evolutionary Transfer Multiobjective Optimization

ETMO: Evolutionary Transfer Multiobjective Optimization To promote the research on ETMO, benchmark problems are of great importance to ETMO algorithm

Guiding evolutionary strategies by (inaccurate) differentiable robot simulators @ NeurIPS, 4th Robot Learning Workshop
Guiding evolutionary strategies by (inaccurate) differentiable robot simulators @ NeurIPS, 4th Robot Learning Workshop

Guiding Evolutionary Strategies by Differentiable Robot Simulators In recent years, Evolutionary Strategies were actively explored in robotic tasks fo

BESS: Balanced Evolutionary Semi-Stacking for Disease Detection via Partially Labeled Imbalanced Tongue Data

Balanced-Evolutionary-Semi-Stacking Code for the paper ''BESS: Balanced Evolutionary Semi-Stacking for Disease Detection via Partially Labeled Imbalan

This is the repo for the paper `SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization'. (published in Bioinformatics'21)

SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization This is the code for our paper ``SumGNN: Multi-typed Drug

Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network
Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network

DeepCDR Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network This work has been accepted to ECCB2020 and was also published in the

Multi-modal co-attention for drug-target interaction annotation and Its Application to SARS-CoV-2

CoaDTI Multi-modal co-attention for drug-target interaction annotation and Its Application to SARS-CoV-2 Abstract Environment The test was conducted i

The code for SAG-DTA: Prediction of Drug–Target Affinity Using Self-Attention Graph Network.

SAG-DTA The code is the implementation for the paper 'SAG-DTA: Prediction of Drug–Target Affinity Using Self-Attention Graph Network'. Requirements py

[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.
[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

[ICLR 2021] RAPID: A Simple Approach for Exploration in Reinforcement Learning This is the Tensorflow implementation of ICLR 2021 paper Rank the Episo

A mini library for Policy Gradients with Parameter-based Exploration, with reference implementation of the ClipUp optimizer  from NNAISENSE.
A mini library for Policy Gradients with Parameter-based Exploration, with reference implementation of the ClipUp optimizer from NNAISENSE.

PGPElib A mini library for Policy Gradients with Parameter-based Exploration [1] and friends. This library serves as a clean re-implementation of the

Comments
  • Problem running demo

    Problem running demo

    Hi!

    When I try to run the demo with the command below. python $SECSE/run_secse.py --config demo/phgdh_demo_vina.ini

    It generates pandas.errors.EmptyDataError: No columns to parse from file, what should I do to solve it? Thank you!

    Here is the output

    **************************************************************************************** 
          ____    _____    ____   ____    _____ 
         / ___|  | ____|  / ___| / ___|  | ____|
         \___ \  |  _|   | |     \___ \  |  _|  
          ___) | | |___  | |___   ___) | | |___ 
         |____/  |_____|  \____| |____/  |_____|
    /home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/core/generic.py:2882: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
     method=method,
    Table 'G-001' already exists.
    
    ******************************************************************
    Input fragment file: /home/bruce/Work/CADD/SECSE/code/demo/demo_1020.smi
    Target grid file: /home/bruce/Work/CADD/SECSE/code/demo/PHGDH_6RJ3_for_vina.pdbqt
    Workdir: /home/bruce/Work/CADD/SECSE/code/res/
    
    
    ************************************************** 
    Generation  0 ...
    Step 1: Docking with Autodock Vina ...
    /home/bruce/Work/CADD/SECSE/code/secse/evaluate/ligprep_vina_parallel.sh /home/bruce/Work/CADD/SECSE/code/res/generation_0 /home/bruce/Work/CADD/SECSE/code/demo/demo_1020.smi /home/bruce/Work/CADD/SECSE/code/demo/PHGDH_6RJ3_for_vina.pdbqt 20.9 -10.4 3.0 20.0 20.0 25.0 10
    find /home/bruce/Work/CADD/SECSE/code/res/generation_0/sdf_files -name "*sdf" | xargs -n 100 cat > /home/bruce/Work/CADD/SECSE/code/res/generation_0/docking_outputs_with_score.sdf
    Docking time cost: 0.12 min.
    Step 2: Ranking docked molecules...
    9 cmpds after evaluate
    The evaluate score cutoff is: -9.0
    9 final seeds.
    
    ************************************************** 
    Generation  1 ...
    Step 1: Mutation
    No rule class:  B-001
    No rule class:  G-003
    No rule class:  G-004
    No rule class:  G-005
    No rule class:  G-006
    No rule class:  G-007
    No rule class:  M-001
    No rule class:  M-002
    No rule class:  M-003
    No rule class:  M-004
    No rule class:  M-005
    No rule class:  M-006
    No rule class:  M-007
    No rule class:  M-008
    No rule class:  M-009
    No rule class:  M-010
    No rule class: G-002
    Step 2: Filtering all mutated mols
    sh /home/bruce/Work/CADD/SECSE/code/secse/growing/filter_parallel.sh /home/bruce/Work/CADD/SECSE/code/res/generation_1 1 demo/phgdh_demo_vina.ini 10
    Filter runtime: 0.00 min.
    Traceback (most recent call last):
     File "/home/bruce/Work/CADD/SECSE/code/secse/run_secse.py", line 80, in <module>
       main()
     File "/home/bruce/Work/CADD/SECSE/code/secse/run_secse.py", line 65, in main
       workflow.grow()
     File "/home/bruce/Work/CADD/SECSE/code/secse/grow_processes.py", line 208, in grow
       self._filter_df = pd.read_csv(os.path.join(self.workdir_now, "filter.csv"), header=None)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
       return func(*args, **kwargs)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
       return _read(filepath_or_buffer, kwds)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
       parser = TextFileReader(filepath_or_buffer, **kwds)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
       self._engine = self._make_engine(self.engine)
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
       return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
     File "/home/bruce/Downloads/Softwares/Anaconda/envs/secse/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
       self._reader = parsers.TextReader(self.handles.handle, **kwds)
     File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
    pandas.errors.EmptyDataError: No columns to parse from file
    
    opened by BW15061999 17
  • Question about running the demo code

    Question about running the demo code

    Hi authors,

    I have tried to run your demo code in README.md, but got some errors.

    Command

    python /home/xxx/workspace/off-SECSE/secse/run_secse.py --config ./config.ini
    

    Output

     **************************************************************************************** 
           ____    _____    ____   ____    _____ 
          / ___|  | ____|  / ___| / ___|  | ____|
          \___ \  |  _|   | |     \___ \  |  _|  
           ___) | | |___  | |___   ___) | | |___ 
          |____/  |_____|  \____| |____/  |_____|
    
    ******************************************************************
    Input fragment file: /home/xxx/workspace/off-SECSE/fy-run/demo001/ligand.smi
    Target grid file: /home/xxx/workspace/off-SECSE/fy-run/demo001/receptor.pdbqt
    Workdir: /home/xxx/workspace/off-SECSE/fy-run/demo001/
    
    Step 1: Docking with Autodock Vina ...
    /home/xxx/workspace/off-SECSE/secse/evaluate/ligprep_vina_parallel.sh /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0 /home/xxx/workspace/off-SECSE/fy-run/demo001/ligand.smi /home/t-yafan/workspace/off-SECSE/fy-run/demo001/receptor.pdbqt 20.9 -10.4 3.0 20.0 20.0 25.0 10
    find /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0/sdf_files -name "*sdf" | xargs -n 100 cat > /home/xxx/workspace/off-SECSE/fy-run/demo001/generation_0/docking_outputs_with_score.sdf
    Docking time cost: 0.11 min.
    Step 2: Ranking docked molecules...
    9 cmpds after evaluate
    The evaluate score cutoff is: -9.0
    9 final seeds.
    
     ************************************************** 
    Generation  1 ...
    Step 1: Mutation
    Traceback (most recent call last):
      File "/home/xxx/workspace/off-SECSE/secse/run_secse.py", line 70, in <module>
        main()
      File "/home/xxx/workspace/off-SECSE/secse/run_secse.py", line 55, in main
        workflow.grow()
      File "/home/xxx/workspace/off-SECSE/secse/grow_processes.py", line 159, in grow
        header = mutation_df(self.winner_df, self.workdir, self.cpu_num, self.gen)
      File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 166, in mutation_df
        mutation = Mutation(5000, workdir)
      File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 29, in __init__
        self.load_common_rules()
      File "/home/xxx/workspace/off-SECSE/secse/growing/mutation/mutation.py", line 50, in load_common_rules
        c.execute(sql)
    sqlite3.OperationalError: no such table: B-001
    

    It seems that the file secse/growing/mutation/rules_demo.db is missing in the repo. How can I fix it?

    Thanks!

    opened by fyabc 5
  • All dockings do not work because there's no gridding process.

    All dockings do not work because there's no gridding process.

    Hi, I was trying out the repo when I realised that neither the autodock nor glide is able to run because there was no gridding process, resulting in no grid files. >.<

    opened by yipy0005 3
Releases(v1.1.0)
[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

MixFormer The official implementation of the CVPR 2022 paper MixFormer: End-to-End Tracking with Iterative Mixed Attention [Models and Raw results] (G

Multimedia Computing Group, Nanjing University 235 Jan 03, 2023
LocUNet is a deep learning method to localize a UE based solely on the reported signal strengths from a set of BSs.

LocUNet LocUNet is a deep learning method to localize a UE based solely on the reported signal strengths from a set of BSs. The method utilizes accura

4 Oct 05, 2022
Temporal Knowledge Graph Reasoning Triggered by Memories

MTDM Temporal Knowledge Graph Reasoning Triggered by Memories To alleviate the time dependence, we propose a memory-triggered decision-making (MTDM) n

4 Sep 25, 2022
Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter Code and checkpoints for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling

274 Dec 06, 2022
Datasets and pretrained Models for StyleGAN3 ...

Datasets and pretrained Models for StyleGAN3 ... Dear arfiticial friend, this is a collection of artistic datasets and models that we have put togethe

lucid layers 34 Oct 06, 2022
Libraries, tools and tasks created and used at DeepMind Robotics.

dm_robotics: Libraries, tools, and tasks created and used for Robotics research at DeepMind. Package overview Package Summary Transformations Rigid bo

DeepMind 273 Jan 06, 2023
An OpenAI Gym environment for Super Mario Bros

gym-super-mario-bros An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) us

Andrew Stelmach 1 Jan 05, 2022
Code for "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" paper

UNICORN 🦄 Webpage | Paper | BibTex PyTorch implementation of "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" pap

118 Jan 06, 2023
The implementation of 'Image synthesis via semantic composition'.

Image synthesis via semantic synthesis [Project Page] by Yi Wang, Lu Qi, Ying-Cong Chen, Xiangyu Zhang, Jiaya Jia. Introduction This repository gives

DV Lab 71 Jan 06, 2023
MobileNetV1-V2,MobileNeXt,GhostNet,AdderNet,ShuffleNetV1-V2,Mobile+ViT etc.

MobileNetV1-V2,MobileNeXt,GhostNet,AdderNet,ShuffleNetV1-V2,Mobile+ViT etc. ⭐⭐⭐⭐⭐

568 Jan 04, 2023
A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.

P-Lambda 437 Dec 30, 2022
Fake News Detection Using Machine Learning Methods

Fake-News-Detection-Using-Machine-Learning-Methods Fake news is always a real and dangerous issue. However, with the presence and abundance of various

Achraf Safsafi 1 Jan 11, 2022
PyTorch EO aims to make Deep Learning for Earth Observation data easy and accessible to real-world cases and research alike.

Pytorch EO Deep Learning for Earth Observation applications and research. 🚧 This project is in early development, so bugs and breaking changes are ex

earthpulse 28 Aug 25, 2022
Code and models for ICCV2021 paper "Robust Object Detection via Instance-Level Temporal Cycle Confusion".

Robust Object Detection via Instance-Level Temporal Cycle Confusion This repo contains the implementation of the ICCV 2021 paper, Robust Object Detect

Xin Wang 69 Oct 13, 2022
This repo generates the training data and the model for Morpheus-Deblend

Morpheus-Deblend This repo generates the training data and the model for Morpheus-Deblend. This is the active development repo for the project and as

Ryan Hausen 2 Apr 18, 2022
Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral] Learning to Disambiguate Strongly In

Zicong Fan 40 Dec 22, 2022
Hierarchical Aggregation for 3D Instance Segmentation (ICCV 2021)

HAIS Hierarchical Aggregation for 3D Instance Segmentation (ICCV 2021) by Shaoyu Chen, Jiemin Fang, Qian Zhang, Wenyu Liu, Xinggang Wang*. (*) Corresp

Hust Visual Learning Team 145 Jan 05, 2023
A Python package for performing pore network modeling of porous media

Overview of OpenPNM OpenPNM is a comprehensive framework for performing pore network simulations of porous materials. More Information For more detail

PMEAL 336 Dec 30, 2022
WormMovementSimulation - 3D Simulation of Worm Body Movement with Neurons attached to its body

Generate 3D Locomotion Data This module is intended to create 2D video trajector

1 Aug 09, 2022
Based on the paper "Geometry-aware Instance-reweighted Adversarial Training" ICLR 2021 oral

Geometry-aware Instance-reweighted Adversarial Training This repository provides codes for Geometry-aware Instance-reweighted Adversarial Training (ht

Jingfeng 47 Dec 22, 2022