OOD Dataset Curator and Benchmark for AI-aided Drug Discovery

Related tags

Deep LearningDrugOOD
Overview

🔥 DrugOOD 🔥 : OOD Dataset Curator and Benchmark for AI Aided Drug Discovery

This is the official implementation of the DrugOOD project, this is the project page: https://drugood.github.io/

Environment Installation

You can install the conda environment using the drugood.yaml file provided:

!git clone https://github.com/tencent-ailab/DrugOOD.git
!cd DrugOOD
!conda env create --name drugood --file=drugood.yaml
!conda activate drugood

Then you can go to the demo at demo/demo.ipynb which gives a quick practice on how to use DrugOOD.

Demo

For a quick practice on using DrugOOD for dataset curation and OOD benchmarking, one can refer to the demo/demo.ipynb.

Dataset Curator

First, you need to generate the required DrugOOD dataset with our code. The dataset curator currently focusing on generating datasets from CHEMBL. It supports the following two tasks:

  • Ligand Based Affinity Prediction (LBAP).
  • Structure Based Affinity Prediction (SBAP).

For OOD domain annotations, it supports the following 5 choices.

  • Assay.
  • Scaffold.
  • Size.
  • Protein. (only for SBAP task)
  • Protein Family. (only for SBAP task)

For noise annotations, it supports the following three noise levels. Datasets with different noises are implemented by filters with different levels of strictness.

  • Core.
  • Refined.
  • General.

At the same time, due to the inconvenient conversion between different measurement type (E.g. IC50, EC50, Ki, Potency), one needs to specify the measurement type when generating the dataset.

How to Run and Reproduce the 96 Datasets?

Firstly, specifiy the path of CHEMBL database and the directory to save the data in the configuration file: configs/_base_/curators/lbap_defaults.py for LBAP task or configs/_base_/curators/sbap_defaults.py for SBAP task.
The source_root="YOUR_PATH/chembl_29_sqlite/chembl_29.db" means the path to the chembl29 sqllite file. The target_root="data/" specifies the folder to save the generated data.

Note that you can download the original chembl29 database with sqllite format from http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_29/chembl_29_sqlite.tar.gz.

The built-in configuration files are located in:
configs/curators/. Here we provide the 96 config files to reproduce the 96 datasets in our paper. Meanwhile, you can also customize your own datasets by changing the config files.

Run tools/curate.py to generate dataset. Here are some examples:

Generate datasets for the LBAP task, with assay as domain, core as noise level, IC50 as measurement type, LBAP as task type.:

python tools/curate.py --cfg configs/curators/lbap_core_ic50_assay.py

Generate datasets for the SBAP task, with protein as domain, refined as noise level, EC50 as measurement type, SBAP as task type.:

python tools/curate.py --cfg configs/curator/sbap_refined_ec50_protein.py

Benchmarking SOTA OOD Algorithms

Currently we support 6 different baseline algorithms:

  • ERM
  • IRM
  • GroupDro
  • Coral
  • MixUp
  • DANN

Meanwhile, we support various GNN backbones:

  • GIN
  • GCN
  • Weave
  • ShcNet
  • GAT
  • MGCN
  • NF
  • ATi-FPGNN
  • GTransformer

And different backbones for protein sequence modeling:

  • Bert
  • ProteinBert

How to Run?

Firstly, run the following command to install.

python setup.py develop

Run the LBAP task with ERM algorithm:

python tools/train.py configs/algorithms/erm/lbap_core_ec50_assay_erm.py

If you would like to run ERM on other datasets, change the corresponding options inside the above config file. For example, ann_file = 'data/lbap_core_ec50_assay.json' specifies the input data.

Similarly, run the SBAP task with ERM algorithm:

python tools/train.py configs/algorithms/erm/sbap_core_ec50_assay_erm.py

Reference

😄 If you find this repo is useful, please consider to cite our paper:

@ARTICLE{2022arXiv220109637J,
    author = {{Ji}, Yuanfeng and {Zhang}, Lu and {Wu}, Jiaxiang and {Wu}, Bingzhe and {Huang}, Long-Kai and {Xu}, Tingyang and {Rong}, Yu and {Li}, Lanqing and {Ren}, Jie and {Xue}, Ding and {Lai}, Houtim and {Xu}, Shaoyong and {Feng}, Jing and {Liu}, Wei and {Luo}, Ping and {Zhou}, Shuigeng and {Huang}, Junzhou and {Zhao}, Peilin and {Bian}, Yatao},
    title = "{DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations}",
    journal = {arXiv e-prints},
    keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods},
    year = 2022,
    month = jan,
    eid = {arXiv:2201.09637},
    pages = {arXiv:2201.09637},
    archivePrefix = {arXiv},
    eprint = {2201.09637},
    primaryClass = {cs.LG}
}

Disclaimer

This is not an officially supported Tencent product.

Owner
Research repositories.
Companion code for the paper "Meta-Learning the Search Distribution of Black-Box Random Search Based Adversarial Attacks" by Yatsura et al.

META-RS This is the companion code for the paper "Meta-Learning the Search Distribution of Black-Box Random Search Based Adversarial Attacks" by Yatsu

Bosch Research 7 Dec 09, 2022
Accelerating BERT Inference for Sequence Labeling via Early-Exit

Sequence-Labeling-Early-Exit Code for ACL 2021 paper: Accelerating BERT Inference for Sequence Labeling via Early-Exit Requirement: Please refer to re

李孝男 23 Oct 14, 2022
rastrainer is a QGIS plugin to training remote sensing semantic segmentation model based on PaddlePaddle.

rastrainer rastrainer is a QGIS plugin to training remote sensing semantic segmentation model based on PaddlePaddle. UI TODO Init UI. Add Block. Add l

deepbands 5 Mar 04, 2022
FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

FairEdit Relevent Publication FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

5 Feb 04, 2022
Deep Learning Algorithms for Hedging with Frictions

Deep Learning Algorithms for Hedging with Frictions This repository contains the Forward-Backward Stochastic Differential Equation (FBSDE) solver and

Xiaofei Shi 3 Dec 22, 2022
Evaluation suite for large-scale language models.

This repo contains code for running the evaluations and reproducing the results from the Jurassic-1 Technical Paper (see blog post), with current support for running the tasks through both the AI21 S

71 Dec 17, 2022
Deep Convolutional Generative Adversarial Networks

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Alec Radford, Luke Metz, Soumith Chintala All images in t

Alec Radford 3.4k Dec 29, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.

Swin Transformer for Semantic Segmentation of satellite images This repo contains the supported code and configuration files to reproduce semantic seg

23 Oct 10, 2022
MPI-IS Mesh Processing Library

Perceiving Systems Mesh Package This package contains core functions for manipulating meshes and visualizing them. It requires Python 3.5+ and is supp

Max Planck Institute for Intelligent Systems 494 Jan 06, 2023
Convert Table data to approximate values with GUI

Table_Editor Convert Table data to approximate values with GUIs... usage - Import methods for extension Tables. Imported method supposed to have only

CLJ 1 Jan 10, 2022
Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Regularized Greedy Forest Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better r

RGF-team 364 Dec 28, 2022
Rational Activation Functions - Replacing Padé Activation Units

Rational Activations - Learnable Rational Activation Functions First introduce as PAU in Padé Activation Units: End-to-end Learning of Activation Func

<a href=[email protected]"> 38 Nov 22, 2022
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"

SGLKT-VisDial Pytorch Implementation for the paper: Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer Gi-Cheon Kang, Junseok P

Gi-Cheon Kang 9 Jul 05, 2022
An updated version of virtual model making

Model-Swap-Face v2   这个项目是基于stylegan2 pSp制作的,比v1版本Model-Swap-Face在推理速度和图像质量上有一定提升。主要的功能是将虚拟模特进行环球不同区域的风格转换,目前转换器提供西欧模特、东亚模特和北非模特三种主流的风格样式,可帮我们实现生产资料零成

seeprettyface.com 62 Dec 09, 2022
Quantile Regression DQN a Minimal Working Example, Distributional Reinforcement Learning with Quantile Regression

Quantile Regression DQN Quantile Regression DQN a Minimal Working Example, Distributional Reinforcement Learning with Quantile Regression (https://arx

Arsenii Senya Ashukha 80 Sep 17, 2022
Semi-supervised Implicit Scene Completion from Sparse LiDAR

Semi-supervised Implicit Scene Completion from Sparse LiDAR Paper Created by Pengfei Li, Yongliang Shi, Tianyu Liu, Hao Zhao, Guyue Zhou and YA-QIN ZH

114 Nov 30, 2022
Dynamica causal Bayesian optimisation

Dynamic Causal Bayesian Optimization This is a Python implementation of Dynamic Causal Bayesian Optimization as presented at NeurIPS 2021. Abstract Th

nd308 18 Nov 22, 2022
This repository contains small projects related to Neural Networks and Deep Learning in general.

ILearnDeepLearning.py Description People say that nothing develops and teaches you like getting your hands dirty. This repository contains small proje

Piotr Skalski 1.2k Dec 22, 2022
AQP is a modular pipeline built to enable the comparison and testing of different quality metric configurations.

Audio Quality Platform - AQP An Open Modular Python Platform for Objective Speech and Audio Quality Metrics AQP is a highly modular pipeline designed

Jack Geraghty 24 Oct 01, 2022
Official repository for the CVPR 2021 paper "Learning Feature Aggregation for Deep 3D Morphable Models"

Deep3DMM Official repository for the CVPR 2021 paper Learning Feature Aggregation for Deep 3D Morphable Models. Requirements This code is tested on Py

38 Dec 27, 2022