OOD Dataset Curator and Benchmark for AI-aided Drug Discovery

Related tags

Deep LearningDrugOOD
Overview

🔥 DrugOOD 🔥 : OOD Dataset Curator and Benchmark for AI Aided Drug Discovery

This is the official implementation of the DrugOOD project, this is the project page: https://drugood.github.io/

Environment Installation

You can install the conda environment using the drugood.yaml file provided:

!git clone https://github.com/tencent-ailab/DrugOOD.git
!cd DrugOOD
!conda env create --name drugood --file=drugood.yaml
!conda activate drugood

Then you can go to the demo at demo/demo.ipynb which gives a quick practice on how to use DrugOOD.

Demo

For a quick practice on using DrugOOD for dataset curation and OOD benchmarking, one can refer to the demo/demo.ipynb.

Dataset Curator

First, you need to generate the required DrugOOD dataset with our code. The dataset curator currently focusing on generating datasets from CHEMBL. It supports the following two tasks:

  • Ligand Based Affinity Prediction (LBAP).
  • Structure Based Affinity Prediction (SBAP).

For OOD domain annotations, it supports the following 5 choices.

  • Assay.
  • Scaffold.
  • Size.
  • Protein. (only for SBAP task)
  • Protein Family. (only for SBAP task)

For noise annotations, it supports the following three noise levels. Datasets with different noises are implemented by filters with different levels of strictness.

  • Core.
  • Refined.
  • General.

At the same time, due to the inconvenient conversion between different measurement type (E.g. IC50, EC50, Ki, Potency), one needs to specify the measurement type when generating the dataset.

How to Run and Reproduce the 96 Datasets?

Firstly, specifiy the path of CHEMBL database and the directory to save the data in the configuration file: configs/_base_/curators/lbap_defaults.py for LBAP task or configs/_base_/curators/sbap_defaults.py for SBAP task.
The source_root="YOUR_PATH/chembl_29_sqlite/chembl_29.db" means the path to the chembl29 sqllite file. The target_root="data/" specifies the folder to save the generated data.

Note that you can download the original chembl29 database with sqllite format from http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_29/chembl_29_sqlite.tar.gz.

The built-in configuration files are located in:
configs/curators/. Here we provide the 96 config files to reproduce the 96 datasets in our paper. Meanwhile, you can also customize your own datasets by changing the config files.

Run tools/curate.py to generate dataset. Here are some examples:

Generate datasets for the LBAP task, with assay as domain, core as noise level, IC50 as measurement type, LBAP as task type.:

python tools/curate.py --cfg configs/curators/lbap_core_ic50_assay.py

Generate datasets for the SBAP task, with protein as domain, refined as noise level, EC50 as measurement type, SBAP as task type.:

python tools/curate.py --cfg configs/curator/sbap_refined_ec50_protein.py

Benchmarking SOTA OOD Algorithms

Currently we support 6 different baseline algorithms:

  • ERM
  • IRM
  • GroupDro
  • Coral
  • MixUp
  • DANN

Meanwhile, we support various GNN backbones:

  • GIN
  • GCN
  • Weave
  • ShcNet
  • GAT
  • MGCN
  • NF
  • ATi-FPGNN
  • GTransformer

And different backbones for protein sequence modeling:

  • Bert
  • ProteinBert

How to Run?

Firstly, run the following command to install.

python setup.py develop

Run the LBAP task with ERM algorithm:

python tools/train.py configs/algorithms/erm/lbap_core_ec50_assay_erm.py

If you would like to run ERM on other datasets, change the corresponding options inside the above config file. For example, ann_file = 'data/lbap_core_ec50_assay.json' specifies the input data.

Similarly, run the SBAP task with ERM algorithm:

python tools/train.py configs/algorithms/erm/sbap_core_ec50_assay_erm.py

Reference

😄 If you find this repo is useful, please consider to cite our paper:

@ARTICLE{2022arXiv220109637J,
    author = {{Ji}, Yuanfeng and {Zhang}, Lu and {Wu}, Jiaxiang and {Wu}, Bingzhe and {Huang}, Long-Kai and {Xu}, Tingyang and {Rong}, Yu and {Li}, Lanqing and {Ren}, Jie and {Xue}, Ding and {Lai}, Houtim and {Xu}, Shaoyong and {Feng}, Jing and {Liu}, Wei and {Luo}, Ping and {Zhou}, Shuigeng and {Huang}, Junzhou and {Zhao}, Peilin and {Bian}, Yatao},
    title = "{DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations}",
    journal = {arXiv e-prints},
    keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods},
    year = 2022,
    month = jan,
    eid = {arXiv:2201.09637},
    pages = {arXiv:2201.09637},
    archivePrefix = {arXiv},
    eprint = {2201.09637},
    primaryClass = {cs.LG}
}

Disclaimer

This is not an officially supported Tencent product.

Owner
Research repositories.
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers

hierarchical-transformer-1d Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers In Progress!! 2021.

MyungHoon Jin 7 Nov 06, 2022
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

SUO-SLAM This repository hosts the code for our CVPR 2022 paper "Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation". ArXiv li

Robot Perception & Navigation Group (RPNG) 97 Jan 03, 2023
Stitch it in Time: GAN-Based Facial Editing of Real Videos

STIT - Stitch it in Time [Project Page] Stitch it in Time: GAN-Based Facial Edit

1.1k Jan 04, 2023
PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi

PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi PIKA is a lightweight speech processing toolkit based on Pytorch and (Py)

336 Nov 25, 2022
A clean and extensible PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

A clean and extensible PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners A PyTorch re-implementation of Mask Autoencoder trai

Tianyu Hua 23 Dec 13, 2022
Photo2cartoon - 人像卡通化探索项目 (photo-to-cartoon translation project)

人像卡通化 (Photo to Cartoon) 中文版 | English Version 该项目为小视科技卡通肖像探索项目。您可使用微信扫描下方二维码或搜索“AI卡通秀”小程序体验卡通化效果。

Minivision_AI 3.5k Dec 30, 2022
EdiBERT, a generative model for image editing

EdiBERT, a generative model for image editing EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation. The

16 Dec 07, 2022
Machine Learning Time-Series Platform

cesium: Open-Source Platform for Time Series Inference Summary cesium is an open source library that allows users to: extract features from raw time s

632 Dec 26, 2022
The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

SGRAF PyTorch implementation for AAAI2021 paper of “Similarity Reasoning and Filtration for Image-Text Matching”. It is built on top of the SCAN and C

Ronnie_IIAU 149 Dec 22, 2022
The description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts.

FMFCC-A This project is the description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts. The FMFCC-A dataset is shared through BaiduCl

18 Dec 24, 2022
RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation

RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation Anonymous submission Abstract 3D obj

30 Sep 16, 2022
PyTorch implementation of D2C: Diffuison-Decoding Models for Few-shot Conditional Generation.

D2C: Diffuison-Decoding Models for Few-shot Conditional Generation Project | Paper PyTorch implementation of D2C: Diffuison-Decoding Models for Few-sh

Jiaming Song 90 Dec 27, 2022
Deep motion transfer

animation-with-keypoint-mask Paper The right most square is the final result. Softmax mask (circles): \ Heatmap mask: \ conda env create -f environmen

9 Nov 01, 2022
Boosted CVaR Classification (NeurIPS 2021)

Boosted CVaR Classification Runtian Zhai, Chen Dan, Arun Sai Suggala, Zico Kolter, Pradeep Ravikumar NeurIPS 2021 Table of Contents Quick Start Train

Runtian Zhai 4 Feb 15, 2022
Implementation for paper MLP-Mixer: An all-MLP Architecture for Vision

MLP Mixer Implementation for paper MLP-Mixer: An all-MLP Architecture for Vision. Give us a star if you like this repo. Author: Github: bangoc123 Emai

Ngoc Nguyen Ba 86 Dec 10, 2022
Head and Neck Tumour Segmentation and Prediction of Patient Survival Project

Head-and-Neck-Tumour-Segmentation-and-Prediction-of-Patient-Survival Welcome to the Head and Neck Tumour Segmentation and Prediction of Patient Surviv

5 Oct 20, 2022
Fast Style Transfer in TensorFlow

Fast Style Transfer in TensorFlow Add styles from famous paintings to any photo in a fraction of a second! You can even style videos! It takes 100ms o

Jefferson 5 Oct 24, 2021
disentanglement_lib is an open-source library for research on learning disentangled representations.

disentanglement_lib disentanglement_lib is an open-source library for research on learning disentangled representation. It supports a variety of diffe

Google Research 1.3k Dec 28, 2022
API for RL algorithm design & testing of BCA (Building Control Agent) HVAC on EnergyPlus building energy simulator by wrapping their EMS Python API

RL - EmsPy (work In Progress...) The EmsPy Python package was made to facilitate Reinforcement Learning (RL) algorithm research for developing and tes

20 Jan 05, 2023
Latent Execution for Neural Program Synthesis

Latent Execution for Neural Program Synthesis This repo provides the code to replicate the experiments in the paper Xinyun Chen, Dawn Song, Yuandong T

Xinyun Chen 16 Oct 02, 2022