OOD Dataset Curator and Benchmark for AI-aided Drug Discovery

Related tags

Deep LearningDrugOOD
Overview

🔥 DrugOOD 🔥 : OOD Dataset Curator and Benchmark for AI Aided Drug Discovery

This is the official implementation of the DrugOOD project, this is the project page: https://drugood.github.io/

Environment Installation

You can install the conda environment using the drugood.yaml file provided:

!git clone https://github.com/tencent-ailab/DrugOOD.git
!cd DrugOOD
!conda env create --name drugood --file=drugood.yaml
!conda activate drugood

Then you can go to the demo at demo/demo.ipynb which gives a quick practice on how to use DrugOOD.

Demo

For a quick practice on using DrugOOD for dataset curation and OOD benchmarking, one can refer to the demo/demo.ipynb.

Dataset Curator

First, you need to generate the required DrugOOD dataset with our code. The dataset curator currently focusing on generating datasets from CHEMBL. It supports the following two tasks:

  • Ligand Based Affinity Prediction (LBAP).
  • Structure Based Affinity Prediction (SBAP).

For OOD domain annotations, it supports the following 5 choices.

  • Assay.
  • Scaffold.
  • Size.
  • Protein. (only for SBAP task)
  • Protein Family. (only for SBAP task)

For noise annotations, it supports the following three noise levels. Datasets with different noises are implemented by filters with different levels of strictness.

  • Core.
  • Refined.
  • General.

At the same time, due to the inconvenient conversion between different measurement type (E.g. IC50, EC50, Ki, Potency), one needs to specify the measurement type when generating the dataset.

How to Run and Reproduce the 96 Datasets?

Firstly, specifiy the path of CHEMBL database and the directory to save the data in the configuration file: configs/_base_/curators/lbap_defaults.py for LBAP task or configs/_base_/curators/sbap_defaults.py for SBAP task.
The source_root="YOUR_PATH/chembl_29_sqlite/chembl_29.db" means the path to the chembl29 sqllite file. The target_root="data/" specifies the folder to save the generated data.

Note that you can download the original chembl29 database with sqllite format from http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_29/chembl_29_sqlite.tar.gz.

The built-in configuration files are located in:
configs/curators/. Here we provide the 96 config files to reproduce the 96 datasets in our paper. Meanwhile, you can also customize your own datasets by changing the config files.

Run tools/curate.py to generate dataset. Here are some examples:

Generate datasets for the LBAP task, with assay as domain, core as noise level, IC50 as measurement type, LBAP as task type.:

python tools/curate.py --cfg configs/curators/lbap_core_ic50_assay.py

Generate datasets for the SBAP task, with protein as domain, refined as noise level, EC50 as measurement type, SBAP as task type.:

python tools/curate.py --cfg configs/curator/sbap_refined_ec50_protein.py

Benchmarking SOTA OOD Algorithms

Currently we support 6 different baseline algorithms:

  • ERM
  • IRM
  • GroupDro
  • Coral
  • MixUp
  • DANN

Meanwhile, we support various GNN backbones:

  • GIN
  • GCN
  • Weave
  • ShcNet
  • GAT
  • MGCN
  • NF
  • ATi-FPGNN
  • GTransformer

And different backbones for protein sequence modeling:

  • Bert
  • ProteinBert

How to Run?

Firstly, run the following command to install.

python setup.py develop

Run the LBAP task with ERM algorithm:

python tools/train.py configs/algorithms/erm/lbap_core_ec50_assay_erm.py

If you would like to run ERM on other datasets, change the corresponding options inside the above config file. For example, ann_file = 'data/lbap_core_ec50_assay.json' specifies the input data.

Similarly, run the SBAP task with ERM algorithm:

python tools/train.py configs/algorithms/erm/sbap_core_ec50_assay_erm.py

Reference

😄 If you find this repo is useful, please consider to cite our paper:

@ARTICLE{2022arXiv220109637J,
    author = {{Ji}, Yuanfeng and {Zhang}, Lu and {Wu}, Jiaxiang and {Wu}, Bingzhe and {Huang}, Long-Kai and {Xu}, Tingyang and {Rong}, Yu and {Li}, Lanqing and {Ren}, Jie and {Xue}, Ding and {Lai}, Houtim and {Xu}, Shaoyong and {Feng}, Jing and {Liu}, Wei and {Luo}, Ping and {Zhou}, Shuigeng and {Huang}, Junzhou and {Zhao}, Peilin and {Bian}, Yatao},
    title = "{DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations}",
    journal = {arXiv e-prints},
    keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods},
    year = 2022,
    month = jan,
    eid = {arXiv:2201.09637},
    pages = {arXiv:2201.09637},
    archivePrefix = {arXiv},
    eprint = {2201.09637},
    primaryClass = {cs.LG}
}

Disclaimer

This is not an officially supported Tencent product.

Owner
Research repositories.
LAMDA: Label Matching Deep Domain Adaptation

LAMDA: Label Matching Deep Domain Adaptation This is the implementation of the paper LAMDA: Label Matching Deep Domain Adaptation which has been accep

Tuan Nguyen 9 Sep 06, 2022
StableSims is an open-source project aimed at simulating MakerDAO's Dai stablecoin system

StableSims is an open-source project aimed at simulating MakerDAO's Dai stablecoin system, initially used for researching optimal incentive parameters for Liquidations 2.0.

Blockchain at Berkeley 52 Nov 21, 2022
Single Image Super-Resolution (SISR) with SRResNet, EDSR and SRGAN

Single Image Super-Resolution (SISR) with SRResNet, EDSR and SRGAN Introduction Image super-resolution (SR) is the process of recovering high-resoluti

8 Apr 15, 2022
This is the official repository for our paper: ''Pruning Self-attentions into Convolutional Layers in Single Path''.

Pruning Self-attentions into Convolutional Layers in Single Path This is the official repository for our paper: Pruning Self-attentions into Convoluti

Zhuang AI Group 77 Dec 26, 2022
Code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge.

Open Sesame This repository contains the code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge. Credits We built the project on t

9 Jul 24, 2022
Project for tracking occupancy in Tel-Aviv parking lots.

Ahuzat Dibuk - Tracking occupancy in Tel-Aviv parking lots main.py This module was set-up to be executed on Google Cloud Platform. I run it every 15 m

Geva Kipper 35 Nov 22, 2022
GenGNN: A Generic FPGA Framework for Graph Neural Network Acceleration

GenGNN: A Generic FPGA Framework for Graph Neural Network Acceleration Stefan Abi-Karam*, Yuqi He*, Rishov Sarkar*, Lakshmi Sathidevi, Zihang Qiao, Co

Sharc-Lab 19 Dec 15, 2022
YOLOv5 + ROS2 object detection package

YOLOv5-ROS YOLOv5 + ROS2 object detection package This program changes the input of detect.py (ultralytics/yolov5) to sensor_msgs/Image of ROS2. Requi

Ar-Ray 23 Dec 19, 2022
Code repository for Semantic Terrain Classification for Off-Road Autonomous Driving

BEVNet Datasets Datasets should be put inside data/. For example, data/semantic_kitti_4class_100x100. Training BEVNet-S Example: cd experiments bash t

(Brian) JoonHo Lee 24 Dec 12, 2022
AgeGuesser: deep learning based age estimation system. Powered by EfficientNet and Yolov5

AgeGuesser AgeGuesser is an end-to-end, deep-learning based Age Estimation system, presented at the CAIP 2021 conference. You can find the related pap

5 Nov 10, 2022
A Python framework for conversational search

Chatty Goose Multi-stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting Installation Ma

Castorini 36 Oct 23, 2022
This is the repository for our paper Ditch the Gold Standard: Re-evaluating Conversational Question Answering

Ditch the Gold Standard: Re-evaluating Conversational Question Answering This is the repository for our paper Ditch the Gold Standard: Re-evaluating C

Princeton Natural Language Processing 38 Dec 16, 2022
Simple implementation of OpenAI CLIP model in PyTorch.

It was in January of 2021 that OpenAI announced two new models: DALL-E and CLIP, both multi-modality models connecting texts and images in some way. In this article we are going to implement CLIP mod

Moein Shariatnia 226 Jan 05, 2023
Code for "Long-tailed Distribution Adaptation"

Long-tailed Distribution Adaptation (Accepted in ACM MM2021) This project is built upon BBN. Installation pip install -r requirements.txt Usage Traini

Zhiliang Peng 10 May 18, 2022
The Noise Contrastive Estimation for softmax output written in Pytorch

An NCE implementation in pytorch About NCE Noise Contrastive Estimation (NCE) is an approximation method that is used to work around the huge computat

Kaiyu Shi 287 Nov 25, 2022
Codebase for ECCV18 "The Sound of Pixels"

Sound-of-Pixels Codebase for ECCV18 "The Sound of Pixels". *This repository is under construction, but the core parts are already there. Environment T

Hang Zhao 318 Dec 20, 2022
Pytorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

flownet2-pytorch Pytorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. Multiple GPU training is supported, a

NVIDIA Corporation 2.8k Dec 27, 2022
simple demo codes for Learning to Teach with Dynamic Loss Functions

Learning to Teach with Dynamic Loss Functions This repo contains the simple demo for the NeurIPS-18 paper: Learning to Teach with Dynamic Loss Functio

Lijun Wu 15 Dec 30, 2021
Machine learning library for fast and efficient Gaussian mixture models

This repository contains code which implements the Stochastic Gaussian Mixture Model (S-GMM) for event-based datasets Dependencies CMake Premake4 Blaz

Omar Oubari 1 Dec 19, 2022
Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning.

Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning. Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive

<a href=[email protected](SZ)"> 7 Dec 16, 2021