Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network

Related tags

Deep LearningDeepCDR
Overview

DeepCDR

Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network

This work has been accepted to ECCB2020 and was also published in the journal Bioinformatics.

model

DeepCDR is a hybrid graph convolutional network for cancer drug response prediction. It takes both multi-omics data of cancer cell lines and drug structure as inputs and predicts the drug sensitivity (binary or contineous IC50 value).

Requirements

  • Keras==2.1.4
  • TensorFlow==1.13.1
  • hickle >= 2.1.0

Installation

DeepCDR can be downloaded by

git clone https://github.com/kimmo1019/DeepCDR

Installation has been tested in a Linux/MacOS platform.

Instructions

We provide detailed step-by-step instructions for running DeepCDR model including data preprocessing, model training, and model test.

Model implementation

Step 1: Data Preparing

Three types of raw data are required to generate genomic mutation matrix, gene expression matrix and DNA methylation matrix from CCLE database.

CCLE_mutations.csv - Genomic mutation profile from CCLE database

CCLE_expression.csv - Gene expression profile from CCLE database

CCLE_RRBS_TSS_1kb_20180614.txt - DNA methylation profile from CCLE database

The three types of raw data genomic mutation file, gene expression file and DNA methylation file can be downloaded from CCLE database or from our provided Cloud Server.

After data preprocessed, the three following preprocessed files will be in located in data folder.

genomic_mutation_34673_demap_features.csv -- genomic mutation matrix where each column denotes mutation locus and each row denotes a cell line

genomic_expression_561celllines_697genes_demap_features.csv -- gene expression matrix where each column denotes a coding gene and each row denotes a cell line

genomic_methylation_561celllines_808genes_demap_features.csv -- DNA methylation matrix where each column denotes a methylation locus and each row denotes a cell line

We recommend to start from the preprocessed data. Please note that each preprocessed file is in csv format, of which the column and row name are provided to speficy mutation location, gene name, methylation location and corresponding Cell line.

Step 2: Drug feature representation

Each drug in our study will be represented as a graph containing nodes and edges. From the GDSC database, we collected 223 drugs that have unique Pubchem ids. Note that a drug under different screening condition (different GDSC drug id) may share the same Pubchem id. Here, we used deepchem library for extracting node features and gragh of a drug. The node feature (75 dimension) corresponds to a stom in within a drug, which includes atom type, degree and hybridization, etc.

We recorded three types of features in a list as following

drug_feat = [node_feature, adj_list, degree_list]
node_feature - features of all atoms within a drug with size (nb_atom, 75)
adj_list - adjacent list of all atoms within a drug. It denotes the all the neighboring atoms indexs
degree_list - degree list of all atoms within a drug. It denotes the number of neighboring atoms 

The above feature list will be further compressed as pubchem_id.hkl using hickle library.

Please note that we provided the extracted features of 223 drugs from GDSC database, just unzip the drug_graph_feat.zip file in data/GDSC folder

Step 3: DeepCDR model training and testing

Here, we provide both DeepCDR regression and classification model here.

DeepCDR regression model

python run_DeepCDR.py -gpu_id [gpu_id] -use_mut [use_mut] -use_gexp [use_gexp] -use_methy [use_methy] 
[gpu_id] - set GPU card id (default:0)
[use_mut] - whether use genomic mutation data (default: True)
[use_gexp] - whether use gene expression data (default: True)
[use_methy] - whether use DNA methylation data (default: True)

One can run python run_DeepCDR.py -gpu_id 0 -use_mut True -use_gexp True -use_methy True to implement the DeepCDR regression model.

The trained model will be saved in data/checkpoint folder. The overall Pearson's correlation will be calculated.

DeepCDR classification model

python run_DeepCDR_classify.py -gpu_id [gpu_id] -use_mut [use_mut] -use_gexp [use_gexp] -use_methy [use_methy] 
[gpu_id] - set GPU card id (default:0)
[use_mut] - whether use genomic mutation data (default: True)
[use_gexp] - whether use gene expression data (default: True)
[use_methy] - whether use DNA methylation data (default: True)

One can run python run_DeepCDR_classify.py -gpu_id 0 -use_mut True -use_gexp True -use_methy True to implement the DeepCDR lassification model.

The trained model will be saved in data/checkpoint folder. The overall AUC and auPRn will be calculated.

External patient data

We also provided the external patient data downloaded from Firehose Broad GDAC. The patient data were preprocessed the same way as cell line data. The preprocessed data can be downloaded from our Server.

The preprocessed data contain three important files:

mut.csv - Genomic mutation profile of patients

expr.csv - Gene expression profile of patients

methy.csv - DNA methylation profile of patients

Note that the preprocessed patient data (csv format) have exact the same columns names as the three cell line data (genomic_mutation_34673_demap_features.csv, genomic_expression_561celllines_697genes_demap_features.csv, genomic_methylation_561celllines_808genes_demap_features.csv). The only difference is that the row name of patient data were replaced with patient unique barcode instead of cell line name.

Such format-consistent data is easy for external evaluation by repacing the cell line data with patient data.

Predicted missing data

As GDSC database only measured IC50 of part cell line and drug paires. We applied DeepCDR to predicted the missing IC50 values in GDSC database. The predicted results can be find at data/Missing_data_pre/records_pre_all.txt. Each record represents a predicted drug and cell line pair. The records were sorted by the predicted median IC50 values of a drug (see Fig.2E).

Contact

If you have any question regard our code or data, please do not hesitate to open a issue or directly contact me ([email protected])

Cite

If you used our work in your research, please consider citing our paper

Qiao Liu, Zhiqiang Hu, Rui Jiang, Mu Zhou, DeepCDR: a hybrid graph convolutional network for predicting cancer drug response, Bioinformatics, 2020, 36(2):i911-i918.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Owner
Qiao Liu
Qiao Liu
[BMVC 2021] Official PyTorch Implementation of Self-supervised learning of Image Scale and Orientation Estimation

Self-Supervised Learning of Image Scale and Orientation Estimation (BMVC 2021) This is the official implementation of the paper "Self-Supervised Learn

Jongmin Lee 17 Nov 10, 2022
Code accompanying our paper Feature Learning in Infinite-Width Neural Networks

Empirical Experiments in "Feature Learning in Infinite-width Neural Networks" This repo contains code to replicate our experiments (Word2Vec, MAML) in

Edward Hu 37 Dec 14, 2022
CVNets: A library for training computer vision networks

CVNets: A library for training computer vision networks This repository contains the source code for training computer vision models. Specifically, it

Apple 1.1k Jan 03, 2023
Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data recorded in NumPy array

shindo.py Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data stored in NumPy array Introduction Japa

RR_Inyo 3 Sep 23, 2022
A computer vision pipeline to identify the "icons" in Christian paintings

Christian-Iconography A computer vision pipeline to identify the "icons" in Christian paintings. A bit about iconography. Iconography is related to id

Rishab Mudliar 3 Jul 30, 2022
deep-prae

Deep Probabilistic Accelerated Evaluation (Deep-PrAE) Our work presents an efficient rare event simulation methodology for black box autonomy using Im

Safe AI Lab 4 Apr 17, 2021
MusicYOLO framework uses the object detection model, YOLOx, to locate notes in the spectrogram.

MusicYOLO MusicYOLO framework uses the object detection model, YOLOX, to locate notes in the spectrogram. Its performance on the ISMIR2014 dataset, MI

Xianke Wang 2 Aug 02, 2022
Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution Figure: Example visualization of the method and baseline as a

Oliver Hahn 16 Dec 23, 2022
Nodule Generation Algorithm Baseline and template code for node21 generation track

Nodule Generation Algorithm This codebase implements a simple baseline model, by following the main steps in the paper published by Litjens et al. for

node21challenge 10 Apr 21, 2022
This is an implementation for the CVPR2020 paper "Learning Invariant Representation for Unsupervised Image Restoration"

Learning Invariant Representation for Unsupervised Image Restoration (CVPR 2020) Introduction This is an implementation for the paper "Learning Invari

GarField 88 Nov 07, 2022
A blender add-on that automatically re-aligns wrong axis objects.

Auto Align A blender add-on that automatically re-aligns wrong axis objects. Usage There are three options available in the 3D Viewport Sidebar It

29 Nov 25, 2022
Focal and Global Knowledge Distillation for Detectors

FGD Paper: Focal and Global Knowledge Distillation for Detectors Install MMDetection and MS COCO2017 Our codes are based on MMDetection. Please follow

Mesopotamia 261 Dec 23, 2022
ExCon: Explanation-driven Supervised Contrastive Learning

ExCon: Explanation-driven Supervised Contrastive Learning Link to the paper: https://arxiv.org/pdf/2111.14271.pdf Contributors of this repo: Zhibo Zha

Zhibo (Darren) Zhang 18 Nov 01, 2022
Predictive AI layer for existing databases.

MindsDB is an open-source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning

MindsDB Inc 12.2k Jan 03, 2023
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

107 Dec 02, 2022
Code for paper [ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot] (ICCV 2021, oral))

ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot This repository is the official PyTorch implementation of ICCV-21 pape

Jiarui 21 May 09, 2022
TCube generates rich and fluent narratives that describes the characteristics, trends, and anomalies of any time-series data (domain-agnostic) using the transfer learning capabilities of PLMs.

TCube: Domain-Agnostic Neural Time series Narration This repository contains the code for the paper: "TCube: Domain-Agnostic Neural Time series Narrat

Mandar Sharma 7 Oct 31, 2021
Repository for reproducing `Model-Based Robust Deep Learning`

Model-Based Robust Deep Learning (MBRDL) In this repository, we include the code necessary for reproducing the code used in Model-Based Robust Deep Le

Alex Robey 16 Sep 19, 2022
Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.

Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.

2.7k Jan 05, 2023
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias

Counterfactual VQA (CF-VQA) This repository is the Pytorch implementation of our paper "Counterfactual VQA: A Cause-Effect Look at Language Bias" in C

Yulei Niu 94 Dec 03, 2022