Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network

Related tags

Deep LearningDeepCDR
Overview

DeepCDR

Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network

This work has been accepted to ECCB2020 and was also published in the journal Bioinformatics.

model

DeepCDR is a hybrid graph convolutional network for cancer drug response prediction. It takes both multi-omics data of cancer cell lines and drug structure as inputs and predicts the drug sensitivity (binary or contineous IC50 value).

Requirements

  • Keras==2.1.4
  • TensorFlow==1.13.1
  • hickle >= 2.1.0

Installation

DeepCDR can be downloaded by

git clone https://github.com/kimmo1019/DeepCDR

Installation has been tested in a Linux/MacOS platform.

Instructions

We provide detailed step-by-step instructions for running DeepCDR model including data preprocessing, model training, and model test.

Model implementation

Step 1: Data Preparing

Three types of raw data are required to generate genomic mutation matrix, gene expression matrix and DNA methylation matrix from CCLE database.

CCLE_mutations.csv - Genomic mutation profile from CCLE database

CCLE_expression.csv - Gene expression profile from CCLE database

CCLE_RRBS_TSS_1kb_20180614.txt - DNA methylation profile from CCLE database

The three types of raw data genomic mutation file, gene expression file and DNA methylation file can be downloaded from CCLE database or from our provided Cloud Server.

After data preprocessed, the three following preprocessed files will be in located in data folder.

genomic_mutation_34673_demap_features.csv -- genomic mutation matrix where each column denotes mutation locus and each row denotes a cell line

genomic_expression_561celllines_697genes_demap_features.csv -- gene expression matrix where each column denotes a coding gene and each row denotes a cell line

genomic_methylation_561celllines_808genes_demap_features.csv -- DNA methylation matrix where each column denotes a methylation locus and each row denotes a cell line

We recommend to start from the preprocessed data. Please note that each preprocessed file is in csv format, of which the column and row name are provided to speficy mutation location, gene name, methylation location and corresponding Cell line.

Step 2: Drug feature representation

Each drug in our study will be represented as a graph containing nodes and edges. From the GDSC database, we collected 223 drugs that have unique Pubchem ids. Note that a drug under different screening condition (different GDSC drug id) may share the same Pubchem id. Here, we used deepchem library for extracting node features and gragh of a drug. The node feature (75 dimension) corresponds to a stom in within a drug, which includes atom type, degree and hybridization, etc.

We recorded three types of features in a list as following

drug_feat = [node_feature, adj_list, degree_list]
node_feature - features of all atoms within a drug with size (nb_atom, 75)
adj_list - adjacent list of all atoms within a drug. It denotes the all the neighboring atoms indexs
degree_list - degree list of all atoms within a drug. It denotes the number of neighboring atoms 

The above feature list will be further compressed as pubchem_id.hkl using hickle library.

Please note that we provided the extracted features of 223 drugs from GDSC database, just unzip the drug_graph_feat.zip file in data/GDSC folder

Step 3: DeepCDR model training and testing

Here, we provide both DeepCDR regression and classification model here.

DeepCDR regression model

python run_DeepCDR.py -gpu_id [gpu_id] -use_mut [use_mut] -use_gexp [use_gexp] -use_methy [use_methy] 
[gpu_id] - set GPU card id (default:0)
[use_mut] - whether use genomic mutation data (default: True)
[use_gexp] - whether use gene expression data (default: True)
[use_methy] - whether use DNA methylation data (default: True)

One can run python run_DeepCDR.py -gpu_id 0 -use_mut True -use_gexp True -use_methy True to implement the DeepCDR regression model.

The trained model will be saved in data/checkpoint folder. The overall Pearson's correlation will be calculated.

DeepCDR classification model

python run_DeepCDR_classify.py -gpu_id [gpu_id] -use_mut [use_mut] -use_gexp [use_gexp] -use_methy [use_methy] 
[gpu_id] - set GPU card id (default:0)
[use_mut] - whether use genomic mutation data (default: True)
[use_gexp] - whether use gene expression data (default: True)
[use_methy] - whether use DNA methylation data (default: True)

One can run python run_DeepCDR_classify.py -gpu_id 0 -use_mut True -use_gexp True -use_methy True to implement the DeepCDR lassification model.

The trained model will be saved in data/checkpoint folder. The overall AUC and auPRn will be calculated.

External patient data

We also provided the external patient data downloaded from Firehose Broad GDAC. The patient data were preprocessed the same way as cell line data. The preprocessed data can be downloaded from our Server.

The preprocessed data contain three important files:

mut.csv - Genomic mutation profile of patients

expr.csv - Gene expression profile of patients

methy.csv - DNA methylation profile of patients

Note that the preprocessed patient data (csv format) have exact the same columns names as the three cell line data (genomic_mutation_34673_demap_features.csv, genomic_expression_561celllines_697genes_demap_features.csv, genomic_methylation_561celllines_808genes_demap_features.csv). The only difference is that the row name of patient data were replaced with patient unique barcode instead of cell line name.

Such format-consistent data is easy for external evaluation by repacing the cell line data with patient data.

Predicted missing data

As GDSC database only measured IC50 of part cell line and drug paires. We applied DeepCDR to predicted the missing IC50 values in GDSC database. The predicted results can be find at data/Missing_data_pre/records_pre_all.txt. Each record represents a predicted drug and cell line pair. The records were sorted by the predicted median IC50 values of a drug (see Fig.2E).

Contact

If you have any question regard our code or data, please do not hesitate to open a issue or directly contact me ([email protected])

Cite

If you used our work in your research, please consider citing our paper

Qiao Liu, Zhiqiang Hu, Rui Jiang, Mu Zhou, DeepCDR: a hybrid graph convolutional network for predicting cancer drug response, Bioinformatics, 2020, 36(2):i911-i918.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Owner
Qiao Liu
Qiao Liu
PyTorch code for the "Deep Neural Networks with Box Convolutions" paper

Box Convolution Layer for ConvNets Single-box-conv network (from `examples/mnist.py`) learns patterns on MNIST What This Is This is a PyTorch implemen

Egor Burkov 515 Dec 18, 2022
Pipeline for employing a Lightweight deep learning models for LOW-power systems

PL-LOW A high-performance deep learning model lightweight pipeline that gradually lightens deep neural networks in order to utilize high-performance d

POSTECH Data Intelligence Lab 9 Aug 13, 2022
[CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search The official implementation of the paper LightTra

Multimedia Research 290 Dec 24, 2022
Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch

Segformer - Pytorch Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch. Install $ pip install segformer-pytorch

Phil Wang 208 Dec 25, 2022
Tutorials and implementations for "Self-normalizing networks"

Self-Normalizing Networks Tutorials and implementations for "Self-normalizing networks"(SNNs) as suggested by Klambauer et al. (arXiv pre-print). Vers

Institute of Bioinformatics, Johannes Kepler University Linz 1.6k Jan 07, 2023
Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks

Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks - Official Project Page This repository contains the code develope

Amirsina Torfi 1.7k Dec 18, 2022
Official implementation of the paper "Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering"

Light Field Networks Project Page | Paper | Data | Pretrained Models Vincent Sitzmann*, Semon Rezchikov*, William Freeman, Joshua Tenenbaum, Frédo Dur

Vincent Sitzmann 130 Dec 29, 2022
Implementation of momentum^2 teacher

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning Requirements All experiments are done with python3.6, torch

jemmy li 121 Sep 26, 2022
Efficient training of deep recommenders on cloud.

HybridBackend Introduction HybridBackend is a training framework for deep recommenders which bridges the gap between evolving cloud infrastructure and

Alibaba 111 Dec 23, 2022
EsViT: Efficient self-supervised Vision Transformers

Efficient Self-Supervised Vision Transformers (EsViT) PyTorch implementation for EsViT, built with two techniques: A multi-stage Transformer architect

Microsoft 352 Dec 25, 2022
Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Jiaxi Jiang 282 Jan 02, 2023
Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Implicit3DUnderstanding (Im3D) [Project Page] Holistic 3D Scene Understanding from a Single Image with Implicit Representation Cheng Zhang, Zhaopeng C

Cheng Zhang 149 Jan 08, 2023
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

16 Dec 14, 2022
Simple keras FCN Encoder/Decoder model for MS-COCO (food subset) segmentation

FCN_MSCOCO_Food_Segmentation Simple keras FCN Encoder/Decoder model for MS-COCO (food subset) segmentation Input data: [http://mscoco.org/dataset/#ove

Alexander Kalinovsky 11 Jan 08, 2019
PyTorch implementation of the TTC algorithm

Trust-the-Critics This repository is a PyTorch implementation of the TTC algorithm and the WGAN misalignment experiments presented in Trust the Critic

0 Nov 29, 2021
Ipython notebook presentations for getting starting with basic programming, statistics and machine learning techniques

Data Science 45-min Intros Every week*, our data science team @Gnip (aka @TwitterBoulder) gets together for about 50 minutes to learn something. While

Scott Hendrickson 1.6k Dec 31, 2022
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

MMdnn MMdnn is a comprehensive and cross-framework tool to convert, visualize and diagnose deep learning (DL) models. The "MM" stands for model manage

Microsoft 5.7k Jan 09, 2023
An end-to-end library for editing and rendering motion of 3D characters with deep learning [SIGGRAPH 2020]

Deep-motion-editing This library provides fundamental and advanced functions to work with 3D character animation in deep learning with Pytorch. The co

1.2k Dec 29, 2022
Implementation of "Fast and Flexible Temporal Point Processes with Triangular Maps" (Oral @ NeurIPS 2020)

Fast and Flexible Temporal Point Processes with Triangular Maps This repository includes a reference implementation of the algorithms described in "Fa

Oleksandr Shchur 20 Dec 02, 2022
Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, L

3 Dec 02, 2022