Official implementation of the method ContIG, for self-supervised learning from medical imaging with genomics

Related tags

Deep LearningContIG
Overview

ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics

This is the code implementation of the paper "ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics".

If you find this repository useful, please consider citing our paper in your work:

@misc{contig2021,
      title={ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics}, 
      author={Aiham Taleb and Matthias Kirchler and Remo Monti and Christoph Lippert},
      year={2021},
      eprint={2111.13424},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

To run the experiments, you will have to have access to UK Biobank data (requires application) and will need to set up the data modalities properly.

We handle the paths to different external files with the paths.toml. Model checkpoints are stored in CHECKPOINTS_BASE_PATH (='checkpoints' by default). For some parts, we use plink and plink2 software, which you can download from here and here. Unzip and set the corresponding paths in the paths.toml file.

Python

Install the dependencies via

conda env create --file environment.yml

Setting up image data

See image_preprocessing for the code. We first use resize.py to find the retinal fundus circle, crop to that part of the image, and then filter out the darkest and brightest images with filtering_images.py.

After preprocessing the images, make sure to set BASE_IMG in paths.toml to the directory that contains the directories {left|right}/512_{left|right}/processed/.

Ancestry prediction

We only included individuals that were genetically most likely to be of european ancestry. We used the genotype-based prediction pipeline GenoPred; see documentation on the site, and put the path to the output (a .model_pred file in tsv format) into the ANCESTRY variable in paths.toml.

This ancestry prediction can also be replaced by the UKB variable 22006. In this case, create a tsv file with two columns, IID and EUR; set EUR = 1 for caucasians and EUR = 0 for others, and point the ANCESTRY variable in paths.toml to this file. Explicit ancestry prediction and the caucasian variable are mostly identical, but our ancestry prediction is a little more lenient and includes a few more individuals.

Setting up genetic data

We use three different genetic modalities in the paper.

Setting up Raw SNPs

Raw SNPs work mostly without preprocessing and use the basic microarray data from UKB. Make sure to set the BASE_GEN path in paths.toml to the directory that contains all the bed/bim/fam files from the UKB.

Setting up Polygenic Scores

PGS requires the imputed data. See the pgs directory for a reference to set everything up. Make sure to update the BASE_PGS to point to the output directory from that. We also include a list of scores used in the main paper.

Setting up Burden Scores

Burden scores are computed using the whole exome sequencing release from the UKB. We used faatpipe to preprocess this data; see there for details. Update the BASE_BURDEN variable in paths.toml to include the results (should point to a directory with combined_burdens_colnames.txt, combined_burdens_iid.txt and combined_burdens.h5).

Setting up phenotypic UKB data

Point the UKB_PHENO_FILE variable in paths.toml to the full phenotype csv file from the UKB data release and run export_card() from data.data_ukb.py to preprocess the data (only needs to be run once; there may be a bug with pandas >= 1.3 on some systems, so consider using pandas = 1.2.5 for this step).

You can ignore the BLOOD_BIOMARKERS variable, since it's not used in any of the experiments.

Setting up downstream tasks

Download and unzip the downstream tasks from PALM, RFMiD and APTOS and point the {PALM|RFMID|APTOS}_PATH variables in paths.toml correspondingly.

UKB downstream tasks are set up with the main UKB set above.

Training self-supervised models

ContIG

In order to train models with our method ContIG, use the script train_contig.py. In this script, it is possible to set many of the constants used in training, such as IMG_SIZE, BATCH_SIZE, LR, CM_EMBEDDING_SIZE, GENETICS_MODALITY and many others. We provide default values at the beginning of this script, which we use in our reported values. Please make sure to set the paths to datasets in paths.toml beforehand.

Baseline models

In order to train the baseline models, each script is named after the algorithm: SimCLR simclr.py, NNCLR nnclr.py, Simsiam simsiam.py, Barlow Twins barlow_twins.py, and BYOL byol.py

Each of these scripts allow for setting all the relevant hyper-parameters for training these baselines, such as max_epochs, PROJECTION_DIM, TEMPERATURE, and others. Please make sure to set the paths to datasets in paths.toml beforehand.

Evaluating Models

To fine-tune (=train) the models on downstream tasks, the following scripts are the starting points:

  • For APTOS Retinopathy detection: use aptos_diabetic_retinopathy.py
  • For RFMiD Multi-Disease classification: use rfmid_retinal_disease_classification.py
  • For PALM Myopia Segmentation: use palm_myopia_segmentation.py
  • For UK Biobank Cardiovascular discrete risk factors classification: use ukb_covariate_classification.py
  • For UK Biobank Cardiovascular continuous risk factors prediction (regression): use ukb_covariate_prediction.py

Each of the above scripts defines its hyper-parameters at the beginning of the respective files. A common variable however is CHECKPOINT_PATH, whose default value is None. If set to None, this means to train the model from scratch without loading any pretrained checkpoint. Otherwise, it loads the encoder weights from pretrained models.

Running explanations

Global explanations

Global explanations are implemented in feature_explanations.py. See the final_plots function for an example to create explanations with specific models.

Local explanations

Local explanations are implemented in local_explanations.py. Individuals for which to create explanations can be set with the INDIVIDUALS variable. See the final_plots function for an example to create explanations with specific models.

Running the GWAS

The GWAS is implemented in downstream_gwas.py. You can specify models for which to run the GWAS in the WEIGHT_PATHS dict and then run the run_all_gwas function to iterate over this dict.

Owner
Digital Health & Machine Learning
Digital Health & Machine Learning
SEC'21: Sparse Bitmap Compression for Memory-Efficient Training onthe Edge

Training Deep Learning Models on The Edge Training on the Edge enables continuous learning from new data for deployed neural networks on memory-constr

Brown University Scale Lab 4 Nov 18, 2022
ArtEmis: Affective Language for Art

ArtEmis: Affective Language for Art Created by Panos Achlioptas, Maks Ovsjanikov, Kilichbek Haydarov, Mohamed Elhoseiny, Leonidas J. Guibas Introducti

Panos 268 Dec 12, 2022
Unofficial implementation of Fast-SCNN: Fast Semantic Segmentation Network

Fast-SCNN: Fast Semantic Segmentation Network Unofficial implementation of the model architecture of Fast-SCNN. Real-time Semantic Segmentation and mo

Philip Popien 69 Aug 11, 2022
Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

1 Jan 23, 2022
Specification language for generating Generalized Linear Models (with or without mixed effects) from conceptual models

tisane Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships TL;DR: Analysts can use Tisane to author gener

Eunice Jun 11 Nov 15, 2022
BraTs-VNet - BraTS(Brain Tumour Segmentation) using V-Net

BraTS(Brain Tumour Segmentation) using V-Net This project is an approach to dete

Rituraj Dutta 7 Nov 27, 2022
MvtecAD unsupervised Anomaly Detection

MvtecAD unsupervised Anomaly Detection This respository is the unofficial implementations of DFR: Deep Feature Reconstruction for Unsupervised Anomaly

0 Feb 25, 2022
Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets.

Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets. Introduction We propose our dataloader API for loading and

1 Nov 19, 2021
Demo code for ICCV 2021 paper "Sensor-Guided Optical Flow"

Sensor-Guided Optical Flow Demo code for "Sensor-Guided Optical Flow", ICCV 2021 This code is provided to replicate results with flow hints obtained f

10 Mar 16, 2022
A denoising autoencoder + adversarial losses and attention mechanisms for face swapping.

faceswap-GAN Adding Adversarial loss and perceptual loss (VGGface) to deepfakes'(reddit user) auto-encoder architecture. Updates Date Update 2018-08-2

3.2k Dec 30, 2022
The challenge for Quantum Coalition Hackathon 2021

Qchack 2021 Google Challenge This is a challenge for the brave 2021 qchack.io participants. Instructions Hello, intrepid qchacker, welcome to the G|o

quantumlib 18 May 04, 2022
Python implementation of the multistate Bennett acceptance ratio (MBAR)

pymbar Python implementation of the multistate Bennett acceptance ratio (MBAR) method for estimating expectations and free energy differences from equ

Chodera lab // Memorial Sloan Kettering Cancer Center 169 Dec 02, 2022
A Keras implementation of YOLOv3 (Tensorflow backend)

keras-yolo3 Introduction A Keras implementation of YOLOv3 (Tensorflow backend) inspired by allanzelener/YAD2K. Quick Start Download YOLOv3 weights fro

7.1k Jan 03, 2023
Neural Scene Flow Prior (NeurIPS 2021 spotlight)

Neural Scene Flow Prior Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey Will appear on Thirty-fifth Conference on Neural Information Processing Syste

Lilac Lee 85 Jan 03, 2023
中文语音识别系列,读者可以借助它快速训练属于自己的中文语音识别模型,或直接使用预训练模型测试效果。

MASR中文语音识别(pytorch版) 开箱即用 自行训练 使用与训练分离(增量训练) 识别率高 说明:因为每个人电脑机器不同,而且有些安装包安装起来比较麻烦,强烈建议直接用我编译好的docker环境跑 目前docker基础环境为ubuntu-cuda10.1-cudnn7-pytorch1.6.

发送小信号 180 Dec 17, 2022
Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework

VFedPCA+VFedAKPCA This is the official source code for the Paper: Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-

John 9 Sep 18, 2022
SOLOv2 on onnx & tensorRT

SOLOv2.tensorRT: NOTE: code based on WXinlong/SOLO add support to TensorRT inference onnxruntime tensorRT full_dims and dynamic shape postprocess with

47 Nov 26, 2022
Resilience from Diversity: Population-based approach to harden models against adversarial attacks

Resilience from Diversity: Population-based approach to harden models against adversarial attacks Requirements To install requirements: pip install -r

0 Nov 23, 2021
Deep Learning for Human Part Discovery in Images - Chainer implementation

Deep Learning for Human Part Discovery in Images - Chainer implementation NOTE: This is not official implementation. Original paper is Deep Learning f

Shintaro Shiba 63 Sep 25, 2022
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021]

piglet PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021] This repo contains code and data for PIGLeT. If you like

Rowan Zellers 51 Oct 08, 2022