Joint learning of images and text via maximization of mutual information

Last update: Dec 22, 2022

Overview

mutual_info_img_txt

Joint learning of images and text via maximization of mutual information.

This repository implements the algorithms presented in:
Ruizhi Liao, Daniel Moyer, Miriam Cha, Keegan Quigley, Seth Berkowitz, Steven Horng, Polina Golland, William M Wells. Multimodal Representation Learning via Maximization of Local Mutual Information. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2021.

This repo is a work-in-progress. As of now, we have released the code for joint representation learning of images and text by maximizing the mutual information between the feature embeddings of the two modalities. We demonstrate its application in learning from chest radiographs and radiology reports.

Instructions

Conda environment

Set up the conda environment using conda_environment.yml:

conda env create -f conda_environment.yml

BERT

Download the pre-trained BERT model, tokenizer, etc. from Dropbox. Download the whole folder bert_pretrain_all_notes_150000, which contains seven files, and pass the path to that folder to --bert_pretrained_dir.
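Before launching training, it can help to confirm that the download is complete. The sketch below checks that the folder exists and holds seven files, then builds a command line that passes it to --bert_pretrained_dir. The helper name build_train_command is hypothetical, and the sketch assumes that --bert_pretrained_dir is an option of train_img_txt.py; the names of the seven files are not listed here, so only their count is checked.

```python
import sys
from pathlib import Path

# The instructions above say the downloaded folder contains seven files.
EXPECTED_FILE_COUNT = 7


def build_train_command(bert_dir):
    """Check the BERT folder and return an argv list for the training script.

    Hypothetical helper: assumes --bert_pretrained_dir is accepted by
    train_img_txt.py.
    """
    bert_path = Path(bert_dir)
    if not bert_path.is_dir():
        raise FileNotFoundError(f"BERT directory not found: {bert_path}")

    # Count regular files only; subdirectories are ignored.
    files = [p for p in bert_path.iterdir() if p.is_file()]
    if len(files) != EXPECTED_FILE_COUNT:
        raise ValueError(
            f"expected {EXPECTED_FILE_COUNT} files in {bert_path}, "
            f"found {len(files)}"
        )

    return [
        sys.executable,
        "train_img_txt.py",
        "--bert_pretrained_dir",
        str(bert_path),
    ]
```

The returned list can be handed to subprocess.run(..., check=True) from the repository root, or joined with spaces and pasted into a shell.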

Model training

Train the model in an unsupervised fashion, i.e., by optimizing Eq. (2) of the paper cited above:

python train_img_txt.py
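Eq (2) is defined in the paper and is not reproduced here. As a rough illustration of what maximizing mutual information between paired image and text embeddings can look like, the sketch below computes an InfoNCE-style lower bound on mutual information for a small batch. This is not the code in train_img_txt.py: the function name info_nce_bound, the dot-product critic, and the toy embeddings in the usage note are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_bound(img_embs, txt_embs):
    """InfoNCE lower bound on mutual information, in nats.

    img_embs[i] and txt_embs[i] are assumed to be a matched pair; every
    other text in the batch serves as a negative for image i.
    """
    n = len(img_embs)
    total = 0.0
    for i in range(n):
        # Critic scores between image i and every text in the batch.
        scores = [dot(img_embs[i], txt_embs[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the scores.
        m = max(scores)
        log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
        # Log-softmax probability assigned to the matched text.
        total += scores[i] - log_sum
    # The bound is log(N) plus the mean log-softmax of the matched pairs.
    return math.log(n) + total / n
```

For example, with img = [[5.0, 0.0], [0.0, 5.0]], calling info_nce_bound(img, img) gives a value close to the ceiling of log(2) for a batch of two, while swapping the two text embeddings gives a much lower value. Training pushes the encoders toward the first situation.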

The first time you run model training, tokenizing the text may take a while. The tokenized data is saved for reuse, so later runs skip this step.
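The tokenize-once-then-reuse behavior described above follows a common caching pattern, sketched below. This is an illustration under stated assumptions, not the repository's code: load_or_tokenize and the cache path are hypothetical names, and the whitespace tokenizer is a stand-in for the BERT tokenizer.

```python
import pickle
from pathlib import Path


def tokenize(text):
    """Placeholder tokenizer (lowercase + whitespace split).

    Stand-in only; the repository uses a BERT tokenizer instead.
    """
    return text.lower().split()


def load_or_tokenize(texts, cache_path):
    """Return tokenized texts, reading them from cache_path when it exists."""
    cache_file = Path(cache_path)
    if cache_file.exists():
        # Later runs: reuse the saved result and skip tokenization.
        with cache_file.open("rb") as f:
            return pickle.load(f)

    # First run: tokenize everything, then save the result for next time.
    tokenized = [tokenize(t) for t in texts]
    with cache_file.open("wb") as f:
        pickle.dump(tokenized, f)
    return tokenized
```

One consequence of this pattern is that the cache is not refreshed when the input text changes, so a stale cache file has to be deleted by hand to force re-tokenization.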

Notes on Data

MIMIC-CXR

We have experimented with this algorithm on MIMIC-CXR, a large publicly available dataset of chest x-ray images with free-text radiology reports. The dataset contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA.

Example data

We provide 16 example image-text pairs to test the code, listed in training_chexpert_mini.csv.
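A quick way to inspect the example list before training is to read it with Python's csv module, as in the sketch below. The column layout of training_chexpert_mini.csv is not described here, so the code makes no assumption about column names and simply returns each row as a dictionary keyed by the header; load_pairs is a hypothetical helper name.

```python
import csv


def load_pairs(csv_path):
    """Read a CSV file with a header row into a list of dicts, one per row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Calling rows = load_pairs("training_chexpert_mini.csv") and printing len(rows) and list(rows[0].keys()) shows how many entries are listed and which columns are available. If the file stores one image-text pair per row, the count should be 16.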

Contact

Ruizhi (Ray) Liao: ruizhi [at] mit.edu


Owner

Ruizhi Liao
