The easiest tool for extracting radiomics features and training ML models on them.

Last update: Aug 04, 2022

Overview

Simple pipeline for experimenting with radiomics features

Installation

git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git
cd ClassyRadiomics
pip install -e .

Example - Hydronephrosis detection from CT images:

Extract radiomics features and save them to CSV table

import pandas as pd

# table_dir is a pathlib.Path to the folder that holds paths.csv
df = pd.read_csv(table_dir / "paths.csv")
extractor = FeatureExtractor(
    df=df,
    out_path=(table_dir / "features.csv"),
    image_col="img_path",
    mask_col="seg_path",
    verbose=True,
)
extractor.extract_features()
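
The extractor above reads one row per case from paths.csv and looks up the image and mask locations in the columns named by image_col and mask_col. A minimal sketch of building such a table with the standard library could look as follows. Only the img_path and seg_path column names come from the example; the ID column, the file names, and the helper write_paths_table are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_paths_table(cases, out_path):
    """Write one row per case with the columns used in the extractor example.

    `cases` is an iterable of (case_id, image_path, mask_path) tuples.
    The "ID" column is an assumption added for readability.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["ID", "img_path", "seg_path"])
        writer.writeheader()
        for case_id, img, seg in cases:
            writer.writerow(
                {"ID": case_id, "img_path": str(img), "seg_path": str(seg)}
            )
    return out_path


# Hypothetical layout: one image and one segmentation mask per case
table_dir = Path(tempfile.mkdtemp())
example_cases = [
    ("case_001", "data/case_001/image.nii.gz", "data/case_001/mask.nii.gz"),
    ("case_002", "data/case_002/image.nii.gz", "data/case_002/mask.nii.gz"),
]
paths_csv = write_paths_table(example_cases, table_dir / "paths.csv")
```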

Create a dataset from the features table

feature_df = pd.read_csv(table_dir / "features.csv")
data = Dataset(
    dataframe=feature_df,
    features=feature_cols,
    target="Hydronephrosis",
    task_name="Hydronephrosis detection"
)
data.cross_validation_split_test_from_column(
    column_name="cohort", test_value="control"
)
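
Judging by its name and arguments, the split call above holds out the rows whose cohort column equals "control" as the test set and cross-validates on the remaining rows. The sketch below illustrates that idea on plain row indices. It is an assumption about the behaviour, not the library's implementation; split_test_from_column, n_folds, and the toy rows are hypothetical.

```python
def split_test_from_column(rows, column_name, test_value, n_folds=5):
    """Hold out rows where row[column_name] == test_value as the test set,
    then assign the remaining rows to n_folds cross-validation folds."""
    test_idx = [i for i, row in enumerate(rows) if row[column_name] == test_value]
    train_idx = [i for i, row in enumerate(rows) if row[column_name] != test_value]

    # Round-robin fold assignment; a real pipeline would usually
    # shuffle and stratify by label instead.
    folds = []
    for k in range(n_folds):
        val = train_idx[k::n_folds]
        val_set = set(val)
        train = [i for i in train_idx if i not in val_set]
        folds.append((train, val))
    return folds, test_idx


# Toy table: every fourth row belongs to the held-out cohort
rows = [{"cohort": "control" if i % 4 == 0 else "study"} for i in range(20)]
folds, test_idx = split_test_from_column(rows, "cohort", "control", n_folds=5)
```

Each entry of folds is a (train, validation) pair of row indices, and no test index appears in any fold.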

Select classifiers to compare

classifier_names = [
    "Gaussian Process Classifier",
    "Logistic Regression",
    "SVM",
    "Random Forest",
    "XGBoost",
]
classifiers = [MLClassifier(name) for name in classifier_names]

Create an evaluator to train and evaluate selected classifiers

evaluator = Evaluator(dataset=data, models=classifiers)
evaluator.evaluate_cross_validation()
evaluator.boxplot_by_class()
evaluator.plot_all_cross_validation()
evaluator.plot_test()

Comments

Preprocessing features fails during machine learning

Describe the bug

Running the machine learning step fails both in the self-hosted web app and in example_WORC.ipynb.

Steps/Code to Reproduce

import pandas as pd
from pathlib import Path
from autorad.external.download_WORC import download_WORCDatabase

# Set where we will save our data and results
base_dir = Path.cwd() / "autorad_tutorial"
data_dir = base_dir / "data"
result_dir = base_dir / "results"
data_dir.mkdir(exist_ok=True, parents=True)
result_dir.mkdir(exist_ok=True, parents=True)

%load_ext autoreload
%autoreload 2

# download data (it may take a few minutes)
download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
)

from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case

# create a table with all the paths
paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
paths_df.sample(5)


from autorad.data.dataset import ImageDataset
from autorad.feature_extraction.extractor import FeatureExtractor
import logging

logging.getLogger().setLevel(logging.CRITICAL)

image_dataset = ImageDataset(
    paths_df,
    ID_colname="ID",
    root_dir=data_dir,
)

# Let's take a look at the data, plotting random 10 cases
image_dataset.plot_examples(n=10, window=None)

extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
feature_df = extractor.run()

feature_df.head()

label_df = pd.read_csv(data_dir / "labels.csv")
label_df.sample(5)

from autorad.data.dataset import FeatureDataset

merged_feature_df = feature_df.merge(label_df, left_on="ID",
    right_on="patient_ID", how="left")
feature_dataset = FeatureDataset(
    merged_feature_df,
    target="diagnosis",
    ID_colname="ID"
)

splits_path = result_dir / "splits.json"
feature_dataset.split(method="train_val_test", save_path=splits_path)

from autorad.models.classifier import MLClassifier
from autorad.training.trainer import Trainer

models = MLClassifier.initialize_default_sklearn_models()
print(models)

trainer = Trainer(
    dataset=feature_dataset,
    models=models,
    result_dir=result_dir,
    experiment_name="Fibromatosis_vs_sarcoma_classification",
)
trainer.run_auto_preprocessing(
        selection_methods=["boruta"],
        oversampling=False,
        )

Expected Results

The trainer is initialised and preprocessing runs on the features without errors.

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [15], in <cell line: 7>()
      1 trainer = Trainer(
      2     dataset=feature_dataset,
      3     models=models,
      4     result_dir=result_dir,
      5     experiment_name="Fibromatosis_vs_sarcoma_classification",
      6 )
----> 7 trainer.run_auto_preprocessing(
      8         selection_methods=["boruta"],
      9         oversampling=False,
     10         )

File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
     70 preprocessor = Preprocessor(
     71     normalize=True,
     72     feature_selection_method=selection_method,
     73     oversampling_method=oversampling_method,
     74 )
     75 try:
     76     preprocessed[selection_method][
     77         oversampling_method
---> 78     ] = preprocessor.fit_transform(self.dataset.data)
     79 except AssertionError:
     80     log.error(
     81         f"Preprocessing with {selection_method} and {oversampling_method} failed."
     82     )

File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
     64 result_y = {}
     65 all_features = X.train.columns.tolist()
---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
     67     X.train, y.train
     68 )
     69 self.selected_features = self.pipeline["select"].selected_features(
     70     column_names=all_features
     71 )
     72 result_X["train"] = pd.DataFrame(
     73     X_train_trans, columns=self.selected_features
     74 )

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
    432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    433 if hasattr(last_step, "fit_transform"):
--> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
    435 else:
    436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)

File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
     44 def fit_transform(
     45     self, X: np.ndarray, y: np.ndarray
     46 ) -> tuple[np.ndarray, np.ndarray]:
---> 47     self.fit(X, y)
     48     return X[:, self.selected_columns], y

File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
    122 with warnings.catch_warnings():
    123     warnings.simplefilter("ignore")
--> 124     model.fit(X, y)
    125 self.selected_columns = np.where(model.support_)[0].tolist()
    126 if not self.selected_columns:

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
    188 def fit(self, X, y):
    189     """
    190     Fits the Boruta feature selection with the provided estimator.
    191 
   (...)
    198         The target values.
    199     """
--> 201     return self._fit(X, y)

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
    249 def _fit(self, X, y):
    250     # check input params
--> 251     self._check_params(X, y)
    252     self.random_state = check_random_state(self.random_state)
    253     # setup variables for Boruta

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
    513 """
    514 Check hyperparameters as well as X and y before proceeding with fit.
    515 """
    516 # check X and y are consistent len, X is Array and y is column
--> 517 X, y = check_X_y(X, y)
    518 if self.perc <= 0 or self.perc > 100:
    519     raise ValueError('The percentile should be between 0 and 100.')

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    961 if y is None:
    962     raise ValueError("y cannot be None")
--> 964 X = check_array(
    965     X,
    966     accept_sparse=accept_sparse,
    967     accept_large_sparse=accept_large_sparse,
    968     dtype=dtype,
    969     order=order,
    970     copy=copy,
    971     force_all_finite=force_all_finite,
    972     ensure_2d=ensure_2d,
    973     allow_nd=allow_nd,
    974     ensure_min_samples=ensure_min_samples,
    975     ensure_min_features=ensure_min_features,
    976     estimator=estimator,
    977 )
    979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
    981 check_consistent_length(X, y)

File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    744         array = array.astype(dtype, casting="unsafe", copy=False)
    745     else:
--> 746         array = np.asarray(array, order=order, dtype=dtype)
    747 except ComplexWarning as complex_warning:
    748     raise ValueError(
    749         "Complex data not supported\n{}\n".format(array)
    750     ) from complex_warning

ValueError: could not broadcast input array from shape (60,1015) into shape (60,)

opened by wagon-master 3

BUG: Time and memory inefficient concating in pandas on every case.

During feature extraction, we concatenate a pd.DataFrame for every case. AFAIK, constructing a pd.DataFrame this way triggers a new memory allocation (and a copy) every time, which is highly memory-inefficient. When parallelized over many CPUs and combined with the already memory-intensive forking in joblib, this can lead to OOM events (and is, of course, slow). Wouldn't it be more convenient to return only the feature set that is currently being processed?

https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115

These are subsequently collected in results anyway:

https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144
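
The pattern proposed here can be sketched as follows: each worker returns only a plain dict for its own case, and the table is assembled once after all cases are done. This is a minimal illustration rather than the repository's code. extract_case and its feature values are hypothetical stand-ins, and ThreadPoolExecutor is used in place of joblib only to keep the sketch free of third-party dependencies.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_case(case_id):
    """Stand-in for per-case extraction: returns only this case's features."""
    # Hypothetical feature values; a real extractor would compute them
    # from the image and its segmentation mask.
    return {
        "ID": case_id,
        "feature_a": len(case_id),
        "feature_b": case_id.count("0"),
    }


def extract_all(case_ids, n_workers=2):
    """Run extraction per case and collect plain dicts.

    No table is built or concatenated inside the workers.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in the same order as case_ids
        return list(pool.map(extract_case, case_ids))


feature_rows = extract_all(["case_001", "case_002", "case_003"])
```

The resulting list of dicts can then be turned into a table in a single call, for example pd.DataFrame(feature_rows), instead of concatenating a new DataFrame for every case.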

opened by laqua-stack 2

Feature/add inference mlflow

Major changes:

fixed training with autologging of training parameters, preprocessor and classifier in MLflow

webapp: added Predict subpage for inference on a single case, returning class probability and SHAP explanation

webapp: moved all steps into subpages

webapp: added Getting started to the landing page

Fixes:

webapp: fixed extraction params discarding Feature Names selected from Feature Classes

opened by pwoznicki 1

example_WORC.ipynb not being up to date with the repository

Describe the bug

In example_WORC.ipynb, some function calls no longer work because the repository code changed and the notebook was not updated to reflect those changes.

Steps/Code to Reproduce

import pandas as pd
from pathlib import Path
from autorad.external.download_WORC import download_WORCDatabase

# Set where we will save our data and results
base_dir = Path.cwd() / "autorad_tutorial"
data_dir = base_dir / "data"
result_dir = base_dir / "results"
data_dir.mkdir(exist_ok=True, parents=True)
result_dir.mkdir(exist_ok=True, parents=True)

%load_ext autoreload
%autoreload 2

# download data (it may take a few minutes)
download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
)



from autorad.data.utils import get_paths_with_separate_folder_per_case  # 1

# create a table with all the paths
paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
paths_df.sample(5)


from autorad.data.dataset import ImageDataset
from autorad.feature_extraction.extractor import FeatureExtractor
import logging

logging.getLogger().setLevel(logging.CRITICAL)

image_dataset = ImageDataset(
    paths_df,
    ID_colname="ID",
    root_dir=data_dir,
)

# Let's take a look at the data, plotting random 10 cases
image_dataset.plot_examples(n=10, window=None)

extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") # 2
feature_df = extractor.run()

Expected Results

1: Importing the function get_paths_with_separate_folder_per_case

2: Using default_MR.yaml as value for extraction_params

Actual Results

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 from autorad.data.utils import get_paths_with_separate_folder_per_case
      3 # create a table with all the paths
      4 paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)

ModuleNotFoundError: No module named 'autorad.data.utils'

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml")
      2 feature_df = extractor.run()

File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:41, in FeatureExtractor.__init__(self, dataset, feature_set, extraction_params, n_jobs)
     39 self.dataset = dataset
     40 self.feature_set = feature_set
---> 41 self.extraction_params = self._get_extraction_param_path(
     42     extraction_params
     43 )
     44 log.info(f"Using extraction params from {self.extraction_params}")
     45 self.n_jobs = set_n_jobs(n_jobs)

File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:55, in FeatureExtractor._get_extraction_param_path(self, extraction_params)
     53     result = default_extraction_param_dir / extraction_params
     54 else:
---> 55     raise ValueError(
     56         f"Extraction parameter file {extraction_params} not found."
     57     )
     58 return result

ValueError: Extraction parameter file default_MR.yaml not found.

Fix

1: change from autorad.data.utils to from autorad.utils.preprocessing

2: change extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") to extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")

opened by wagon-master 1

Bugfix/refactor

New features:

log feature dataset and splits in MLflow

update docs & add getting-started

Fixes:

fix evaluation in the web app

fix docs build in readthedocs

opened by pwoznicki 0

Support various readers (Nibabel, ITK)

Currently we use Nibabel for loading images. It works only for NIfTI images, but a user may want to load a DICOM image without converting it to NIfTI.

Consider using MONAI's LoadImage transform, which provides a common interface for loading both NIfTI and DICOM images.
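
To illustrate the kind of dispatch a common loading interface needs, here is a minimal sketch that picks a reader from the file name alone. It does not use or reproduce MONAI. The suffix table, the reader names, the rule that a path without a suffix is a DICOM series folder, and the helper pick_reader are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a reader name
READERS_BY_SUFFIX = {
    ".nii": "nifti",
    ".nii.gz": "nifti",
    ".dcm": "dicom",
}


def pick_reader(path):
    """Return the reader name for a path.

    A path without any suffix is assumed to be a DICOM series folder.
    """
    p = Path(path)
    # Join the last two suffixes so that ".nii.gz" is matched as a whole
    double_suffix = "".join(p.suffixes[-2:]).lower()
    if double_suffix in READERS_BY_SUFFIX:
        return READERS_BY_SUFFIX[double_suffix]
    if p.suffix.lower() in READERS_BY_SUFFIX:
        return READERS_BY_SUFFIX[p.suffix.lower()]
    if not p.suffix:
        return "dicom"
    raise ValueError(f"No reader registered for {p.name!r}")
```

With such a dispatch in place, the calling code stays the same whether a case arrives as a NIfTI file or as a DICOM series.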
enhancement

opened by pwoznicki 0

Releases(v0.2.2)

v0.2.2(Jul 30, 2022)

Includes fixes for the web application, bug fixes in the spatial utility functions, and a new function for voxel-based extraction.

Owner

Piotr Woźnicki

Recently graduated medical doctor, working on medical image analysis.

