The easiest tool for extracting radiomics features and training ML models on them.

Overview


ClassyRadiomics

License CI Build codecov

Simple pipeline for experimenting with radiomics features

Installation

git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git
cd classrad
pip install -e .

Example - Hydronephrosis detection from CT images:

Extract radiomics features and save them to CSV table

df = pd.read_csv(table_dir / "paths.csv")
extractor = FeatureExtractor(
    df=df,
    out_path=(table_dir / "features.csv"),
    image_col="img_path",
    mask_col="seg_path",
    verbose=True,
)
extractor.extract_features()

Create a dataset from the features table

feature_df = pd.read_csv(table_dir / "features.csv")
data = Dataset(
    dataframe=feature_df,
    features=feature_cols,
    target=label_col="Hydronephrosis",
    task_name="Hydronephrosis detection"
)
data.cross_validation_split_test_from_column(
    column_name="cohort", test_value="control"
)

Select classifiers to compare

classifier_names = [
    "Gaussian Process Classifier",
    "Logistic Regression",
    "SVM",
    "Random Forest",
    "XGBoost",
]
classifiers = [MLClassifier(name) for name in classifier_names]

Create an evaluator to train and evaluate selected classifiers

evaluator = Evaluator(dataset=data, models=classifiers)
evaluator.evaluate_cross_validation()
evaluator.boxplot_by_class()
evaluator.plot_all_cross_validation()
evaluator.plot_test()
Comments
  • Preprocessing features fails during machine learning

    Preprocessing features fails during machine learning

    Describe the bug

    Trying to use Machine Learning in the self-hosted webapp, as well as in example_WORC.ipynb fails.

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    download data (it may take a few minutes)
    download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
    )
    
    from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
    feature_df = extractor.run()
    
    feature_df.head()
    
    label_df = pd.read_csv(data_dir / "labels.csv")
    label_df.sample(5)
    
    from autorad.data.dataset import FeatureDataset
    
    merged_feature_df = feature_df.merge(label_df, left_on="ID",
        right_on="patient_ID", how="left")
    feature_dataset = FeatureDataset(
        merged_feature_df,
        target="diagnosis",
        ID_colname="ID"
    )
    
    splits_path = result_dir / "splits.json"
    feature_dataset.split(method="train_val_test", save_path=splits_path)
    
    from autorad.models.classifier import MLClassifier
    from autorad.training.trainer import Trainer
    
    models = MLClassifier.initialize_default_sklearn_models()
    print(models)
    
    trainer = Trainer(
        dataset=feature_dataset,
        models=models,
        result_dir=result_dir,
        experiment_name="Fibromatosis_vs_sarcoma_classification",
    )
    trainer.run_auto_preprocessing(
            selection_methods=["boruta"],
            oversampling=False,
            )
    
    

    Expected Results

    Initialising the trainer and running preprocessing on the features

    Actual Results

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [15], in <cell line: 7>()
          1 trainer = Trainer(
          2     dataset=feature_dataset,
          3     models=models,
          4     result_dir=result_dir,
          5     experiment_name="Fibromatosis_vs_sarcoma_classification",
          6 )
    ----> 7 trainer.run_auto_preprocessing(
          8         selection_methods=["boruta"],
          9         oversampling=False,
         10         )
    
    File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
         70 preprocessor = Preprocessor(
         71     normalize=True,
         72     feature_selection_method=selection_method,
         73     oversampling_method=oversampling_method,
         74 )
         75 try:
         76     preprocessed[selection_method][
         77         oversampling_method
    ---> 78     ] = preprocessor.fit_transform(self.dataset.data)
         79 except AssertionError:
         80     log.error(
         81         f"Preprocessing with {selection_method} and {oversampling_method} failed."
         82     )
    
    File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
         64 result_y = {}
         65 all_features = X.train.columns.tolist()
    ---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
         67     X.train, y.train
         68 )
         69 self.selected_features = self.pipeline["select"].selected_features(
         70     column_names=all_features
         71 )
         72 result_X["train"] = pd.DataFrame(
         73     X_train_trans, columns=self.selected_features
         74 )
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
        432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
        433 if hasattr(last_step, "fit_transform"):
    --> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
        435 else:
        436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
         44 def fit_transform(
         45     self, X: np.ndarray, y: np.ndarray
         46 ) -> tuple[np.ndarray, np.ndarray]:
    ---> 47     self.fit(X, y)
         48     return X[:, self.selected_columns], y
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
        122 with warnings.catch_warnings():
        123     warnings.simplefilter("ignore")
    --> 124     model.fit(X, y)
        125 self.selected_columns = np.where(model.support_)[0].tolist()
        126 if not self.selected_columns:
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
        188 def fit(self, X, y):
        189     """
        190     Fits the Boruta feature selection with the provided estimator.
        191 
       (...)
        198         The target values.
        199     """
    --> 201     return self._fit(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
        249 def _fit(self, X, y):
        250     # check input params
    --> 251     self._check_params(X, y)
        252     self.random_state = check_random_state(self.random_state)
        253     # setup variables for Boruta
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
        513 """
        514 Check hyperparameters as well as X and y before proceeding with fit.
        515 """
        516 # check X and y are consistent len, X is Array and y is column
    --> 517 X, y = check_X_y(X, y)
        518 if self.perc <= 0 or self.perc > 100:
        519     raise ValueError('The percentile should be between 0 and 100.')
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
        961 if y is None:
        962     raise ValueError("y cannot be None")
    --> 964 X = check_array(
        965     X,
        966     accept_sparse=accept_sparse,
        967     accept_large_sparse=accept_large_sparse,
        968     dtype=dtype,
        969     order=order,
        970     copy=copy,
        971     force_all_finite=force_all_finite,
        972     ensure_2d=ensure_2d,
        973     allow_nd=allow_nd,
        974     ensure_min_samples=ensure_min_samples,
        975     ensure_min_features=ensure_min_features,
        976     estimator=estimator,
        977 )
        979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
        981 check_consistent_length(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        744         array = array.astype(dtype, casting="unsafe", copy=False)
        745     else:
    --> 746         array = np.asarray(array, order=order, dtype=dtype)
        747 except ComplexWarning as complex_warning:
        748     raise ValueError(
        749         "Complex data not supported\n{}\n".format(array)
        750     ) from complex_warning
    
    ValueError: could not broadcast input array from shape (60,1015) into shape (60,)
    
    opened by wagon-master 3
  • BUG: Time and memory inefficient concating in pandas on every case.

    BUG: Time and memory inefficient concating in pandas on every case.

    In the feature extraction, we concat a pd.DataFrame for every case. AFAIK this construction of a pd.DataFrame leads to a new memory allocation (and copying) every time, which is highly memory inefficient. Especially, when parallelized on many CPUs, combined with the already memory intensive forking in joblib this can lead to OOM-Events (and is slow of course). Wouldn't it be more convenient to return only the feature set, that is currently processed. https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115 These are subsequently collected in results anyways: https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144

    opened by laqua-stack 2
  • Feature/add inference mlflow

    Feature/add inference mlflow

    Major changes:

    • fixed training with autologging of training parameters, preprocessor and classifier in MLFlow
    • webapp: added Predict subpage for inference on a single case, giving out class probability and Shap explanation
    • webapp: moved all steps into subpages
    • webapp: added Getting started in the landing page

    Fixes:

    • webapp: fixed extraction params discarding Feature Names selected from Feature Classes
    opened by pwoznicki 1
  • example_WORC.ipynb not being up to date with the repository

    example_WORC.ipynb not being up to date with the repository

    Describe the bug

    In example_WORC.ipynb there are function calls that do not work due to code in the repository being changed while the example_WORC.ipynb code wasn't updated to reflect those changes

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    download data (it may take a few minutes)
    download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
    )
    
    
    
    from autorad.data.utils import get_paths_with_separate_folder_per_case  # 1
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") # 2
    feature_df = extractor.run()
    
    

    Expected Results

    1: Importing the function get_paths_with_separate_folder_per_case

    2: Using default_MR.yaml as value for extraction_params

    Actual Results

    1:

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    Input In [7], in <cell line: 1>()
    ----> 1 from autorad.data.utils import get_paths_with_separate_folder_per_case
          3 # create a table with all the paths
          4 paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    
    ModuleNotFoundError: No module named 'autorad.data.utils'
    

    2:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [18], in <cell line: 1>()
    ----> 1 extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml")
          2 feature_df = extractor.run()
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:41, in FeatureExtractor.__init__(self, dataset, feature_set, extraction_params, n_jobs)
         39 self.dataset = dataset
         40 self.feature_set = feature_set
    ---> 41 self.extraction_params = self._get_extraction_param_path(
         42     extraction_params
         43 )
         44 log.info(f"Using extraction params from {self.extraction_params}")
         45 self.n_jobs = set_n_jobs(n_jobs)
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:55, in FeatureExtractor._get_extraction_param_path(self, extraction_params)
         53     result = default_extraction_param_dir / extraction_params
         54 else:
    ---> 55     raise ValueError(
         56         f"Extraction parameter file {extraction_params} not found."
         57     )
         58 return result
    
    ValueError: Extraction parameter file default_MR.yaml not found.
    

    Fix

    1: change from autorad.data.utils to from autorad.utils.preprocessing 2: change extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") to extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")

    opened by wagon-master 1
  • Bugfix/refactor

    Bugfix/refactor

    New features:

    • log feature dataset and splits in MLFlow
    • update docs & add getting-started

    Fixes:

    • fix evaluation in the web app
    • fix docs build in readthedocss
    opened by pwoznicki 0
  • Support various readers (Nibabel, ITK)

    Support various readers (Nibabel, ITK)

    Currently we use Nibabel for loading images. It works only for Nifti images, but a user may want to load a DICOM image, without converting it to Nifti.

    Consider using MONAI LoadImage() function that provides a common interface for loading both Nifti and DICOM images.

    enhancement 
    opened by pwoznicki 0
Releases(v0.2.2)
  • v0.2.2(Jul 30, 2022)

Owner
Piotr Woźnicki
Recently graduated medical doctor, working on medical image analysis.
Piotr Woźnicki
Code for `BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery`, Neurips 2021

This folder contains the code for 'Scalable Variational Approaches for Bayesian Causal Discovery'. Installation To install, use conda with conda env c

14 Sep 21, 2022
Photo2cartoon - 人像卡通化探索项目 (photo-to-cartoon translation project)

人像卡通化 (Photo to Cartoon) 中文版 | English Version 该项目为小视科技卡通肖像探索项目。您可使用微信扫描下方二维码或搜索“AI卡通秀”小程序体验卡通化效果。

Minivision_AI 3.5k Dec 30, 2022
A graph adversarial learning toolbox based on PyTorch and DGL.

GraphWar: Arms Race in Graph Adversarial Learning NOTE: GraphWar is still in the early stages and the API will likely continue to change. 🚀 Installat

Jintang Li 54 Jan 05, 2023
Robust, modular and efficient implementation of advanced Hamiltonian Monte Carlo algorithms

AdvancedHMC.jl AdvancedHMC.jl provides a robust, modular and efficient implementation of advanced HMC algorithms. An illustrative example for Advanced

The Turing Language 167 Jan 01, 2023
PyTorch-Multi-Style-Transfer - Neural Style and MSG-Net

PyTorch-Style-Transfer This repo provides PyTorch Implementation of MSG-Net (ours) and Neural Style (Gatys et al. CVPR 2016), which has been included

Hang Zhang 906 Jan 04, 2023
Code for ACL 21: Generating Query Focused Summaries from Query-Free Resources

marge This repository releases the code for Generating Query Focused Summaries from Query-Free Resources. Please cite the following paper [bib] if you

Yumo Xu 28 Nov 10, 2022
RoIAlign & crop_and_resize for PyTorch

RoIAlign for PyTorch This is a PyTorch version of RoIAlign. This implementation is based on crop_and_resize and supports both forward and backward on

Long Chen 530 Jan 07, 2023
Open Source Differentiable Computer Vision Library for PyTorch

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer

kornia 7.6k Jan 04, 2023
ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral)

ILVR + ADM This is the implementation of ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral). This repository is h

Jooyoung Choi 225 Dec 28, 2022
[ICCV'2021] Image Inpainting via Conditional Texture and Structure Dual Generation

[ICCV'2021] Image Inpainting via Conditional Texture and Structure Dual Generation

Xiefan Guo 122 Dec 11, 2022
Who calls the shots? Rethinking Few-Shot Learning for Audio (WASPAA 2021)

rethink-audio-fsl This repo contains the source code for the paper "Who calls the shots? Rethinking Few-Shot Learning for Audio." (WASPAA 2021) Table

Yu Wang 34 Dec 24, 2022
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks This repository contains a TensorFlow implementation of "

Jingwei Zheng 5 Jan 08, 2023
Implementation of RegretNet with Pytorch

Dependencies are Python 3, a recent PyTorch, numpy/scipy, tqdm, future and tensorboard. Plotting with Matplotlib. Implementation of the neural network

Horris zhGu 1 Nov 05, 2021
Bare bones use-case for deploying a containerized web app (built in streamlit) on AWS.

Containerized Streamlit web app This repository is featured in a 3-part series on Deploying web apps with Streamlit, Docker, and AWS. Checkout the blo

Collin Prather 62 Jan 02, 2023
Simple Python project using Opencv and datetime package to recognise faces and log attendance data in a csv file.

Attendance-System-based-on-Facial-recognition-Attendance-data-stored-in-csv-file- Simple Python project using Opencv and datetime package to recognise

3 Aug 09, 2022
An implementation of the methods presented in Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.

An implementation of the methods presented in Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.

Andrew Jesson 9 Apr 04, 2022
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥

face.evoLVe: High-Performance Face Recognition Library based on PaddlePaddle & PyTorch Evolve to be more comprehensive, effective and efficient for fa

Zhao Jian 3.1k Jan 04, 2023
MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution (CVPR2021)

MASA-SR Official PyTorch implementation of our CVPR2021 paper MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Re

DV Lab 126 Dec 20, 2022
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
Learning Compatible Embeddings, ICCV 2021

LCE Learning Compatible Embeddings, ICCV 2021 by Qiang Meng, Chixiang Zhang, Xiaoqiang Xu and Feng Zhou Paper: Arxiv We cannot release source codes pu

Qiang Meng 25 Dec 17, 2022