The easiest tool for extracting radiomics features and training ML models on them.

Overview


ClassyRadiomics

License CI Build codecov

Simple pipeline for experimenting with radiomics features

Installation

git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git
cd classrad
pip install -e .

Example - Hydronephrosis detection from CT images:

Extract radiomics features and save them to CSV table

df = pd.read_csv(table_dir / "paths.csv")
extractor = FeatureExtractor(
    df=df,
    out_path=(table_dir / "features.csv"),
    image_col="img_path",
    mask_col="seg_path",
    verbose=True,
)
extractor.extract_features()

Create a dataset from the features table

feature_df = pd.read_csv(table_dir / "features.csv")
data = Dataset(
    dataframe=feature_df,
    features=feature_cols,
    target=label_col="Hydronephrosis",
    task_name="Hydronephrosis detection"
)
data.cross_validation_split_test_from_column(
    column_name="cohort", test_value="control"
)

Select classifiers to compare

classifier_names = [
    "Gaussian Process Classifier",
    "Logistic Regression",
    "SVM",
    "Random Forest",
    "XGBoost",
]
classifiers = [MLClassifier(name) for name in classifier_names]

Create an evaluator to train and evaluate selected classifiers

evaluator = Evaluator(dataset=data, models=classifiers)
evaluator.evaluate_cross_validation()
evaluator.boxplot_by_class()
evaluator.plot_all_cross_validation()
evaluator.plot_test()
Comments
  • Preprocessing features fails during machine learning

    Preprocessing features fails during machine learning

    Describe the bug

    Trying to use Machine Learning in the self-hosted webapp, as well as in example_WORC.ipynb fails.

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    download data (it may take a few minutes)
    download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
    )
    
    from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
    feature_df = extractor.run()
    
    feature_df.head()
    
    label_df = pd.read_csv(data_dir / "labels.csv")
    label_df.sample(5)
    
    from autorad.data.dataset import FeatureDataset
    
    merged_feature_df = feature_df.merge(label_df, left_on="ID",
        right_on="patient_ID", how="left")
    feature_dataset = FeatureDataset(
        merged_feature_df,
        target="diagnosis",
        ID_colname="ID"
    )
    
    splits_path = result_dir / "splits.json"
    feature_dataset.split(method="train_val_test", save_path=splits_path)
    
    from autorad.models.classifier import MLClassifier
    from autorad.training.trainer import Trainer
    
    models = MLClassifier.initialize_default_sklearn_models()
    print(models)
    
    trainer = Trainer(
        dataset=feature_dataset,
        models=models,
        result_dir=result_dir,
        experiment_name="Fibromatosis_vs_sarcoma_classification",
    )
    trainer.run_auto_preprocessing(
            selection_methods=["boruta"],
            oversampling=False,
            )
    
    

    Expected Results

    Initialising the trainer and running preprocessing on the features

    Actual Results

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [15], in <cell line: 7>()
          1 trainer = Trainer(
          2     dataset=feature_dataset,
          3     models=models,
          4     result_dir=result_dir,
          5     experiment_name="Fibromatosis_vs_sarcoma_classification",
          6 )
    ----> 7 trainer.run_auto_preprocessing(
          8         selection_methods=["boruta"],
          9         oversampling=False,
         10         )
    
    File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
         70 preprocessor = Preprocessor(
         71     normalize=True,
         72     feature_selection_method=selection_method,
         73     oversampling_method=oversampling_method,
         74 )
         75 try:
         76     preprocessed[selection_method][
         77         oversampling_method
    ---> 78     ] = preprocessor.fit_transform(self.dataset.data)
         79 except AssertionError:
         80     log.error(
         81         f"Preprocessing with {selection_method} and {oversampling_method} failed."
         82     )
    
    File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
         64 result_y = {}
         65 all_features = X.train.columns.tolist()
    ---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
         67     X.train, y.train
         68 )
         69 self.selected_features = self.pipeline["select"].selected_features(
         70     column_names=all_features
         71 )
         72 result_X["train"] = pd.DataFrame(
         73     X_train_trans, columns=self.selected_features
         74 )
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
        432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
        433 if hasattr(last_step, "fit_transform"):
    --> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
        435 else:
        436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
         44 def fit_transform(
         45     self, X: np.ndarray, y: np.ndarray
         46 ) -> tuple[np.ndarray, np.ndarray]:
    ---> 47     self.fit(X, y)
         48     return X[:, self.selected_columns], y
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
        122 with warnings.catch_warnings():
        123     warnings.simplefilter("ignore")
    --> 124     model.fit(X, y)
        125 self.selected_columns = np.where(model.support_)[0].tolist()
        126 if not self.selected_columns:
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
        188 def fit(self, X, y):
        189     """
        190     Fits the Boruta feature selection with the provided estimator.
        191 
       (...)
        198         The target values.
        199     """
    --> 201     return self._fit(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
        249 def _fit(self, X, y):
        250     # check input params
    --> 251     self._check_params(X, y)
        252     self.random_state = check_random_state(self.random_state)
        253     # setup variables for Boruta
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
        513 """
        514 Check hyperparameters as well as X and y before proceeding with fit.
        515 """
        516 # check X and y are consistent len, X is Array and y is column
    --> 517 X, y = check_X_y(X, y)
        518 if self.perc <= 0 or self.perc > 100:
        519     raise ValueError('The percentile should be between 0 and 100.')
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
        961 if y is None:
        962     raise ValueError("y cannot be None")
    --> 964 X = check_array(
        965     X,
        966     accept_sparse=accept_sparse,
        967     accept_large_sparse=accept_large_sparse,
        968     dtype=dtype,
        969     order=order,
        970     copy=copy,
        971     force_all_finite=force_all_finite,
        972     ensure_2d=ensure_2d,
        973     allow_nd=allow_nd,
        974     ensure_min_samples=ensure_min_samples,
        975     ensure_min_features=ensure_min_features,
        976     estimator=estimator,
        977 )
        979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
        981 check_consistent_length(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        744         array = array.astype(dtype, casting="unsafe", copy=False)
        745     else:
    --> 746         array = np.asarray(array, order=order, dtype=dtype)
        747 except ComplexWarning as complex_warning:
        748     raise ValueError(
        749         "Complex data not supported\n{}\n".format(array)
        750     ) from complex_warning
    
    ValueError: could not broadcast input array from shape (60,1015) into shape (60,)
    
    opened by wagon-master 3
  • BUG: Time and memory inefficient concating in pandas on every case.

    BUG: Time and memory inefficient concating in pandas on every case.

    In the feature extraction, we concat a pd.DataFrame for every case. AFAIK this construction of a pd.DataFrame leads to a new memory allocation (and copying) every time, which is highly memory inefficient. Especially, when parallelized on many CPUs, combined with the already memory intensive forking in joblib this can lead to OOM-Events (and is slow of course). Wouldn't it be more convenient to return only the feature set, that is currently processed. https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115 These are subsequently collected in results anyways: https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144

    opened by laqua-stack 2
  • Feature/add inference mlflow

    Feature/add inference mlflow

    Major changes:

    • fixed training with autologging of training parameters, preprocessor and classifier in MLFlow
    • webapp: added Predict subpage for inference on a single case, giving out class probability and Shap explanation
    • webapp: moved all steps into subpages
    • webapp: added Getting started in the landing page

    Fixes:

    • webapp: fixed extraction params discarding Feature Names selected from Feature Classes
    opened by pwoznicki 1
  • example_WORC.ipynb not being up to date with the repository

    example_WORC.ipynb not being up to date with the repository

    Describe the bug

    In example_WORC.ipynb there are function calls that do not work due to code in the repository being changed while the example_WORC.ipynb code wasn't updated to reflect those changes

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    download data (it may take a few minutes)
    download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
    )
    
    
    
    from autorad.data.utils import get_paths_with_separate_folder_per_case  # 1
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") # 2
    feature_df = extractor.run()
    
    

    Expected Results

    1: Importing the function get_paths_with_separate_folder_per_case

    2: Using default_MR.yaml as value for extraction_params

    Actual Results

    1:

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    Input In [7], in <cell line: 1>()
    ----> 1 from autorad.data.utils import get_paths_with_separate_folder_per_case
          3 # create a table with all the paths
          4 paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    
    ModuleNotFoundError: No module named 'autorad.data.utils'
    

    2:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [18], in <cell line: 1>()
    ----> 1 extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml")
          2 feature_df = extractor.run()
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:41, in FeatureExtractor.__init__(self, dataset, feature_set, extraction_params, n_jobs)
         39 self.dataset = dataset
         40 self.feature_set = feature_set
    ---> 41 self.extraction_params = self._get_extraction_param_path(
         42     extraction_params
         43 )
         44 log.info(f"Using extraction params from {self.extraction_params}")
         45 self.n_jobs = set_n_jobs(n_jobs)
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:55, in FeatureExtractor._get_extraction_param_path(self, extraction_params)
         53     result = default_extraction_param_dir / extraction_params
         54 else:
    ---> 55     raise ValueError(
         56         f"Extraction parameter file {extraction_params} not found."
         57     )
         58 return result
    
    ValueError: Extraction parameter file default_MR.yaml not found.
    

    Fix

    1: change from autorad.data.utils to from autorad.utils.preprocessing 2: change extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") to extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")

    opened by wagon-master 1
  • Bugfix/refactor

    Bugfix/refactor

    New features:

    • log feature dataset and splits in MLFlow
    • update docs & add getting-started

    Fixes:

    • fix evaluation in the web app
    • fix docs build in readthedocss
    opened by pwoznicki 0
  • Support various readers (Nibabel, ITK)

    Support various readers (Nibabel, ITK)

    Currently we use Nibabel for loading images. It works only for Nifti images, but a user may want to load a DICOM image, without converting it to Nifti.

    Consider using MONAI LoadImage() function that provides a common interface for loading both Nifti and DICOM images.

    enhancement 
    opened by pwoznicki 0
Releases(v0.2.2)
  • v0.2.2(Jul 30, 2022)

Owner
Piotr Woźnicki
Recently graduated medical doctor, working on medical image analysis.
Piotr Woźnicki
Semantic Segmentation with SegFormer on Drone Dataset.

SegFormer_Segmentation Semantic Segmentation with SegFormer on Drone Dataset. You can check out the blog on Medium You can also try out the model with

Praneet 8 Oct 20, 2022
Road Crack Detection Using Deep Learning Methods

Road-Crack-Detection-Using-Deep-Learning-Methods This is my Diploma Thesis ¨Road Crack Detection Using Deep Learning Methods¨ under the supervision of

Aggelos Katsaliros 3 May 03, 2022
BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting Updated on December 10, 2021 (Release all dataset(2021 videos)) Updated o

weijiawu 47 Dec 26, 2022
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Cross-Speaker-Emotion-Transfer - PyTorch Implementation PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Conditio

Keon Lee 114 Jan 08, 2023
Codes and models of NeurIPS2021 paper - DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks

DominoSearch This is repository for codes and models of NeurIPS2021 paper - DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense n

11 Sep 10, 2022
Neural Fixed-Point Acceleration for Convex Optimization

Licensing The majority of neural-scs is licensed under the CC BY-NC 4.0 License, however, portions of the project are available under separate license

Facebook Research 27 Oct 06, 2022
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 29 Jan 08, 2023
Codebase for Inducing Causal Structure for Interpretable Neural Networks

Interchange Intervention Training (IIT) Codebase for Inducing Causal Structure for Interpretable Neural Networks Release Notes 12/01/2021: Code and Pa

Zen 6 Oct 10, 2022
On Generating Extended Summaries of Long Documents

ExtendedSumm This repository contains the implementation details and datasets used in On Generating Extended Summaries of Long Documents paper at the

Georgetown Information Retrieval Lab 76 Sep 05, 2022
This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv] Overview Content Prerequisites Data Prep

268 Jan 09, 2023
Official Pytorch implementation of the paper: "Locally Shifted Attention With Early Global Integration"

Locally-Shifted-Attention-With-Early-Global-Integration Pretrained models You can download all the models from here. Training Imagenet python -m torch

Shelly Sheynin 14 Apr 15, 2022
Fast Learning of MNL Model From General Partial Rankings with Application to Network Formation Modeling

Fast-Partial-Ranking-MNL This repo provides a PyTorch implementation for the CopulaGNN models as described in the following paper: Fast Learning of MN

Xingjian Zhang 3 Aug 19, 2022
TLoL (Python Module) - League of Legends Deep Learning AI (Research and Development)

TLoL-py - League of Legends Deep Learning Library TLoL-py is the Python component of the TLoL League of Legends deep learning library. It provides a s

7 Nov 29, 2022
SMD-Nets: Stereo Mixture Density Networks

SMD-Nets: Stereo Mixture Density Networks This repository contains a Pytorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021)

Fabio Tosi 115 Dec 26, 2022
Woosung Choi 63 Nov 14, 2022
LogAvgExp - Pytorch Implementation of LogAvgExp

LogAvgExp - Pytorch Implementation of LogAvgExp for Pytorch Install $ pip instal

Phil Wang 31 Oct 14, 2022
CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022)

CMUA-Watermark The official code for CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022) arxiv. It is bas

50 Nov 26, 2022
The official code repository for examples in the O'Reilly book 'Generative Deep Learning'

Generative Deep Learning Teaching Machines to paint, write, compose and play The official code repository for examples in the O'Reilly book 'Generativ

David Foster 1.3k Dec 29, 2022
Code for the Paper: Alexandra Lindt and Emiel Hoogeboom.

Discrete Denoising Flows This repository contains the code for the experiments presented in the paper Discrete Denoising Flows [1]. To give a short ov

Alexandra Lindt 3 Oct 09, 2022
PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning"

deepGCFX PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning" Pr

Thilini Cooray 4 Aug 11, 2022