This repository gives an example of how to preprocess the data of the HECKTOR challenge.

HECKTOR 2021 challenge

This repository gives an example of how to preprocess the data of the HECKTOR challenge. Any other preprocessing is welcome and any framework can be used for the challenge; the only requirement is to submit the results in the same coordinate system as the original CT images (same spacing and same origin). This repository also contains the code used to prepare the challenge data (DICOM to NIFTI conversion, SUV computation and bounding box generation); participants do not need it. Finally, it contains an example implementation that resamples the data within the bounding boxes and resamples the results back to the original resolution.
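Because submissions must share the CT's spacing and origin, it can help to check this before uploading. The sketch below compares two geometries component-wise with a small tolerance; the function name `same_geometry`, the 1e-3 mm tolerance and the example values are illustrative assumptions. With SimpleITK, the inputs would come from `image.GetSpacing()` and `image.GetOrigin()`.

```python
import math
from typing import Sequence


def same_geometry(
    pred_spacing: Sequence[float],
    pred_origin: Sequence[float],
    ct_spacing: Sequence[float],
    ct_origin: Sequence[float],
    tol: float = 1e-3,
) -> bool:
    """Return True if a prediction has the same spacing and origin as the CT.

    Values are compared with an absolute tolerance (in mm) because spacing
    and origin read back from image headers are floating-point numbers.
    """
    if len(pred_spacing) != len(ct_spacing) or len(pred_origin) != len(ct_origin):
        return False  # different dimensionality can never match
    pairs = list(zip(pred_spacing, ct_spacing)) + list(zip(pred_origin, ct_origin))
    return all(math.isclose(a, b, abs_tol=tol) for a, b in pairs)


# Hypothetical CT geometry, for illustration only.
ct_spacing, ct_origin = (0.98, 0.98, 3.0), (-250.0, -250.0, -400.0)
print(same_geometry((0.98, 0.98, 3.0), (-250.0, -250.0, -400.0), ct_spacing, ct_origin))  # True
print(same_geometry((1.0, 1.0, 1.0), (-250.0, -250.0, -400.0), ct_spacing, ct_origin))    # False
```

The second call fails because the prediction was left at a resampled 1 mm spacing instead of being resampled back to the CT grid.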

Download Data

To access the data, visit the challenge website: and follow the instructions. The code included here is intended to work with the specific repository structure described in the Project Organization section. After cloning the repository with git clone, create a data/ folder in it and place the unzipped data there.
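A quick way to confirm the layout before running any script is a small path check. This sketch assumes the unzipped archive provides `hecktor_nii/` and `bbox.csv` as shown in the Project Organization tree; the helper name `missing_paths` is hypothetical, and `data/resampled/` is not checked because it is produced by the resampling step.

```python
from pathlib import Path
from typing import List

# Expected inputs, taken from the Project Organization tree (assumed layout).
EXPECTED = ["data/hecktor_nii", "data/bbox.csv"]


def missing_paths(repo_root: str) -> List[str]:
    """Return the expected data paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data folder looks complete.")
```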

Install Dependencies

To install the necessary dependencies, run pip install -r requirements.txt. It is advised to do this within a Python 3 virtual environment.
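A minimal check that the interpreter is running inside a virtual environment, relying on the standard `sys.prefix`/`sys.base_prefix` behaviour of Python's `venv` module (the helper name `in_virtualenv` is hypothetical):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a virtual environment.

    With the standard `venv` module, sys.prefix points to the environment
    while sys.base_prefix still points to the base installation. Legacy
    virtualenv releases (< 20) set sys.real_prefix instead.
    """
    return getattr(sys, "real_prefix", None) is not None or sys.prefix != sys.base_prefix


if not in_virtualenv():
    print("Not in a virtual environment; consider `python3 -m venv .venv` first.")
```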

Resample Data

Run python src/resampling/ to crop and resample the data following the repository structure, or pass arguments to it (type python src/resampling/ --help for more information).
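Cropping needs an output grid for each patient's bounding box. The sketch below assumes the box is stored in physical millimetres as (x1, y1, z1, x2, y2, z2) (only the `x1` column is visible in the traceback further down; the other names are assumed) and defaults to the 1 mm isotropic spacing that the script prints in that traceback. `bbox_to_grid` is a hypothetical helper, not a function from `src/resampling/`.

```python
import math
from typing import Sequence, Tuple


def bbox_to_grid(
    bbox: Sequence[float], spacing: Sequence[float] = (1.0, 1.0, 1.0)
) -> Tuple[Tuple[float, ...], Tuple[int, ...]]:
    """Map a physical bounding box (mm) to an output grid origin and size.

    bbox is assumed to be (x1, y1, z1, x2, y2, z2). The grid starts at the
    lower corner and uses enough voxels of `spacing` to cover the box.
    """
    lower, upper = bbox[:3], bbox[3:]
    origin = tuple(float(v) for v in lower)
    # The tiny epsilon keeps float noise (e.g. 144.0000001) from adding a voxel.
    size = tuple(
        math.ceil((hi - lo) / s - 1e-6) for lo, hi, s in zip(lower, upper, spacing)
    )
    return origin, size


origin, size = bbox_to_grid((-72.0, -100.0, -50.0, 72.0, 44.0, 94.0))
print(origin, size)  # (-72.0, -100.0, -50.0) (144, 144, 144)
```

The resulting origin and size could be passed to SimpleITK's `ResampleImageFilter` through `SetOutputOrigin`, `SetSize` and `SetOutputSpacing`, typically with linear interpolation for images and nearest-neighbour interpolation for masks. Whether the box corner should coincide with a voxel centre or a voxel edge is a convention this sketch leaves open.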


An example of how the segmentation (task 1) will be evaluated is given in the notebook notebooks/evaluate_segmentation.ipynb. Note that the 95th-percentile Hausdorff distance implemented in will be used in the challenge (not the one found in src/evaluation/).
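For intuition about the two segmentation metrics, here is a simplified NumPy sketch of the Dice coefficient and the 95th-percentile Hausdorff distance (HD95). It is not the challenge implementation: the `surface-distance` code in `src/aicrowd_evaluator` weights each surface element by its area, whereas this sketch treats every surface point equally and uses the linear interpolation of `np.percentile`. The point sets are assumed to be surface voxel coordinates already converted to millimetres.

```python
import numpy as np


def dice(gt: np.ndarray, pred: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    denom = gt.sum() + pred.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(gt, pred).sum() / denom)


def hd95(gt_points: np.ndarray, pred_points: np.ndarray) -> float:
    """Simplified HD95 between two non-empty (N, 3) point sets in mm."""
    # Pairwise Euclidean distances, shape (len(gt_points), len(pred_points)).
    d = np.linalg.norm(gt_points[:, None, :] - pred_points[None, :, :], axis=-1)
    gt_to_pred = d.min(axis=1)  # each GT point to its closest predicted point
    pred_to_gt = d.min(axis=0)  # each predicted point to its closest GT point
    # Symmetric version: the larger of the two directed 95th percentiles.
    return float(max(np.percentile(gt_to_pred, 95), np.percentile(pred_to_gt, 95)))


mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True
print(dice(mask, mask))  # 1.0
print(hd95(np.array([[0.0, 0.0, 0.0]]), np.array([[3.0, 4.0, 0.0]])))  # 5.0
```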

The concordance index used to evaluate tasks 2 and 3 is implemented in the function concordance_index(event_times, predicted_scores, event_observed=None) in the file src/aicrowd_evaluator/. It was adapted from to account for missing predictions, which are counted as non-concordant.
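The rule "missing predictions are non-concordant" can be illustrated with a plain Harrell's C-index. This is a hypothetical sketch (`concordance_index_sketch`), not the repository's function: it assumes that higher scores mean longer predicted survival (the convention of lifelines' `concordance_index`), counts tied predictions as 0.5, and simply skips pairs with tied event times. Check `src/aicrowd_evaluator` for the exact handling and score orientation used in the challenge.

```python
import math
from typing import Optional, Sequence


def concordance_index_sketch(
    event_times: Sequence[float],
    predicted_scores: Sequence[Optional[float]],
    event_observed: Optional[Sequence[int]] = None,
) -> float:
    """Harrell's C-index where a missing (None/NaN) prediction is non-concordant.

    A pair (i, j) is comparable when subject i had an observed event and
    t_i < t_j. Higher scores are assumed to mean longer predicted survival.
    """
    n = len(event_times)
    if event_observed is None:
        event_observed = [1] * n  # assume every event was observed
    num, den = 0.0, 0
    for i in range(n):
        if not event_observed[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if event_times[i] >= event_times[j]:
                continue  # keep only pairs where i fails strictly earlier
            den += 1
            si, sj = predicted_scores[i], predicted_scores[j]
            if si is None or sj is None or math.isnan(si) or math.isnan(sj):
                continue  # missing prediction: the pair scores 0
            if si < sj:
                num += 1.0  # earlier failure received the lower survival score
            elif si == sj:
                num += 0.5  # tied predictions count as half
    if den == 0:
        raise ValueError("no comparable pairs")
    return num / den


times = [1.0, 2.0, 3.0]
print(concordance_index_sketch(times, [1.0, 2.0, 3.0]))           # 1.0
print(concordance_index_sketch(times, [1.0, float("nan"), 3.0]))  # 0.3333333333333333
```

In the second call, two of the three comparable pairs involve the missing prediction and therefore score 0.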


Dummy examples of correct submissions for tasks 1 and 2 can be found in notebooks/example_seg_submission.ipynb and notebooks/example_surv_submission.ipynb, respectively.
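The exact submission format is defined by the notebooks above. As a rough sketch of writing a survival (task 2) submission with the standard `csv` module, assuming a two-column layout: the header names `PatientID`/`Prediction`, the helper name and the example IDs are hypothetical placeholders, so copy the real ones from `notebooks/example_surv_submission.ipynb`.

```python
import csv
from typing import Mapping


def write_survival_submission(path: str, predictions: Mapping[str, float]) -> None:
    """Write one row per patient to a CSV file (hypothetical layout)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["PatientID", "Prediction"])  # hypothetical header
        for patient_id in sorted(predictions):  # deterministic row order
            writer.writerow([patient_id, predictions[patient_id]])


if __name__ == "__main__":
    # Made-up IDs and scores, for illustration only.
    write_survival_submission("survival_submission.csv", {"CHGJ007": 812.0, "CHGJ008": 1405.5})
```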

Project Organization

├── data                              <- NOT in the version control
│   ├── resampled                     <- The data in NIFTI resampled and cropped according to the bounding boxes (bbox.csv).
│   ├── hecktor_nii                   <- The data converted to the NIFTI format with the original geometric frame,
│   │                                    i.e. the one downloaded from AIcrowd
│   └── bbox.csv                      <- The bounding box for each patient computed with bbox_auto function from
├── requirements.txt                  <- The requirements file for reproducing the analysis environment, e.g.
│                                        generated with `pip freeze > requirements.txt`
├── Makefile                          <- Used to set up the environment and convert the DICOM files to NIFTI
├── notebooks
│   ├── example_seg_submission.ipynb  <- Example of a correct submission for the segmentation task (task 1).
│   ├── example_surv_submission.ipynb <- Example of a correct submission for the survival task (task 2).
│   └── evaluate_segmentation.ipynb   <- Example of how the evaluation will be computed.
└── src                               <- Source code for use in this project
    ├── aicrowd_evaluator             <- Source code for the evaluation on the AIcrowd platform
    │   ├──
    │   ├── surface-distance/         <- code to compute the robust Hausdorff distance, available at
    │   ├──              <- Define the evaluator class for tasks 1 and 2
    │   ├──   <- Define the metrics used in the segmentation task.
    │   ├── requirements.txt          <- The requirements file specific to this submodule
    │   └──       <- Define the metrics used for the survival task.
    ├── data                          <- Scripts to generate data
    │   ├──
    │   ├──        
    │   ├──                  <- Define functions used in
    │   └──           <- Conversion of the DICOM to NIFTI and computation of the bounding boxes
    ├── evaluation
    │   ├──
    │   └──                 <- (DEPRECATED) used to illustrate how the segmentation is evaluated. Refer to `src/aicrowd_evaluator`
    │                                    submodule for the actual evaluation of the challenge.
    └── resampling                    <- Code to resample the data 
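The resampling code looks up each bounding box by patient name (the traceback in the resampling issue below shows `self.bb_df.loc[patient_name, 'x1']`), so the name extracted from each image file must match the index of `bbox.csv`. A minimal sketch with the standard library, assuming an ID column called `PatientID`, coordinate columns `x1`…`z2` (only `x1` is confirmed) and file names such as `CHGJ007_ct.nii.gz`; all of these names are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical column names: only "x1" appears in the traceback.
BBOX_COLUMNS = ("x1", "y1", "z1", "x2", "y2", "z2")


def load_bboxes(csv_path: str, id_column: str = "PatientID") -> Dict[str, Tuple[float, ...]]:
    """Read bbox.csv into {patient_id: (x1, y1, z1, x2, y2, z2)}."""
    with open(csv_path, newline="") as f:
        return {
            row[id_column]: tuple(float(row[c]) for c in BBOX_COLUMNS)
            for row in csv.DictReader(f)
        }


def patient_id_from_path(image_path: str) -> str:
    """Extract the patient ID from a file name such as 'CHGJ007_ct.nii.gz'.

    Path(...).name drops the directories using the current OS's path rules
    (on Windows both '/' and '\\' are separators), so underscores in folder
    names like 'hecktor_nii' cannot leak into the ID.
    """
    return Path(image_path).name.split("_")[0]
```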

Project based on the cookiecutter data science project template. #cookiecutterdatascience

  • The validation Dice is similar whether 10% or 100% of the training set is used with the same validation set

    Hello! I found an unusual result. The validation results were similar whether I trained on 10% or on 100% of the patient cases in the training set. I have tested different codes, including this repository (3D dense_vnet results: 23 training cases: 0.5914, 180 training cases: 0.6233). In the end I reached the same conclusion, especially in 2D, where the results are almost the same. This did not happen when I used a dataset that was randomly split into training and validation sets by shuffling all slices of all patient cases instead of shuffling the patient cases. Why? Is it caused by the data distribution? I would appreciate your help. Thanks!

    opened by szhang963 5
  • Resampling issue

    I'm facing a problem when running the python src/resampling/ command.

    C:\Users\x\y\hecktor_project\hecktor-master1>python src/resampling/
    2020-07-02 22:30:28,050 - __main__ - INFO - Resampling
    resampling is (1.0, 1.0, 1.0)
    Traceback (most recent call last):
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\", line 2646, in get_loc
        return self._engine.get_loc(key)
      File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
      File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
      File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'hecktor'
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\multiprocessing\", line 125, in worker
        result = (True, func(*args, **kwds))
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\multiprocessing\", line 48, in mapstar
        return list(map(*args))
      File "c:\users\x\y\hecktor_project\hecktor-master1\src\resampling\", line 35, in __call__
        bb = (self.bb_df.loc[patient_name, 'x1'], self.bb_df.loc[patient_name,
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\", line 1762, in __getitem__
        return self._getitem_tuple(key)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\", line 1272, in _getitem_tuple
        return self._getitem_lowerdim(tup)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\", line 1389, in _getitem_lowerdim
        section = self._getitem_axis(key, axis=i)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\", line 1965, in _getitem_axis
        return self._get_label(key, axis=axis)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\", line 625, in _get_label
        return self.obj._xs(label, axis=axis)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\", line 3537, in xs
        loc = self.index.get_loc(key)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\", line 2648, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))
      File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
      File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
      File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'hecktor'
    The above exception was the direct cause of the following exception:
    Traceback (most recent call last):
      File "src/resampling/", line 76, in <module>
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\click\", line 829, in __call__
        return self.main(*args, **kwargs)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\click\", line 782, in main
        rv = self.invoke(ctx)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\click\", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\site-packages\click\", line 610, in invoke
        return callback(*args, **kwargs)
      File "src/resampling/", line 68, in main, files_list)
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\multiprocessing\", line 364, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "C:\Users\mahaw\AppData\Local\Programs\Python\Python38\lib\multiprocessing\", line 771, in get
        raise self._value
    KeyError: 'hecktor'
    opened by Mahaals 5
  • No module named 'src.resampling.utils'

    When I try to run evaluate_predictions.ipynb in the notebooks directory, I get the following message:

    No module named 'src.resampling.utils'

    I cannot find the module in the src directory. Is there any solution?

    opened by JYeonLee 4
  • update crop_dataset.ipynb

    Hi @voreille ,

    Thanks for organizing the great challenge. It seems that the cropping and resampling notebook has not been updated.

    Looking forward to your update:)

    opened by JunMa11 2
  • Wrong result when using "evaluate_predictions.ipynb" to evaluate 3D Dice

    I found that the evaluated result was wrong when I used the code in "evaluate_predictions.ipynb". I then ran a test: I used the resampled gtvt data generated by "python src/resampling/" as "prediction_folder" to evaluate the 3D Dice score, and I got a Dice of 0.9528 instead of 1. Why?

    opened by szhang963 2
  • Question about okapy.dicomconverter

    Hi. I get the message "okapy.dicomconverter.converter" could not be resolved when I try to use, even though I have installed okapy using pip. Is the okapy library deprecated? Looking forward to your advice.

    opened by TravisL24 0
  • baseline CNN (niftynet)

    Hi, I am interested in the HECKTOR challenge, although it is now closed. I am quite new to CNNs. I see that a baseline CNN (niftynet) is said to be available in this repository, but I cannot find it here. Has it been deleted, or could you share it with me? That would be very helpful. Thank you.

    opened by Wenhui-Zhang-5 1
  • Error occurs when running baseline model

    When I run "net_segment evaluation -c config3D.ini" after inference, "KeyError: ("label",)" is raised. I just followed the steps described in the README. Is there any solution to this problem?

    opened by charlieisacat 1
