All kinds of models and scripts for data processing and visualizations.
Python 3.12 is recommended. Libraries are managed with Poetry (https://python-poetry.org/).
After installing Poetry, run this to create the virtual environment and install the dependencies:
poetry install
Create a jupyter kernel from venv:
poetry run python -m ipykernel install --user --name ctseg-py3.12
To list or remove kernels:
jupyter kernelspec list
jupyter kernelspec remove KERNEL_NAME
- Main folder: 2d_seg_and_video
- Training: 2d_seg_and_video/2d_box_seg.ipynb
- Processing logic mostly in: 2d_seg_and_video/dataset.py
- Cropping and resizing
- Adding bbox prompt
- Green: true positive
- Red: false positive
- Purple: false negative
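The color legend above can be sketched as a per-pixel mapping. This is a minimal illustration, not the repository's visualization code: the RGB values, the uncolored background, and the function names `error_color` and `error_overlay` are all assumptions.

```python
# Assumed RGB values; the legend only names the colors.
GREEN = (0, 255, 0)      # true positive
RED = (255, 0, 0)        # false positive
PURPLE = (128, 0, 128)   # false negative
BACKGROUND = (0, 0, 0)   # true negative, assumed to stay uncolored


def error_color(pred: int, label: int) -> tuple[int, int, int]:
    """Map one predicted/ground-truth pixel pair to its overlay color."""
    if pred and label:
        return GREEN
    if pred and not label:
        return RED
    if label and not pred:
        return PURPLE
    return BACKGROUND


def error_overlay(pred_mask, label_mask):
    """Build a 2D grid of RGB tuples from two equally sized binary masks."""
    return [
        [error_color(p, l) for p, l in zip(pred_row, label_row)]
        for pred_row, label_row in zip(pred_mask, label_mask)
    ]


# Toy 2x2 masks covering all four cases.
pred = [[1, 1], [0, 0]]
label = [[1, 0], [1, 0]]
overlay = error_overlay(pred, label)
```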
DeepLesion 3D subset of the ULS23 dataset
- 743 3D lesion segmentations
- 4538 2D slices
- train/val split done by lesions to avoid data leakage
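Splitting by lesion means every slice of a given lesion lands in the same set. The sketch below shows one way to do that with the standard library only. The `slice_to_lesion` mapping, the 20% default validation fraction, and the seed are illustrative assumptions, not the values used for this dataset.

```python
import random


def split_by_lesion(slice_to_lesion, val_fraction=0.2, seed=0):
    """Split slice IDs into train/val so each lesion appears in one set only.

    slice_to_lesion: dict mapping a slice ID to its lesion ID (hypothetical layout).
    """
    lesion_ids = sorted(set(slice_to_lesion.values()))
    random.Random(seed).shuffle(lesion_ids)
    n_val = max(1, round(len(lesion_ids) * val_fraction))
    val_lesions = set(lesion_ids[:n_val])
    train = [s for s, les in slice_to_lesion.items() if les not in val_lesions]
    val = [s for s, les in slice_to_lesion.items() if les in val_lesions]
    return train, val


# Toy example: 3 lesions with 2 slices each.
slices = {f"lesion{i}_slice{j}": f"lesion{i}" for i in range(3) for j in range(2)}
train_ids, val_ids = split_by_lesion(slices, val_fraction=0.34)
```

Because the shuffle operates on lesion IDs rather than slice IDs, neighbouring slices of one lesion can never end up on both sides of the split.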
Dataset that we are labeling and reviewing (2024-2025).
Script: notebooks/ct_video.ipynb
- Download all 6 .zip part files from https://github.com/DIAGNijmegen/ULS23/
- Unzip all 6 parts such that they are in the same folder
- The folder should contain fragmented zip files (.z01, .z02, ...) for all 6 parts
- Unzip the fragmented zip files with 7-Zip:
sudo apt update
sudo apt install p7zip-full
7z x ULS23_Part1.zip
7z x ULS23_Part2.zip
...
The labels are in the git repo. Merge the label folder with the existing ULS23 data folder.
- Main folder: 3d_seg
- Project page: https://aim.hms.harvard.edu/ct-fm
- Model: https://huggingface.co/project-lighter/whole_body_segmentation
- patching.ipynb: Scripts to split existing datasets into patches
- ct-fm.ipynb: Scripts for testing and training ct-fm seg model
- lesion3D: Dataset folder in the Kingston SSD
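As a rough illustration of what splitting a volume into patches involves, the sketch below computes patch origins for a 3D volume with a sliding window. It is not taken from patching.ipynb: the function names, the patch size, and the stride are hypothetical.

```python
def patch_starts(length: int, patch: int, stride: int) -> list[int]:
    """Start indices of patches along one axis, covering the axis end to end."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # final patch sits flush with the border
    return starts


def patch_origins(shape, patch_size, stride):
    """All (z, y, x) patch origins for a 3D volume."""
    zs, ys, xs = (
        patch_starts(n, p, s) for n, p, s in zip(shape, patch_size, stride)
    )
    return [(z, y, x) for z in zs for y in ys for x in xs]


# Hypothetical volume and patch geometry.
origins = patch_origins(
    shape=(100, 256, 256), patch_size=(64, 128, 128), stride=(32, 128, 128)
)
```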
nnUNet repo
All commands below should be run in the nnUNet project directory.
- nnunet_data_proc.ipynb: Scripts to generate raw data
- nnUNet_raw: Dataset folder in the Kingston SSD
Set the environment variables, e.g. by creating an env.sh file:
export nnUNet_raw="path/to/nnUNet_raw"
export nnUNet_preprocessed="/path/to/nnUNet_preprocessed/"
export nnUNet_results="/path/to/nnUNet_results"
Apply them in the terminal:
source env.sh
With 64 GB of memory, 2 processes (-np 2) is the maximum possible on this dataset. Processed data will be stored at the nnUNet_preprocessed path.
nnUNetv2_plan_and_preprocess -d 1 -np 2
Train the model on a dataset (1 in this example) and a specified cross-validation fold (0 in this example).
nnUNetv2_train 1 3d_fullres 0 --npz
Other training configurations are 2d, 3d_lowres, and 3d_cascade_fullres. The log and results will be stored at the nnUNet_results path.
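Since preprocessing and training both rely on the three environment variables, it can help to check them from Python before starting a long run. This is a small convenience sketch; the helper name `missing_nnunet_vars` is hypothetical and not part of nnUNet.

```python
import os

REQUIRED_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def missing_nnunet_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_nnunet_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```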
Save the best and final checkpoints from previous run before running this:
nnUNetv2_train 1 3d_fullres all --npz -pretrained_weights path/to/checkpoint.pth
- CT channel is always 0000
- Seg masks from the CT-FM seg model are 0001
- Boxes and other priors are 0002
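nnUNet identifies input channels by a four-digit suffix appended to the case identifier, so the channel assignment above maps directly to file names. The sketch below builds those names. The case identifier `lesion_001` and the dictionary keys are made up for illustration.

```python
# Channel indices as listed above; the key names are hypothetical.
CHANNELS = {
    "ct": 0,         # CT channel is always 0000
    "ctfm_seg": 1,   # seg masks from the CT-FM seg model
    "priors": 2,     # boxes and other priors
}


def channel_filename(case_id: str, channel: str, file_ending: str = ".nii.gz") -> str:
    """Build an nnUNet-style image file name: <case>_<4-digit channel><ending>."""
    return f"{case_id}_{CHANNELS[channel]:04d}{file_ending}"


names = [channel_filename("lesion_001", c) for c in CHANNELS]
```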
Sample dataset.json:
{
    "channel_names": {
        "0": "CT",
        "1": "noNorm"
    },
    "labels": {
        "background": 0,
        "lesion": 1
    },
    "numTraining": 782,
    "file_ending": ".nii.gz"
}
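If the file is generated from a script rather than written by hand, the sample above can be produced with the standard `json` module. This is a sketch only: the helper name `write_dataset_json` is hypothetical, and a temporary directory stands in for the real dataset folder.

```python
import json
import tempfile
from pathlib import Path

# Same content as the sample dataset.json above.
DATASET_META = {
    "channel_names": {"0": "CT", "1": "noNorm"},
    "labels": {"background": 0, "lesion": 1},
    "numTraining": 782,
    "file_ending": ".nii.gz",
}


def write_dataset_json(folder, meta):
    """Write meta as dataset.json inside folder and return the file path."""
    path = Path(folder) / "dataset.json"
    path.write_text(json.dumps(meta, indent=4))
    return path


# Demo in a temporary directory; point this at the dataset folder in practice.
with tempfile.TemporaryDirectory() as tmp:
    written = write_dataset_json(tmp, DATASET_META)
    loaded = json.loads(written.read_text())
```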
- studies_ctfm_seg_mask.ipynb generates the anatomical mask channels
- studies_weak_labels.ipynb generates the box and mask channels
The default 5 folds are used. I used the same folds for every dataset
by copying the splits_final.json file from the preprocessed folder of the first
dataset, i.e. nnUNet_preprocessed/Dataset001_3dlesion/
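Copying the split file can also be scripted. The sketch below copies splits_final.json from one preprocessed dataset folder to another. The target name `Dataset002_example` is hypothetical, and the demo runs in a temporary directory rather than the real nnUNet_preprocessed path.

```python
import json
import shutil
import tempfile
from pathlib import Path


def copy_splits(preprocessed_root, source_dataset, target_dataset):
    """Copy splits_final.json from source_dataset to target_dataset."""
    root = Path(preprocessed_root)
    src = root / source_dataset / "splits_final.json"
    dst_dir = root / target_dataset
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dst_dir / "splits_final.json"))


# Demo with a fake preprocessed folder and a one-fold toy split.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "Dataset001_3dlesion"
    first.mkdir()
    toy_splits = [{"train": ["case_a"], "val": ["case_b"]}]
    (first / "splits_final.json").write_text(json.dumps(toy_splits))
    copied = copy_splits(tmp, "Dataset001_3dlesion", "Dataset002_example")
    copied_splits = json.loads(copied.read_text())
```

Reusing one split file only makes sense when the case identifiers are the same across datasets, which is the situation described above.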
- 3d_seg/visual.ipynb and 3d_seg/visualize.py contain code for visualizing 3D predictions for all the studies along with the CT and label masks.
- Visualizations of the val set can be found on the Kingston SSD in the 3d_val_visualization folder.
Both voxel and lesion level metrics are calculated.
- 3d_seg/metrics.ipynb contains code for calculating metrics
- 3d_seg/metrics/ contains the csv files with metrics for every lesion in val set
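To make the distinction between the two levels concrete, here is a small sketch. It assumes Dice as the voxel-level metric and "hit by at least one predicted voxel" as the lesion-level criterion; the metrics actually computed in 3d_seg/metrics.ipynb may differ. Masks are flattened to plain lists to keep the example dependency-free.

```python
def dice(pred, label):
    """Voxel-level Dice over two flat binary sequences."""
    intersection = sum(1 for p, l in zip(pred, label) if p and l)
    total = sum(1 for p in pred if p) + sum(1 for l in label if l)
    return 1.0 if total == 0 else 2.0 * intersection / total


def lesion_recall(pred, lesion_ids):
    """Fraction of labeled lesions hit by at least one predicted voxel.

    lesion_ids: flat sequence with 0 for background and a positive ID per lesion.
    """
    lesions = {i for i in lesion_ids if i}
    hit = {i for p, i in zip(pred, lesion_ids) if p and i}
    return len(hit) / len(lesions) if lesions else 1.0


# Toy example: lesion 1 is partly segmented, lesion 2 is missed entirely.
pred = [1, 1, 0, 0, 0, 1]
lesion_ids = [1, 1, 1, 0, 2, 0]
label = [int(i > 0) for i in lesion_ids]

voxel_dice = dice(pred, label)
recall = lesion_recall(pred, lesion_ids)
```

The toy case shows why both levels are useful: a small missed lesion barely moves the voxel-level score but halves the lesion-level recall.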