Description
Greetings. I've been using CacheDataset for some time, but as the number of samples grew I no longer have the resources to cache the whole dataset in memory. To work around this I've switched to PersistentDataset, but some strange things are happening.
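The swap itself is minimal; roughly (simplified sketch, with train_files standing in for my list of image/label dicts and transforms for the pipeline shown further down):

from monai.data import CacheDataset, DataLoader, PersistentDataset

# Before: deterministic transform results cached in RAM.
# train_ds = CacheDataset(data=train_files, transform=transforms,
#                         cache_rate=1.0, num_workers=8)

# Now: deterministic transform results pickled to disk instead.
train_ds = PersistentDataset(data=train_files, transform=transforms,
                             cache_dir="./persistent_cache")

train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=8)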
Essentially, some samples raise this error during training:
N Foreground 0, N background 27944254,unable to generate class balanced samples.
=== Transform input info -- RandCropByPosNegLabeld ===
image statistics:
Type: <class 'torch.Tensor'> torch.float32
Shape: torch.Size([1, 365, 298, 549])
Value range: (0.0, 1.0)
label statistics:
Type: <class 'torch.Tensor'> torch.float32
Shape: torch.Size([1, 365, 298, 549])
Value range: (0.0, 0.0)
image_meta_dict statistics:
Type: <class 'dict'> None
Value: {'sizeof_hdr': array(348, dtype=int32), 'extents': array(0, dtype=int32), 'session_error': array(0, dtype=int16), 'dim_info': array(0, dtype=uint8), 'dim': array([ 3, 512, 512, 101, 1, 1, 1, 1], dtype=int16), 'intent_p1': array(0., dtype=float32), 'intent_p2': array(0., dtype=float32), 'intent_p3': array(0., dtype=float32), 'intent_code': array(0, dtype=int16), 'datatype': array(4, dtype=int16), 'bitpix': array(16, dtype=int16), 'slice_start': array(0, dtype=int16), 'pixdim': array([-1. , 1.071777, 1.071777, 3. , 0. , 0. ,
0. , 0. ], dtype=float32), 'vox_offset': array(0., dtype=float32), 'scl_slope': array(nan, dtype=float32), 'scl_inter': array(nan, dtype=float32), 'slice_end': array(0, dtype=int16), 'slice_code': array(0, dtype=uint8), 'xyzt_units': array(10, dtype=uint8), 'cal_max': array(0., dtype=float32), 'cal_min': array(0., dtype=float32), 'slice_duration': array(0., dtype=float32), 'toffset': array(0., dtype=float32), 'glmax': array(0, dtype=int32), 'glmin': array(0, dtype=int32), 'qform_code': array(1, dtype=int16), 'sform_code': array(1, dtype=int16), 'quatern_b': array(0., dtype=float32), 'quatern_c': array(0.70710677, dtype=float32), 'quatern_d': array(0.70710677, dtype=float32), 'qoffset_x': array(283.613, dtype=float32), 'qoffset_y': array(123.125, dtype=float32), 'qoffset_z': array(-510.42804, dtype=float32), 'srow_x': array([ -1.071777, 0. , -0. , 283.613 ], dtype=float32), 'srow_y': array([ -0. , 0. , -3. , 123.125], dtype=float32), 'srow_z': array([ 0. , 1.071777, 0. , -510.42804 ], dtype=float32), 'affine': array([[ 1. , 0. , 0. , -264.06503],
[ 0. , 1. , 0. , -176.875 ],
[ 0. , 0. , 1. , -510.42804],
[ 0. , 0. , 0. , 1. ]], dtype=float32), 'original_affine': array([[ -1.07177699, 0. , -0. , 283.61300659],
[ -0. , 0. , -3. , 123.125 ],
[ 0. , 1.07177699, 0. , -510.42803955],
[ 0. , 0. , 0. , 1. ]]), 'as_closest_canonical': False, 'spatial_shape': array([512, 512, 101], dtype=int16), 'original_channel_dim': 'no_channel', 'filename_or_obj': '/home/dev/verse/images/sub-verse763_ct.nii.gz'}
label_meta_dict statistics:
Type: <class 'dict'> None
Value: {'sizeof_hdr': array(348, dtype=int32), 'extents': array(0, dtype=int32), 'session_error': array(0, dtype=int16), 'dim_info': array(0, dtype=uint8), 'dim': array([ 3, 512, 512, 101, 1, 1, 1, 1], dtype=int16), 'intent_p1': array(0., dtype=float32), 'intent_p2': array(0., dtype=float32), 'intent_p3': array(0., dtype=float32), 'intent_code': array(0, dtype=int16), 'datatype': array(64, dtype=int16), 'bitpix': array(64, dtype=int16), 'slice_start': array(0, dtype=int16), 'pixdim': array([-1. , 1.071777, 1.071777, 3. , 1. , 1. ,
1. , 1. ], dtype=float32), 'vox_offset': array(0., dtype=float32), 'scl_slope': array(nan, dtype=float32), 'scl_inter': array(nan, dtype=float32), 'slice_end': array(0, dtype=int16), 'slice_code': array(0, dtype=uint8), 'xyzt_units': array(0, dtype=uint8), 'cal_max': array(0., dtype=float32), 'cal_min': array(0., dtype=float32), 'slice_duration': array(0., dtype=float32), 'toffset': array(0., dtype=float32), 'glmax': array(0, dtype=int32), 'glmin': array(0, dtype=int32), 'qform_code': array(0, dtype=int16), 'sform_code': array(2, dtype=int16), 'quatern_b': array(0., dtype=float32), 'quatern_c': array(0.70710677, dtype=float32), 'quatern_d': array(0.70710677, dtype=float32), 'qoffset_x': array(283.613, dtype=float32), 'qoffset_y': array(123.125, dtype=float32), 'qoffset_z': array(-510.42804, dtype=float32), 'srow_x': array([ -1.071777, 0. , 0. , 283.613 ], dtype=float32), 'srow_y': array([ 0. , 0. , -3. , 123.125], dtype=float32), 'srow_z': array([ 0. , 1.071777, 0. , -510.42804 ], dtype=float32), 'affine': array([[ 1. , 0. , 0. , -264.06503],
[ 0. , 1. , 0. , -176.875 ],
[ 0. , 0. , 1. , -510.42804],
[ 0. , 0. , 0. , 1. ]], dtype=float32), 'original_affine': array([[ -1.07177699, 0. , 0. , 283.61300659],
[ 0. , 0. , -3. , 123.125 ],
[ 0. , 1.07177699, 0. , -510.42803955],
[ 0. , 0. , 0. , 1. ]]), 'as_closest_canonical': False, 'spatial_shape': array([512, 512, 101], dtype=int16), 'original_channel_dim': 'no_channel', 'filename_or_obj': '/home/dev/verse/labels/new_sub-verse763_seg-vert_msk.nii.gz'}
image_transforms statistics:
Type: <class 'list'> None
Value: [{'class': 'Orientationd', 'id': 140249334551840, 'orig_size': (512, 101, 512), 'extra_info': {'meta_key': 'image_meta_dict', 'old_affine': array([[ -1.07177699, 0. , -0. , 283.61300659],
[ -0. , 0. , -3. , 123.125 ],
[ 0. , 1.07177699, 0. , -510.42803955],
[ 0. , 0. , 0. , 1. ]])}}, {'class': 'Spacingd', 'id': 140249334551984, 'orig_size': (512, 101, 512), 'extra_info': {'meta_key': 'image_meta_dict', 'old_affine': array([[ 1.071777, 0. , 0. , -264.06503 ],
[ 0. , 3. , 0. , -176.875 ],
[ 0. , 0. , 1.071777, -510.42804 ],
[ 0. , 0. , 0. , 1. ]],
dtype=float32), 'mode': 'bilinear', 'padding_mode': 'border', 'align_corners': False}}, {'class': 'CropForegroundd', 'id': 140249334552272, 'orig_size': (549, 301, 549), 'extra_info': {'box_start': array([91, 0, 0]), 'box_end': array([456, 298, 549])}}, {'class': 'ToTensord', 'id': 140249334552368, 'orig_size': (365, 298, 549)}]
label_transforms statistics:
Type: <class 'list'> None
Value: [{'class': 'Orientationd', 'id': 140249334551840, 'orig_size': (512, 101, 512), 'extra_info': {'meta_key': 'label_meta_dict', 'old_affine': array([[ -1.07177699, 0. , 0. , 283.61300659],
[ 0. , 0. , -3. , 123.125 ],
[ 0. , 1.07177699, 0. , -510.42803955],
[ 0. , 0. , 0. , 1. ]])}}, {'class': 'Spacingd', 'id': 140249334551984, 'orig_size': (512, 101, 512), 'extra_info': {'meta_key': 'label_meta_dict', 'old_affine': array([[ 1.071777, 0. , 0. , -264.06503 ],
[ 0. , 3. , 0. , -176.875 ],
[ 0. , 0. , 1.071777, -510.42804 ],
[ 0. , 0. , 0. , 1. ]],
dtype=float32), 'mode': 'nearest', 'padding_mode': 'border', 'align_corners': False}}, {'class': 'CropForegroundd', 'id': 140249334552272, 'orig_size': (549, 301, 549), 'extra_info': {'box_start': array([91, 0, 0]), 'box_end': array([456, 298, 549])}}, {'class': 'ToTensord', 'id': 140249334552368, 'orig_size': (365, 298, 549)}]
foreground_start_coord statistics:
Type: <class 'numpy.ndarray'> int64
Shape: (3,)
Value range: (0, 91)
foreground_end_coord statistics:
Type: <class 'numpy.ndarray'> int64
Shape: (3,)
Value range: (298, 549)
Epoch 0: 2%|█▏ | 6/287 [01:28<1:09:06, 14.76s/it, loss=1.4, v_num=11]Traceback (most recent call last):
File "/home/andre/workspace/App-WHSeg/src/train.py", line 173, in <module>
main(args.bodypart, args.config, args.gpu_device)
File "/home/andre/workspace/App-WHSeg/src/train.py", line 167, in main
trainer.fit(model)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
self._call_and_handle_interrupt(
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
results = self._run_stage()
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
return self._run_train()
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
self.fit_loop.run()
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 174, in advance
batch = next(data_fetcher)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
return self.fetching_function()
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 263, in fetching_function
self._fetch_next_batch(self.dataloader_iter)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 277, in _fetch_next_batch
batch = next(iterator)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 557, in __next__
return self.request_next_batch(self.loader_iters)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 569, in request_next_batch
return apply_to_collection(loader_iters, Iterator, next)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
return function(data, *args, **kwargs)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1356, in _next_data
return self._process_data(data)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 6.
Original Traceback (most recent call last):
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/transform.py", line 82, in apply_transform
return _apply_transform(transform, data, unpack_items)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/transform.py", line 53, in _apply_transform
return transform(parameters)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/croppad/dictionary.py", line 1161, in __call__
self.randomize(label, fg_indices, bg_indices, image)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/croppad/dictionary.py", line 1143, in randomize
self.centers = generate_pos_neg_label_crop_centers(
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/utils.py", line 502, in generate_pos_neg_label_crop_centers
random_int = rand_state.randint(len(indices_to_use))
File "mtrand.pyx", line 748, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/data/dataset.py", line 96, in __getitem__
return self._transform(index)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/data/dataset.py", line 289, in _transform
return self._post_transform(pre_random_item)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/data/dataset.py", line 235, in _post_transform
item_transformed = apply_transform(_transform, item_transformed)
File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/transform.py", line 106, in apply_transform
raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.croppad.dictionary.RandCropByPosNegLabeld object at 0x7fe501edcf40>
I'm using UNETR with a (96, 96, 96) input patch/ROI size, and all samples are larger than 96 voxels along every axis.
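For reference, the model is built roughly like this (sketch with placeholder hyperparameters; the relevant point is that img_size matches the crop's spatial_size):

from monai.networks.nets import UNETR

roi_size = (96, 96, 96)
model = UNETR(
    in_channels=1,
    out_channels=2,        # placeholder: background + foreground classes
    img_size=roi_size,     # same size as RandCropByPosNegLabeld's spatial_size
    feature_size=16,
)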
Oddly enough, when using CacheDataset or SmartCacheDataset this issue was never raised.
The training transform pipeline is the following:
from monai.transforms import (
    AddChanneld,
    Compose,
    CropForegroundd,
    LoadImaged,
    Orientationd,
    RandCropByPosNegLabeld,
    RandFlipd,
    RandRotate90d,
    RandShiftIntensityd,
    ScaleIntensityRanged,
    Spacingd,
    ToTensord,
)


def train_transforms(labels: list, roi_size: tuple):
    """
    Training transforms.

    Args:
        labels: labels to select.
        roi_size: patch cube dimensions.
    """
    transforms = Compose(
        [
            LoadImaged(keys=["image", "label"]),
            # SelectLabels is a project-specific transform (not part of MONAI).
            SelectLabels(keys=["label"], labels=labels),
            AddChanneld(keys=["image", "label"]),
            Orientationd(keys=["image", "label"], axcodes="RAS"),
            Spacingd(
                keys=["image", "label"],
                pixdim=(1.0, 1.0, 1.0),
                mode=("bilinear", "nearest"),
            ),
            ScaleIntensityRanged(
                keys=["image"],
                a_min=-175,
                a_max=250,
                b_min=0.0,
                b_max=1.0,
                clip=True,
            ),
            CropForegroundd(keys=["image", "label"], source_key="image"),
            # First random transform: with PersistentDataset, everything above is
            # cached to disk and everything from here on runs every epoch.
            RandCropByPosNegLabeld(
                keys=["image", "label"],
                label_key="label",
                spatial_size=roi_size,
                pos=1,
                neg=1,
                num_samples=4,
                image_key="image",
                image_threshold=0,
            ),
            RandFlipd(keys=["image", "label"], spatial_axis=[0], prob=0.10),
            RandFlipd(keys=["image", "label"], spatial_axis=[1], prob=0.10),
            RandFlipd(keys=["image", "label"], spatial_axis=[2], prob=0.10),
            RandRotate90d(keys=["image", "label"], prob=0.10, max_k=3),
            RandShiftIntensityd(keys=["image"], offsets=0.10, prob=0.50),
            ToTensord(keys=["image", "label"]),
        ]
    )
    return transforms
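For a single case the pipeline can also be called directly as a sanity check (sketch; the verse763 paths are the ones from the dump above, labels=[1] is just a placeholder):

tfm = train_transforms(labels=[1], roi_size=(96, 96, 96))
sample = {
    "image": "/home/dev/verse/images/sub-verse763_ct.nii.gz",
    "label": "/home/dev/verse/labels/new_sub-verse763_seg-vert_msk.nii.gz",
}
out = tfm(sample)  # RandCropByPosNegLabeld with num_samples=4 yields a list of 4 dicts
print(len(out), out[0]["image"].shape, float(out[0]["label"].sum()))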
I gathered that this error is supposed to happen when pos=0 and neg=0, which is not my case, so why is it happening? Is it because some of the 4 crops sometimes land on regions with no foreground voxels?
From my understanding of how the different dataset types load and feed data, if this can happen at all it should happen with every dataset type. Why does it only occur with PersistentDataset, given that I've tested extensively with the others, and what is actually going on inside this transform?
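To narrow this down I'm thinking of running only the deterministic part of the pipeline over every case and flagging labels that come out all-zero, which is what the label value range (0.0, 0.0) in the dump above suggests. Rough sketch, with train_files and LABELS as placeholders and SelectLabels being my own transform:

from monai.data import Dataset
from monai.transforms import (AddChanneld, Compose, CropForegroundd, LoadImaged,
                              Orientationd, ScaleIntensityRanged, Spacingd)

# Deterministic transforms only: everything before RandCropByPosNegLabeld.
det = Compose([
    LoadImaged(keys=["image", "label"]),
    SelectLabels(keys=["label"], labels=LABELS),  # my custom transform; LABELS is a placeholder
    AddChanneld(keys=["image", "label"]),
    Orientationd(keys=["image", "label"], axcodes="RAS"),
    Spacingd(keys=["image", "label"], pixdim=(1.0, 1.0, 1.0), mode=("bilinear", "nearest")),
    ScaleIntensityRanged(keys=["image"], a_min=-175, a_max=250, b_min=0.0, b_max=1.0, clip=True),
    CropForegroundd(keys=["image", "label"], source_key="image"),
])

# Plain Dataset so no caching (persistent or otherwise) is involved.
for item in Dataset(data=train_files, transform=det):
    if item["label"].sum() == 0:
        print("no foreground after deterministic transforms:",
              item["label_meta_dict"]["filename_or_obj"])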
Thanks
Setup:
================================
Printing MONAI config...
================================
MONAI version: 0.8.0
Numpy version: 1.23.3
Pytorch version: 1.12.1+cu116
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 714d00dffe6653e21260160666c4c201ab66511b
Optional dependencies:
Pytorch Ignite version: 0.4.6
Nibabel version: 4.0.2
scikit-image version: 0.19.3
Pillow version: 9.2.0
Tensorboard version: 2.10.0
gdown version: 4.5.1
TorchVision version: 0.13.1+cu116
tqdm version: 4.64.1
lmdb version: 1.3.0
psutil version: 5.9.2
pandas version: 1.4.3
einops version: 0.4.1
transformers version: 4.22.1
mlflow version: 1.28.0
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 20.04.2 LTS
Platform: Linux-5.15.0-50-generic-x86_64-with-glibc2.31
Processor: x86_64
Machine: x86_64
Python version: 3.9.7
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='/home/andre/workspace/App-WHSeg/src/cufile.log', fd=3, position=14286, mode='a', flags=33793), popenfile(path='/home/andre/.vscode-server/data/logs/20221013T182351/remoteagent.log', fd=19, position=1498, mode='a', flags=33793), popenfile(path='/home/andre/.vscode-server/data/logs/20221013T182351/ptyhost.log', fd=20, position=3920, mode='a', flags=33793), popenfile(path='/home/andre/.vscode-server/bin/74b1f979648cc44d385a2286793c226e611f59e7/vscode-remote-lock.andre.74b1f979648cc44d385a2286793c226e611f59e7', fd=99, position=0, mode='w', flags=32769)]
Num physical CPUs: 32
Num logical CPUs: 64
Num usable CPUs: 64
CPU usage (%): [11.8, 5.1, 5.0, 4.5, 3.8, 3.8, 4.5, 5.1, 4.5, 5.1, 4.5, 5.1, 5.1, 5.1, 5.0, 4.5, 5.1, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 7.6, 5.1, 5.1, 4.5, 4.5, 4.5, 92.4, 4.5, 4.5, 4.5, 4.5, 5.1, 4.5, 5.1, 4.5, 4.5, 5.1, 5.1, 3.8, 4.5, 5.1, 100.0, 5.1, 4.5, 6.3, 5.1, 5.1, 5.1, 4.5, 3.8, 4.5, 5.1, 4.5, 4.4, 4.5, 6.3, 3.8, 5.7, 11.3]
CPU freq. (MHz): 2231
Load avg. in last 1, 5, 15 mins (%): [4.8, 10.3, 10.9]
Disk usage (%): 96.5
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 125.6
Available memory (GB): 72.5
Used memory (GB): 42.9
================================
Printing GPU config...
================================
Num GPUs: 3
Has CUDA: True
CUDA version: 11.6
cuDNN enabled: True
cuDNN version: 8302
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: NVIDIA GeForce RTX 3090
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 82
GPU 0 Total memory (GB): 23.7
GPU 0 CUDA capability (maj.min): 8.6
GPU 1 Name: NVIDIA GeForce RTX 3090
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 82
GPU 1 Total memory (GB): 23.7
GPU 1 CUDA capability (maj.min): 8.6
GPU 2 Name: NVIDIA GeForce RTX 3090
GPU 2 Is integrated: False
GPU 2 Is multi GPU board: False
GPU 2 Multi processor count: 82
GPU 2 Total memory (GB): 23.7
GPU 2 CUDA capability (maj.min): 8.6