runtime error applying RandCropByPosNegLabeld in some samples when using PersistentDataset #5330

Closed as not planned

Description

@AKdeeplearner

Greetings. I've been using CacheDataset for some time, but I no longer have the resources to cache the whole dataset now that the number of samples has increased. To work around this I've switched to PersistentDataset, but some strange things are happening.
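For context, this is roughly how I'm instantiating it (a sketch; the file lists and loader settings below are placeholders, and train_transforms is shown further down):

from monai.data import DataLoader, PersistentDataset

# Placeholder file lists; each item is a dict of image/label paths.
train_files = [{"image": img, "label": seg} for img, seg in zip(image_paths, label_paths)]

train_ds = PersistentDataset(
    data=train_files,
    transform=train_transforms(labels, roi_size=(96, 96, 96)),
    cache_dir="./persistent_cache",  # deterministic transform outputs are persisted here
)
train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=8)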

Essentially, some samples raise this error during training:

N Foreground 0, N  background 27944254,unable to generate class balanced samples.

=== Transform input info -- RandCropByPosNegLabeld ===
image statistics:
Type: <class 'torch.Tensor'> torch.float32
Shape: torch.Size([1, 365, 298, 549])
Value range: (0.0, 1.0)
label statistics:
Type: <class 'torch.Tensor'> torch.float32
Shape: torch.Size([1, 365, 298, 549])
Value range: (0.0, 0.0)
image_meta_dict statistics:
Type: <class 'dict'> None
Value: {'sizeof_hdr': array(348, dtype=int32), 'extents': array(0, dtype=int32), 'session_error': array(0, dtype=int16), 'dim_info': array(0, dtype=uint8), 'dim': array([  3, 512, 512, 101,   1,   1,   1,   1], dtype=int16), 'intent_p1': array(0., dtype=float32), 'intent_p2': array(0., dtype=float32), 'intent_p3': array(0., dtype=float32), 'intent_code': array(0, dtype=int16), 'datatype': array(4, dtype=int16), 'bitpix': array(16, dtype=int16), 'slice_start': array(0, dtype=int16), 'pixdim': array([-1.      ,  1.071777,  1.071777,  3.      ,  0.      ,  0.      ,
        0.      ,  0.      ], dtype=float32), 'vox_offset': array(0., dtype=float32), 'scl_slope': array(nan, dtype=float32), 'scl_inter': array(nan, dtype=float32), 'slice_end': array(0, dtype=int16), 'slice_code': array(0, dtype=uint8), 'xyzt_units': array(10, dtype=uint8), 'cal_max': array(0., dtype=float32), 'cal_min': array(0., dtype=float32), 'slice_duration': array(0., dtype=float32), 'toffset': array(0., dtype=float32), 'glmax': array(0, dtype=int32), 'glmin': array(0, dtype=int32), 'qform_code': array(1, dtype=int16), 'sform_code': array(1, dtype=int16), 'quatern_b': array(0., dtype=float32), 'quatern_c': array(0.70710677, dtype=float32), 'quatern_d': array(0.70710677, dtype=float32), 'qoffset_x': array(283.613, dtype=float32), 'qoffset_y': array(123.125, dtype=float32), 'qoffset_z': array(-510.42804, dtype=float32), 'srow_x': array([ -1.071777,   0.      ,  -0.      , 283.613   ], dtype=float32), 'srow_y': array([ -0.   ,   0.   ,  -3.   , 123.125], dtype=float32), 'srow_z': array([   0.      ,    1.071777,    0.      , -510.42804 ], dtype=float32), 'affine': array([[   1.     ,    0.     ,    0.     , -264.06503],
       [   0.     ,    1.     ,    0.     , -176.875  ],
       [   0.     ,    0.     ,    1.     , -510.42804],
       [   0.     ,    0.     ,    0.     ,    1.     ]], dtype=float32), 'original_affine': array([[  -1.07177699,    0.        ,   -0.        ,  283.61300659],
       [  -0.        ,    0.        ,   -3.        ,  123.125     ],
       [   0.        ,    1.07177699,    0.        , -510.42803955],
       [   0.        ,    0.        ,    0.        ,    1.        ]]), 'as_closest_canonical': False, 'spatial_shape': array([512, 512, 101], dtype=int16), 'original_channel_dim': 'no_channel', 'filename_or_obj': '/home/dev/verse/images/sub-verse763_ct.nii.gz'}
label_meta_dict statistics:
Type: <class 'dict'> None
Value: {'sizeof_hdr': array(348, dtype=int32), 'extents': array(0, dtype=int32), 'session_error': array(0, dtype=int16), 'dim_info': array(0, dtype=uint8), 'dim': array([  3, 512, 512, 101,   1,   1,   1,   1], dtype=int16), 'intent_p1': array(0., dtype=float32), 'intent_p2': array(0., dtype=float32), 'intent_p3': array(0., dtype=float32), 'intent_code': array(0, dtype=int16), 'datatype': array(64, dtype=int16), 'bitpix': array(64, dtype=int16), 'slice_start': array(0, dtype=int16), 'pixdim': array([-1.      ,  1.071777,  1.071777,  3.      ,  1.      ,  1.      ,
        1.      ,  1.      ], dtype=float32), 'vox_offset': array(0., dtype=float32), 'scl_slope': array(nan, dtype=float32), 'scl_inter': array(nan, dtype=float32), 'slice_end': array(0, dtype=int16), 'slice_code': array(0, dtype=uint8), 'xyzt_units': array(0, dtype=uint8), 'cal_max': array(0., dtype=float32), 'cal_min': array(0., dtype=float32), 'slice_duration': array(0., dtype=float32), 'toffset': array(0., dtype=float32), 'glmax': array(0, dtype=int32), 'glmin': array(0, dtype=int32), 'qform_code': array(0, dtype=int16), 'sform_code': array(2, dtype=int16), 'quatern_b': array(0., dtype=float32), 'quatern_c': array(0.70710677, dtype=float32), 'quatern_d': array(0.70710677, dtype=float32), 'qoffset_x': array(283.613, dtype=float32), 'qoffset_y': array(123.125, dtype=float32), 'qoffset_z': array(-510.42804, dtype=float32), 'srow_x': array([ -1.071777,   0.      ,   0.      , 283.613   ], dtype=float32), 'srow_y': array([  0.   ,   0.   ,  -3.   , 123.125], dtype=float32), 'srow_z': array([   0.      ,    1.071777,    0.      , -510.42804 ], dtype=float32), 'affine': array([[   1.     ,    0.     ,    0.     , -264.06503],
       [   0.     ,    1.     ,    0.     , -176.875  ],
       [   0.     ,    0.     ,    1.     , -510.42804],
       [   0.     ,    0.     ,    0.     ,    1.     ]], dtype=float32), 'original_affine': array([[  -1.07177699,    0.        ,    0.        ,  283.61300659],
       [   0.        ,    0.        ,   -3.        ,  123.125     ],
       [   0.        ,    1.07177699,    0.        , -510.42803955],
       [   0.        ,    0.        ,    0.        ,    1.        ]]), 'as_closest_canonical': False, 'spatial_shape': array([512, 512, 101], dtype=int16), 'original_channel_dim': 'no_channel', 'filename_or_obj': '/home/dev/verse/labels/new_sub-verse763_seg-vert_msk.nii.gz'}
image_transforms statistics:
Type: <class 'list'> None
Value: [{'class': 'Orientationd', 'id': 140249334551840, 'orig_size': (512, 101, 512), 'extra_info': {'meta_key': 'image_meta_dict', 'old_affine': array([[  -1.07177699,    0.        ,   -0.        ,  283.61300659],
       [  -0.        ,    0.        ,   -3.        ,  123.125     ],
       [   0.        ,    1.07177699,    0.        , -510.42803955],
       [   0.        ,    0.        ,    0.        ,    1.        ]])}}, {'class': 'Spacingd', 'id': 140249334551984, 'orig_size': (512, 101, 512), 'extra_info': {'meta_key': 'image_meta_dict', 'old_affine': array([[   1.071777,    0.      ,    0.      , -264.06503 ],
       [   0.      ,    3.      ,    0.      , -176.875   ],
       [   0.      ,    0.      ,    1.071777, -510.42804 ],
       [   0.      ,    0.      ,    0.      ,    1.      ]],
      dtype=float32), 'mode': 'bilinear', 'padding_mode': 'border', 'align_corners': False}}, {'class': 'CropForegroundd', 'id': 140249334552272, 'orig_size': (549, 301, 549), 'extra_info': {'box_start': array([91,  0,  0]), 'box_end': array([456, 298, 549])}}, {'class': 'ToTensord', 'id': 140249334552368, 'orig_size': (365, 298, 549)}]
label_transforms statistics:
Type: <class 'list'> None
Value: [{'class': 'Orientationd', 'id': 140249334551840, 'orig_size': (512, 101, 512), 'extra_info': {'meta_key': 'label_meta_dict', 'old_affine': array([[  -1.07177699,    0.        ,    0.        ,  283.61300659],
       [   0.        ,    0.        ,   -3.        ,  123.125     ],
       [   0.        ,    1.07177699,    0.        , -510.42803955],
       [   0.        ,    0.        ,    0.        ,    1.        ]])}}, {'class': 'Spacingd', 'id': 140249334551984, 'orig_size': (512, 101, 512), 'extra_info': {'meta_key': 'label_meta_dict', 'old_affine': array([[   1.071777,    0.      ,    0.      , -264.06503 ],
       [   0.      ,    3.      ,    0.      , -176.875   ],
       [   0.      ,    0.      ,    1.071777, -510.42804 ],
       [   0.      ,    0.      ,    0.      ,    1.      ]],
      dtype=float32), 'mode': 'nearest', 'padding_mode': 'border', 'align_corners': False}}, {'class': 'CropForegroundd', 'id': 140249334552272, 'orig_size': (549, 301, 549), 'extra_info': {'box_start': array([91,  0,  0]), 'box_end': array([456, 298, 549])}}, {'class': 'ToTensord', 'id': 140249334552368, 'orig_size': (365, 298, 549)}]
foreground_start_coord statistics:
Type: <class 'numpy.ndarray'> int64
Shape: (3,)
Value range: (0, 91)
foreground_end_coord statistics:
Type: <class 'numpy.ndarray'> int64
Shape: (3,)
Value range: (298, 549)
Epoch 0:   2%|█▏                                                    | 6/287 [01:28<1:09:06, 14.76s/it, loss=1.4, v_num=11]Traceback (most recent call last):
  File "/home/andre/workspace/App-WHSeg/src/train.py", line 173, in <module>
    main(args.bodypart, args.config, args.gpu_device)
  File "/home/andre/workspace/App-WHSeg/src/train.py", line 167, in main
    trainer.fit(model)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 174, in advance
    batch = next(data_fetcher)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
    return self.fetching_function()
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 263, in fetching_function
    self._fetch_next_batch(self.dataloader_iter)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 277, in _fetch_next_batch
    batch = next(iterator)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 557, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py", line 569, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1356, in _next_data
    return self._process_data(data)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 6.
Original Traceback (most recent call last):
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/transform.py", line 82, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/transform.py", line 53, in _apply_transform
    return transform(parameters)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/croppad/dictionary.py", line 1161, in __call__
    self.randomize(label, fg_indices, bg_indices, image)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/croppad/dictionary.py", line 1143, in randomize
    self.centers = generate_pos_neg_label_crop_centers(
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/utils.py", line 502, in generate_pos_neg_label_crop_centers
    random_int = rand_state.randint(len(indices_to_use))
  File "mtrand.pyx", line 748, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/data/dataset.py", line 96, in __getitem__
    return self._transform(index)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/data/dataset.py", line 289, in _transform
    return self._post_transform(pre_random_item)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/data/dataset.py", line 235, in _post_transform
    item_transformed = apply_transform(_transform, item_transformed)
  File "/home/andre/workspace/App-WHSeg/.envm/lib/python3.9/site-packages/monai/transforms/transform.py", line 106, in apply_transform
    raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.croppad.dictionary.RandCropByPosNegLabeld object at 0x7fe501edcf40>
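For what it's worth, the tail of the traceback is just numpy refusing to draw from an empty range; a minimal repro of that final ValueError, using the same legacy RandomState API the MONAI code calls:

import numpy as np

# randint with a single argument treats it as the exclusive upper bound, so an
# empty index list (len(indices_to_use) == 0) reproduces the exact failure.
rand_state = np.random.RandomState()
rand_state.randint(0)  # ValueError: high <= 0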

I'm using UNETR with a (96, 96, 96) ROI, and all the samples are larger than 96 voxels along every axis.
Oddly enough, this issue never came up when using CacheDataset or SmartCacheDataset.
The training transform pipeline is the following:

from monai.transforms import (
    AddChanneld, Compose, CropForegroundd, LoadImaged, Orientationd,
    RandCropByPosNegLabeld, RandFlipd, RandRotate90d, RandShiftIntensityd,
    ScaleIntensityRanged, Spacingd, ToTensord,
)

# SelectLabels is a custom dictionary transform of ours that keeps only the
# selected label values.

def train_transforms(labels: list, roi_size: tuple):
    """
    Training transforms.

    Args
    ----
    labels: labels to select
    roi_size: patch cube dimensions
    """
    transforms = Compose(
        [
            LoadImaged(keys=["image", "label"]),
            SelectLabels(keys=["label"], labels=labels),
            AddChanneld(keys=["image", "label"]),
            Orientationd(keys=["image", "label"], axcodes="RAS"),
            Spacingd(
                keys=["image", "label"],
                pixdim=(1.0, 1.0, 1.0),
                mode=("bilinear", "nearest"),
            ),
            ScaleIntensityRanged(
                keys=["image"],
                a_min=-175,
                a_max=250,
                b_min=0.0,
                b_max=1.0,
                clip=True,
            ),
            CropForegroundd(keys=["image", "label"], source_key="image"),
            RandCropByPosNegLabeld(
                keys=["image", "label"],
                label_key="label",
                spatial_size=roi_size,
                pos=1,
                neg=1,
                num_samples=4,
                image_key="image",
                image_threshold=0,
            ),
            RandFlipd(
                keys=["image", "label"],
                spatial_axis=[0],
                prob=0.10,
            ),
            RandFlipd(
                keys=["image", "label"],
                spatial_axis=[1],
                prob=0.10,
            ),
            RandFlipd(
                keys=["image", "label"],
                spatial_axis=[2],
                prob=0.10,
            ),
            RandRotate90d(
                keys=["image", "label"],
                prob=0.10,
                max_k=3,
            ),
            RandShiftIntensityd(
                keys=["image"],
                offsets=0.10,
                prob=0.50,
            ),
            ToTensord(keys=["image", "label"]),
        ]
    )
    return transforms
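
To try to narrow this down, I'm thinking of running just the deterministic prefix of this pipeline over the whole dataset and flagging samples whose label comes out empty, which would match the label Value range: (0.0, 0.0) in the debug dump above (untested sketch; train_files and labels are placeholders from my setup):

from monai.transforms import (
    AddChanneld, Compose, CropForegroundd, LoadImaged, Orientationd,
    ScaleIntensityRanged, Spacingd,
)

# Deterministic prefix of the training pipeline, i.e. everything that
# PersistentDataset would cache; SelectLabels is our custom transform.
deterministic = Compose(
    [
        LoadImaged(keys=["image", "label"]),
        SelectLabels(keys=["label"], labels=labels),
        AddChanneld(keys=["image", "label"]),
        Orientationd(keys=["image", "label"], axcodes="RAS"),
        Spacingd(keys=["image", "label"], pixdim=(1.0, 1.0, 1.0), mode=("bilinear", "nearest")),
        ScaleIntensityRanged(keys=["image"], a_min=-175, a_max=250, b_min=0.0, b_max=1.0, clip=True),
        CropForegroundd(keys=["image", "label"], source_key="image"),
    ]
)

for item in train_files:  # list of {"image": path, "label": path} dicts
    out = deterministic(item)
    if out["label"].max() <= 0:
        print("label empty after deterministic transforms:", item["label"])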

I gathered that this is supposed to happen when pos=0 and neg=0, so why is it happening here? Is it because some of those 4 crops sometimes land on samples that have no foreground voxels at all?
From my understanding of how the loading and feeding processes differ, if this can happen it should happen with both dataset types. Why does it only happen with PersistentDataset, given that I've tested the others extensively, and what is actually going on inside this transform?
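One workaround I'm considering (untested, based on the FgBgToIndicesd pattern from the MONAI docs) is to precompute the foreground/background index lists in the deterministic part of the pipeline, so that PersistentDataset caches them and the crop no longer has to derive them from the label at runtime:

from monai.transforms import FgBgToIndicesd, RandCropByPosNegLabeld

# FgBgToIndicesd would sit right before RandCropByPosNegLabeld in the Compose
# above; with these postfixes the index lists are stored under
# "label_fg_indices" / "label_bg_indices".
precompute_indices = FgBgToIndicesd(
    keys=["label"],
    fg_postfix="_fg_indices",
    bg_postfix="_bg_indices",
    image_key="image",
    image_threshold=0,
)
crop = RandCropByPosNegLabeld(
    keys=["image", "label"],
    label_key="label",
    spatial_size=(96, 96, 96),
    pos=1,
    neg=1,
    num_samples=4,
    fg_indices_key="label_fg_indices",
    bg_indices_key="label_bg_indices",
)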
Thanks

Setup:

================================
Printing MONAI config...
================================
MONAI version: 0.8.0
Numpy version: 1.23.3
Pytorch version: 1.12.1+cu116
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 714d00dffe6653e21260160666c4c201ab66511b

Optional dependencies:
Pytorch Ignite version: 0.4.6
Nibabel version: 4.0.2
scikit-image version: 0.19.3
Pillow version: 9.2.0
Tensorboard version: 2.10.0
gdown version: 4.5.1
TorchVision version: 0.13.1+cu116
tqdm version: 4.64.1
lmdb version: 1.3.0
psutil version: 5.9.2
pandas version: 1.4.3
einops version: 0.4.1
transformers version: 4.22.1
mlflow version: 1.28.0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 20.04.2 LTS
Platform: Linux-5.15.0-50-generic-x86_64-with-glibc2.31
Processor: x86_64
Machine: x86_64
Python version: 3.9.7
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='/home/andre/workspace/App-WHSeg/src/cufile.log', fd=3, position=14286, mode='a', flags=33793), popenfile(path='/home/andre/.vscode-server/data/logs/20221013T182351/remoteagent.log', fd=19, position=1498, mode='a', flags=33793), popenfile(path='/home/andre/.vscode-server/data/logs/20221013T182351/ptyhost.log', fd=20, position=3920, mode='a', flags=33793), popenfile(path='/home/andre/.vscode-server/bin/74b1f979648cc44d385a2286793c226e611f59e7/vscode-remote-lock.andre.74b1f979648cc44d385a2286793c226e611f59e7', fd=99, position=0, mode='w', flags=32769)]
Num physical CPUs: 32
Num logical CPUs: 64
Num usable CPUs: 64
CPU usage (%): [11.8, 5.1, 5.0, 4.5, 3.8, 3.8, 4.5, 5.1, 4.5, 5.1, 4.5, 5.1, 5.1, 5.1, 5.0, 4.5, 5.1, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 7.6, 5.1, 5.1, 4.5, 4.5, 4.5, 92.4, 4.5, 4.5, 4.5, 4.5, 5.1, 4.5, 5.1, 4.5, 4.5, 5.1, 5.1, 3.8, 4.5, 5.1, 100.0, 5.1, 4.5, 6.3, 5.1, 5.1, 5.1, 4.5, 3.8, 4.5, 5.1, 4.5, 4.4, 4.5, 6.3, 3.8, 5.7, 11.3]
CPU freq. (MHz): 2231
Load avg. in last 1, 5, 15 mins (%): [4.8, 10.3, 10.9]
Disk usage (%): 96.5
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 125.6
Available memory (GB): 72.5
Used memory (GB): 42.9

================================
Printing GPU config...
================================
Num GPUs: 3
Has CUDA: True
CUDA version: 11.6
cuDNN enabled: True
cuDNN version: 8302
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: NVIDIA GeForce RTX 3090
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 82
GPU 0 Total memory (GB): 23.7
GPU 0 CUDA capability (maj.min): 8.6
GPU 1 Name: NVIDIA GeForce RTX 3090
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 82
GPU 1 Total memory (GB): 23.7
GPU 1 CUDA capability (maj.min): 8.6
GPU 2 Name: NVIDIA GeForce RTX 3090
GPU 2 Is integrated: False
GPU 2 Is multi GPU board: False
GPU 2 Multi processor count: 82
GPU 2 Total memory (GB): 23.7
GPU 2 CUDA capability (maj.min): 8.6
