cerebras.modelzoo.data.vision.segmentation.Hdf5BaseIterDataProcessor.Hdf5BaseIterDataProcessor#

class cerebras.modelzoo.data.vision.segmentation.Hdf5BaseIterDataProcessor.Hdf5BaseIterDataProcessor(*args, **kwargs)[source]#

Bases: abc.ABC, torch.utils.data.IterableDataset

An HDF5 dataset processor for the UNet HDF5 dataset. Performs on-the-fly augmentation of images and labels.

Functionality includes:

  • Reading data from HDF5 documents

  • Augmenting data

Parameters

params (dict) – dict containing training input parameters for creating dataset.

Expects the following fields:

  • “data_dir” (str or list of str): Path to dataset HDF5 files

  • “num_classes” (int): Number of classes in the segmentation labels.

  • “image_shape” (list of int): Expected shape of output images and labels; used in assert checks.

  • “loss” (str): Loss type, supported: {“bce”, “multilabel_bce”, “ssce”}

  • “normalize_data_method” (str): Can be one of {None, “zero_centered”, “zero_one”}

  • “batch_size” (int): Batch size.

  • “shuffle” (bool): Flag to enable data shuffling.

  • “shuffle_buffer” (int): Size of shuffle buffer in samples.

  • “shuffle_seed” (int): Shuffle seed.

  • “num_workers” (int): How many subprocesses to use for data loading.

  • “drop_last” (bool): If True and the dataset size is not divisible

    by the batch size, the last incomplete batch will be dropped.

  • “prefetch_factor” (int): Number of samples loaded in advance by each worker.

  • “persistent_workers” (bool): If True, the data loader will not shut down

    the worker processes after a dataset has been consumed once.
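
A minimal sketch of such a params dict, assuming a dataset stored under a hypothetical ./unet_hdf5 directory; all values below are illustrative only:

    # Hypothetical params dict; the path and values are illustrative, not defaults.
    params = {
        "data_dir": "./unet_hdf5",            # assumed location of the HDF5 files
        "num_classes": 2,
        "image_shape": [256, 256, 1],         # expected shape of images and labels
        "loss": "bce",                        # one of {"bce", "multilabel_bce", "ssce"}
        "normalize_data_method": "zero_one",  # or None, "zero_centered"
        "batch_size": 8,
        "shuffle": True,
        "shuffle_buffer": 1000,
        "shuffle_seed": 42,
        "num_workers": 4,
        "drop_last": True,
        "prefetch_factor": 2,
        "persistent_workers": True,
    }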

Methods

create_dataloader

Classmethod to create the dataloader object.

create_dataloader(is_training=True)[source]#

Classmethod to create the dataloader object.
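
A minimal usage sketch. Because this class is abstract (it derives from abc.ABC), a concrete subclass must be used; the subclass name below is hypothetical:

    # UNetHDF5DataProcessor is a hypothetical concrete subclass of
    # Hdf5BaseIterDataProcessor; the name is illustrative only.
    processor = UNetHDF5DataProcessor(params)  # params dict as sketched above
    train_loader = processor.create_dataloader(is_training=True)

    for images, labels in train_loader:
        # Each batch holds "batch_size" image/label pairs read from the HDF5 files.
        break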