cerebras.modelzoo.data.vision.segmentation.Hdf5BaseDataProcessor.Hdf5BaseDataProcessor

class cerebras.modelzoo.data.vision.segmentation.Hdf5BaseDataProcessor.Hdf5BaseDataProcessor(*args, **kwargs)

Bases: torch.utils.data.Dataset
An HDF5 dataset processor for the UNet HDF5 dataset. Performs on-the-fly augmentation of images and labels.
Functionality includes:
- Reading data from HDF5 files
- Augmenting data
Parameters
params (dict) – dict containing training input parameters for creating the dataset.
Expects the following fields:
“data_dir” (str or list of str): Path to dataset HDF5 files
“num_classes” (int): Number of output classes for segmentation.
“image_shape” (list of int): Expected shape of output images and labels, used in assert checks.
“loss” (str): Loss type, supported: {“bce”, “multilabel_bce”, “ssce”}
“normalize_data_method” (str): Can be one of {None, “zero_centered”, “zero_one”}
“batch_size” (int): Batch size.
“shuffle” (bool): Flag to enable data shuffling.
“shuffle_buffer” (int): Size of shuffle buffer in samples.
“shuffle_seed” (int): Shuffle seed.
“num_workers” (int): How many subprocesses to use for data loading.
“drop_last” (bool): If True and the dataset size is not divisible by the batch size, the last incomplete batch will be dropped.
“prefetch_factor” (int): Number of samples loaded in advance by each worker.
“persistent_workers” (bool): If True, the data loader will not shut down the worker processes after a dataset has been consumed once.
Methods

create_dataloader
Classmethod to create the dataloader object.
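Below is a minimal usage sketch, assuming the processor is instantiated directly from the params dict described above and that create_dataloader returns an iterable dataloader. The field values and the data path are illustrative placeholders, not defaults.

```python
from cerebras.modelzoo.data.vision.segmentation.Hdf5BaseDataProcessor import (
    Hdf5BaseDataProcessor,
)

# All values below are illustrative placeholders; adjust them to your dataset.
params = {
    "data_dir": "/path/to/unet_hdf5",       # directory containing the HDF5 files
    "num_classes": 2,                       # e.g. binary segmentation
    "image_shape": [256, 256, 1],           # must match the stored image/label shape
    "loss": "bce",                          # one of "bce", "multilabel_bce", "ssce"
    "normalize_data_method": "zero_centered",
    "batch_size": 8,
    "shuffle": True,
    "shuffle_buffer": 1000,
    "shuffle_seed": 42,
    "num_workers": 4,
    "drop_last": True,
    "prefetch_factor": 2,
    "persistent_workers": True,
}

processor = Hdf5BaseDataProcessor(params)
dataloader = processor.create_dataloader()

# Iterate over (image, label) batches produced by the dataloader.
for images, labels in dataloader:
    pass
```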