cerebras.modelzoo.data.common.h5_map_dataset.dataset.MLMHDF5Dataset

class cerebras.modelzoo.data.common.h5_map_dataset.dataset.MLMHDF5Dataset(*args, **kwargs)

Bases: cerebras.modelzoo.data.common.h5_map_dataset.dataset.HDF5Dataset

Dataset class that handles text preprocessing for BERT masked language modeling (MLM) datasets.
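For readers unfamiliar with MLM preprocessing, the following is a minimal sketch of the general BERT-style masking technique (select roughly 15% of tokens; replace 80% of those with a mask token, 10% with a random token, and leave 10% unchanged). It is not the implementation used by MLMHDF5Dataset. The function name `mask_tokens`, all of its parameters, and the use of -100 as the "ignore" label (a common PyTorch convention) are assumptions made for illustration only.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(),
                mask_prob=0.15, seed=None):
    """Hypothetical helper: apply BERT-style MLM masking to a token list.

    Returns (input_ids, labels). labels holds the original token at each
    selected position and -100 everywhere else (assumed ignore value).
    """
    rng = random.Random(seed)
    input_ids = list(token_ids)
    labels = [-100] * len(input_ids)

    for i, tok in enumerate(token_ids):
        # Never select special tokens; select others with prob. mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is asked to predict the original token
        roll = rng.random()
        if roll < 0.8:
            input_ids[i] = mask_id                     # 80%: mask token
        elif roll < 0.9:
            input_ids[i] = rng.randrange(vocab_size)   # 10%: random token
        # remaining 10%: keep the original token unchanged

    return input_ids, labels


# Example with made-up ids: 101/102 stand in for special tokens, 103 for the mask.
tokens = [101, 7, 8, 9, 10, 11, 102]
inputs, labels = mask_tokens(tokens, mask_id=103, vocab_size=1000,
                             special_ids=(101, 102), seed=0)
```

Positions whose label is -100 are left untouched in `input_ids`, so a loss function configured to ignore that value only scores the selected positions.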

Parameters

params (dict) – A dictionary containing the parameters that HDF5Dataset accepts, along with the following add-ons:

- “data_dir” (str): the path to the directory containing the images.
- “transforms” (list[dict]): a specification of the torchvision transforms.

Methods

generate_sample

load_state_dict

map

state_dict

Attributes

by_sample