cerebras.modelzoo.data.common.SyntheticDataProcessor.SyntheticDataProcessor

class cerebras.modelzoo.data.common.SyntheticDataProcessor.SyntheticDataProcessor(params)

Bases: object

Creates a synthetic dataset.

Constructs a SyntheticDataset from the user-provided nested structure of input tensors. Calling the create_dataloader() method returns a torch.utils.data.DataLoader built from that SyntheticDataset and the regular torch.utils.data.DataLoader inputs specified in params.yaml.

Parameters

params – Dictionary containing dataset inputs and specifications. Within this dictionary, the user provides the additional ‘synthetic_inputs’ field that corresponds to a nested tree structure of input tensor specifications used to construct the SyntheticDataset.
params.yaml (In) – Fields read from params.yaml:

data_processor: “SyntheticDataProcessor”. Must be set to this value to use this class.

batch_size: int.

shuffle_seed: Optional[int] = None. If it is not None, torch.manual_seed(seed=shuffle_seed) is called when creating the dataloader.

num_examples: Optional[int] = None. If it is not None, it specifies the number of examples/samples in the SyntheticDataset. Otherwise, the SyntheticDataset generates samples indefinitely.

… synthetic_inputs:
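The num_examples and shuffle_seed semantics described above can be sketched in plain Python. This is only an illustrative stand-in, not the SyntheticDataset implementation: the function name synthetic_samples is hypothetical, and a local random.Random is used here in place of the torch.manual_seed call that the class makes.

```python
import itertools
import random
from typing import Iterator, Optional


def synthetic_samples(
    num_examples: Optional[int] = None,
    shuffle_seed: Optional[int] = None,
) -> Iterator[float]:
    """Hypothetical stand-in: yield random samples, finitely or forever."""
    # A seeded generator makes the sample stream reproducible.
    rng = random.Random(shuffle_seed)
    # num_examples=None means "generate samples indefinitely".
    indices = itertools.count() if num_examples is None else range(num_examples)
    for _ in indices:
        yield rng.random()


# Finite case: exactly num_examples samples are produced.
finite = list(synthetic_samples(num_examples=3, shuffle_seed=0))

# Infinite case: the consumer decides when to stop.
first_five = list(itertools.islice(synthetic_samples(shuffle_seed=0), 5))
```

With the same seed, the finite stream is a prefix of the unbounded one, which is why leaving num_examples unset requires the training loop (or a step limit) to decide when to stop consuming samples.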

Methods

create_dataloader

Returns torch.utils.data.DataLoader that corresponds to the created SyntheticDataset.

create_dataloader(): Returns torch.utils.data.DataLoader that corresponds to the created SyntheticDataset.
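Before passing params to the constructor and calling create_dataloader(), it can help to check the documented fields. The helper below is a hypothetical sketch, not part of the Cerebras Model Zoo; it assumes that batch_size and synthetic_inputs are required, and it does not inspect the contents of synthetic_inputs, whose format is not shown on this page.

```python
from typing import Any, Dict, List


def check_synthetic_params(params: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list problems found in a params dictionary."""
    problems = []
    # The class is only selected when data_processor names it.
    if params.get("data_processor") != "SyntheticDataProcessor":
        problems.append("data_processor must be 'SyntheticDataProcessor'")
    # batch_size is documented as an int (assumed required here).
    if not isinstance(params.get("batch_size"), int):
        problems.append("batch_size must be an int")
    # Both of these are Optional[int] and default to None.
    for key in ("shuffle_seed", "num_examples"):
        value = params.get(key)
        if value is not None and not isinstance(value, int):
            problems.append(f"{key} must be an int or None")
    # Only presence is checked; the nested structure is left untouched.
    if "synthetic_inputs" not in params:
        problems.append("synthetic_inputs is required")
    return problems


good = {
    "data_processor": "SyntheticDataProcessor",
    "batch_size": 4,
    "shuffle_seed": None,
    "num_examples": 16,
    "synthetic_inputs": {},  # placeholder; real contents not shown here
}
bad = {"data_processor": "Other", "batch_size": "4", "shuffle_seed": 1.5}
```

Here check_synthetic_params(good) returns an empty list, while the bad dictionary yields one message per violated field.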
