cerebras.modelzoo.data.common.SyntheticDataProcessor.SyntheticDataProcessor

class cerebras.modelzoo.data.common.SyntheticDataProcessor.SyntheticDataProcessor(params)

Bases: object

Creates a synthetic dataset.

Constructs a SyntheticDataset from the user-provided nested structure of input tensor specifications, then wraps it in a torch.utils.data.DataLoader configured with the regular torch.utils.data.DataLoader inputs specified in params.yaml. Call the create_dataloader() method to obtain the torch.utils.data.DataLoader.

Parameters
  • params – Dictionary containing dataset inputs and specifications. Within this dictionary, the user provides the additional ‘synthetic_inputs’ field that corresponds to a nested tree structure of input tensor specifications used to construct the SyntheticDataset.

  • In params.yaml, set the following fields:

    data_processor: “SyntheticDataProcessor”. Must be set to this value to use this class.

    batch_size: int

    shuffle_seed: Optional[int] = None. If it is not None, torch.manual_seed(seed=shuffle_seed) is called when creating the dataloader.

    num_examples: Optional[int] = None. If it is not None, it specifies the number of examples/samples in the SyntheticDataset. Otherwise, the SyntheticDataset generates samples indefinitely.

    … synthetic_inputs:
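The behaviour of num_examples, shuffle_seed, and batch_size described above can be illustrated with a small framework-free sketch. It does not use torch or any Cerebras class: synthetic_samples, batches, and sample_len are hypothetical names introduced only for this illustration, and a local random.Random instance stands in for the global torch.manual_seed call.

```python
import itertools
import random
from typing import Iterable, Iterator, List, Optional


def synthetic_samples(
    sample_len: int,
    num_examples: Optional[int] = None,
    shuffle_seed: Optional[int] = None,
) -> Iterator[List[float]]:
    """Yield random samples; the stream is endless when num_examples is None."""
    # Stand-in for torch.manual_seed(seed=shuffle_seed): a seeded local RNG.
    # With shuffle_seed=None the RNG is seeded from system entropy.
    rng = random.Random(shuffle_seed)
    # itertools.count() never stops; range() stops after num_examples samples.
    indices = itertools.count() if num_examples is None else range(num_examples)
    for _ in indices:
        yield [rng.random() for _ in range(sample_len)]


def batches(samples: Iterable[List[float]], batch_size: int) -> Iterator[List[List[float]]]:
    """Group consecutive samples into lists of batch_size (the last may be shorter)."""
    it = iter(samples)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch
```

With num_examples left as None, the consumer has to bound iteration itself, for example with itertools.islice or a fixed step count in the training loop.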

Methods

create_dataloader

Returns torch.utils.data.DataLoader that corresponds to the created SyntheticDataset.

create_dataloader()

Returns torch.utils.data.DataLoader that corresponds to the created SyntheticDataset.
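A possible way to use the class is sketched below. It assumes that the params.yaml fields listed under Parameters map one-to-one onto keys of the params dictionary. make_params and build_dataloader are hypothetical helpers, not part of the library, and the schema of synthetic_inputs is not reproduced in this section, so it is passed through unchanged.

```python
from typing import Any, Dict, Optional


def make_params(
    synthetic_inputs: Any,
    batch_size: int,
    shuffle_seed: Optional[int] = None,
    num_examples: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a params dictionary (hypothetical helper, keys assumed)."""
    return {
        "data_processor": "SyntheticDataProcessor",
        "batch_size": batch_size,
        "shuffle_seed": shuffle_seed,  # None: torch.manual_seed is not called
        "num_examples": num_examples,  # None: samples are generated indefinitely
        "synthetic_inputs": synthetic_inputs,  # nested tensor specification tree
    }


def build_dataloader(params: Dict[str, Any]):
    """Create the processor and return its torch.utils.data.DataLoader."""
    # Imported lazily so that make_params works without the Cerebras package.
    from cerebras.modelzoo.data.common.SyntheticDataProcessor import (
        SyntheticDataProcessor,
    )

    return SyntheticDataProcessor(params).create_dataloader()
```

The constructor call and create_dataloader() follow the signatures shown on this page. Whether every key shown here is read from the dictionary is an assumption.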