cerebras.modelzoo.data.nlp.bert.bert_utils.shard_and_shuffle_data

cerebras.modelzoo.data.nlp.bert.bert_utils.shard_and_shuffle_data(files_per_task, shuffle, shuffle_seed)

Shard the data across the processes.

Parameters
  • files_per_task (list) – List of files with input data.

  • shuffle (bool) – Whether to shuffle data or not.

  • shuffle_seed (int) – Seed to use for shuffling.

Returns

A tuple with:
  • processed_buffers (int) – Counter for how many buffers of data have been processed so far.

  • files_per_worker (list) – Files to process for the input worker.

  • shuffle_seed (int) – Updated shuffle seed.

  • rng (random.Random) – Object with shuffle function.
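
The following is a minimal sketch of the behavior described above, not the library's implementation. It assumes round-robin sharding across workers and a per-worker seed offset; the function name `shard_and_shuffle_sketch`, the explicit `worker_id` and `num_workers` arguments, and the file names are hypothetical and are not part of the documented signature.

```python
import random


def shard_and_shuffle_sketch(files_per_task, shuffle, shuffle_seed,
                             worker_id=0, num_workers=1):
    """Hypothetical stand-in that mirrors the documented return tuple."""
    # Assumption: round-robin sharding, so worker k takes files
    # k, k + num_workers, k + 2 * num_workers, ...
    files_per_worker = list(files_per_task[worker_id::num_workers])

    # Nothing has been processed yet, so the counter starts at zero.
    processed_buffers = 0

    # Assumption: offset the seed per worker so that workers do not
    # all shuffle in the same order.
    if shuffle_seed is not None:
        shuffle_seed = shuffle_seed + worker_id + 1

    rng = random.Random(shuffle_seed)
    if shuffle:
        rng.shuffle(files_per_worker)

    return processed_buffers, files_per_worker, shuffle_seed, rng


# Hypothetical input: six files split across two workers.
files = [f"file_{i}" for i in range(6)]
results = [
    shard_and_shuffle_sketch(files, True, 1234, worker_id=w, num_workers=2)
    for w in range(2)
]
for worker, (_, worker_files, seed, _) in enumerate(results):
    print(worker, seed, worker_files)
```

In this sketch every file lands on exactly one worker, and rerunning with the same seed reproduces the same per-worker order. The returned `rng` can then be reused for later shuffles, such as shuffling buffered samples.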