cerebras.modelzoo.data.nlp.bert.bert_utils.shard_and_shuffle_data
- cerebras.modelzoo.data.nlp.bert.bert_utils.shard_and_shuffle_data(files_per_task, shuffle, shuffle_seed)
Shard the input files across the worker processes and optionally shuffle them.
- Parameters
files_per_task (list) – List of files with input data.
shuffle (bool) – Whether to shuffle the data.
shuffle_seed (int) – Seed to use for shuffling.
- Returns
A tuple with:
* processed_buffers (int) – Counter for how many buffers of data have been processed so far.
* files_per_worker (list) – Files to process for the input worker.
* shuffle_seed (int) – Updated shuffle seed.
* rng (random.Random) – Object with shuffle function.
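The following is a minimal sketch of how a function with this signature and return tuple could behave. It is not the modelzoo implementation. The name `shard_and_shuffle_sketch`, the explicit `worker_id` and `num_workers` arguments, the round-robin file assignment, and the per-worker seed offset are all assumptions made for illustration; the real function presumably discovers the worker context on its own.

```python
import random


def shard_and_shuffle_sketch(files_per_task, shuffle, shuffle_seed,
                             worker_id=0, num_workers=1):
    """Illustrative stand-in only, NOT the modelzoo implementation."""
    # Assumption: files are assigned to workers round-robin by index.
    files_per_worker = [
        f for i, f in enumerate(files_per_task)
        if i % num_workers == worker_id
    ]

    # Assumption: the seed is offset per worker so that each worker
    # shuffles differently; this is the "updated shuffle seed".
    if shuffle_seed is not None:
        shuffle_seed += worker_id + 1

    # A dedicated generator keeps shuffling independent of global state.
    rng = random.Random(shuffle_seed)
    if shuffle:
        rng.shuffle(files_per_worker)

    # No buffers have been processed yet when sharding happens.
    processed_buffers = 0
    return processed_buffers, files_per_worker, shuffle_seed, rng


# Hypothetical file names, used only to exercise the sketch.
files = [f"part-{i}.csv" for i in range(6)]
buffers, shard, seed, rng = shard_and_shuffle_sketch(
    files, shuffle=True, shuffle_seed=0, worker_id=1, num_workers=2
)
```

In this sketch, worker 1 of 2 receives the files at odd indices in a shuffled order, and the returned `rng` can be reused later so that subsequent shuffles stay reproducible for a given seed.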