cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.process_dataset
- cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.process_dataset(files, dataset_processor, processes)
Process a dataset and write it into HDF5 format.
- Parameters
files (list) – List of files to process.
dataset_processor – Class containing methods that specify how the dataset will be processed and written into HDF5 files.
processes (int) – Number of processes to use.
- Returns
Dictionary containing the results of execution: the number of processed, discarded, and successful files, as well as the number of examples from all processes.
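
The description above does not show how files are divided among processes or how per-process results are combined. The following is a minimal sketch of one way that could work, not the library's implementation. The helpers `split_files` and `merge_stats` and the dictionary keys (`processed`, `discarded`, `successful`, `examples`) are hypothetical names chosen for illustration; the actual keys returned by `process_dataset` may differ.

```python
from collections import Counter


def split_files(files, processes):
    """Distribute files round-robin into `processes` chunks (hypothetical helper)."""
    return [files[i::processes] for i in range(processes)]


def merge_stats(per_process_stats):
    """Sum per-worker counters into a single dictionary (hypothetical helper)."""
    total = Counter()
    for stats in per_process_stats:
        total.update(stats)  # adds counts key by key
    return dict(total)


# Example input: five files shared between two worker processes.
chunks = split_files(["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"], processes=2)

# Assumed shape of the statistics each worker reports; key names are illustrative.
worker_stats = [
    {"processed": 3, "discarded": 1, "successful": 2, "examples": 120},
    {"processed": 2, "discarded": 0, "successful": 2, "examples": 95},
]
summary = merge_stats(worker_stats)
```

Under these assumptions, `summary` holds one total per key across all workers, which matches the kind of dictionary described under Returns.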