cerebras.modelzoo.data.common.h5_map_dataset

dataset

preprocess_pile

Preprocess a dataset saved in the Eleuther lm_dataformat format, such as the Pile, for use with a data processor such as GptHDF5MapDataProcessor, which is backed by an H5Reader.

readers
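
As a rough illustration of the kind of preprocessing that preprocess_pile describes, the sketch below reads lm_dataformat-style JSON lines (one record per line with a "text" field) and packs the token IDs into fixed-length sequences, which is the shape a map-style reader can index by position. It is not the module's implementation: `iter_texts`, `toy_tokenize`, `pack_sequences`, and `EOS_ID` are hypothetical names, the whitespace tokenizer stands in for a real one, and the step of writing the sequences to an HDF5 file is omitted.

```python
import json
from typing import Dict, Iterable, Iterator, List

EOS_ID = 0  # hypothetical end-of-document token ID


def iter_texts(jsonl_lines: Iterable[str]) -> Iterator[str]:
    """Yield the "text" field of each non-empty JSON line."""
    for line in jsonl_lines:
        line = line.strip()
        if line:
            yield json.loads(line)["text"]


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Stand-in tokenizer: assign each new whitespace-separated word the next ID."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def pack_sequences(jsonl_lines: Iterable[str], seq_length: int) -> List[List[int]]:
    """Concatenate tokenized documents, separated by EOS_ID, into fixed-length rows.

    Tokens left over at the end that do not fill a full row are dropped.
    """
    vocab: Dict[str, int] = {"<eos>": EOS_ID}
    buffer: List[int] = []
    rows: List[List[int]] = []
    for text in iter_texts(jsonl_lines):
        buffer.extend(toy_tokenize(text, vocab))
        buffer.append(EOS_ID)
        # Emit as many full rows as the buffer currently holds.
        while len(buffer) >= seq_length:
            rows.append(buffer[:seq_length])
            buffer = buffer[seq_length:]
    return rows


if __name__ == "__main__":
    sample = ['{"text": "a b c", "meta": {}}', '{"text": "d e"}']
    for row in pack_sequences(sample, seq_length=3):
        print(row)
```

Packing documents into rows of equal length means every sample has the same shape, so a reader can fetch sample i directly by row index without scanning the data.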