cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.hdf5_dataset_preprocessors

Classes

FIMDataPreprocessor

LMDataPreprocessor

LlavaBasePreprocessor

LlavaPhaseOnePreprocessor

LlavaPhaseTwoPreprocessor

SummarizationPreprocessor

VSLLMDataPreprocessor

VSLSummarizationPreprocessor

self.chunk_lengths stores a List[List[Tuple]]: the outer list holds the chunks, each inner list holds the sequences in one chunk, and each tuple is a prompt + completion pair.
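The nesting described above can be pictured with a small sketch. This is only an illustration and is not taken from the modelzoo source: it assumes each tuple holds two integers, (prompt_length, completion_length), and the helper name `sequence_totals` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical illustration of the nested layout; not the library's code.
# Assumption: each tuple is (prompt_length, completion_length) in tokens.
ChunkLengths = List[List[Tuple[int, int]]]

chunk_lengths: ChunkLengths = [
    [(5, 3), (2, 6)],  # chunk 0 holds two sequences
    [(4, 4)],          # chunk 1 holds one sequence
]


def sequence_totals(lengths: ChunkLengths) -> List[List[int]]:
    """Return prompt + completion length for every sequence, per chunk."""
    return [[prompt + completion for prompt, completion in chunk] for chunk in lengths]


totals = sequence_totals(chunk_lengths)
```

Indexing follows the same order as the description: `chunk_lengths[i]` is a chunk, `chunk_lengths[i][j]` is the pair for the j-th sequence in that chunk.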

Exceptions

ContinueLoopException