cerebras.modelzoo.data_preparation.data_preprocessing
This module implements a generic data preprocessor called DataPreprocessor.

This module contains helper functions and classes to read data from different formats, process it, and save it in HDF5 format.

FIMTokenGenerator Module

Script to generate an HDF5 dataset for GPT models.

PretrainingTokenGenerator Module

This module provides the VSLFinetuningTokenGenerator class, which extends the FinetuningTokenGenerator for processing tokenized text data specifically for variable-length sequence summarization (VSLS).

This module provides the VSLPretrainingTokenGenerator class, extending PretrainingTokenGenerator for advanced processing of tokenized text data tailored for variable-length sequence language modeling (VSLLM).
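
The summaries above do not say how variable-length sequences are handled internally. One common approach in this kind of preprocessing is to pack several short tokenized sequences into one fixed-length sample and pad the remainder. The sketch below illustrates that idea only. It is an assumption, not the VSLPretrainingTokenGenerator implementation, and the names `pack_sequences`, `pad_bins`, `max_len`, and `pad_id` are hypothetical.

```python
from typing import List


def pack_sequences(sequences: List[List[int]], max_len: int) -> List[List[int]]:
    """Greedily pack token sequences, in order, into bins of at most max_len tokens.

    Hypothetical helper for illustration; not part of the Cerebras Model Zoo API.
    """
    bins: List[List[int]] = []
    current: List[int] = []
    for seq in sequences:
        seq = seq[:max_len]  # truncate anything too long to fit in a single bin
        # Start a new bin when the next sequence would overflow the current one.
        if current and len(current) + len(seq) > max_len:
            bins.append(current)
            current = []
        current = current + seq
    if current:
        bins.append(current)
    return bins


def pad_bins(bins: List[List[int]], max_len: int, pad_id: int = 0) -> List[List[int]]:
    """Right-pad every bin with pad_id (an assumed padding token) up to max_len."""
    return [b + [pad_id] * (max_len - len(b)) for b in bins]


token_sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packed = pack_sequences(token_sequences, max_len=6)
padded = pad_bins(packed, max_len=6)
```

With these inputs, the four sequences fit into two samples of length 6 instead of four separately padded ones. A real generator would also need to track sequence boundaries (for example, for attention masking), which this sketch omits.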