cerebras.modelzoo.data_preparation.data_preprocessing

custom_hook_examples

custom_tokenizer_example

data_dedup

data_preprocessor

This module implements a generic data preprocessor called DataPreprocessor.

data_reader

This module contains helper functions and classes that read data from different formats, process it, and save it in HDF5 format.

dpo_token_generator

fim_token_generator

This module provides the FIMTokenGenerator class.

finetuning_token_generator

hooks

multimodal_finetuning_token_generator

multimodal_pretraining_token_generator

nlg_token_generator

preprocess_data

Script to generate an HDF5 dataset for GPT models.

pretraining_token_generator

This module provides the PretrainingTokenGenerator class.

tokenflow

utils

vsl_finetuning_token_generator

This module provides the VSLFinetuningTokenGenerator class, which extends FinetuningTokenGenerator to process tokenized text data for variable-length sequence summarization (VSLS).

vsl_pretraining_token_generator

This module provides the VSLPretrainingTokenGenerator class, which extends PretrainingTokenGenerator to process tokenized text data for variable-length sequence language modeling (VSLLM).