cerebras.modelzoo.data_preparation.data_preprocessing.fim_token_generator
FIMTokenGenerator Module
This module provides the FIMTokenGenerator class, an extension of the PretrainingTokenGenerator class tailored for fill-in-the-middle (FIM) tasks.
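To make the FIM idea concrete, here is a minimal, character-level sketch of the usual prefix-suffix-middle (PSM) rearrangement. It is not the FIMTokenGenerator implementation: the sentinel strings, the `fim_rate` parameter, and the helper names `fim_transform` and `fim_restore` are all hypothetical, and a real pipeline would operate on token IDs using the tokenizer's own special tokens.

```python
import random

# Hypothetical sentinel strings (assumption); real tokenizers define their own.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def fim_transform(text, rng, fim_rate=0.9):
    """Rearrange text into prefix-suffix-middle order with probability fim_rate."""
    if len(text) < 2 or rng.random() >= fim_rate:
        return text  # leave the sample as ordinary left-to-right text
    # Pick two distinct cut points that split the text into three spans.
    lo, hi = sorted(rng.sample(range(len(text) + 1), 2))
    prefix, middle, suffix = text[:lo], text[lo:hi], text[hi:]
    # The middle span goes last, so the model learns to generate it from
    # the surrounding context.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def fim_restore(sample):
    """Invert fim_transform, assuming the text contains no sentinel strings."""
    if not sample.startswith(FIM_PREFIX):
        return sample
    prefix, rest = sample[len(FIM_PREFIX):].split(FIM_SUFFIX, 1)
    suffix, middle = rest.split(FIM_MIDDLE, 1)
    return prefix + middle + suffix


if __name__ == "__main__":
    rng = random.Random(0)
    source = "def add(a, b):\n    return a + b\n"
    print(fim_transform(source, rng, fim_rate=1.0))
```

Because the transform only reorders spans, `fim_restore(fim_transform(text, rng))` returns the original text. This makes it easy to check that no content is lost during preprocessing.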
- Usage:
    from cerebras.modelzoo.data_preparation.data_preprocessing.fim_token_generator import FIMTokenGenerator
    # Initialize the token generator with the required parameters
    tokenizer = FIMTokenGenerator(params, tokenizer_impl, eos_id, pad_id)
    # Tokenize and encode text data
    tokenized_data, stats = tokenizer.encode("Your sample text to process.")
Classes
FIMTokenGenerator: Initialize the FIMTokenGenerator class.