cerebras.modelzoo.data_preparation.data_preprocessing.nlg_token_generator.NLGTokenGenerator
class cerebras.modelzoo.data_preparation.data_preprocessing.nlg_token_generator.NLGTokenGenerator(max_seq_length)
Bases: object
Token generator for NLG datasets such as E2E, DART, and WebNLG. It assumes the dataset has already been tokenized and expects .jsonl input files in which each record contains a “context” key and a “completion” key. It is used with GptHDF5DataProcessor.
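To make the expected input shape concrete, here is a minimal sketch that reads JSON Lines text and checks that every record carries the “context” and “completion” keys. The helper `read_context_completion` and the sample records are hypothetical and illustrative only; they are not part of the Cerebras Model Zoo API, and the sketch does not reproduce what `encode` actually does with each record.

```python
import json
from typing import Iterable, Iterator, Tuple

REQUIRED_KEYS = {"context", "completion"}


def read_context_completion(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (context, completion) pairs from JSON Lines text."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise KeyError(f"line {line_no} is missing keys: {sorted(missing)}")
        yield record["context"], record["completion"]


# Illustrative records in the E2E style (made up for this example).
sample = [
    '{"context": "name[The Eagle] | eatType[coffee shop]", "completion": "The Eagle is a coffee shop."}',
    "",
    '{"context": "name[Alimentum] | area[city centre]", "completion": "Alimentum is in the city centre."}',
]

pairs = list(read_context_completion(sample))
for context, completion in pairs:
    print(context, "->", completion)
```

With a real file, the same helper can be fed an open file handle, for example `read_context_completion(open("train.jsonl"))`, where the file name is only a placeholder.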
Methods
encode
parse_semantic_data_array