cerebras.modelzoo.data_preparation.nlp.bert.bertsum_data_processor
Common pre-processing functions for BERTSUM data processing
Functions
Format input tokenized files into simpler JSON files.
Split sentences and perform tokenization.
Classes
Converts input into BERT format.
JsonConverter simplifies the input and converts it into JSON files containing source and target (summarized) texts.
RougeBasedLabelsFormatter selects the input sentences that score highest on ROUGE against the reference, with the score computed from the reference n-grams.
Tokenizes files from the input path and writes the results to the output path.
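The ROUGE-based sentence selection described for RougeBasedLabelsFormatter can be illustrated with a minimal sketch. This is not the module's implementation: the function names (`ngrams`, `rouge_f1`, `greedy_select`), the `max_sentences` parameter, and the choice of a greedy search over set-based ROUGE-1 plus ROUGE-2 F1 are all assumptions made for illustration. Inputs are assumed to be pre-tokenized sentences (lists of tokens).

```python
from typing import List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Set[Ngram]:
    """Return the set of n-grams in a token list (empty if too short)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def rouge_f1(candidate: Set[Ngram], reference: Set[Ngram]) -> float:
    """Set-based F1 overlap between candidate and reference n-grams."""
    overlap = len(candidate & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def greedy_select(
    sentences: List[List[str]],
    reference: List[List[str]],
    max_sentences: int = 3,  # hypothetical cap, not taken from the module
) -> List[int]:
    """Greedily pick sentence indices that raise ROUGE-1 + ROUGE-2 F1."""
    # Assumption: the reference summary is flattened into one token stream.
    ref_tokens = [tok for sent in reference for tok in sent]
    ref_1, ref_2 = ngrams(ref_tokens, 1), ngrams(ref_tokens, 2)
    sent_1 = [ngrams(s, 1) for s in sentences]
    sent_2 = [ngrams(s, 2) for s in sentences]

    selected: List[int] = []
    best_score = 0.0
    for _ in range(max_sentences):
        best_idx = -1
        for i in range(len(sentences)):
            if i in selected:
                continue
            idxs = selected + [i]
            cand_1 = set().union(*(sent_1[j] for j in idxs))
            cand_2 = set().union(*(sent_2[j] for j in idxs))
            score = rouge_f1(cand_1, ref_1) + rouge_f1(cand_2, ref_2)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx < 0:
            break  # no remaining sentence improves the score
        selected.append(best_idx)
    return sorted(selected)
```

Under these assumptions, the returned indices can be turned into per-sentence binary labels (1 for selected, 0 otherwise), which is the usual form of supervision for extractive summarization.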