cerebras.modelzoo.data.nlp.t5.t5_utils.split_sequences
- cerebras.modelzoo.data.nlp.t5.t5_utils.split_sequences(tokens, length)
Split a long sequence into shorter sequences of the specified length.
- Parameters
tokens (list) – A list of token indices.
length (int) – The maximum allowed length of a sample.
- Returns
A list of sequences containing exactly the same tokens as the input, split into separate samples such that no sample is longer than the specified length.
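
The behavior described above can be illustrated with a minimal sketch. This is not the modelzoo implementation: `split_sequences_sketch` is a hypothetical stand-in, and it assumes that the input is cut into consecutive chunks in order, with the final chunk allowed to be shorter than `length`.

```python
def split_sequences_sketch(tokens, length):
    """Hypothetical stand-in for split_sequences (assumed behavior only).

    Cuts `tokens` into consecutive chunks of at most `length` items.
    """
    if length <= 0:
        raise ValueError("length must be a positive integer")
    # Step through the input in strides of `length`; slicing past the end
    # of a list is safe in Python, so the last chunk may be shorter.
    return [tokens[i : i + length] for i in range(0, len(tokens), length)]


chunks = split_sequences_sketch([1, 2, 3, 4, 5, 6, 7], 3)
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Under these assumptions, concatenating the returned chunks reproduces the original token list, and no chunk exceeds `length`.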