cerebras.modelzoo.data_preparation.nlp.t5.utils.construct_denoising_objective
- cerebras.modelzoo.data_preparation.nlp.t5.utils.construct_denoising_objective(tokens, vocab_size, sos_token, eos_token, rng)
Formats a raw sequence into a corrupted sequence and corresponding denoising targets.
- Parameters
  - tokens (list): A list of uncorrupted token indices.
  - vocab_size (int): The size of the vocabulary.
  - sos_token (int): The index of the SOS token in the vocabulary.
  - eos_token (int): The index of the EOS token in the vocabulary.
  - rng (np.random.Generator): The numpy random generator to be used as the source of randomness for this function.
- Returns
a tuple (feature_dict, label) of denoising source and target numpy arrays.
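
To make the idea of a "corrupted sequence and corresponding denoising targets" concrete, the following is a minimal, self-contained sketch of T5-style span corruption. It is not the Model Zoo implementation. The names `spans_to_sentinels` and `sketch_denoising_objective`, the `noise_density` parameter, the dictionary keys, the per-token noise mask, and the choice to count sentinel ids down from `vocab_size - 1` are all assumptions made for illustration; the real function may sample spans and name its outputs differently.

```python
import numpy as np


def spans_to_sentinels(tokens, mask, vocab_size):
    """Replace each run of masked tokens with a single sentinel id.

    Hypothetical helper: sentinel ids are assumed to count down from
    vocab_size - 1, one per masked run.
    """
    tokens = np.asarray(tokens)
    mask = np.asarray(mask, dtype=bool)
    # True at the first position of every masked run.
    prev = np.concatenate(([False], mask[:-1]))
    starts = mask & ~prev
    # The k-th run (1-based) gets sentinel id vocab_size - k.
    sentinels = vocab_size - np.cumsum(starts)
    out = np.where(starts, sentinels, tokens)
    # Keep unmasked tokens plus one sentinel per masked run.
    return out[~mask | starts]


def sketch_denoising_objective(tokens, vocab_size, sos_token, eos_token, rng,
                               noise_density=0.15):
    """Illustrative only; assumes a non-empty token list."""
    tokens = np.asarray(tokens)
    # Simplification: mask tokens independently instead of sampling spans.
    noise_mask = rng.random(len(tokens)) < noise_density
    # Start with a kept token so source and target sentinels line up.
    noise_mask[0] = False

    encoder_input = spans_to_sentinels(tokens, noise_mask, vocab_size)
    target = spans_to_sentinels(tokens, ~noise_mask, vocab_size)

    # Teacher forcing: decoder input is the target shifted right by SOS,
    # and the label is the target terminated by EOS.
    decoder_input = np.concatenate(([sos_token], target))
    label = np.concatenate((target, [eos_token]))

    # Key names are assumptions, not the library's documented keys.
    feature_dict = {
        "encoder_input_ids": encoder_input,
        "decoder_input_ids": decoder_input,
    }
    return feature_dict, label


# Fixed mask example: tokens 11, 12 form one noise span and 14 another.
demo_tokens = [10, 11, 12, 13, 14, 15]
demo_mask = np.array([False, True, True, False, True, False])
demo_source = spans_to_sentinels(demo_tokens, demo_mask, vocab_size=100)
# demo_source is [10, 99, 13, 98, 15]
demo_target = spans_to_sentinels(demo_tokens, ~demo_mask, vocab_size=100)
# demo_target is [99, 11, 12, 98, 14, 97]

features, label = sketch_denoising_objective(
    list(range(5, 25)), vocab_size=100, sos_token=1, eos_token=2,
    rng=np.random.default_rng(0),
)
```

In this sketch each sentinel in the source marks where a span was removed, and the same sentinel in the target introduces the tokens that were removed there. Passing a seeded `np.random.Generator` keeps the corruption reproducible, which matches the role of the `rng` parameter described above.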