cerebras.modelzoo.data_preparation.nlp.t5.utils.random_spans_noise_mask
- cerebras.modelzoo.data_preparation.nlp.t5.utils.random_spans_noise_mask(length, noise_density=0.15, mean_noise_span_length=3.0, rng=None)
Generates a noise mask consisting of random spans of noise tokens. The number of noise tokens, noise spans, and non-noise spans is determined deterministically as follows:
    num_noise_tokens = round(length * noise_density)
    num_nonnoise_spans = num_noise_spans = round(num_noise_tokens / mean_noise_span_length)
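As a quick worked check of these formulas, the snippet below computes the counts in plain Python. `span_counts` is a hypothetical helper, not part of the Cerebras Model Zoo; it applies the two formulas literally and does not reproduce any edge-case clamping the real function may perform.

```python
def span_counts(length: int, noise_density: float = 0.15,
                mean_noise_span_length: float = 3.0) -> tuple:
    """Hypothetical helper: apply the count formulas literally (no clamping)."""
    # Python's built-in round() rounds exact halves to the nearest even integer.
    num_noise_tokens = round(length * noise_density)
    # Every token that is not noise is a non-noise token.
    num_nonnoise_tokens = length - num_noise_tokens
    # Noise spans and non-noise spans are equal in number.
    num_noise_spans = round(num_noise_tokens / mean_noise_span_length)
    return num_noise_tokens, num_nonnoise_tokens, num_noise_spans


print(span_counts(512))  # (77, 435, 26): 512 * 0.15 = 76.8 -> 77; 77 / 3 -> 26
print(span_counts(100))  # (15, 85, 5)
```

With the defaults, a 512-token sequence gets 77 noise tokens split into 26 noise spans, interleaved with 26 non-noise spans covering the remaining 435 tokens.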
Spans alternate between non-noise and noise, beginning with non-noise. Subject to the above restrictions, all masks are equally likely.
- Parameters
  length (int): Length of the incoming token sequence.
  noise_density (float): Approximate density of the output mask, i.e. the fraction of tokens marked as noise.
  mean_noise_span_length (float): Average length of a noise span; it sets the number of noise spans through the formula above.
  rng (np.random.Generator): The NumPy random generator used as the source of randomness for this function.
- Returns
  A boolean np.ndarray of shape [length], where True marks a noise token.