cerebras.modelzoo.common.utils.model.transformer_utils

Functions

create_2D_autoregressive_mask

Creates a reverted autoregressive (upper triangular) mask where the 0s refer to the tokens
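
As an illustration of the mask shape described above, here is a minimal, library-independent sketch in plain Python. The helper name `reverted_causal_mask` is hypothetical, and the convention that 1 marks a skipped (future) position is an assumption, since the summary above is cut short. It does not reproduce the signature or return type of the actual function.

```python
def reverted_causal_mask(seq_len):
    """Return a seq_len x seq_len mask: 0 = may attend, 1 = skipped (assumed)."""
    # Row i is the query position and column j is the key position.
    # A query may only look at keys with j <= i, so every entry
    # strictly above the diagonal is set to 1.
    return [[1 if j > i else 0 for j in range(seq_len)] for i in range(seq_len)]


mask = reverted_causal_mask(4)
for row in mask:
    print(row)
```

The nonzero entries form the strict upper triangle, which is why the mask is described as upper triangular.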

create_2D_full_mask

Create autoregressive (triangular) mask.

create_broadcasted_autoregressive_mask

Create broadcasted causal attention mask optionally with VSL masking.

create_vsl_mask

Creates a VSL attention mask.
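
A rough sketch of the idea, assuming VSL means that several variable-length sequences are packed into one sample and that tokens should only attend within their own sequence. The helper name `vsl_mask`, the `segment_ids` input, and the 0/1 convention are all hypothetical; the real function's inputs and outputs may differ.

```python
def vsl_mask(segment_ids):
    """Return a square mask: 0 = same packed sequence, 1 = different sequence."""
    n = len(segment_ids)
    # Position i may attend to position j only if both tokens belong
    # to the same packed sequence (same segment id).
    return [
        [0 if segment_ids[i] == segment_ids[j] else 1 for j in range(n)]
        for i in range(n)
    ]


# Two sequences packed together: the first has 2 tokens, the second has 3.
vsl = vsl_mask([0, 0, 1, 1, 1])
for row in vsl:
    print(row)
```

The result is block diagonal: each packed sequence sees only itself.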

get_embedding_dtype

get_extended_attention_mask

Makes broadcastable attention and causal masks so that future and masked tokens are ignored.

:param attention_mask: Mask with ones indicating tokens to attend to, zeros for tokens to ignore.
:type attention_mask: torch.Tensor
:param input_shape: The shape of the input to the model (required for causal masks).
:type input_shape: Tuple[int]
:param causal: If enabled, the returned mask will be causal.
:type causal: bool
:param device: The device of the input to the model.
:type device: torch.device
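
To make the combination of padding and causal masking concrete, the sketch below works on a single sequence in plain Python. It is an assumption-laden illustration, not the library's implementation: the helper name `extended_attention_mask` is hypothetical, the batch and head dimensions that make the real mask broadcastable are omitted, and the additive 0 / -inf output convention is assumed.

```python
NEG_INF = float("-inf")


def extended_attention_mask(attention_mask, causal=False):
    """Build an additive S x S mask from a 0/1 key mask (1 = attend)."""
    seq_len = len(attention_mask)
    out = []
    for i in range(seq_len):          # query position
        row = []
        for j in range(seq_len):      # key position
            # A key is blocked if it is masked out (e.g. padding), or,
            # in causal mode, if it lies in the future of the query.
            blocked = attention_mask[j] == 0 or (causal and j > i)
            row.append(NEG_INF if blocked else 0.0)
        out.append(row)
    return out


# Last token is padding; causal masking is enabled.
ext = extended_attention_mask([1, 1, 1, 0], causal=True)
for row in ext:
    print(row)
```

Adding such a mask to the attention scores before the softmax drives the weight of every blocked position to zero.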

make_key_padding_mask_broadcastable

Makes broadcastable key_padding masks so that padding tokens are ignored.

make_sparse_mask_broadcastable

Create broadcastable sparse mask so that masked positions are ignored.

replace_with_zero_and_neg_inf

Replace the values in mask tensor with 0 and -inf.
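
The following plain-Python sketch shows why a 0 / -inf mask is useful. It assumes that nonzero entries of the input mask mark positions to ignore; the helper names `to_additive_mask` and `softmax` are hypothetical and only illustrate the idea, not the actual function.

```python
import math


def to_additive_mask(mask):
    """Map a 2D 0/1 mask to additive form: 0 -> 0.0, nonzero -> -inf (assumed)."""
    return [[float("-inf") if v else 0.0 for v in row] for row in mask]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


additive = to_additive_mask([[0, 1, 1], [0, 0, 1]])

# Add the second mask row to some raw attention scores, then normalize.
scores = [2.0, 1.0, 3.0]
weights = softmax([s + a for s, a in zip(scores, additive[1])])
print(weights)
```

Because exp(-inf) is exactly zero, the masked position receives no attention weight, and the remaining weights still sum to one.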

smooth_loss

Adds label smoothing to the loss function. This is a workaround for label smoothing in our system.
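
For reference, textbook label smoothing can be sketched as below. This shows the general technique only; the summary above does not describe how the workaround is implemented, so the helper name `smoothed_nll`, its parameters, and the default `epsilon` are hypothetical.

```python
import math


def smoothed_nll(log_probs, target, epsilon=0.1):
    """Cross-entropy against a smoothed target distribution.

    The target class keeps weight (1 - epsilon); the remaining epsilon
    is spread uniformly over all classes.
    """
    num_classes = len(log_probs)
    nll = -log_probs[target]                  # ordinary negative log-likelihood
    uniform = -sum(log_probs) / num_classes   # loss against a uniform target
    return (1.0 - epsilon) * nll + epsilon * uniform


log_probs = [math.log(p) for p in (0.7, 0.2, 0.1)]
loss = smoothed_nll(log_probs, target=0, epsilon=0.1)
print(loss)
```

With `epsilon=0.0` the function reduces to the plain negative log-likelihood of the target class.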