cerebras.modelzoo.common.utils.model.transformer_utils.create_vsl_mask

cerebras.modelzoo.common.utils.model.transformer_utils.create_vsl_mask(attention_span, position_ids=None, num_heads=1, is_causal=True, device=None, dtype=torch.float16, use_neg_inf=True)[source]

Creates a VSL attention mask.

E.g. for a VSL sequence that consists of a sequence of length 3 followed by a sequence of length 2, the causal mask is:

```
[[0,    -inf, -inf, -inf, -inf],
 [0,    0,    -inf, -inf, -inf],
 [0,    0,    0,    -inf, -inf],
 [-inf, -inf, -inf, 0,    -inf],
 [-inf, -inf, -inf, 0,    0   ]]
```

whereas the non-causal mask is:

```
[[0,    0,    0,    -inf, -inf],
 [0,    0,    0,    -inf, -inf],
 [0,    0,    0,    -inf, -inf],
 [-inf, -inf, -inf, 0,    0   ],
 [-inf, -inf, -inf, 0,    0   ]]
```

Parameters
  • attention_span (torch.Tensor) – Attention span of keys for VSL, of shape [batch_size, seq_len].

  • position_ids (torch.Tensor) – Optional position ids of keys for VSL, of shape [batch_size, seq_len].

  • num_heads (int) – Number of heads.

  • is_causal (bool) – Whether the mask is causal (True) or bidirectional (False). Defaults to True.

  • device (torch.device) – The device of the input to the model, used for causal mask creation.

  • dtype (torch.dtype) – Dtype of the resulting mask. Defaults to torch.float16.

  • use_neg_inf (bool) – Whether to use negative infinity instead of one at masked positions in the resulting mask. Defaults to True.

Returns

The attention mask of shape [batch_size, num_heads, seq_len, seq_len].
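A minimal usage sketch for the packed example above (one sequence of length 3 followed by one of length 2). The call signature and output shape follow the documentation; the per-position values placed in attention_span are illustrative assumptions, and the encoding expected by your VSL data pipeline may differ:

```python
import torch

from cerebras.modelzoo.common.utils.model.transformer_utils import create_vsl_mask

# Illustrative attention_span of shape [batch_size, seq_len] for a batch with
# one packed sample: a segment of length 3 followed by a segment of length 2.
# These values are an assumption for demonstration; use the features produced
# by your VSL data processor in practice.
attention_span = torch.tensor([[3, 3, 3, 2, 2]], dtype=torch.int32)

mask = create_vsl_mask(
    attention_span,
    num_heads=2,
    is_causal=True,
    dtype=torch.float32,
    use_neg_inf=True,
)

# Documented output shape: [batch_size, num_heads, seq_len, seq_len]
print(mask.shape)  # torch.Size([1, 2, 5, 5])
```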