cerebras.modelzoo.common.utils.model.transformer_utils.create_vsl_mask#
- cerebras.modelzoo.common.utils.model.transformer_utils.create_vsl_mask(attention_span, position_ids=None, num_heads=1, is_causal=True, device=None, dtype=torch.float16, use_neg_inf=True)[source]#
Creates a VSL (variable sequence length) attention mask.
E.g., for a VSL sequence that consists of a sequence of length 3 followed by a sequence of length 2, the causal mask is:

```
[[0,    -inf, -inf, -inf, -inf],
 [0,    0,    -inf, -inf, -inf],
 [0,    0,    0,    -inf, -inf],
 [-inf, -inf, -inf, 0,    -inf],
 [-inf, -inf, -inf, 0,    0   ]]
```

whereas the non-causal mask is:

```
[[0,    0,    0,    -inf, -inf],
 [0,    0,    0,    -inf, -inf],
 [0,    0,    0,    -inf, -inf],
 [-inf, -inf, -inf, 0,    0   ],
 [-inf, -inf, -inf, 0,    0   ]]
```
- Parameters
attention_span (torch.Tensor) – Attention span of keys for VSL, has shape [batch_size, seq_len].
position_ids (torch.Tensor) – Optional position ids of keys for VSL, has shape [batch_size, seq_len].
num_heads (int) – Number of heads.
is_causal (bool) – Whether the mask is causal (True) or bidirectional (False); defaults to True.
device (torch.device) – The device of the input to the model, used for causal mask creation.
dtype (torch.dtype) – Dtype of the resulting mask; defaults to torch.float16.
use_neg_inf (bool) – If True, masked positions are filled with negative infinity instead of one; defaults to True.
- Returns
The attention mask of shape [batch_size, num_heads, seq_len, seq_len].