cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.MNLIDataset
- class cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.MNLIDataset(*args, **kwargs)
Bases: cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.ClassifierDataset
MNLI dataset processor for natural language inference (sentence-pair classification).
- Parameters
params (dict) – Dictionary of training input parameters for creating the dataset.
is_training (bool) – Whether to create the training dataset or the validation dataset.
Methods
encode_sequence(text1[, text2]) – Tokenizes a single text (if text2 is None) or a pair of texts.
read_tsv
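read_tsv is listed without a description. As a hedged illustration only, the sketch below parses GLUE-style tab-separated text the way the `_read_tsv` helper in Google's original BERT `run_classifier.py` does: split on tabs with quote handling disabled, because sentences contain literal `"` characters. The signature (taking any iterable of lines) is hypothetical; the actual Cerebras `read_tsv` arguments and return type are not documented in this section.

```python
import csv
import io
from typing import Iterable, List


def read_tsv(lines: Iterable[str]) -> List[List[str]]:
    """Hypothetical sketch: parse tab-separated lines into rows of strings.

    Quoting is disabled (csv.QUOTE_NONE) so literal '"' characters inside
    sentences are kept verbatim instead of being treated as field quotes.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader]


# Toy input with made-up content, for illustration only.
sample = io.StringIO('sentence1\tsentence2\nHe said "hi".\tSomeone spoke.\n')
rows = read_tsv(sample)
print(rows)
```

To read from disk, pass an open file object, e.g. `with open(path, newline="", encoding="utf-8") as f: rows = read_tsv(f)`, where `path` is a placeholder.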
- encode_sequence(text1, text2=None)
Tokenizes a single text (if text2 is None) or a pair of texts. Truncates and adds special tokens as needed.
- Parameters
text1 (str) – First text to encode.
text2 (str or None) – Second text to encode, or None to encode text1 alone.
- Returns
A list containing input_ids, segment_ids and attention_mask:
- input_ids (np.array[int32]): NumPy array of input token indices. Shape: (max_sequence_length,).
- segment_ids (np.array[int32]): NumPy array of segment indices. Shape: (max_sequence_length,).
- attention_mask (np.array[int32]): NumPy array with the attention mask. Shape: (max_sequence_length,).
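The behavior described for encode_sequence matches the usual BERT input format. The sketch below is an illustration under stated assumptions, not the Cerebras implementation: it uses a toy whitespace tokenizer and a made-up vocabulary (both hypothetical), `[CLS]`/`[SEP]`/`[PAD]`/`[UNK]` special tokens, and longest-first pair truncation as in the original BERT reference code (the truncation strategy used by this class is not documented here). The keyword arguments `tokenize`, `vocab` and `max_sequence_length` are hypothetical stand-ins for state the real class keeps on the instance, and the sketch returns plain Python lists where the documented method returns int32 NumPy arrays.

```python
from typing import Callable, Dict, List, Optional, Tuple


def truncate_seq_pair(tokens_a: List[str], tokens_b: List[str], max_length: int) -> None:
    """Trim the pair in place, removing one token at a time from the longer list."""
    while len(tokens_a) + len(tokens_b) > max_length:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()


def encode_sequence(
    text1: str,
    text2: Optional[str] = None,
    *,
    tokenize: Callable[[str], List[str]],  # hypothetical: stands in for the real tokenizer
    vocab: Dict[str, int],                 # hypothetical: token -> id mapping
    max_sequence_length: int,
) -> Tuple[List[int], List[int], List[int]]:
    tokens_a = tokenize(text1)
    tokens_b = tokenize(text2) if text2 is not None else []

    if text2 is not None:
        # Leave room for [CLS] ... [SEP] ... [SEP].
        truncate_seq_pair(tokens_a, tokens_b, max_sequence_length - 3)
    else:
        # Leave room for [CLS] ... [SEP].
        tokens_a = tokens_a[: max_sequence_length - 2]

    # Segment 0 covers [CLS], the first text and its [SEP].
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if text2 is not None:
        # Segment 1 covers the second text and the final [SEP].
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)

    input_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    attention_mask = [1] * len(input_ids)  # 1 marks a real (non-padding) token

    # Pad all three sequences to max_sequence_length.
    pad = max_sequence_length - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    segment_ids += [0] * pad
    attention_mask += [0] * pad
    return input_ids, segment_ids, attention_mask


# Toy vocabulary and tokenizer, for illustration only.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "a": 4, "man": 5, "eats": 6, "someone": 7}


def toy_tokenize(s: str) -> List[str]:
    return s.lower().split()


ids, segs, mask = encode_sequence(
    "A man eats", "Someone eats",
    tokenize=toy_tokenize, vocab=VOCAB, max_sequence_length=10,
)
```

To match the documented return types, each list can be wrapped with `np.asarray(x, dtype=np.int32)`. Longest-first truncation trims whichever text is currently longer, so a short second sentence is not cut while a long first sentence still has tokens to spare.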