cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.SST2Dataset

class cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.SST2Dataset(*args, **kwargs)

Bases: cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.ClassifierDataset

SST2 dataset processor for sentiment analysis.

Parameters
  • params (dict) – Dictionary of training input parameters used to create the dataset.

  • is_training (bool) – Whether to create the training dataset (True) or the validation dataset (False).

Methods

  • encode_sequence – Tokenizes a single text (if text2 is None) or a pair of texts.

  • read_tsv

encode_sequence(text1, text2=None)

Tokenizes a single text (if text2 is None) or a pair of texts. Truncates and adds special tokens as needed.

Parameters
  • text1 (str) – First text to encode.

  • text2 (str, optional) – Second text to encode, or None to encode text1 alone.

Returns

A list containing input_ids, segment_ids and attention_mask.

  • input_ids (np.array[int32]): Numpy array with input token indices.

    Shape: (max_sequence_length).

  • segment_ids (np.array[int32]): Numpy array with segment indices.

    Shape: (max_sequence_length).

  • attention_mask (np.array[int32]): Numpy array with input masks.

    Shape: (max_sequence_length).
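
The shape and meaning of the three returned arrays can be illustrated with a minimal sketch. This is not the ModelZoo implementation: the function name encode_sequence_sketch, the toy VOCAB, the whitespace tokenizer, the "trim the longer segment first" truncation rule, and the conventions that padding uses id 0 and that attention_mask is 1 for real tokens and 0 for padding are all assumptions made for illustration. The sketch also returns a tuple rather than a list.

```python
import numpy as np

# Hypothetical toy vocabulary; a real processor would load a BERT vocab file.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "a": 4, "great": 5, "movie": 6, "really": 7}


def encode_sequence_sketch(text1, text2=None, max_sequence_length=8):
    """Illustrative stand-in for encode_sequence (assumed behavior only)."""
    # Assumption: whitespace tokenization instead of WordPiece.
    tokens1 = text1.lower().split()
    tokens2 = text2.lower().split() if text2 is not None else []

    # Reserve room for [CLS] plus one [SEP] per segment.
    num_special = 3 if text2 is not None else 2
    budget = max_sequence_length - num_special
    if budget < 0:
        raise ValueError("max_sequence_length is too small for special tokens")

    # Assumption: trim the longer segment first until everything fits.
    while len(tokens1) + len(tokens2) > budget:
        longer = tokens1 if len(tokens1) >= len(tokens2) else tokens2
        longer.pop()

    # Add special tokens; segment 0 covers [CLS] + text1 + [SEP].
    tokens = ["[CLS]"] + tokens1 + ["[SEP]"]
    segments = [0] * len(tokens)
    if text2 is not None:
        # Segment 1 covers text2 and its closing [SEP].
        tokens += tokens2 + ["[SEP]"]
        segments += [1] * (len(tokens2) + 1)

    ids = [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]
    mask = [1] * len(ids)

    # Pad every array to the fixed length max_sequence_length.
    pad = max_sequence_length - len(ids)
    input_ids = np.array(ids + [VOCAB["[PAD]"]] * pad, dtype=np.int32)
    segment_ids = np.array(segments + [0] * pad, dtype=np.int32)
    attention_mask = np.array(mask + [0] * pad, dtype=np.int32)
    return input_ids, segment_ids, attention_mask


# Single-sentence input, as in SST2-style sentiment examples.
input_ids, segment_ids, attention_mask = encode_sequence_sketch(
    "a really great movie"
)
print(input_ids.shape, input_ids.dtype)
```

Under these assumptions, each array has shape (max_sequence_length,) and dtype int32, matching the return description above. For a single text, segment_ids is all zeros; for a pair, positions belonging to text2 and its closing separator are marked 1.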