cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.create_bertsum_feature
- cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.create_bertsum_feature(input_ids, segment_ids, cls_indices, labels, max_sequence_length, max_cls_tokens, pad_id)
Creates the feature dict for the BertSum model after padding the inputs to fixed lengths.
- Parameters
input_ids (list) – Token ids to pad.
segment_ids (list) – Segment ids to pad.
cls_indices (list) – Indices of the [CLS] (classification) tokens to pad.
labels (list) – Labels to pad.
max_sequence_length (int) – Maximum sequence length.
max_cls_tokens (int) – Maximum number of [CLS] (classification) tokens.
pad_id (int) – Padding id.
tokenize (callable) – Method to tokenize the input sequence.
- Returns
dict for feature which includes keys:
- ‘input_tokens’: NumPy array with input token indices.
shape: (max_sequence_length), dtype: int32.
- ‘attention_mask’: NumPy array with attention mask.
shape: (max_sequence_length), dtype: int32.
- ‘token_type_ids’: NumPy array with segment ids.
shape: (max_sequence_length), dtype: int32.
- ‘labels’: NumPy array with labels.
shape: (max_cls_tokens), dtype: int32.
- ‘cls_indices’: NumPy array with class indices.
shape: (max_cls_tokens).
- ‘cls_weights’: NumPy array with class weights.
shape: (max_cls_tokens).
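The padding and the returned keys described above can be illustrated with a minimal sketch. This is not the library's implementation: the helper names `pad_to_length` and `sketch_bertsum_feature` are hypothetical, and several details are assumptions not stated in this page, namely right-padding, truncation of over-long inputs, a pad value of 0 for segment ids, labels, and [CLS] indices, a 1/0 encoding for the attention mask and [CLS] weights, and an int32 dtype for the last two keys.

```python
import numpy as np


def pad_to_length(values, length, pad_value):
    """Right-pad (or truncate) a list to exactly `length` items."""
    values = list(values)[:length]
    return values + [pad_value] * (length - len(values))


def sketch_bertsum_feature(input_ids, segment_ids, cls_indices, labels,
                           max_sequence_length, max_cls_tokens, pad_id):
    """Hypothetical stand-in that mirrors the documented output keys."""
    num_tokens = min(len(input_ids), max_sequence_length)
    num_cls = min(len(cls_indices), max_cls_tokens)

    def as_int32(values):
        return np.array(values, dtype=np.int32)

    return {
        "input_tokens": as_int32(
            pad_to_length(input_ids, max_sequence_length, pad_id)),
        # Assumption: 1 marks a real token, 0 marks padding.
        "attention_mask": as_int32(
            pad_to_length([1] * num_tokens, max_sequence_length, 0)),
        # Assumption: segment ids are padded with 0.
        "token_type_ids": as_int32(
            pad_to_length(segment_ids, max_sequence_length, 0)),
        # Assumption: labels and [CLS] indices are padded with 0.
        "labels": as_int32(pad_to_length(labels, max_cls_tokens, 0)),
        "cls_indices": as_int32(
            pad_to_length(cls_indices, max_cls_tokens, 0)),
        # Assumption: 1 marks a real [CLS] position, 0 marks padding.
        "cls_weights": as_int32(
            pad_to_length([1] * num_cls, max_cls_tokens, 0)),
    }


# Two sentences, each starting with an illustrative [CLS] id of 101.
feature = sketch_bertsum_feature(
    input_ids=[101, 7592, 102, 101, 2088, 102],
    segment_ids=[0, 0, 0, 1, 1, 1],
    cls_indices=[0, 3],
    labels=[1, 0],
    max_sequence_length=8,
    max_cls_tokens=3,
    pad_id=0,
)

for key, value in feature.items():
    print(key, value.shape, value.tolist())
```

In this sketch the weights under ‘cls_weights’ let a loss function ignore the padded entries of ‘labels’ and ‘cls_indices’, which would otherwise be indistinguishable from a real label of 0 or a real [CLS] token at position 0.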