cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.pad_helper
- cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.pad_helper(samples_lst, diff, fim_pad_tok_id)
Helper for padding. All diff padding tokens (each with id fim_pad_tok_id) go into the last sequence in samples_lst; the earlier sequences receive no padding.
- Parameters
samples_lst (List[List[int]]) – List of lists that contain token ids
diff (int) – Number of padding tokens to add
fim_pad_tok_id (int) – Token id used for padding
- Returns
List of lists of token ids, with the padding tokens added to the last list
- Return type
(List[List[int]])
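The behavior described above can be illustrated with a minimal sketch. It is a hypothetical re-implementation (pad_helper_sketch is not part of the Cerebras Model Zoo). It assumes diff is non-negative and that padding means appending tokens to the end of the last sequence. It also returns a new outer list instead of modifying samples_lst in place; the real function may differ on any of these points.

```python
from typing import List


def pad_helper_sketch(
    samples_lst: List[List[int]], diff: int, fim_pad_tok_id: int
) -> List[List[int]]:
    """Hypothetical sketch: append `diff` pad tokens to the last sequence only.

    Assumptions (not taken from the library): diff >= 0, padding is appended
    at the end of the last sequence, and the input lists are not mutated.
    """
    if diff < 0:
        raise ValueError("diff must be non-negative in this sketch")
    if not samples_lst:
        raise ValueError("samples_lst must contain at least one sequence")
    # Copy the earlier sequences unchanged; they receive no padding.
    padded = [list(seq) for seq in samples_lst[:-1]]
    # The last sequence absorbs every padding token.
    padded.append(list(samples_lst[-1]) + [fim_pad_tok_id] * diff)
    return padded


samples = [[5, 6, 7], [8, 9]]
max_len = 8  # hypothetical target length for all sequences combined
diff = max_len - sum(len(s) for s in samples)  # 8 - 5 = 3
print(pad_helper_sketch(samples, diff, 0))  # [[5, 6, 7], [8, 9, 0, 0, 0]]
```

In this sketch, diff is derived as the gap between an assumed target length and the total number of tokens across all sequences. After padding, the concatenated sample therefore has exactly max_len tokens.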