cerebras.modelzoo.data.nlp.gpt.InferenceDataProcessor.get_token_ids

cerebras.modelzoo.data.nlp.gpt.InferenceDataProcessor.get_token_ids(text, tokenizer)

Get encoded token ids from a string using the specified tokenizer.

Parameters
  • text (str) – The input string.

  • tokenizer (PreTrainedTokenizerBase) – A tokenizer from the Hugging Face transformers library.

Returns

List of token ids.

Return type

List[int]
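
Example

The following is a minimal sketch of the behavior described above, not the library's actual implementation. It assumes that calling the tokenizer on a string returns a mapping with an "input_ids" entry, as Hugging Face tokenizers do. ToyWhitespaceTokenizer and get_token_ids_sketch are hypothetical names introduced only for illustration, so that the example runs without downloading a vocabulary.

```python
from typing import Dict, List


class ToyWhitespaceTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer (not a real class)."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def __call__(self, text: str) -> Dict[str, List[int]]:
        # Give each previously unseen whitespace-separated token the next free id.
        ids = [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.split()]
        return {"input_ids": ids}


def get_token_ids_sketch(text: str, tokenizer) -> List[int]:
    # Assumption: tokenizer(text) yields a mapping that contains "input_ids".
    return list(tokenizer(text)["input_ids"])


token_ids = get_token_ids_sketch("the cat sat on the mat", ToyWhitespaceTokenizer())
print(token_ids)
```

With a real tokenizer, for example one loaded via transformers.AutoTokenizer.from_pretrained, the tokenizer object would be passed as the second argument in the same way. The resulting ids depend on that tokenizer's vocabulary and on whether it adds special tokens.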