cerebras.modelzoo.data_preparation.nlp.tokenizers.HFTokenizer.HFTokenizer
class cerebras.modelzoo.data_preparation.nlp.tokenizers.HFTokenizer.HFTokenizer(vocab_file, special_tokens=None)
Bases: object
Designed to integrate Hugging Face's Tokenizers library.
Parameters
vocab_file (str): A vocabulary file to create the tokenizer from.
special_tokens: A list or a string representing the special tokens to add to the tokenizer.
Methods
add_special_tokens
add_token
decode
encode
get_token
Extracts token information from the tokenizer config JSON file.
get_token_id
set_eos_pad_tokens
Attributes
eos
pad
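Example
This page lists the constructor arguments and method names but not the method signatures, so the following is only a rough sketch of the interface shape, not the Cerebras implementation. `ToyTokenizer` is a hypothetical stand-in written for illustration. The JSON token-to-id vocabulary format, the whitespace splitting, and every method signature shown are assumptions. The real class delegates tokenization to the Hugging Face library.

```python
import json
import os
import tempfile


class ToyTokenizer:
    """Hypothetical stand-in that mirrors the names listed on this page."""

    def __init__(self, vocab_file, special_tokens=None):
        # Assumption: the vocabulary file is a JSON object mapping token -> id.
        with open(vocab_file, encoding="utf-8") as f:
            self.vocab = json.load(f)
        if special_tokens:
            self.add_special_tokens(special_tokens)

    def add_special_tokens(self, special_tokens):
        # Accept either a single string or a list of strings.
        if isinstance(special_tokens, str):
            special_tokens = [special_tokens]
        for token in special_tokens:
            self.add_token(token)

    def add_token(self, token):
        # Assign the next unused id to a token that is not yet in the vocabulary.
        if token not in self.vocab:
            self.vocab[token] = max(self.vocab.values(), default=-1) + 1

    def get_token_id(self, token):
        return self.vocab[token]

    def encode(self, text):
        # Whitespace splitting stands in for a real tokenization model.
        return [self.vocab[piece] for piece in text.split()]

    def decode(self, ids):
        inverse = {idx: tok for tok, idx in self.vocab.items()}
        return " ".join(inverse[i] for i in ids)


# Usage: write a tiny vocabulary file, then round-trip a string through it.
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.json")
    with open(vocab_path, "w", encoding="utf-8") as f:
        json.dump({"hello": 0, "world": 1}, f)

    # special_tokens may be a single string or a list of strings.
    tokenizer = ToyTokenizer(vocab_path, special_tokens="<eos>")
    ids = tokenizer.encode("hello world <eos>")
    text = tokenizer.decode(ids)
```

With the actual class, the constructor call would take the same two arguments, `HFTokenizer(vocab_file, special_tokens=None)`. Check the source for the exact arguments and return types of `encode`, `decode`, and the other methods before relying on them.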