cerebras.modelzoo.data.nlp.bert.bert_utils.build_vocab
- cerebras.modelzoo.data.nlp.bert.bert_utils.build_vocab(vocab_file, do_lower, oov_token)
Load the vocab file.
- Parameters
vocab_file (str) – Path to the vocab file.
do_lower (bool) – Whether the tokens should be converted to lower case.
oov_token (str) – Token reserved for out-of-vocabulary tokens.
- Returns
A tuple with:
vocab (dict) – Contains the words from the vocab as keys and indices as values.
vocab_size (int) – Size of the resulting vocab.
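
The following is a minimal sketch of the behavior described above, not the ModelZoo source. It assumes the vocab file holds one token per line and that a token's index is its position among the unique tokens read so far. The name `build_vocab_sketch`, the blank-line handling, the duplicate handling, and the `ValueError` raised when `oov_token` is missing are all hypothetical choices made for illustration.

```python
import os
import tempfile


def build_vocab_sketch(vocab_file, do_lower, oov_token):
    """Hypothetical stand-in for build_vocab; the real implementation may differ."""
    vocab = {}
    with open(vocab_file, "r", encoding="utf-8") as fin:
        for line in fin:
            token = line.strip()
            if not token:
                continue  # assumption: blank lines are skipped
            if do_lower:
                token = token.lower()
            # assumption: if lower-casing produces a duplicate, the first index is kept
            if token not in vocab:
                vocab[token] = len(vocab)
    # assumption: a missing OOV token is treated as an error
    if oov_token not in vocab:
        raise ValueError(f"OOV token {oov_token!r} not found in {vocab_file}")
    return vocab, len(vocab)


# Usage with a small temporary vocab file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vocab.txt")
    with open(path, "w", encoding="utf-8") as fout:
        fout.write("[PAD]\n[UNK]\nHello\nworld\n")
    vocab, vocab_size = build_vocab_sketch(path, do_lower=False, oov_token="[UNK]")
```

In this sketch, lower-casing is applied to every line, including special tokens such as `[UNK]`, so with `do_lower=True` the `oov_token` argument would need to be passed in lower case as well. Whether the actual function behaves this way is not stated in the documentation above.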