cerebras.modelzoo.data_preparation.nlp.pile.download.get_urls_for_tokenizer_files

cerebras.modelzoo.data_preparation.nlp.pile.download.get_urls_for_tokenizer_files()

Get the URLs from which the tokenizer files used for tokenization can be downloaded.

Returns

A dictionary containing the URLs for the original GPT-2 tokenization and GPT-NeoX tokenization schemes.
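The exact layout of the returned dictionary is not documented here, so the sketch below is only an illustration of how such a mapping might be consumed. It assumes, hypothetically, that each key names a tokenization scheme and each value is either a single URL string or a list of URL strings. The helper names `plan_downloads` and `download_all`, the scheme keys, and the `example.com` URLs are all placeholders, not part of the Model Zoo API.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def plan_downloads(urls_by_scheme, out_dir):
    """Map each URL to a local path of the form out_dir/<scheme>/<filename>.

    Hypothetical helper: assumes urls_by_scheme maps a scheme name to either
    one URL string or a list of URL strings.
    """
    plan = []
    for scheme, urls in urls_by_scheme.items():
        if isinstance(urls, str):
            urls = [urls]  # normalize a single URL into a one-element list
        for url in urls:
            # Use the last path component of the URL as the local filename.
            filename = os.path.basename(urlparse(url).path)
            plan.append((url, os.path.join(out_dir, scheme, filename)))
    return plan


def download_all(plan):
    """Download every (url, path) pair, creating parent directories as needed."""
    for url, path in plan:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urlretrieve(url, path)


if __name__ == "__main__":
    # Placeholder keys and URLs standing in for the real return value of
    # get_urls_for_tokenizer_files(); this only prints the plan, no downloads.
    example = {
        "gpt2": [
            "https://example.com/gpt2/encoder.json",
            "https://example.com/gpt2/vocab.bpe",
        ],
        "neox": "https://example.com/neox/tokenizer.json",
    }
    for url, path in plan_downloads(example, "tokenizers"):
        print(url, "->", path)
```

If the real return value does have this shape, the placeholder dictionary could be replaced with the result of `get_urls_for_tokenizer_files()` imported from `cerebras.modelzoo.data_preparation.nlp.pile.download`, and the plan passed to `download_all`. Building the plan separately from downloading keeps the path logic testable without network access.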