Data preprocessing scripts
Using Hugging Face datasets for auto-regressive LM
Creating HDF5 dataset for GPT models
Shuffling Samples for HDF5 dataset of GPT models
Optimizing SlimPajama dataset pre-processing