Cerebras Model Zoo Extensions#
This module contains integrations of external tools to the Trainer.
Downstream Validation Callbacks#
The set of callbacks that implement eval harnesses, i.e. external frameworks for running downstream validation with the Trainer.
BigCodeEvalHarness
- class cerebras.modelzoo.trainer.extensions.bigcode.BigCodeEvalHarness(bigcode_args, keep_data_dir=False, every_n_vals=1, flags=None, name_scope=None, batch_size=None, data_dir=None, max_sequence_length=None, tokenizer_file_path=None, eos_id=None, **dataloader_args)[source]#
Bases:
cerebras.modelzoo.trainer.callbacks.callback.ValidationCallback
ValidationCallback class to run BigCode’s Evaluation Harness.
- Parameters
bigcode_args (Union[cerebras.modelzoo.trainer.extensions.bigcode.bigcode_eval_harness.BigCodeCLIArgs, Dict[str, Any]]) – BigCodeCLIArgs dataclass or dict capturing BCEH’s CLI args
keep_data_dir (bool) – Specifies whether dumped data samples should be kept for reuse. Defaults to False, i.e. data samples are deleted after the run.
every_n_vals (int) –
Run the BigCode eval harness script every N validations. For example, if eval_frequency is set to 200 and N=2, the BigCode eval harness runs every 400 training steps.
The BigCode eval harness script also always runs after the final training iteration.
flags (Optional[dict]) – An optional dictionary of scoped global flags to set during the BigCode eval harness run.
name_scope (Optional[str]) – An optional string that gets added to the trainer’s name scope.
batch_size (Optional[int]) – Batch size used by BigCodeEvalHarness to preprocess input data samples from the specified eval harness tasks.
data_dir (Optional[str]) – Path to data directory
max_sequence_length (Optional[int]) – Maximum sequence length
tokenizer_file_path (Optional[str]) – Path to tokenizer file
eos_id (Optional[int]) – End of sentence token id
dataloader_args – Any additional dataloader args, e.g. num_workers.
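Below is a minimal construction sketch, not taken from the Model Zoo source: the task name, file paths, and sizes are placeholders, and registering the callback with the Trainer (e.g. via a callbacks list) is assumed rather than shown here.

from cerebras.modelzoo.trainer.extensions.bigcode import (
    BigCodeCLIArgs,
    BigCodeEvalHarness,
)

# All values below are illustrative placeholders.
bigcode_eh = BigCodeEvalHarness(
    bigcode_args=BigCodeCLIArgs(tasks="humaneval"),  # or a plain dict of BCEH CLI args
    keep_data_dir=False,           # delete dumped data samples after the run
    every_n_vals=2,                # run on every 2nd validation
    batch_size=4,
    data_dir="./bigcode_data",     # placeholder path
    max_sequence_length=2048,
    tokenizer_file_path="./tokenizer.json",  # placeholder path
    eos_id=0,
    num_workers=2,                 # forwarded to the dataloader via **dataloader_args
)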
EleutherEvalHarness
- class cerebras.modelzoo.trainer.extensions.eleuther.EleutherEvalHarness(eeh_args, keep_data_dir=False, every_n_vals=1, flags=None, name_scope=None, batch_size=None, data_dir=None, max_sequence_length=None, tokenizer_file_path=None, eos_id=None, **dataloader_args)[source]#
Bases:
cerebras.modelzoo.trainer.callbacks.callback.ValidationCallback
Callback class to run EleutherAI’s Evaluation Harness.
- Parameters
eeh_args (Union[cerebras.modelzoo.trainer.extensions.eleuther.eval_harness_utils.EleutherCLIArgs, Dict[str, Any]]) – EleutherCLIArgs dataclass or dict capturing EEH’s CLI args
keep_data_dir (bool) – Specifies whether dumped data samples should be kept for reuse. Defaults to False, i.e. data samples are deleted after the run.
every_n_vals (int) –
Run the EEH script every N validations. For example, if eval_frequency is set to 200 and N=2, EEH runs every 400 training steps.
The EEH script also always runs after the final training iteration.
flags (Optional[dict]) – An optional dictionary of scoped global flags to set during the EEH run.
name_scope (Optional[str]) – An optional string that gets added to the trainer’s name scope.
batch_size (Optional[int]) – Batch size used by EleutherEvalHarness to preprocess input data samples from the specified eval harness tasks.
data_dir (Optional[str]) – Path to data directory
max_sequence_length (Optional[int]) – Maximum sequence length
tokenizer_file_path (Optional[str]) – Path to tokenizer file
eos_id (Optional[int]) – End of sentence token id
dataloader_args – Any additional dataloader args, e.g. num_workers.
- property has_generative_task#
Returns True if the task dictionary contains a generative task.
- property has_non_generative_task#
Returns True if the task dictionary contains a non-generative task.
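A similar hedged construction sketch for EleutherEvalHarness, with placeholder tasks, paths, and sizes; the comma-separated tasks string follows EEH's CLI convention and is an assumption here.

from cerebras.modelzoo.trainer.extensions.eleuther import (
    EleutherCLIArgs,
    EleutherEvalHarness,
)

# Placeholder tasks, paths, and sizes; adjust to your setup.
eeh = EleutherEvalHarness(
    eeh_args=EleutherCLIArgs(tasks="hellaswag,winogrande", num_fewshot=0),
    every_n_vals=2,                # run on every 2nd validation
    batch_size=8,
    data_dir="./eeh_data",         # placeholder path
    max_sequence_length=2048,
    tokenizer_file_path="./tokenizer.json",  # placeholder path
    eos_id=0,
    num_workers=2,                 # extra dataloader args are forwarded
)

# Once the task dictionary is built, the properties above can be queried, e.g.:
# eeh.has_generative_task, eeh.has_non_generative_task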
Eval Harness Utils#
Util classes capturing the command line interface arguments for the supported eval harness frameworks.
BigCodeCLIArgs
- class cerebras.modelzoo.trainer.extensions.bigcode.BigCodeCLIArgs(prefix='', do_sample=True, temperature=None, top_k=None, top_p=None, n_samples=1, seed=0, tasks=None, instruction_tokens=None, max_length_generation=512, limit=None, limit_start=0, save_every_k_tasks=-1, postprocess=True, allow_code_execution=False, generation_only=True, load_generations_path=None, load_data_path=None, metric_output_path='evaluation_results.json', save_generations=True, load_generations_intermediate_paths=None, save_generations_path='generations.json', save_references=True, save_references_path='references.json', prompt='prompt', check_references=False)[source]#
Captures BigCode EH’s CLI arguments with defaults.
- Fields:
- prefix: Prefix to add to the prompt. For example, InCoder needs prefix='<| file ext=.py |>\n'.
- do_sample: Sample from the language model's output distribution.
- temperature: Sampling temperature used for generation.
- top_k: Top-k parameter used for generation.
- top_p: Top-p parameter used for nucleus sampling.
- n_samples: Number of completions to generate for each sample.
- seed: Random seed used for evaluation.
- tasks: List of code evaluation tasks to evaluate.
- instruction_tokens: A comma-separated series of instruction tokens used for instruction-tuning benchmarks, e.g. <user_message>,<end_user_message>,<assistant_message>.
- max_length_generation: Maximum length of generated sequence (prompt + generation).
- limit: Number of samples to solve and evaluate from the benchmark.
- limit_start: Optional offset to start from when limiting the number of samples.
- save_every_k_tasks: Optional saving after every k tasks.
- postprocess: Postprocess model outputs before execution; always on except during generation tests.
- allow_code_execution: Allow code evaluation to execute external/untrusted Python code on your machine.
- generation_only: Do code generation but no evaluation.
- load_generations_path: Path of file with previously generated solutions; if provided, generation is skipped and only evaluation is done.
- load_data_path: Path of additional data to load for the tasks.
- metric_output_path: Path to save the results.
- save_generations: Whether to save code generations.
- load_generations_intermediate_paths: List of paths for saving the intermediate code generations.
- save_generations_path: Path for saving the code generations.
- save_references: Whether to save reference solutions/tests.
- save_references_path: Path for saving the reference solutions/tests.
- prompt: Prompt type to use for generation in HumanEvalPack tasks.
- check_references: Don't run generation but benchmark the ground truth (useful for debugging).
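Because bigcode_args accepts either this dataclass or a plain dict, below is a small sketch (values are illustrative) of overriding a few defaults and converting to a dict.

from dataclasses import asdict
from cerebras.modelzoo.trainer.extensions.bigcode import BigCodeCLIArgs

# Override a few generation settings; every other field keeps its default.
bc_args = BigCodeCLIArgs(
    tasks="humaneval",             # illustrative task selection
    n_samples=1,
    max_length_generation=512,
    temperature=0.2,
    do_sample=True,
)

# BigCodeEvalHarness also accepts these arguments as a plain dict.
bc_args_dict = asdict(bc_args)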
EleutherCLIArgs
- class cerebras.modelzoo.trainer.extensions.eleuther.EleutherCLIArgs(tasks, num_fewshot=None, output_path=None, limit=None, use_cache=None, cache_requests=None, check_integrity=False, write_out=False, log_samples=False, show_config=False, include_path=None, predict_only=False, seed='0,1234,1234', trust_remote_code=False, verbosity='INFO', max_length_generation=None, temperature=None, top_k=None, top_p=None)[source]#
Captures EEH’s CLI arguments with defaults.
- Fields:
- tasks: List of tasks to evaluate. To get the full list of tasks, use the command lm-eval --tasks list.
- num_fewshot: Number of examples in the few-shot context.
- output_path: The path to the output file where the result metrics will be saved. If the path is a directory and log_samples is true, the results will be saved in the directory. Otherwise, the parent directory will be used.
- limit: Limit the number of examples per task. If <1, limit is a percentage of the total number of examples.
- use_cache: A path to a sqlite db file for caching model responses. None if not caching.
- cache_requests: Speed up evaluation by caching the building of dataset requests. None if not caching.
- check_integrity: Whether to run the relevant part of the test suite for the tasks.
- write_out: Prints the prompt for the first few documents.
- log_samples: If True, write out all model outputs and documents for per-sample measurement and post-hoc analysis. Use with --output_path.
- show_config: If True, shows the full config of all tasks at the end of the evaluation.
- include_path: Additional path to include if there are external tasks to include.
- predict_only: Use with --log_samples. Only model outputs will be saved and metrics will not be evaluated.
- seed: Set seed for python's random, numpy, and torch. Accepts a comma-separated list of 3 values for python's random, numpy, and torch seeds, respectively, or a single integer to set the same seed for all three. Each value is either an integer or None to not set the seed. Default is 0,1234,1234 (for backward compatibility). E.g. --seed 0,None,8 sets random.seed(0) and torch.manual_seed(8); here numpy's seed is not set since the second value is None. E.g. --seed 42 sets all three seeds to 42.
- trust_remote_code: Sets trust_remote_code to True to execute code to create HF Datasets from the Hub.
- verbosity: EEH logging level.
- max_length_generation: Maximum length of generated sequence (prompt + generation).
- temperature: Sampling temperature used for generation.
- top_k: Top-k parameter used for generation.
- top_p: Top-p parameter used for nucleus sampling.
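A short illustrative sketch of the dataclass follows; the task and limit values are placeholders, and the seed string mirrors the CLI example above and is assumed to be parsed the same way.

from cerebras.modelzoo.trainer.extensions.eleuther import EleutherCLIArgs

# tasks is the only required field; everything else has a default.
eeh_args = EleutherCLIArgs(
    tasks="hellaswag",         # illustrative task selection
    num_fewshot=0,
    limit=0.1,                 # <1, so interpreted as a percentage of examples
    seed="0,None,8",           # random.seed(0), numpy unset, torch.manual_seed(8)
    max_length_generation=256,
)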
Other Extensions#
Other extensions implemented as callbacks that can be used to enhance the Trainer.
HFCacheDir
- class cerebras.modelzoo.trainer.extensions.HFCacheDir(cache_dir)[source]#
Bases:
cerebras.modelzoo.trainer.callbacks.callback.Callback
A callback that sets up the HuggingFace cache directory.
- Parameters
cache_dir (str) – The cache directory to use for HuggingFace utilities.
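A one-line usage sketch; the cache path is a placeholder.

from cerebras.modelzoo.trainer.extensions import HFCacheDir

# Point HuggingFace utilities at a shared, persistent cache location.
hf_cache = HFCacheDir(cache_dir="/path/to/hf_cache")  # placeholder path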
WandbLogger
- class cerebras.modelzoo.trainer.extensions.WandbLogger(project=None, group=None, run_id=None, run_name=None, job_type=None, tags=None, resume='auto', entity=None)[source]#
Bases:
cerebras.modelzoo.trainer.loggers.logger.Logger
Logger class for logging metrics to Weights and Biases.
- Parameters
project (Optional[str]) – The name of the project to which the run belongs.
group (Optional[str]) – The name of the group to which the run belongs.
run_id (Optional[str]) – The unique identifier for the run.
run_name (Optional[str]) – The name of the run.
job_type (Optional[str]) – The type of job.
tags (Optional[List[str]]) – List of tags to be associated with the run.
resume (str) – Resume mode for the run. It can be one of the following:
- "never": Do not resume the run.
- "allow": Allow the run to resume if a previous run exists.
- "auto": Automatically resume the run if a previous run exists.
- "must": Resume the run if a previous run exists.
entity (str) – An entity is a username or team name where you’re sending runs. This entity must exist before you can send runs there, so make sure to create your account or team in the UI before starting to log runs.
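A hedged construction sketch; the project, group, run name, tags, and entity values are placeholders, and passing the logger to the Trainer (e.g. via its loggers list) is assumed rather than documented here.

from cerebras.modelzoo.trainer.extensions import WandbLogger

wandb_logger = WandbLogger(
    project="my-project",          # placeholder project name
    group="pretraining",           # placeholder group
    run_name="gpt-small-run-1",    # placeholder run name
    tags=["pretrain", "example"],
    resume="auto",                 # auto-resume if a previous run exists
    entity="my-team",              # must already exist in W&B
)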