Cerebras Model Zoo Extensions#

This module contains integrations of external tools with the Trainer.

Downstream Validation Callbacks#

The set of callbacks that implement eval harness integrations, i.e. external frameworks for running downstream validation with the Trainer.

BigCodeEvalHarness

class cerebras.modelzoo.trainer.extensions.bigcode.BigCodeEvalHarness(bigcode_args, keep_data_dir=False, every_n_vals=1, flags=None, name_scope=None, batch_size=None, data_dir=None, max_sequence_length=None, tokenizer_file_path=None, eos_id=None, **dataloader_args)[source]#

Bases: cerebras.modelzoo.trainer.callbacks.callback.ValidationCallback

ValidationCallback class to run BigCode’s Evaluation Harness.

Parameters
  • bigcode_args (Union[cerebras.modelzoo.trainer.extensions.bigcode.bigcode_eval_harness.BigCodeCLIArgs, Dict[str, Any]]) – BigCodeCLIArgs dataclass or dict capturing BCEH’s CLI args

  • keep_data_dir (bool) – Specifies whether dumped data samples should be kept for reuse. Defaults to False, i.e. data samples are deleted after the run.

  • every_n_vals (int) – Run the BigCode eval harness script every N validations, e.g. if eval_frequency is set to 200 and N=2, then the BigCode eval harness runs every 400 training steps. The BigCode eval harness script also always runs after the final training iteration.

  • flags (Optional[dict]) – An optional dictionary of scoped global flags to set during the BigCode eval harness run.

  • name_scope (Optional[str]) – An optional string that gets added to the trainer’s name scope.

  • batch_size (Optional[int]) – Batch size used by BigCodeEvalHarness to preprocess input data samples from the specified eval harness tasks.

  • data_dir (Optional[str]) – Path to data directory

  • max_sequence_length (Optional[int]) – Maximum sequence length

  • tokenizer_file_path (Optional[str]) – Path to tokenizer file

  • eos_id (Optional[int]) – End of sentence token id

  • dataloader_args – Any additional dataloader args, e.g. num_workers.

run(trainer)[source]#

Run BigCode Eval Harness.

Parameters

trainer – the Trainer object
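Below is a minimal usage sketch. It assumes a Trainer that accepts a list of callbacks; the task names, paths, and token id are placeholders to adapt to your setup.

```python
from cerebras.modelzoo.trainer.extensions.bigcode import BigCodeEvalHarness

# Hypothetical values; tasks, paths, and eos_id are placeholders.
bigcode_eh = BigCodeEvalHarness(
    bigcode_args={"tasks": "humaneval", "temperature": 0.2, "n_samples": 1},
    every_n_vals=2,                          # run on every 2nd validation
    batch_size=4,
    data_dir="./bigcode_data",
    max_sequence_length=2048,
    tokenizer_file_path="./tokenizer.json",
    eos_id=2,
    num_workers=2,                           # forwarded via **dataloader_args
)

# The callback is then passed to the Trainer, e.g.:
# trainer = Trainer(..., callbacks=[bigcode_eh])
```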

EleutherEvalHarness

class cerebras.modelzoo.trainer.extensions.eleuther.EleutherEvalHarness(eeh_args, keep_data_dir=False, every_n_vals=1, flags=None, name_scope=None, batch_size=None, data_dir=None, max_sequence_length=None, tokenizer_file_path=None, eos_id=None, **dataloader_args)[source]#

Bases: cerebras.modelzoo.trainer.callbacks.callback.ValidationCallback

Callback class to run EleutherAI’s Evaluation Harness.

Parameters
  • eeh_args (Union[cerebras.modelzoo.trainer.extensions.eleuther.eval_harness_utils.EleutherCLIArgs, Dict[str, Any]]) – EleutherCLIArgs dataclass or dict capturing EEH’s CLI args

  • keep_data_dir (bool) – Specifies whether dumped data samples should be kept for reuse. Defaults to False, i.e. data samples are deleted after the run.

  • every_n_vals (int) – Run the EEH script every N validations, e.g. if eval_frequency is set to 200 and N=2, then EEH runs every 400 training steps. The EEH script also always runs after the final training iteration.

  • flags (Optional[dict]) – An optional dictionary of scoped global flags to set during the EEH run.

  • name_scope (Optional[str]) – An optional string that gets added to the trainer’s name scope.

  • batch_size (Optional[int]) – Batch size used by EleutherEvalHarness to preprocess input data samples from the specified eval harness tasks.

  • data_dir (Optional[str]) – Path to data directory

  • max_sequence_length (Optional[int]) – Maximum sequence length

  • tokenizer_file_path (Optional[str]) – Path to tokenizer file

  • eos_id (Optional[int]) – End of sentence token id

  • dataloader_args – Any additional dataloader args, e.g. num_workers.

property has_generative_task#

Returns True if the task dictionary contains a generative task.

property has_non_generative_task#

Returns True if the task dictionary contains a non-generative task.

run(trainer)[source]#

Run the EleutherAI Evaluation Harness.

Parameters

trainer – the Trainer object
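A minimal construction sketch, assuming the EleutherCLIArgs dataclass documented below and a Trainer that accepts a callbacks list; task names, paths, and the token id are placeholders.

```python
from cerebras.modelzoo.trainer.extensions.eleuther import (
    EleutherCLIArgs,
    EleutherEvalHarness,
)

# Hypothetical task list, paths, and eos_id.
eeh = EleutherEvalHarness(
    eeh_args=EleutherCLIArgs(tasks="hellaswag,winogrande", num_fewshot=0),
    every_n_vals=1,
    batch_size=8,
    data_dir="./eeh_data",
    max_sequence_length=2048,
    tokenizer_file_path="./tokenizer.json",
    eos_id=2,
)

# trainer = Trainer(..., callbacks=[eeh])
```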

Eval Harness Utils#

Utility classes capturing the command-line interface arguments for the supported eval harness frameworks.

BigCodeCLIArgs

class cerebras.modelzoo.trainer.extensions.bigcode.BigCodeCLIArgs(prefix='', do_sample=True, temperature=None, top_k=None, top_p=None, n_samples=1, seed=0, tasks=None, instruction_tokens=None, max_length_generation=512, limit=None, limit_start=0, save_every_k_tasks=-1, postprocess=True, allow_code_execution=False, generation_only=True, load_generations_path=None, load_data_path=None, metric_output_path='evaluation_results.json', save_generations=True, load_generations_intermediate_paths=None, save_generations_path='generations.json', save_references=True, save_references_path='references.json', prompt='prompt', check_references=False)[source]#

Captures BigCode EH’s CLI arguments with defaults.

Fields:

prefix: Prefix to add to the prompt. For example, InCoder needs prefix='<| file ext=.py |>\n'.
do_sample: Sample from the language model's output distribution.
temperature: Sampling temperature used for generation.
top_k: Top-k parameter used for generation.
top_p: Top-p parameter used for nucleus sampling.
n_samples: Number of completions to generate for each sample.
seed: Random seed used for evaluation.
tasks: List of code evaluation tasks to run.
instruction_tokens: A series of instruction tokens used for instruction-tuning benchmarks, separated by commas, e.g. <user_message>,<end_user_message>,<assistant_message>.
max_length_generation: Maximum length of generated sequence (prompt+generation).
limit: Number of samples to solve and evaluate from the benchmark.
limit_start: Optional offset to start from when limiting the number of samples.
save_every_k_tasks: Optional saving after every k tasks.
postprocess: Postprocess model outputs before execution; always on except during generation tests.
allow_code_execution: Allow code evaluation to execute external/untrusted Python code on your machine.
generation_only: Do code generation but no evaluation.
load_generations_path: Path of file with previously generated solutions. If provided, generation is skipped and only evaluation is done.
load_data_path: Path of additional data to load for the tasks.
metric_output_path: Path to save the results.
save_generations: Whether to save code generations.
load_generations_intermediate_paths: List of paths for saving the intermediate code generations.
save_generations_path: Path for saving the code generations.
save_references: Whether to save reference solutions/tests.
save_references_path: Path for saving the reference solutions/tests.
prompt: Prompt type to use for generation in HumanEvalPack tasks.
check_references: Don't run generation but benchmark the groundtruth (useful for debugging).
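As a sketch, the dataclass can be constructed with only the fields you want to override; all other fields keep the defaults shown in the signature above. The task name and output path below are placeholders.

```python
from cerebras.modelzoo.trainer.extensions.bigcode import BigCodeCLIArgs

args = BigCodeCLIArgs(
    tasks="humaneval",                          # placeholder task name
    temperature=0.2,
    n_samples=5,
    max_length_generation=512,
    generation_only=True,                       # generate code only; skip execution
    save_generations_path="generations.json",   # placeholder path
)
```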

EleutherCLIArgs

class cerebras.modelzoo.trainer.extensions.eleuther.EleutherCLIArgs(tasks, num_fewshot=None, output_path=None, limit=None, use_cache=None, cache_requests=None, check_integrity=False, write_out=False, log_samples=False, show_config=False, include_path=None, predict_only=False, seed='0,1234,1234', trust_remote_code=False, verbosity='INFO', max_length_generation=None, temperature=None, top_k=None, top_p=None)[source]#

Captures EEH’s CLI arguments with defaults.

Fields:
tasks: List of tasks to evaluate. To get the full list of tasks, use the command lm-eval --tasks list.
num_fewshot: Number of examples in few-shot context.
output_path: The path to the output file where the result metrics will be saved. If the path is a directory and log_samples is true, the results will be saved in the directory. Else the parent directory will be used.
limit: Limit the number of examples per task. If <1, limit is a percentage of the total number of examples.
use_cache: A path to a sqlite db file for caching model responses. None if not caching.
cache_requests: Speed up evaluation by caching the building of dataset requests. None if not caching.
check_integrity: Whether to run the relevant part of the test suite for the tasks.
write_out: Prints the prompt for the first few documents.
log_samples: If True, write out all model outputs and documents for per-sample measurement and post-hoc analysis. Use with --output_path.
show_config: If True, shows the full config of all tasks at the end of the evaluation.
include_path: Additional path to include if there are external tasks to include.
predict_only: Use with --log_samples. Only model outputs will be saved and metrics will not be evaluated.
seed: Set seed for python's random, numpy and torch. Accepts a comma-separated list of 3 values for python's random, numpy, and torch seeds, respectively, or a single integer to set the same seed for all three. The values are either an integer or None to not set the seed. Default is 0,1234,1234 (for backward compatibility). E.g. --seed 0,None,8 sets random.seed(0) and torch.manual_seed(8); here numpy's seed is not set since the second value is None. E.g. --seed 42 sets all three seeds to 42.
trust_remote_code: Sets trust_remote_code to True to execute code to create HF Datasets from the Hub.
verbosity: EEH logging level.
max_length_generation: Maximum length of generated sequence (prompt+generation).
temperature: Sampling temperature used for generation.
top_k: Top-k parameter used for generation.
top_p: Top-p parameter used for nucleus sampling.
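A minimal sketch of overriding a few fields, illustrating the seed format described above; the task name and output path are placeholders.

```python
from cerebras.modelzoo.trainer.extensions.eleuther import EleutherCLIArgs

args = EleutherCLIArgs(
    tasks="hellaswag",            # placeholder task name
    num_fewshot=0,
    seed="0,None,8",              # random.seed(0), numpy seed unset, torch.manual_seed(8)
    log_samples=True,
    output_path="./eeh_results",  # placeholder path; used together with log_samples
)
```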

Other Extensions#

Other extensions implemented as callbacks that can be used to enhance the Trainer.

HFCacheDir#

class cerebras.modelzoo.trainer.extensions.HFCacheDir(cache_dir)[source]#

Bases: cerebras.modelzoo.trainer.callbacks.callback.Callback

A callback that sets up the HuggingFace cache directory.

Parameters

cache_dir (str) – The cache directory to use for HuggingFace utilities.
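A minimal sketch, assuming the callback is passed to the Trainer like any other callback; the cache path is a placeholder.

```python
from cerebras.modelzoo.trainer.extensions import HFCacheDir

hf_cache = HFCacheDir(cache_dir="/path/to/hf_cache")  # placeholder path

# trainer = Trainer(..., callbacks=[hf_cache])
```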

WandbLogger#

class cerebras.modelzoo.trainer.extensions.WandbLogger(project=None, group=None, run_id=None, run_name=None, job_type=None, tags=None, resume='auto', entity=None)[source]#

Bases: cerebras.modelzoo.trainer.loggers.logger.Logger

Logger class for logging metrics to Weights and Biases.

Parameters
  • project (Optional[str]) – The name of the project to which the run belongs.

  • group (Optional[str]) – The name of the group to which the run belongs.

  • run_id (Optional[str]) – The unique identifier for the run.

  • run_name (Optional[str]) – The name of the run.

  • job_type (Optional[str]) – The type of job.

  • tags (Optional[List[str]]) – List of tags to be associated with the run.

  • resume (str) – Resume mode for the run. It can be one of the following:
    - "never": Do not resume the run.
    - "allow": Allow the run to resume if a previous run exists.
    - "auto": Automatically resume the run if a previous run exists.
    - "must": Always resume the run; requires that a previous run exists.

  • entity (str) – An entity is a username or team name where you’re sending runs. This entity must exist before you can send runs there, so make sure to create your account or team in the UI before starting to log runs.

check_presence_of_wandb_dir(rundir)[source]#

Check if the wandb directory is present in the run directory.

Parameters

rundir – The directory where the run is being stored.
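A minimal construction sketch, assuming the logger is passed to the Trainer's loggers; project, group, run, and entity names are placeholders.

```python
from cerebras.modelzoo.trainer.extensions import WandbLogger

wandb_logger = WandbLogger(
    project="my-project",          # placeholder project name
    group="pretraining",           # placeholder group name
    run_name="baseline-run-1",     # placeholder run name
    tags=["baseline"],
    resume="auto",                 # resume automatically if a previous run exists
    entity="my-team",              # placeholder; must already exist in W&B
)

# trainer = Trainer(..., loggers=[wandb_logger])
```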