Loop#
This page covers the two LoopCallback subclasses and how to use one of them to configure the training/validation loop of the Trainer.
Prerequisites#
Make sure you have read through Trainer Overview and Trainer Configuration Overview, which provide a basic overview of how to run Model Zoo models. This document uses the tools and configurations outlined in those pages.
Configure the loop#
The loop argument allows you to manage the training and/or validation loop. The Trainer takes in a LoopCallback subclass that is used to configure loop options such as the number of steps/epochs to run for and how often to run validation. A LoopCallback cannot be instantiated directly; use TrainingLoop or ValidationLoop instead.
Configure for training#
The TrainingLoop callback is used to configure the Trainer to run a fit task.
The majority of loop arguments reference a step. A step is simply a single batch of training/validation data.
Arguments

- num_steps: The total number of steps to train for.
- max_steps: The maximum number of global steps to train for. num_steps supersedes this.
- num_epochs: The number of epochs to train for. Mutually exclusive with num_steps (see the sketch after this list).
- steps_per_epoch: The number of steps to train for in each epoch.
- eval_frequency: The frequency at which validation is performed. See LoopCallback for more details on options.
- eval_steps: The number of validation steps to perform.
- grad_accum_steps: The number of steps to accumulate gradients before performing an optimizer step. Only relevant for "CPU" and "GPU" runs.
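As an illustration of the epoch-based options, here is a minimal sketch that configures the loop by epochs rather than by a total step count. The values are arbitrary and the ... placeholders stand in for the rest of your Trainer setup, as in the examples later on this page.
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.callbacks import TrainingLoop

trainer = Trainer(
    ...,
    loop=TrainingLoop(
        num_epochs=2,          # train for two epochs (mutually exclusive with num_steps)
        steps_per_epoch=500,   # 500 training steps per epoch
        eval_steps=50,         # run 50 validation steps each time validation runs
        eval_frequency=250,    # validate every 250 training steps
        grad_accum_steps=4,    # only takes effect on "CPU" and "GPU" runs
    ),
    ...,
)
trainer.fit(...)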
Note
If you plan on running any kind of training (calling fit), you must use a TrainingLoop. If you plan on running only validation, you may use a ValidationLoop.
In the example below, we configure the Trainer
to
run for 1000 steps and run validation for 50 steps every 100 training steps.
trainer:
  init:
    ...
    loop:
      num_steps: 1000
      eval_steps: 50
      eval_frequency: 100
    ...
  fit:
    ...
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.callbacks import TrainingLoop

trainer = Trainer(
    ...,
    loop=TrainingLoop(
        num_steps=1000,
        eval_steps=50,
        eval_frequency=100,
    ),
    ...,
)
trainer.fit(...)
Configure for validation#
The ValidationLoop callback is used to configure the Trainer to run a validate or validate_all task.
Arguments

- eval_steps: The number of validation steps to perform.
- hook: The base name of the validation hooks to run. Used to extend validation functionality by implementing custom validation callbacks. See EleutherEvalHarnessLoop for an example. Defaults to "validate" (see the sketch after this list).
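As a minimal sketch (values arbitrary, ... placeholders as in the examples below), the hook argument can be left at its default of "validate" when running plain validation; only pass a different base name if you have implemented a matching custom validation callback.
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.callbacks import ValidationLoop

trainer = Trainer(
    ...,
    loop=ValidationLoop(
        eval_steps=100,      # run 100 validation steps
        hook="validate",     # default base hook name, shown here explicitly
    ),
    ...,
)
trainer.validate(...)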
Note
ValidationLoop can only be used if you plan on running only validation tasks (calling validate or validate_all). Otherwise, use TrainingLoop.
In the example below, we configure the Trainer to run validation for 100 steps. We do not need to set any training-related options such as num_steps or eval_frequency since we are only running validation.
trainer:
  init:
    ...
    loop:
      eval_steps: 100
    ...
  validate:
    ...
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.callbacks import ValidationLoop

trainer = Trainer(
    ...,
    loop=ValidationLoop(
        eval_steps=100,
    ),
    ...,
)
trainer.validate(...)
Note
TrainingLoop supports both training and validation because it instantiates a ValidationLoop on initialization.
Note
Every time validation runs, the validation dataloaders are restarted from scratch. This is not the case for training, where we resume from where we left off in the training dataloader.
Conclusion#
That covers how to configure the Trainer for training and/or validation. You should now understand how to use a LoopCallback subclass to configure training loop parameters such as the number of steps and validation frequency.
Further Reading#
To learn more about how you can use the Trainer in some core workflows, you can check out:
To learn more about how you can extend the capabilities of the Trainer class, you can check out: