Reproducibility#
Reproducibility is an essential component of training ML models. The Trainer class features a way to enable determinism across runs, if so desired. On this page, you will learn how to configure the Trainer to ensure reproducibility.
Prerequisites#
Make sure to have read through Trainer Overview and Trainer Configuration Overview, which provide a basic overview of how to run Model Zoo models. In this document, you will be using the tools and configurations outlined in those pages.
Trainer Seed#
The Trainer supports reproducibility by building on PyTorch's seed settings. While it is possible to set the torch seed manually outside of the Trainer class, it is strongly recommended to use the seed argument of the Trainer class to handle it for you.
The following example shows how to set the seed to 1234:
trainer:
  init:
    ...
    seed: 1234
    ...
from cerebras.modelzoo import Trainer

trainer = Trainer(
    ...,
    seed=1234,
)
...
Note
If the seed is not provided or is None (the default value), determinism across runs is not ensured.
Note
Torch modules initialize their weights upon instantiation, so setting the seed after a Module has already been instantiated may not ensure determinism. To avoid this pitfall, instead of passing an already-constructed model instance to the Trainer class, pass a callable that returns a torch Module. The Trainer sets the seed before invoking the callable, thus ensuring reproducibility. This is in line with deferred weight initialization, as described in Defer Weight Initialization.
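For example, here is a minimal sketch of this pattern. It assumes the model is supplied through the Trainer's model argument as described in the Trainer Overview; the model_fn name and layer sizes are purely illustrative, and the remaining Trainer arguments are elided.

import torch
from cerebras.modelzoo import Trainer

def model_fn() -> torch.nn.Module:
    # Weights are initialized only when the Trainer invokes this callable,
    # i.e. after the Trainer has already applied the configured seed.
    # The architecture below is a hypothetical placeholder.
    return torch.nn.Sequential(
        torch.nn.Linear(128, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    )

trainer = Trainer(
    ...,
    model=model_fn,  # pass the callable itself, not model_fn()
    seed=1234,
)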
For a given run, the seed settings may affect any of the following:
The order of input data;
The global seed captured in the graph which may affect the values generated by random ops in the model;
The compile hash. For example, a model that has a random op, such as Dropout, may have a different compile hash for different seed settings. To avoid unnecessary recompiles, make sure to set the trainer seed.
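As a plain-PyTorch illustration of how a global seed makes random ops such as Dropout reproducible (this sketch does not use the Trainer; torch.manual_seed stands in for the seed that the Trainer would set for you):

import torch

def dropout_sample(seed: int) -> torch.Tensor:
    # The global seed determines the mask drawn by random ops such as Dropout.
    torch.manual_seed(seed)
    dropout = torch.nn.Dropout(p=0.5)
    return dropout(torch.ones(8))

# The same seed yields an identical dropout mask across runs.
assert torch.equal(dropout_sample(1234), dropout_sample(1234))
# A different seed generally yields a different mask.
print(dropout_sample(1234))
print(dropout_sample(42))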
Conclusion#
Ensuring reproducibility in ML model training is crucial for the consistency and reliability of results. By leveraging the seed argument in the Trainer class, you can achieve deterministic behavior across runs. This guide has provided step-by-step instructions on configuring the Trainer for reproducibility using both YAML and Python.
Further Reading#
To learn more about how you can extend the capabilities of the Trainer class, you can check out: