Logging#
The Trainer class features an extensible logging mechanism that can be used to log metrics to various backends.
On this page, you will learn how to set up logging to the console via the Logging class, as well as how to add Logger classes to the Trainer.
Prerequisites#
Please ensure that you have read through the Cerebras Model Zoo Trainer Overview beforehand. The rest of this page assumes that you already have at least a cursory understanding of what the Cerebras Model Zoo Trainer is and how to use the Python API.
Also, make sure that you have read through Customizing the Trainer with Callbacks, as this page assumes that you are familiar with the Callback mechanism.
Logging to Console#
The Trainer exposes a logger attribute, which returns a Python logger object that can be used to log messages to the console at different levels.
For example,
from cerebras.modelzoo import Trainer

trainer = Trainer(...)
trainer.logger.info("This is an INFO message")
trainer.logger.debug("This is a DEBUG message")
trainer.logger.warning("This is a WARNING message")
trainer.logger.error("This is an ERROR message")
The logger can be configured by passing a Logging object to the Trainer’s constructor.
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.callbacks import Logging

trainer = Trainer(
    ...,
    logging=Logging(
        log_steps=1,
        log_level="INFO",
    ),
    ...,
)
...

The equivalent YAML configuration is:
trainer:
  init:
    ...
    logging:
      log_steps: 1
      log_level: INFO
    ...
  ...
In the above example, the logger has been configured to print INFO messages to the console by default.
See Control Logging Frequency for an explanation of the log_steps argument.
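To illustrate what a log level of INFO means in practice, here is a minimal sketch that uses only the Python standard library. It assumes that the Trainer's logger behaves like a standard logging.Logger, as the description above suggests. The logger name "demo" and the in-memory stream are placeholders for illustration and are not part of the Model Zoo.

```python
import io
import logging

# Stand-in for the Trainer's logger: a plain standard-library logger
# that writes to an in-memory stream instead of the console.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

demo_logger = logging.getLogger("demo")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)  # messages below INFO are dropped
demo_logger.propagate = False  # keep output out of the root logger

demo_logger.debug("This is a DEBUG message")  # filtered out at INFO level
demo_logger.info("This is an INFO message")
demo_logger.warning("This is a WARNING message")
demo_logger.error("This is an ERROR message")

captured = stream.getvalue()
print(captured, end="")
```

With the level set to INFO, the DEBUG message is suppressed while the INFO, WARNING, and ERROR messages are emitted.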
Logging Metrics#
To log metrics using the Trainer, construct and pass in Logger subclasses.
Included out-of-the-box are:
ProgressLogger: Logs progress metrics to the console.
TensorBoardLogger: Logs metrics to a TensorBoard event file.
These Logger subclasses can be constructed and passed into the trainer via the loggers argument:
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.loggers import (
    ProgressLogger,
    TensorBoardLogger,
)

trainer = Trainer(
    ...,
    loggers=[
        ProgressLogger(),
        TensorBoardLogger(),
    ],
    ...,
)
...

The equivalent YAML configuration is:
trainer:
  init:
    ...
    loggers:
      - ProgressLogger: {}
      - TensorBoardLogger: {}
    ...
  ...
With these loggers, you can now call trainer.log_metrics to log a metric to all loggers.
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.loggers import TensorBoardLogger

trainer = Trainer(
    ...,
    loggers=[TensorBoardLogger()],
    ...,
)
trainer.log_metrics(loss=...)
In the above example, the loss is logged to the TensorBoardLogger at the current global step.
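Conceptually, log_metrics fans a set of named values out to every registered logger, tagged with the current global step. The following is a minimal sketch of that pattern. ToyTrainer and ListLogger are hypothetical stand-ins written for this illustration; they are not the Model Zoo implementation.

```python
class ListLogger:
    """Hypothetical logger that stores (step, name, value) records in memory."""

    def __init__(self):
        self.records = []

    def log_metrics(self, metrics, step):
        for name, value in metrics.items():
            self.records.append((step, name, value))


class ToyTrainer:
    """Hypothetical stand-in that mimics only the fan-out behavior."""

    def __init__(self, loggers):
        self.loggers = list(loggers)
        self.global_step = 0

    def log_metrics(self, **metrics):
        # Forward the same metrics and the current step to every logger.
        for logger in self.loggers:
            logger.log_metrics(metrics, self.global_step)


first, second = ListLogger(), ListLogger()
toy = ToyTrainer([first, second])
toy.global_step = 3
toy.log_metrics(loss=0.25)
print(first.records)
```

Both stand-in loggers receive the same record, which mirrors the idea that a single log_metrics call reaches all configured loggers.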
Logging Name Scope#
The trainer also features a name_scope mechanism for logging, which is intended to group related logs together.
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.loggers import TensorBoardLogger

trainer = Trainer(
    ...,
    loggers=[TensorBoardLogger()],
    ...,
)
with trainer.name_scope("train"):
    trainer.log_metrics(loss=...)
    trainer.log_metrics(accuracy=...)
In the above example, the metrics are recorded in the log as train/loss and train/accuracy.
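The prefixing behavior described above can be sketched with a context manager that keeps track of the active scopes. ScopedMetrics is a hypothetical helper written for this illustration, and the "/" separator is taken from the train/loss example; how the Trainer implements name_scope internally is not shown in this page.

```python
from contextlib import contextmanager


class ScopedMetrics:
    """Hypothetical helper that prefixes metric names with the active scopes."""

    def __init__(self):
        self._scopes = []
        self.logged = {}

    @contextmanager
    def name_scope(self, name):
        self._scopes.append(name)
        try:
            yield
        finally:
            self._scopes.pop()  # restore the previous scope on exit

    def log_metrics(self, **metrics):
        for name, value in metrics.items():
            key = "/".join(self._scopes + [name])
            self.logged[key] = value


scoped = ScopedMetrics()
with scoped.name_scope("train"):
    scoped.log_metrics(loss=0.5)
    scoped.log_metrics(accuracy=0.9)
scoped.log_metrics(lr=0.01)  # outside any scope, so no prefix is added

print(sorted(scoped.logged))
```

Metrics logged inside the with block are stored under train/..., while the metric logged afterwards keeps its bare name.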
Control Logging Frequency#
In very long runs, logging metrics at every step is often undesirably verbose. To remedy this, you can pass log_steps to the Logging class.
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.trainer.callbacks import Logging

trainer = Trainer(
    ...,
    logging=Logging(log_steps=10),
    ...,
)
...

The equivalent YAML configuration is:
trainer:
  init:
    ...
    logging:
      log_steps: 10
    ...
  ...
In the above example, the trainer is configured to log metrics every 10 steps. This means that even if log_metrics is called every step, the metric is only actually logged every 10 steps.
To query whether or not the current step is a log step, you can check trainer.is_log_step.
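As a rough illustration of step-based gating, the sketch below logs only on steps that are a multiple of log_steps. The function should_log and the modulo rule are assumptions made for this example; the Trainer's actual rule may differ, for instance by also logging on the first or last step of a run.

```python
def should_log(step, log_steps):
    """Assumed rule: a step is a log step when it is a multiple of log_steps."""
    return step % log_steps == 0


# Simulate 30 training steps with log_steps=10.
logged_steps = [step for step in range(1, 31) if should_log(step, log_steps=10)]
print(logged_steps)
```

Under this assumed rule, a metric reported at every one of the 30 steps would be written only three times.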
Writing a Custom Logger#
Now that you know about the Logger class and how it’s integrated into the Trainer class, it is fairly straightforward to write your own custom loggers.
To write your own custom Logger class, all you need to do is inherit from the base Logger class and override the following methods:
log_metrics: Logs the provided metrics at the provided step.
flush: Flushes the logs.
For example, let’s implement a simple logger that just logs the metrics to the console:
from cerebras.modelzoo.trainer.loggers import Logger

class ConsoleLogger(Logger):
    def setup(self, trainer):
        # Keep a reference to the trainer so its console logger can be used.
        self.trainer = trainer

    def flush(self):
        # Flush every handler attached to the trainer's console logger.
        for handler in self.trainer.logger.handlers:
            handler.flush()

    def log_metrics(self, metrics, step):
        # Emit one INFO line per metric.
        for name, value in metrics.items():
            self.trainer.logger.info(
                f"Step={step}, {name}={value}"
            )
Note
All Logger instances inherit from Callback. This means that loggers may also override any of the hooks that are exposed via the Callback mechanism.
That is all there is to it. This logger can now be used inside the Trainer as follows:
from cerebras.modelzoo import Trainer

trainer = Trainer(
    ...,
    loggers=[ConsoleLogger()],
    ...,
)
...
As long as the logger class exists in the Python global namespace, you can add any custom logger to a YAML configuration file in exactly the same way you would any other pre-packaged logger.
trainer:
  init:
    ...
    loggers:
      - ConsoleLogger: {}
    ...
In order for the logger class to exist in the Python global namespace, the Python interpreter must have seen it at some point. Implementing your custom logger in run.py or in the same file as the model class are two ways to ensure that the logger is seen by the Python interpreter and loaded into the Python global namespace.
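As a further illustration of the log_metrics and flush contract described in this section, here is a self-contained sketch of a logger that writes one JSON object per call. To keep it runnable without the Model Zoo installed, it defines its own stand-in Logger base class; JsonLinesLogger and its stream argument are hypothetical names for this example. It also assumes metric values are plain Python numbers, whereas values produced during a real run may need converting before they can be serialized.

```python
import io
import json


class Logger:
    """Stand-in for the Model Zoo Logger base class (an assumption)."""

    def log_metrics(self, metrics, step):
        raise NotImplementedError

    def flush(self):
        raise NotImplementedError


class JsonLinesLogger(Logger):
    """Hypothetical logger that writes one JSON object per log_metrics call."""

    def __init__(self, stream):
        self.stream = stream

    def log_metrics(self, metrics, step):
        # Combine the step and the metrics into a single record.
        record = {"step": step, **metrics}
        self.stream.write(json.dumps(record) + "\n")

    def flush(self):
        self.stream.flush()


buffer = io.StringIO()
jsonl_logger = JsonLinesLogger(buffer)
jsonl_logger.log_metrics({"loss": 0.5}, step=10)
jsonl_logger.flush()
print(buffer.getvalue(), end="")
```

In a real setup, the class would inherit from the Model Zoo's Logger instead of the stand-in, and the stream would typically be an open file.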
Conclusion#
By this point, you should have a cursory understanding of how loggers are integrated into the Trainer. There are a few useful loggers that come pre-packaged inside the Model Zoo. If there is somewhere you wish to write logs to that is not covered, you should be comfortable writing your own logger to implement that functionality.