Input Function Report#
When you compile your TensorFlow model for training on the Cerebras system, the compiler automatically analyzes your input function and generates a detailed log. This log identifies any missing functions and recommends parameter values that can improve training performance on the CS system. This section describes how to interpret this log.
Important
The analyzer runs automatically even when you only precompile your network with either the validate_only or compile_only option. You do not need to run your network on the CS system to run the analyzer.
The analyzer displays its recommendations in the output log of the compiler run, usually on stdout.
Example analyzer output on stdout#
The following shows an example of the analyzer output, displayed as a part of the compiler output log on the terminal (or stdout).
Hint
In the compiler log output on stdout, each analyzer output statement begins with the text string [input_fn].
INFO:tensorflow:Running analyze_input_fn_compile
WARNING:root:[input_fn] - interleave(): in ParallelInterleaveDatasetV3, cycle_length is not being set to CS_AUTOTUNE. Currently, it is set to 64. If determinism is not required, Using CS_AUTOTUNE is likely to improve performance unless you are deliberately using a fine-tuned value.e.g. dataset = dataset.interleave(map_func, cycle_length=cerebras.tf.tools.analyze_input_fn.CS_AUTOTUNE)
WARNING:root:Tensorflow recommends that most dataset input pipelines end with a call to prefetch, but ShuffleDataset used in input_fn after prefetch(). Unless this is a careful design choice, consider calling prefetch last
WARNING:root:[input_fn] - interleave(): in ParallelInterleaveDatasetV3_1, cycle_length is not being set to CS_AUTOTUNE. Currently, it is set to 64. If determinism is not required, Using CS_AUTOTUNE is likely to improve performance unless you are deliberately using a fine-tuned value.e.g. dataset = dataset.interleave(map_func, cycle_length=cerebras.tf.tools.analyze_input_fn.CS_AUTOTUNE)
WARNING:root:Tensorflow recommends that most dataset input pipelines end with a call to prefetch, but RepeatDataset used in input_fn after prefetch(). Unless this is a careful design choice, consider calling prefetch last
WARNING:root:Tensorflow recommends that most dataset input pipelines end with a call to prefetch, but BatchDatasetV2 used in input_fn after prefetch(). Unless this is a careful design choice, consider calling prefetch last
WARNING:root:Map is called prior to Batch. Consider reverting the order and performing the map function in a batched fashion to increase the performance of the input function
WARNING:root:[input_fn] - flat_map(): use map() instead of flat_map() to improve performance and parallelize reads. If you are not calling flat_map directly, check if you are using: from_generator, TextLineDataset, TFRecordDataset, or FixedLenthRecordDataset. If so, set num_parallel_reads to > 1 or cerebras.tf.tools.analyze_input_fn.CS_AUTOTUNE, and map() will be used automatically.
Analyzer recommendations#
The analyzer output will contain detailed recommendations. This section describes these recommendations and how to interpret them. See also Limitations of the CerebrasEstimator for a contextual description of some of these recommendations.
Interleave#
- If num_parallel_calls is set to CS_AUTOTUNE with the statement num_parallel_calls = CS_AUTOTUNE, then the compiler automatically sets num_parallel_calls to the number of threads available to the worker or job running the input pipeline.
- If num_parallel_calls is not set to CS_AUTOTUNE, then:
  - If you have not set num_parallel_calls at all, the analyzer recommends setting it to CS_AUTOTUNE.
  - If you have set num_parallel_calls to a value other than CS_AUTOTUNE, the analyzer displays the following message: “num_parallel_calls is not being set to CS_AUTOTUNE. Currently, it is being set to {value}. Using CS_AUTOTUNE is likely to improve performance unless you are deliberately using a fine-tuned value”.
- If cycle_length is set to CS_AUTOTUNE with the statement cycle_length = CS_AUTOTUNE, then the compiler automatically sets cycle_length to the number of threads available to the worker or job running the input pipeline.
- If cycle_length is not set to CS_AUTOTUNE, then:
  - If you have not set cycle_length at all, the analyzer recommends setting it to CS_AUTOTUNE.
  - If you have set cycle_length to a value other than CS_AUTOTUNE, the analyzer displays the following message: “cycle_length is not being set to CS_AUTOTUNE. Currently, it is being set to {value}. Using CS_AUTOTUNE is likely to improve performance unless you are deliberately using a fine-tuned value”.
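For illustration, a minimal interleave sketch with both parameters set to CS_AUTOTUNE might look like the following. It assumes CS_AUTOTUNE can be imported from cerebras.tf.tools.analyze_input_fn (the module path shown in the example analyzer output above); the file pattern is a hypothetical placeholder.

import tensorflow as tf
from cerebras.tf.tools.analyze_input_fn import CS_AUTOTUNE

# Hypothetical file pattern; replace with your own input files.
filenames = tf.data.Dataset.list_files("/path/to/data/*.tfrecord", seed=42)
# Let the compiler choose the parallelism from the available threads.
dataset = filenames.interleave(
    tf.data.TFRecordDataset,
    cycle_length=CS_AUTOTUNE,
    num_parallel_calls=CS_AUTOTUNE)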
Map#
- Checks if the method flat_map() is used. If flat_map() is used, then recommends using the method map() instead of flat_map() to improve performance and parallelize reads. If you are not calling flat_map directly, check if you are using from_generator, TextLineDataset, TFRecordDataset, or FixedLengthRecordDataset. If you are using any of these, then set num_parallel_reads to a value greater than 1 or to CS_AUTOTUNE, and map() will be used automatically.
- Recommends setting num_parallel_calls to CS_AUTOTUNE, as in:
  dataset = dataset.map(extract_fn, num_parallel_calls=CS_AUTOTUNE)
- Checks if map() is called before batch(). If called before, then suggests vectorizing the op and calling it after batch(), as this is more performant.
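As a sketch of the batch-then-map recommendation: the elementwise extract_fn above is replaced with a hypothetical batched_extract_fn written to operate on a whole batch at once; dataset and batch_size are placeholders from the surrounding pipeline.

# Instead of mapping one element at a time before batch():
#   dataset = dataset.map(extract_fn, num_parallel_calls=CS_AUTOTUNE)
#   dataset = dataset.batch(batch_size, drop_remainder=True)
# batch first, then apply a vectorized map function:
dataset = dataset.batch(batch_size, drop_remainder=True)
dataset = dataset.map(batched_extract_fn, num_parallel_calls=CS_AUTOTUNE)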
ListFiles#
- Ensure a deterministic seed is used, especially if you are sharding afterward.
- If shuffle=False (note this is not the default), then states that not shuffling here may require a very large shuffle buffer later (a minor consideration).
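A minimal sketch of a deterministic file listing, using a hypothetical glob pattern and an arbitrary fixed seed:

import tensorflow as tf

# A fixed seed keeps the file order reproducible, which matters if you
# shard the file list across workers afterward.
filenames = tf.data.Dataset.list_files("/path/to/data/*.tfrecord", shuffle=True, seed=42)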
Prefetch#
- Checks if the input_fn uses the function prefetch(). If the prefetch() function is not found, then displays the following message: “Input function is not using prefetch(). Using this is likely to improve the performance. Make use of the prefetch(), as, for example: dataset = dataset.prefetch(buffer_size=CS_AUTOTUNE)”.
- Makes sure that you are prefetching; if you are not, then suggests that you do so.
- Makes sure prefetch is the last op called. As TensorFlow notes, the best practice for input pipeline performance is to insert a tf.data.Dataset.prefetch transformation at the end of your tf.data pipeline.
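A sketch of a pipeline tail that ends with prefetch(), assuming CS_AUTOTUNE is imported as in the earlier examples and dataset and batch_size are placeholders:

dataset = dataset.batch(batch_size, drop_remainder=True)
# prefetch() last: the pipeline prepares the next batch while the
# current one is being consumed.
dataset = dataset.prefetch(buffer_size=CS_AUTOTUNE)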
Shuffle#
- Ensure this is done prior to batch.
- Recommends setting the shuffle buffer_size = CS_AUTOTUNE.
- Checks if the seed parameter in dataset.shuffle() is set to None. If not set to None, then recommends that seed be set to None, as in: dataset = dataset.shuffle(buffer_size, seed=None). This is recommended so that each worker does not send the same batch of data, which would cause overfitting. Also displays the current value of seed.
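A sketch combining these recommendations (placeholders as in the earlier examples):

# Shuffle before batching; seed=None lets each worker draw a different
# sample order, so workers do not send identical batches.
dataset = dataset.shuffle(buffer_size=CS_AUTOTUNE, seed=None)
dataset = dataset.batch(batch_size, drop_remainder=True)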
Batch#
- Checks if the input_fn uses the function batch(). If the batch() function is not found, then displays the following message: “Input function is not using batch(). Setting this is required for static graph compilation. Make use of the batch(), as, for example: dataset = dataset.batch(batch_size, drop_remainder=True)”.
- Checks if drop_remainder is set to True. If not set to True, then recommends the following: Setting it to True is required to provide a static graph to the CS system. Set it, for example, as: dataset = dataset.batch(batch_size, drop_remainder=True).
- shuffle() → repeat() → batch() is slightly more performant than shuffle() → batch() → repeat(), although a batch could straddle epoch boundaries.
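A sketch of the recommended ordering, with batch() configured for static graph compilation (placeholders as above):

dataset = dataset.shuffle(buffer_size=CS_AUTOTUNE, seed=None)
dataset = dataset.repeat()  # repeat before batch: slightly more performant
# drop_remainder=True keeps every batch the same shape, which the
# CS system's static graph requires.
dataset = dataset.batch(batch_size, drop_remainder=True)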
Repeat#
- Checks if the input_fn uses the function repeat(). If the repeat() function is not found, then displays the following message: “Input function is not using repeat(). This is required during CS system training for the many parallel input workers to send enough samples to the system. Make use of the repeat(), as, for example: dataset = dataset.repeat(). Note: Make sure this is specified for training only (not validation)”.
- Checks if the count parameter for dataset.repeat() is set to its default value of None. If not set to None, then recommends that the count parameter must be None for the CS system to correctly stream data. None is the default value. Also displays the current value of count.
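A sketch that applies repeat() only during training, using a hypothetical is_training flag (for example, derived from the Estimator mode):

if is_training:
    # count=None (the default) repeats indefinitely, so the parallel
    # input workers can keep streaming samples to the CS system.
    dataset = dataset.repeat()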
FixedLengthRecordDataset#
For tf.data.FixedLengthRecordDataset:
- Recommends setting num_parallel_reads to CS_AUTOTUNE.
- Recommends setting buffer_size to CS_AUTOTUNE.
TFRecordDataset#
For tf.data.TFRecordDataset:
- Recommends setting num_parallel_reads to CS_AUTOTUNE.
- Recommends setting buffer_size to CS_AUTOTUNE.
TextLineDataset#
For tf.data.TextLineDataset:
- Recommends setting num_parallel_reads to CS_AUTOTUNE.
- Recommends setting buffer_size to CS_AUTOTUNE.
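A sketch of a TFRecordDataset reader following these recommendations; the same two parameters apply to FixedLengthRecordDataset and TextLineDataset. The file list is a hypothetical placeholder.

import tensorflow as tf
from cerebras.tf.tools.analyze_input_fn import CS_AUTOTUNE

filenames = ["/path/to/data/file-00000.tfrecord"]  # hypothetical
dataset = tf.data.TFRecordDataset(
    filenames,
    buffer_size=CS_AUTOTUNE,
    num_parallel_reads=CS_AUTOTUNE)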
Input function requirements#
In addition to following the above analyzer recommendations, the input function must satisfy the following requirements:
The input function must return a tf.data.Dataset object.
The returned Dataset object must consist of features that must be a tensor, and labels that can be either a tensor or None.
The input function should accept only one dictionary input for params, which will be passed through to the Estimator constructor.
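Putting the requirements together, a skeleton input_fn might look like the following sketch. It assumes a hypothetical extract_fn that parses each record into a (features, labels) pair, and hypothetical params keys batch_size and data_files.

import tensorflow as tf
from cerebras.tf.tools.analyze_input_fn import CS_AUTOTUNE

def input_fn(params):
    batch_size = params["batch_size"]  # hypothetical key
    dataset = tf.data.TFRecordDataset(
        params["data_files"],          # hypothetical key
        num_parallel_reads=CS_AUTOTUNE)
    # extract_fn (hypothetical) returns a (features, labels) pair.
    dataset = dataset.map(extract_fn, num_parallel_calls=CS_AUTOTUNE)
    dataset = dataset.shuffle(buffer_size=CS_AUTOTUNE, seed=None)
    dataset = dataset.repeat()  # training only
    dataset = dataset.batch(batch_size, drop_remainder=True)
    return dataset.prefetch(buffer_size=CS_AUTOTUNE)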
See also Limitations of the CerebrasEstimator.
Manually using the analyzer#
To manually use the analyze_input_fn_compile tool in your Python code, follow these steps:
Import and call the tool:
from cerebras.tf.tools.analyze_input_fn_compile import analyze_input_fn_compile

...

dataset = input_fn(params)
analyze_input_fn_compile(dataset)
Make sure you run this code either within the Singularity container or, if running through Slurm, with only one worker on one worker node.
Signature#
analyze_input_fn_compile(dataset, hard_check=False)
Example#
from cerebras.tf.tools.analyze_input_fn import analyze_input_fn_compile
dataset = input_fn(params)
analyze_input_fn_compile(dataset)
where:
- input_fn is your input function.
- params is a Python dictionary of parameters.
- dataset is the Dataset object returned by input_fn(params).
Parameters#
- dataset
Input. A Dataset object returned by input_fn(params).
- hard_check
Input. Boolean. Default value: False. If set to False, any error will be logged, and execution will not stop. When the analyzer is used in the CS system training workflow, this is set to False.
Note
You can use this analyzer as a standalone tool or include it in your CS system training workflow, prior to the training.