cerebras.modelzoo.tools.checkpoint_converters.base_converter.ConversionRule#

class cerebras.modelzoo.tools.checkpoint_converters.base_converter.ConversionRule(segments, exists='both', action=None)[source]#

Bases: object

ConversionRule defines a “rule” consisting of:
  1. a pattern that a key can be matched against

  2. a procedure for converting the matched old key into a new one

  3. and an action to be taken once the new key is created (ex: updating the state dictionary).

A rule consists of a sequence of segments, where each segment is either a regex pattern (supplied as a string), an EquivalentSubkey object, or a BaseDictionaryConverter (allowed only as the last segment in the sequence). A rule also takes an “exists” argument, which can be set to “left”, “both”, or “right”. The “left” and “right” values mark keys that exist in one checkpoint format but not the other and should therefore be ignored. Without this behavior, keys that exist in one format but not the other wouldn’t be matched by any conversion rule, causing a failure since drop_unmatched_keys is disabled by default.
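
For instance, a rule along the following lines (the key name here is purely hypothetical) could mark a buffer that appears only in the “left” checkpoint format so that it is skipped rather than reported as unmatched:

>>> ConversionRule(
>>>     [r"h\.\d+\.attn\.some_buffer_only_in_left_format"],
>>>     exists="left",
>>> )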

Example: The following describes the conversion rule for mapping HF’s layer normalization key to CS layer normalization in the GPT2 model.

>>> ConversionRule(
>>>     [
>>>         EquivalentSubkey("h", "transformer_decoder.layers"),
>>>         r"\.\d+\.",
>>>         EquivalentSubkey("ln_1", "norm1"),
>>>         r"\.(weight|bias)",
>>>     ],
>>>     action=BaseCheckpointConverter.replaceKey,
>>> )
This should be interpreted as:
  1. HF uses ‘h’ to represent the decoder name while CS uses ‘transformer_decoder.layers’

  2. Both formats continue the key with a dot, the decoder layer number, and another dot

  3. HF uses ‘ln_1’ for the first layer norm while CS names it ‘norm1’

  4. Both formats end the key with a dot followed by either weight or bias

This representation should make it easy to see how we can 1) build a regex that matches against old keys, and 2) use the matched result and the EquivalentSubkey information to create a new key. Finally, once the new key is constructed, the conversion rule applies the ‘action’ supplied by the user in order to complete the conversion (in this case, simply copying the value stored under the old key in old_state to the new key in new_state).
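
To make the mechanics concrete, here is a standalone sketch of the HF-to-CS direction using plain re rather than the converter’s internals (the pattern and group bookkeeping below are illustrative, not the library’s actual implementation):

>>> import re
>>> # One capture group per segment of the rule above; the EquivalentSubkey
>>> # segments ("h" and "ln_1") are the pieces that get renamed.
>>> old_pattern = r"(h)(\.\d+\.)(ln_1)(\.(?:weight|bias))"
>>> match = re.fullmatch(old_pattern, "h.0.ln_1.weight")
>>> # Swap each EquivalentSubkey segment for its CS-side name and keep the
>>> # plain regex segments exactly as they matched.
>>> "transformer_decoder.layers" + match.group(2) + "norm1" + match.group(4)
'transformer_decoder.layers.0.norm1.weight'

The rule itself encodes both conversion directions, since each EquivalentSubkey stores the corresponding subkey names for both formats.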

As previously mentioned, a conversion rule can also contain a checkpoint converter as the last element of its sequence. This allows a new checkpoint converter to delegate a portion of the conversion to another converter, which reduces the number of copied-and-pasted conversion rules. For example, many models have base model classes that are extended with additional layers for fine-tuning: HF’s GPT2Model doesn’t contain a language model head, while GPT2LMHeadModel does. Rather than copying the conversion rules, we could instead define a new checkpoint converter as follows:

>>> class Converter_GPT2LMHeadModel_HF_CS17(BaseDictionaryConverter):
>>>     def __init__(self):
>>>         super().__init__()
>>>         self.rules = [
>>>             ConversionRule(
>>>                 [r"lm_head\.(weight|bias)"],
>>>                 action=BaseCheckpointConverter.replaceKey,
>>>             ),
>>>             ConversionRule(
>>>                 [
>>>                     EquivalentSubkey("transformer.", ""),
>>>                     Converter_GPT2Model_HF_CS17(),
>>>                 ],
>>>                 action=None,
>>>             ),
>>>         ]

The first rule simply states that the lm_head key exists in both models (under the same name). The second rule states that if the “transformer.” prefix is encountered, all of the GPT2Model HF -> CS 1.7 conversion rules should be tried on the remainder of the key.
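
Assuming the nested Converter_GPT2Model_HF_CS17 contains the layer-norm rule from the earlier example, the composed behavior can be sketched in plain Python as follows (the helper functions are hypothetical stand-ins for the converter machinery, not part of the library API):

>>> import re
>>> def convert_gpt2model_key(key):
...     # Stand-in for Converter_GPT2Model_HF_CS17, reduced to the single
...     # layer-norm rule shown earlier.
...     m = re.fullmatch(r"h(\.\d+\.)ln_1(\.(?:weight|bias))", key)
...     return f"transformer_decoder.layers{m.group(1)}norm1{m.group(2)}" if m else None
...
>>> def convert_lm_head_key(old_key):
...     # First rule: lm_head is named the same in both formats.
...     if re.fullmatch(r"lm_head\.(weight|bias)", old_key):
...         return old_key
...     # Second rule: strip the "transformer." prefix and delegate the
...     # remainder to the nested GPT2Model converter.
...     if old_key.startswith("transformer."):
...         return convert_gpt2model_key(old_key[len("transformer."):])
...     return None
...
>>> convert_lm_head_key("lm_head.weight")
'lm_head.weight'
>>> convert_lm_head_key("transformer.h.3.ln_1.bias")
'transformer_decoder.layers.3.norm1.bias'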

Methods

convert_key

exists_in_index

segment_is_converter

validate_segments